Monday, 27 April 2026

Building a Self-Hosted LLM Router in Go: Semantic Memory, Tool Calling, and 18 Phases of Debugging

I run local language models on two GPUs — a GTX 1650 for general chat and an RTX 3050 on the LAN for heavier coding tasks. A Python FastAPI router handled dispatching between them. It worked, but it was slow to start, awkward to deploy on Kubernetes, and the codebase was accumulating duct tape. So I rewrote it in Go over a weekend. What followed was 18 phases of debugging, design decisions, and one very persistent tool-calling loop.

This is the story of that rewrite.

The starting point

The Python router already did a lot: keyword heuristics to pick a GPU target, an intent classification model for ambiguous cases, CPU speculative draft planning, auto-continuation for truncated responses, and SSE streaming. The goal wasn't to simplify — it was to make the whole thing deployable as a single stateless binary in Kubernetes without a virtualenv or a pip install in sight.

A single Go binary. No dependency hell. No container that's 800MB of Python packages.

The initial structure mapped cleanly onto Go packages:

cmd/router/         — entrypoint
internal/config/    — env config
internal/router/    — HTTP handlers + routing logic
internal/intent/    — intent classifier client
internal/inference/ — streaming proxy
internal/memory/    — PostgreSQL layer
internal/events/    — NATS event bus

Phases 1–3: Ports, pods, and prompt pollution

Deployment issues came fast. The first was a port mismatch — the router listened on :8080 but the Kubernetes service had targetPort: 8000. A five-second fix. The more confusing problem was that /healthz kept returning the Python service's response even after deploying the Go pod. The rollout hadn't completed; I was port-forwarding to the wrong pod. Lesson: always wait on kubectl rollout status.

Routing bugs came next. The intent service — a 1.5B parameter model meant to classify messages as "general" or "coding" — was silently ignoring its classification instructions and generating code instead of JSON. The cause was sending the full conversation history to a model that small. It just couldn't hold the instruction in context. The fix matched what the Python original did: classify only the last user message, truncated to 500 characters.

But even after that fix, the intent model was unreliable enough under real load that I disabled it entirely. The heuristic — keyword matching on the prompt — turned out to be faster, deterministic, and frankly more correct for the workloads I throw at it. The intent model stays in the codebase for when a better model comes along.

Phase 4: Giving the router a memory

The most interesting engineering problem in the whole project was conversation tracking. OpenWebUI — the chat frontend — sends no session ID. Every request is stateless from the client's perspective. I needed to reconstruct conversation continuity from the content alone.

The solution: a rolling SHA256 hash of prior user messages. On each turn, the router hashes all previous user messages it has seen and looks that hash up in PostgreSQL to find (or create) the conversation row. Once the row is found, the router immediately writes the next turn's expected hash so the lookup succeeds on the following request.
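
A minimal sketch of the key computation, assuming a hypothetical conversationKey helper (the real router also performs the DB lookup and pre-writes the next turn's hash):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// conversationKey hashes the concatenation of all prior user messages.
// An empty history returns "" rather than sha256("") so that new
// conversations never collide on the empty-hash constant.
func conversationKey(priorUserMsgs []string) string {
	if len(priorUserMsgs) == 0 {
		return ""
	}
	h := sha256.New()
	for _, m := range priorUserMsgs {
		h.Write([]byte(m))
		h.Write([]byte{0}) // separator so ["ab","c"] != ["a","bc"]
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(conversationKey(nil))                        // "" — new conversation
	fmt.Println(conversationKey([]string{"hello", "again"})) // stable key for the next turn
}
```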

This broke in several interesting ways:

  • SHA256 of an empty string is a constant — every new conversation shared the same DB row until I added a guard for the empty-hash case.
  • Including assistant responses in the hash meant the hash changed unpredictably. Fix: hash only user messages.
  • Index-based exclusion (i < len(messages)-1) broke when message ordering varied. Fix: exclude by content match instead.

The schema ended up as three tables: conversations, messages, and embeddings. The embeddings table uses pgvector's vector(768) type for semantic retrieval.

Phases 5–6: Async events and semantic memory

Every stored message publishes a NATS event. Workers downstream handle the slow stuff asynchronously: a summarisation worker periodically condenses long conversations using one of the inference backends; an embedding worker sends message content to a nomic-embed-text service running on a GPU slice and stores the resulting 768-dimensional vector.

Context injection became a hybrid of two retrieval strategies. The last 4 messages by chronological order provide conversational flow; the top 4 by pgvector cosine similarity recover relevant older context. The semantic pass excludes anything already in the chronological window to avoid duplication.
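
The merge step might look like this, with illustrative types and a hypothetical mergeContext name:

```go
package main

import "fmt"

type msg struct {
	ID      int
	Content string
}

// mergeContext combines the last-N chronological messages with the
// top-K semantic matches, dropping any semantic hit already present in
// the chronological window so nothing is injected twice.
func mergeContext(chrono, semantic []msg) []msg {
	seen := make(map[int]bool, len(chrono))
	for _, m := range chrono {
		seen[m.ID] = true
	}
	out := make([]msg, 0, len(chrono)+len(semantic))
	for _, m := range semantic { // older relevant context first
		if !seen[m.ID] {
			out = append(out, m)
		}
	}
	return append(out, chrono...) // then recent conversational flow
}

func main() {
	chrono := []msg{{3, "recent"}, {4, "latest"}}
	semantic := []msg{{1, "old but relevant"}, {3, "recent"}} // ID 3 is a duplicate
	fmt.Println(mergeContext(chrono, semantic))               // IDs 1, 3, 4 — duplicate dropped
}
```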

One subtle ordering bug: I was fetching history after storing the current user message, so the current message appeared in the retrieved history and got filtered out, leaving the context empty. Flipping the fetch to happen before the store fixed it.

Phase 7: Taming OpenWebUI's background requests

OpenWebUI sends three silent background requests after every user message: one for follow-up question suggestions, one for a conversation title, one for tags. Each contains the full conversation history. Because my heuristic router looks for technical keywords, these requests were being sent to the 3050 (the coding backend), locking subsequent routing decisions.

The fix was an isSystemRequest() check that detects the ### Task: prefix these requests share, routes them to the 1650, and skips DB storage entirely. The tricky part was ensuring this check runs at the very top of the handler, before any database operations — an early version stored the system messages first and checked second, polluting the conversation history.
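
A sketch of that check, with illustrative struct fields (the real request struct mirrors the OpenAI chat schema):

```go
package main

import (
	"fmt"
	"strings"
)

type chatMessage struct {
	Role    string
	Content string
}

// isSystemRequest detects OpenWebUI's background requests (title,
// tags, follow-up suggestions) by the "### Task:" prefix they all
// share in the last user message.
func isSystemRequest(messages []chatMessage) bool {
	for i := len(messages) - 1; i >= 0; i-- {
		if messages[i].Role == "user" {
			return strings.HasPrefix(strings.TrimSpace(messages[i].Content), "### Task:")
		}
	}
	return false
}

func main() {
	fmt.Println(isSystemRequest([]chatMessage{{Role: "user", Content: "### Task:\nGenerate a title"}})) // true
	fmt.Println(isSystemRequest([]chatMessage{{Role: "user", Content: "write me a poem"}}))             // false
}
```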

Phases 8–10: Load balancing, IDE integration, MCP

Health-check-based load balancing was straightforward: poll /health on each backend before routing, fall back to the other if the preferred one is down or has no slots. Since the 3050 is a physical node on my LAN, it can go offline at any time. The fallback to the 1650 handles this gracefully.
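
A bare-bones version of the preference-with-fallback check might look like this (the real router also inspects slot availability, which this sketch omits):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// pickBackend returns the preferred backend if its health endpoint
// answers 200 within the timeout, otherwise the fallback. URLs here
// are illustrative placeholders.
func pickBackend(preferred, fallback string) string {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(preferred + "/health")
	if err == nil {
		defer resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return preferred
		}
	}
	return fallback
}

func main() {
	// With the 3050 node offline, this falls back to the 1650 service.
	fmt.Println(pickBackend("http://127.0.0.1:1", "http://llama-inference:8080"))
}
```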

Integrating Continue.dev as an IDE assistant exposed a new class of problems. Continue sends tools arrays in every agent request — structured tool definitions that the router was silently stripping. The fix required extending the request struct to capture tool definitions, converting them to plain-text system messages the models can follow, parsing {"tool": "...", "args": {...}} patterns in responses, and returning proper OpenAI tool-call SSE so Continue could execute them.

Codebase indexing required adding a POST /v1/embeddings endpoint that proxies to the nomic-embed-text service in OpenAI format. Continue's bundled all-MiniLM-L6-v2 model failed because the WASM runtime couldn't initialise; removing it and pointing to the router's endpoint resolved the issue.

Phases 11–18: The tool-calling rabbit hole

The agent mode work turned into its own multi-phase project. The model (Qwen2.5-Coder-7B) would correctly call ls, receive a file listing, then call ls again. And again. Indefinitely.

The root cause wasn't the model being bad at tool use — it was that Qwen2.5-Coder-7B doesn't natively reason about tool results. It can emit tool call JSON, but it can't interpret the results and decide what to do next. Several approaches failed before landing on stateless in-request loop detection:

detectToolLoop(messages, repeatThreshold=2, maxTurns=30)
→ Guard 1: same tool+args called N times consecutively
→ Guard 2: total tool turns ≥ maxTurns hard cap
→ On trigger: extract unique results, build plain prompt,
  call StreamAgent(..., jsonSchema=false), return text

An MCP race condition added further complexity: Continue fires the next request before the MCP server returns the tool result, so the same call+result pair appears 10–25 times in body.Messages. A deduplicateToolMessages() pass collapses consecutive identical assistant→tool pairs before building the prompt.
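
A sketch of that deduplication pass, with illustrative message types:

```go
package main

import "fmt"

type chatMessage struct {
	Role    string // "assistant" (tool call) or "tool" (result)
	Content string
}

// deduplicateToolMessages collapses consecutive identical
// assistant→tool pairs left behind by the MCP race, keeping one copy
// of each repeated call+result pair.
func deduplicateToolMessages(msgs []chatMessage) []chatMessage {
	out := make([]chatMessage, 0, len(msgs))
	for i := 0; i < len(msgs); i++ {
		// Skip a pair that exactly repeats the pair just emitted.
		if i+1 < len(msgs) && len(out) >= 2 &&
			msgs[i] == out[len(out)-2] && msgs[i+1] == out[len(out)-1] {
			i++ // skip both halves of the duplicate pair
			continue
		}
		out = append(out, msgs[i])
	}
	return out
}

func main() {
	pair := []chatMessage{{"assistant", `{"tool":"ls"}`}, {"tool", "main.go"}}
	msgs := append(append(append([]chatMessage{}, pair...), pair...), pair...)
	fmt.Println(len(deduplicateToolMessages(msgs))) // 2 — one pair survives
}
```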

Other improvements in this phase: buffered agent streaming (buffer the entire response, inspect for tool calls, send either structured SSE or plain text — never both); a next-file hint injection that tells the model which file to read next after a glob search; grammar_json_schema constraint via llama.cpp to force valid JSON output; and storing completed read_file results in PostgreSQL so follow-up questions about specific files don't lose their context.

Infrastructure, end state

Service                 Model                 Hardware         Role
llama-inference         Qwen2.5-3B-Q4         GTX 1650         General chat
llama-embedding         nomic-embed-v1.5      GTX 1650 ×0.5    Embeddings
RTX 3050 node           Qwen2.5-Coder-7B-Q5   RTX 3050 (LAN)   Coding / agent
llm-router              Go binary             Kubernetes       Control plane
PostgreSQL + pgvector   pg16                  Kubernetes       Memory
NATS JetStream          —                     Kubernetes       Event bus

Working

  • Chat routing (heuristic)
  • Conversation memory (rolling hash)
  • Hybrid semantic + chronological context injection
  • Agent mode always routed to 3050
  • Tool calls initiated and executed with loop detection
  • Codebase indexing via /v1/embeddings
  • File contents persisted to memory after agent sessions
  • System request detection and lightweight routing

Still flaky / not working

  • Model occasionally loops before the next-file hint fires
  • Glob search can exhaust max turns before reading any files
  • Auto-continuation disabled in agent mode
  • Intent model disabled — too unreliable with current models
  • Draft model disabled — leaks planning steps into responses

What I'd do differently

The hash-based conversation tracking is the thing I'm least happy with. It's clever but fragile — there's a race condition on turn 2 and it breaks down when conversation order is inconsistent. A proper session ID passed by the client would eliminate all of it. If you control the frontend, use a session ID.

The tool-calling architecture is fundamentally limited by the model. Qwen2.5-Coder-7B is impressive for its size, but it wasn't trained to reason about tool results in a multi-turn loop. The right fix is a model with native function calling — not more prompt engineering. Every hack I added (next-file hints, JSON schema constraints, loop detection) is load-bearing scaffolding around a gap in the model's capabilities.

Go was the right choice. The router is a single binary, deploys in seconds, handles concurrent SSE streams cleanly, and the type system caught several category errors that would have been silent bugs in Python.

What's next

Auto-enrollment (backends register themselves, zero config changes for new nodes), request queueing via NATS when all slots are full, a local agent sidecar on the workstation with real filesystem access, and — when a better model becomes available — re-enabling the intent classifier and the draft planner.

The backlog is longer than when I started. That feels about right.

Saturday, 25 April 2026

Under the Hood: How magic-auth Works

The previous post covered getting magic-auth up and running with Docker Compose. This one goes deeper — into the design decisions, security model, and how the moving parts actually fit together. If you've ever wondered what a self-hosted OIDC Identity Provider looks like from the inside, this is that post.


The Server: Go and Nothing Else

magic-auth is written in Go using only the standard library's net/http package — no web framework, no ORM, no router library. This is a deliberate choice. The binary is compiled to a scratch container, meaning the final Docker image contains a single executable and nothing else: no shell, no libc, no package manager, no attack surface beyond the server itself. The result is an image around 10 MB in size.

The schema — users, sessions, clients, tokens, RBAC rules — is created and migrated automatically on startup. There is no manual database setup step. The server supports two storage backends selectable via environment variable: rqlite, a lightweight distributed SQLite over Raft, and PostgreSQL. For most self-hosted deployments rqlite is the simpler choice since it runs as its own container with no external dependencies.


The Magic Link: What Actually Happens

When a user submits their email address, the server does the following:

  1. Looks up whether the address is registered. If it is not, the response is identical to the success case — a deliberate measure to prevent user enumeration.
  2. Generates a cryptographically random token, stores a bcrypt hash of it in the database against the user's session record, and constructs a verification URL containing the token and session ID.
  3. Publishes a JSON payload to NATS JetStream. The server's job ends here — it does not speak SMTP. Whatever consumer you have subscribed to that NATS subject is responsible for delivering the email.

The verification URL contains two parameters: a session ID and a token. When the user clicks the link, the server retrieves the session, verifies the token against the stored bcrypt hash, and then checks the browser fingerprint.
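
The issue-and-verify round trip can be sketched like this. Note the real server stores a bcrypt hash; this stdlib-only sketch substitutes SHA-256 as the stored digest:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newMagicToken generates the random token sent in the link and the
// digest stored server-side (bcrypt in the real server; SHA-256 here
// so the sketch needs only the standard library).
func newMagicToken() (token string, stored [32]byte, err error) {
	raw := make([]byte, 32) // 256 bits of entropy
	if _, err = rand.Read(raw); err != nil {
		return
	}
	token = base64.RawURLEncoding.EncodeToString(raw)
	stored = sha256.Sum256([]byte(token))
	return
}

// verifyMagicToken compares in constant time; on success the caller
// deletes the row, making the link single-use.
func verifyMagicToken(presented string, stored [32]byte) bool {
	digest := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare(digest[:], stored[:]) == 1
}

func main() {
	token, stored, _ := newMagicToken()
	fmt.Println(verifyMagicToken(token, stored))    // true
	fmt.Println(verifyMagicToken("forged", stored)) // false
}
```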

The fingerprint is an HMAC computed from the user's IP address, User-Agent header, and Accept-Language header at the time the magic link was requested. The same HMAC is recomputed at the time the link is clicked. If the values do not match — because the link was opened on a different device, from a different network, or in a different browser — the verification is rejected. This is a security tradeoff worth understanding: it prevents a stolen link from being used from a different context, but it also means a link forwarded from a desktop email client opened on a phone will fail.

Magic links are single-use. Once a token is verified it is deleted from the database. The link is also time-limited to 15 minutes.


JWT Signing: RS256 and ES256

magic-auth issues signed JWTs for all tokens — access tokens, refresh tokens, and the OIDC id_token. Two signing algorithms are supported:

  • RS256 (RSASSA-PKCS1-v1_5 with SHA-256) — uses a 2048-bit RSA key pair. Most broadly compatible with third-party libraries and services.
  • ES256 (ECDSA with P-256 and SHA-256) — uses a smaller EC key pair. Produces smaller tokens and verifies faster, but slightly less universally supported.

The private key is supplied as a PEM-encoded environment variable at startup. The corresponding public key is exposed via the standard JWKS endpoint at /.well-known/jwks.json, which includes the x5c certificate chain field. Any service that needs to verify tokens can fetch the public key from this endpoint and verify signatures locally without calling back to the IdP.

The OIDC discovery document at /.well-known/openid-configuration points to all the standard endpoints and declares the supported signing algorithms, so compliant clients can configure themselves automatically from a single URL.


Token Lifetimes and Rotation

Token lifetimes are fixed values baked into the server:

Token                          Lifetime
Access token                   8 hours
Refresh token                  14 days
Refresh token renewal window   7 days
Magic link                     15 minutes, single-use

Refresh tokens rotate on every use. When a client presents a refresh token, the server issues a new access token and a new refresh token, and the old refresh token is immediately invalidated. If a previously revoked refresh token is ever presented again — indicating possible token theft — the server revokes all active sessions for that user immediately. This is the standard refresh token rotation security model described in RFC 6819 and the OAuth 2.0 Security Best Current Practice.

Roles are embedded in the JWT payload at every token issuance and refresh. This means role changes take effect at the next token mint — no logout is required.


The OIDC Layer

magic-auth implements a complete OpenID Connect Authorization Server. The full endpoint surface is:

GET  /.well-known/openid-configuration   Discovery document
GET  /.well-known/jwks.json              Public key set
POST /oauth/register                     Dynamic client registration (RFC 7591)
GET  /oauth/authorize                    Authorization code flow
POST /oauth/token                        Token exchange / refresh
GET  /oauth/userinfo                     Claims for the bearer
POST /oauth/revoke                       Token revocation (RFC 7009)

Dynamic client registration (RFC 7591) means new applications can register themselves programmatically with a single API call — no admin portal required for client onboarding. The server supports both confidential clients (server-side apps with a client_secret) and public clients (SPAs and mobile apps using PKCE with no secret).

PKCE (Proof Key for Code Exchange, RFC 7636) is required for public clients. It prevents authorization code interception attacks by binding the authorization request to a secret known only to the initiating client. The code_challenge is a SHA-256 hash of a random code_verifier; the verifier is submitted at token exchange and verified server-side. Even if an attacker intercepts the authorization code, they cannot exchange it without the original verifier.
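
The S256 derivation and the server-side check can be sketched in Go (the UI performs the same computation in the browser via Web Crypto):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newPKCEPair generates a random code_verifier and its S256
// code_challenge: BASE64URL(SHA256(verifier)), per RFC 7636.
func newPKCEPair() (verifier, challenge string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return
	}
	verifier = base64.RawURLEncoding.EncodeToString(raw)
	sum := sha256.Sum256([]byte(verifier))
	challenge = base64.RawURLEncoding.EncodeToString(sum[:])
	return
}

// verifyPKCE is what the server does at token exchange: recompute the
// challenge from the submitted verifier and compare.
func verifyPKCE(verifier, challenge string) bool {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:]) == challenge
}

func main() {
	v, c, _ := newPKCEPair()
	fmt.Println(verifyPKCE(v, c))                    // true — original verifier
	fmt.Println(verifyPKCE("attacker-guess", c))     // false — code alone is useless
}
```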


The PKCE Client Implementation in magic-auth-ui

The companion management UI implements PKCE entirely in the browser using the Web Crypto API — no third-party OAuth library involved. Here is what happens step by step when the UI initiates a login:

  1. Generate a 256-bit random code_verifier using crypto.getRandomValues
  2. Compute the code_challenge as BASE64URL(SHA256(verifier)) using crypto.subtle.digest
  3. Generate a 128-bit random state for CSRF protection
  4. Generate a 128-bit random nonce for id_token replay protection
  5. Store the verifier, state, nonce, and the intended post-login destination in sessionStorage
  6. Redirect the browser to /oauth/authorize with all parameters

On the callback after the user has clicked their magic link:

  1. Validate the returned state against the stored value — mismatch means a possible CSRF and the flow aborts
  2. Delete the one-time values from sessionStorage immediately
  3. POST the authorization code and code_verifier to /oauth/token
  4. Verify the returned id_token client-side: fetch the correct signing key from JWKS by kid, import it via crypto.subtle.importKey, verify the signature, check iss, aud, exp, iat, and nonce
  5. Store the access token in JS module memory only — it is never written to localStorage or sessionStorage
  6. Store the refresh token in sessionStorage — it survives page reloads within the same tab but is cleared when the tab is closed

The JWKS cache is held in memory and keyed by kid. If a token arrives with an unknown kid — which would happen after a key rotation — the cache is refreshed automatically.

Silent Token Refresh

The access token is kept alive by a proactive refresh timer. When tokens are stored, a setTimeout is scheduled to fire 60 seconds before the access token expires. If the refresh succeeds, new tokens are stored and the timer is rescheduled. If the refresh fails — because the refresh token has expired or been revoked — tokens are cleared and the user is redirected to the login page.

On a full page reload, the in-memory access token is lost. The router's global navigation guard runs auth.init() on the first navigation, which checks for a refresh token in sessionStorage and attempts a silent refresh before deciding whether the user is authenticated. This means sessions survive tab refreshes without prompting the user to sign in again.


Email Delivery via NATS JetStream

Decoupling email delivery from the authentication server is one of the more useful design decisions in magic-auth. Rather than bundling SMTP configuration into the server, magic-auth publishes a structured JSON message to a NATS JetStream subject and leaves delivery entirely to an external consumer.

The payload looks like this:

{
  "to":      ["user@example.com"],
  "subject": "Your sign-in link",
  "body":    "Click the link below to sign in:\n\nhttps://auth.example.com/api/auth/verify?id=...&token=...",
  "is_html": false,
  "cc":      [],
  "bcc":     [],
  "headers": {
    "From":         "noreply@example.com",
    "X-Mailer":     "magiclink-auth",
    "X-Token-Type": "magic-link"
  }
}

The NATS stream is created automatically on startup if it does not already exist. The stream is configured with a maximum age of 24 hours and a maximum size of 128 MB by default, both overridable via environment variables. This means if your email consumer is temporarily down, messages will be retained for up to 24 hours and delivered when the consumer reconnects — rather than silently dropped.

The consumer can be written in any language. The only contract is: subscribe to the configured subject, deliver the email, call msg.Ack(). If delivery fails, do not ack — NATS will redeliver. Add a dead-letter queue for messages that exhaust retries.


The Role System

Roles are resolved fresh at every token issuance using a four-level priority chain. Given a user and a client, the server evaluates in this order and uses the first match:

  1. User role override — an explicit per-user, per-client assignment set via the admin API. This is the highest priority and overrides everything else.
  2. RBAC email rule — a rule matching the user's exact email address for this client.
  3. RBAC domain rule — a rule matching the user's email domain for this client. Useful for granting all users at a company a specific role without listing each address individually.
  4. Config default — falls back to ["user"]. Configurable per server.

Rules with client_id="*" match all clients, including direct-flow tokens. This makes it straightforward to grant a global admin role from a single rule without repeating it per client.
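
The four-level chain might look like this, with maps standing in for the real DB queries:

```go
package main

import (
	"fmt"
	"strings"
)

// ruleSet is an illustrative stand-in for the server's rule storage.
type ruleSet struct {
	overrides map[string][]string // "email|client" → roles (per-user override)
	byEmail   map[string][]string // "email|client" → roles (RBAC email rule)
	byDomain  map[string][]string // "domain|client" → roles (RBAC domain rule)
}

// resolveRoles walks the priority chain: override → email rule →
// domain rule → config default, with "*" matching all clients.
func resolveRoles(rs ruleSet, email, clientID string) []string {
	if r, ok := rs.overrides[email+"|"+clientID]; ok {
		return r // 1. explicit per-user, per-client override
	}
	for _, c := range []string{clientID, "*"} {
		if r, ok := rs.byEmail[email+"|"+c]; ok {
			return r // 2. RBAC email rule
		}
	}
	domain := email[strings.LastIndex(email, "@")+1:]
	for _, c := range []string{clientID, "*"} {
		if r, ok := rs.byDomain[domain+"|"+c]; ok {
			return r // 3. RBAC domain rule
		}
	}
	return []string{"user"} // 4. config default
}

func main() {
	rs := ruleSet{
		byEmail:  map[string][]string{"you@example.com|*": {"global_admin"}},
		byDomain: map[string][]string{"example.com|app1": {"staff"}},
	}
	fmt.Println(resolveRoles(rs, "you@example.com", "app1"))   // [global_admin]
	fmt.Println(resolveRoles(rs, "other@example.com", "app1")) // [staff]
	fmt.Println(resolveRoles(rs, "x@else.com", "app1"))        // [user]
}
```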

Custom roles can be created per client and optionally set as the default role for first-time logins to that client. This allows each application to define its own role vocabulary while still delegating authentication to a central IdP.

Because roles are embedded in the JWT at mint time, the server needs no separate token introspection call to enforce them. Applications can validate the JWT signature locally using the JWKS endpoint and read roles directly from the roles claim.


SSO Session Sharing

The SSO layer is built on top of the standard authentication flow rather than replacing it. When SSO is enabled globally and a client opts in, the server sets an additional cookie — __idp_session — after successful authentication. This cookie is HttpOnly, SameSite=Lax, and scoped to the IdP domain.

On a subsequent login request to another opted-in client, the server checks whether the submitted email matches the active SSO session. If it does, the server skips the magic link step entirely and proceeds directly to issuing an authorization code. If the emails do not match — because the user wants to switch accounts — the normal flow runs regardless.

This design means the user always has to type their email. There is no invisible automatic sign-in. The ability to switch accounts is always present, and the SSO session can never silently sign in under the wrong identity.

The SSO session is cleared on POST /api/auth/logout, POST /oauth/revoke, and GET /logout. The GET /logout endpoint is designed for cross-origin logout redirects — it clears the SSO cookie and then redirects the browser to the URL specified in the redirect query parameter.


Server Configuration Without Restarts

Runtime configuration — SSO toggle, session TTL, registration policy, allowed redirect domains — is stored in the database rather than in environment variables. This means it can be changed via the API and takes effect immediately, with a 30-second cache to reduce database reads. No container restart is needed.

Environment variables still handle secrets and infrastructure concerns: signing keys, database DSN, NATS URL, HMAC secrets. These are genuinely startup-time concerns. The distinction is deliberate: operational configuration belongs in the database, secrets belong in environment variables.


The Direct Magic Link Flow

Not every application needs full OIDC. magic-auth also supports a simpler direct flow for apps that just want session cookies managed by the IdP:

POST /api/auth/request    # Submit email, trigger magic link
GET  /api/auth/verify     # User clicks link — cookies are set
GET  /api/auth/me         # Check the current session
POST /api/auth/refresh    # Rotate refresh token
POST /api/auth/logout     # Clear all cookies and SSO session

In this flow the server sets access_token, refresh_id, and refresh_token cookies directly on successful verification. There is no authorization code redirect. This is simpler to integrate for server-rendered applications that do not need portable JWTs — though the cookies are still signed JWTs, just delivered as cookies rather than via the token endpoint.


What This Adds Up To

The architectural picture is a small, auditable server with clearly separated concerns: authentication logic in Go, email delivery decoupled via NATS, storage pluggable between rqlite and Postgres, token signing via standard asymmetric keys, and a full OIDC surface that any compliant client can consume without custom integration work.

None of these are novel ideas individually. The value is in how tightly they fit together in something small enough to understand completely, deploy in minutes, and operate without a dedicated platform team.

The Docker images are on Docker Hub:
API: jlcox1970/magiclink-auth
UI: jlcox1970/magiclink-ui

Setup guide: Building a Passwordless Auth System with magic-auth

Building a Passwordless Auth System with Magic Links (OAuth2/OIDC Included)


Passwords are a liability. They get phished, reused, breached, and forgotten. Magic links — those one-click sign-in URLs sent to your email — offer a far cleaner user experience with a meaningfully smaller attack surface. magic-auth is a self-hosted, passwordless authentication server and full OpenID Connect (OIDC) Identity Provider written in Go. It handles the entire auth lifecycle: magic link delivery, JWT issuance, token rotation, SSO session sharing, and role-based access control — all from a ~10 MB scratch container.

This post walks through spinning up magic-auth and its companion management UI (magic-auth-ui) using Docker, wiring in an email delivery pipeline, and integrating your own applications via OAuth2/OIDC.


What Is magic-auth?

At its core, magic-auth does three things:

  1. Issues magic links — a user submits their email address, receives a time-limited, device-fingerprinted URL, and clicks it to authenticate. No password is ever stored or transmitted.
  2. Acts as a full OIDC IdP — it signs RS256 or ES256 JWTs and exposes all the standard OIDC endpoints, so any app that speaks OAuth2/OIDC can delegate auth to it.
  3. Manages roles and SSO — a built-in RBAC system lets you assign roles per-user per-app, and an optional SSO session layer lets users skip the email step once they're already signed in to another connected app.

The backend is pure Go using the standard net/http library. Storage is either rqlite (a lightweight distributed SQLite) or PostgreSQL. Email delivery is decoupled via NATS JetStream — magic-auth publishes a JSON message and any consumer you choose handles the actual SMTP/SES/SendGrid delivery.


Architecture Overview

┌─────────────┐     OIDC/OAuth2      ┌───────────────┐
│  Your App   │◄─────────────────────│  magic-auth   │
└─────────────┘                      │  (port 8080)  │
                                     └───────┬───────┘
                                             │ NATS JetStream
                                     ┌───────▼───────┐
                                     │  Email Worker │ (Node.js / Go / anything)
                                     └───────────────┘

┌─────────────────────┐              ┌───────────────┐
│  magic-auth-ui      │◄─────────────│  magic-auth   │
│  (Admin Dashboard)  │  PKCE OIDC   │  (port 8080)  │
└─────────────────────┘              └───────────────┘

The UI is a separate Vue 3 SPA that authenticates against magic-auth using PKCE, and provides a web-based admin console for managing users, clients, and RBAC rules.


Prerequisites

  • Docker and Docker Compose installed
  • An SMTP relay, SendGrid, SES, or any email delivery service your worker can call
  • openssl available on your local machine (for key generation)

Step 1 — Generate a Signing Key

magic-auth signs JWTs using either RSA (RS256) or EC (ES256). Generate a key before writing any compose config:

# Option A: RSA (RS256) — most broadly compatible
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
  | openssl pkey -traditional > private.pem

# Option B: EC (ES256) — smaller tokens, faster verification
openssl ecparam -name prime256v1 -genkey -noout \
  | openssl pkey > ec-private.pem

Keep this file safe — it's what makes your JWTs trustworthy.


Step 2 — Docker Compose

Create a docker-compose.yml:

services:
  magic-auth:
    image: jlcox1970/magiclink-auth:latest
    ports:
      - "8080:8080"
    environment:
      ISSUER:             "https://auth.example.com"
      JWT_SECRET:         "use-a-real-32-char-secret-here!!"
      FINGERPRINT_SECRET: "another-32-char-secret-here!!!!!"
      JWK_PRIVATE_KEY: |
        -----BEGIN RSA PRIVATE KEY-----
        <paste contents of private.pem here>
        -----END RSA PRIVATE KEY-----
      DB_DRIVER:    "rqlite"
      RQLITE_URL:   "http://rqlite:4001"
      NATS_URL:     "nats://nats:4222"
      FROM_ADDRESS: "noreply@example.com"
      BASE_URL:     "https://auth.example.com"
      LOG_LEVEL:    "info"
      SECURE_COOKIES: "true"
      RBAC_RULES: '[{"client_id":"*","principal":"you@example.com","principal_type":"email","roles":["global_admin"]}]'
    depends_on: [rqlite, nats]

  magic-auth-ui:
    image: jlcox1970/magiclink-ui:latest
    ports:
      - "3000:3000"
    environment:
      VITE_API_URL:      "https://auth.example.com"
      VITE_CLIENT_ID:    ""   # fill in after Step 4
      VITE_REDIRECT_URI: "https://admin.example.com/auth/callback"

  rqlite:
    image: rqlite/rqlite:8
    volumes: [rqlite-data:/rqlite/file]
    command: ["-node-id","1","-http-addr","0.0.0.0:4001","-raft-addr","0.0.0.0:4002"]

  nats:
    image: nats:2-alpine
    command: ["-js"]

volumes:
  rqlite-data:

A few things to note:

  • ISSUER and BASE_URL should be your public-facing HTTPS URL. They must match exactly — they appear in JWT iss claims and magic link URLs.
  • JWT_SECRET and FINGERPRINT_SECRET each need to be at least 32 characters. Use openssl rand -hex 32 to generate them.
  • SECURE_COOKIES: "true" requires HTTPS. For local development set it to "false".
  • The RBAC_RULES variable seeds your first global admin. It only applies on first boot when no DB rules exist yet, so it's safe to leave set permanently.

Step 3 — Wire Up Email Delivery

magic-auth does not send email itself. It publishes a JSON message to NATS JetStream on the emails.send subject. You need a consumer that picks that up and calls your email provider. Here is a minimal Node.js example using nodemailer:

import { connect, StringCodec } from "nats";
import nodemailer from "nodemailer";

const nc = await connect({ servers: "nats://localhost:4222" });
const js = nc.jetstream();
const sc = StringCodec();

const transporter = nodemailer.createTransport({
  host: "smtp.example.com",
  port: 587,
  auth: { user: "user", pass: "pass" }
});

const consumer = await js.consumers.get("EMAILS", "email-sender");
for await (const msg of await consumer.consume()) {
  const payload = JSON.parse(sc.decode(msg.data));
  await transporter.sendMail({
    from:    payload.headers["From"],
    to:      payload.to.join(", "),
    subject: payload.subject,
    text:    payload.body,
  });
  msg.ack();
}

The magic link in payload.body is valid for 15 minutes, single-use, and bound to the browser that made the request via an HMAC fingerprint of IP address, User-Agent, and Accept-Language. A link forwarded to a different device or network will be rejected — a deliberate security tradeoff.


Step 4 — Register the UI as an OIDC Client

With magic-auth running, register the management UI as a public PKCE client:

curl -X POST http://localhost:8080/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name":                "magic-auth-ui",
    "redirect_uris":              ["https://admin.example.com/auth/callback"],
    "token_endpoint_auth_method": "none"
  }'

The response includes a client_id. Copy it into your compose file as VITE_CLIENT_ID and restart the UI container. No client_secret is issued for public clients — PKCE takes its place.


Step 5 — Verify the Stack

# Health check
curl http://localhost:8080/api/health
# {"status":"ok"}

# OIDC discovery document
curl http://localhost:8080/.well-known/openid-configuration

# Public key set
curl http://localhost:8080/.well-known/jwks.json

Open http://localhost:3000 in your browser. You will be redirected to the magic-auth login UI. Enter your admin email and click the link that arrives. You will land in the management dashboard with global_admin access.


Understanding the Sign-In Flow

Here is what actually happens when a user authenticates through the OIDC flow:

  1. User hits a protected route → browser redirects to /login
  2. The app generates a PKCE code_verifier + code_challenge (S256), stores the verifier in sessionStorage, and redirects to /oauth/authorize
  3. magic-auth validates the client and redirects to its built-in login UI (or your custom one via LOGIN_UI_URL)
  4. User enters their email → magic-auth publishes to NATS → your email worker sends the link
  5. User clicks the link → magic-auth opens /api/auth/verify in a new tab, validates the device fingerprint and token, creates an authorization code, broadcasts the callback URL via BroadcastChannel, then closes the tab
  6. The waiting login page receives the broadcast and navigates to /auth/callback?code=...&state=...
  7. The app exchanges the code at POST /oauth/token with the PKCE verifier — no client secret needed
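The PKCE pair in step 2 needs no library support — node:crypto (or the browser's Web Crypto) is enough. A minimal sketch, with placeholder client_id and redirect URI:

```javascript
import { randomBytes, createHash } from "node:crypto";

// RFC 7636 base64url: standard base64 with URL-safe chars and no padding.
const base64url = (buf) =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// code_verifier: 32 random bytes → 43-char base64url string.
const codeVerifier = base64url(randomBytes(32));

// S256 code_challenge: BASE64URL(SHA-256(code_verifier)).
const codeChallenge = base64url(createHash("sha256").update(codeVerifier).digest());

// In a browser you would stash codeVerifier in sessionStorage here,
// then redirect to the authorize endpoint:
const authorizeUrl = new URL("http://localhost:8080/oauth/authorize");
authorizeUrl.searchParams.set("response_type", "code");
authorizeUrl.searchParams.set("client_id", "<client_id>");
authorizeUrl.searchParams.set("redirect_uri", "https://admin.example.com/auth/callback");
authorizeUrl.searchParams.set("code_challenge", codeChallenge);
authorizeUrl.searchParams.set("code_challenge_method", "S256");
```

At step 7 the app sends the stored codeVerifier with the token request; the server hashes it and compares against the challenge it saw at authorize time.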

The issued access token contains standard OIDC claims including sub, email, name, roles, iss, aud, exp, and iat. Access tokens last 8 hours, refresh tokens last 14 days with a 7-day renewal window. Refresh tokens rotate on every use — replaying a revoked token triggers immediate revocation of all sessions for that user.
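During debugging it is handy to peek at those claims without verifying the signature — a JWT payload is just base64url-encoded JSON in the token's middle segment. The token below is hand-built for illustration, not one issued by magic-auth:

```javascript
// Decode (NOT verify) a JWT payload to inspect its claims.
// Never trust decoded-but-unverified claims for authorization decisions.
const decodeClaims = (jwt) =>
  JSON.parse(Buffer.from(jwt.split(".")[1], "base64url").toString("utf8"));

// Hand-built example token with claims shaped like magic-auth's.
const claims = {
  sub: "u_123", email: "alice@example.com", name: "Alice",
  roles: ["user"], iss: "https://auth.example.com",
  aud: "magic-auth-ui", exp: 1767225600, iat: 1767196800,
};
const fakeJwt = [
  Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })).toString("base64url"),
  Buffer.from(JSON.stringify(claims)).toString("base64url"),
  "signature-goes-here",
].join(".");

console.log(decodeClaims(fakeJwt).email); // "alice@example.com"
```

For real verification in your own services, fetch the key set from /.well-known/jwks.json and use any standard JWT library.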


Integrating Your Own Application

Confidential Client (server-side app)

curl -X POST http://localhost:8080/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name":   "my-app",
    "redirect_uris": ["https://my-app.example.com/auth/callback"]
  }'

Store the returned client_id and client_secret — the secret is shown only once. Use the standard authorization code flow and exchange the code at POST /oauth/token with your credentials.

Public Client (SPA / mobile — PKCE)

Add "token_endpoint_auth_method": "none" to the registration and include code_challenge and code_challenge_method=S256 in the authorize URL. No client secret is used — the PKCE verifier proves possession at token exchange instead.


Role-Based Access Control

Roles are embedded in the JWT at every issuance and refresh. There are three built-in roles:

Role            Access
global_admin    Full access to all users, clients, and server configuration
app_admin       Scoped to their own client — manages users who have logged into their app
user            Default — self-service profile only

You can define custom roles per client and set one as the default for new logins to that client. Role resolution priority (first match wins): explicit per-user assignment → RBAC email rule → RBAC domain rule → config default (["user"]).
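That first-match-wins chain can be sketched as a lookup function. The names and data shapes here are illustrative, not magic-auth's internals:

```javascript
// Sketch of first-match-wins role resolution (illustrative names,
// not magic-auth's actual implementation).
function resolveRoles(email, { userAssignments, emailRules, domainRules, defaultRoles = ["user"] }) {
  const domain = email.split("@")[1];
  return (
    userAssignments[email] ||   // 1. explicit per-user assignment
    emailRules[email] ||        // 2. RBAC email rule
    domainRules[domain] ||      // 3. RBAC domain rule
    defaultRoles                // 4. config default
  );
}

const rules = {
  userAssignments: {},
  emailRules:  { "root@example.com": ["global_admin"] },
  domainRules: { "example.com": ["app_admin"] },
};

console.log(resolveRoles("root@example.com", rules)); // email rule beats domain rule
console.log(resolveRoles("bob@example.com", rules));  // falls through to domain rule
console.log(resolveRoles("eve@other.dev", rules));    // nothing matches → default
```

The /oauth/rbac/resolve endpoint below lets you confirm what the server itself computes for a given user and client.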

# Assign app_admin to everyone at example.com for a specific client
curl -X POST http://localhost:8080/oauth/rbac/rules \
  -H "Authorization: Bearer <global_admin_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "client_id":      "<client_id>",
    "principal":      "example.com",
    "principal_type": "domain",
    "roles":          ["app_admin"]
  }'

# Test resolution for a specific user + client
curl "http://localhost:8080/oauth/rbac/resolve?email=alice@example.com&client_id=<client_id>" \
  -H "Authorization: Bearer <global_admin_token>"

SSO Session Sharing

Once a user is signed in to one magic-auth app, they can skip the email step on other connected apps — they enter their email and are signed in immediately. Both conditions must be true: the global SSO toggle must be on, and the destination client must have SSO enabled.

# Enable SSO globally
curl -X PUT http://localhost:8080/api/admin/config \
  -H "Authorization: Bearer <global_admin_token>" \
  -H "Content-Type: application/json" \
  -d '{"sso_session_enabled": "true", "sso_session_ttl_hours": "168"}'

# Opt a client in
curl -X PUT http://localhost:8080/api/admin/clients/<client_id>/sso \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

The SSO session is stored in an HttpOnly, SameSite=Lax cookie scoped to the IdP domain and is cleared by the GET /logout endpoint and on token revocation. The user always enters their email, preserving the ability to switch accounts and preventing silent sign-in under the wrong identity.


The Management UI

magic-auth-ui is a Vue 3 SPA that serves as a full admin console. It authenticates using PKCE — no separate admin password, just the same magic link flow as every other user. Route access is role-gated:

Route                        Required Role
/dashboard, /profile         Any authenticated user
/admin/users, /admin/rbac    app_admin or global_admin
/admin/clients               global_admin only

From the UI you can manage users, assign roles, create custom roles per client, toggle SSO per client, and update server configuration — all without touching the API directly.


Production Checklist

  • Replace SECURE_COOKIES: "false" with "true" (requires HTTPS)
  • Use randomly generated 32+ character values for JWT_SECRET and FINGERPRINT_SECRET — try openssl rand -hex 32
  • Set ISSUER and BASE_URL to your public HTTPS URL — they must match
  • Set ALLOWED_REDIRECT_DOMAINS to restrict which redirect URIs are permitted at client registration
  • Set registration_open: false once all clients are registered, or lock down registration_allowed_domains
  • Store JWK_PRIVATE_KEY in a secrets manager (Docker Secrets, Kubernetes Secret, Vault) — not inline in the compose file
  • Run rqlite with a persistent volume and consider a multi-node cluster for high availability
  • Configure your email worker with retries and a dead-letter queue — a failed delivery means a user cannot sign in

Switching to PostgreSQL

If you prefer Postgres over rqlite, update two environment variables and remove the rqlite service:

DB_DRIVER:    "postgres"
POSTGRES_DSN: "postgres://user:pass@postgres:5432/magicauth?sslmode=require"

Schema migrations run automatically on startup — no manual CREATE TABLE needed.


Summary

magic-auth gives you a complete, self-hosted passwordless auth stack in a single ~10 MB container. Users never touch a password. You get standard OIDC tokens that work with any OAuth2-aware library or middleware. The role system is flexible enough for multi-tenant SaaS apps without being complicated to operate.

The key moving parts are magic-auth itself, rqlite or Postgres for storage, NATS for email queuing, your own email worker, and optionally the management UI. Everything talks over standard protocols — swap out any piece independently as your requirements evolve.

Docker Hub:
API: jlcox1970/magiclink-auth
UI: jlcox1970/magiclink-ui



