Sunday, 30 November 2025

Just a Forward Proxy: Let Varnish Do Its Job

I've been running Varnish for a while now as a package proxy with persistent storage — think RPMs, DEBs, ISOs and similar artifacts. It works well, but for a long time there was always this extra component glued to the back of it: nginx.

nginx started life in this setup as “just the TLS terminator and reverse proxy,” but over time it effectively turned into:

  • Another large container image to build and ship
  • Another quasi-OS to track CVEs for
  • Another moving piece with enough configuration surface area to get creative in all the wrong ways

At some point I realised that for this specific use case, I did not actually want a smart reverse proxy. I wanted something that just proxies. Varnish should be the one doing the heavy lifting.

So I built exactly that.


The Problem: Too Much Web Server, Not Enough Proxy

The traffic pattern here is fairly simple:

  • Varnish is acting as a persistent package cache
  • Upstream is a collection of HTTP mirrors and repositories (Rocky, Fedora, EPEL, and friends)
  • Clients are package managers that expect sane HTTP semantics and sometimes clever mirror URLs

Reality, however, looked more like this:

client -> nginx (TLS) -> varnish -> nginx-proxy (redirects and config sprawl) -> assorted mirrors

On top of that, some RPM mirror URLs were configured to point back to Varnish as a repository server. That meant I needed something upstream that could:

  • Handle plain, boring HTTP proxying reliably
  • Support large streaming responses
  • Avoid buffering or caching (Varnish already has that covered)
  • Perform mirror rewrites and metalink generation for specific ecosystems
  • Not require babysitting yet another full-featured web server or distribution

That is where the forward proxy comes in.


The New Path: nginx Out, Simple Forward Proxy In

The new design is deliberately minimal:

client -> nginx (optional TLS terminator) -> varnish -> forward-proxy -> real mirrors

Or, if you terminate TLS elsewhere (ingress, L4 load balancer, etc.):

client -> varnish -> forward-proxy -> real mirrors

In this setup:

  • Varnish remains the cache with persistent storage
  • The forward proxy is a stateless, streaming HTTP forwarder
  • nginx becomes optional and only exists for TLS if you really want it there

The forward proxy is written in Go, ships as a static binary and runs in a scratch container. It does not serve HTML, does not render templates and does not terminate TLS — it simply proxies.


What the Forward Proxy Actually Does

At a high level, the forward proxy is tailored for:

  • Varnish as an upstream client
  • DEB/RPM/YUM/DNF/APT repository traffic
  • Large numbers of concurrent requests
  • Large file streaming (ISOs, RPMs, DEBs, metadata)
  • Multi-origin proxying based purely on the Host header
  • Repository mirror rewrites (Rocky, Fedora, EPEL, Cisco OpenH264)
  • Fedora metalink XML generation
  • Prometheus metrics and Grafana dashboards
  • Custom DNS resolvers
  • Minimal, scratch-based container deployment

It is not:

  • A general-purpose reverse proxy
  • An origin server
  • A templating engine
  • A TLS terminator

It takes the incoming Host plus URL, decides where to send it (optionally via mirror rules) and then streams bytes upstream and back to Varnish. That is the whole job description.


Transparent Forward Proxying

The proxy operates in a straightforward way:

  • It inspects Host and the request path
  • If there is a matching mirror rule, it rewrites to the configured base URL and template
  • Otherwise, it simply dials the upstream host as-is
  • It streams bytes using io.Copy — no buffering, no temporary files

Key behaviours:

  • Range requests (for resuming downloads) are passed straight through
  • Large files are streamed end-to-end
  • HTTPS upgrade redirects are handled so Varnish can stay happily in HTTP land

This makes it well suited for backing Varnish in a package mirror environment, where the cache sits in the middle and the proxy just needs to be fast, predictable and boring.
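
To make that concrete, here is a minimal sketch in Go of the streaming path, assuming a plain net/http handler on port 8080; the names are illustrative rather than the proxy's actual code:

package main

import (
	"io"
	"log"
	"net/http"
)

// forward rebuilds the upstream URL from the incoming Host header and path,
// then streams the response back without buffering.
func forward(w http.ResponseWriter, r *http.Request) {
	upstream := "http://" + r.Host + r.URL.RequestURI()

	req, err := http.NewRequestWithContext(r.Context(), r.Method, upstream, r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	// Headers, including Range, are passed straight through.
	req.Header = r.Header.Clone()

	resp, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	for k, vs := range resp.Header {
		for _, v := range vs {
			w.Header().Add(k, v)
		}
	}
	w.WriteHeader(resp.StatusCode)

	// io.Copy streams the body end-to-end: no buffering, no temporary files.
	io.Copy(w, resp.Body)
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(forward)))
}

The mirror rules, metrics and DNS handling described below all layer on top of this same request-in, stream-out shape.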


Mirror Rewrites via YAML

The interesting logic lives in mirrors.yaml. This is where you describe how to turn distro mirror endpoints into something local and friendly.

mirrors:
  - name: rocky-mirrorlist
    host: mirrors.rockylinux.org
    path_prefix: /mirrorlist
    base_url: http://rockylinux.globelock.home
    repo_split_pattern: "^(?P<base>.*?)-(?P<version>[0-9.]+)$"

    rules:
      - name: altarch-common
        when:
          repo_contains: altarch
        template: "{base_url}/pub/sig/{version}/altarch/{arch}/altarch-common"

      - name: epel-cisco-openh264
        when:
          repo_contains: epel-cisco-openh264
        template: "http://codecs-fedoraproject.globelock.home/openh264/epel/{version}/{arch}/os"

    default_template: "{base_url}/pub/rocky/{version}/{base}/{arch}/os"

  - name: fedora-metalink
    host: mirrors.fedoraproject.org
    path_prefix: /metalink
    base_url: http://mirror.aarnet.edu.au
    response_type: fedora_metalink

    rules:
      - name: fedora-updates
        when:
          repo_contains: updates-released
        template: "{base_url}/pub/fedora/linux/updates/{version}/Everything/{arch}"

      - name: epel
        when:
          repo_contains: epel
        template: "{base_url}/pub/epel/{version}/Everything/{arch}/os"

    default_template: "{base_url}/pub/{base}/{version}/Everything/{arch}/os"

The proxy parses the request parameters (repo, arch, version and so on), runs them through these rules and either:

  • Returns a generated Fedora metalink XML, or
  • Returns a rewritten mirror URL that points at your chosen upstream

From Varnish's perspective, it is simply talking to an HTTP origin that always knows where to find the right mirror.
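
As a rough illustration of what the rule engine does with the Rocky entry above, here is a sketch in Go; it is not the proxy's actual code, and the request values (repo=BaseOS-9.5, arch=x86_64) are only examples:

package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// expand substitutes {name} placeholders in a template, as used in mirrors.yaml.
func expand(template string, vars map[string]string) string {
	out := template
	for k, v := range vars {
		out = strings.ReplaceAll(out, "{"+k+"}", v)
	}
	return out
}

func main() {
	// Example request: GET /mirrorlist?repo=BaseOS-9.5&arch=x86_64
	q, _ := url.ParseQuery("repo=BaseOS-9.5&arch=x86_64")

	// repo_split_pattern from the Rocky entry splits repo into base and version.
	re := regexp.MustCompile(`^(?P<base>.*?)-(?P<version>[0-9.]+)$`)
	m := re.FindStringSubmatch(q.Get("repo"))

	vars := map[string]string{
		"base_url": "http://rockylinux.globelock.home",
		"base":     m[re.SubexpIndex("base")],
		"version":  m[re.SubexpIndex("version")],
		"arch":     q.Get("arch"),
	}

	// No rule matches "BaseOS", so default_template applies.
	fmt.Println(expand("{base_url}/pub/rocky/{version}/{base}/{arch}/os", vars))
	// Output: http://rockylinux.globelock.home/pub/rocky/9.5/BaseOS/x86_64/os
}

A matching rule (altarch, epel-cisco-openh264) would simply pick a different template before the same expansion step.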


Fedora Metalink Generation

For Fedora, you can query the proxy like this:

GET /metalink?repo=fedora-42&arch=x86_64

It responds with a valid metalink XML pointing at your chosen mirror, for example:

<?xml version="1.0" encoding="utf-8"?>
<metalink>
  <files>
    <file name="repomd.xml">
      <resources>
        <url protocol="http" type="http">
          http://mirror.aarnet.edu.au/.../repomd.xml
        </url>
      </resources>
    </file>
  </files>
</metalink>

That keeps the Fedora tooling happy while still giving you control over which mirror is actually used.
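
For the curious, generating that response is mostly a job for encoding/xml. A simplified sketch, not the actual implementation, with the element set trimmed down to match the example above:

package main

import (
	"encoding/xml"
	"os"
)

// Minimal structs mirroring the trimmed-down metalink layout shown above.
type Metalink struct {
	XMLName xml.Name `xml:"metalink"`
	Files   []File   `xml:"files>file"`
}

type File struct {
	Name string `xml:"name,attr"`
	URLs []URL  `xml:"resources>url"`
}

type URL struct {
	Protocol string `xml:"protocol,attr"`
	Type     string `xml:"type,attr"`
	Value    string `xml:",chardata"`
}

func main() {
	m := Metalink{
		Files: []File{{
			Name: "repomd.xml",
			URLs: []URL{{
				Protocol: "http",
				Type:     "http",
				// Path elided here, exactly as in the example above.
				Value: "http://mirror.aarnet.edu.au/.../repomd.xml",
			}},
		}},
	}

	out, _ := xml.MarshalIndent(m, "", "  ")
	os.Stdout.WriteString(xml.Header)
	os.Stdout.Write(out)
}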


Metrics and Observability

Metrics are exposed on a separate listener (default :9090) in Prometheus format. You get:

  • Request counts
  • Duration histograms
  • Bytes in and out
  • Upstream error counts
  • In-flight request gauges
  • Per-client request counters

An example p95 latency query:

histogram_quantile(
  0.95,
  sum by (le, host)(rate(proxy_request_duration_seconds_bucket[5m]))
)

Feed that into Grafana and you have a clear picture of how the forward proxy behaves under load.
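
For reference, the metric in that query is a plain client_golang histogram labelled by upstream host; the wiring looks roughly like this (illustrative only, the proxy's own registration code may differ):

package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Histogram behind proxy_request_duration_seconds_bucket, labelled by upstream host.
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "proxy_request_duration_seconds",
		Help:    "Time spent serving proxied requests.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"host"},
)

func main() {
	prometheus.MustRegister(requestDuration)

	// Example observation: each proxied request records its duration
	// against the upstream host label.
	start := time.Now()
	requestDuration.WithLabelValues("mirrors.rockylinux.org").Observe(time.Since(start).Seconds())

	// Separate metrics listener on :9090, as in the proxy.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9090", nil)
}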


DNS Control

Rather than inheriting /etc/resolv.conf from some arbitrary container base image, you can explicitly set upstream DNS servers:

UPSTREAM_DNS="1.1.1.1,8.8.8.8"

The proxy uses those resolvers to look up upstream hosts. That makes it easier to isolate DNS behaviour and avoid surprises, particularly in container-heavy environments.
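
This is the standard net.Resolver hook in Go. A minimal sketch of the idea, assuming the UPSTREAM_DNS value is handled along these lines (the fallback behaviour here is deliberately simplified):

package main

import (
	"context"
	"net"
	"net/http"
	"os"
	"strings"
	"time"
)

// newTransport builds an HTTP transport whose lookups go to the configured
// resolvers instead of whatever /etc/resolv.conf the image happens to ship.
func newTransport() *http.Transport {
	servers := strings.Split(os.Getenv("UPSTREAM_DNS"), ",") // e.g. "1.1.1.1,8.8.8.8"

	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Simplified: always ask the first server; a fuller version
			// would fall back to the remaining ones on error.
			return d.DialContext(ctx, network, strings.TrimSpace(servers[0])+":53")
		},
	}

	dialer := &net.Dialer{Resolver: resolver}
	return &http.Transport{DialContext: dialer.DialContext}
}

func main() {
	client := &http.Client{Transport: newTransport()}
	_ = client // used for all upstream requests
}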


Logging

Logs are Apache-style so they fit in cleanly with existing tooling:

<client-ip> - - [timestamp] "METHOD URL HTTP/x.x" status bytes "ref" "ua" host=X duration=0.xxx

Client IP detection honours:

  1. X-Forwarded-For (first non-localhost)
  2. X-Real-IP
  3. RemoteAddr

So even if you do place nginx or some other load balancer in front, you still get meaningful client attribution.
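
The lookup order is simple enough to show directly. A sketch of the same logic, not the exact code:

package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// clientIP follows the order above: X-Forwarded-For, then X-Real-IP,
// then the raw TCP peer address.
func clientIP(r *http.Request) string {
	for _, ip := range strings.Split(r.Header.Get("X-Forwarded-For"), ",") {
		ip = strings.TrimSpace(ip)
		if ip != "" && ip != "127.0.0.1" && ip != "::1" {
			return ip
		}
	}
	if ip := r.Header.Get("X-Real-IP"); ip != "" {
		return ip
	}
	if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
		return host
	}
	return r.RemoteAddr
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, clientIP(r))
	}))
}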


Run the Container Image

podman run -d \
  -p 8080:8080 \
  -p 9090:9090 \
  -e MIRROR_CONFIG=/mirrors.yaml \
  -e UPSTREAM_DNS="1.1.1.1,8.8.8.8" \
  -v $(pwd)/mirrors.yaml:/mirrors.yaml \
  forward-proxy

If you just want to try it without building anything, there is an image on Docker Hub:

jlcox1970/forward-proxy

Why Bother?

In theory, you can make nginx do most of this. In practice:

  • I do not want a full web server for this job
  • I do not want to track another CVE stream for “just a proxy”
  • I do not want to debug buffering and proxy caching in multiple places
  • I do want a small, purpose-built binary that streams bytes and exposes metrics

By moving this logic into a compact Go service and letting Varnish handle what it is good at (caching, persistence, HTTP semantics), the stack becomes:

  • Easier to reason about
  • Smaller to maintain
  • More predictable under load

Sometimes the right answer really is just a forward proxy.

Saturday, 8 November 2025

Updating the Package Server – Auth, Probes, and a Bit of Cleanup

It’s been a while since I last wrote about the package server project. In that post, I’d just finished stabilizing the upload flow and getting the dynamic mirror logic working. Since then, I’ve pushed a fair number of updates — tightening authentication, cleaning up CI, and finally adding the small operational touches that make it more comfortable to run day to day.

Cleaning up after August

After I wrapped up the mirror work in August, I spent some time chasing edge cases and wiring up the CI. That work landed in a couple of late-September releases — mostly housekeeping and the first experiments toward a proper auth stack.
By November, the branch had grown into a decent fall cleanup: new auth modes, proper health endpoints, saner logs, and the usual round of dependency bumps and formatting fixes.

Authentication done right (or at least, done)

I finally formalized authentication. You can now run the server with no auth, basic file-based auth, an API backend that returns a JWT, or full OIDC integration. That gives me a smooth path from local testing through production, all using the same configuration block.

auth:
  mode: basic
  basic_htpasswd_file: ./.htpasswd
  basic_api_backends:
    - https://{rest server auth}

I spent more time than I’d like to admit making sure those combinations actually worked — the “none/basic/OIDC” wiring was surprisingly fiddly. The payoff, though, is that it now supports the same JWT-based flow I use elsewhere for CI jobs and other internal services.

Health endpoints and quieter logs

While working through container orchestration setups, I finally added proper /health and /ready endpoints. They exist mainly so Kubernetes probes don’t clutter the logs with noise.
At the same time, I reworked logging to use a standard Apache-style format and correctly handle X-Forwarded-For, so it’s finally possible to see who’s actually talking to the server through a proxy chain.
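
The endpoints themselves are nothing clever; conceptually they boil down to something like this sketch (not the server's actual code, and the port is only an example):

package main

import "net/http"

func main() {
	// Liveness: the process is up and able to answer HTTP.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: a fuller check would also verify dependencies
	// such as storage or auth backends before reporting ready.
	http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	http.ListenAndServe(":8080", nil)
}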

Mirrors that behave like mirrors

The dynamic pull-through mirrors I introduced last time are now more flexible. You can chain mirrors — useful when running a local cache in front of a site-wide one — and control the upstream via environment variables.
Set REGISTRY_UPSTREAM_MIRROR and flip enableProxy to true in the config, and the server will behave as a caching proxy without additional plumbing.

It’s a small feature, but it makes it much easier to run the server as part of a layered mirror setup.

CI and releases

Most of the September commits were about getting CI to publish images reliably. That’s all cleaned up now — image pushes happen automatically, and local builds match what I push from CI.
If you just want to run the latest build, you can grab it directly from Docker Hub:

docker pull jlcox1970/package-server:<tag>

I also standardized the module layout so it’s easier to build locally without wrestling with paths or dependencies.

Upgrading in place

If you’re already running an older instance, upgrades should be painless:

  • Keep your existing mirror and storage settings.
  • Switch your auth mode from “none” to “basic” (and point basic_htpasswd_file at a .htpasswd file).
  • Add basic_api_backends if you want to authenticate against an external service.
  • Point your liveness and readiness probes at the new /health and /ready endpoints.
  • Optionally enable REGISTRY_UPSTREAM_MIRROR to save bandwidth on cold pulls.

The default behavior hasn’t changed — just more options where they make sense.

Looking ahead

Next up is tightening the OIDC path and publishing a short “cookbook” of operational examples. I’ve had a few requests for topologies that mix public mirrors, internal caches, and private registries, so I’ll document what I’m using in production once it settles down.

As always, the code’s on GitLab at
https://gitlab.com/jlcox70/repository-server

and the Docker image builds are automatically pushed to jlcox1970/package-server on Docker Hub.