Sunday, 30 November 2025

Just a Forward Proxy: Let Varnish Do Its Job

I've been running Varnish for a while now as a package proxy with persistent storage — think RPMs, DEBs, ISOs and similar artifacts. It works well, but for a long time there was always this extra component glued to the back of it: nginx.

nginx started life in this setup as “just the TLS terminator and reverse proxy,” but over time it effectively turned into:

  • Another large container image to build and ship
  • Another quasi-OS to track CVEs for
  • Another moving piece with enough configuration surface area to get creative in all the wrong ways

At some point I realised that for this specific use case, I did not actually want a smart reverse proxy. I wanted something that just proxies. Varnish should be the one doing the heavy lifting.

So I built exactly that.


The Problem: Too Much Web Server, Not Enough Proxy

The traffic pattern here is fairly simple:

  • Varnish is acting as a persistent package cache
  • Upstream is a collection of HTTP mirrors and repositories (Rocky, Fedora, EPEL, and friends)
  • Clients are package managers that expect sane HTTP semantics and, in some cases, mirrorlist or metalink URLs that need to resolve to something sensible

Reality, however, looked more like this:

client -> nginx (TLS) -> varnish -> nginx-proxy (redirects and config sprawl) -> assorted mirrors

On top of that, some RPM mirror URLs were configured to point back to Varnish as a repository server. That meant I needed something upstream that could:

  • Handle plain, boring HTTP proxying reliably
  • Support large streaming responses
  • Avoid buffering or caching (Varnish already has that covered)
  • Perform mirror rewrites and metalink generation for specific ecosystems
  • Not require babysitting yet another full-featured web server or distribution

That is where the forward proxy comes in.


The New Path: nginx Out, Simple Forward Proxy In

The new design is deliberately minimal:

client -> nginx (optional TLS terminator) -> varnish -> forward-proxy -> real mirrors

Or, if you terminate TLS elsewhere (ingress, L4 load balancer, etc.):

client -> varnish -> forward-proxy -> real mirrors

In this setup:

  • Varnish remains the cache with persistent storage
  • The forward proxy is a stateless, streaming HTTP forwarder
  • nginx becomes optional and only exists for TLS if you really want it there
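
On the Varnish side this is just an ordinary backend definition. A minimal VCL sketch, assuming the proxy is reachable as forward-proxy on port 8080 (the port used in the container example later in the post):

vcl 4.1;

backend forward_proxy {
    .host = "forward-proxy";
    .port = "8080";
}

sub vcl_recv {
    # Everything Varnish cannot serve from cache goes to the forward proxy.
    set req.backend_hint = forward_proxy;
}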

The forward proxy is written in Go, ships as a static binary and runs in a scratch container. It does not serve HTML, does not render templates and does not terminate TLS — it simply proxies.
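
The build itself is the usual two-stage affair. A rough sketch of what a scratch-based Containerfile for a binary like this looks like (illustrative only, not the project's actual build file):

# Build stage: compile a fully static binary
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /forward-proxy .

# Final stage: nothing in the image but the binary
FROM scratch
COPY --from=build /forward-proxy /forward-proxy
ENTRYPOINT ["/forward-proxy"]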


What the Forward Proxy Actually Does

At a high level, the forward proxy is tailored for:

  • Varnish as an upstream client
  • Deb/RPM/YUM/DNF/APT repository traffic
  • Large numbers of concurrent requests
  • Large file streaming (ISOs, RPMs, DEBs, metadata)
  • Multi-origin proxying based purely on the Host header
  • Repository mirror rewrites (Rocky, Fedora, EPEL, Cisco OpenH264)
  • Fedora metalink XML generation
  • Prometheus metrics and Grafana dashboards
  • Custom DNS resolvers
  • Minimal, scratch-based container deployment

It is not:

  • A general-purpose reverse proxy
  • An origin server
  • A templating engine
  • A TLS terminator

It takes the incoming Host plus URL, decides where to send it (optionally via mirror rules) and then streams bytes upstream and back to Varnish. That is the whole job description.


Transparent Forward Proxying

The proxy operates in a straightforward way:

  • It inspects Host and the request path
  • If there is a matching mirror rule, it rewrites to the configured base URL and template
  • Otherwise, it simply dials the upstream host as-is
  • It streams bytes using io.Copy — no buffering, no temporary files

Key behaviours:

  • Range requests (for resuming downloads) are passed straight through
  • Large files are streamed end-to-end
  • HTTP-to-HTTPS upgrade redirects from mirrors are followed by the proxy, so Varnish can stay happily in HTTP land

This makes it well suited for backing Varnish in a package mirror environment, where the cache sits in the middle and the proxy just needs to be fast, predictable and boring.
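
Stripped of the mirror rules, metrics and error handling, that pass-through path boils down to something like the following Go sketch (names and structure are mine, not the project's actual code):

package main

import (
	"io"
	"log"
	"net/http"
)

// The default client policy follows redirects, including HTTP-to-HTTPS
// upgrades from mirrors, so Varnish only ever sees a plain-HTTP response.
var client = &http.Client{}

func forward(w http.ResponseWriter, r *http.Request) {
	// Rebuild the request against the incoming Host. A mirror rule would
	// rewrite this URL first; that step is omitted here for brevity.
	upstreamURL := "http://" + r.Host + r.URL.RequestURI()

	req, err := http.NewRequestWithContext(r.Context(), r.Method, upstreamURL, r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	req.Header = r.Header.Clone() // Range and friends pass straight through

	resp, err := client.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	for k, vv := range resp.Header {
		for _, v := range vv {
			w.Header().Add(k, v)
		}
	}
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body) // stream end-to-end, no buffering, no temp files
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(forward)))
}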


Mirror Rewrites via YAML

The interesting logic lives in mirrors.yaml. This is where you describe how to turn distro mirror endpoints into something local and friendly.

mirrors:
  - name: rocky-mirrorlist
    host: mirrors.rockylinux.org
    path_prefix: /mirrorlist
    base_url: http://rockylinux.globelock.home
    repo_split_pattern: "^(?P<base>.*?)-(?P<version>[0-9.]+)$"

    rules:
      - name: altarch-common
        when:
          repo_contains: altarch
        template: "{base_url}/pub/sig/{version}/altarch/{arch}/altarch-common"

      - name: epel-cisco-openh264
        when:
          repo_contains: epel-cisco-openh264
        template: "http://codecs-fedoraproject.globelock.home/openh264/epel/{version}/{arch}/os"

    default_template: "{base_url}/pub/rocky/{version}/{base}/{arch}/os"

  - name: fedora-metalink
    host: mirrors.fedoraproject.org
    path_prefix: /metalink
    base_url: http://mirror.aarnet.edu.au
    response_type: fedora_metalink

    rules:
      - name: fedora-updates
        when:
          repo_contains: updates-released
        template: "{base_url}/pub/fedora/linux/updates/{version}/Everything/{arch}"

      - name: epel
        when:
          repo_contains: epel
        template: "{base_url}/pub/epel/{version}/Everything/{arch}/os"

    default_template: "{base_url}/pub/{base}/{version}/Everything/{arch}/os"

The proxy parses the request parameters (repo, arch, version and so on), runs them through these rules and either:

  • Returns a generated Fedora metalink XML, or
  • Returns a rewritten mirror URL that points at your chosen upstream

From Varnish's perspective, it is simply talking to an HTTP origin that always knows where to find the right mirror.
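
The matching and template expansion is not much more than a regex with named groups plus string substitution. A simplified Go sketch with a couple of the Rocky rules from above hard-coded (the real proxy reads them from mirrors.yaml, and the example repo name is hypothetical):

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// repo_split_pattern from mirrors.yaml: splits a repo name into base/version.
var repoSplit = regexp.MustCompile(`^(?P<base>.*?)-(?P<version>[0-9.]+)$`)

// expand replaces {placeholders} in a template with the captured values.
func expand(template string, vars map[string]string) string {
	out := template
	for k, v := range vars {
		out = strings.ReplaceAll(out, "{"+k+"}", v)
	}
	return out
}

func rewrite(repo, arch, baseURL string) string {
	vars := map[string]string{"base_url": baseURL, "arch": arch, "base": repo}
	if m := repoSplit.FindStringSubmatch(repo); m != nil {
		vars["base"] = m[repoSplit.SubexpIndex("base")]
		vars["version"] = m[repoSplit.SubexpIndex("version")]
	}

	// First matching repo_contains rule wins, otherwise the default template.
	switch {
	case strings.Contains(repo, "altarch"):
		return expand("{base_url}/pub/sig/{version}/altarch/{arch}/altarch-common", vars)
	default:
		return expand("{base_url}/pub/rocky/{version}/{base}/{arch}/os", vars)
	}
}

func main() {
	// Hypothetical repo name, just to show the expansion.
	fmt.Println(rewrite("BaseOS-9.4", "x86_64", "http://rockylinux.globelock.home"))
	// -> http://rockylinux.globelock.home/pub/rocky/9.4/BaseOS/x86_64/os
}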


Fedora Metalink Generation

For Fedora, you can query the proxy like this:

GET /metalink?repo=fedora-42&arch=x86_64

It responds with a valid metalink XML pointing at your chosen mirror, for example:

<?xml version="1.0" encoding="utf-8"?>
<metalink>
  <files>
    <file name="repomd.xml">
      <resources>
        <url protocol="http" type="http">
          http://mirror.aarnet.edu.au/.../repomd.xml
        </url>
      </resources>
    </file>
  </files>
</metalink>

That keeps the Fedora tooling happy while still giving you control over which mirror is actually used.
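
Producing that document is mostly a job for encoding/xml. A minimal Go sketch that only models the shape shown above (struct names are mine, and the full metalink schema carries more fields than this):

package main

import (
	"encoding/xml"
	"fmt"
)

type Metalink struct {
	XMLName xml.Name `xml:"metalink"`
	Files   []File   `xml:"files>file"`
}

type File struct {
	Name string `xml:"name,attr"`
	URLs []URL  `xml:"resources>url"`
}

type URL struct {
	Protocol string `xml:"protocol,attr"`
	Type     string `xml:"type,attr"`
	Value    string `xml:",chardata"`
}

func main() {
	m := Metalink{Files: []File{{
		Name: "repomd.xml",
		URLs: []URL{{
			Protocol: "http",
			Type:     "http",
			// Path elided here, as in the example above.
			Value: "http://mirror.aarnet.edu.au/.../repomd.xml",
		}},
	}}}
	out, _ := xml.MarshalIndent(m, "", "  ")
	fmt.Println(xml.Header + string(out))
}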


Metrics and Observability

Metrics are exposed on a separate listener (default :9090) in Prometheus format. You get:

  • Request counts
  • Duration histograms
  • Bytes in and out
  • Upstream error counts
  • In-flight request gauges
  • Per-client request counters

An example p95 latency query:

histogram_quantile(
  0.95,
  sum by (le, host)(rate(proxy_request_duration_seconds_bucket[5m]))
)

Feed that into Grafana and you have a clear picture of how the forward proxy behaves under load.
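
For reference, wiring up metrics like these with prometheus/client_golang looks roughly as follows. The histogram name matches the query above; the gauge name and help strings are my own guesses, not necessarily what the proxy exports:

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "proxy_request_duration_seconds",
			Help:    "Time spent proxying a request, by upstream host.",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"host"},
	)
	inFlight = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "proxy_in_flight_requests",
		Help: "Requests currently being proxied.",
	})
)

func main() {
	prometheus.MustRegister(requestDuration, inFlight)

	// Metrics on their own listener, mirroring the proxy's default :9090.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}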


DNS Control

Rather than inheriting /etc/resolv.conf from some arbitrary container base image, you can explicitly set upstream DNS servers:

UPSTREAM_DNS="1.1.1.1,8.8.8.8"

The proxy uses those resolvers to look up upstream hosts. That makes it easier to isolate DNS behaviour and avoid surprises, particularly in container-heavy environments.
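
In Go this is the standard custom net.Resolver pattern. A sketch with a single hard-coded server, whereas the real proxy parses the comma-separated UPSTREAM_DNS list:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver returns a resolver that ignores /etc/resolv.conf and sends
// all lookups to the given DNS server instead.
func newResolver(dnsServer string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver, not the system one
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, net.JoinHostPort(dnsServer, "53"))
		},
	}
}

func main() {
	r := newResolver("1.1.1.1")
	addrs, err := r.LookupHost(context.Background(), "mirrors.rockylinux.org")
	fmt.Println(addrs, err)
}

The HTTP transport's dialer can then be handed this resolver (net.Dialer{Resolver: r}) so upstream connections use it as well.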


Logging

Logs are Apache-style so they fit in cleanly with existing tooling:

<client-ip> - - [timestamp] "METHOD URL HTTP/x.x" status bytes "ref" "ua" host=X duration=0.xxx

Client IP detection honours:

  1. X-Forwarded-For (first non-localhost)
  2. X-Real-IP
  3. RemoteAddr

So even if you do place nginx or some other load balancer in front, you still get meaningful client attribution.
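
That precedence is simple to express in Go. A sketch of the lookup (the helper name is mine, not the proxy's):

package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// clientIP applies the precedence above: X-Forwarded-For (first entry that
// is not localhost), then X-Real-IP, then the raw RemoteAddr.
func clientIP(r *http.Request) string {
	for _, part := range strings.Split(r.Header.Get("X-Forwarded-For"), ",") {
		ip := strings.TrimSpace(part)
		if ip != "" && ip != "127.0.0.1" && ip != "::1" {
			return ip
		}
	}
	if ip := r.Header.Get("X-Real-IP"); ip != "" {
		return ip
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	return host
}

func main() {
	req, _ := http.NewRequest("GET", "http://example.invalid/", nil)
	req.Header.Set("X-Forwarded-For", "127.0.0.1, 10.0.0.7")
	req.RemoteAddr = "192.0.2.10:52344"
	fmt.Println(clientIP(req)) // -> 10.0.0.7
}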


Run the Container Image

podman run -d \
  -p 8080:8080 \
  -p 9090:9090 \
  -e MIRROR_CONFIG=/mirrors.yaml \
  -e UPSTREAM_DNS="1.1.1.1,8.8.8.8" \
  -v $(pwd)/mirrors.yaml:/mirrors.yaml \
  forward-proxy

If you just want to try it without building anything, there is an image on Docker Hub:

jlcox1970/forward-proxy


Why Bother?

In theory, you can make nginx do most of this. In practice:

  • I do not want a full web server for this job
  • I do not want to track another CVE stream for “just a proxy”
  • I do not want to debug buffering and proxy caching in multiple places
  • I do want a small, purpose-built binary that streams bytes and exposes metrics

By moving this logic into a compact Go service and letting Varnish handle what it is good at (caching, persistence, HTTP semantics), the stack becomes:

  • Easier to reason about
  • Smaller to maintain
  • More predictable under load

Sometimes the right answer really is just a forward proxy.
