prom_hosts :: Daniel 'f0o' Preussker

Standard flat-file /etc/hosts name resolution is ancient, but it is still incredibly common in local home labs, small environments, and custom Kubernetes ingress networks. It is simple, stateless, and gets the job done without the headache of provisioning a heavy authoritative BIND or Knot DNS infrastructure.

CoreDNS is the default cluster DNS in Kubernetes and is widely used for internal name resolution elsewhere. It ships with a built-in hosts plugin that serves records straight from flat /etc/hosts-style files.

But there is a major problem. When you are managing an active infrastructure, the CoreDNS hosts plugin acts as a silent black box.

You have no idea how many queries are hitting your static entries, when the hosts files were last reloaded in the background, or how many records are currently loaded.

To bring enterprise-grade observability to flat-file name resolution, I built prom_hosts (a custom fork/plugin for CoreDNS written in Go) which exposes native Prometheus telemetry counters directly from the hosts resolution loop.

graph TD
    Client[DNS Client / Pod] -->|A / AAAA Query| CoreDNS[CoreDNS Server]
    CoreDNS -->|Execute Plugin Chain| PromHosts[prom_hosts Plugin]
    
    subgraph prom_hosts engine
        PromHosts -->|1. Parse file / Cache lookup| HF[LookupStaticHost]
        HF -->|Hit: Increment requests_total| Metric[Prometheus Vector Metrics]
        Metric -->|Export Gauge: entries| ProExporter[Prometheus Metrics Endpoint]
        Metric -->|Export Gauge: reload_timestamp_seconds| ProExporter
    end
    
    PromHosts -->|DNS Success Response| Client
    ProExporter -->|Scrape / Metrics Collection| Prom[Prometheus Server]

Adding telemetry to DNS lookups

prom_hosts hooks directly into the CoreDNS query lifecycle. On every incoming DNS resolution request, it determines if the target query can be resolved locally using the static hosts records.

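import (
	"github.com/coredns/coredns/plugin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (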
	// hostsEntries is the combined number of entries in hosts and Corefile.
	hostsEntries = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: plugin.Namespace,
		Subsystem: "hosts",
		Name:      "entries",
		Help:      "The combined number of entries in hosts and Corefile.",
	}, []string{})
	
	// hostsReloadTime is the timestamp of the last reload of hosts file.
	hostsReloadTime = promauto.NewGauge(prometheus.GaugeOpts{
		Namespace: plugin.Namespace,
		Subsystem: "hosts",
		Name:      "reload_timestamp_seconds",
		Help:      "The timestamp of the last reload of hosts file.",
	})
	
	// RequestCount counts the requests served from the hosts file(s).
	RequestCount = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: plugin.Namespace,
		Subsystem: "hosts",
		Name:      "requests_total",
		Help:      "Counter of requests served by hosts plugin.",
	}, []string{})
)

The plugin increments the requests_total counter for every query it answers from the hosts data. Whenever CoreDNS detects a hosts-file change and reloads it into memory, the plugin sets the entries gauge to the total number of parsed records and updates reload_timestamp_seconds.

Clean fallthrough routing

In CoreDNS, plugins run in a serial chain. If a plugin can't find an answer for a query, it must cleanly pass execution to the next plugin in the list.

prom_hosts preserves this behavior. It intercepts static lookups for TypePTR, TypeA, and TypeAAAA records, resolves them, increments the query metric counter, and responds.
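The three record types can be told apart roughly as in the sketch below. It is an illustration under assumptions, not the plugin's code: qtype, record, newRecord, and answer are hypothetical names, the constants stand in for the dns.TypeA, dns.TypeAAAA, and dns.TypePTR values of the miekg/dns package, and a PTR question is assumed to have already been converted from its in-addr.arpa or ip6.arpa form into a plain address string.

```go
package main

import (
	"fmt"
	"net"
)

// qtype is a hypothetical stand-in for the DNS query type constants.
type qtype int

const (
	typeA qtype = iota
	typeAAAA
	typePTR
)

// record is one parsed hosts-file entry.
type record struct {
	name string
	ip   net.IP
}

// newRecord builds a record from a name and a textual address.
func newRecord(name, addr string) record {
	return record{name: name, ip: net.ParseIP(addr)}
}

// answer returns the values that would go into the answer section:
// addresses for A and AAAA questions, names for PTR questions.
func answer(records []record, qt qtype, question string) []string {
	var out []string
	for _, r := range records {
		isV4 := r.ip.To4() != nil
		switch {
		case qt == typeA && isV4 && r.name == question:
			out = append(out, r.ip.String())
		case qt == typeAAAA && !isV4 && r.name == question:
			out = append(out, r.ip.String())
		case qt == typePTR && r.ip.Equal(net.ParseIP(question)):
			out = append(out, r.name)
		}
	}
	return out
}

func main() {
	records := []record{
		newRecord("db.lab.", "10.0.0.5"),
		newRecord("db.lab.", "fd00::5"),
	}
	fmt.Println(answer(records, typeA, "db.lab."))
	fmt.Println(answer(records, typeAAAA, "db.lab."))
	fmt.Println(answer(records, typePTR, "10.0.0.5"))
}
```

An empty result from a function like this is what leads into the fallthrough decision.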

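When no local record matches the query and fallthrough is enabled for that name, prom_hosts passes the request to the next plugin, so upstream resolvers can handle it without extra configuration. Without fallthrough it answers SERVFAIL, because a hosts file carries no SOA record from which to build a proper NXDOMAIN.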

// Only on NXDOMAIN we will fallthrough.
if len(answers) == 0 && !h.otherRecordsExist(qname) {
	if h.Fall.Through(qname) {
		return plugin.NextOrFailure(h.Name(), h.Next, ctx, w, r)
	}

	// We want to send an NXDOMAIN, but because of /etc/hosts' setup we don't have a SOA, so we make it SERVFAIL
	// to at least give an answer back that signals we're having problems resolving this.
	return dns.RcodeServerFailure, nil
}
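The branch above boils down to a three-way decision, sketched here with the standard library only. The names action, inZones, and decide are hypothetical, and inZones only approximates what h.Fall.Through does by treating the fallthrough zones as name suffixes; treat it as an assumption about the behavior, not the CoreDNS implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// action is what the handler does once the lookup has finished.
type action string

const (
	respond    action = "respond"     // reply from the hosts data
	passToNext action = "next plugin" // hand the query down the chain
	servfail   action = "SERVFAIL"    // no SOA available for an NXDOMAIN
)

// inZones reports whether qname falls under one of the fallthrough zones.
// Hypothetical helper: zones are fully qualified, "." matches everything.
func inZones(qname string, zones []string) bool {
	q := strings.ToLower(qname)
	for _, z := range zones {
		z = strings.ToLower(z)
		if z == "." || q == z || strings.HasSuffix(q, "."+z) {
			return true
		}
	}
	return false
}

// decide mirrors the snippet: fall through only when nothing at all is
// known about the name, and only if fallthrough covers it.
func decide(answers int, otherRecords bool, qname string, zones []string) action {
	if answers > 0 || otherRecords {
		return respond
	}
	if inZones(qname, zones) {
		return passToNext
	}
	return servfail
}

func main() {
	zones := []string{"lab."}
	fmt.Println(decide(1, false, "db.lab.", zones))
	fmt.Println(decide(0, false, "unknown.lab.", zones))
	fmt.Println(decide(0, false, "unknown.example.", zones))
}
```

Note the middle case in the original condition: a name that has other records, just not of the requested type, is answered locally and never falls through.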

Observe everything

By plugging prom_hosts into your Corefile, you can graph hosts-file lookup traffic in Grafana, alert on a stale reload_timestamp_seconds when a hosts-file update fails to load, and observe query load across your container workloads in real time.

It is a beautifully simple observability wrapper that proves even legacy flat-file mechanisms can be monitored like modern cloud infrastructure.

Codebase & Contributing: f0o/prom_hosts on GitHub