Tor Relay Performance Datasets
Analyzing performance characteristics of anonymity networks is notoriously difficult. In the Tor network, bandwidth measurements are traditionally concentrated within a few core directory authorities or testing agents.
Because these measurement nodes are mostly located inside North American and European datacenters, the resulting performance metrics suffer from severe geographical bias.
If you are a privacy researcher trying to analyze how well Tor relays perform in the Global South or the APAC region, existing public datasets are sparse, unreliable, and heavily skewed.
To address this gap and provide a geographically diverse performance map of the Tor network, I built the Autonomous Tor Relay Speed Testing Service and used it to generate a time-series dataset of more than 1,000,000 latency and download measurements.
```mermaid
graph TD
    Co[Central Measurement Coordinator] -->|Distribute Targets| MA1[APAC Agent - Tokyo]
    Co -->|Distribute Targets| MA2[Africa Agent - Johannesburg]
    Co -->|Distribute Targets| MA3[South America Agent - São Paulo]
    Co -->|Distribute Targets| MA4[EU / NA Probing Agents]
    MA1 -->|Circuits & Downloads| Tor[Tor Network Relay Nodes]
    MA2 -->|Circuits & Downloads| Tor
    MA3 -->|Circuits & Downloads| Tor
    MA4 -->|Circuits & Downloads| Tor
    Tor -->|Speed & Latency Telemetry| Co
    Co -->|Gzip Compression| Data[data.gz Raw Datasets]
    Co -->|Publish Guide| Web[HTML Researcher Guide]
```
Decoupling geographic measurement bias
Traditional performance auditing suffers from spatial bias because EU/US measurement probes sit close to the majority of high-capacity relays. Measured speeds therefore look fast, while the added latency and reduced throughput experienced by users connecting from remote regions go unmeasured.
To solve this, I deployed measurement agents across a globally distributed server array, including active endpoints in APAC, Africa, and South America, alongside standard North American and European co-location nodes.
The measurement agents continuously build Tor circuits and run speed tests. Over a three-day testing sprint, the agents logged more than 1,000,000 individual latency and download measurements covering every active Tor relay in the directory.
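The "Distribute Targets" step can be pictured with a short sketch. This is not the service's actual scheduler, which this post does not describe. It is a minimal illustration that assumes every agent receives the full relay list and works through it in its own shuffled order, so that agents do not all probe the same relay at the same moment. The names `Target` and `distribute_targets` and the location IDs are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One unit of work: a relay to be measured from one agent location."""
    agent_location: str
    relay_fingerprint: str


def distribute_targets(fingerprints, agent_locations, seed=None):
    """Return a per-agent work list covering every relay.

    Assumption: each agent measures every relay, in an independently
    shuffled order. The real coordinator may schedule differently.
    """
    rng = random.Random(seed)
    plan = {}
    for location in agent_locations:
        order = list(fingerprints)
        rng.shuffle(order)  # per-agent ordering to spread load on relays
        plan[location] = [Target(location, fp) for fp in order]
    return plan


# Hypothetical location IDs and placeholder fingerprints.
plan = distribute_targets(
    ["RELAY_A", "RELAY_B", "RELAY_C"],
    ["apac-tokyo", "africa-johannesburg", "sa-sao-paulo"],
    seed=42,
)
```

Each agent ends up with the same set of relays, which is what makes per-region comparisons of the same relay possible later on.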
Structural data archiving
Collecting more than a million network speed tests produces an unwieldy volume of raw telemetry. To make it useful for academic research, the service aggregates the time-series datapoints and packs them into gzip-compressed flat files (data.gz).
```text
+-------------------------------------------------------------+
|                       Dataset Schema                        |
+-------------------+-----------------------------------------+
| Field             | Description                             |
+-------------------+-----------------------------------------+
| timestamp         | Unix epoch time of the measurement      |
| relay_fingerprint | Cryptographic identity of target relay  |
| agent_location    | Geographic ID of the testing probe      |
| download_speed    | Bytes per second achieved               |
| circuit_latency   | Round-trip time in milliseconds         |
+-------------------+-----------------------------------------+
```
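As a starting point for working with the files, here is a minimal loading sketch. It assumes that `data.gz` decompresses to CSV text with a header row using exactly the field names from the schema above. The on-disk layout is not specified in this post, so check the Researcher's Guide and adjust the parsing if the format differs. The `Measurement` class and `read_measurements` function are hypothetical names.

```python
import csv
import gzip
from dataclasses import dataclass


@dataclass
class Measurement:
    timestamp: float          # Unix epoch time of the measurement
    relay_fingerprint: str    # identity of the target relay
    agent_location: str       # geographic ID of the testing probe
    download_speed: float     # bytes per second
    circuit_latency: float    # round-trip time in milliseconds


def read_measurements(path):
    """Yield Measurement records from a gzip-compressed CSV file.

    Assumption: the file is CSV with a header row matching the schema.
    """
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield Measurement(
                timestamp=float(row["timestamp"]),
                relay_fingerprint=row["relay_fingerprint"],
                agent_location=row["agent_location"],
                download_speed=float(row["download_speed"]),
                circuit_latency=float(row["circuit_latency"]),
            )
```

Because the function is a generator, it streams rows one at a time instead of loading the whole archive into memory, which matters for files with over a million rows.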
Along with the raw databases, the repository hosts a complete, self-contained Researcher’s Guide (index.html) explaining the collection methodologies, measurement parameters, data schema, and potential academic use cases.
Supporting future network engineering
Demonstrating the value of geographically distributed measurements is critical for the future of privacy networks. By showing how geographical distance affects measured relay performance, this research helps encourage organizations like the Tor Project to expand their official measurement networks.
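One simple way to look for the distance effect is to compare medians per probe location. The sketch below assumes records are dictionaries keyed by the schema field names. The `median_by_location` helper is a hypothetical name, and the sample rows are synthetic values made up for illustration, not results from the dataset.

```python
from collections import defaultdict
from statistics import median


def median_by_location(records):
    """Map each agent_location to (median download_speed, median circuit_latency)."""
    speeds = defaultdict(list)
    latencies = defaultdict(list)
    for rec in records:
        location = rec["agent_location"]
        speeds[location].append(float(rec["download_speed"]))
        latencies[location].append(float(rec["circuit_latency"]))
    return {
        location: (median(speeds[location]), median(latencies[location]))
        for location in speeds
    }


# Synthetic rows for illustration only; location IDs are hypothetical.
sample = [
    {"agent_location": "apac-tokyo", "download_speed": 1_000_000, "circuit_latency": 300},
    {"agent_location": "apac-tokyo", "download_speed": 3_000_000, "circuit_latency": 500},
    {"agent_location": "eu", "download_speed": 8_000_000, "circuit_latency": 80},
]
summary = median_by_location(sample)
```

Medians are used here rather than means because throughput and latency samples tend to have long tails. A single stalled download should not dominate a region's summary.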
If you are writing an academic paper on decentralized network routing or want to analyze how congestion affects traffic routing across the Global South, this dataset is for you.
- Codebase, Datasets & Guide: f0o/tor-measurements on GitHub