Telemetry Governor
The Telemetry Governor is CollectorCtrl's centralized data plane profiler. It lets you tap a configurable percentage of live telemetry from any connected agent, inspect traffic patterns, and run What-If simulations to predict pipeline savings — all without touching your production data path.
1. How Agent Tapping Works
Tapping creates a mirror of live telemetry. The Supervisor on each target agent clones and forwards only the configured percentage of signals to the Governor's built-in OTLP receiver.
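The percentage-based cloning described above can be pictured as a per-signal random draw. The following is a minimal sketch, not the Supervisor's actual implementation; the function name `should_clone` and the use of a uniform random draw are assumptions made for illustration.

```python
import random

def should_clone(sampling_rate_percent: float, rng: random.Random) -> bool:
    """Decide whether one signal is mirrored to the Governor (hypothetical helper)."""
    # rng.random() is uniform in [0.0, 1.0), so a rate of 5.0 clones roughly 5% of signals,
    # a rate of 0.0 clones nothing, and a rate of 100.0 clones everything.
    return rng.random() * 100.0 < sampling_rate_percent

rng = random.Random(42)  # fixed seed only so the demo is repeatable
cloned = sum(should_clone(5.0, rng) for _ in range(100_000))
print(f"cloned {cloned} of 100000 signals")
```

Whatever the real sampling strategy is, the original signal always continues down the production pipeline; only the copy is subject to the draw.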
Enabling a Tap
- Navigate to Telemetry Governor in the sidebar.
- Locate the target agent in the list.
- Click Toggle Tap. This calls POST /api/telemetry_governor/taps/toggle with the agent ID.
- Set the Sampling Rate (e.g., 5.0%). The agent will clone that percentage of logs, metrics, and spans and forward them over OTLP/HTTP.
Key principle: The production pipeline remains completely unaffected. Tapping operates on cloned data only.
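The Toggle Tap call can also be scripted. The sketch below only builds the request and does not send it. The endpoint path comes from this page; the base URL `http://localhost:8080` and the JSON field name `agent_id` are assumptions, so check them against your deployment before use.

```python
import json
import urllib.request

def build_toggle_request(base_url: str, agent_id: str) -> urllib.request.Request:
    """Build (but do not send) the tap toggle request for one agent."""
    if not agent_id:
        raise ValueError("agent_id must not be empty")
    payload = {"agent_id": agent_id}  # assumed field name, not confirmed by the docs
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/telemetry_governor/taps/toggle",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_toggle_request("http://localhost:8080", "web-prod-01")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```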
2. Ingestion Storage Architecture
Tapped telemetry lands in one of two storage backends depending on your deployment configuration:
SQLite (Default — Zero Configuration)
Best suited for development, small fleets, or single-server deployments.
| Property | Detail |
|---|---|
| Setup required | None — automatic |
| Tables | telemetry_governor_logs, telemetry_governor_metrics, telemetry_governor_spans |
| Retention | Pruned automatically every 5 minutes — records older than 1 hour are deleted |
| Limitation | Not recommended for high-volume profiling due to SQLite write-lock contention |
ClickHouse (Enterprise — High-Volume)
For production profiling of large fleets, CollectorCtrl integrates natively with ClickHouse, a columnar analytics database optimised for time-series log data.
| Property | Detail |
|---|---|
| Setup required | Provide ClickHouse connection parameters via environment variables |
| Performance | Sub-second queries over billions of log rows |
| Auto-detection | CollectorCtrl automatically switches to ClickHouse when a valid connection is detected (IsClickHouseConnected()) |
| Fallback | If ClickHouse is unreachable, the server transparently falls back to SQLite |
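The auto-detection and fallback rows above amount to a small decision rule. Here is a hedged sketch of that rule, assuming the server treats a missing address or a failed connection check as "use SQLite"; `select_backend` and the `is_connected` callback are hypothetical names standing in for whatever IsClickHouseConnected() does internally.

```python
from typing import Callable, Mapping

def select_backend(env: Mapping[str, str], is_connected: Callable[[str], bool]) -> str:
    """Return "clickhouse" when a configured server answers, otherwise "sqlite"."""
    addr = env.get("COLLECTORCTRL_CLICKHOUSE_ADDR", "").strip()
    if not addr:
        return "sqlite"  # nothing configured: stay on the zero-configuration default
    try:
        return "clickhouse" if is_connected(addr) else "sqlite"
    except OSError:
        return "sqlite"  # unreachable host: fall back instead of failing

# Demo with a fake connectivity probe that always succeeds.
print(select_backend({"COLLECTORCTRL_CLICKHOUSE_ADDR": "ch.example:9000"}, lambda addr: True))
```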
To enable ClickHouse mode, configure the following environment variables on the Management Server:
COLLECTORCTRL_CLICKHOUSE_ADDR=your-clickhouse-host:9000
COLLECTORCTRL_CLICKHOUSE_DB=telemetry_governor
COLLECTORCTRL_CLICKHOUSE_USER=default
COLLECTORCTRL_CLICKHOUSE_PASSWORD=your_secure_password
3. Deploying the Telemetry Governor Engine
The Telemetry Governor Engine runs as a decoupled data plane containing ClickHouse and an OpenTelemetry Collector.
Step 1: Deploy on Linux Server (Docker Compose)
Log into your target Linux VM, create a directory, and run the unified installer script:
$ mkdir telemetry-governor && cd telemetry-governor
$ curl -fsSL https://raw.githubusercontent.com/CollectorCtrl/CollectorCtrl/main/deploy/telemetry-governor/linux/install.sh -o install.sh
$ chmod +x install.sh
$ ./install.sh
[!NOTE] What the installer does: Spins up the ClickHouse and OTel Collector containers, health-checks the ClickHouse local TCP port, and automatically provisions the required log/span database tables (otel_logs and otel_traces) using the embedded SQL schema.
Step 2: Configure Network & Firewalls
Ensure your VM security groups allow inbound traffic on the following ports:
- 9000 (TCP): ClickHouse native TCP port (required by the Management Server to query volume/templates).
- 4317 (gRPC) / 4318 (HTTP): OTel Collector ingestion endpoints (required by target Supervisors to forward tapped telemetry streams).
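Before connecting the Management Server, it can help to confirm that the ports listed above accept TCP connections from the machine that needs them. This is a minimal sketch using only the standard library; the helper names are hypothetical, and a successful connect only shows that the port is reachable, not that ClickHouse or the OTel Collector is healthy behind it.

```python
import socket

GOVERNOR_PORTS = (9000, 4317, 4318)  # ClickHouse native TCP, OTLP gRPC, OTLP HTTP

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

def check_governor_ports(host: str) -> dict[int, bool]:
    """Map each required port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in GOVERNOR_PORTS}

# Example (replace YOUR_VM_IP with the address of the Governor VM):
# print(check_governor_ports("YOUR_VM_IP"))
```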
Step 3: Connect Management Server
Configure the Management Server to query ClickHouse by setting the registry environment variables on the Windows host and restarting the service:
# Run as Administrator to write to system registry
PS > $serviceKey = "HKLM:\System\CurrentControlSet\Services\CollectorCtrl"
PS > $currentEnv = (Get-ItemProperty -Path $serviceKey).Environment
PS > $newEnv = $currentEnv + "COLLECTORCTRL_CLICKHOUSE_ADDR=YOUR_VM_IP:9000" + "COLLECTORCTRL_CLICKHOUSE_PASSWORD=telemetrypassword"
PS > Set-ItemProperty -Path $serviceKey -Name "Environment" -Value $newEnv -Type MultiString
# Restart the service
PS > Restart-Service -Name "CollectorCtrl"
4. Volume Dashboard
The Volume view gives you an at-a-glance breakdown of tapped telemetry currently stored in the Governor:
- Total Logs / Spans: aggregate signal counts
- Logs by Severity: INFO, WARN, ERROR, DEBUG distribution
- Spans by Agent: which agents are generating the most trace traffic
- Top Metrics by Name: cardinality leaders from your metric streams
This view is served by GET /api/telemetry_governor/volume and automatically queries ClickHouse first, falling back to SQLite if needed.
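To make the "Logs by Severity" breakdown concrete, here is a sketch of the kind of aggregation such a view needs, run against an in-memory SQLite database. The table name telemetry_governor_logs comes from this page; the column names (`timestamp`, `severity`, `body`) and the query itself are assumptions, not the server's actual schema or SQL.

```python
import sqlite3

def logs_by_severity(conn: sqlite3.Connection) -> dict[str, int]:
    """Count tapped log records per severity level (case-normalised)."""
    rows = conn.execute(
        "SELECT UPPER(severity), COUNT(*) "
        "FROM telemetry_governor_logs GROUP BY UPPER(severity)"
    ).fetchall()
    return dict(rows)

# Demo data in a throwaway database with an assumed three-column schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry_governor_logs (timestamp INTEGER, severity TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO telemetry_governor_logs VALUES (?, ?, ?)",
    [(1, "INFO", "ping"), (2, "info", "healthcheck ok"), (3, "ERROR", "db timeout")],
)
print(logs_by_severity(conn))
```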
4. Template Discovery via Governor
While viewing tapped log data, the Governor also powers Template Discovery — a pattern-recognition engine that groups similar log lines into reusable templates using token-frequency analysis.
Navigate to the Templates sub-tab within the Telemetry Governor to:
- Browse automatically extracted log patterns
- Identify which templates consume the most volume
- Promote high-signal templates into the Semantic Mapping Registry
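As a rough illustration of token-frequency grouping, the sketch below replaces rarely seen tokens with a `<*>` wildcard and counts how many lines collapse onto each resulting template. This is a deliberately simplified stand-in, not CollectorCtrl's discovery algorithm; the function name, the whitespace tokenizer, and the `min_token_count` threshold are all assumptions.

```python
from collections import Counter

def discover_templates(lines: list[str], min_token_count: int = 2) -> list[tuple[str, int]]:
    """Group log lines into templates, most frequent template first."""
    tokenized = [line.split() for line in lines]
    # Tokens that appear often across the corpus are treated as fixed template text;
    # rare tokens (IDs, numbers, hostnames) are treated as variable fields.
    freq = Counter(tok for toks in tokenized for tok in toks)
    templates = Counter(
        " ".join(tok if freq[tok] >= min_token_count else "<*>" for tok in toks)
        for toks in tokenized
    )
    return templates.most_common()

sample = [
    "user 101 logged in",
    "user 202 logged in",
    "user 303 logged in",
    "disk full on node7",
]
for template, count in discover_templates(sample):
    print(count, template)
```

The highest-count templates in such a listing are the natural candidates to review first, since they account for the most volume.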
5. What-If Simulation Engine
The simulation engine evaluates filter conditions against your tapped telemetry and estimates how much data (and cost) you would eliminate by applying a specific OTTL filter before the data reaches your backend. The estimate is based on the sampled data currently held in the Governor.
Running a Simulation
Use POST /api/telemetry_governor/simulate with the following JSON payload:
{
"signal": "logs",
"condition": "severity == \"info\" and body contains \"healthcheck\""
}
| Field | Values | Description |
|---|---|---|
| signal | logs, metrics, traces | The telemetry signal type to simulate against |
| condition | Expression string | The filter condition to evaluate |
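Hand-escaping the quotes inside `condition` is error-prone, so it can be easier to let a JSON library build the payload. The sketch below constructs the request without sending it. The endpoint and the two field names come from this page; the base URL is an assumption, and `build_simulate_request` is a hypothetical helper.

```python
import json
import urllib.request

VALID_SIGNALS = {"logs", "metrics", "traces"}

def build_simulate_request(base_url: str, signal: str, condition: str) -> urllib.request.Request:
    """Build (but do not send) a What-If simulation request."""
    if signal not in VALID_SIGNALS:
        raise ValueError(f"signal must be one of {sorted(VALID_SIGNALS)}")
    # json.dumps escapes the inner double quotes of the condition automatically.
    body = json.dumps({"signal": signal, "condition": condition}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/telemetry_governor/simulate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_simulate_request(
    "http://localhost:8080",  # assumed Management Server address
    "logs",
    'severity == "info" and body contains "healthcheck"',
)
print(req.data.decode("utf-8"))
```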
Condition Expression Syntax
The simulator supports a concise expression language:
| Operator | Example |
|---|---|
| == | severity == "error" |
| != | agent_id != "web-prod-01" |
| contains | body contains "healthcheck" |
| > / < | duration_ms > 1000 |
| >= / <= | duration_ms >= 500 |
| and | severity == "info" and body contains "ping" |
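To show how the operators in the table combine, here is a small reference evaluator for this expression language. It is an illustrative sketch written from the table alone, not the simulator's real parser: exact, case-sensitive string comparison, treating a missing field as a non-match, and the names `parse_condition` and `matches` are all assumptions.

```python
import json
import operator
import re

# One clause: <field> <operator> <quoted string or number>
CLAUSE = re.compile(
    r'\s*([A-Za-z_]\w*)\s*(==|!=|>=|<=|>|<|\bcontains\b)\s*'
    r'("(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?)\s*'
)
AND = re.compile(r'and\b\s*')
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "contains": lambda actual, needle: needle in str(actual),
}

def parse_condition(condition: str) -> list[tuple[str, str, object]]:
    """Parse 'a == "x" and b > 1' into [(field, op, value), ...]."""
    clauses, pos = [], 0
    while True:
        m = CLAUSE.match(condition, pos)
        if m is None:
            raise ValueError(f"cannot parse condition at offset {pos}")
        field, op, raw = m.groups()
        # Quoted values are decoded like JSON strings; bare values are numbers.
        value = json.loads(raw) if raw.startswith('"') else float(raw)
        clauses.append((field, op, value))
        pos = m.end()
        if pos == len(condition):
            return clauses
        joiner = AND.match(condition, pos)
        if joiner is None:
            raise ValueError(f"expected 'and' at offset {pos}")
        pos = joiner.end()

def matches(record: dict, clauses: list[tuple[str, str, object]]) -> bool:
    """True when every clause holds for the record."""
    for field, op, expected in clauses:
        actual = record.get(field)
        if actual is None:
            return False  # assumption: a missing field never matches
        try:
            if not OPS[op](actual, expected):
                return False
        except TypeError:  # e.g. comparing a string field with a number
            return False
    return True

records = [
    {"severity": "info", "body": "GET /healthcheck 200"},
    {"severity": "info", "body": "user login"},
    {"severity": "error", "body": "healthcheck failed"},
]
clauses = parse_condition('severity == "info" and body contains "healthcheck"')
matched = [r for r in records if matches(r, clauses)]
print(f"{len(matched)} of {len(records)} records match")
```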
Example: Drop Healthcheck Noise
{
"signal": "logs",
"condition": "severity == \"info\" and body contains \"healthcheck\""
}
Simulation Response:
{
"total_records": 48230,
"matched_records": 35682,
"saved_percentage": 74
}
In this example, 35,682 of the 48,230 tapped log records (about 74%) match the condition and would be dropped before reaching your observability backend, which translates directly into lower ingestion and egress costs.
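The percentage in the response is consistent with dividing matched records by total records and rounding to the nearest whole number. The sketch below reproduces that arithmetic; the rounding mode and the zero-total behaviour are assumptions inferred from this single example, not documented server behaviour.

```python
def saved_percentage(total_records: int, matched_records: int) -> int:
    """Share of records a filter would drop, as a whole-number percentage."""
    if total_records <= 0:
        return 0  # assumption: an empty tap reports no savings
    return round(100 * matched_records / total_records)

# 35682 / 48230 is roughly 0.7398, which rounds to 74.
print(saved_percentage(48230, 35682))  # prints 74
```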
Example: Filter Low-Latency Spans
{
"signal": "traces",
"condition": "duration_ms < 50 and agent_id == \"web-production\""
}
This simulates removing all fast spans (under 50 ms, typically uninteresting) from the web-production agent before forwarding to your trace backend.
6. Pruning & Data Lifecycle
To prevent database bloat, the Governor includes an automatic background pruner:
- Runs on a 5-minute interval (StartSandboxPruner)
- Permanently deletes records from all three tables where timestamp is older than 1 hour
- Operates on both the SQLite and ClickHouse backends independently
This means the Governor is always operating on a rolling 1-hour window of telemetry — sufficient for cost analysis and simulation without accumulating unbounded data.