
Telemetry Governor

The Telemetry Governor is CollectorCtrl's centralized data plane profiler. It lets you tap a configurable percentage of live telemetry from any connected agent, inspect traffic patterns, and run What-If simulations to predict pipeline savings — all without touching your production data path.

[Figure: Telemetry Governor Data Flow]


1. How Agent Tapping Works

Tapping creates a mirror of live telemetry. The Supervisor on each target agent clones and forwards only the configured percentage of signals to the Governor's built-in OTLP receiver:

[Figure: Agent Tapping Data Flow]
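The cloning step can be illustrated with a minimal sketch of probabilistic sampling. This is not the Supervisor's actual code: the function name `tap_signals` and the per-signal random draw are assumptions made for illustration only.

```python
import copy
import random


def tap_signals(signals, sampling_rate_percent, rng=None):
    """Return deep copies of roughly `sampling_rate_percent` % of `signals`.

    Hypothetical helper: the input list is never modified, mirroring the
    idea that the production path only ever sees the original objects.
    """
    if not 0.0 <= sampling_rate_percent <= 100.0:
        raise ValueError("sampling_rate_percent must be between 0 and 100")
    rng = rng or random.Random()
    probability = sampling_rate_percent / 100.0
    # Each signal is cloned independently with the configured probability.
    return [copy.deepcopy(s) for s in signals if rng.random() < probability]
```

With a 5.0% rate, about 1 in 20 signals is copied and the originals continue down the production pipeline untouched.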

Enabling a Tap

  1. Navigate to Telemetry Governor in the sidebar.
  2. Locate the target agent in the list.
  3. Click Toggle Tap — this calls POST /api/telemetry_governor/taps/toggle with the agent ID.
  4. Set the Sampling Rate (e.g., 5.0%). The agent will clone that percentage of logs, metrics, and spans and forward them over OTLP/HTTP.

Key principle: The production pipeline remains completely unaffected. Tapping operates on cloned data only.
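The toggle call from step 3 could also be scripted. The sketch below only builds the request. The JSON field name `agent_id`, the base URL, and the absence of authentication headers are assumptions, since the request schema is not specified here.

```python
import json
import urllib.request


def build_toggle_tap_request(base_url, agent_id):
    """Build (but do not send) a POST request that toggles a tap for one agent."""
    # "agent_id" is a hypothetical field name; check the API for the real schema.
    payload = {"agent_id": agent_id}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/telemetry_governor/taps/toggle",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running Management Server.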


2. Ingestion Storage Architecture

Tapped telemetry lands in one of two storage backends depending on your deployment configuration:

SQLite (Default — Zero Configuration)

Best suited for development, small fleets, or single-server deployments.

| Property | Detail |
| --- | --- |
| Setup required | None (automatic) |
| Tables | telemetry_governor_logs, telemetry_governor_metrics, telemetry_governor_spans |
| Retention | Pruned automatically every 5 minutes; records older than 1 hour are deleted |
| Limitation | Not recommended for high-volume profiling due to SQLite write-lock contention |

ClickHouse (Enterprise — High-Volume)

For production profiling of large fleets, CollectorCtrl integrates natively with ClickHouse, a columnar analytics database optimised for time-series log data.

| Property | Detail |
| --- | --- |
| Setup required | Provide ClickHouse connection parameters via environment variables |
| Performance | Sub-second queries over billions of log rows |
| Auto-detection | CollectorCtrl automatically switches to ClickHouse when a valid connection is detected (IsClickHouseConnected()) |
| Fallback | If ClickHouse is unreachable, the server transparently falls back to SQLite |

To enable ClickHouse mode, configure the following environment variables on the Management Server:

.env / Environment Variables
COLLECTORCTRL_CLICKHOUSE_ADDR=your-clickhouse-host:9000
COLLECTORCTRL_CLICKHOUSE_DB=telemetry_governor
COLLECTORCTRL_CLICKHOUSE_USER=default
COLLECTORCTRL_CLICKHOUSE_PASSWORD=your_secure_password
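The auto-detection and fallback behaviour can be pictured with a small sketch. It is an illustration under assumptions, not the server's logic: `select_backend` and the `probe` callback are hypothetical names, and the real check is whatever IsClickHouseConnected() does internally.

```python
def select_backend(env, probe):
    """Return "clickhouse" if an address is configured and reachable, else "sqlite".

    `env` is a mapping such as os.environ. `probe` is a hypothetical callable
    that takes the address and returns True when a connection succeeds.
    """
    addr = env.get("COLLECTORCTRL_CLICKHOUSE_ADDR", "").strip()
    if not addr:
        return "sqlite"  # nothing configured: zero-configuration default
    try:
        connected = bool(probe(addr))
    except OSError:
        connected = False  # unreachable host: fall back
    return "clickhouse" if connected else "sqlite"
```

Injecting the probe keeps the decision logic testable without a running ClickHouse instance.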

3. Deploying the Telemetry Governor Engine

The Telemetry Governor Engine runs as a decoupled data plane containing ClickHouse and an OpenTelemetry Collector.

Step 1: Deploy on Linux Server (Docker Compose)

Log into your target Linux VM, create a directory, and run the unified installer script:

bash
$ mkdir telemetry-governor && cd telemetry-governor
$ curl -fsSL https://raw.githubusercontent.com/CollectorCtrl/CollectorCtrl/main/deploy/telemetry-governor/linux/install.sh -o install.sh
$ chmod +x install.sh
$ ./install.sh

[!NOTE] What the installer does: It starts the ClickHouse and OTel Collector containers, health-checks the local ClickHouse TCP port, and automatically provisions the required log and span tables (otel_logs and otel_traces) using the embedded SQL schema.

Step 2: Configure Network & Firewalls

Ensure your VM security groups allow inbound traffic on the following ports:

  • 9000 (TCP): ClickHouse native TCP port (required by the Management Server to query volume/templates).
  • 4317 (gRPC) / 4318 (HTTP): OTel Collector ingestion endpoints (required by target Supervisors to forward tapped telemetry streams).
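Before connecting the Management Server, it can help to confirm that these ports are reachable from the machine that needs them. The following is a minimal sketch using plain TCP connects. The helper names are hypothetical, and a successful connect only shows that the port is open, not that the service behind it is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_governor_ports(host, ports=(9000, 4317, 4318)):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: is_port_open(host, port) for port in ports}
```

Running `check_governor_ports("YOUR_VM_IP")` from the Management Server host should report 9000 as open, and running it from an agent host should report 4317 and 4318 as open.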

Step 3: Connect Management Server

Configure the Management Server to query ClickHouse by setting the registry environment variables on the Windows host and restarting the service:

PowerShell
# Run as Administrator to write to the system registry
PS > $serviceKey = "HKLM:\System\CurrentControlSet\Services\CollectorCtrl"
# Read existing entries; filtering empty values yields an empty array when Environment is not set yet
PS > $currentEnv = @((Get-ItemProperty -Path $serviceKey).Environment | Where-Object { $_ })
# Append each variable as its own multi-string entry
PS > $newEnv = $currentEnv + "COLLECTORCTRL_CLICKHOUSE_ADDR=YOUR_VM_IP:9000" + "COLLECTORCTRL_CLICKHOUSE_PASSWORD=telemetrypassword"
PS > Set-ItemProperty -Path $serviceKey -Name "Environment" -Value $newEnv -Type MultiString
# Restart the service
PS > Restart-Service -Name "CollectorCtrl"

4. Volume Dashboard

The Volume view gives you an at-a-glance breakdown of tapped telemetry currently stored in the Governor:

  • Total Logs / Spans: aggregate signal counts
  • Logs by Severity: INFO, WARN, ERROR, DEBUG distribution
  • Spans by Agent: which agents are generating the most trace traffic
  • Top Metrics by Name: cardinality leaders from your metric streams

This view is served by GET /api/telemetry_governor/volume and automatically queries ClickHouse first, falling back to SQLite if needed.
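To make the breakdowns concrete, here is a rough sketch of the aggregation over in-memory records. The record field names (`severity`, `agent_id`, `name`) and the output keys are assumptions. The endpoint's real response schema is not documented here, and the server computes these figures with database queries rather than in Python.

```python
from collections import Counter


def summarize_volume(logs, spans, metrics, top_n=5):
    """Aggregate tapped records into dashboard-style counts (hypothetical schema)."""
    return {
        "total_logs": len(logs),
        "total_spans": len(spans),
        "logs_by_severity": dict(Counter(log["severity"] for log in logs)),
        "spans_by_agent": dict(Counter(span["agent_id"] for span in spans)),
        # Counts data points per metric name as a simple stand-in for "top metrics".
        "top_metrics": Counter(m["name"] for m in metrics).most_common(top_n),
    }
```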


4. Template Discovery via Governor

While viewing tapped log data, the Governor also powers Template Discovery — a pattern-recognition engine that groups similar log lines into reusable templates using token-frequency analysis.

Navigate to the Templates sub-tab within the Telemetry Governor to:

  • Browse automatically extracted log patterns
  • Identify which templates consume the most volume
  • Promote high-signal templates into the Semantic Mapping Registry
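As a rough illustration of token-frequency grouping, the sketch below replaces tokens that vary at a given position with a wildcard. It is a deliberately simple stand-in, not the Governor's algorithm: the function name, the `<*>` placeholder, whitespace tokenisation, and the `min_share` threshold are all assumptions.

```python
from collections import Counter, defaultdict

WILDCARD = "<*>"


def discover_templates(lines, min_share=0.6):
    """Group log lines into templates by masking low-frequency tokens."""
    # Only lines with the same token count are compared position by position.
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if tokens:
            groups[len(tokens)].append(tokens)

    templates = Counter()
    for token_lists in groups.values():
        # Count how often each token occurs at each position within the group.
        position_counts = [Counter(column) for column in zip(*token_lists)]
        for tokens in token_lists:
            template = " ".join(
                tok if position_counts[i][tok] / len(token_lists) >= min_share else WILDCARD
                for i, tok in enumerate(tokens)
            )
            templates[template] += 1
    # Most frequent templates first, i.e. the biggest volume consumers.
    return templates.most_common()
```

Tokens that appear in at least `min_share` of the lines at their position are treated as constant; everything else is treated as a variable field.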

5. What-If Simulation Engine

The simulation engine evaluates filter conditions against your tapped telemetry and estimates how much data (and cost) you would eliminate by applying a specific OTTL filter before the data reaches your backend. Because it runs on the sampled tap, the result is an estimate for the full stream.

Running a Simulation

Use POST /api/telemetry_governor/simulate with the following JSON payload:

{
  "signal": "logs",
  "condition": "severity == \"info\" and body contains \"healthcheck\""
}

| Field | Values | Description |
| --- | --- | --- |
| signal | logs, metrics, traces | The telemetry signal type to simulate against |
| condition | Expression string | The filter condition to evaluate |
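Because the condition contains double quotes, it is easy to get the JSON escaping wrong when writing the payload by hand. A small sketch, with a hypothetical helper name, that lets a JSON library do the escaping:

```python
import json

VALID_SIGNALS = ("logs", "metrics", "traces")


def build_simulation_payload(signal, condition):
    """Serialize a simulation request body, escaping quotes in the condition."""
    if signal not in VALID_SIGNALS:
        raise ValueError(f"signal must be one of {VALID_SIGNALS}")
    return json.dumps({"signal": signal, "condition": condition})
```

The resulting string can be sent as the body of POST /api/telemetry_governor/simulate with a JSON content type.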

Condition Expression Syntax

The simulator supports a concise expression language:

| Operator | Example |
| --- | --- |
| == | severity == "error" |
| != | agent_id != "web-prod-01" |
| contains | body contains "healthcheck" |
| > / < | duration_ms > 1000 |
| >= / <= | duration_ms >= 500 |
| and | severity == "info" and body contains "ping" |
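To show how such expressions can be evaluated, here is a minimal sketch of a parser and matcher for the operators in the table. It is not the simulator's implementation: it assumes records are flat dictionaries, that values are double-quoted strings or plain numbers, and that a record missing a referenced field never matches.

```python
import operator
import re

# One clause: <field> <operator> <"string" | number>
CLAUSE = re.compile(
    r'\s*(\w+)\s*(==|!=|>=|<=|>|<|contains)\s*("(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?)\s*'
)
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "contains": lambda value, needle: needle in str(value),
}


def parse_condition(condition):
    """Parse 'a == "x" and b > 1' into a list of (field, op, value) tuples."""
    clauses, pos = [], 0
    while True:
        m = CLAUSE.match(condition, pos)
        if not m:
            raise ValueError(f"cannot parse condition at offset {pos}")
        field, op, raw = m.groups()
        value = raw[1:-1].replace('\\"', '"') if raw.startswith('"') else float(raw)
        clauses.append((field, op, value))
        pos = m.end()
        if pos == len(condition):
            return clauses
        if not condition.startswith("and ", pos):
            raise ValueError(f"expected 'and' at offset {pos}")
        pos += len("and")


def matches(record, clauses):
    """True only if every clause holds for the record."""
    for field, op, expected in clauses:
        if field not in record:
            return False  # assumption: missing fields never match
        try:
            if not OPS[op](record[field], expected):
                return False
        except TypeError:
            return False  # e.g. comparing a string with a number
    return True


def simulate(records, condition):
    """Count how many records the condition would drop."""
    clauses = parse_condition(condition)
    matched = sum(1 for r in records if matches(r, clauses))
    return {"total_records": len(records), "matched_records": matched}
```

Clauses are joined only by `and`, so a record matches when every clause holds.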

Example: Drop Healthcheck Noise

{
  "signal": "logs",
  "condition": "severity == \"info\" and body contains \"healthcheck\""
}

Simulation Response:

{
  "total_records": 48230,
  "matched_records": 35682,
  "saved_percentage": 74
}

In this example, about 74% of tapped log records (35,682 of 48,230) match the filter and would be dropped before reaching your observability backend, which translates into lower ingestion and egress costs.
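The percentage follows from the two counts in the response. A short sketch of the arithmetic, assuming the value is rounded to the nearest whole percent (the server's exact rounding rule is not stated):

```python
def saved_percentage(total_records, matched_records):
    """Share of records matched by the filter, as a whole-number percentage."""
    if total_records <= 0:
        return 0  # avoid division by zero when nothing has been tapped yet
    return round(100 * matched_records / total_records)


print(saved_percentage(48230, 35682))  # prints 74 (35682 / 48230 is about 0.7398)
```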

Example: Filter Low-Latency Spans

{
  "signal": "traces",
  "condition": "duration_ms < 50 and agent_id == \"web-production\""
}

This simulates dropping all spans faster than 50 ms (typically the least interesting ones) from the web-production agent before they are forwarded to your trace backend.


6. Pruning & Data Lifecycle

To prevent database bloat, the Governor includes an automatic background pruner:

  • Runs on a 5-minute interval (StartSandboxPruner)
  • Permanently deletes records from all three tables where timestamp is older than 1 hour
  • Operates on both the SQLite and ClickHouse backends independently

This means the Governor is always operating on a rolling 1-hour window of telemetry — sufficient for cost analysis and simulation without accumulating unbounded data.
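The pruning pass for the SQLite backend can be sketched as follows. This is an illustration rather than the StartSandboxPruner code: it assumes the timestamp column stores Unix epoch seconds, and the function name `prune_once` is hypothetical.

```python
import sqlite3
import time

TABLES = (
    "telemetry_governor_logs",
    "telemetry_governor_metrics",
    "telemetry_governor_spans",
)
RETENTION_SECONDS = 3600  # rolling 1-hour window


def prune_once(conn, now=None):
    """Delete rows older than the retention window; return the number removed."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_SECONDS
    deleted = 0
    with conn:  # commits on success, rolls back on error
        for table in TABLES:
            # Table names come from the fixed tuple above, never from user input.
            cur = conn.execute(f"DELETE FROM {table} WHERE timestamp < ?", (cutoff,))
            deleted += cur.rowcount
    return deleted
```

Calling `prune_once` from a loop that sleeps 300 seconds between passes would reproduce the 5-minute schedule.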