
Fleet Orchestration & Policy Management

Orchestrating telemetry pipelines across a diverse fleet of thousands of servers requires a policy-driven engine. CollectorCtrl replaces per-host manual configuration with Fleet Policies, target selectors, and progressive rollout workflows.


1. Fleet Policies & Selection Criteria

A Fleet Policy defines a unified OpenTelemetry configuration baseline that is automatically applied to a group of collectors matching specific criteria. Rather than statically linking collectors to files, CollectorCtrl uses label-based matching.


Kubernetes-Style Target Rules

Policies target agents using dynamic, label-based queries. You can combine exact labels and logical expressions to build targeted rules:

  1. matchLabels: Key-value pairs that must all match exactly.
    • Example: deployment.environment: "production", region: "us-east"
  2. matchExpressions: Operators for more complex queries. Supported operators:
    • In / NotIn: The label's value must appear (In) or must not appear (NotIn) in the given list.
    • Exists / DoesNotExist: Checks whether the label is present or absent.
    • Equals / NotEquals: Compares the label's value against a single value.
{
  "matchLabels": {
    "app.class": "databases"
  },
  "matchExpressions": [
    {
      "key": "host.os",
      "operator": "In",
      "values": ["windows", "linux"]
    },
    {
      "key": "compliance.pci",
      "operator": "Exists"
    }
  ]
}
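
The selector semantics above can be sketched in a few lines of Python. This is an illustration only, not CollectorCtrl's implementation. The function names are hypothetical. Two behaviours are assumptions borrowed from Kubernetes-style selectors: Equals / NotEquals read their operand from a single-element "values" list, and a missing label satisfies NotIn.

```python
from typing import Any


def matches_expression(labels: dict[str, str], expr: dict[str, Any]) -> bool:
    """Evaluate one matchExpressions entry against an agent's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # Assumption: an agent without the label is "not in" the list.
        return labels.get(key) not in values
    if op == "Equals":
        # Assumption: the operand is passed as a one-element "values" list.
        return len(values) == 1 and labels.get(key) == values[0]
    if op == "NotEquals":
        return len(values) == 1 and labels.get(key) != values[0]
    raise ValueError(f"unknown operator: {op}")


def selector_matches(labels: dict[str, str], selector: dict[str, Any]) -> bool:
    """True only if every matchLabels pair and every expression holds."""
    for key, expected in selector.get("matchLabels", {}).items():
        if labels.get(key) != expected:
            return False
    return all(
        matches_expression(labels, expr)
        for expr in selector.get("matchExpressions", [])
    )


selector = {
    "matchLabels": {"app.class": "databases"},
    "matchExpressions": [
        {"key": "host.os", "operator": "In", "values": ["windows", "linux"]},
        {"key": "compliance.pci", "operator": "Exists"},
    ],
}
agent = {"app.class": "databases", "host.os": "linux", "compliance.pci": "true"}
print(selector_matches(agent, selector))
```

All conditions are combined with a logical AND, so an agent that lacks the compliance.pci label, or runs an OS outside the list, does not match.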

Policy Ordering & Priority Resolution

An agent can match several policies at once. CollectorCtrl resolves the overlap by Priority, evaluating matching policies in descending order:

  • High-priority templates (e.g. PCI audit rules) are evaluated first.
  • Matching policies are merged in sequence; where two policies set the same key, the higher-priority value wins.
  • The merged result is then combined with the node's Override Config to compile the final Effective Config.
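
As a rough sketch of this resolution order, the example below flattens each configuration into a map of dotted setting paths (a simplification chosen to keep the example short, not how CollectorCtrl stores configs). The field names "priority" and "settings" are hypothetical, and a larger number is assumed to mean higher priority. Policies are applied from lowest to highest priority so that the last write wins, which gives the same result as "higher priority overrides lower".

```python
def effective_settings(policies: list[dict], override: dict[str, str]) -> dict[str, str]:
    """Combine matching policies by priority, then apply the node override."""
    merged: dict[str, str] = {}
    # Lowest priority first: later (higher-priority) writes replace earlier ones.
    for policy in sorted(policies, key=lambda p: p["priority"]):
        merged.update(policy["settings"])
    # Assumption: the node's Override Config is applied last and wins on conflict.
    merged.update(override)
    return merged


policies = [
    {
        "name": "baseline",
        "priority": 10,
        "settings": {
            "processors.batch.timeout": "5s",
            "processors.batch.send_batch_size": "512",
        },
    },
    {
        "name": "pci-audit",
        "priority": 100,
        "settings": {"processors.batch.timeout": "1s"},
    },
]
node_override = {"processors.batch.send_batch_size": "1024"}

print(effective_settings(policies, node_override))
```

Here the PCI policy's timeout replaces the baseline's, and the node override replaces the batch size, while settings that only one source defines pass through unchanged.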

2. Progressive Rollouts & Canaries

To prevent widespread downtime, CollectorCtrl implements progressive release strategies for policy deployment.

  • Canary Ring Deployments: Specify a target rollout percentage (e.g. deploy to 10% of matching nodes). The server selects a random canary ring of agents matching your policy rules.
  • Health Check Monitoring: The system tracks the status of nodes in the canary group. If a supervisor reports a configuration parsing error (config_error) or if a node's health status drops, the rollout pauses automatically.
  • Approval Gates: Administrators can review canary health metrics, then choose to Promote the policy to 100% of the fleet or Abort to revert all affected nodes to their last working state.
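
The canary flow can be illustrated with a minimal sketch. It assumes the ring size is rounded up to at least one agent and that each agent reports a status string plus a health flag. Only config_error comes from the description above; the function names and the report shape are hypothetical.

```python
import math
import random


def pick_canary_ring(agent_ids: list[str], percent: float, seed=None) -> list[str]:
    """Randomly choose roughly `percent` of the matching agents as the canary ring."""
    if not agent_ids:
        return []
    rng = random.Random(seed)  # a seed makes the selection reproducible in tests
    size = min(len(agent_ids), max(1, math.ceil(len(agent_ids) * percent / 100)))
    return sorted(rng.sample(agent_ids, size))


def should_pause(reports: dict[str, dict]) -> bool:
    """Pause if any canary reports config_error or is no longer healthy."""
    return any(
        report["status"] == "config_error" or not report["healthy"]
        for report in reports.values()
    )


agents = [f"agent-{i}" for i in range(50)]
ring = pick_canary_ring(agents, percent=10, seed=1)
print(len(ring))

reports = {agent_id: {"status": "ok", "healthy": True} for agent_id in ring}
print(should_pause(reports))
reports[ring[0]] = {"status": "config_error", "healthy": False}
print(should_pause(reports))
```

A single failing canary is enough to pause in this sketch; a real deployment might instead tolerate a threshold of failures before pausing.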

3. Version History & Policy Snapshots

Every policy modification and publish action creates an immutable Policy Snapshot in the system database.

  • Audit Trail: Each snapshot logs the user ID, timestamp, target selectors, and the full YAML config state.
  • Visual Diff Tool: View color-coded line diffs comparing changes between the active version and historical versions.
  • Instant Rollbacks: If a telemetry pipeline issue is detected, select a previous policy snapshot and click Revert. The server instantly pushes the historical configuration hash to all matching supervisors.
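
To make the snapshot idea concrete, the following sketch models a snapshot as a frozen dataclass and derives a configuration hash from the YAML text. The field names, the choice of SHA-256, and the use of a unified diff are assumptions made for illustration; the description above does not say how CollectorCtrl stores or hashes snapshots.

```python
import difflib
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class PolicySnapshot:
    user_id: str
    timestamp: str
    selector_json: str
    config_yaml: str

    @property
    def config_hash(self) -> str:
        # Assumption: the hash is a SHA-256 digest of the raw YAML text.
        return hashlib.sha256(self.config_yaml.encode("utf-8")).hexdigest()


def take_snapshot(user_id: str, selector: dict, config_yaml: str) -> PolicySnapshot:
    """Record who changed what, when, and the full config state."""
    return PolicySnapshot(
        user_id=user_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        selector_json=json.dumps(selector, sort_keys=True),
        config_yaml=config_yaml,
    )


def diff_snapshots(old: PolicySnapshot, new: PolicySnapshot) -> list[str]:
    """Line diff between two snapshots' YAML, in unified diff format."""
    return list(
        difflib.unified_diff(
            old.config_yaml.splitlines(),
            new.config_yaml.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


selector = {"matchLabels": {"app.class": "databases"}}
v1 = take_snapshot("user-42", selector, "receivers: [otlp, hostmetrics]\n")
v2 = take_snapshot("user-42", selector, "receivers: [otlp]\n")
print("\n".join(diff_snapshots(v1, v2)))
```

In this model a rollback amounts to selecting an older snapshot and redistributing its config, identified by its hash, rather than editing any stored record.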

4. YAML Merge Logic & Pipeline Alignment

When merging global policy templates with local overrides, CollectorCtrl uses a deep merge processor that prevents common OTel errors.

Pipeline Key Alignment (AlignPolicyConfig)

In the standard OTel Collector, pipeline names can easily diverge between sources (e.g. a policy defines metrics but a node defines metrics/local). The compiler automatically maps the policy's pipeline onto the node's matching pipeline to prevent duplicate key conflicts or validation crashes:

[Policy Template]            [Local Override]            [Compiled Result]
metrics                      metrics/local               metrics/local
  receivers: [otlp]     +      processors: [batch]  =>     receivers: [otlp]
                                                           processors: [batch]
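
One way to picture the alignment step is the sketch below. It is not the actual AlignPolicyConfig logic. It assumes that pipelines are matched by signal type (the part of the name before "/") and that alignment happens only when the override defines exactly one pipeline of that type. The function names are hypothetical.

```python
def pipeline_type(name: str) -> str:
    """'metrics/local' -> 'metrics'; 'traces' -> 'traces'."""
    return name.split("/", 1)[0]


def align_pipelines(policy: dict[str, dict], override: dict[str, dict]) -> dict[str, dict]:
    """Fold each policy pipeline into the override pipeline of the same type."""
    compiled: dict[str, dict] = {}
    aligned: set[str] = set()
    for policy_name, policy_body in policy.items():
        candidates = [
            name for name in override
            if pipeline_type(name) == pipeline_type(policy_name)
        ]
        if len(candidates) == 1:
            # Unambiguous match: keep the override's name, combine both bodies.
            target = candidates[0]
            compiled[target] = {
                **compiled.get(target, {}),
                **policy_body,
                **override[target],
            }
            aligned.add(target)
        else:
            # No match, or ambiguous: keep the policy pipeline under its own name.
            compiled[policy_name] = dict(policy_body)
    for override_name, override_body in override.items():
        if override_name not in aligned:
            compiled[override_name] = {
                **compiled.get(override_name, {}),
                **override_body,
            }
    return compiled


policy = {"metrics": {"receivers": ["otlp"]}}
override = {"metrics/local": {"processors": ["batch"]}}
print(align_pipelines(policy, override))
```

With the inputs from the diagram, the result contains a single metrics/local pipeline carrying both the policy's receivers and the override's processors.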

List Overwrite Behavior

To allow receivers to be cleanly added or removed, YAML lists (such as receivers: [otlp, hostmetrics]) are replaced wholesale by the override's list during a merge, rather than appended to.

Policy Template:

service:
  pipelines:
    metrics:
      receivers: [otlp, hostmetrics]

Override Configuration:

service:
  pipelines:
    metrics:
      receivers: [otlp] # Completely replaces the base list
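
The list behavior can be sketched as a recursive merge over already-parsed configs. This is a minimal illustration using plain dictionaries in place of parsed YAML; the function name is hypothetical. Mappings are merged key by key, while any other value, including a list, is replaced by the override's value.

```python
import copy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Merge nested mappings; lists and scalars from the overlay replace the base."""
    result = copy.deepcopy(base)  # never mutate the policy template
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            # Lists land here: the overlay's list replaces the base list outright.
            result[key] = copy.deepcopy(value)
    return result


policy_template = {
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["otlp", "hostmetrics"],
                "processors": ["batch"],
            }
        }
    }
}
override_config = {
    "service": {"pipelines": {"metrics": {"receivers": ["otlp"]}}}
}

merged = deep_merge(policy_template, override_config)
print(merged["service"]["pipelines"]["metrics"])
```

Replacing rather than appending is what lets the override drop hostmetrics here; with append semantics there would be no way to express a removal. Keys the override does not mention, such as processors, survive from the template.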