Configuration: Overview
To create its telemetry pipelines, DataStream uses five key components: Devices, Targets, Pipelines, Processors, and Routes. Configuration involves managing text-based structured files (YAML) that specify the settings required to run these components.
The various stages where these components are used and how they connect to each other can be described schematically as:
Ingest [Source1, Source2, …, SourceN]
↓
Preprocess (Normalize) ↦ Route [Enrich ∘ Transform ∘ Select] ↦ Postprocess (Normalize)
↓
Forward [Destination1, Destination2, …, DestinationN]
In this scheme:
- To ingest data from Sources and communicate with them, DataStream uses Devices, which are listeners. Each dedicated device type conforms to the layout of a specific log data generator.
- For the Preprocessing, Routing, and Postprocessing stages, DataStream uses Pipelines, which are sequential arrangements of Processors, i.e. functions that handle and transform the data stored in various fields for a wide variety of purposes. (They also help normalize data for transformation, enrichment, and storage.)
- To forward processed data to Destinations and communicate with them, DataStream uses Targets, which are senders. Each dedicated target type conforms to the layout of a specific log data receiver.
By using these dedicated components, you can design powerful and efficient telemetry systems. You only need to understand how they work, how they interact, and how to configure and combine them to achieve your objectives.
Directory Tree
VirtualMetric's installation folder contains the following key directories and configuration files:
Director/
- Director application installation
Director/config/
- Main configuration directory containing user-customizable settings
Director/config/devices/
- Input source configurations (syslog, kafka, http, etc.)
Director/config/routes/
- Data routing and flow control definitions
Director/config/targets/
- Output destination configurations (elasticsearch, sentinel, etc.)
Director/config/vmetric.yml
- Primary system configuration file
Director/config/Examples/from-syslog-tcp.yml
- Sample configuration demonstrating basic setup

Director/package/
- System-provided templates and resources (read-only)
Director/package/database/
- GeoIP databases for geographic enrichment
Director/package/definitions/
- Pre-built pipeline and module definitions
Director/package/lookups/
- Reference tables for protocol, application, and vendor identification
Director/package/mibs/
- SNMP MIB files for network device monitoring
Director/package/agent/
- Cross-platform agent builds

Director/storage/
- Director runtime data and system state
Director/cert.pem and Director/key.pem
- TLS certificates for secure communications

Agent/
- Agent application installation
Agent/config/
- Agent-specific configuration
Agent/config/vmetric.yml
- Director connection configuration (text file)
Agent/config/vmetric.vmf
- Device configuration received from Director (binary file)

Agent/storage/
- Agent runtime data
All configuration files needed to define and run telemetry streams are placed in specific folders under this tree.
YAML Files
DataStream uses YAML-based configuration files to implement the logic of telemetry pipeline components. These human-readable structured files define device configurations, processing pipelines, routing rules, and output targets.
Configuration Organization
Configuration files are organized in three main directories under config/:

devices/
- Input source configurations for data ingestion (syslog, kafka, http, netflow, etc.)
routes/
- Data routing and conditional flow control between devices, pipelines, and targets
targets/
- Output destination configurations (elasticsearch, azure sentinel, splunk, etc.)
Each directory can contain multiple YAML files organized according to your preferred structure: grouped by function or environment, or kept as individual files per component.
Package vs User Configurations
DataStream separates system-provided and user-customizable configurations:
package/
- Contains system-provided templates and definitions that are updated with new software versions
user/
- Contains custom configurations that take precedence over package definitions and are preserved during updates

Never modify files under package/ directly. Copy files to the corresponding user/ location before customizing them.
File Discovery and Organization
Configuration files can be placed anywhere within the config/ directory structure. Director discovers all YAML files by recursively scanning subdirectories. Files can be organized by purpose, environment, or data stream type for better maintainability.
To illustrate, a target configuration file can be placed at any of the following locations:

PowerShell:

<vm_root>\Director\config\target.yml
-or-
<vm_root>\Director\config\targets\outputs.yml
-or-
<vm_root>\Director\config\targets\outputs\sentinel.yml

Bash:

<vm_root>/Director/config/target.yml
-or-
<vm_root>/Director/config/targets/outputs.yml
-or-
<vm_root>/Director/config/targets/outputs/sentinel.yml
As the nesting level increases, the file names become more specific, offering additional context for classification.
Select the organizational style that best suits your needs.
General Format
All components follow a consistent YAML structure that emphasizes readability and maintainability:
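For instance, a component definition might look like the following sketch (it reuses the Elasticsearch target from the Data Flow example at the end of this section, so the values are illustrative; each component type's Schema section lists the fields it actually accepts):

targets:
  - name: security_elasticsearch          # human-readable identifier
    type: elasticsearch                   # component type
    properties:                           # type-specific parameters
      url: "https://es.example.com:9200"
      index: "security-%{+yyyy.MM.dd}"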
All components have properties—i.e. YAML fields with parameters—that define their specific behavior. Some of these are mandatory and must be present in every component.
Configurations must conform to YAML syntax rules.
Commonly Used Fields
A few fields are worth mentioning since they are frequently used:
- Identification - Every component requires a unique identification, indicated by:
  - id and name on devices; the id will be a numeric value
  - name on all others; this is a human-readable value
- Status Control - Components can be enabled or disabled by setting the status field.
- Environment Variables - Sensitive information such as ${PASSWORD} can be stored in system variables.
- Tagging - Optionally, components can have descriptions in their tag fields to document their purposes, for better organization and easier searching.
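Putting these together, a device entry might carry the common fields as in this sketch (it extends the syslog device from the Data Flow example below; the status and tag values, and the use of an environment variable in a property, are illustrative assumptions rather than a prescribed schema):

devices:
  - id: 1                              # numeric identifier (devices only)
    name: firewall_logs                # human-readable name, required on every component
    status: true                       # assumption: toggles the component on or off
    tag: "edge firewall syslog feed"   # optional description for organization and searching
    type: syslog
    properties:
      port: 514
      password: "${PASSWORD}"          # illustrative only: a sensitive value read from a system variable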
The Overview sections of the component types summarize their common fields under the Configuration heading.
The Schema paragraphs of specific components list their fields.
Scoping and Referencing
Field names within the same YAML file are all part of the same scope, and they can be referred to from other fields. This can be done in two ways:
- Directly - with plain syntax:

  processors:
    - json:
        field: user_name
        ...
    - set:
        field: display
        value: user_name
- Indirectly -

  - with the so-called mustache syntax:

    processors:
      - json:
          field: user_name
          ...
      - set:
          field: display
          value: "Sent by {{user_name}}"

    Caution: The double mustache operators do not escape HTML entities. A third pair has to be added for that purpose.
    Assume, for example, that we have a field named menu which contains the text "Users > Workspace → Book©" in encoded form. In that case, {{menu}} returns the encoded text "Users &gt; Workspace &rarr; Book&copy;" whereas {{{menu}}} returns "Users > Workspace → Book©". This may be relevant due to security considerations.
  - with the so-called dot notation:

    processors:
      - json:
          field: user
          ...
      - set:
          field: display
          value: user.name
Meta Fields
DataStream uses the following meta fields to carry out some of its pipeline marshalling operations in the background.
_ingest Field
The _ingest field serves as a temporary internal namespace for data processing operations within the pipeline. It contains ephemeral metadata that facilitates processor operations and is automatically cleaned up after processing.
Key Subfields:
_ingest._key
- Current iteration key during foreach processing (array index or map key)
_ingest._value
- Current iteration value during foreach processing
_ingest._message
- Complete JSON representation during enrich operations
Usage:
- Fields are populated only during specific processor operations (foreach, enrich, error handling)
- Automatically deleted after processing completes
- Used for configuration evaluation and error metadata capture
- Fields may be null or undefined (?) when not actively being used by a processor
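As a rough sketch of how these subfields surface during iteration, consider the following (the foreach layout with field and processor keys is an assumption modeled on common ingest-pipeline conventions, and the tags array is hypothetical; consult the foreach processor's Schema for its actual fields):

pipelines:
  - name: expand_tags
    processors:
      - foreach:
          field: tags                        # hypothetical array field to iterate over
          processor:
            set:
              field: "tag_{{_ingest._key}}"  # _ingest._key holds the current array index
              value: "{{_ingest._value}}"    # _ingest._value holds the current element

Once the foreach completes, _ingest._key and _ingest._value are removed automatically.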
_vmetric Field
The _vmetric field serves as a system-level metadata container for VirtualMetric-specific operational data.
Contents:
- Device identification and metadata (ID, name, type, tags)
- Processing pipeline information and configuration
- System service context and ingestion channel details
Usage:
- Persists throughout the processing lifecycle for consistent system context
- Integrates with field manipulation system supporting dot notation access
- Follows similar nullability patterns to _ingest, where subfields may be undefined (?) when not actively populated by system operations
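For example, system context could be copied onto an event with the same dot notation used in the Scoping and Referencing section (the subfield path _vmetric.device.name below is hypothetical, shown only to illustrate the access pattern; the actual subfield layout is given in the _vmetric reference):

processors:
  - set:
      field: origin_device
      value: "{{_vmetric.device.name}}"   # hypothetical subfield path; _vmetric persists for the whole processing lifecycle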
Data Flow
DataStream implements a modular architecture of components working together to create complete data flows. The following example illustrates a security monitoring flow:
Example: Assume we have a route named critical_security defined like so:
routes:
  - name: critical_security
    devices:
      - name: firewall_logs
        if: "event.severity == 'critical'"
    pipelines:
      - name: security_enrichment
    targets:
      - name: security_elasticsearch
      - name: security_team_notification
This route refers to:
- a device named firewall_logs:

  devices:
    - id: 1
      name: firewall_logs
      type: syslog
      properties:
        port: 514
- a pipeline named security_enrichment:

  pipelines:
    - name: security_enrichment
      processors:
        - grok:
            field: message
            patterns:
              - "%{CISCOFW106001}"
        - set:
            field: event.category
            value: security
        - geoip:
            field: source.ip
            target_field: source.geo
- two targets named security_elasticsearch and security_team_notification respectively:

  targets:
    - name: security_elasticsearch
      type: elasticsearch
      properties:
        url: "https://es.example.com:9200"
        index: "security-%{+yyyy.MM.dd}"
    - name: security_team_notification
      type: webhook
      properties:
        url: "https://alerts.example.com/security"
        method: POST
This configuration is intended to implement the following:
- the device collects firewall logs from Syslog that have critical severity
- the pipeline selects (via the grok processor) Cisco firewall events, categorizes them as security events, and enriches them with the geographic location of their source IP
- the targets forward the curated events to Elasticsearch and to a notification system for the security team
Refer to component-specific documentation for the details of available options.