Configuration: Overview
To create its telemetry pipelines, DataStream uses five key components: Devices, Targets, Pipelines, Processors, and Routes. Configuration involves managing text-based structured files (YAML) that specify the settings required to run these components.
The various stages where these components are used and how they connect to each other can be described schematically as:
Ingest [Source1, Source2, …, SourceN]
↓
Preprocess (Normalize) ↦ Route [Enrich ∘ Transform ∘ Select] ↦ Postprocess (Normalize)
↓
Forward [Destination1, Destination2, …, DestinationN]
In this scheme:
- To ingest data from Sources and communicate with them, DataStream uses Devices, which are listeners. Each dedicated device type conforms to the layout of a specific log data generator.
- For the Preprocessing, Routing, and Postprocessing stages, DataStream uses Pipelines, which are sequential arrangements of Processors, i.e. functions that handle and transform the data stored in various fields for a wide variety of purposes. (They also help normalize data for transformation, enrichment, and storage.)
- To forward processed data to Destinations and communicate with them, DataStream uses Targets, which are senders. Each dedicated target type conforms to the layout of a specific log data receiver.
By using these dedicated components, you can design powerful and efficient telemetry systems. You only need to understand how they work, how they interact, and how to configure and combine them to achieve your objectives.
Directory Tree
VirtualMetric's installation folder contains the following key directories and configuration files:
Director/
- Director application installation
Director/config/
- Main configuration directory containing user-customizable settings
Director/config/devices/
- Input source configurations (syslog, kafka, http, etc.)
Director/config/routes/
- Data routing and flow control definitions
Director/config/targets/
- Output destination configurations (elasticsearch, sentinel, etc.)
Director/config/vmetric.yml
- Primary system configuration file
Director/config/Examples/from-syslog-tcp.yml
- Sample configuration demonstrating basic setup

Director/package/
- System-provided templates and resources (read-only)
Director/package/database/
- GeoIP databases for geographic enrichment
Director/package/definitions/
- Pre-built pipeline and module definitions
Director/package/lookups/
- Reference tables for protocol, application, and vendor identification
Director/package/mibs/
- SNMP MIB files for network device monitoring
Director/package/agent/
- Cross-platform agent builds

Director/storage/
- Director runtime data and system state
Director/cert.pem and Director/key.pem
- TLS certificates for secure communications

Agent/
- Agent application installation
Agent/config/
- Agent-specific configuration
Agent/config/vmetric.yml
- Director connection configuration (text file)
Agent/config/vmetric.vmf
- Device configuration received from Director (binary file)

Agent/storage/
- Agent runtime data
All configuration files needed to define and run telemetry streams are placed in specific folders under this tree.
YAML Files
DataStream uses YAML-based configuration files to implement the logic of telemetry pipeline components. These human-readable structured files define device configurations, processing pipelines, routing rules, and output targets.
Configuration Organization
Configuration files are organized in three main directories under config/:

devices/
- Input source configurations for data ingestion (syslog, kafka, http, netflow, etc.)
routes/
- Data routing and conditional flow control between devices, pipelines, and targets
targets/
- Output destination configurations (elasticsearch, azure sentinel, splunk, etc.)
Each directory can contain multiple YAML files organized according to your preferred structure: grouped by function or environment, or kept as individual files per component.
Package vs User Configurations
DataStream separates system-provided and user-customizable configurations:
package/
- Contains system-provided templates and definitions that are updated with new software versions
user/
- Contains custom configurations that take precedence over package definitions and are preserved during updates

Never modify files under package/ directly. Copy files to the corresponding user/ location before customizing them.
File Discovery and Organization
Configuration files can be placed anywhere within the config/ directory structure. Director discovers all YAML files by recursively scanning subdirectories. Files can be organized by purpose, environment, or data stream type for better maintainability.
To illustrate, a target configuration file can be placed at any of the following locations:

PowerShell:

<vm_root>\Director\config\target.yml
-or-
<vm_root>\Director\config\targets\outputs.yml
-or-
<vm_root>\Director\config\targets\outputs\sentinel.yml

Bash:

<vm_root>/Director/config/target.yml
-or-
<vm_root>/Director/config/targets/outputs.yml
-or-
<vm_root>/Director/config/targets/outputs/sentinel.yml
As the nesting level increases, the file names become more specific, offering additional context for classification.
Select the organizational style that best suits your needs.
General Format
All components follow a consistent YAML structure that emphasizes readability and maintainability:
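For instance, a component definition might look like the following sketch (it reuses the Elasticsearch target from the Data Flow example at the end of this section, so the values are illustrative; each component type's Schema section lists the fields it actually accepts):

targets:
  - name: security_elasticsearch          # human-readable identifier
    type: elasticsearch                   # component type
    properties:                           # type-specific parameters
      url: "https://es.example.com:9200"
      index: "security-%{+yyyy.MM.dd}"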
All components have properties—i.e. YAML fields with parameters—that define their specific behavior. Some of these are mandatory and must be present in every component.
Configurations must conform to YAML syntax rules.
Commonly Used Fields
A few fields are worth mentioning since they are frequently used:
- Identification - Every component requires a unique identification, indicated by:
  - id and name on devices; the id will be a numeric value
  - name on all others; this is a human-readable value
- Status Control - Components can be enabled or disabled by setting the status field.
- Environment Variables - Sensitive information such as ${PASSWORD} can be stored in system variables.
- Tagging - Optionally, components can have descriptions in their tag fields to document their purposes, for better organization and easier searching.
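Putting these together, a device entry might carry the common fields as in this sketch (it extends the syslog device from the Data Flow example below; the status and tag values, and the use of an environment variable in a property, are illustrative assumptions rather than a prescribed schema):

devices:
  - id: 1                              # numeric identifier (devices only)
    name: firewall_logs                # human-readable name, required on every component
    status: true                       # assumption: toggles the component on or off
    tag: "edge firewall syslog feed"   # optional description for organization and searching
    type: syslog
    properties:
      port: 514
      password: "${PASSWORD}"          # illustrative only: a sensitive value read from a system variable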
The Overview sections of the component types summarize their common fields under the Configuration heading.
The Schema paragraphs of specific components list their fields.
Scoping and Referencing
Field names within the same YAML file are all part of the same scope, and they can be referred to from other fields. This can be done in two ways:
- Directly - with plain syntax:

  processors:
    - json:
        field: user_name
        ...
    - set:
        field: display
        value: user_name
- Indirectly -

  - with the so-called mustache syntax:

    processors:
      - json:
          field: user_name
          ...
      - set:
          field: display
          value: "Sent by {{user_name}}"

    Caution: The double mustache operators do not escape HTML entities. A third pair has to be added for that purpose.
    Assume, for example, that we have a field named menu which contains the text "Users > Workspace → Book©" in encoded form. In that case, {{menu}} returns the encoded text "Users &gt; Workspace &rarr; Book&copy;" whereas {{{menu}}} returns "Users > Workspace → Book©". This may be relevant due to security considerations.
  - with the so-called dot notation:

    processors:
      - json:
          field: user
          ...
      - set:
          field: display
          value: user.name
Meta Fields
DataStream uses the following meta fields to carry out some of its pipeline marshalling operations in the background.
_ingest Field
The _ingest field serves as a temporary internal namespace for data processing operations within the pipeline. It contains ephemeral metadata that facilitates processor operations and is automatically cleaned up after processing.
Key Subfields:
_ingest._key
- Current iteration key during foreach processing (array index or map key)
_ingest._value
- Current iteration value during foreach processing
_ingest._message
- Complete JSON representation during enrich operations
Usage:
- Fields are populated only during specific processor operations (foreach, enrich, error handling)
- Automatically deleted after processing completes
- Used for configuration evaluation and error metadata capture
- Fields may be null or undefined (?) when not actively being used by a processor
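As a rough sketch of how these subfields surface during iteration, consider the following (the foreach layout with field and processor keys is an assumption modeled on common ingest-pipeline conventions, and the tags array is hypothetical; consult the foreach processor's Schema for its actual fields):

pipelines:
  - name: expand_tags
    processors:
      - foreach:
          field: tags                        # hypothetical array field to iterate over
          processor:
            set:
              field: "tag_{{_ingest._key}}"  # _ingest._key holds the current array index
              value: "{{_ingest._value}}"    # _ingest._value holds the current element

Once the foreach completes, _ingest._key and _ingest._value are removed automatically.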
_vmetric Field
The _vmetric field serves as a system-level metadata container for VirtualMetric-specific operational data.
Contents:
- Device identification and metadata (ID, name, type, tags)
- Processing pipeline information and configuration
- System service context and ingestion channel details
Usage:
- Persists throughout the processing lifecycle for consistent system context
- Integrates with field manipulation system supporting dot notation access
- Follows similar nullability patterns to _ingest, where subfields may be undefined (?) when not actively populated by system operations
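For example, system context could be copied onto an event with the same dot notation used in the Scoping and Referencing section (the subfield path _vmetric.device.name below is hypothetical, shown only to illustrate the access pattern; the actual subfield layout is given in the _vmetric reference):

processors:
  - set:
      field: origin_device
      value: "{{_vmetric.device.name}}"   # hypothetical subfield path; _vmetric persists for the whole processing lifecycle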
Data Flow
DataStream implements a modular architecture of components working together to create complete data flows. The following example illustrates a security monitoring flow:
Example: Assume we have a route named critical_security defined like so:
routes:
  - name: critical_security
    devices:
      - name: firewall_logs
        if: "event.severity == 'critical'"
    pipelines:
      - name: security_enrichment
    targets:
      - name: security_elasticsearch
      - name: security_team_notification
This route refers to:
- a device named firewall_logs:

  devices:
    - id: 1
      name: firewall_logs
      type: syslog
      properties:
        port: 514
- a pipeline named security_enrichment:

  pipelines:
    - name: security_enrichment
      processors:
        - grok:
            field: message
            patterns:
              - "%{CISCOFW106001}"
        - set:
            field: event.category
            value: security
        - geoip:
            field: source.ip
            target_field: source.geo
- two targets named security_elasticsearch and security_team_notification respectively:

  targets:
    - name: security_elasticsearch
      type: elasticsearch
      properties:
        url: "https://es.example.com:9200"
        index: "security-%{+yyyy.MM.dd}"
    - name: security_team_notification
      type: webhook
      properties:
        url: "https://alerts.example.com/security"
        method: POST
This configuration is intended to implement the following:
- the device collects firewall logs from Syslog that have critical severity
- the pipeline selects (via the grok processor) Cisco firewall events, categorizes them as security events, and enriches them with the geographic location of their source IP
- the targets forward the curated events to Elasticsearch and to a notification system for the security team
Refer to component-specific documentation for the details of available options.