Datasets and Profiles

Datasets and Profiles provide reusable data collection rule templates that standardize how telemetry is collected across device fleets. Instead of configuring each device's data collection individually, you define a dataset once and assign it to multiple devices.

Definitions

A Dataset is a reusable data collection rule template that defines what data to collect and how to collect it. Each dataset specifies a collection type (Windows Event Logs, DNS Logs, etc.) and the configuration parameters for that type. Datasets can optionally reference a preprocessing pipeline for inline processing of collected data.

A Profile is a grouping layer that composes multiple datasets into a single assignable unit. Profiles allow you to bundle related collection rules and apply them to devices as a set.

Type hierarchy: Each dataset has a type (category) and a definition type (specific collector). The type groups datasets into categories — windows, wec, or linux — while the definition type identifies the exact collector implementation (e.g., windows_security_log_collector). This two-level classification drives device compatibility and determines which configuration interface is presented.

Status lifecycle: Datasets and profiles have a status of active, passive, or deleted. Active items are applied to their assigned devices. Passive items remain configured but are not actively applied. Deleted items are soft-deleted and no longer visible in the UI.

Relationship to Devices: Datasets and profiles have a many-to-many relationship with devices. A single dataset can be assigned to multiple devices, and a single device can have multiple datasets assigned to it. This eliminates repetitive per-device configuration and ensures consistent data collection across your fleet.

Processing flow context: Datasets and profiles operate at the device layer of the DataStream processing flow. They govern what data a device collects before it enters preprocessing and pipeline stages.

Provider → Device (dataset rules applied here) → Preprocessing → Pipeline → Postprocessing → Target → Consumer

Management

Deletion Constraints

A dataset cannot be deleted if it is assigned to a device or included in a profile. Likewise, a profile cannot be deleted if it is assigned to a device. Remove all associations before deleting. When a deletion is blocked, the UI displays the specific conflicting devices or profiles that must be unassigned first.

Creating a Dataset

Dataset creation uses a multi-step wizard:

Step 1 — Define Dataset

Enter the dataset name and description.

Step 2 — Configure Dataset

Configure the type-specific collection rules. The configuration interface adapts based on the dataset type. Each type also supports an optional preprocessing pipeline assignment.

Windows (compatible with Windows devices):

  • Windows Security Events (windows_security_log_collector): Event category selector with four modes — ALL, MINIMAL, COMMON, or CUSTOM. Custom mode opens an XML editor for XPath filter expressions.
  • Windows Event Logs (windows_event_log_collector): Basic mode selects predefined channels (Application, System) with severity level filters. Custom mode provides an XPath expression editor with optional DCR config import.
  • Data Collection Rule Collector (data_collection_rule_collector): Custom-only XPath editor for Data Collection Rule queries. Supports importing DCR configuration that is automatically converted to XPath format.
  • Windows Firewall Logs (windows_firewall_log_collector): Profile selection for firewall log collection — Domain, Private, and/or Public.
  • Windows DNS Logs (windows_dns_log_collector): Include/exclude filter system with configurable conditions for DNS query fields (event ID, response code, question type, IP addresses, question name).

WEC (compatible with WEC devices):

  • Windows Event Collector Subscription (windows_event_collector_subscription): Custom-only XPath editor for Event Collector Subscription queries. Shares the same XPath editing interface as Windows Event Logs custom mode but without the DCR import option.

Linux (compatible with Linux devices):

  • Linux System Events (linux_host_log_collector): File path input for the system log source.
  • Linux Audit Events (linux_audit_report_log_collector): File path input for the audit log source.
  • Linux Firewall Events (linux_firewall_log_collector): File path input for the firewall log source.

Advanced Dataset Types

The backend supports additional dataset types that are not exposed in the UI: windows_main_log_collector, windows_system_log_collector, windows_application_log_collector, windows_object_access_log_collector, and windows_security_threat_analyzer. These are used internally and may appear in API responses or configuration exports.

For file-based log collection, see File Log Datasets below.

Step 3 — Assign Devices

Select one or more devices to assign this dataset to. The device list supports multi-select with search filtering.

Step 4 — Review

Review the complete dataset configuration summary before creation. Verify assigned devices and collection rules.

Dataset Detail View

After creation, each dataset has a detail page with three tabs:

General Settings Tab

View and edit the dataset name, description, type, and status (active or passive).

Assigned Devices Tab

View and manage the list of devices assigned to this dataset. Add or remove device assignments.

Dataset Configuration Tab

View and edit the type-specific collection rules for this dataset.

Dataset Operations

  • Clone: Create a copy of an existing dataset with all its configuration. The cloned dataset requires a new name and can be modified independently.
  • Delete: Remove a dataset. A confirmation modal displays before deletion to prevent accidental removal.

Creating a Profile

Profile creation uses a multi-step wizard. Profiles are created with active status by default.

Step 1 — Define Profile

Enter the profile name and description.

Step 2 — Select Datasets

Select one or more existing datasets to include in this profile. The dataset list supports multi-select with filtering.

Step 3 — Assign Devices

Select one or more devices to assign this profile to. Device assignment is optional and can be configured later.

Step 4 — Review

Review the profile summary including selected datasets and assigned devices before creation.

Profile Detail View

The profile detail page provides access to the profile's general settings, assigned datasets, and assigned devices.

File Log Datasets

File log datasets collect lines from arbitrary log files on Linux and Windows hosts. They support glob path expansion, lookback-based backfill, multiline parsing, include/exclude filtering, character-set decoding, and per-pipeline routing. Two dataset types are available:

  • linux_file_log_collector — for Linux devices
  • windows_file_log_collector — for Windows devices

Both types share the same input schema; only the path syntax differs between platforms.

YAML-only configuration

File log datasets are configured via device YAML files under config/devices/. They are not yet exposed in the dataset creation wizard.

Device-level property

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| file_log_concurrency | N | int | 1 | Maximum number of inputs processed in parallel per device. |

Input properties

Each entry under inputs: defines one file log source. Inputs share the standard dataset input frame (id, name, status) plus the following properties:

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| path | Y | string | | File path or glob pattern. Supports wildcards (e.g. /var/log/myapp/*.log, C:\Logs\*\app-*.log). |
| start_date | N | int | 300 | Lookback window in seconds. 0 collects the last second only; a negative value disables time-based filtering. |
| ignore_cache | N | boolean | false | Skip the persisted file-position cache (re-read from the start). |
| ignore_old_date | N | boolean | false | Skip old-date filtering. |
| ignore_retention | N | boolean | false | Skip retention filtering. |
| ignore_time | N | boolean | false | Skip time-based filtering. |
| date_format | N | string | | Custom timestamp layout using Java-style tokens (e.g. yyyy-MM-dd HH:mm:ss). Used to extract an event timestamp from each line. |
| line_parser | N | object or string | | Multiline detection rules. See Line parser. |
| filter_mode | N | string | exclude | include or exclude. Synonyms: inclusive/allow (include), exclusive/deny (exclude). |
| filter_rules | N | array | | Include/exclude rules. See Filter rules. Alias: filters. |
| encoding | N | string or int | | Character encoding for the file. Accepts an alias (e.g. utf-8, windows-1252) or a numeric decoder ID. See Encoding aliases. |
| pipeline_name | N | string | | Route matched lines to a specific preprocessing pipeline by name. |
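
As a rough illustration of how the defaults above interact, the fragment below sketches two inputs: one that sets only the required path and relies on every default, and one that widens the lookback window for a backfill. The IDs, names, and paths are hypothetical placeholders, and the fragment shows only the inputs list of a definition, not a complete device file.

```yaml
inputs:
  - id: 6001                          # hypothetical ID
    name: Minimal input
    status: true
    properties:
      path: /var/log/myapp/app.log    # only required field; start_date defaults to 300
  - id: 6002                          # hypothetical ID
    name: One-day backfill
    status: true
    properties:
      path: /var/log/myapp/*.log
      start_date: 86400               # lookback window of 24 hours, in seconds
      ignore_cache: true              # skip the file-position cache and re-read from the start
```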

Line parser

line_parser controls how multi-line log entries are reassembled. It accepts either an object or a bare string shorthand.

Accepted type values:

| Value | Aliases | Behavior |
| --- | --- | --- |
| regex | 1 | A regex pattern detects where a new entry begins. |
| newline | new_line, 2 | Each raw line is a separate entry. |
| string | prefix, 3 | Lines beginning with a literal string mark a new entry. |

Object fields:

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| type | N | string or int | One of the values above. If omitted and regex is set, regex is assumed. |
| regex | N | string | Pattern used by regex, or the literal prefix used by string/prefix. |
| value | N | string | Alias for regex. |
| date_based | N | boolean | Merge continuation lines using date boundaries detected via date_format. |
| has_space | N | boolean | Treat leading whitespace on a line as a continuation of the previous entry. |

A bare string at line_parser: is treated as type: regex with that pattern:

line_parser: '^\d{4}-\d{2}-\d{2}'
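
For comparison with the bare-string shorthand, the object form might be written as in the sketch below. It uses only the fields and type values documented in this section; the IDs, names, paths, and the 'EVENT ' prefix are hypothetical, and the fragment shows only the inputs list of a definition.

```yaml
inputs:
  - id: 3001                          # hypothetical ID
    name: Prefix-delimited log
    status: true
    properties:
      path: /var/log/myapp/events.log
      line_parser:
        type: string                  # alias: prefix, or 3
        regex: 'EVENT '               # read as a literal prefix because type is string
  - id: 3002                          # hypothetical ID
    name: Timestamp-delimited log
    status: true
    properties:
      path: /var/log/myapp/app.log
      line_parser:
        regex: '^\d{4}-\d{2}-\d{2}'   # type omitted, so regex is assumed
  - id: 3003                          # hypothetical ID
    name: Single-line log
    status: true
    properties:
      path: /var/log/myapp/access.log
      line_parser:
        type: newline                 # each raw line is its own entry
```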

Filter rules

filter_rules is an array of include/exclude rules applied after line parsing. The effect of each match (keep or drop) is determined by filter_mode.

Each rule accepts:

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| type | N | string or int | regex (1) or string (2). If omitted, inferred from which field is set. |
| regex | N | string | Regex pattern to match against the line. |
| source | N | string | Wildcard pattern to match against the line. |
| value | N | string | Alias for source. |

A bare array of regex strings is also accepted and is equivalent to type: regex for each entry.
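
The sketch below shows how the two filter modes and the rule forms could be combined. The first input drops matching lines, the second keeps only matching lines and uses the bare-array shorthand. The IDs, names, paths, and patterns are hypothetical, and the use of * as the wildcard character in source is an assumption carried over from the path glob syntax.

```yaml
inputs:
  - id: 4001                          # hypothetical ID
    name: Drop noise
    status: true
    properties:
      path: /var/log/myapp/*.log
      filter_mode: exclude            # matching lines are dropped
      filter_rules:
        - type: string
          source: '*healthcheck*'     # wildcard match; * syntax is assumed
        - regex: 'DEBUG|TRACE'        # type omitted, inferred from the regex field
  - id: 4002                          # hypothetical ID
    name: Keep errors only
    status: true
    properties:
      path: /var/log/myapp/*.log
      filter_mode: include            # only matching lines are kept
      filter_rules:                   # bare array, equivalent to type: regex per entry
        - '^(ERROR|WARN)'
        - 'Exception'
```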

Encoding aliases

encoding accepts a numeric decoder ID or one of the following aliases. Dashes, underscores, dots, and spaces are ignored during alias lookup, so utf-8, utf_8, and UTF 8 all resolve identically.

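| Alias(es) | ID |
| --- | --- |
| utf8 | 1 |
| utf8bom | 2 |
| utf16be | 3 |
| utf16le | 4 |
| utf16bebom | 5 |
| utf16lebom | 6 |
| gbk | 11 |
| latin1, iso88591 | 15 |
| windows1250, cp1250 | 50 |
| windows1251, cp1251 | 51 |
| windows1252, cp1252 | 52 |
| windows1256, cp1256 | 56 |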
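
Going by the lookup rules and IDs above, the following encoding values should be interchangeable within each pair. This is a sketch only: the IDs, names, and paths are hypothetical, and the fragment shows only the inputs list of a definition.

```yaml
inputs:
  - id: 5001                          # hypothetical ID
    name: Legacy app (alias form)
    status: true
    properties:
      path: C:\Logs\legacy\*.log
      encoding: cp1252                # same decoder as windows-1252
  - id: 5002                          # hypothetical ID
    name: Legacy app (numeric form)
    status: true
    properties:
      path: C:\Logs\legacy2\*.log
      encoding: 52                    # numeric decoder ID for windows1252 / cp1252
  - id: 5003                          # hypothetical ID
    name: UTF-16 export
    status: true
    properties:
      path: C:\Logs\export\*.log
      encoding: utf_16le              # separators ignored; resolves to utf16le (ID 4)
```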

Examples

Linux application logs

Collecting a rotating application log on Linux with regex-based multiline detection and an include filter for errors and warnings...

devices:
  - id: 123457
    name: app-linux
    type: linux
    status: true
    properties:
      file_log_concurrency: 2
    definitions:
      - name: linux_file_log_collector
        status: true
        inputs:
          - id: 1001
            name: Application Logs
            status: true
            properties:
              path: /var/log/myapp/*.log
              start_date: 300
              date_format: yyyy-MM-dd HH:mm:ss
              line_parser:
                type: regex
                regex: '^\d{4}-\d{2}-\d{2}'
                date_based: true
              filter_mode: include
              filter_rules:
                - type: regex
                  regex: '^(ERROR|WARN)'
              encoding: utf-8
              pipeline_name: my-pipeline

Windows IIS logs

Collecting IIS access logs on Windows with Western European encoding, rolling over daily log files by glob...

devices:
  - id: 123456
    name: iis-windows
    type: windows
    status: true
    properties:
      file_log_concurrency: 1
    definitions:
      - name: windows_file_log_collector
        status: true
        inputs:
          - id: 2001
            name: IIS Logs
            status: true
            properties:
              path: C:\inetpub\logs\LogFiles\W3SVC1\*.log
              start_date: 300
              line_parser:
                type: regex
                regex: '^\d{4}-\d{2}-\d{2}'
              encoding: windows-1252
              pipeline_name: my-pipeline

Permissions

Access to datasets and profiles is controlled by the following permission scopes:

| Scope | Description |
| --- | --- |
| DATASET_READ | View datasets and their configurations |
| DATASET_CREATE | Create new datasets |
| DATASET_EDIT | Modify existing datasets and device assignments |
| DATASET_DELETE | Delete datasets |
| PROFILE_READ | View profiles and their configurations |
| PROFILE_CREATE | Create new profiles |
| PROFILE_EDIT | Modify existing profiles, dataset selection, and device assignments |
| PROFILE_DELETE | Delete profiles |

Device Integration

Datasets connect to devices through the Configure Data Collection workflow. When configuring a device's data collection:

  1. A selection drawer displays available datasets and profiles
  2. Select one or more datasets or profiles to assign
  3. A confirmation modal with a switch control confirms the assignment change
  4. The device begins collecting data according to the assigned dataset rules

Exclusive Assignment

A device can be assigned either datasets or a profile, not both. Assigning one type replaces any existing assignment of the other type.

Each device tracks its configuration mode (dataset or profile), determining whether it receives collection rules from individual datasets or from a profile.

Assigned datasets appear in the device's detail view under the Data Configuration tab (see Devices Management) and can be managed from either the device or dataset side of the relationship.