Version: 1.4.0

Sample

Synopsis

Reduces data volume by sampling log entries based on configurable rules.

Schema

- sample:
    rules: <rule[]>
    exclude_filters: <string[]>
    tag: <string>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>

Configuration

The following fields are used to define the processor:

| Field | Required | Default | Description |
|---|---|---|---|
| rules | N | - | List of sampling rules with filters and rates |
| exclude_filters | N | - | List of conditions to exclude events from sampling |
| description | N | - | Explanatory note |
| if | N | - | Condition to run |
| ignore_failure | N | false | Continue processing if sampling fails |
| ignore_missing | N | false | Skip if referenced fields don't exist |
| on_failure | N | - | Error handling processors |
| on_success | N | - | Success handling processors |
| tag | N | - | Identifier |

Sampling Rule Object

Each rule in the rules array is an object with the following properties:

| Field | Required | Default | Description |
|---|---|---|---|
| filter | Y | - | Condition that determines which events this rule applies to |
| sampling_rate | Y | 2 | Keep 1 event for every N events (integer or string template) |

Details

Reduces data volume by systematically sampling log entries according to configurable rules. The processor keeps one event for every N matching events, where N is the specified sampling rate, which cuts volume by a predictable factor while retaining a representative subset of events.
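As a rough illustration of the keep-one-in-N behavior, the following Python sketch uses a simple counter. It is not the processor's implementation: the class name `OneInNSampler` is hypothetical, and keeping the Nth event of each group (rather than, say, the first) is an assumption made for this example.

```python
class OneInNSampler:
    """Keeps one event out of every `rate` events (hypothetical helper)."""

    def __init__(self, rate: int) -> None:
        if rate < 1:
            raise ValueError("rate must be a positive integer")
        self.rate = rate
        self._seen = 0

    def keep(self) -> bool:
        # Count the event; keep it only when the counter hits a multiple of rate.
        # Assumption: the Nth event of each group is the one that survives.
        self._seen += 1
        return self._seen % self.rate == 0


sampler = OneInNSampler(rate=10)
kept = [i for i in range(1, 101) if sampler.keep()]
print(len(kept))  # 10 of the 100 events survive
```

With a rate of 10, exactly one event in each consecutive group of ten is retained, which matches the 90% reduction quoted in the Basic example.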

Note

The processor adds a metadata field _vmetric.sampled to sampled events, recording the sampling rate that was applied (e.g., "10:1"). Use this value during analysis to scale counts and other statistics back up to account for sampling.
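One way to use this metadata downstream is to weight each stored event by its sampling rate when counting. The sketch below is illustrative only: it assumes the field arrives as a nested `_vmetric` object with a `sampled` key in the form "N:1", and the helper names `sample_weight` and `estimated_counts` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable

Event = Dict[str, Any]


def sample_weight(event: Event) -> int:
    """Return how many original events this stored event stands for."""
    marker = event.get("_vmetric", {}).get("sampled")
    if not marker:
        return 1  # events that were never sampled count once
    # Assumed format "N:1", e.g. "10:1" means 1 event kept out of every 10.
    rate, _, _ = marker.partition(":")
    return int(rate)


def estimated_counts(events: Iterable[Event], key: str) -> Counter:
    """Estimate pre-sampling counts per value of `key`."""
    totals: Counter = Counter()
    for event in events:
        totals[event[key]] += sample_weight(event)
    return totals


stored = [
    {"level": "info", "_vmetric": {"sampled": "10:1"}},
    {"level": "info", "_vmetric": {"sampled": "10:1"}},
    {"level": "error"},  # excluded from sampling, so no marker
]
counts = estimated_counts(stored, "level")
print(counts["info"], counts["error"])  # 20 1
```

The estimate is only as good as the assumption that kept events are representative of the ones that were dropped.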

Rule-based sampling provides fine-grained control over which types of events are sampled and at what rates. This helps balance data volume with analytical needs by keeping all critical events while sampling high-volume, routine events.

Warning

Sampling discards data by design, so apply it to critical events with caution. Use exclude_filters to preserve events such as errors, alerts, or security incidents that must be kept in full regardless of volume.
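The interaction between exclude_filters and rules can be pictured with a small Python model. This is a sketch under several assumptions that the text above does not confirm: exclude filters are checked before any rule, the first matching rule wins, each rule keeps its own counter, and events that match no rule are kept. The names `Rule` and `should_keep` are hypothetical, and filters are modeled as plain Python callables rather than the processor's expression language.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


@dataclass
class Rule:
    matches: Predicate  # stands in for the rule's filter expression
    rate: int           # keep 1 event for every `rate` matches
    seen: int = 0       # assumption: each rule has its own counter


def should_keep(event: Event, rules: List[Rule],
                exclude_filters: List[Predicate]) -> bool:
    # Assumption: excluded events bypass sampling and are always kept.
    if any(f(event) for f in exclude_filters):
        return True
    for rule in rules:
        if rule.matches(event):
            # Assumption: the first matching rule decides the outcome.
            rule.seen += 1
            return rule.seen % rule.rate == 0
    # Assumption: events that match no rule pass through untouched.
    return True


rules = [Rule(matches=lambda e: e["level"] == "debug", rate=2)]
exclude = [lambda e: e["level"] == "error"]
events = [{"level": "debug"}] * 4 + [{"level": "error"}] * 3

kept = [e for e in events if should_keep(e, rules, exclude)]
print(len(kept))  # 5: two of four debug events plus all three errors
```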

Examples

Basic

Applying simple 1-in-10 sampling...

- sample:
    rules:
      - filter: "true"
        sampling_rate: 10

keeps every 10th event, reducing volume by 90%

Conditional

Applying different sampling rates based on log level...

- sample:
    rules:
      - filter: "logEntry.log.level == 'debug'"
        sampling_rate: 100
      - filter: "logEntry.log.level == 'info'"
        sampling_rate: 10
      - filter: "logEntry.log.level == 'warning'"
        sampling_rate: 2
    exclude_filters:
      - "logEntry.log.level == 'error'"
      - "logEntry.log.level == 'critical'"

keeps all error/critical logs, 50% of warnings, 10% of info, and 1% of debug logs
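The overall reduction from per-level rates like these depends on how traffic is distributed across levels. The Python sketch below estimates the retained share for a given mix. The traffic shares are made-up numbers for illustration only, and `retained_fraction` is a hypothetical helper, not part of the processor.

```python
from typing import Dict


def retained_fraction(mix: Dict[str, float], rates: Dict[str, int]) -> float:
    """Share of total traffic kept, given per-level shares and 1-in-N rates.

    Levels missing from `rates` are treated as kept in full (rate 1).
    """
    return sum(share / rates.get(level, 1) for level, share in mix.items())


# Rates taken from the example above.
rates = {"debug": 100, "info": 10, "warning": 2}

# Hypothetical traffic mix (shares sum to 1); not measured data.
mix = {"debug": 0.50, "info": 0.40, "warning": 0.05,
       "error": 0.04, "critical": 0.01}

fraction = retained_fraction(mix, rates)
print(round(fraction, 3))  # 0.12
```

For this assumed mix, about 12% of events would be retained, with most of the savings coming from the debug and info levels.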

Dynamic

Using field values to determine sampling rate...

- sample:
    rules:
      - filter: "hasField(logEntry, 'http.response.status_code')"
        sampling_rate: "{{config.sampling_rates.http}}"
      - filter: "hasField(logEntry, 'database.query')"
        sampling_rate: "{{config.sampling_rates.db}}"

applies configurable rates from system settings

Service-Based

Sampling differently based on service...

- sample:
    rules:
      - filter: "logEntry.service.name == 'frontend'"
        sampling_rate: 5
      - filter: "logEntry.service.name == 'backend-api'"
        sampling_rate: 20
      - filter: "logEntry.service.name == 'auth-service'"
        sampling_rate: 2
    exclude_filters:
      - "hasField(logEntry, 'error.message')"
      - "logEntry.event.outcome == 'failure'"

tailors sampling based on service characteristics while preserving error data

Complex

Comprehensive sampling keeps relevant data while significantly reducing traffic

- sample:
    if: "logEntry.environment == 'production'"
    rules:
      - filter: "logEntry.http.request.method == 'GET' && logEntry.http.response.status_code >= 200 && logEntry.http.response.status_code < 300"
        sampling_rate: 50
      - filter: "logEntry.http.request.method == 'GET' && logEntry.http.response.status_code >= 400 && logEntry.http.response.status_code < 500"
        sampling_rate: 5
      - filter: "logEntry.event.category == 'database' && logEntry.event.duration < 100"
        sampling_rate: 20
    exclude_filters:
      - "logEntry.http.response.status_code >= 500"
      - "logEntry.event.severity >= 70"
      - "logEntry.http.request.method != 'GET'"
      - "logEntry.event.duration >= 1000"
      - "logEntry.user.id == 'admin'"