Pattern

Parse

Synopsis

Extracts structured patterns from log messages, identifying key components and normalizing variable content.

Schema

pattern:
- field: <ident>
- target_field: <ident>
- description: <text>
- if: <script>
- custom_patterns: <map[string]string>
- tokenize_all: <boolean>
- ignore_failure: <boolean>
- ignore_missing: <boolean>
- on_failure: <processor[]>
- on_success: <processor[]>
- tag: <string>

Configuration

| Field | Required | Default | Description |
|---|---|---|---|
| field | Y | - | Field containing the message to analyze |
| target_field | N | field | Field to store pattern information |
| description | N | - | Documentation note |
| custom_patterns | N | - | Map of custom regex patterns to use |
| tokenize_all | N | false | Use all built-in patterns for tokenization |
| if | N | - | Condition to run |
| ignore_failure | N | false | See Handling Failures |
| ignore_missing | N | false | If true, skip if field doesn't exist |
| on_failure | N | - | See Handling Failures |
| on_success | N | - | See Handling Success |
| tag | N | - | Identifier |

Details

The processor builds a normalized representation of each log message in two steps: it first removes numeric and other identifiable values, then replaces standard patterns with tokens, so that the result is a consistent pattern representation. It uses the following built-in patterns:

EMAIL: Email address format
IP: IP address detection
NUMBER: Numeric values
PATH: File system paths
TIMESTAMP: Date and time formats
URL: Web URLs

For each processed message, the processor generates a pattern string, a unique hash, and an identifier. It limits the pattern to 100 words, and requires words to be at least 2 characters long.
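The normalization described above can be sketched in Python. This is only an illustration under stated assumptions, not the processor's implementation: the regular expressions are hypothetical stand-ins for three of the built-in patterns, SHA-256 is an assumed hash function, and words shorter than 2 characters are assumed to be dropped. The names `BUILTIN_PATTERNS` and `normalize` are invented for this sketch, and the `id` value is left out because its derivation is not described here.

```python
import hashlib
import re

# Hypothetical regexes standing in for three of the built-in patterns.
# The processor's real expressions are not documented in this section.
BUILTIN_PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]

MAX_WORDS = 100     # documented limit on the pattern length
MIN_WORD_LEN = 2    # documented minimum word length


def normalize(message: str) -> dict:
    """Replace known patterns with tokens, then build a pattern and hash."""
    text = message
    for name, regex in BUILTIN_PATTERNS:
        text = regex.sub(f"<{name}>", text)

    # Assumption: words below the minimum length are dropped, and the
    # remaining words are cut off after the first MAX_WORDS entries.
    words = [w for w in text.split() if len(w) >= MIN_WORD_LEN][:MAX_WORDS]
    pattern = " ".join(words)

    # Assumption: the hash is derived from the pattern string (SHA-256 here).
    digest = hashlib.sha256(pattern.encode("utf-8")).hexdigest()
    return {"pattern": pattern, "hash": digest}


first = normalize("User admin@example.com logged in from 192.168.1.100")
second = normalize("User bob@example.org logged in from 10.0.0.7")
print(first["pattern"])
print(first["hash"] == second["hash"])
```

In this sketch, two messages that differ only in their email address and IP address reduce to the same pattern and therefore the same hash, which is the property that makes a pattern useful for grouping similar log lines.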

Warning

The processor may modify the input message to create a generalized pattern: long messages are truncated, and specific identifiers are replaced with generic tokens.

Examples

Basic

Extracting a pattern from a log message...

{
  "message": "2019-07-24 12:06:21,688 package.name [DEBUG] got 10 things in 3.1s"
}

pattern:
- field: message
- target_field: log_pattern

creates a normalized one:

{
  "message": "2019-07-24 12:06:21,688 package.name [DEBUG] got 10 things in 3.1s",
  "log_pattern": {
    "pattern": "package.name got things in",
    "hash": "...",
    "id": "..."
  }
}

Custom

Adding a custom pattern for error codes...

{
  "message": "Application error ERR1234: connection timeout"
}

pattern:
- field: message
- target_field: log_pattern
- custom_patterns:
    ERROR_CODE: "ERR\\d{4}"

applies it:

{
  "message": "Application error ERR1234: connection timeout",
  "log_pattern": {
    "pattern": "Application error <ERROR_CODE> connection timeout",
    "hash": "...",
    "id": "..."
  }
}
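To make the custom pattern concrete, here is a minimal Python sketch of the substitution step. It assumes each entry in `custom_patterns` is applied as a regular expression and that matches are replaced by the entry name in angle brackets; the function name `apply_custom` is hypothetical. Note that the double-quoted YAML value `"ERR\\d{4}"` unescapes to the regex `ERR\d{4}`, which is what the sketch uses.

```python
import re

# The regex that the YAML value "ERR\\d{4}" resolves to after unescaping.
custom_patterns = {"ERROR_CODE": r"ERR\d{4}"}


def apply_custom(message: str, patterns: dict) -> str:
    """Replace every match of each custom regex with a <NAME> token."""
    for name, expr in patterns.items():
        message = re.sub(expr, f"<{name}>", message)
    return message


result = apply_custom(
    "Application error ERR1234: connection timeout", custom_patterns
)
print(result)
```

The sketch keeps the colon after the token, whereas the processor output shown above does not. The processor evidently performs additional cleanup that this illustration does not attempt to reproduce.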

Tokenizing

Using all built-in patterns...

{
  "message": "User admin@example.com logged in from 192.168.1.100"
}

pattern:
- field: message
- target_field: log_pattern
- tokenize_all: true

replaces all the known patterns:

{
  "message": "User admin@example.com logged in from 192.168.1.100",
  "log_pattern": {
    "pattern": "User <EMAIL> logged in from <IP>",
    "hash": "...",
    "id": "..."
  }
}

Error Handling

Handling non-string inputs...

{
  "message": 12345
}

pattern:
- field: message
- target_field: log_pattern
- ignore_failure: true
- on_failure:
  - append:
      field: tags
      value: pattern_parse_error

adds an error tag and continues execution:

{
  "message": 12345,
  "tags": ["pattern_parse_error"]
}
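The control flow of this example can be approximated in Python. This is a rough sketch under assumptions, not the processor's actual failure handling: the function `run_pattern` is hypothetical, the type check stands in for whatever validation the processor performs, and the success branch stores a placeholder rather than a real pattern.

```python
def run_pattern(event: dict, field: str = "message") -> dict:
    """Hypothetical stand-in for the failure path shown above."""
    value = event.get(field)
    if not isinstance(value, str):
        # Mirrors the on_failure chain: append a tag and continue
        # instead of aborting the pipeline.
        event.setdefault("tags", []).append("pattern_parse_error")
        return event

    # Placeholder only: real normalization is out of scope for this sketch.
    event["log_pattern"] = {"pattern": value}
    return event


failed = run_pattern({"message": 12345})
print(failed)
```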