Version: 1.4.0

Trim First

Text Processing, String Manipulation, Data Cleaning

Synopsis

A processor that removes a specified number of characters, or any of a set of predefined keywords, from the beginning of strings. It gives precise control over prefix removal for data cleaning and normalization tasks.

Schema

- trim_first:
    field: <ident>
    count: <numeric>
    keywords: <string[]>
    target_field: <ident>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

| Field | Required | Default | Description |
|---|---|---|---|
| field | Y | - | Field containing the string(s) to process |
| count | N | - | Number of characters to remove from beginning |
| keywords | N | - | Keywords to remove from beginning |
| target_field | N | field | Field to store the trimmed result |
| description | N | - | Explanatory note |
| if | N | - | Condition to run |
| ignore_failure | N | false | Continue if trimming fails |
| ignore_missing | N | false | Continue if source field doesn't exist |
| on_failure | N | - | See Handling Failures |
| on_success | N | - | See Handling Success |
| tag | N | - | Identifier |

Details

The processor supports two trimming modes: character count-based trimming and keyword-based trimming. Both modes can be used together, with character trimming applied first followed by keyword trimming.

Note: The processor supports both single strings and string arrays, applying the trimming operation to each string element.

Character count trimming removes the specified number of characters from the start of each string. If the count exceeds the string length, the entire string is removed, resulting in an empty string.
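As a rough illustration of the count-based mode, the following Python sketch models the behaviour described above. It is not the processor's implementation: `trim_count` is a hypothetical helper, "characters" is assumed to mean Unicode code points, and rejecting negative counts is likewise an assumption, since the documentation covers neither point.

```python
def trim_count(value: str, count: int) -> str:
    """Drop the first `count` characters of `value` (hypothetical helper)."""
    if count < 0:
        # Assumption: negative counts are invalid; the docs do not say.
        raise ValueError("count must be non-negative")
    # Slicing past the end of a string yields "", matching the documented
    # case where count exceeds the string length.
    return value[count:]


print(trim_count("ABC123456789", 3))  # 123456789
print(repr(trim_count("abc", 10)))    # ''
```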

Keyword trimming removes matching prefixes from the beginning of strings. Multiple keywords can be specified, and each is checked sequentially for prefix matches.
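Keyword matching can be modelled along the following lines. This is a sketch under stated assumptions rather than the processor's actual logic: `trim_keywords` is a hypothetical name, matching is assumed to stop at the first keyword that prefixes the string, and a string with no matching prefix is assumed to pass through unchanged.

```python
from typing import Sequence


def trim_keywords(value: str, keywords: Sequence[str]) -> str:
    """Remove the first keyword that prefixes `value` (hypothetical helper)."""
    for keyword in keywords:
        # Skip empty keywords: "" is a prefix of every string.
        if keyword and value.startswith(keyword):
            # Assumption: stop after the first match instead of re-checking
            # the remaining keywords against the shortened string.
            return value[len(keyword):]
    # Assumption: no keyword matched, so the string is returned unchanged.
    return value


levels = ["INFO: ", "DEBUG: ", "TRACE: "]
print(trim_keywords("DEBUG: Loading configuration", levels))  # Loading configuration
```

Under the alternative reading, where every keyword is applied in turn, an input such as "INFO: DEBUG: x" would lose both prefixes. The examples on this page do not distinguish the two interpretations.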

Warning: Ensure that the count parameter is a valid numeric value to avoid processing errors.

Examples

Character Count Trimming

Removing first characters from strings...

{
  "log_line": "2024-01-15 ERROR: Database connection failed",
  "code": "ABC123456789"
}

- trim_first:
    field: log_line
    count: 11
    target_field: message_only
- trim_first:
    field: code
    count: 3
    target_field: numeric_part

removes the prefixes:

{
  "log_line": "2024-01-15 ERROR: Database connection failed",
  "code": "ABC123456789",
  "message_only": "ERROR: Database connection failed",
  "numeric_part": "123456789"
}

Keyword Trimming

Removing specific keywords from beginning...

{
  "error_msg": "ERROR: Failed to connect to server",
  "warning_msg": "WARNING: Low disk space detected"
}

- trim_first:
    field: error_msg
    keywords: ["ERROR: "]
    target_field: clean_error
- trim_first:
    field: warning_msg
    keywords: ["WARNING: "]
    target_field: clean_warning

removes the log level prefixes:

{
  "error_msg": "ERROR: Failed to connect to server",
  "warning_msg": "WARNING: Low disk space detected",
  "clean_error": "Failed to connect to server",
  "clean_warning": "Low disk space detected"
}

Array Processing

Processing string arrays...

{
  "file_paths": [
    "/var/log/application.log",
    "/var/log/system.log",
    "/var/log/error.log"
  ]
}

- trim_first:
    field: file_paths
    keywords: ["/var/log/"]
    target_field: file_names

extracts just the filenames:

{
  "file_paths": [
    "/var/log/application.log",
    "/var/log/system.log",
    "/var/log/error.log"
  ],
  "file_names": [
    "application.log",
    "system.log",
    "error.log"
  ]
}
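The element-wise handling shown in the example above can be approximated in Python as follows. The names `trim_prefix` and `trim_each` are hypothetical, and the sketch assumes that an array is processed by trimming each string independently and returning a new array of the same length.

```python
from typing import List, Union

StringOrList = Union[str, List[str]]


def trim_prefix(value: str, prefix: str) -> str:
    """Remove `prefix` from the start of `value` if it is present."""
    return value[len(prefix):] if value.startswith(prefix) else value


def trim_each(value: StringOrList, prefix: str) -> StringOrList:
    """Trim a single string, or every string in a list (hypothetical helper)."""
    if isinstance(value, str):
        return trim_prefix(value, prefix)
    # Arrays: apply the same operation to each element, preserving order.
    return [trim_prefix(item, prefix) for item in value]


file_paths = [
    "/var/log/application.log",
    "/var/log/system.log",
    "/var/log/error.log",
]
file_names = trim_each(file_paths, "/var/log/")
print(file_names)  # ['application.log', 'system.log', 'error.log']
```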

Multiple Keywords

Removing various prefixes...

{
  "messages": [
    "INFO: System startup complete",
    "DEBUG: Loading configuration",
    "TRACE: Memory allocation successful"
  ]
}

- trim_first:
    field: messages
    keywords: ["INFO: ", "DEBUG: ", "TRACE: "]

removes all log level prefixes:

{
  "messages": [
    "System startup complete",
    "Loading configuration",
    "Memory allocation successful"
  ]
}
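The example above omits target_field, so the trimmed values replace the contents of messages, in line with the documented default. A minimal Python sketch of that defaulting rule follows. `run_trim_first` and its signature are hypothetical, and keyword matching is assumed to stop at the first matching prefix.

```python
from typing import Any, Dict, List, Optional


def run_trim_first(event: Dict[str, Any], field: str, keywords: List[str],
                   target_field: Optional[str] = None) -> Dict[str, Any]:
    """Model of keyword trimming with target_field defaulting to field."""

    def strip(value: str) -> str:
        for keyword in keywords:
            if keyword and value.startswith(keyword):
                return value[len(keyword):]
        return value

    source = event[field]
    result = [strip(v) for v in source] if isinstance(source, list) else strip(source)
    out = dict(event)  # shallow copy so the caller's event is not mutated
    # With no target_field, the result overwrites the source field.
    out[target_field or field] = result
    return out


event = {"messages": ["INFO: System startup complete", "DEBUG: Loading configuration"]}
print(run_trim_first(event, "messages", ["INFO: ", "DEBUG: "]))
# {'messages': ['System startup complete', 'Loading configuration']}
```

Passing a target_field instead leaves the source field untouched and writes the result to the new key, as in the earlier examples.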

Combined Trimming

Using both character count and keywords...

{
  "raw_data": "00012345ERROR: Processing failed"
}

- trim_first:
    field: raw_data
    count: 8
    keywords: ["ERROR: "]
    target_field: clean_message

applies both trimming methods:

{
  "raw_data": "00012345ERROR: Processing failed",
  "clean_message": "Processing failed"
}
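The order of the two steps matters for this input, which the following Python sketch illustrates. The function names are hypothetical, and keyword matching is assumed to stop at the first matching prefix. Only `count_then_keywords` reflects the documented order; `keywords_then_count` is included purely for contrast.

```python
from typing import Sequence


def strip_keyword(value: str, keywords: Sequence[str]) -> str:
    """Remove the first keyword that prefixes `value`, if any."""
    for keyword in keywords:
        if keyword and value.startswith(keyword):
            return value[len(keyword):]
    return value


def count_then_keywords(value: str, count: int, keywords: Sequence[str]) -> str:
    """Documented order: character trimming first, then keyword trimming."""
    return strip_keyword(value[count:], keywords)


def keywords_then_count(value: str, count: int, keywords: Sequence[str]) -> str:
    """Reversed order, shown only to contrast with the documented one."""
    return strip_keyword(value, keywords)[count:]


raw = "00012345ERROR: Processing failed"
print(count_then_keywords(raw, 8, ["ERROR: "]))  # Processing failed
print(keywords_then_count(raw, 8, ["ERROR: "]))  # ERROR: Processing failed
```

In the reversed order the keyword never matches, because the string still begins with the eight-character numeric prefix when the keyword check runs.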

Conditional Trimming

Trimming based on conditions...

{
  "log_entry": "AUDIT: User login successful",
  "log_level": "AUDIT"
}

- trim_first:
    field: log_entry
    keywords: ["AUDIT: "]
    if: "log_level == 'AUDIT'"
    target_field: audit_message

applies trimming when condition matches:

{
  "log_entry": "AUDIT: User login successful",
  "log_level": "AUDIT",
  "audit_message": "User login successful"
}