Clean
Removes unwanted characters from string fields using configurable cleaning modes and character sets.
Schema
- clean:
    field: <ident>
    target_field: <ident>
    mode: <string>
    chars: <string>
    keep_chars: <string>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>
Configuration
The following fields are used to define the processor:
Field | Required | Default | Description |
---|---|---|---|
field | Y | - | Source field to clean |
target_field | N | field | Target field to store cleaned result |
mode | N | custom | Cleaning mode: alphanumeric, numeric, alpha, custom |
chars | N | Common delimiters | Characters to remove (used with custom mode) |
keep_chars | N | - | Additional characters to preserve in predefined modes |
description | N | - | Explanatory note |
if | N | - | Condition to run processor |
ignore_failure | N | false | Continue if processor fails |
ignore_missing | N | false | Continue if source field doesn't exist |
on_failure | N | - | Processors to run on failure |
on_success | N | - | Processors to run on success |
tag | N | - | Processor identifier |
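As an illustration of how the optional fields combine, here is a minimal sketch that fills in the schema. Only the key names come from the table above; the field names, description, and tag value are hypothetical.

```yaml
# Hypothetical values throughout; only the keys are taken from the schema.
- clean:
    field: user_input                 # assumed source field name
    target_field: user_input_clean    # assumed target; omit to overwrite the source field
    mode: alphanumeric
    description: Strip punctuation from user input
    tag: clean_user_input
    ignore_missing: true              # continue when user_input is absent
```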
Details
The processor operates on string fields and string arrays; non-string values are converted to strings before processing. Each element of a string array is processed individually. Unwanted characters at the beginning and end of strings are removed by trimming.
Unicode characters are handled in all cleaning modes, so international text is supported. In custom mode, when chars is not specified, the processor removes common delimiters and special characters, including quotes, brackets, and various punctuation marks.
The implementation processes strings character by character, which suits high-volume log environments. A common use is data sanitization: removing special characters from user input.
Other typical uses are phone number normalization (extracting only the digits from formatted phone numbers), username cleaning (removing invalid characters while preserving valid ones), and log message cleanup (removing formatting characters that may interfere with downstream processing).
It also supports identifier standardization: identifiers are cleaned while essential characters are preserved through the keep_chars configuration. This makes it useful for input validation and for preparing fields for downstream processing or storage systems that require clean, standardized data.
Cleaning Modes
alphanumeric
Keeps only letters and digits, removes all other characters.
numeric
Keeps only digits (0-9), removes all other characters.
alpha
Keeps only letters (a-z, A-Z), removes all other characters.
custom
Removes the characters specified in the chars field. If chars is not provided, removes common delimiters and quotes.
Examples
Alphanumeric
Cleaning a username to keep only letters and digits removes special characters.
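A minimal sketch of such a configuration, assuming a hypothetical source field named `username`:

```yaml
# Assumption: the event carries a string field called "username".
- clean:
    field: username
    mode: alphanumeric   # keep letters and digits only
```

Going by the mode description, an input such as `john.doe_99!` would be expected to come out as `johndoe99`.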
Numeric Extraction
Extracting only the digits from a formatted phone number creates a field with digits only.
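One way this might be configured is sketched below. The field names `phone` and `phone_digits` are assumptions chosen for illustration.

```yaml
# Assumption: "phone" holds the formatted number; "phone_digits" is a new field.
- clean:
    field: phone
    target_field: phone_digits   # leaves the original value untouched
    mode: numeric                # keep digits 0-9 only
```

With this mode, a value like `+1 (555) 123-4567` would be expected to yield `15551234567` in the target field.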
Custom Removal
Removing specific brackets and angle brackets from a log message strips only the specified characters.
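A hedged sketch of a custom-mode configuration follows. The field name `message` and the exact character set are assumptions; the set is quoted because `[` would otherwise start a YAML flow sequence.

```yaml
# Assumption: "message" is the log message field; the chars value is illustrative.
- clean:
    field: message
    mode: custom
    chars: "[]<>"   # square brackets and angle brackets
```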
With Exceptions
Using alphanumeric mode while preserving specific characters keeps the allowed characters.
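The combination of a predefined mode with keep_chars could look like the sketch below. The field name `identifier` and the preserved characters are hypothetical.

```yaml
# Assumption: hyphens and underscores are the characters worth preserving.
- clean:
    field: identifier
    mode: alphanumeric
    keep_chars: "-_"   # preserved in addition to letters and digits
```

Under these settings, a value such as `srv-01_prod#east` would be expected to become `srv-01_prodeast`.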
Arrays
Cleaning an array of tags processes each array element.
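No array-specific option appears in the schema, so a configuration for an array field presumably looks the same as one for a string field. A sketch, assuming a hypothetical `tags` field that holds a string array:

```yaml
# Assumption: "tags" is an array of strings, e.g. ["web#1", "db@2"].
- clean:
    field: tags
    mode: alphanumeric   # applied to each element individually
```

For the sample array in the comment, the expected result would be `["web1", "db2"]`.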
Default Mode
Using the default custom mode without specifying chars removes common delimiters and quotes.
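Because mode defaults to custom and chars has a built-in default, the shortest configuration only names the field. The field name `raw_value` below is an assumption.

```yaml
# Assumption: "raw_value" is the field to clean; mode and chars fall back to defaults.
- clean:
    field: raw_value
```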
Email
Cleaning an email address while preserving dots removes special characters except dots.
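A sketch of what this might look like, assuming alphanumeric mode and a hypothetical `email` field:

```yaml
# Assumption: alphanumeric mode with the dot added as a preserved character.
- clean:
    field: email
    mode: alphanumeric
    keep_chars: "."   # dots survive; other punctuation does not
```

Note that under this configuration the `@` sign would presumably be removed as well; it would need to be added to keep_chars to keep the address intact.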