Azure Blob Storage
Synopsis
Creates a target that writes log messages to Azure Blob Storage with support for various file formats, authentication methods, and retry mechanisms. Inherits file format capabilities from the base file target.
Schema
- name: <string>
  description: <string>
  type: azblob
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    account: <string>
    tenant_id: <string>
    client_id: <string>
    client_secret: <string>
    container: <string>
    name: <string>
    format: <string>
    extension: <string>
    compression: <string>
    schema: <string>
    field_format: <string>
    no_buffer: <boolean>
    timeout: <numeric>
    max_size: <numeric>
    batch_size: <numeric>
    containers:
      - container: <string>
        name: <string>
        format: <string>
        compression: <string>
        extension: <string>
        schema: <string>
    function_app: <string>
    function_token: <string>
    interval: <string|numeric>
    cron: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>
Configuration
The following fields are used to define the target:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Y | - | Target name |
| description | N | - | Optional description |
| type | Y | - | Must be azblob |
| pipelines | N | - | Optional post-processor pipelines |
| status | N | true | Enable/disable the target |
Azure
| Field | Required | Default | Description |
|---|---|---|---|
| account | Y | - | Azure storage account name |
| tenant_id | N* | - | Azure tenant ID (required unless using managed identity or function app) |
| client_id | N* | - | Azure client ID (required unless using managed identity or function app) |
| client_secret | N* | - | Azure client secret (required unless using managed identity or function app) |
| container | N** | "vmetric" | Default container name (acts as catch-all when containers is also specified) |
* = Conditionally required. See authentication methods below.
** = Required if you want a catch-all container for unmatched events, or if not using the containers array.
Connection
| Field | Required | Default | Description |
|---|---|---|---|
| timeout | N | 30 | Connection timeout in seconds |
| max_size | N | 0 | Maximum file size in bytes before uploading |
| batch_size | N | 100000 | Maximum number of messages per file |
When max_size is reached, the current file is uploaded to blob storage and a new file is created. For unlimited file size, set the field to 0.
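For instance, a minimal sketch of a size- and count-based rotation setup (the limits below are illustrative values, not defaults, and authentication fields are omitted for brevity):

targets:
  - name: rotating_blob
    type: azblob
    properties:
      # tenant_id / client_id / client_secret omitted for brevity
      account: "mystorageaccount"
      container: "logs"
      max_size: 134217728   # upload and start a new file at ~128MB
      batch_size: 250000    # or after 250,000 messages, whichever comes first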
Function App (Optional)
| Field | Required | Default | Description |
|---|---|---|---|
| function_app | N | - | Azure Function App URL for uploading blobs |
| function_token | N | - | Authentication token for the Function App |
If function_app is specified, the target uploads blobs through the Function App instead of using the Azure Blob Storage SDK directly. This is useful when direct access to the storage account is restricted.
Files
The following fields can be used for files:
| Field | Required | Default | Description |
|---|---|---|---|
| containers | N* | - | Array of container configurations for file distribution |
| containers.container | Y | - | Container name |
| containers.name | Y | - | Blob name template |
| containers.format | N | "json" | Output format: json, multijson, avro, parquet |
| containers.compression | N | "zstd" | Compression algorithm. See Compression below |
| containers.extension | N | Matches format | File extension override |
| containers.schema | N** | - | Schema definition (required for Avro and Parquet formats) |
| name | N | "vmetric.{{.Timestamp}}.{{.Extension}}" | Default blob name template (used with container for catch-all) |
| format | N | "json" | Default output format (used with container for catch-all) |
| extension | N | Matches format | Default file extension (used with container for catch-all) |
| compression | N | "zstd" | Default compression algorithm (used with container for catch-all) |
| schema | N | - | Default schema definition (used with container for catch-all) |
| no_buffer | N | false | Disable write buffering |
| field_format | N | - | Data normalization format. See applicable Normalization section |
* = Either container or containers must be specified.
** = Conditionally required for Avro and Parquet formats when using containers.
Scheduler
| Field | Required | Default | Description |
|---|---|---|---|
| interval | N | realtime | Execution frequency. See Interval for details |
| cron | N | - | Cron expression for scheduled execution. See Cron for details |
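As a sketch of scheduled (non-realtime) uploads, assuming the scheduler fields sit under properties as shown in the schema above (the hourly cron and 300-second interval are arbitrary example values; use one or the other):

targets:
  - name: scheduled_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      container: "logs"
      cron: "0 * * * *"   # upload at the top of every hour
      # interval: 300     # alternatively, run every 300 seconds instead of a cron schedule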
Debug Options
| Field | Required | Default | Description |
|---|---|---|---|
| debug.status | N | false | Enable debug logging |
| debug.dont_send_logs | N | false | Process logs but don't send to target (testing) |
Details
The Azure Blob Storage target can write to multiple containers, each with its own file format, compression, and schema settings, providing cloud storage integration for enterprise workloads.
Authentication Methods
The target supports multiple authentication methods:
Service Principal Authentication: Use tenant_id, client_id, and client_secret for explicit credential-based authentication.
Managed Identity: When deployed on Azure infrastructure (VMs, App Services, Container Instances), can leverage managed identity without explicit credentials.
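For example, a minimal managed-identity configuration is a sketch like the following, assuming the target falls back to the host's managed identity when tenant_id, client_id, and client_secret are omitted:

targets:
  - name: managed_identity_blob
    type: azblob
    properties:
      account: "mystorageaccount"   # storage account accessible to the host's managed identity
      container: "logs"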
Function App: When function_app is specified, authentication is handled through the Function App token, and direct storage credentials are not required.
Container Routing
The target supports flexible container routing through pipeline configuration or explicit container settings:
Configuration-based routing: Define multiple containers in the target configuration, each with its own format, compression, and schema settings. Logs are routed to specific containers based on configuration.
Pipeline-based routing: Use the container field in pipeline processors to dynamically route logs to different containers at runtime. This enables conditional routing based on log content, source, or other attributes.
Catch-all routing: When a log doesn't match any specific container configuration or when no container field is set in the pipeline, logs are routed to the catch-all container (configured via the container field in target properties).
Routing priority:
1. Pipeline container field (highest priority)
2. Configured containers in the containers array (if the container name matches)
3. Default container field (catch-all, lowest priority)
This multi-level routing enables flexible data distribution strategies, such as routing different log types to different containers based on content analysis, source system, severity level, or any other runtime decision.
File Formats
| Format | Description |
|---|---|
| json | Each log entry is written as a separate JSON line (JSONL format) |
| multijson | All log entries are written as a single JSON array |
| avro | Apache Avro format with schema and compression support |
| parquet | Apache Parquet columnar format with schema |
Compression
Some formats support built-in compression to reduce storage costs and transfer times. When supported, compression is applied at the file/block level before upload.
| Format | Default | Compression Codecs |
|---|---|---|
| JSON | - | Not supported |
| MultiJSON | - | Not supported |
| Avro | zstd | deflate, snappy, zstd |
| Parquet | zstd | gzip, snappy, zstd, brotli, lz4 |
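For instance, overriding the default Parquet codec for the catch-all container (gzip is chosen arbitrarily here for illustration; authentication fields are omitted for brevity):

targets:
  - name: gzip_parquet_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      container: "logs"
      format: "parquet"
      compression: "gzip"   # overrides the zstd default for parquet
      schema: "Syslog"      # schema used for the Parquet columns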
File Management
Files are rotated based on size (max_size parameter) or event count (batch_size parameter), whichever limit is reached first. Template variables in blob names enable dynamic file naming for time-based partitioning.
Schema Support
The target supports the following built-in schema templates for structured data formats:
- Syslog - Standard schema for Syslog messages
- CommonSecurityLog - Schema compatible with Common Security Log Format (CSL)
You can also reference custom schema files by name (without the .json extension). The system will search for schema files in:
- User schema directory: <user-path>/schemas/
- Package schema directory: <package-path>/schemas/
Schema files are searched recursively in these directories, and filename matching is case-insensitive.
Templates
The following template variables can be used in blob names:
| Variable | Description | Example |
|---|---|---|
| {{.Year}} | Current year | 2024 |
| {{.Month}} | Current month | 01 |
| {{.Day}} | Current day | 15 |
| {{.Timestamp}} | Current timestamp in nanoseconds | 1703688533123456789 |
| {{.Format}} | File format | json |
| {{.Extension}} | File extension | json |
| {{.Compression}} | Compression type | zstd |
| {{.TargetName}} | Target name | my_logs |
| {{.TargetType}} | Target type | azblob |
| {{.Table}} | Container name | logs |
Multiple Containers
A single target can write to multiple Azure Blob Storage containers with different configurations, enabling data distribution strategies (e.g. raw data in one container, processed data in another).
Files containing no messages (an event counter of 0) are automatically skipped during upload.
Examples
The following examples demonstrate common upload configurations.
Basic Configuration
The minimum configuration for a JSON blob storage:
targets:
  - name: basic_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      container: "logs"
Pipeline-Based Routing
Dynamic container routing using pipeline processors to analyze log content and route to appropriate containers:
targets:
  - name: smart_routing_blob
    type: azblob
    pipelines:
      - dynamic_routing
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      containers:
        - container: "security-events"
          name: "security_{{.Year}}_{{.Month}}_{{.Day}}.parquet"
          format: "parquet"
          schema: "CommonSecurityLog"
          compression: "zstd"
        - container: "application-events"
          name: "app_{{.Year}}_{{.Month}}_{{.Day}}.json"
          format: "json"
        - container: "system-events"
          name: "system_{{.Year}}_{{.Month}}_{{.Day}}.avro"
          format: "avro"
          schema: "Syslog"
          compression: "snappy"
      container: "other-events"
      name: "other_{{.Timestamp}}.json"
      format: "json"

pipelines:
  - name: dynamic_routing
    processors:
      - set:
          field: "_vmetric.container"
          value: "security-events"
          if: "ctx.event_type == 'security'"
      - set:
          field: "_vmetric.container"
          value: "application-events"
          if: "ctx.event_type == 'application'"
      - set:
          field: "_vmetric.container"
          value: "system-events"
          if: "ctx.event_type == 'system'"
Multiple Containers with Catch-All
Configuration with multiple target containers and a catch-all default container:
targets:
  - name: multi_container_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      containers:
        - container: "security-logs"
          name: "security_{{.Year}}_{{.Month}}_{{.Day}}.parquet"
          format: "parquet"
          schema: "CommonSecurityLog"
          compression: "zstd"
        - container: "system-logs"
          name: "system_{{.Year}}_{{.Month}}_{{.Day}}.json"
          format: "json"
        - container: "application-logs"
          name: "app_{{.Year}}_{{.Month}}_{{.Day}}.avro"
          format: "avro"
          schema: "Syslog"
          compression: "snappy"
      container: "general-logs"
      name: "general_{{.Timestamp}}.json"
      format: "json"
      schema: "Syslog"
Parquet Format
Configuration for daily partitioned Parquet files:
targets:
  - name: parquet_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      container: "logs"
      format: "parquet"
      compression: "zstd"
      name: "logs/year={{.Year}}/month={{.Month}}/day={{.Day}}/data_{{.Timestamp}}.parquet"
      schema: "Syslog"
      max_size: 536870912 # 512MB
Avro with Custom Schema
Configuration for Avro format with a custom schema file:
targets:
  - name: avro_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      container: "logs"
      format: "avro"
      compression: "snappy"
      name: "logs_{{.Year}}_{{.Month}}_{{.Day}}.avro"
      schema: "MyCustomSchema"
Function App Upload
Configuration using Azure Function App for uploading:
targets:
  - name: function_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      function_app: "https://my-function-app.azurewebsites.net/api/BlobStorage"
      function_token: "your-function-token"
      container: "logs"
Debug Configuration
Configuration with debugging enabled:
targets:
  - name: debug_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      container: "logs"
      debug:
        status: true
        dont_send_logs: true
High Volume with Batching
Configuration optimized for high-volume ingestion:
targets:
  - name: high_volume_blob
    type: azblob
    properties:
      account: "mystorageaccount"
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      container: "logs"
      format: "parquet"
      compression: "zstd"
      batch_size: 50000
      max_size: 536870912 # 512MB
      timeout: 60