AWS S3
Synopsis
The AWS S3 target enables data export to Amazon Simple Storage Service (S3) buckets, with support for multiple file formats (JSON, Avro, Parquet), compression options, and AWS Security Lake integration. The target handles multipart uploads for large files and supports both direct S3 uploads and indirect uploads through an Azure Function App.
Schema
```yaml
targets:
  - name: <string>
    type: awss3
    key: <string>
    secret: <string>
    buckets:
      - bucket: <string>
        name: <string>
        format: <string>
        compression: <string>
        extension: <string>
        schema: <string>
        size: <integer>
        batch: <integer>
    region: <string>
    endpoint: <string>
    session: <string>
    source: <string>
    account: <string>
    part_size: <integer>
    function:
      url: <string>
      method: <string>
```
Configuration
AWS Credentials
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
key | string | Y* | - | AWS access key ID for authentication |
secret | string | Y* | - | AWS secret access key for authentication |
session | string | N | - | Optional session token for temporary credentials |
region | string | Y | - | AWS region (e.g., `us-east-1`, `eu-west-1`) |
endpoint | string | N | - | Custom S3-compatible endpoint URL (for non-AWS S3 services) |
* = Conditionally required. AWS credentials (`key` and `secret`) are required unless using IAM role-based authentication on AWS infrastructure.
Connection
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
name | string | Y | - | Unique identifier for the target |
type | string | Y | awss3 | Target type identifier (must be `awss3`) |
part_size | integer | N | 5242880 | Multipart upload part size in bytes (minimum 5MB) |
Files
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
buckets | array | Y | - | Array of bucket configurations for file distribution |
buckets.bucket | string | Y | - | S3 bucket name |
buckets.name | string | Y | - | File name template (supports variables: `{date}`, `{time}`, `{unix}`, `{tag}`) |
buckets.format | string | Y | - | Output format: `json`, `multijson`, `avro`, `parquet` |
buckets.compression | string | N | - | Compression algorithm: `gzip`, `snappy`, `deflate` |
buckets.extension | string | N | - | File extension override (defaults to format-specific extension) |
buckets.schema | string | N* | - | Schema definition file path (required for Avro and Parquet formats) |
buckets.size | integer | N | 10485760 | Maximum file size in bytes before rotation (10MB default) |
buckets.batch | integer | N | 1000 | Maximum number of events per file |
* = Conditionally required. The `schema` field is required when `format` is set to `avro` or `parquet`.
AWS Security Lake
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
source | string | N* | - | Security Lake source identifier |
account | string | N* | - | AWS account ID for Security Lake |
* = Conditionally required. When `source`, `region`, and `account` are all provided, files use the Security Lake path structure: `ext/{source}/region={region}/accountId={account}/eventDay={date}/{file}`
Azure Function App Integration
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
function.url | string | N | - | Azure Function App endpoint URL for indirect uploads |
function.method | string | N | POST | HTTP method for function app requests |
Debug
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
description | string | N | - | Optional description of target purpose |
tag | string | N | - | Target identifier tag for routing and filtering |
status | boolean | N | true | Enable or disable target processing |
Details
The target provides enterprise-grade cloud storage integration with comprehensive file format support and AWS Security Lake compatibility.
Authentication Methods: Supports static credentials (access key and secret key) with optional session tokens for temporary credentials. When deployed on AWS infrastructure, the target can leverage IAM role-based authentication without explicit credentials.
File Formats: Supports four output formats with distinct use cases:
- `json`: Single JSON object per file (human-readable, suitable for small datasets)
- `multijson`: Newline-delimited JSON objects (streaming format, efficient for large datasets)
- `avro`: Schema-based binary serialization (compact, schema evolution support)
- `parquet`: Columnar storage format (optimized for analytics, compression-friendly)
Compression Options: All formats support optional compression (`gzip`, `snappy`, `deflate`) to reduce storage costs and transfer times. Compression is applied before upload.
File Management: Files are rotated based on size (`size` parameter) or event count (`batch` parameter), whichever limit is reached first. Template variables in file names (`{date}`, `{time}`, `{unix}`, `{tag}`) enable dynamic file naming for time-based partitioning.
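For instance, a bucket entry along these lines (the values are illustrative, not defaults) rotates files at 50 MB or 5,000 events, whichever comes first, and embeds the date and time in the file name:

```yaml
buckets:
  - bucket: datastream-logs       # bucket name reused from the examples below
    name: "logs-{date}-{time}"    # template variables expand when the file is written
    format: multijson
    size: 52428800                # rotate after 50 MB ...
    batch: 5000                   # ... or after 5,000 events, whichever comes first
```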
Multipart Upload: Large files automatically use the S3 multipart upload protocol with configurable part size (`part_size` parameter). The default 5MB part size balances upload efficiency and memory usage.
Multiple Buckets: Single target can write to multiple S3 buckets with different configurations, enabling data distribution strategies (e.g., raw data to one bucket, processed data to another).
AWS Security Lake Integration: When the source, region, and account parameters are configured, files are uploaded using the Security Lake path structure `ext/{source}/region={region}/accountId={account}/eventDay={date}/{file}`. This enables automatic ingestion by AWS Security Lake services.
Azure Function App Integration: Optional indirect upload via an Azure Function App endpoint. When configured, the target sends file data to the function app instead of directly to S3, enabling custom processing or authentication workflows.
Schema Requirements: Avro and Parquet formats require schema definition files. Schema files must be accessible at the path specified in the `schema` parameter during target initialization.
Examples
Basic Configuration
Configuring a basic AWS S3 target with JSON output to a single bucket:
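A minimal sketch of such a configuration; the target name, credentials, and file-name template are illustrative placeholders:

```yaml
targets:
  - name: s3_basic                  # illustrative target name
    type: awss3
    key: "AKIAEXAMPLEKEY"           # placeholder access key ID
    secret: "examplesecret"         # placeholder secret access key
    region: us-east-1
    buckets:
      - bucket: datastream-logs
        name: "logs-{date}-{time}"  # {date} and {time} expand when the file is written
        format: json
```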
The target writes JSON files to the `datastream-logs` bucket with date/time-based file naming.
Multiple Buckets
Distributing data across multiple S3 buckets with different formats:
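A sketch of one way to do this, using the bucket names referenced in the note below; the credentials and schema path are placeholders:

```yaml
targets:
  - name: s3_multi_bucket            # illustrative target name
    type: awss3
    key: "AKIAEXAMPLEKEY"            # placeholder credentials
    secret: "examplesecret"
    region: us-east-1
    buckets:
      - bucket: raw-data-archive
        name: "raw-{date}-{unix}"
        format: multijson
        compression: gzip            # compressed newline-delimited JSON
      - bucket: analytics-data
        name: "events-{date}"
        format: parquet
        schema: "<path-to-schema>"   # schema file is required for parquet
```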
The target writes compressed JSON to `raw-data-archive` and Parquet to `analytics-data`.
Parquet Format with Schema
Configuring Parquet output with a schema definition for analytics workloads:
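A minimal sketch; the target name, credentials, and schema path are placeholders:

```yaml
targets:
  - name: s3_parquet                  # illustrative target name
    type: awss3
    key: "AKIAEXAMPLEKEY"             # placeholder credentials
    secret: "examplesecret"
    region: eu-west-1
    buckets:
      - bucket: analytics-data
        name: "events-{date}"         # date-based partitioning of file names
        format: parquet
        schema: "<path-to-schema>"    # schema definition file, required for parquet
        compression: snappy
```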
The target generates Parquet files with date-based partitioning for analytics queries.
AWS Security Lake Integration
Configuring the target for AWS Security Lake with the required path structure:
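A sketch assuming Parquet output; the source identifier, account ID, bucket name, and credentials are placeholders:

```yaml
targets:
  - name: s3_security_lake                     # illustrative target name
    type: awss3
    key: "AKIAEXAMPLEKEY"                      # placeholder credentials
    secret: "examplesecret"
    region: us-east-1
    source: example-source                     # placeholder Security Lake source identifier
    account: "123456789012"                    # placeholder AWS account ID
    buckets:
      - bucket: aws-security-data-lake-example # placeholder Security Lake bucket name
        name: "events-{unix}"
        format: parquet
        schema: "<path-to-schema>"             # required for parquet output
```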
The target uses the Security Lake path structure for automatic ingestion.
Azure Function App Integration
Routing S3 uploads through an Azure Function App for custom processing:
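A sketch of such a setup; the Function App URL, target name, and credentials are placeholders:

```yaml
targets:
  - name: s3_via_function                # illustrative target name
    type: awss3
    key: "AKIAEXAMPLEKEY"                # placeholder credentials
    secret: "examplesecret"
    region: us-east-1
    function:
      url: "https://example-fn.azurewebsites.net/api/upload"  # placeholder Function App URL
      method: POST                       # default HTTP method
    buckets:
      - bucket: datastream-logs
        name: "logs-{date}-{time}"
        format: multijson
        compression: gzip
```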
The target sends file data to the Azure Function App instead of uploading directly to S3.
S3-Compatible Storage
Using a custom endpoint for S3-compatible storage services (MinIO, Wasabi, etc.):
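A sketch assuming a MinIO deployment; the endpoint URL, credentials, and bucket name are placeholders:

```yaml
targets:
  - name: s3_minio                                # illustrative target name
    type: awss3
    key: "minio-access-key"                       # placeholder credentials for the service
    secret: "minio-secret-key"
    region: us-east-1                             # region is still required by the schema
    endpoint: "https://minio.example.local:9000"  # placeholder custom endpoint URL
    buckets:
      - bucket: datastream-logs
        name: "logs-{date}-{time}"
        format: json
```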
The target connects to MinIO or other S3-compatible services using the custom endpoint.