Snowflake (S3 Staging)
Send processed telemetry data to Snowflake using Amazon S3 as the staging location.
Synopsis
The Snowflake S3 target stages telemetry files to Amazon S3, then executes COPY INTO commands on Snowflake to load data into tables.
Schema
```yaml
targets:
  - name: <string>
    type: amazonsnowflake
    properties:
      account: <string>
      username: <string>
      password: <string>
      database: <string>
      schema: <string>
      warehouse: <string>
      role: <string>
      staging_bucket: <string>
      staging_prefix: <string>
      region: <string>
      key: <string>
      secret: <string>
      session: <string>
      table: <string>
      schema: <string>
      name: <string>
      format: <string>
      compression: <string>
      extension: <string>
      tables: <array>
      batch_size: <integer>
      max_size: <integer>
      timeout: <integer>
      part_size: <integer>
      field_format: <string>
      debug:
        status: <boolean>
        dont_send_logs: <boolean>
```
Configuration
Base Target Fields
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Y | Unique identifier for this target |
description | string | N | Human-readable description |
type | string | Y | Must be amazonsnowflake |
pipelines | array | N | Pipeline names to apply before sending |
status | boolean | N | Enable (true) or disable (false) this target |
Snowflake Connection
| Field | Type | Required | Description |
|---|---|---|---|
account | string | Y | Snowflake account identifier (e.g., abc123.us-east-1) |
username | string | Y | Snowflake username |
password | string | Y | Snowflake password |
database | string | Y | Snowflake database name |
schema | string | N | Snowflake schema name. Default: PUBLIC |
warehouse | string | N | Snowflake virtual warehouse name |
role | string | N | Snowflake role name |
S3 Staging Configuration
| Field | Type | Required | Description |
|---|---|---|---|
staging_bucket | string | Y | S3 bucket name for staging files |
staging_prefix | string | N | S3 prefix path. Default: snowflake-staging/ |
region | string | Y | AWS region for S3 bucket |
key | string | N | AWS access key ID (uses default credentials chain if omitted) |
secret | string | N | AWS secret access key |
session | string | N | AWS session token for temporary credentials |
Table Configuration
| Field | Type | Required | Description |
|---|---|---|---|
table | string | Y* | Catch-all table name for all events |
schema | string | Y* | Avro/Parquet schema definition |
name | string | Y* | File naming template. Default: vmetric.{{.Timestamp}}.{{.Extension}} |
format | string | N | File format (csv, json, avro, orc, parquet, xml). Default: parquet |
compression | string | N | Compression algorithm |
extension | string | N | File extension override |
tables | array | N | Multiple table configurations (see below) |
tables.table | string | Y | Target table name |
tables.schema | string | Y* | Avro/Parquet schema definition for this table |
tables.name | string | Y | File naming template for this table |
tables.format | string | N | File format for this table |
tables.compression | string | N | Compression algorithm for this table |
tables.extension | string | N | File extension override for this table |
* At least one of table (catch-all) or tables (multiple) must be configured. For Avro/Parquet formats, schema is required.
Batch Configuration
| Field | Type | Required | Description |
|---|---|---|---|
batch_size | integer | N | Maximum events per file before flush |
max_size | integer | N | Maximum file size in bytes before flush |
timeout | integer | N | COPY INTO command timeout in seconds. Default: 300 |
part_size | integer | N | S3 multipart upload part size in MB |
Normalization
| Field | Type | Required | Description |
|---|---|---|---|
field_format | string | N | Apply format normalization (ECS, ASIM, UDM) |
Debug Options
| Field | Type | Required | Description |
|---|---|---|---|
debug.status | boolean | N | Enable debug logging for this target |
debug.dont_send_logs | boolean | N | Log events without sending to Snowflake |
Details
Architecture Overview
The Snowflake S3 target implements a two-stage loading pattern:
- Stage Files to S3: Events are written to files in S3 using the configured format
- Execute COPY INTO: SQL commands load data from S3 into Snowflake tables
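As a rough sketch, the responsibilities map onto the target's properties as shown in this fragment; all values are illustrative placeholders, not defaults:
```yaml
properties:
  # Stage 1 - files are written to the S3 staging location
  staging_bucket: "my-staging-bucket"   # placeholder bucket name
  staging_prefix: "snowflake-staging/"
  region: "us-east-1"
  # Stage 2 - COPY INTO loads the staged files into Snowflake
  database: "TELEMETRY"                 # placeholder database
  warehouse: "LOAD_WH"                  # placeholder warehouse
  table: "EVENTS"                       # placeholder target table
  timeout: 300                          # COPY INTO timeout in seconds
```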
Snowflake Connection
Account Identifier:
- Format: `<account_locator>.<region>` (e.g., `abc123.us-east-1`)
- Account locator is visible in your Snowflake URL
- Region is the cloud region where your Snowflake account is deployed
Authentication:
- Uses username/password authentication
- Credentials are used to connect to Snowflake SQL API v2
- Supports optional warehouse and role specification
Database and Schema:
- Database name is required and must be a valid SQL identifier
- Schema defaults to `PUBLIC` if not specified
- Both database and schema names are validated for SQL compliance
The Snowflake user requires permissions to:
- Execute SQL statements using the specified warehouse
- Write data to the target database and schema
- Access the S3 staging location (configured separately in Snowflake)
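A fragment of the properties block showing the connection fields described above; all values are placeholders:
```yaml
properties:
  account: "abc123.us-east-1"   # <account_locator>.<region>
  username: "loader"
  password: "example-password"  # placeholder; store real credentials securely
  database: "TELEMETRY"
  schema: "PUBLIC"              # optional; defaults to PUBLIC
  warehouse: "LOAD_WH"          # optional virtual warehouse
  role: "LOADER_ROLE"           # optional role
```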
S3 Staging Operations
File Upload:
- Files are staged to the `s3://bucket/prefix/table/filename` structure
- Uses AWS SDK multipart upload for large files
- Supports AWS credentials chain (access key, IAM role, instance profile)
Cleanup:
- Staged files are automatically deleted after successful COPY INTO execution
- Failed uploads remain in S3 for troubleshooting
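A fragment showing the staging fields in context; the bucket, region, and credential values are placeholders, and `key`/`secret` can be omitted to fall back to the default credentials chain:
```yaml
properties:
  staging_bucket: "my-staging-bucket"
  staging_prefix: "snowflake-staging/"   # default prefix
  region: "us-east-1"
  # Optional explicit credentials; omit to use the default AWS credentials chain
  key: "AKIAEXAMPLEKEY"
  secret: "example-secret-key"
```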
File Format Support
Valid Formats:
- CSV: Comma-separated values with optional headers
- JSON: Newline-delimited JSON objects
- AVRO: Schema-based binary format (requires schema)
- ORC: Optimized row columnar format
- PARQUET: Columnar storage format (requires schema)
- XML: XML document format
Schema Requirements:
- Avro and Parquet formats require the `schema` field with a valid schema definition
- Schema must match the expected table structure in Snowflake
- Other formats use schema inference from data
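A fragment showing the format-related fields; the compression value is illustrative, since supported algorithms depend on the chosen format:
```yaml
properties:
  format: "parquet"             # csv, json, avro, orc, parquet, or xml
  schema: "event_schema.avsc"   # required for avro and parquet formats
  compression: "zstd"           # illustrative; check algorithm support for your format
  extension: "parquet"          # optional extension override
```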
Multi-Table Routing
Catch-All Table:
- Use the `table` field to send all events to a single table
- Simplest configuration for single-destination scenarios
Multiple Tables:
- Use the `tables` array to route different event types to different tables
- Each table entry specifies `table`, `schema`, `name`, and `format` fields
- Events are routed based on the SystemS3 field set in the pipeline
Example Configuration:
```yaml
tables:
  - table: security_events
    schema: security_schema.avsc
    name: security.{{.Timestamp}}.parquet
    format: parquet
  - table: access_logs
    schema: access_schema.avsc
    name: access.{{.Timestamp}}.parquet
    format: parquet
```
Performance Considerations
Batch Processing:
- Events are buffered until the `batch_size` or `max_size` limit is reached
- Larger batches reduce S3 API calls and COPY INTO operations
- Balance batch size against latency requirements
Upload Optimization:
- Multipart uploads automatically handle large files
- Configure `part_size` for optimal network performance
- The default is the AWS SDK's standard part size (5 MB)
COPY INTO Performance:
- COPY INTO commands are executed with configurable timeout
- Failed COPY operations return errors for retry logic
- Warehouse must be running (resumed) for COPY INTO to succeed
Ensure the virtual warehouse is running before sending data. COPY INTO commands will fail if the warehouse is suspended. Configure warehouse auto-resume or manual resume procedures.
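The tuning fields above combine as in this illustrative fragment; the numbers are starting points, not recommended defaults:
```yaml
properties:
  batch_size: 100000     # flush after this many events
  max_size: 134217728    # or once roughly 128 MB has been buffered
  part_size: 16          # multipart upload part size in MB
  timeout: 600           # COPY INTO timeout in seconds
```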
Error Handling
Upload Failures:
- Failed S3 uploads are retried based on sender configuration
- Permanent failures prevent COPY INTO execution
- Check S3 bucket permissions and network connectivity
COPY INTO Failures:
- Schema mismatches between files and tables cause failures
- Invalid SQL identifiers (database, schema, table names) are rejected at validation
- Check Snowflake query history for detailed error messages
Examples
Basic Configuration
Sending telemetry to Snowflake using S3 staging with Parquet format...
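A minimal sketch of such a target; the account, credential, bucket, and table values are placeholders:
```yaml
targets:
  - name: snowflake_basic
    type: amazonsnowflake
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      warehouse: "LOAD_WH"
      staging_bucket: "my-staging-bucket"
      region: "us-east-1"
      table: "EVENTS"
      schema: "event_schema.avsc"   # Avro/Parquet schema definition for the table
      format: "parquet"
```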
With AWS Credentials
Using explicit AWS credentials for S3 staging access...
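A sketch with explicit AWS credentials; the key, secret, and session values are placeholders, and `session` is only needed for temporary credentials:
```yaml
targets:
  - name: snowflake_aws_creds
    type: amazonsnowflake
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      warehouse: "LOAD_WH"
      staging_bucket: "my-staging-bucket"
      staging_prefix: "snowflake-staging/"
      region: "us-east-1"
      key: "AKIAEXAMPLEKEY"
      secret: "example-secret-key"
      session: "example-session-token"   # only for temporary credentials
      table: "EVENTS"
      schema: "event_schema.avsc"
      format: "parquet"
```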
Multi-Table Configuration
Routing different event types to separate Snowflake tables...
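A sketch that reuses the routing example from the Details section; table names and schema file names are placeholders:
```yaml
targets:
  - name: snowflake_multi_table
    type: amazonsnowflake
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      warehouse: "LOAD_WH"
      staging_bucket: "my-staging-bucket"
      region: "us-east-1"
      tables:
        - table: security_events
          schema: security_schema.avsc
          name: security.{{.Timestamp}}.parquet
          format: parquet
        - table: access_logs
          schema: access_schema.avsc
          name: access.{{.Timestamp}}.parquet
          format: parquet
```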
High-Volume Configuration
Optimizing for high-volume ingestion with batch limits and compression...
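An illustrative sketch; the batch, size, and compression values are starting points rather than recommended settings:
```yaml
targets:
  - name: snowflake_high_volume
    type: amazonsnowflake
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      warehouse: "LOAD_WH"
      staging_bucket: "my-staging-bucket"
      region: "us-east-1"
      table: "EVENTS"
      schema: "event_schema.avsc"
      format: "parquet"
      compression: "zstd"     # illustrative; check algorithm support for your format
      batch_size: 100000      # flush after this many events
      max_size: 134217728     # or after roughly 128 MB
      part_size: 16           # multipart part size in MB
      timeout: 600            # COPY INTO timeout in seconds
```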
JSON Format
Using JSON format for flexible schema evolution and debugging...
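A sketch using newline-delimited JSON, which needs no schema definition; all values are placeholders:
```yaml
targets:
  - name: snowflake_json
    type: amazonsnowflake
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      schema: "RAW"             # Snowflake schema; defaults to PUBLIC if omitted
      warehouse: "LOAD_WH"
      staging_bucket: "my-staging-bucket"
      region: "us-east-1"
      table: "RAW_EVENTS"
      format: "json"            # newline-delimited JSON objects
```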
With Normalization
Applying ECS normalization before loading to Snowflake...
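A sketch applying ECS normalization via `field_format`; the exact value casing may differ in your deployment, and ASIM and UDM are also listed as options:
```yaml
targets:
  - name: snowflake_ecs
    type: amazonsnowflake
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      warehouse: "LOAD_WH"
      staging_bucket: "my-staging-bucket"
      region: "us-east-1"
      table: "ECS_EVENTS"
      schema: "ecs_schema.avsc"
      format: "parquet"
      field_format: "ecs"       # ECS normalization (ASIM and UDM are also supported)
```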
Production Configuration
Production-ready configuration with performance tuning, AWS credentials, and multi-table routing...
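A fuller sketch combining AWS credentials, tuning, and multi-table routing; every name, credential, and numeric value below is a placeholder to adapt, and the pipeline name is hypothetical:
```yaml
targets:
  - name: snowflake_production
    type: amazonsnowflake
    pipelines:
      - normalize_telemetry      # hypothetical pipeline name
    properties:
      account: "abc123.us-east-1"
      username: "loader"
      password: "example-password"
      database: "TELEMETRY"
      warehouse: "LOAD_WH"
      role: "LOADER_ROLE"
      staging_bucket: "prod-telemetry-staging"
      staging_prefix: "snowflake/"
      region: "us-east-1"
      key: "AKIAEXAMPLEKEY"
      secret: "example-secret-key"
      batch_size: 50000
      max_size: 268435456        # roughly 256 MB per staged file
      part_size: 16
      timeout: 900
      tables:
        - table: security_events
          schema: security_schema.avsc
          name: security.{{.Timestamp}}.parquet
          format: parquet
        - table: access_logs
          schema: access_schema.avsc
          name: access.{{.Timestamp}}.parquet
          format: parquet
```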