Version: 1.5.1

Snowflake (S3 Staging)

Data Warehouse Target

Send processed telemetry data to Snowflake using Amazon S3 as the staging location.

Synopsis

The Snowflake S3 target stages telemetry files to Amazon S3, then executes COPY INTO commands on Snowflake to load data into tables.

Schema

targets:
  - name: <string>
    type: amazonsnowflake
    properties:
      account: <string>
      username: <string>
      password: <string>
      database: <string>
      schema: <string>
      warehouse: <string>
      role: <string>
      staging_bucket: <string>
      staging_prefix: <string>
      region: <string>
      key: <string>
      secret: <string>
      session: <string>
      table: <string>
      schema: <string>
      name: <string>
      format: <string>
      compression: <string>
      extension: <string>
      tables: <array>
      batch_size: <integer>
      max_size: <integer>
      timeout: <integer>
      part_size: <integer>
      field_format: <string>
      debug:
        status: <boolean>
        dont_send_logs: <boolean>

Configuration

Base Target Fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Y | Unique identifier for this target |
| description | string | N | Human-readable description |
| type | string | Y | Must be amazonsnowflake |
| pipelines | array | N | Pipeline names to apply before sending |
| status | boolean | N | Enable (true) or disable (false) this target |

Snowflake Connection

| Field | Type | Required | Description |
|---|---|---|---|
| account | string | Y | Snowflake account identifier (e.g., abc123.us-east-1) |
| username | string | Y | Snowflake username |
| password | string | Y | Snowflake password |
| database | string | Y | Snowflake database name |
| schema | string | N | Snowflake schema name. Default: PUBLIC |
| warehouse | string | N | Snowflake virtual warehouse name |
| role | string | N | Snowflake role name |

S3 Staging Configuration

| Field | Type | Required | Description |
|---|---|---|---|
| staging_bucket | string | Y | S3 bucket name for staging files |
| staging_prefix | string | N | S3 prefix path. Default: snowflake-staging/ |
| region | string | Y | AWS region for the S3 bucket |
| key | string | N | AWS access key ID (uses default credentials chain if omitted) |
| secret | string | N | AWS secret access key |
| session | string | N | AWS session token for temporary credentials |

Table Configuration

| Field | Type | Required | Description |
|---|---|---|---|
| table | string | Y* | Catch-all table name for all events |
| schema | string | Y* | Avro/Parquet schema definition |
| name | string | Y* | File naming template. Default: vmetric.{{.Timestamp}}.{{.Extension}} |
| format | string | N | File format (csv, json, avro, orc, parquet, xml). Default: parquet |
| compression | string | N | Compression algorithm |
| extension | string | N | File extension override |
| tables | array | N | Multiple table configurations (see below) |
| tables.table | string | Y | Target table name |
| tables.schema | string | Y* | Avro/Parquet schema definition for this table |
| tables.name | string | Y | File naming template for this table |
| tables.format | string | N | File format for this table |
| tables.compression | string | N | Compression algorithm for this table |
| tables.extension | string | N | File extension override for this table |

* At least one of table (catch-all) or tables (multiple) must be configured. For Avro/Parquet formats, schema is required.

Batch Configuration

| Field | Type | Required | Description |
|---|---|---|---|
| batch_size | integer | N | Maximum events per file before flush |
| max_size | integer | N | Maximum file size in bytes before flush |
| timeout | integer | N | COPY INTO command timeout in seconds. Default: 300 |
| part_size | integer | N | S3 multipart upload part size in MB |

Normalization

| Field | Type | Required | Description |
|---|---|---|---|
| field_format | string | N | Apply format normalization (ECS, ASIM, UDM) |

Debug Options

| Field | Type | Required | Description |
|---|---|---|---|
| debug.status | boolean | N | Enable debug logging for this target |
| debug.dont_send_logs | boolean | N | Log events without sending to Snowflake |

Details

Architecture Overview

The Snowflake S3 target implements a two-stage loading pattern:

  1. Stage Files to S3: Events are written to files in S3 using the configured format
  2. Execute COPY INTO: SQL commands load data from S3 into Snowflake tables (sketched below)
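
The COPY INTO step can be pictured roughly as follows. This is an illustrative sketch only, reusing example names from this page (PRODUCTION_DATA, EVENTS, datastream-staging); the actual statement and its options are generated by the target and may differ.

-- Hedged sketch of a COPY INTO statement for Parquet files staged in S3.
-- Table, bucket, prefix, and credentials are illustrative placeholders.
COPY INTO PRODUCTION_DATA.PUBLIC.EVENTS
  FROM 's3://datastream-staging/snowflake-staging/EVENTS/'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')
  FILE_FORMAT = (TYPE = PARQUET);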

Snowflake Connection

Account Identifier:

  • Format: <account_locator>.<region> (e.g., abc123.us-east-1)
  • Account locator is visible in your Snowflake URL
  • Region is the cloud region where your Snowflake account is deployed

Authentication:

  • Uses username/password authentication
  • Credentials are used to connect to Snowflake SQL API v2
  • Supports optional warehouse and role specification

Database and Schema:

  • Database name is required and must be a valid SQL identifier
  • Schema defaults to PUBLIC if not specified
  • Both database and schema names are validated for SQL compliance

Snowflake Permissions

The Snowflake user requires permissions to do the following (see the example grants after this list):

  • Execute SQL statements using the specified warehouse
  • Write data to the target database and schema
  • Access the S3 staging location (configured separately in Snowflake)
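
A minimal sketch of grants that would cover these requirements, assuming a dedicated role named DATASTREAM_ROLE (a hypothetical name) and the example warehouse, database, and user shown elsewhere on this page:

-- Illustrative grants only; DATASTREAM_ROLE is a hypothetical role name.
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE DATASTREAM_ROLE;
GRANT USAGE ON DATABASE PRODUCTION_DATA TO ROLE DATASTREAM_ROLE;
GRANT USAGE ON SCHEMA PRODUCTION_DATA.PUBLIC TO ROLE DATASTREAM_ROLE;
GRANT INSERT, SELECT ON ALL TABLES IN SCHEMA PRODUCTION_DATA.PUBLIC TO ROLE DATASTREAM_ROLE;
GRANT ROLE DATASTREAM_ROLE TO USER datastream_user;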

S3 Staging Operations

File Upload:

  • Files are staged using the s3://bucket/prefix/table/filename path structure
  • Uses AWS SDK multipart upload for large files
  • Supports AWS credentials chain (access key, IAM role, instance profile)

Cleanup:

  • Staged files are automatically deleted after successful COPY INTO execution
  • Failed uploads remain in S3 for troubleshooting

File Format Support

Valid Formats:

  • CSV: Comma-separated values with optional headers
  • JSON: Newline-delimited JSON objects
  • AVRO: Schema-based binary format (requires schema)
  • ORC: Optimized row columnar format
  • PARQUET: Columnar storage format (requires schema)
  • XML: XML document format

Schema Requirements:

  • Avro and Parquet formats require schema field with valid schema definition
  • Schema must match the expected table structure in Snowflake (see the table sketch below)
  • Other formats use schema inference from data
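
As an example of the schema-to-table relationship, a Parquet file produced from a hypothetical event_schema.avsc with event_timestamp, source, and message fields would need a Snowflake table along these lines. The column names and types here are assumptions for illustration, not a prescribed layout:

-- Hypothetical table; columns must mirror the fields in your schema definition.
CREATE TABLE IF NOT EXISTS PRODUCTION_DATA.PUBLIC.EVENTS (
    EVENT_TIMESTAMP TIMESTAMP_NTZ,
    SOURCE          VARCHAR,
    MESSAGE         VARIANT
);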

Multi-Table Routing

Catch-All Table:

  • Use table field to send all events to a single table
  • Simplest configuration for single-destination scenarios

Multiple Tables:

  • Use tables array to route different event types to different tables
  • Each table entry specifies table, schema, name, format fields
  • Events are routed based on the SystemS3 field set in the pipeline

Example Configuration:

tables:
  - table: security_events
    schema: security_schema.avsc
    name: security.{{.Timestamp}}.parquet
    format: parquet
  - table: access_logs
    schema: access_schema.avsc
    name: access.{{.Timestamp}}.parquet
    format: parquet

Performance Considerations

Batch Processing:

  • Events are buffered until batch_size or max_size limits are reached
  • Larger batches reduce S3 API calls and COPY INTO operations
  • Balance batch size against latency requirements

Upload Optimization:

  • Multipart uploads automatically handle large files
  • Configure part_size for optimal network performance
  • Default part size is AWS SDK default (5 MB)

COPY INTO Performance:

  • COPY INTO commands are executed with configurable timeout
  • Failed COPY operations return errors for retry logic
  • Warehouse must be running (resumed) for COPY INTO to succeed

Warehouse State

Ensure the virtual warehouse is running before sending data. COPY INTO commands will fail if the warehouse is suspended. Configure warehouse auto-resume or manual resume procedures.
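
If the warehouse is allowed to suspend, enabling auto-resume is one way to avoid failed loads. A minimal sketch, using the example COMPUTE_WH warehouse and an arbitrary five-minute suspend interval:

-- Resume automatically when a query arrives; suspend after 300 seconds of inactivity.
ALTER WAREHOUSE COMPUTE_WH SET AUTO_RESUME = TRUE AUTO_SUSPEND = 300;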

Error Handling

Upload Failures:

  • Failed S3 uploads are retried based on sender configuration
  • Permanent failures prevent COPY INTO execution
  • Check S3 bucket permissions and network connectivity

COPY INTO Failures:

  • Schema mismatches between files and tables cause failures
  • Invalid SQL identifiers (database, schema, table names) are rejected at validation
  • Check Snowflake query history for detailed error messages (a sample COPY_HISTORY query follows)
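
Snowflake's COPY_HISTORY table function is one place to look. A sketch against the example EVENTS table, limited to the last hour (adjust the table name and time window to your setup):

-- Show per-file load status and the first error message, if any.
SELECT FILE_NAME, STATUS, FIRST_ERROR_MESSAGE
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'EVENTS',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())
));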

Examples

Basic Configuration

Sending telemetry to Snowflake using S3 staging with Parquet format...

targets:
  - name: snowflake-warehouse
    type: amazonsnowflake
    properties:
      account: abc123.us-east-1
      username: datastream_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: PRODUCTION_DATA
      warehouse: COMPUTE_WH
      staging_bucket: datastream-staging
      region: us-east-1
      table: EVENTS
      schema: event_schema.avsc
      name: events.{{.Timestamp}}.parquet
      format: parquet

With AWS Credentials

Using explicit AWS credentials for S3 staging access...

targets:
  - name: snowflake-secure
    type: amazonsnowflake
    properties:
      account: xyz789.us-west-2
      username: security_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: SECURITY_ANALYTICS
      warehouse: SECURITY_WH
      role: SECURITY_ADMIN
      staging_bucket: security-logs-staging
      staging_prefix: snowflake/
      region: us-west-2
      key: "${AWS_ACCESS_KEY}"
      secret: "${AWS_SECRET_KEY}"
      table: SECURITY_EVENTS
      schema: security_schema.avsc
      name: security.{{.Timestamp}}.parquet
      format: parquet

Multi-Table Configuration

Routing different event types to separate Snowflake tables...

targets:
  - name: snowflake-multi-table
    type: amazonsnowflake
    properties:
      account: abc123.us-east-1
      username: analytics_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: ANALYTICS
      warehouse: ANALYTICS_WH
      staging_bucket: analytics-staging
      region: us-east-1
      tables:
        - table: AUTHENTICATION_EVENTS
          schema: auth_schema.avsc
          name: auth.{{.Timestamp}}.parquet
          format: parquet
        - table: NETWORK_EVENTS
          schema: network_schema.avsc
          name: network.{{.Timestamp}}.parquet
          format: parquet
        - table: APPLICATION_LOGS
          schema: app_schema.avsc
          name: app.{{.Timestamp}}.parquet
          format: parquet

High-Volume Configuration

Optimizing for high-volume ingestion with batch limits and compression...

targets:
  - name: snowflake-high-volume
    type: amazonsnowflake
    properties:
      account: abc123.us-east-1
      username: streaming_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: HIGH_VOLUME_DATA
      warehouse: LARGE_WH
      staging_bucket: streaming-staging
      region: us-east-1
      batch_size: 100000
      max_size: 134217728
      part_size: 16
      timeout: 600
      table: STREAMING_EVENTS
      schema: streaming_schema.avsc
      name: stream.{{.Timestamp}}.parquet
      format: parquet
      compression: snappy

JSON Format

Using JSON format for flexible schema evolution and debugging...

targets:
  - name: snowflake-json
    type: amazonsnowflake
    properties:
      account: abc123.us-east-1
      username: dev_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: DEVELOPMENT
      warehouse: DEV_WH
      staging_bucket: dev-staging
      region: us-east-1
      table: TEST_EVENTS
      name: test.{{.Timestamp}}.json
      format: json

With Normalization

Applying ECS normalization before loading to Snowflake...

targets:
  - name: snowflake-normalized
    type: amazonsnowflake
    properties:
      account: abc123.us-east-1
      username: security_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: SECURITY_DATA
      warehouse: SECURITY_WH
      staging_bucket: security-staging
      region: us-east-1
      field_format: ECS
      table: ECS_EVENTS
      schema: ecs_schema.avsc
      name: ecs.{{.Timestamp}}.parquet
      format: parquet

Production Configuration

Production-ready configuration with performance tuning, AWS credentials, and multi-table routing...

targets:
  - name: snowflake-production
    type: amazonsnowflake
    properties:
      account: production.us-east-1
      username: production_user
      password: "${SNOWFLAKE_PASSWORD}"
      database: PRODUCTION_ANALYTICS
      warehouse: PRODUCTION_WH
      role: DATA_ENGINEER
      staging_bucket: production-staging-bucket
      staging_prefix: datastream/snowflake/
      region: us-east-1
      key: "${AWS_ACCESS_KEY}"
      secret: "${AWS_SECRET_KEY}"
      batch_size: 50000
      max_size: 67108864
      part_size: 10
      timeout: 300
      field_format: ASIM
      tables:
        - table: SECURITY_EVENTS
          schema: security_schema.avsc
          name: security.{{.Timestamp}}.parquet
          format: parquet
          compression: snappy
        - table: AUDIT_LOGS
          schema: audit_schema.avsc
          name: audit.{{.Timestamp}}.parquet
          format: parquet
          compression: snappy
        - table: NETWORK_FLOWS
          schema: network_schema.avsc
          name: network.{{.Timestamp}}.parquet
          format: parquet
          compression: snappy