BigQuery
Synopsis
Creates a BigQuery target that streams data directly into BigQuery tables using the streaming insert API. Supports multiple tables, custom schemas, and field normalization.
Schema
- name: <string>
  description: <string>
  type: bigquery
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    project_id: <string>
    dataset_id: <string>
    credentials_json: <string>
    table: <string>
    batch_size: <numeric>
    timeout: <numeric>
    drop_unknown_table_events: <boolean>
    ignore_unknown_values: <boolean>
    skip_invalid_rows: <boolean>
    max_bad_records: <numeric>
    field_format: <string>
    tables:
      - name: <string>
        schema: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>
Configuration
The following fields are used to define the target:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Y | - | Target name |
| description | N | - | Optional description |
| type | Y | - | Must be bigquery |
| pipelines | N | - | Optional post-processor pipelines |
| status | N | true | Enable/disable the target |
Google Cloud
| Field | Required | Default | Description |
|---|---|---|---|
| project_id | Y | - | Google Cloud project ID |
| dataset_id | Y | - | BigQuery dataset ID |
| credentials_json | N | - | Service account credentials JSON (uses default credentials if not provided) |
| table | N | - | Default table name |
Streaming Options
| Field | Required | Default | Description |
|---|---|---|---|
| batch_size | N | 1000 | Maximum number of rows per batch |
| timeout | N | 30 | Connection timeout in seconds |
| drop_unknown_table_events | N | true | Ignore events for undefined tables |
| ignore_unknown_values | N | false | Accept rows with values that don't match the schema |
| skip_invalid_rows | N | false | Skip rows with errors and insert valid rows |
| max_bad_records | N | 0 | Maximum number of bad records allowed (0 = no limit) |
| field_format | N | - | Data normalization format. See the applicable Normalization section |
Multiple Tables
You can define multiple tables to stream data into:
targets:
  - name: bigquery_multiple_tables
    type: bigquery
    properties:
      tables:
        - name: "security_logs"
          schema: "timestamp:TIMESTAMP,message:STRING,severity:STRING"
        - name: "system_logs"
          schema: "timestamp:TIMESTAMP,message:STRING,level:STRING"
Schema Format
The schema format follows the pattern: field1:type1,field2:type2,...
Supported types:
- STRING - Variable-length character data
- INTEGER or INT64 - 64-bit integer
- FLOAT or FLOAT64 - 64-bit floating point
- BOOLEAN or BOOL - True or false
- TIMESTAMP - Absolute point in time
- DATE - Calendar date
- TIME - Time of day
- DATETIME - Date and time
- BYTES - Binary data
- NUMERIC - Exact numeric value
- BIGNUMERIC - Larger numeric value
- GEOGRAPHY - Geographic data
- JSON - JSON data
- RECORD or STRUCT - Nested structure
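For illustration, a table definition combining several of these types might look like the sketch below; the table name and fields are placeholders, not a required layout:
tables:
  - name: "example_events"
    schema: "timestamp:TIMESTAMP,host:STRING,bytes_sent:INTEGER,latency_ms:FLOAT,success:BOOLEAN,attributes:JSON"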
Debug Options
| Field | Required | Default | Description |
|---|---|---|---|
| debug.status | N | false | Enable debug logging |
| debug.dont_send_logs | N | false | Process logs but don't send to BigQuery (testing) |
Details
The BigQuery target uses streaming inserts to send data in near real-time. Data is batched locally until batch_size is reached or an explicit flush is triggered during finalization.
When a log event includes the SystemS3 field, its value is used to route the message to the appropriate table. If no table is specified, the default table (if configured) is used.
The target automatically parses JSON messages. If the message is not valid JSON, it creates a structured event with message and timestamp fields.
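As a sketch of this behavior (the values and the severity field below are hypothetical, and SystemS3 is assumed here to carry the destination table name), a valid JSON log line is inserted against the matching table schema, while a non-JSON line is wrapped before insertion:
# Valid JSON: parsed as-is and routed to the table named by SystemS3
{"SystemS3": "security_logs", "timestamp": "2024-05-01T12:00:00Z", "message": "login failed", "severity": "WARNING"}

# Not valid JSON: wrapped into a structured event with message and timestamp fields
{"message": "plain text log line", "timestamp": "2024-05-01T12:00:00Z"}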
Authentication
The target supports two authentication methods:
- Service Account JSON: Provide credentials directly in the configuration using credentials_json
- Default Credentials: If credentials_json is not provided, the target uses Google Cloud's default credential chain (environment variables, gcloud CLI, GCE metadata service)
Error Handling
The target provides flexible error handling:
- ignore_unknown_values: Allows inserting rows with extra fields not in the schema
- skip_invalid_rows: Continues inserting valid rows even if some rows fail
- max_bad_records: Limits the number of failed rows before returning an error
When skip_invalid_rows is enabled and errors occur, the target logs the individual row errors, provided debug mode is enabled.
Streaming inserts have cost implications. Consider batch loading for high-volume historical data.
BigQuery streaming inserts have quotas and limits. Ensure your project has adequate quota for your ingestion rate.
Examples
Basic
Minimum configuration using default credentials:
targets:
  - name: basic_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "system_events"
With Credentials
Configuration with explicit service account credentials:
targets:
  - name: auth_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "application_logs"
      credentials_json: |
        {
          "type": "service_account",
          "project_id": "my-project",
          "private_key_id": "key-id",
          "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
          "client_email": "service-account@my-project.iam.gserviceaccount.com",
          "client_id": "123456789",
          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
          "token_uri": "https://oauth2.googleapis.com/token"
        }
Multiple Tables
Configuration with multiple target tables and schemas:
targets:
  - name: multi_table_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "security_data"
      batch_size: 500
      tables:
        - name: "firewall_events"
          schema: "timestamp:TIMESTAMP,src_ip:STRING,dst_ip:STRING,action:STRING,bytes:INTEGER"
        - name: "authentication_events"
          schema: "timestamp:TIMESTAMP,username:STRING,success:BOOLEAN,source:STRING"
        - name: "dns_queries"
          schema: "timestamp:TIMESTAMP,query:STRING,response:STRING,client_ip:STRING"
High-Volume
Configuration optimized for high-volume streaming:
targets:
  - name: highvol_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "metrics"
      table: "performance_data"
      batch_size: 5000
      timeout: 60
      skip_invalid_rows: true
      max_bad_records: 100
With Error Handling
Configuration with flexible error handling:
targets:
  - name: flexible_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "app_logs"
      ignore_unknown_values: true
      skip_invalid_rows: true
      max_bad_records: 50
Normalized
Using field normalization for enhanced compatibility:
targets:
  - name: normalized_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "security"
      table: "normalized_events"
      field_format: "ecs"
With Debugging
Configuration with debug options for testing:
targets:
  - name: debug_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "test_events"
      debug:
        status: true
        dont_send_logs: true
Environment Variables
Using environment variables for sensitive data:
targets:
  - name: secure_bigquery
    type: bigquery
    properties:
      project_id: "${GCP_PROJECT_ID}"
      dataset_id: "${BIGQUERY_DATASET}"
      table: "secure_logs"
      credentials_json: "${GCP_CREDENTIALS_JSON}"