Microsoft Sentinel Integration
VirtualMetric Director supports Microsoft Sentinel integration through two different approaches: automatic discovery and manual configuration. Choose the method that best fits your environment and requirements.
Prerequisites for both approaches:
- an Azure subscription with permissions to create resources
- a Log Analytics workspace required by Microsoft Sentinel
Autodiscovery
VirtualMetric Director provides an autodiscovery feature for Microsoft Sentinel integration. This enables automatic detection and configuration of Data Collection Rules (DCRs) and their associated streams, simplifying the setup process and providing dynamic updates as your Sentinel environment changes.
Open a terminal with administrative access and navigate to <vm_root>. Then, type the following command and press Enter:
- PowerShell: C:\vmetric-director -sentinel -autodiscovery
- Bash: vmetric-director -sentinel -autodiscovery
Follow the on-screen prompts to complete the setup process. For detailed step-by-step instructions, refer to Microsoft Sentinel Overview.
Manual Integration
Manual integration requires step-by-step configuration of Microsoft Sentinel components. This approach provides full control over the integration process and is ideal for environments with specific configuration requirements.
Service Principal Setup
Create a service principal for DataStream authentication:
- Navigate to Azure Active Directory > App registrations
- Select New registration
- Enter DataStream as the application name
- Select Accounts in this organizational directory only
- Click Register
- Record the Application (client) ID and Directory (tenant) ID
- Go to Certificates & secrets > New client secret
- Create a secret and record the Client secret value
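To illustrate what the recorded values are used for: any client authenticating as this service principal exchanges the tenant ID, client ID, and client secret for an OAuth2 token via the client-credentials flow. The sketch below builds that token request with the Python standard library; the angle-bracket placeholders stand in for the values you recorded, and the scope shown is the one commonly used for the Azure Monitor Logs Ingestion API (a simplified illustration, not Director's internal code).

```python
from urllib import parse, request

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the OAuth2 client-credentials request against Azure AD."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Scope for the Azure Monitor Logs Ingestion API
        "scope": "https://monitor.azure.com/.default",
    }).encode()
    return request.Request(url, data=body, method="POST")

# Placeholder IDs -- substitute the values recorded in the steps above.
req = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
print(req.full_url)
```

Sending this request (e.g. with `urllib.request.urlopen`) returns a JSON body whose `access_token` field is then presented as a Bearer token when ingesting data.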
Data Collection Endpoint Setup
- Navigate to Azure Portal > Monitor > Data Collection Endpoints
- Select Create
- Configure the DCE:
  - Name: datastream-dce
  - Resource group: Select your resource group
  - Region: Same region as your Log Analytics workspace
- Click Review + create > Create
- Record the Logs Ingestion endpoint URL
Data Collection Rule Creation
- Navigate to Monitor > Data Collection Rules
- Select Create
- Configure basic settings:
  - Rule name: datastream-dcr
  - Resource group: Same as your DCE
  - Region: Same as your DCE
  - Platform Type: Windows or Linux, based on your data sources
- In Resources tab:
- Add your Log Analytics workspace
- In the Collect and deliver tab:
  - Data source type: Custom Text Logs or Windows Event Logs
  - Data source name: DataStreamLogs
  - File pattern: Configure based on your log sources
- Configure the destination:
  - Destination type: Azure Monitor Logs
  - Destination: Your Log Analytics workspace
  - Table: Create or select the target table
- Click Review + create > Create
- Record the DCR Immutable ID
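The three identifiers recorded during setup combine into the Logs Ingestion API URL that data is posted to. The helper below composes that URL; the DCE name, region, and immutable ID used in the example are hypothetical placeholders (a sketch assuming the documented `api-version=2023-01-01` URL format of the Logs Ingestion API).

```python
def ingestion_url(dce_endpoint: str, dcr_immutable_id: str, stream_name: str) -> str:
    """Compose the Logs Ingestion API URL from the DCE endpoint,
    the DCR immutable ID, and the stream name."""
    return (f"{dce_endpoint}/dataCollectionRules/{dcr_immutable_id}"
            f"/streams/{stream_name}?api-version=2023-01-01")

url = ingestion_url(
    "https://datastream-dce-eastus.ingest.monitor.azure.com",  # hypothetical DCE endpoint
    "dcr-00000000000000000000000000000000",                    # hypothetical immutable ID
    "Custom-DataStreamLogs",
)
print(url)
```

A JSON array of records is POSTed to this URL with the Bearer token obtained from the service principal; this is the same wiring that the DataStream target configuration below expresses declaratively.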
Required Permissions
Director needs the following permissions for Microsoft Sentinel integration.
Autodiscovery: Director needs permissions to fetch the Data Collection Rules and their associated streams. If you used the Automation tool with App Registration, these permissions are already configured.
Manual: Director needs permissions to send data to your manually configured Data Collection Rules.
For Data Collection (Autodiscovery)
For each DCR whose name is prefixed with vmetric:
- Navigate to the DCR in Azure Portal
- Go to Access Control (IAM)
- Select Add > Add role assignment
- Assign the following permissions:

|Role|Assignee|
|---|---|
|Monitoring Metrics Publisher|Your Managed Identity or Application|
For Autodiscovery
To enable the DCR autodiscovery features:
- Navigate to the Resource Group containing your DCRs
- Go to Access Control (IAM)
- Select Add > Add role assignment
- Assign the following permissions:

|Role|Assignee|
|---|---|
|Monitoring Reader|Your Managed Identity or Application|
The Monitoring Reader role should be assigned at the Resource Group level only. Assigning it at the Subscription level is not recommended: it is not required for the functionality to work, and it increases the autodiscovery scan duration.
For Data Collection (Manual)
For each manually created DCR:
- Navigate to the DCR in Azure Portal
- Go to Access Control (IAM)
- Select Add > Add role assignment
- Assign the following permissions:

|Role|Assignee|
|---|---|
|Monitoring Metrics Publisher|Your Service Principal or Application|
DataStream Target Configuration
Configure the DataStream target using direct DCE endpoint URL:
targets:
- name: sentinel
type: sentinel
properties:
tenant_id: "<your-tenant-id>"
client_id: "<your-client-id>"
client_secret: "<your-client-secret>"
endpoint: "https://<dce-name>-<region>.ingest.monitor.azure.com" # Direct URL
streams:
- name: "Custom-DataStreamLogs"
dcr_id: "<dcr-immutable-id>"
Autodiscovery Verification
After assigning the permissions:
- Wait a few minutes for Azure RBAC to propagate
- Test the connection using Director
- Check the logs for any permission-related errors
If you encounter permission issues, verify that all role assignments are properly configured, that Azure RBAC changes have propagated (can take up to 30 minutes), and that the identity has the correct access scope.
Manual Verification
Test your manual configuration:
- Wait a few minutes for Azure RBAC to propagate
- Test the connection using Director:
vmetric-director -test-target sentinel
- Check the logs for authentication or DCR errors
- Verify data appears in your Log Analytics workspace table
For manual configuration issues, verify the DCE endpoint URL is correct, the DCR Immutable ID matches your configuration, and the service principal has proper permissions on both the DCR and Log Analytics workspace.
How It Works
Resource ID-Based Discovery
Instead of manually configuring the Data Collection Endpoint (DCE) URL, you can provide the DCE Resource ID. For example:
/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Insights/dataCollectionEndpoints/<dce-name>
When using a Resource ID, Director will discover all DCRs associated with the specified DCE, and collect detailed stream information including table names, table schemas (column definitions), and stream configurations.
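A Resource ID is a structured path, so its components (subscription, resource group, DCE name) can be extracted mechanically; this is what makes discovery possible without a hand-entered endpoint URL. A minimal parsing sketch (the subscription ID, resource group, and DCE name in the example are hypothetical):

```python
def parse_dce_resource_id(resource_id: str) -> dict:
    """Split a Data Collection Endpoint Resource ID into its components.

    Expected layout:
    /subscriptions/<sub>/resourceGroups/<rg>/providers/
        Microsoft.Insights/dataCollectionEndpoints/<name>
    """
    parts = resource_id.strip("/").split("/")
    return {
        "subscription_id": parts[1],
        "resource_group": parts[3],
        "provider": parts[5],
        "dce_name": parts[7],
    }

rid = ("/subscriptions/00000000-0000-0000-0000-000000000000"
       "/resourceGroups/my-rg/providers/Microsoft.Insights"
       "/dataCollectionEndpoints/datastream-dce")
print(parse_dce_resource_id(rid))
```

With these components, a client can query Azure Resource Manager for the DCRs in scope and read their stream declarations, which is the kind of lookup the autodiscovery feature performs.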
Caching Mechanism
The default cache duration is 5 minutes. The cache is automatically invalidated when the configuration file (sentinel.yml) is modified or when the cache timeout is reached.
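The timeout-or-modification rule can be sketched as follows; this is a simplified illustration of the described behavior, not Director's actual implementation, and the ConfigCache class and temporary file are hypothetical stand-ins for the internal cache and sentinel.yml.

```python
import os
import tempfile
import time

class ConfigCache:
    """Cache that expires after `timeout` seconds or when the
    watched configuration file is modified (mtime change)."""
    def __init__(self, config_path: str, timeout: int = 300):
        self.config_path = config_path
        self.timeout = timeout
        self.loaded_at = time.time()
        self.config_mtime = os.path.getmtime(config_path)

    def is_valid(self) -> bool:
        if time.time() - self.loaded_at >= self.timeout:
            return False  # cache timeout reached
        if os.path.getmtime(self.config_path) != self.config_mtime:
            return False  # configuration file was modified
        return True

# Stand-in for sentinel.yml
with tempfile.NamedTemporaryFile(suffix=".yml", delete=False) as f:
    f.write(b"targets: []\n")

cache = ConfigCache(f.name, timeout=300)
print(cache.is_valid())   # True: fresh cache
os.utime(f.name, (0, 0))  # simulate editing the config file
print(cache.is_valid())   # False: file modified, cache invalidated
```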
Dynamic Updates
The autodiscovery feature continuously adapts to changes in your Sentinel environment: new DCRs are automatically detected, changes to table schemas are recognized, and custom tables and columns are discovered and integrated.
Direct Endpoint Configuration
Manual integration uses direct DCE endpoint URLs and explicit stream configuration. You must specify:
- DCE Endpoint URL: Direct HTTPS URL to your Data Collection Endpoint
- DCR Immutable ID: The specific Data Collection Rule identifier
- Stream Names: Exact stream names matching your DCR configuration
Static Configuration
Manual configuration is static and requires updates when:
- New DCRs are created
- Stream configurations change
- Table schemas are modified
- Endpoint URLs change
Schema Management
For manual integration, you must define table schemas in advance, map data fields to table columns, handle schema mismatches in your pipeline configuration, and monitor for phantom fields manually.
Phantom Field Prevention
Microsoft Sentinel has moved to DCR-based log ingestion and manual schema management. This change, while powerful, can lead to phantom fields: data fields that are ingested and billed even though they are not part of the table schema, making them inaccessible for querying while still incurring storage costs.
For a comprehensive understanding of phantom fields, see Sentinel Phantom Fields by ManagedSentinel.
Common scenarios that cause phantom fields include log splitting with mismatched schemas, temporary fields left over from transformations, duplicate fields emerging from improper field mapping, and schema modifications without proper cleanup.
Director's autodiscovery feature includes a built-in phantom field prevention mechanism based on the following:
Schema Validation - Automatically discovers table schemas from DCRs, validates each field against the known schema, and discards fields not present in the table schema.
Dynamic Field Mapping - Fields that exist in the schema or are required are kept while others are discarded.
Cost Optimization - Prevents unnecessary data ingestion thereby reducing storage costs while maintaining data accessibility.
The guiding principles for autodiscovery:
- Schema management: regularly review table schemas, update schemas when adding new fields, and use autodiscovery to validate field usage.
- Field mapping: let autodiscovery handle field validation, define critical fields explicitly when needed, and monitor the logs for dropped fields.
- Cost monitoring: track ingestion volumes, monitor field usage patterns, and verify data accessibility.
Manual integration requires proactive phantom field prevention since automatic schema validation is not available.
Schema Pre-Definition - Define table schemas in advance and ensure all pipeline processors output only schema-defined fields.
Pipeline Field Filtering - Use remove processors to eliminate fields not present in your target table schema.
Manual Validation - Regularly review ingested data to identify phantom fields and update pipeline configuration.
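The core of field filtering is simple: keep only the fields the target table schema defines, so nothing billable-but-unqueryable is ever sent. A minimal sketch of that idea (the schema and event record are hypothetical examples, and this is an illustration of the technique rather than Director's remove processor itself):

```python
def filter_to_schema(record: dict, schema_fields: set) -> dict:
    """Drop any field not defined in the target table schema,
    so only schema-defined fields are ingested and no phantom
    fields are created."""
    return {k: v for k, v in record.items() if k in schema_fields}

# Hypothetical schema for a custom table
schema = {"TimeGenerated", "EventID", "Level", "Message", "Computer"}

event = {
    "TimeGenerated": "2024-01-01T00:00:00Z",
    "EventID": 4624,
    "Level": "Information",
    "Message": "An account was successfully logged on.",
    "Computer": "host01",
    "TempCorrelationId": "abc123",  # not in schema: would become a phantom field
}

clean = filter_to_schema(event, schema)
print(clean)  # TempCorrelationId has been dropped
```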
The guiding principles for manual configuration:
- Schema management: pre-define table schemas, document all expected fields, and update pipelines when schemas change.
- Field mapping: use explicit field mapping in pipelines, implement field removal processors, and validate data before ingestion.
- Cost monitoring: manually monitor for phantom fields, regularly audit the ingested data structure, and track field usage patterns.
The most salient reason for preventing phantom fields is cost: some environments show up to 65% of table data as phantom fields.
Configuration Examples
Basic Autodiscovery Configuration
targets:
- name: sentinel
type: sentinel
properties:
tenant_id: "<your-tenant-id>"
client_id: "<your-client-id>"
client_secret: "<your-client-secret>"
endpoint: "/subscriptions/.../dataCollectionEndpoints/<your-dce-name>" # Use Resource ID
Filtered Stream Configuration
You can filter the autodiscovered streams that you intend to use:
targets:
- name: sentinel
type: sentinel
properties:
tenant_id: "<your-tenant-id>"
client_id: "<your-client-id>"
client_secret: "<your-client-secret>"
endpoint: "/subscriptions/.../dataCollectionEndpoints/<your-dce-name>"
streams:
- name: "Custom-WindowsEvent"
- name: "Custom-SecurityEvent"
Cache Configuration
You can optionally adjust the cache timeout (in seconds):
targets:
- name: sentinel
type: sentinel
properties:
endpoint: "/subscriptions/.../dataCollectionEndpoints/<your-dce-name>"
cache:
timeout: 300
Basic Manual Configuration
targets:
- name: sentinel
type: sentinel
properties:
tenant_id: "<your-tenant-id>"
client_id: "<your-client-id>"
client_secret: "<your-client-secret>"
endpoint: "https://<dce-name>-<region>.ingest.monitor.azure.com" # Direct URL
streams:
- name: "Custom-DataStreamLogs"
dcr_id: "dcr-<immutable-id>"
Multi-Stream Manual Configuration
For multiple tables and streams:
targets:
- name: sentinel
type: sentinel
properties:
tenant_id: "<your-tenant-id>"
client_id: "<your-client-id>"
client_secret: "<your-client-secret>"
endpoint: "https://<dce-name>-<region>.ingest.monitor.azure.com"
streams:
- name: "Custom-SecurityEvents"
dcr_id: "dcr-<security-dcr-id>"
- name: "Custom-NetworkEvents"
dcr_id: "dcr-<network-dcr-id>"
- name: "Custom-SystemEvents"
dcr_id: "dcr-<system-dcr-id>"
Manual Configuration with Field Filtering
Prevent phantom fields with explicit field management:
targets:
- name: sentinel
type: sentinel
properties:
tenant_id: "<your-tenant-id>"
client_id: "<your-client-id>"
client_secret: "<your-client-secret>"
endpoint: "https://<dce-name>-<region>.ingest.monitor.azure.com"
streams:
- name: "Custom-FilteredLogs"
dcr_id: "dcr-<filtered-dcr-id>"
# Use with pipeline processors to filter fields
field_mapping:
allowed_fields:
- "TimeGenerated"
- "EventID"
- "Level"
- "Message"
- "Computer"