Example: Reading JSON With a Pipeline
This section will help you get started with configuring and running a very simple pipeline, walking you through a common use case.
For a detailed discussion of pipelines, see this section.
In this example, we will use a feature of Director that facilitates designing and testing pipelines.
Scenario
To understand how a pipeline works at the most basic level, we will create a simple input file in JSON format and a simple pipeline that performs only one transformation. Afterwards, we will write the transformed data to another JSON file.
Using Director's pipeline validation and testing functionality, we will see the pipeline in action.
Setup and Trial
Create the test data
First, create a JSON file named `sample-data.json` in our working directory, and put the following sample data in it:
```json
{
  "raw_data": "{\"words\": \"hello world\"}"
}
```
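Note that `raw_data` holds a JSON document encoded as a string, not a nested object. Decoding it (shown here with Python's standard `json` module, purely for illustration) reveals the nested `words` field the pipeline will operate on:

```python
import json

# The value of raw_data is itself a serialized JSON document.
raw = "{\"words\": \"hello world\"}"
nested = json.loads(raw)
print(nested)  # {'words': 'hello world'}
```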
Define the pipeline logic
Then, create a YAML file named `convert-words.yml` in our working directory, and put the following pipeline definition in it:
```yaml
pipelines:
  - name: convert_words
    processors:
      - json:
          field: raw_data
          add_to_root: true
      - uppercase:
          field: words
          target_field: converted_words
```
When configuring a pipeline, we use the identifiers in the `name` fields of the components.
Here is what this pipeline will do:

- Look for the field named `raw_data` in the JSON file we will feed it.
- Using the `json` processor, grab its contents and, since we turned on its `add_to_root` setting, move the `words` field one level up.
- Using the `uppercase` processor, convert the words to uppercase and write them to a field named `converted_words`.
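Conceptually, the two processors behave like the following Python sketch. This is a hypothetical re-implementation for illustration only; in Director, the processors are configured in YAML and run inside the engine, and `apply_pipeline` is a name we are inventing here:

```python
import json

def apply_pipeline(event):
    """Mimic the convert_words pipeline on a single event (illustration only)."""
    # json processor: parse the JSON string held in `raw_data`; because
    # add_to_root is true, merge the parsed keys into the event's top level.
    event.update(json.loads(event["raw_data"]))
    # uppercase processor: read `words` and write the uppercased value
    # to the target field `converted_words`.
    event["converted_words"] = event["words"].upper()
    return event

result = apply_pipeline({"raw_data": "{\"words\": \"hello world\"}"})
print(json.dumps(result, indent=2))
```

Note that the original `raw_data` field is kept alongside the fields the processors add, which matches the output we will see from Director below.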
Validate the pipeline
To see whether our pipeline is valid, we enter the following command in the terminal:
PowerShell:

```powershell
.\vmetric-director -validate -path=".\config\Examples\convert-words.yml"
```

Bash:

```bash
./vmetric-director -validate -path="./config/Examples/convert-words.yml"
```
Note that we have to specify the paths and names of the files we are using. This is specific to the pipeline mode.
If our pipeline is valid, this command will return:
```
[OK] No issues found.
```
And that is indeed the case. Now, to visualize the output, enter the following command:
PowerShell:

```powershell
.\vmetric-director -pipeline -path=".\config\Examples\convert-words.yml" -name=convert_words -visualize
```

Bash:

```bash
./vmetric-director -pipeline -path="./config/Examples/convert-words.yml" -name=convert_words -visualize
```
This should return the following:
```json
{
  "converted_words": "HELLO WORLD",
  "raw_data": "{\"words\": \"hello world\"}",
  "words": "hello world"
}
```
This means Director is able to recognize the pipeline we have defined by its name and generate the expected output.
Now we can test our pipeline.
Test the pipeline
To see our pipeline in action, enter the following in the terminal and check its status message:
PowerShell:

```powershell
.\vmetric-director -pipeline -path=".\config\Examples\convert-words.yml" -name convert_words -input ".\config\Examples\sample-data.json" -output ".\config\Examples\processed-sample-data.json"
```

```
Successfully exported to .\config\Examples\processed-sample-data.json
```

Bash:

```bash
./vmetric-director -pipeline -path="./config/Examples/convert-words.yml" -name convert_words -input "./config/Examples/sample-data.json" -output "./config/Examples/processed-sample-data.json"
```

```
Successfully exported to ./config/Examples/processed-sample-data.json
```
Here is what this test does:

- Consume the data in the input file named `sample-data.json` in our working directory.
- Using the pipeline we named `convert_words`, extract the contents of the `words` field and convert them to uppercase.
- Write the converted words to a new field named `converted_words`.
- Save the results, along with the original `raw_data`, to an output file named `processed-sample-data.json` in our working directory.
The output file should be created when the test is run.
Check the processed data
Now open the file named `processed-sample-data.json` to see the results. It should appear like this:
```json
{
  "converted_words": "HELLO WORLD",
  "raw_data": "{\"words\": \"hello world\"}",
  "words": "hello world"
}
```
Monitoring
Verify that the pipeline processed the data correctly:

- The output file `processed-sample-data.json` was created.
- The file contains the expected `converted_words` field with the value "HELLO WORLD".
If both conditions are met, your pipeline is functioning properly.
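The two checks above can also be scripted. Here is a minimal sketch in Python; the `verify_output` helper and the relative path are assumptions for this example, so adjust the path to wherever your output file landed:

```python
import json
import os

def verify_output(path):
    """Check both conditions: the file exists and converted_words is correct."""
    if not os.path.exists(path):
        return False  # condition 1 failed: the output file was not created
    with open(path) as f:
        result = json.load(f)
    # condition 2: the expected field holds the uppercased value
    return result.get("converted_words") == "HELLO WORLD"

# Point this at the output file produced by the test run.
print(verify_output("processed-sample-data.json"))
```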
Now we can proceed to designing an end-to-end data stream.