This type of transformation duplicates an input dataset to create identical output datasets. This is done by using the Split step in the Data Transformation module. This step does not need to be configured, and it can be used multiple times in a flow to split a dataset.
Note: This is a built-in step, and therefore will be available in the Transformations List by default.
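Conceptually, a split behaves like copying the input so that each downstream branch works on its own identical dataset. The sketch below illustrates that idea in plain Python. The `split` function and the sample rows are hypothetical and are not part of the product, which performs the split through its visual builder.

```python
import copy

def split(rows, branches=2):
    """Return `branches` independent, identical copies of `rows`.

    Hypothetical helper for illustration only; the product does this
    through the Split step, not through code.
    """
    return [copy.deepcopy(rows) for _ in range(branches)]

# Sample input dataset (made-up values).
orders = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
]

copy_a, copy_b = split(orders)

# Changing one copy leaves the other copy and the input untouched.
copy_a[0]["amount"] = 0.0
```

Deep copies are used here so that a change made in one branch cannot leak into another, which mirrors the idea of identical but separate output datasets.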
Follow the instructions below to add a split step to a flow:
- Expand the Transformation Steps button on the left side of the Transformation Flow builder to view a list of transformation steps.
- Drag the Split step from the list of transformation steps.
- Connect this step to the previous step in the flow.
- Add more steps to the flow, and connect each of them to the split step one at a time.
For example, you can split data from a single data source and store it in multiple databases.
In this example, we will create a simple transformation flow that includes a split step. The flow extracts data from a data source and splits it into two identical copies, and each copy then has a different type of transformation applied to it. Each result is then saved to a separate output. You can always include additional steps in your own transformation flows.
Click on the Create button in the top-right corner.
Then select Transformation Flow.
If you do not see this option, you may not have security access to transformation flows. Learn how to get access here.
You will be taken to the transformation flow builder.
Hover your cursor over the input steps button on the left side. A panel with a list of all data extraction steps will appear.
Drag one of these steps onto the canvas. For this procedure, we will use the single table step as an example. (Click here to learn about all the different input steps.)
A popup will then appear, from which you can load data from a data source.
Click on the data source that you require.
Then choose the database table, and click on Submit.
The selected table's fields will appear in the transformation flow panel to be configured.
Select only the fields that you want data to be extracted from.
You can make further changes to the step, such as renaming it, adding a description, etc.
Once you’re done with the step configuration, click on the Apply button.
On doing so, the data preview panel will display the data extracted from the configured database table.
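As a rough analogy for what the input step does, the sketch below reads only the selected fields from a single table. It uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table, its columns, and its rows are made up for illustration and do not reflect how the product queries a data source internally.

```python
import sqlite3

# In-memory database standing in for the configured data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "East", 120.0, "n/a"), (2, "West", 80.0, "n/a"), (3, "East", 50.0, "n/a")],
)

# Only the selected fields are extracted; `id` and `notes` are left out.
selected_fields = ["region", "amount"]
query = f"SELECT {', '.join(selected_fields)} FROM sales ORDER BY id"
cursor = conn.execute(query)

# Build a small "preview" of the extracted rows as dictionaries.
columns = [d[0] for d in cursor.description]
preview = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()
```

Limiting the extraction to the fields you need keeps every later step, including each branch after a split, smaller and easier to inspect.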
Once you are ready to split your data, expand the transformation steps panel, and drag in the Split step.
- Then create a connection between the database input step (or the previous step) and this split step.
There is no need to configure the split step; however, you can still choose which fields are carried on to the next steps through the Field tab.
Now you can add multiple steps simultaneously and perform different transformations using the same dataset.
For this example, we will aggregate this dataset and separately create a custom calculated field on its copy.
- Drag in the Aggregate step from the transformation steps panel, and connect the split step to it.
Click on the aggregate step, and in the transformation flow panel, select the aggregations to be applied on each of the fields. Then click Apply.
- The result of this step will appear in the data preview panel.
- Next, add the calculated field transformation step to the flow, and connect the split step to it.
Click on this newly added step and configure it through the transformation flow panel. First click Add Item, and then, in the popup window, define the custom calculated field that you want to add to your data.
- You can check that your formula is valid by clicking the Validate button.
- Then click on the Save button.
The result of this step will be generated in the data preview panel.
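The two branches described above can be pictured with a short Python sketch: one copy of the data is aggregated, while the other copy gains a calculated field. The field names (`region`, `quantity`, `unit_price`), the sum aggregation, and the `line_total` formula are all assumptions chosen for illustration, not settings taken from the product.

```python
import copy
from collections import defaultdict

# Made-up rows standing in for the split step's input.
rows = [
    {"region": "East", "quantity": 2, "unit_price": 10.0},
    {"region": "West", "quantity": 1, "unit_price": 30.0},
    {"region": "East", "quantity": 3, "unit_price": 5.0},
]

# Split: each branch gets its own identical copy.
branch_a, branch_b = copy.deepcopy(rows), copy.deepcopy(rows)

# Branch A: aggregate (sum of quantity per region).
totals = defaultdict(int)
for row in branch_a:
    totals[row["region"]] += row["quantity"]
aggregated = [{"region": r, "total_quantity": q} for r, q in sorted(totals.items())]

# Branch B: calculated field (line_total = quantity * unit_price).
calculated = [dict(row, line_total=row["quantity"] * row["unit_price"]) for row in branch_b]
```

Because both branches start from the same data, the aggregated result and the row-level result stay consistent with each other, and neither branch alters the original input.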
Now we will save the data from each of these steps into separate output steps.
Expand the output steps panel by hovering on its icon, and drag the SQL database output step onto the canvas.
Connect the aggregate step to this output step by creating a connection.
Note: By default, the output step will be highlighted in red to signify that it contains errors. This is because it has not been configured yet.
Then configure the output step through the panel on the right side. Click here to learn more about configuring this step.
- Similarly, add another SQL database output step, and create a connection with the calculated field step.
Configure this output step as well.
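Saving each branch to its own output can be sketched as writing two result sets to two separate databases. The example below uses Python's built-in `sqlite3` module with in-memory databases as stand-ins for the SQL database outputs. The `write_output` helper, the table names, and the sample rows are hypothetical.

```python
import sqlite3

def write_output(conn, table, rows):
    """Create `table` from the keys of the first row and insert all rows.

    Hypothetical helper; table and column names are trusted here because
    they are hard-coded in this sketch.
    """
    columns = list(rows[0])
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()

# Made-up results standing in for the two branches of the flow.
aggregated = [{"region": "East", "total_quantity": 5}, {"region": "West", "total_quantity": 1}]
calculated = [{"region": "East", "line_total": 20.0}, {"region": "West", "line_total": 30.0}]

# Two separate databases, one per output step.
out_a = sqlite3.connect(":memory:")
out_b = sqlite3.connect(":memory:")
write_output(out_a, "aggregated_sales", aggregated)
write_output(out_b, "calculated_sales", calculated)

count_a = out_a.execute("SELECT COUNT(*) FROM aggregated_sales").fetchone()[0]
count_b = out_b.execute("SELECT COUNT(*) FROM calculated_sales").fetchone()[0]
```

Each output is written independently, so a problem with one destination does not change what is stored in the other.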
You can now execute the draft flow by clicking on the run button in the top header menu. (This quickly executes the flow on the data rows in the data preview panel.)
Alternatively, save the flow for a full execution by clicking on the Publish button.
Then provide details in the popup that appears, such as a suitable name and the access rights for the flow.
Finally, click on the Save button.