Merge Node

The Merge node combines multiple datasets into a single dataset. It can merge datasets vertically (stacking rows) or through a natural join on shared keys. Use it to consolidate data from several sources, prepare datasets for analysis, or build a unified view for visualization.

Parameters

The Merge node accepts the following parameter:

`join`

This parameter selects the merge method. Set `join` to `true` to perform a natural join on the keys the datasets share, or to `false` to merge vertically by stacking rows without considering keys.

What can it do?

The Merge node enables a wide range of data manipulation tasks, including:

Combining datasets from multiple sources into a single unified dataset.
Performing natural joins to integrate datasets based on shared keys.
Stacking rows from multiple datasets to expand the scope of analysis.
Preparing datasets for reporting, visualization, or further processing.

How to use it

Using the Merge node is straightforward:

Add the Merge node to your data flow.
Connect two or more datasets to the node.
Set the join parameter to control the merging method:
- true for natural join based on shared keys.
- false for vertical merging by stacking rows.
Connect the node to subsequent transformations or visualizations to continue your workflow.

Example

Imagine you have two datasets:

Dataset A:

name   age   city  
Alice  25    London  
Bob    30    Paris  

Dataset B:

name   age   country  
Alice  25    UK  
Bob    30    France  

Vertical Merge (`join: false`)

If you want to stack the rows from both datasets without considering shared keys, set `join` to `false`. Columns that a dataset does not have are filled with null. The resulting dataset will look like this:

name   age   city    country  
Alice  25    London  null  
Bob    30    Paris   null  
Alice  25    null    UK  
Bob    30    null    France
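
The stacking behavior shown above can be sketched in plain Python. This is only an illustration of the semantics, not the node's implementation: `stack_rows` is a hypothetical helper, rows are modeled as dictionaries, and `None` stands in for null.

```python
# Hypothetical stand-in for the Merge node with join: false.
# Rows are plain dicts; None plays the role of null.

def stack_rows(*datasets):
    """Stack rows from all datasets, filling missing columns with None."""
    # Collect column names in first-seen order across all inputs.
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit every row with the full column set.
    return [
        {name: row.get(name) for name in columns}
        for rows in datasets
        for row in rows
    ]


dataset_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 30, "city": "Paris"},
]
dataset_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": 30, "country": "France"},
]

stacked = stack_rows(dataset_a, dataset_b)
for row in stacked:
    print(row)
```

The output has four rows and four columns, matching the table above: rows from Dataset A carry `None` for `country`, and rows from Dataset B carry `None` for `city`.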

Natural Join (`join: true`)

If you want to merge the datasets based on shared keys (in this case, name and age, the columns that appear in both datasets), set `join` to `true`. The resulting dataset will look like this:

name   age   city    country  
Alice  25    London  UK  
Bob    30    Paris   France
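
A natural join of this kind can be sketched in plain Python as follows. The helper `natural_join` is hypothetical and makes several assumptions the documentation does not confirm: it joins on every column name the two inputs share, keeps only rows that match on both sides, and raises an error when there are no shared columns. The node itself may handle unmatched rows or missing keys differently.

```python
# Hypothetical stand-in for the Merge node with join: true.
# Assumes every row in a dataset has the same columns.

def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    left_cols = list(left[0]) if left else []
    right_cols = list(right[0]) if right else []
    shared = [c for c in left_cols if c in right_cols]
    if not shared:
        # Design choice for this sketch; the node's behavior is not documented.
        raise ValueError("no shared columns to join on")

    # Index the right-hand rows by their shared-key values.
    index = {}
    for row in right:
        key = tuple(row[c] for c in shared)
        index.setdefault(key, []).append(row)

    # Pair each left-hand row with every right-hand row that has the same key.
    joined = []
    for row in left:
        key = tuple(row[c] for c in shared)
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined


dataset_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 30, "city": "Paris"},
]
dataset_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": 30, "country": "France"},
]

joined = natural_join(dataset_a, dataset_b)
for row in joined:
    print(row)
```

Because both `name` and `age` appear in each input, the sketch matches on the pair of values, and each shared column appears once in the result.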

This example demonstrates how the Merge node can help you combine datasets effectively, whether through vertical stacking or natural joins based on shared keys.

Why use the Merge node?

The Merge node offers several advantages:

Simplifies the process of combining datasets without requiring manual coding or scripting.
Enables dynamic merging based on shared keys or vertical stacking, making workflows more flexible.
Provides a concise way to create unified datasets for reporting, analysis, or visualization.
Integrates seamlessly with other transformation nodes, allowing you to build complex workflows with ease.

Tips

To make the most of the Merge node, consider the following tips:

Use the join parameter to control how datasets are combined, depending on whether you need a natural join or vertical stacking.
Ensure that datasets have compatible structures (e.g., matching column names) for natural joins to work effectively.
Test your merges on a small sample of data to verify the output structure and avoid unexpected results.
Combine the Merge node with filtering or transformation nodes to preprocess data before merging it.
Use the Merge node in conjunction with visualization nodes to create comprehensive reports or dashboards that highlight key insights from combined datasets.

Use cases

The Merge node is ideal for a variety of use cases, including:

Data consolidation: Combine datasets from different sources into a single dataset for analysis or reporting.
Data integration: Merge datasets based on shared keys to create unified views for visualization or reporting.
Dataset expansion: Stack rows from multiple datasets to broaden the scope of analysis or reporting.
Workflow optimization: Ensure datasets are merged correctly before applying further transformations or analyses.

Troubleshooting

If you encounter issues while using the Merge node, consider the following troubleshooting steps:

Verify that the datasets being merged have compatible structures, especially when using natural joins.
Check the join parameter to ensure it is set correctly for your use case (natural join vs. vertical stacking).
If the output dataset does not match expectations, review the input datasets for inconsistencies or missing values that may affect the merge operation.
Test your merge operation on a small sample of data to identify potential issues or edge cases before applying it to larger datasets.
If you experience performance issues with large datasets, consider optimizing the input datasets by filtering or aggregating data before merging to reduce complexity and improve processing speed.
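
One way to review the inputs before a natural join is to list the columns they share and the key values that appear on only one side. The sketch below does this in plain Python on a small sample; `describe_join_keys` is a hypothetical helper, not part of the Merge node.

```python
# Hypothetical pre-merge check; rows are plain dicts.

def describe_join_keys(left, right):
    """Report shared columns and key values present on only one side."""
    left_cols = set().union(*(row.keys() for row in left))
    right_cols = set().union(*(row.keys() for row in right))
    shared = sorted(left_cols & right_cols)

    def keys(rows):
        # One tuple of shared-column values per row.
        return {tuple(row.get(c) for c in shared) for row in rows}

    left_keys, right_keys = keys(left), keys(right)
    return {
        "shared_columns": shared,
        # Sort by repr so mixed value types do not raise a TypeError.
        "only_in_left": sorted(left_keys - right_keys, key=repr),
        "only_in_right": sorted(right_keys - left_keys, key=repr),
    }


sample_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 30, "city": "Paris"},
]
sample_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": "30", "country": "France"},  # age stored as text
]

report = describe_join_keys(sample_a, sample_b)
print(report)
```

In this sample, Bob's age is a number in one input and text in the other, so his row shows up under both `only_in_left` and `only_in_right`. An inconsistency like this is a plausible reason for rows to be missing from a joined output.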

With the Merge node, you can combine datasets and create unified views for analysis and visualization, whether you are working with a simple merge or a larger workflow.
