Merge Node

The Merge node combines multiple datasets into a single dataset. It can merge them vertically (stacking rows) or through a natural join on shared keys. Use it to consolidate data from several sources, prepare datasets for analysis, or build unified views for visualization.

Parameters

The Merge node accepts the following parameter:

join

This parameter determines the method of merging datasets. Set join to true to perform a natural join based on shared keys between datasets, or false to merge datasets vertically by stacking rows without considering keys.
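Under the standard definition of a natural join, the shared keys are all column names that appear in both datasets. The Python sketch below illustrates this for the two example datasets used on this page. It assumes each dataset is a list of dictionaries (one per row); this representation and the helper names columns and shared_keys are hypothetical and not part of the Merge node.

```python
# Hypothetical sketch: each dataset is a list of row dicts (column name -> value).
# This is an illustrative assumption, not the Merge node's internal format.

def columns(rows):
    """Return column names in first-seen order across all rows."""
    seen = []
    for row in rows:
        for col in row:
            if col not in seen:
                seen.append(col)
    return seen


def shared_keys(left, right):
    """Columns present in both datasets: the keys a natural join matches on."""
    right_cols = set(columns(right))
    return [col for col in columns(left) if col in right_cols]


dataset_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 30, "city": "Paris"},
]
dataset_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": 30, "country": "France"},
]

print(shared_keys(dataset_a, dataset_b))  # ['name', 'age']
```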

What can it do?

The Merge node enables a wide range of data manipulation tasks, including:

  • Combining datasets from multiple sources into a single unified dataset.
  • Performing natural joins to integrate datasets based on shared keys.
  • Stacking rows from multiple datasets to expand the scope of analysis.
  • Preparing datasets for reporting, visualization, or further processing.

How to use it

Using the Merge node is straightforward:

  1. Add the Merge node to your data flow.
  2. Connect two or more datasets to the node.
  3. Set the join parameter to control the merging method:
    • true for natural join based on shared keys.
    • false for vertical merging by stacking rows.
  4. Connect the node to subsequent transformations or visualizations to continue your workflow.

Example

Imagine you have two datasets:

  • Dataset A:

    name    age   city
    Alice   25    London
    Bob     30    Paris

  • Dataset B:

    name    age   country
    Alice   25    UK
    Bob     30    France

Vertical Merge (join: false)

To stack the rows of both datasets without matching on keys, set join to false. The output contains every column from either input, and cells that a dataset has no value for are filled with null:

name    age   city     country
Alice   25    London   null
Bob     30    Paris    null
Alice   25    null     UK
Bob     30    null     France
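The following Python sketch reproduces the vertical merge above. It is not the node's implementation: it assumes rows are dictionaries, that the output columns are the union of all input columns in first-seen order, and that missing values are null (None in Python). The function name vertical_merge is hypothetical.

```python
# Hypothetical sketch of join: false (vertical stacking).
# Rows are dicts; None stands in for null.

def vertical_merge(*datasets):
    """Stack rows from all datasets; columns are the union in first-seen order."""
    all_columns = []
    for rows in datasets:
        for row in rows:
            for col in row:
                if col not in all_columns:
                    all_columns.append(col)
    # Every row gets every column; columns a row lacks are filled with None.
    return [
        {col: row.get(col) for col in all_columns}
        for rows in datasets
        for row in rows
    ]


dataset_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 30, "city": "Paris"},
]
dataset_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": 30, "country": "France"},
]

for row in vertical_merge(dataset_a, dataset_b):
    print(row)
```

Because stacking never compares rows, the function accepts any number of datasets, in line with the node's support for two or more inputs.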

Natural Join (join: true)

To merge the datasets on the columns they share (in this case, name and age), set join to true. Rows are combined where their values agree in every shared column, and the resulting dataset will look like this:

name    age   city     country
Alice   25    London   UK
Bob     30    Paris    France
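A comparable sketch for join: true, again assuming dictionary rows and a hypothetical natural_join helper. It treats every shared column as a key and performs an inner join, so rows without a match in the other dataset are dropped. This page does not state how the Merge node handles unmatched rows, so verify that behavior on your own data.

```python
# Hypothetical sketch of join: true (inner natural join on all shared columns).

def natural_join(left, right):
    """Combine rows whose values agree on every column both datasets share."""
    left_cols = list(dict.fromkeys(col for row in left for col in row))
    right_cols = list(dict.fromkeys(col for row in right for col in row))
    keys = [col for col in left_cols if col in right_cols]
    # Output order: left columns first, then columns only the right side has.
    extra = [col for col in right_cols if col not in keys]

    # Index right-hand rows by their key values for fast lookups.
    index = {}
    for row in right:
        index.setdefault(tuple(row.get(k) for k in keys), []).append(row)

    joined = []
    for row in left:
        # Left rows with no matching key tuple produce no output (inner join).
        for match in index.get(tuple(row.get(k) for k in keys), []):
            merged = {col: row.get(col) for col in left_cols}
            merged.update({col: match.get(col) for col in extra})
            joined.append(merged)
    return joined


dataset_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 30, "city": "Paris"},
]
dataset_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": 30, "country": "France"},
]

for row in natural_join(dataset_a, dataset_b):
    print(row)
```

If the two datasets share no columns, the key tuple is empty and every row matches every other row (a Cartesian product), which is also how a standard natural join behaves. Note that this sketch treats two missing (None) key values as equal, whereas SQL does not match NULLs.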

This example demonstrates how the Merge node can help you combine datasets effectively, whether through vertical stacking or natural joins based on shared keys.

Why use the Merge node?

The Merge node offers several advantages:

  • Simplifies the process of combining datasets without requiring manual coding or scripting.
  • Enables dynamic merging based on shared keys or vertical stacking, making workflows more flexible.
  • Provides a concise way to create unified datasets for reporting, analysis, or visualization.
  • Works alongside other transformation nodes, so merging can be one step in a larger workflow.

Tips

To make the most of the Merge node, consider the following tips:

  • Use the join parameter to control how datasets are combined, depending on whether you need a natural join or vertical stacking.
  • For natural joins, make sure the columns you intend to join on have identical names in both datasets, and that no other columns share a name by accident, since every shared column is treated as a key.
  • Test your merges on a small sample of data to verify the output structure and avoid unexpected results.
  • Combine the Merge node with filtering or transformation nodes to preprocess data before merging it.
  • Use the Merge node in conjunction with visualization nodes to create comprehensive reports or dashboards that highlight key insights from combined datasets.
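One common reason a natural join matches on fewer columns than expected is column names that differ only in case or surrounding whitespace. The sketch below flags such near-misses before merging. It assumes the Merge node compares column names exactly, which this page does not confirm; the helper names normalize and near_miss_columns are hypothetical.

```python
# Hypothetical pre-merge check for column names that almost, but not exactly, match.

def normalize(name):
    """Case-fold and trim a column name for comparison."""
    return name.strip().casefold()


def near_miss_columns(left_cols, right_cols):
    """Pairs of column names that match only after normalization.

    Assuming exact name matching, these pairs would not be used as join keys
    and usually point to a naming inconsistency worth fixing first.
    """
    exact = set(left_cols) & set(right_cols)
    right_by_norm = {}
    for col in right_cols:
        if col not in exact:
            right_by_norm.setdefault(normalize(col), []).append(col)
    pairs = []
    for col in left_cols:
        if col in exact:
            continue
        for other in right_by_norm.get(normalize(col), []):
            pairs.append((col, other))
    return pairs


print(near_miss_columns(["name", "Age", "city"], ["name ", "age", "country"]))
# [('name', 'name '), ('Age', 'age')]
```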

Use cases

The Merge node is ideal for a variety of use cases, including:

  • Data consolidation: Combine datasets from different sources into a single dataset for analysis or reporting.
  • Data integration: Merge datasets based on shared keys to create unified views for visualization or reporting.
  • Dataset expansion: Stack rows from multiple datasets to broaden the scope of analysis or reporting.
  • Workflow optimization: Ensure datasets are merged correctly before applying further transformations or analyses.

Troubleshooting

If you encounter issues while using the Merge node, consider the following troubleshooting steps:

  • Verify that the datasets being merged have compatible structures, especially when using natural joins.
  • Check the join parameter to ensure it is set correctly for your use case (natural join vs. vertical stacking).
  • If the output dataset does not match expectations, review the input datasets for inconsistencies or missing values that may affect the merge operation.
  • Test your merge operation on a small sample of data to identify potential issues or edge cases before applying it to larger datasets.
  • If you experience performance issues with large datasets, consider optimizing the input datasets by filtering or aggregating data before merging to reduce complexity and improve processing speed.
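When a natural join returns fewer rows than expected, it helps to list the rows whose key values have no counterpart in the other dataset. This sketch assumes an inner join (unmatched rows are dropped) and dictionary rows; unmatched_rows is a hypothetical helper. For illustration, Bob's age in Dataset A is changed to 31, so his row no longer matches on both name and age.

```python
# Hypothetical diagnostic: which left-hand rows would an inner natural join drop?

def unmatched_rows(left, right, keys):
    """Rows of `left` whose key values have no counterpart in `right`."""
    right_keys = {tuple(row.get(k) for k in keys) for row in right}
    return [row for row in left if tuple(row.get(k) for k in keys) not in right_keys]


dataset_a = [
    {"name": "Alice", "age": 25, "city": "London"},
    {"name": "Bob", "age": 31, "city": "Paris"},  # age deliberately differs from Dataset B
]
dataset_b = [
    {"name": "Alice", "age": 25, "country": "UK"},
    {"name": "Bob", "age": 30, "country": "France"},
]

print(unmatched_rows(dataset_a, dataset_b, ["name", "age"]))
# [{'name': 'Bob', 'age': 31, 'city': 'Paris'}]
```

Running the check in both directions (swapping left and right) shows unmatched rows on each side.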

With the Merge node, you can combine datasets by stacking rows or joining on shared keys, producing unified views for analysis and visualization in both simple and complex workflows.