What are Explorations?

When you design pipelines, Datameer works with automatically generated sample data for large data sets. This speeds up the process of creating transformations and gives you immediate feedback on each change. However, sample data has its limitations. Because you are working with a subset of the actual data, the results you see in the preview may not match the results produced when the pipeline is executed on the full data set. This makes it hard to verify the accuracy of intermediate steps in the transformation process, which you often need to do before you can confidently proceed with further transformations.
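The gap between a preview and a full run can be illustrated with a minimal Python sketch. It does not use Datameer and makes no claim about how Datameer draws its samples: the data, the "first 1,000 rows" sampling rule, and the `summarize` helper are assumptions made up for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical full data set: 100,000 order amounts between 5 and 200 ...
full_data = [round(random.uniform(5, 200), 2) for _ in range(100_000)]
# ... with a single large outlier deep in the data.
full_data[54_321] = 25_000.00

# Assumed sampling rule for this sketch: the preview sees the first 1,000 rows.
sample = full_data[:1_000]


def summarize(rows):
    """Return (row count, maximum value, number of rows above 10,000)."""
    return len(rows), max(rows), sum(1 for r in rows if r > 10_000)


sample_summary = summarize(sample)
full_summary = summarize(full_data)

print("sample:", sample_summary)
print("full:  ", full_summary)
```

In this sketch the sample never contains the outlier, so a check such as "no amount exceeds 10,000" passes on the preview and fails on the full data.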

As you progress, you may need to answer specific questions about the data, for example by:

  • checking if the oldest record is from a certain date
  • comparing record counts before and after joining data
  • confirming if certain filters have been applied correctly
  • validating aggregate results
  • ensuring that a calculated column value doesn't exceed a certain limit

To address these use cases and provide you with the confidence you need, Datameer introduces the Exploration feature.

The Exploration feature lets you inspect the transformation results of your data sets based on the full data from your source data set. You can start an exploration from any data set that is displayed in the Flow Area.
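The kinds of checks listed earlier amount to small queries over the full data. As a rough sketch of what such questions look like, the following Python example uses an in-memory SQLite database as a stand-in for the data warehouse. It is not Datameer's interface: the `orders` and `customers` tables, their columns, and the `scalar` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         order_date TEXT, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES
        (1, 10, '2021-03-01', 120.0),
        (2, 11, '2020-01-15', 80.0),
        (3, 12, '2022-07-09', 45.5),
        (4, 99, '2023-02-20', 300.0);   -- customer 99 has no match
    INSERT INTO customers VALUES (10, 'EU'), (11, 'US'), (12, 'US');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Is the oldest record from the expected date?
oldest = scalar("SELECT MIN(order_date) FROM orders")

# How do record counts compare before and after the join?
before = scalar("SELECT COUNT(*) FROM orders")
after = scalar(
    "SELECT COUNT(*) FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id"
)

# Does a column value stay below a certain limit?
max_amount = scalar("SELECT MAX(amount) FROM orders")

# Is an aggregate over a filtered subset what we expect?
us_total = scalar(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN customers c USING (customer_id) WHERE c.region = 'US'"
)

print(oldest, before, after, max_amount, us_total)
```

Here the inner join drops the order whose customer has no match, so the count after the join is lower than before. That is exactly the kind of discrepancy such a check is meant to surface.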

The Exploration feature in Datameer differs from the data preview in the design phase in several ways:

  • explorations always run the entire upstream pipeline using the full source data set - but the displayed result is truncated to 1,000 records
  • explorations can be time-consuming queries, so they run in the background, allowing you to work on other tasks and check the results later
  • exploration queries can be cancelled if they are no longer needed or are taking too long, which avoids unnecessary strain on the data warehouse
  • explorations are temporary and not executed during the actual runtime of the pipeline - they serve as a way to analyze and examine data without impacting the live pipeline
  • each node in the Flow Graph can have one or more explorations associated with it
  • explorations are persisted and can be shared among collaborators working on the same project, so that everyone can refer to the same exploration results
  • persisted explorations can become outdated if changes are made upstream in the pipeline - an invalid exploration is marked with a yellow or red dot and can be refreshed or deleted
  • exploration queries can involve various operations such as filters, aggregates, and sorts, allowing for flexible and detailed analysis of the data

Learn all about how to use the exploration feature here.

View several use cases with step-by-step instructions that explain and demonstrate how to perform an exploration to achieve your goals here.