Workbench

Data Browser#

Data Browser Overview#

The Data Browser provides all your Snowflake Schemas and datasets.

You can collapse or expand the Data Browser by clicking "<<".

The Data Browser provides the following information:

  • an 'Upload File' button to upload data from your device
  • all of your Snowflake Warehouses, ordered alphabetically
  • all of your Snowflake Schemas and datasets
  • a filter option to narrow down your browsing of the data

The sort order within the Data Browser is as follows:

  • Snowflake Warehouses are sorted alphabetically
  • Snowflake Schemas are sorted alphabetically
  • tables are sorted alphabetically
  • columns are sorted in the order defined in Snowflake's Schema

Browsing Datasets#

To find the dataset you need, enter a search term in the search bar. Filtering is possible for databases, schemas, and tables; column filtering is not possible.

Matching entries are listed below the search bar.

More Filtering Options#

You can filter the entries by the following criteria:

  • "All Datasets": all available datasets are displayed
  • "Added Datasets": shows added and in-use datasets for the actual Project

Flow Area#

The Flow Area visually represents your whole transformation pipeline, including data sources, transformations, and deployed datasets.

The toolbar above the Flow Area provides most of the controls for working with it. The kebab menu includes Project-level operations such as 'change owner' as well as flow-related ones such as 'Remove Unused Data Sources'.

Data Source Node Options#

Starting from a data source node you can:

  • show the node's details
  • remove the node from the Project (note: this does not also remove it from your Snowflake instance)

Transformation Node Options#

For a transformation node you can:

  • edit the node
  • show the node's details
  • deploy or re-deploy the node to Snowflake
  • exchange the transformation source
  • delete the node from the Project

Deployed Nodes Options#

For a deployed node you can:

  • show the node's details
  • deploy or re-deploy the node to Snowflake
  • show the Execution History
  • show the Deployment History
  • remove the node from the Project (note: this does not also remove it from your Snowflake instance)

Scheduled Email Node Options#

For a scheduled email node you can:

  • modify the scheduling
  • pause and resume the scheduling
  • delete the scheduling

Exploration Query Node Options#

An exploration query node is displayed in the Flow Area only when an exploration is scheduled via email. You can do the following:

  • edit the exploration query
  • show the exploration query details
  • deploy the query data to Snowflake
  • delete the query

Grouping Node Options#

Once more than one node is grouped, you can do the following from the context menu:

  • ungroup the grouped nodes
  • expand the group and view all grouped nodes

Toolbar Options#

Depending on which node is selected, the Flow Area toolbar allows you to:

  • add a new SQL query in the 'New SQL Editor'
  • add a new transformation
  • deploy or re-deploy a node to Snowflake

The kebab menu allows you to:

  • manage the collaborators for a node
  • change the ownership of a node
  • remove unused data sources that are not part of your pipeline
  • clone a whole Project
  • delete a whole Project

Flow Area Visibility#

You can customize the display of your Flow Area by clicking the following icons:

  • hide unconnected nodes so that only nodes used in the pipeline are displayed
  • fit the nodes to the size of the Flow Area
  • zoom in and zoom out

Grouping Nodes#

To better manage complex projects with multiple data sources and get a better overview of the data nodes, you can organize the nodes in groups.

To group two or more nodes, click on the first node, select the other required nodes while holding Cmd, and click on "+ Create Group". The group is created and appears highlighted in the Flow Area.

To ungroup a group node, click on "Ungroup" in the 'Info' tab. The group is ungrouped and all nodes are available in the Flow Area.

To collapse a group node, click on "Collapse" in the 'Info' tab. The group is now collapsed and displayed as a single group node without showing the associated nodes.

To expand a group node, click on "Expand" in the 'Info' tab. The group is expanded. All nodes are visible in a highlighted group area.

To rename a group node, click on the icon next to the group name, enter the new group name and confirm with the checkmark. The group is renamed and displayed in the Flow Area.

Grouping Limitations

Note that grouping does not work when:

  • trying to select nodes that are not in the same line of the pipeline, e.g. a source node and its associated deployed node without the transformation node between them
  • trying to nest groups

You cannot search for groups in the catalogue.

Data Grid#

Data Grid Overview#

The Data Grid area enables you to view a data preview, explore a dataset's columns and metadata, and use several exploration features. You can also share your data via email or to a Google Sheet, or download it as a CSV.

To view a preview of the actual data and get information about the rows and columns, click on "Preview".

To explore the columns of a dataset, click on "Columns". This view presents the column names (table schema), their data types, and optional metadata.

To view the profile information such as column metrics, click on "Profile".

To view and copy the SQL query for views, click on "SQL".
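
For a view, the 'SQL' tab shows the defining query. As a minimal, hypothetical sketch (the database, schema, and column names below are placeholders, not taken from your environment), the displayed SQL might look like this:

    SELECT
        O_ORDERKEY,
        O_ORDERPRIORITY,
        O_TOTALPRICE
    FROM MY_DATABASE.MY_SCHEMA.ORDERS
    WHERE O_TOTALPRICE > 100;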

To keep the most important information in view, you can limit the number of displayed columns. To do so, click on the 'Columns' drop-down and select the required columns. Columns that have been hidden can be shown again here later.

Column Metrics#

You can investigate column metrics in the 'Profile' tab for any data type. For each column, statistics can be displayed.

To perform calculations, switch to the "Profile" tab in the Data Grid and click on "Calculate" for the required column. The metric is displayed for the respective column.

To calculate all columns at once, click on "Calculate".

You can also display only the calculated columns. To do so, select the "See Only Calculated" checkbox. Only columns with calculated metrics are then shown.

To enlarge the metrics view, click on the 'Maximize' button. Click the button again to return to the minimized display.

As a calculation result, each metric shows:

  • Data type and column name
  • a bar (green for available values, red for missing entries) that visualizes their ratio in the column, e.g. if a column has 500 values and 500 missing entries, the bar is half green and half red
  • total number of values and total number of empty entries
  • the total number of unique values; if 5,000 values are calculated in the preview, a maximum of 5,000 unique values can be reached. If there are only 5 different values among those 5,000, 'Unique' shows '5'
  • values for minimum, average and maximum
  • a visualization, depending on the data type
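
Conceptually, the figures listed above correspond to standard SQL aggregates computed over the sampled preview. The following is only an illustrative sketch with placeholder table and column names; Datameer calculates the metrics internally:

    SELECT
        COUNT(O_TOTALPRICE)              AS total_values,   -- non-empty entries
        COUNT(*) - COUNT(O_TOTALPRICE)   AS empty_entries,  -- NULL/missing entries
        COUNT(DISTINCT O_TOTALPRICE)     AS unique_values,
        MIN(O_TOTALPRICE)                AS min_value,
        AVG(O_TOTALPRICE)                AS avg_value,
        MAX(O_TOTALPRICE)                AS max_value
    FROM MY_SCHEMA.ORDERS SAMPLE (5000 ROWS);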

Clicking the 'Minimize' button shows the number of unique and null values as well as the minimum, average, and maximum values.

To refresh the calculation, click on the 'Refresh' button next to the graphics.

Examples

Datatype: INTEGER

Since this dataset is sampled, the column "0_ORDERKEY" contains 150,000,000 values in total, 0 empty/null values, and 150,000,000 unique values. The minimum value is 1, the average is 300,000, and the maximum is 600,000,000. Since the values are all unique, no graphic is presented in this case.

Datatype: DECIMAL

Since this dataset is sampled, the column "0_ORDERPRICE" contains 150,000,000 values in total, 0 empty/null values, and 34,700,489 unique values. The minimum value is 811.7, the average is 151K (rounded), and the maximum is 591K (rounded). The graphic shows the number of values for each price, and the dashed line marks the average.

Datatype: STRING

Since the dataset is sampled, the column "0_ORDERPRIORITY" contains 150,000,000 values in total, 5 unique values, and 0 empty/null values. The whitespace count is 0, and the STRING length has a minimum of 5, a maximum of 15, and an average of 8. The graphic shows the 5 unique STRING values and how often each is used in this dataset.

SQL Syntax#

The SQL syntax is highlighted in the SQL Editor so that the individual syntax expressions are easy to identify.
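
As an illustration, a short query such as the one below (with hypothetical table and column names) contains the kinds of expressions that are typically highlighted differently: keywords (SELECT, FROM, WHERE, GROUP BY), functions (COUNT), identifiers, string literals, numeric literals, and comments:

    -- orders per priority above a price threshold
    SELECT O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT
    FROM MY_SCHEMA.ORDERS
    WHERE O_TOTALPRICE > 100
      AND O_ORDERPRIORITY <> 'LOW'
    GROUP BY O_ORDERPRIORITY;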

Inspector#

The Inspector contains all information about the currently selected dataset, table, warehouse, or Project.

Dataset Inspector#

The Dataset Inspector provides all information about the selected dataset. To get insights, click on the dataset and view the following information:

Info Tab

  • Source: displays a link to the schema and its details and lists the Projects in which the dataset is used
  • Description: displays the dataset description
  • Owner: names the Datameer owner of the dataset
  • Tags: displays tags, if already set
  • Date and time information: displays the initial creation date and time as well as the last modified date and time

Transform Tab

  • Recipe: option to add a transformation/ operation to the recipe
  • Operations list: lists the applied operations in the order of their execution and provides the option to delete an already added operation

Schedule Tab

Daily:

  • Enable Scheduler: enables scheduling option for the asset
  • Repeat: provides the scheduling options 'Daily', 'Weekly', 'Custom'
  • Start at (UTC): sets the scheduling time

Weekly:

  • Enable Scheduler: enables scheduling option for the asset
  • Repeat: provides the scheduling options 'Daily', 'Weekly', 'Custom'
  • Repeat on: clickable weekday buttons
  • Start at (UTC): sets the scheduling time

Custom:

  • Enable Scheduler: enables scheduling option for the asset
  • Repeat: provides the scheduling options 'Daily', 'Weekly', 'Custom'
  • Cron Pattern: field for the custom cron pattern with allowed values below
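
As an illustration, a common five-field cron pattern (minute, hour, day of month, month, day of week) could look like the following; the exact fields and allowed values supported here may differ, so refer to the allowed values shown with the field:

    0 6 * * 1-5

This example would trigger the schedule at 06:00 UTC on Monday through Friday.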