Workbench

Data Browser#

Data Browser Overview#

The Data Browser provides all your Snowflake Schemas and datasets.

You can collapse or expand the Data Browser by clicking "<<".

The Data Browser provides the following information:

  • an 'Upload File' button to upload data from your device
  • all of your Snowflake Warehouses, ordered alphabetically
  • all of your Snowflake Schemas and datasets
  • a filter option to browse more specifically through your data

The order within the Data Browser is the following:

  • Snowflake Warehouses are sorted alphabetically
  • Snowflake Schemas are sorted alphabetically
  • tables are sorted alphabetically
  • columns are sorted in the order defined in Snowflake's Schema

Browsing Datasets#

To find the dataset you need, enter text in the search bar. Filtering is possible for databases, schemas and tables; column filtering is not possible.

Matching entries are listed below the search bar.

More Filtering Options#

You can filter the entries by the following criteria:

  • "All Datasets": all available datasets are displayed
  • "Added Datasets": shows added and in-use datasets for the actual Project

Flow Area#

The Flow Area visually represents your whole transformation pipeline, including data sources, transformations and deployed datasets.

The toolbar above the Flow Area provides most of the controls for operating it. The kebab menu includes Project-level operations like 'change owner' as well as flow-related ones like 'Remove Unused Data Sources'.

Data Source Node Options#

Starting from a data source node you can:

  • show the node's details
  • remove the node from the Project (note: this does not also remove it from your Snowflake instance)

Transformation Node Options#

For a transformation node you can:

  • edit the node
  • show the node's details
  • deploy or re-deploy the node to Snowflake
  • exchange the transformation source
  • delete the node from the Project

Deployed Nodes Options#

For a deployed node you can:

  • show the node's details
  • deploy or re-deploy the node to Snowflake
  • show the Execution History
  • show the Deployment History
  • remove the node from the Project (note: this does not also remove it from your Snowflake instance)

Scheduled Email Node Options#

For a scheduled email node you can:

  • modify the scheduling
  • pause and resume the scheduling
  • delete the scheduling

Toolbar Options#

Depending on which node is selected, the Flow Area toolbar allows you to:

  • add a new SQL query in the 'New SQL Editor'
  • add a new transformation
  • deploy or re-deploy a node to Snowflake

The kebab menu allows you to:

  • manage the collaborators for a node
  • change the ownership of a node
  • remove unused data sources that are not part of your pipeline
  • clone a whole Project
  • delete a whole Project

Flow Area Visibility#

You can customize the display of your Flow Area by clicking on the different icons:

  • hide unconnected nodes, so that only nodes used in the pipeline are displayed
  • fit the nodes to the size of the Flow Area
  • zoom in and zoom out

Data Grid#

Data Grid Overview#

The Data Preview area allows you to preview the selected dataset and provides the option to download datasets as CSV.

To preview the actual data and gain insights by filtering or sorting the values, click on "DATA PREVIEW".

To explore the columns of a dataset, click on "COLUMNS". This view presents the column names (table schema), their data types as well as optional metadata.

To view the profile information such as column metrics, click on "Profile".

To view and copy the SQL query for views, click on "SQL".
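
As an illustration, the SQL shown for a view might resemble the following sketch; the database, schema and column names here are hypothetical (based on Snowflake's TPC-H sample data) and only indicate the kind of query you can copy from this tab:

```sql
-- Hypothetical example of a view's underlying query as it could appear
-- in the 'SQL' tab; all object and column names are placeholders.
SELECT
    O_ORDERKEY,
    O_ORDERPRIORITY,
    O_TOTALPRICE
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
WHERE O_TOTALPRICE > 100000
```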

Column Metrics#

You can investigate column metrics in the 'Profile' tab for any data type. Statistics can be shown for each column.

To perform calculations, switch to the "Profile" tab in the Data Grid. Click on "Calculate" for the required columns. The metric is then displayed for the respective column.

To calculate all columns at once, click on "Calculate".

You can also display only the calculated columns. To do so, select the checkbox "See Only Calculated". All calculated metrics are shown.

To enlarge the metrics view, click on the 'Maximize' button. Clicking the button again returns you to the minimized display.

As a calculation result, each metric shows:

  • Data type and column name
  • a green bar that visualizes the ratio between available values and missing entries in the column, e.g. if you have 500 values and 500 missing entries, the bar is half green and half red
  • total number of values and total number of empty entries
  • total number of unique values; if 5,000 values are calculated in the preview, a maximum of 5,000 unique values can be reached. If there are only 5 distinct values among those 5,000, 'Unique' shows '5'
  • values for minimum, average and maximum
  • a visualization, depending on the data type

Clicking the 'Minimize' button provides information on the number of unique and null values as well as the minimum, average and maximum value.

To refresh the calculation, click on the 'Refresh' button next to the graphics.

Examples

Datatype: INTEGER

Since this dataset is sampled, the column "0_ORDERKEY" contains 150,000,000 values in total, 0 empty/null values and 150,000,000 unique values. The minimum value is 1, the average is 300,000 and the maximum is 600,000,000. Since there are only unique values, no graphic is presented in this case.

Datatype: DECIMAL

Since this dataset is sampled, the column "0_ORDERPRICE" contains 150,000,000 values in total, 0 empty/null values and 34,700,489 unique values. The minimum value is 811.7, the average is 151K (rounded) and the maximum is 591K (rounded). The graphic shows the number of values that belong to a specific price. The dashed line marks the average.

Datatype: STRING

Since the dataset is sampled, the column "0_ORDERPRIORITY" contains 150,000,000 values in total, 5 unique values and 0 empty/null values. Whitespace is 0, and the STRING length is 5 at minimum, 15 at maximum and 8 on average. The graphic shows the 5 unique STRING values and how often each occurs in this dataset.

SQL Syntax#

The SQL syntax is highlighted in the SQL Editor so that the individual syntax expressions can be identified.

Inspector#

The Inspector contains all information about the currently selected dataset, table, warehouse or Project.

Dataset Inspector#

The Dataset Inspector provides all information about the selected dataset. To get insights, click on the dataset and view the following information:

Info Tab

  • Source: displays a link to the schema and its details, and lists the Projects in which the dataset is used
  • Description: displays the dataset description
  • Owner: names the Datameer owner of the dataset
  • Tags: displays tags, if already set
  • Date and time information: displays the initial creation date and time as well as the last modified date and time

Transform Tab

  • Recipe: option to add a transformation/operation to the recipe
  • Operations list: lists the already applied operations in order of execution and provides the option to delete an already added operation

Schedule Tab

Daily:

  • Enable Scheduler: enables scheduling option for the asset
  • Repeat: provides the scheduling options 'Daily', 'Weekly', 'Custom'
  • Start at (UTC): sets the scheduling time

Weekly:

  • Enable Scheduler: enables scheduling option for the asset
  • Repeat: provides the scheduling options 'Daily', 'Weekly', 'Custom'
  • Repeat on: clickable weekday buttons
  • Start at (UTC): sets the scheduling time

Custom:

  • Enable Scheduler: enables scheduling option for the asset
  • Repeat: provides the scheduling options 'Daily', 'Weekly', 'Custom'
  • Cron Pattern: field for the custom cron pattern with allowed values below
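
For illustration, and assuming the field accepts standard five-field cron syntax (minute, hour, day of month, month, day of week), patterns could look like this:

  • '0 6 * * *': runs every day at 06:00 UTC
  • '0 6 * * 1': runs every Monday at 06:00 UTC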