Other Exports#
INFO
Here you find all information about other export options.
Downloading Data Without an Export Job#
You can download data from an import job, file upload, or Workbook as a .csv or .tdsx (Tableau) file without setting up an export job.
Downloading Data From an Import Job or File Upload#
- Select the import job or file upload you want to download.
- Right-click the file name and select "Browse data". The full data of the import job or file upload is displayed.
- Click "Download".
- Select the file type for download. Choose between:
  - .csv - comma-separated values. This format can also be compressed as a .zip file. If needed, define the date pattern. The date pattern currently defaults to 'yyyy-MM-dd hh:mm:ss a', which is readable in Excel.
  - .tdsx - Tableau file format.
- Click "Download".
Downloading Data From a Workbook#
- Select the Workbook which contains the worksheet you want to download.
- Make sure you have run the Workbook first, then click "View Full Data".
- From the "Full Results" page, select the sheet that you want to download.
- Click "Download".
- Select the file type for download. Choose between:
  - .tdsx - Tableau file format.
  - .csv - comma-separated values. This format can also be compressed as a .zip file. If needed, define the date pattern. The date pattern currently defaults to 'yyyy-MM-dd hh:mm:ss a', which is readable in Excel.
- Click "Download". Depending on the download behavior of your browser, a download dialog opens or the file starts downloading immediately into the download folder.
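As an illustration, the default date pattern 'yyyy-MM-dd hh:mm:ss a' is a Java-style pattern (12-hour clock with an AM/PM marker). If you process a downloaded .csv in Python, the equivalent `strptime` format is sketched below; the function name is illustrative only:

```python
from datetime import datetime

# The Java-style pattern 'yyyy-MM-dd hh:mm:ss a' corresponds to
# '%Y-%m-%d %I:%M:%S %p' in Python's strptime syntax
# (%I = 12-hour clock, %p = AM/PM marker).
EXPORT_DATE_FORMAT = "%Y-%m-%d %I:%M:%S %p"

def parse_export_timestamp(value: str) -> datetime:
    """Parse a timestamp column from a downloaded .csv file."""
    return datetime.strptime(value, EXPORT_DATE_FORMAT)

print(parse_export_timestamp("2024-01-15 02:21:58 PM"))  # -> 2024-01-15 14:21:58
```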
INFO: The default limit for the number of rows to download is 100,000, as this functionality is intended for small aggregated data sets. To adjust the record download limit, change the rest.download-data.records-max=100000 property in the <datameer_install>/conf/default.properties file. Setting the value to 0 removes the limit, but increasing the number could result in slower processing for the Spectrum conductor. Your local system might also have download restrictions. If you want to download more than 100,000 rows, we suggest creating an export job instead.
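For example, to raise the limit to 1,000,000 rows (a hypothetical value chosen for illustration), the property in `<datameer_install>/conf/default.properties` would read:

```
# conf/default.properties
# Raise the single-download row limit from the default of 100,000.
# Setting the value to 0 removes the limit entirely.
rest.download-data.records-max=1000000
```

A restart of the Spectrum service may be required for property changes to take effect.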
Exporting to Hive Server2#
INFO
The export behavior and validation for partitioned and unpartitioned tables are the same. Incorrect mapping results in a failed export, and invalid records are dropped from the export results. However, for unpartitioned tables with the RECREATE mode, the original table definition is used to recreate the table, and only the column names and types are replaced by the export sheet definition.
Configuring Export Jobs with a HiveServer2 Connection#
To export a worksheet through a HiveServer2 connector:
- Click the "+" button and choose "Export Job" or right-click in the browser and select "Create New" → "Export Job". The 'New Export Job' tab appears in the menu bar.
- Select the Workbook to export from in the browser by clicking on "Select". The dialog 'Select Workbook' opens.
- Select the Workbook and confirm with "Select". The name of the Workbook appears in the Workbook field.
- Click "Next".
- Select the worksheet to export from the drop-down menu. The name of the sheet is displayed in the 'Select Sheet' field. A preview is displayed.
INFO: Only kept sheets are available for export. If the sheet you want to export is missing, modify the Workbook settings to keep the sheet.
- Click "Next".
- Select the HiveServer2 connector by clicking "Select Connection". The dialog 'Select Connection' opens.
INFO: If there is already a valid HiveServer2 connection, you find it under 'Connection'.
INFO: If there is no valid connection to HiveServer2 yet, you can create it here by clicking "New Connection".
- Select the required HiveServer2 connection and confirm with "Select". The selected connection is displayed in the 'Connection' field.
- Click "Next". The 'Data Details' section opens.
- Select the database to export into.
INFO: To export to a new table on the database, select "Create new if not exist", enter a new table name under "Name", and select the output format from the "Table Output Format" drop-down.
INFO: To export to an existing table on the Hive server, select "Use existing". The selected file format is ignored for existing tables.
- Select a record replacement policy.
INFO: 'Drop and create new table' - The existing table is dropped from the database, and a new table is created with the schema defined by the worksheet being exported. If the initial table was external, Spectrum drops it and creates a new external table. This replacement policy is only available for unpartitioned tables, and the data is written as a TEXTFILE (plain text).
INFO: 'Overwrite records' - All records from the existing table are deleted and the new records are written to the table. No table properties are modified, and the data is written in the original file type (e.g. 'ORC').
INFO: 'Append records' - The records are appended to the existing table.
- Confirm with "Next". The section 'Mapping' opens.
- Check the preview and decide which columns are included and whether invalid data is accepted.
INFO: All checkboxes are activated by default.
- Decide how to handle invalid data and confirm with "Next". The section 'Schedule' opens.
- Select the schedule method for when to run the export job.
- Confirm with "Next".
- If needed, add an export job description and mark the check box if you want the export job processed right after saving.
- If needed, add your email address in the 'Notification' section and confirm with "Save". The dialog 'Save Export Job' opens.
INFO: Spectrum can only send email notifications if an email server has been configured.
- Select the place to save the export job, name it, and confirm with "Save". Exporting data is finished.
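The three record replacement policies correspond roughly to standard HiveQL operations. The sketch below is illustrative Python only; the table name, columns, and the `staging_sheet` source are hypothetical, and Spectrum generates the actual statements internally:

```python
# Conceptual sketch of the HiveQL each record replacement policy maps to.
# Names here are hypothetical; Spectrum builds the real DDL/DML itself.

def replacement_policy_sql(policy: str, table: str, columns: dict) -> list:
    """Return a rough HiveQL sequence for a given replacement policy."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    if policy == "drop_and_create":
        # 'Drop and create new table': unpartitioned tables only,
        # new table is written as TEXTFILE (plain text).
        return [f"DROP TABLE IF EXISTS {table}",
                f"CREATE TABLE {table} ({cols}) STORED AS TEXTFILE"]
    if policy == "overwrite":
        # 'Overwrite records': table properties and file format (e.g. ORC) are kept.
        return [f"INSERT OVERWRITE TABLE {table} SELECT * FROM staging_sheet"]
    if policy == "append":
        # 'Append records': rows are added to the existing table.
        return [f"INSERT INTO TABLE {table} SELECT * FROM staging_sheet"]
    raise ValueError(f"unknown policy: {policy}")

for stmt in replacement_policy_sql("drop_and_create", "sales",
                                   {"id": "bigint", "amount": "double"}):
    print(stmt)
```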
Implicit Datatype Mapping#
INFO
The mapping table below describes how Spectrum creates a table schema when a new one must be created, and how a Spectrum value is validated and transformed into a Hive value type. See the Hive documentation on allowed implicit conversions for further details.
For exports into an existing table, type mapping validation is performed to make sure records match the expected table column structure.
Spectrum Value Type | Hive Equivalent | Boolean | TinyInt | SmallInt | Int | BigInt | Float | Double | Decimal | String | Varchar | Timestamp | Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STRING | string | x | x | x (♦) | x | ||||||||
BOOLEAN | boolean | x (♦) | |||||||||||
FLOAT | double | x | x (♦) | x | x | x | |||||||
INTEGER | bigint | x | x | x | x (♦) | x | x | x | x | x | |||
DATE | timestamp | x (1) | x | x (♦) | x | ||||||||
BIG_DECIMAL | decimal | x (♦) (2) | x | x | |||||||||
BIG_INTEGER | - | x | x (3) | x (♦) | x | ||||||||
List | array | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) |
(♦) The default transformation for a newly created table - describes which Hive type Spectrum uses when creating a new table.
(1) Dates are transformed to strings with the given date pattern; otherwise the default Hive date pattern (yyyy-MM-dd HH:mm:ss) is used for formatting.
(2) Spectrum's BigDecimal has unlimited precision/scale; this conversion is limited to Hive's decimal with a maximum precision of 38 and scale of 38.
(3) Spectrum's BigInteger has unlimited precision; this conversion is limited to Hive's decimal with a maximum precision of 38 and scale of 0.
(4) Conversion of a Spectrum List to a Hive array uses the same logic to convert any underlying Spectrum value type.
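The default mappings can be summarized as a simple lookup. The sketch below is derived from the 'Hive Equivalent' column and footnotes (2)-(4) above; it covers only the defaults for newly created tables, not the full conversion matrix:

```python
# Default Hive column types Spectrum uses when creating a new table,
# summarized from the 'Hive Equivalent' column and footnotes above.
DEFAULT_HIVE_TYPE = {
    "STRING": "string",
    "BOOLEAN": "boolean",
    "FLOAT": "double",
    "INTEGER": "bigint",
    "DATE": "timestamp",
    "BIG_DECIMAL": "decimal",  # capped at precision 38 / scale 38, see (2)
    "BIG_INTEGER": "decimal",  # precision up to 38, scale 0, see (3)
    "LIST": "array",           # element type converted with the same rules, see (4)
}

def hive_column_type(spectrum_type: str) -> str:
    """Look up the default Hive column type for a Spectrum value type."""
    return DEFAULT_HIVE_TYPE[spectrum_type]

print(hive_column_type("INTEGER"))  # -> bigint
```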
Exporting to a Tableau Server#
INFO
Spectrum supports Tableau versions 9 and 10 in the formats HYPER, TDE, and TDSX. Older versions might not work as expected and are not supported.
Using the HYPER format requires at minimum CentOS 7 with libc6 version >2.15 and libstdc++6 version >6.1.0. Local execution can be used but is not scalable.
Requirement: Configuring Tableau Server as a Connection#
A Tableau Server connection with Spectrum must be created before exporting data.
Configuring Export Jobs with Tableau Server#
INFO
Ports 80 (HTTP) and/or 443 (HTTPS) need to be open to export data from Spectrum to a Tableau server.
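A quick way to verify that these ports are reachable is a TCP connection test. This is a minimal sketch; the host name is a placeholder for your actual Tableau server:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False

# '127.0.0.1' is a placeholder - substitute your Tableau server's host name.
for port in (80, 443):
    status = "open" if port_open("127.0.0.1", port) else "closed"
    print(f"port {port}: {status}")
```

A `closed` result points at a firewall or networking issue between Spectrum and the Tableau server rather than a configuration problem inside Spectrum.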
To export a worksheet to a Tableau connector:
- Click the "+" button and choose "Export Job" or right-click in the browser and select "Create New" → "Export Job". The 'New Export Job' tab appears in the menu bar.
- Select the Workbook to export from in the browser by clicking on "Select". The dialog 'Select Workbook' opens.
- Select the Workbook and confirm with "Select". The name of the Workbook appears in the Workbook field.
- Click "Next".
- Select the worksheet to export from the drop-down menu. The name of the sheet is displayed in the 'Select Sheet' field.
INFO: Only kept sheets are available for export. If the sheet you want to export is missing, modify the Workbook settings to keep the sheet.
- Click "Next".
- Select the Tableau connector by clicking "Select Connection". The dialog 'Select Connection' opens.
INFO: If there is already a valid Tableau connection, you find it under 'Connection'.
INFO: If there is no valid connection to Tableau Server yet, you can create it here by clicking "New Connection".
- Select the required Tableau Server connection and confirm with "Select". The selected connection is displayed in the 'Connection' field.
- Click "Next". The 'Data Details' section opens.
- Enter the name under which the Tableau data source will be published in the 'Data source name' field.
- Select the publish options.
- Select the file format 'HYPER' from the drop-down for exporting data.
INFO: If you don't fulfil the requirements for using the Tableau Hyper file format, select 'TDE'.
INFO: Downloading data without an export job is only possible for the TDSX format.
- Select the settings for the owner.
- Enter the upload chunk size.
INFO: You can tune this value when troubleshooting network issues.
- Click "Next". The section 'Mapping' opens.
- Check the preview and decide which columns are included and whether invalid data is accepted.
INFO: All checkboxes are activated by default.
- Decide how to handle invalid data.
- If needed, decide on time-based partitions and click "Next". The section 'Schedule' is shown.
INFO: This step only appears when the data is partitioned.
- Select the schedule method for when to run the export job.
- If needed, add advanced information, e.g. custom properties, and confirm with "Next". The section 'Save' is shown.
- Add an export job description.
- Mark the check box if you want the export job processed right after saving.
- If needed, add your email address in the 'Notification' section and confirm with "Save". The dialog 'Save Export Job' opens.
INFO: Spectrum can only send email notifications if an email server has been configured.
- Select the place to save the export job, name it, and confirm with "Save". The export job is displayed in the browser. Exporting to a Tableau Server is finished.
Export-Only Workbooks#
In order to support zero-copy data pipelines, Workbook results can be configured for export only. Using data links as input for the Workbook is required for a true zero-copy pipeline. Import jobs can be used for input as well, but only the Workbook results are removed after the export job executes.
The following assumes an import job/data link has been created and the Workbook is about to be saved.
Workbook Configuration Details#
While saving the Workbook, select "Export only" under the "Workbook Settings" tab and "Data Retention" section.
Only the saved sheets are exportable - for example, a kept sheet named "finalResults" would be available for export.
Export Job Configuration Details#
The following assumes you have read through Exporting Data and understand the basics of configuring an export job. Most of the configuration for the export job is the same, except that you can't choose a schedule for the export job to run. With an export-only Workbook, the export job is always triggered when the Workbook executes successfully.
Miscellaneous Details#
If the Workbook is configured to be export only, but no export jobs are associated with it, the generated data is kept for one day. If during that period an export job is created and then triggered manually, the Workbook's data will be deleted after the export job runs. This allows you to test results of a Workbook or to configure an existing Workbook to be export only without having to manually delete the Workbook's data.
Exporting to Power BI#
The Power BI plug-in allows users to export Workbook data from Spectrum by creating a connection to Power BI's cloud offering in Azure. Users with the Power BI desktop version for Windows should instead use an integration link.
Configuring Export Jobs with a Power BI Connection#
To export a worksheet to a Power BI connector:
- Select the "+" at the top left of the "File Browser" and select "Export Job", right click on a folder, select "Create New" then select "Export Job", or right click on the Workbook to export from and select "New Export Job".
- Select the Workbook to export from in the browser.
- Select the worksheet to export from the drop down menu.
- Select the Power BI connector.
- Check the Data Details for the Dataset and Table names. Select a retention policy:
  - "None" takes up to the first 5 million rows of data.
  - "FIFO" (first in, first out) collects data in the table until it hits 200,000 rows. Once the data goes beyond 200,000 rows, the oldest rows are dropped from the dataset.
- Preview the columns of the Workbook to be included and select how to handle invalid data.
- Select when to run the export job. The job can be run manually by a user, each time the Workbook in Spectrum is calculated, or at specifically scheduled times.
INFO: Custom properties can be added in the advanced section, as well as logging specifications.
- Enter a description and click "Save".
Power BI Export Job Details#
- Float type columns are not supported and are converted to the string data type.
- Power BI limitations cause all datasets to have one table.
- The date/time of the latest save to a Power BI export is not saved.
Example: A user uploads a dataset/table with 1000 records at 2:21:58 pm. At a later time, a user uploads 100 records to the same dataset/table. The previous data is overwritten and only the 100 records are saved. The date/time stays at 2:21:58 pm.
- If the schema for the worksheet in Spectrum changes, the export job wizard for Power BI needs to be run again.
Example: A user adds an additional column to a sheet in a Workbook. The Spectrum export job wizard needs to be configured and run for the new schema to be exported.
- If a sheet name changes in the Workbook, the export job wizard for Power BI needs to be run again.
Example: A user changes a sheet name in a Workbook. The Spectrum export job wizard needs to be configured so the correct dataset is exported.
- The Record Replacement Policy only supports overwriting and does not yet have an append option.
Power BI REST API Limitations#
The Power BI REST API has the following limitations: https://msdn.microsoft.com/en-us/library/dn950053.aspx
Spectrum does not enforce these limitations or give error messages when Power BI limits are reached.