Linking to Data
A data link lets you feed data into a Workbook without using an import job. The two differ in when the full data set is loaded: a data link fetches only the preview data for the Workbook view and uses the full data set when you run the Workbook, whereas an import job loads the full data set when you run the import job itself. You can create a data link from an existing connection or by selecting a new connection.
You can edit, rename, create a copy, run, view details and information, view the full data, or delete an existing data link.
Creating a Data Link
To create a data link:
- Click the "+" button and select "Data Link" or right-click in the File Browser and use the context menu to select "New Data Link".
- Click "Select Connection" to select an existing connection, or click "New Connection" to add a new connection if needed.
- Specify the "File Type". See the sections that follow for additional details about importing each of the file types.
- Apache log: Specify the file or folder and the log format. See the samples provided in the dialog box for details.
- CSV/TSV files: Specify the delimiter, such as "\t" for tab, "," for comma, or ";" for semicolon, specify whether the first row contains the column headers, and click "Advanced Settings". In Advanced Settings, specify the escape character (the character that follows it is not processed as a special character and is shown as is) and set the quote character. If strict quoting is checked, characters outside the quotes are ignored.
- Fixed width: Specify the file or folder and specify whether the first row contains the column headers.
- Mbox: Specify the file or folder. This is a format used for collections of electronic mail messages.
- Text files: Specify the file or folder and a regex pattern for processing the data, and specify whether the first row contains the column headers.
- Twitter data: Specify the file or folder.
- Enter file and folder information, schema detection, date and time filters, time partitions (see below), or advanced controls in the Data Details tab and click "Next".
- View a sample of the data set to confirm this is the data source you want to use, and mark the checkboxes to select which fields to link into Spectrum.
INFO: You can specify how to handle empty fields and invalid data. The 'Empty value placeholders' section gives you the ability to assign specific values as NULL. Values added here are not imported into Spectrum.
- Click "Next".
- Define the schedule details and click "Next".
- Enter a description and select how to handle sample data.
INFO: You can either 'Generate Sample' data immediately after saving the data link (this saves the sample data to the Spectrum private folder during the data link ingestion) or 'Defer Sample generation to Workbook', which defers sample generation until the data link is used in a Workbook. Deferring sample generation is useful when the data link is available to multiple users but each user may have different access to the original source of the data link. When sample generation is deferred, the sample data is created and stored in the Spectrum private folder by the Workbook user rather than by the data link creator. Access to this sample data still follows impersonation security policies (e.g., Sentry/Ranger).
- Click "Save".
- Name the file, and click "Save".
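To make the CSV/TSV settings and the 'Empty value placeholders' option from the steps above more concrete, here is a minimal sketch using Python's standard `csv` module. It is not Spectrum's parser: the function name `parse_delimited`, its parameters, and the `SAMPLE` data are hypothetical, and strict quoting is not modeled.

```python
import csv
import io

# Hypothetical semicolon-delimited sample; the first row holds the column headers.
# Row 1 quotes a value that contains the delimiter, row 2 escapes the delimiter.
SAMPLE = 'id;name;note\n1;"Smith; Jane";N/A\n2;Lee\\;Ann;\n'


def parse_delimited(text, delimiter=",", quotechar='"', escapechar="\\",
                    has_header=True, null_placeholders=("",)):
    """Parse delimited text and map placeholder values to None (NULL).

    Illustration only: the parameters mirror the dialog settings described
    above, but the behavior is that of Python's csv module, not Spectrum.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quotechar, escapechar=escapechar)
    rows = list(reader)
    if not rows:
        return [], []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # No header row: generate generic column names.
        header = [f"col{i + 1}" for i in range(len(rows[0]))]
        data = rows
    placeholders = set(null_placeholders)
    cleaned = [[None if value in placeholders else value for value in row]
               for row in data]
    return header, cleaned


header, rows = parse_delimited(SAMPLE, delimiter=";",
                               null_placeholders=("", "N/A"))
print(header)
print(rows)
```

In this sketch the quoted and the escaped semicolons stay inside the `name` field, while `N/A` and the empty string are treated as NULL rather than kept as values.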
Scheduling Job Runtime
You can choose to run jobs manually or at a time you specify.
- In the "Schedule" tab, select "Manually" or "On a schedule".
- For scheduled jobs, specify an interval, day of week, hour and minute for the data link to run.
- Click "Save" to save your changes.
INFO: The schedule of a data link can also be viewed or edited from the inspector in the File Browser.
INFO: Schedules created with non-complex cron patterns are converted automatically in the Inspector. Select "Schedule Type" and "Custom" from the drop-down menu to view or edit the schedule cron pattern.
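As a rough illustration of how a day of week, hour, and minute relate to a cron pattern, the sketch below builds standard five-field Unix cron expressions (minute, hour, day of month, month, day of week). The cron dialect Spectrum uses is not stated here, so the field layout is an assumption; some schedulers, Quartz for example, add a seconds field. The helper names `weekly_cron` and `daily_cron` are hypothetical.

```python
# Day-of-week numbers as used by five-field Unix cron (0 = Sunday).
DAYS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def _check_time(hour, minute):
    """Validate the hour and minute ranges accepted by cron."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute out of range: {minute}")


def weekly_cron(day_of_week, hour, minute):
    """Return a cron pattern that fires once a week at the given time."""
    key = day_of_week.strip().lower()[:3]
    if key not in DAYS:
        raise ValueError(f"unknown day of week: {day_of_week}")
    _check_time(hour, minute)
    return f"{minute} {hour} * * {DAYS[key]}"


def daily_cron(hour, minute):
    """Return a cron pattern that fires every day at the given time."""
    _check_time(hour, minute)
    return f"{minute} {hour} * * *"


print(weekly_cron("Monday", 6, 30))  # 30 6 * * 1
print(daily_cron(23, 5))             # 5 23 * * *
```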
Editing a Data Link
To edit a data link:
- Click the "File Browser" tab.
- Click "Data Links" from the navigation box on the left side.
- Right-click the data link you want to edit and select "Open".
- Make your changes and click "Next" to move through the wizard.
- Click "Save" when finished.
Copying a Data Link
To copy a data link:
- Click the "File Browser" tab.
- Click "Data Links" from the navigation box on the left side.
- Right-click the data link you want to copy and select "Duplicate". The copy is named "copy of " followed by the name of the original data link.
Running a Data Link
To run a data link:
- Click the "File Browser" tab.
- Click "Data Links" from the navigation box.
- Right-click the data link you want to run and select "Run".
INFO: Depending on the volume of data, this could take a while.
Deleting a Data Link
The delete feature deletes the data link in Spectrum but doesn't delete the actual data.
To delete a data link:
- Click the "File Browser" tab.
- Click "Data Links" from the navigation box.
- Right-click the data link you want to delete and select "Delete".
- Click "OK" and then confirm the deletion.
Setting Permissions
INFO: Only an administrator can set permissions.
To set permissions for a data link:
- Click the "File Browser" tab.
- Click "Data Links" from the navigation box.
- Right-click the data link you want to set permissions for and select "Information".
- Optionally add one or more groups to this link. Set view, edit, and run permissions for each group.
- Set view, edit, and run permissions for all users.
Viewing Import Job Upload Size and Monthly Upload Sizes
You can view the number of bytes processed by each upload and the total volume that counts toward the license term.
To view the processed bytes per single job execution and totals for that job configuration of the data link:
- Click the "File Browser" tab.
- Click "Data Links" from the navigation box.
- The size of the last job run is displayed first, and the total for that job configuration is displayed to the right in parentheses.
INFO: If a new license term starts and the data link is processed again, the count starts with a new total processed data amount.
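The accounting described above (size of the last run, a running total in parentheses, and a total that starts over with a new license term) can be sketched as follows. This is an illustration of the bookkeeping only, not Spectrum's implementation; the class `UploadCounter`, the license term identifiers, and the display format are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UploadCounter:
    """Tracks processed bytes for one data link job configuration."""
    term_id: str               # hypothetical identifier of the current license term
    last_run_bytes: int = 0    # bytes processed by the most recent run
    term_total_bytes: int = 0  # bytes processed so far in the current term

    def record_run(self, processed_bytes: int, term_id: str) -> None:
        """Record one job run; a new license term restarts the total."""
        if term_id != self.term_id:
            self.term_id = term_id
            self.term_total_bytes = 0
        self.last_run_bytes = processed_bytes
        self.term_total_bytes += processed_bytes

    def display(self) -> str:
        """Last run size first, term total to the right in parentheses."""
        return f"{self.last_run_bytes} B ({self.term_total_bytes} B)"


counter = UploadCounter(term_id="term-1")
counter.record_run(100, "term-1")
counter.record_run(50, "term-1")
print(counter.display())  # 50 B (150 B)

counter.record_run(70, "term-2")  # a new license term starts a new total
print(counter.display())  # 70 B (70 B)
```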