Importing Files#
This page covers all information about importing files from a data source. Find the general information for creating an import here.
Importing HTML Files#
By default, the maximum size of an HTML file for importing or uploading is 4194304 bytes (4 MB). This limit can be changed with the property:
das.import.whole-text-file.max-size=<size in mb>
Set this for a specific job by adding the property to the Custom Properties field in the job's configuration settings.
Set this globally by adding the property to the file <Spectrum path>/das-common.properties. Spectrum doesn't need to be restarted after this property is added.
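For example, to raise the limit for all jobs to 8 MB, the entry in <Spectrum path>/das-common.properties might look like the following sketch (assuming, per the placeholder above, that the value is given in megabytes):
```
# Allow HTML files up to 8 MB to be imported or uploaded
das.import.whole-text-file.max-size=8
```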
To import HTML files:
- Click on "+" and select "Import Job", or right-click in the File Browser, select "Create new", and select "Import Job". The 'New Import Job' tab appears in the menu bar.
- Click "Select Connection". The dialog 'Select Connection' opens.
- Click on the required connection name and confirm with "Select". The connection is displayed.
- Select "HTML File Type" from the drop-down and confirm with "Next". The tab 'Data Details' opens.
- Click on "Browse". The dialog 'Remote Data Browser' opens. Select the required HTML file or folder and confirm with "Select".
- Review the sample data.
- Save and run the new import job or file upload.
Importing JSON Files#
To import JSON files:
- Click on "+" and select "Import Job", or right-click in the File Browser, select "Create new", and select "Import Job". The 'New Import Job' tab appears in the menu bar.
- Click "Select Connection". The dialog 'Select Connection' opens.
- Click on the required connection name and confirm with "Select". The connection is displayed.
- Select "JSON" from the drop-down and confirm with "Next". The tab 'Data Details' opens.
- Click on "Browse". The dialog 'Remote Data Browser' opens. Select the required JSON file or folder and confirm with "Select".
- If needed, enter advanced settings (see the sketch after these steps):
  - Select the character encoding type.
  - Define how many records Spectrum should read to detect the column type.
  - Define the root path where parsing should begin.
  - Define the object properties depth to which your JSON file is parsed.
  - Define JSON paths for which a field is always created.
  - Define JSON paths that don't need to be expanded.
  - Define JSON paths that can be ignored and don't need to be imported.
  - Select whether an "Additional Fields" column should be added to store extra fields as a JSONArray.
  - Select whether to include the root JSON array when searching the records during import.
  - Select whether to ignore the text before and after the JSON object.
  - Enter columns you want to have obfuscated upon import.
- View a sample of the data set to confirm this is the JSON data source you want to use. Use the checkboxes to select which fields to import into Spectrum. Click "Next".
- Define the schedule details, select whether to replace or append data, and click "Next". See Configuring Workbook Settings for information on the schedule details. If you append data with a sliding window, define when the window expires and how many results to keep.
- Add a description, name the file, click the checkbox to start the import immediately if desired, and click "Save". You can also specify notification emails to be sent for error messages received and when a job has successfully run.
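The path-related settings above are easiest to picture against a nested record. In the hypothetical record below (the field names are invented for illustration), the root path determines where parsing begins, the object properties depth determines how deeply nested objects such as the address are parsed, and the keep/ignore path settings force or suppress individual fields. The exact path syntax depends on your Spectrum version.
{
"customer": {
"name": "Jane Doe",
"address": { "city": "Berlin", "country": "DE" }
},
"orders": [
{ "id": 1001, "total": 49.9 },
{ "id": 1002, "total": 15.0 }
]
}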
Example#
If your JSON data is located within an array, Spectrum can't read the import without configuring an advanced setting.
JSON data within an array
[
{
"fruit": "Apple",
"size": "Large",
"color": "Red"
}
]
If you try to parse this data without changing the advanced import settings in Spectrum, you receive the error: "Can not parse input: Could not read any raw record from: <your file>"
Spectrum supports parsing JSON data from within a root array when the Include root array checkbox is selected.
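For comparison, the same record written as a plain JSON object without the surrounding array parses without any advanced settings:
{
"fruit": "Apple",
"size": "Large",
"color": "Red"
}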
Importing Key-Value Pairs#
To import key-value pairs:
- Click on "+" and select "Import Job", or right-click in the File Browser, select "Create new", and select "Import Job". The 'New Import Job' tab appears in the menu bar.
- Click "Select Connection". The dialog 'Select Connection' opens.
- Click on the required connection name and confirm with "Select". The connection is displayed.
- Select "Key/Value pairs" from the drop-down and confirm with "Next". The tab 'Data Details' opens.
- Specify the file and folder location.
- Fill in the Delimiter and Separator fields (see the sketch after these steps).
- View a sample of the data set to confirm this is the key-value pair scheme you want to use. Use the checkboxes to select which fields to import into Spectrum.
- Define the schedule details, specify whether to replace or append data, and click "Next".
  INFO: If you append data with a sliding window, define when the window expires and how many results to keep.
- Select which groups have view, edit, and run access permissions and specify what access permissions all users have.
- Add a description, name the file, select the checkbox to start the import immediately if desired, and click "Save". You can also specify notification emails to be sent for error messages received and when a job has successfully run.
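As a hypothetical illustration (the characters and field names below are examples, not Spectrum defaults), a file in which ";" delimits the pairs and "=" separates each key from its value would produce the columns host, status, and time:
host=web01;status=200;time=2012-08-14T10:15:32
host=web02;status=404;time=2012-09-03T10:15:33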
Importing XML Files and Using XPath#
To import XML files:
- Click on "+" and select "Import Job", or right-click in the File Browser, select "Create new", and select "Import Job". The 'New Import Job' tab appears in the menu bar.
- Click "Select Connection". The dialog 'Select Connection' opens.
- Click on the required connection name and confirm with "Select". The connection is displayed.
- Select "XML" from the drop-down and confirm with "Next". The tab 'Data Details' opens.
- Select the XML file to import. In the "XML Log Message" section, enter the XML record tag name for the data you want to import. Don't include the opening and closing brackets (<,>).
  In the "Field XPaths" sections, enter path expressions to specify what data to import.
  INFO: See additional examples of Parsing XML File Format.
  INFO: The XML parser ignores namespaces. To parse namespaced values, the local-name() function must be used, for example:
  local-name([node-set]//text)
  (A general XPath sketch for namespaced elements follows these steps.)
- Review the sample data.
- Save and run the new import job or file upload.
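As a general XPath illustration (standard XPath, not specific to Spectrum's parser), matching an element by its local name regardless of any namespace prefix typically looks like the following; the element name emp_id is taken from the example below:
//*[local-name()='emp_id']/text()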
Examples#
INFO: This example imports the employee data from the following XML file.
<DataSet1>
<employee number="1">
<emp_id>PAJASI2345</emp_id>
<fname>Mathis</fname>
<minit>J</minit>
<lname>Parkers</lname>
<job_id>24</job_id>
<job_lvl>87</job_lvl>
<pub_id>55869</pub_id>
<hire_date>2012-08-14</hire_date>
</employee>
<employee number="2">
<emp_id>KAIJAF45354</emp_id>
<fname>Jeff</fname>
<minit>L</minit>
<lname>Johnson</lname>
<job_id>25</job_id>
<job_lvl>87</job_lvl>
<pub_id>55369</pub_id>
<hire_date>2012-09-03</hire_date>
</employee>
<employee number="3">
<emp_id>ASDFKI68464</emp_id>
<fname>Tess</fname>
<minit>A</minit>
<lname>Lane</lname>
<job_id>26</job_id>
<job_lvl>87</job_lvl>
<pub_id>55314</pub_id>
<hire_date>2012-10-18</hire_date>
</employee>
</DataSet1>
In the ""XML Record Tag Name" field, enter employee.
In order to import the data, you must use XPath expressions to specify what data to import.
In this example, you have one attribute for the record tag name and eight different child nodes under the parent employee.
Record tag attribute: number
Child nodes: emp_id, fname, minit, lname, job_id, job_lvl, pub_id, and hire_date.
This example uses the expression // in front of each node so that the parser matches that node anywhere within an employee node.
//employee/@number
//emp_id/text()
//fname/text()
//minit/text()
//lname/text()
//job_id/text()
//job_lvl/text()
//pub_id/text()
//hire_date/text()
Click "Next" to review the data after all XPath expressions have been written. The imported records should resemble the sketch below.
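As an illustration derived from the sample data above (not a product screenshot), the first imported record would contain values along these lines:
number: 1
emp_id: PAJASI2345
fname: Mathis
minit: J
lname: Parkers
job_id: 24
job_lvl: 87
pub_id: 55869
hire_date: 2012-08-14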
XML Tag and XPath Behavior#
The XML data:
<employees>
<emp number="1">
<firstname>Scott</firstname>
<lastname>Pilgrim</lastname>
<location>
<city>Atlanta</city>
<country>USA</country>
<state>Georgia</state>
</location>
<position>Sales</position>
<comment></comment>
<active />
</emp>
<emp number="2">
<firstname>Kim</firstname>
<lastname>Pine</lastname>
<location>
<city>San Francisco</city>
<country>USA</country>
<state>California</state>
</location>
<position>Developer</position>
<comment></comment>
</emp>
<emp number="3">
<firstname>Ramona</firstname>
<lastname>Flowers</lastname>
<location>
<city>Berlin</city>
<country>Germany</country>
<state></state>
</location>
<position></position>
<comment></comment>
<active></active>
</emp>
<emp number="">
</emp>
</employees>
INFO: See W3C.org for full documentation on XML Path Language (XPath) expressions.
The following examples show the results returned by different XPath expressions:
/emp/@number
XML tag / tag attribute:
``` xml
<emp number="1"></emp>
<emp number="2"></emp>
<emp number="3"></emp>
<emp number=""></emp>
```
Result: 1, 2, 3, NULL
INFO: The empty attribute is handled as a NULL value.
/emp/position/text()
XML tag / tag attribute:
``` xml
<position>Sales</position>
<position>Developer</position>
<position></position>
<!-- tag missing -->
```
Result: Sales, Developer, NULL, NULL
INFO: If the XPath points to the text content of a tag that is empty or missing, the value is NULL.
/emp/position
XML tag / tag attribute:
``` xml
<position>Sales</position>
<position>Developer</position>
<position></position>
<!-- tag missing -->
```
Result: Sales, Developer, true, NULL
INFO:
1. If the XPath points to a tag and the tag is empty, the value is set as (Boolean) true.
2. If the XPath points to a missing tag, the value is NULL.
/emp/comment/text()
XML tag / tag attribute:
``` xml
<comment></comment>
<comment></comment>
<comment></comment>
<!-- tag missing -->
```
Result: Data isn't imported in this example.
Comment: If the XPath points to the text content of a tag that is empty or missing in all target tags, the column doesn't import.
/emp/location
XML tag / tag attribute:
``` xml
<location>
<city>Atlanta</city>
<country>USA</country>
<state>Georgia</state>
</location>
<location>
<city>San Francisco</city>
<country>USA</country>
<state>California</state>
</location>
<location>
<city>Berlin</city>
<country>Germany</country>
<state></state>
</location>
```
Result: Each location element is imported as a single JSON string of its sub-tags.
INFO: If the XPath points to a tag that contains sub-tags, the sub-tags are represented as a JSON string.
Importing Fixed-Width Files#
A fixed-width file is a text file in which each field occupies a fixed number of character positions, so the columns line up when the file is viewed in a font whose characters all occupy the same amount of horizontal space.
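For example, in the following illustrative snippet (sample data, not from Spectrum), the name field occupies character positions 1-10, the city field positions 11-20, and the hire date positions 21-30:
```
Mathis    Atlanta   2012-08-14
Jeff      Berlin    2012-09-03
Tess      Austin    2012-10-18
```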
To create an import job from a fixed width file:
- Click the + (plus) button and select Import Job, or right-click in the browser and select Create new Import job.
- Click Select Connections, select the connection and click Select, then select the Fixed Width file type and click Next. Click New Connection to add a new connection if needed.
- Enter the file or folder location and select whether the first row contains the column headers. You can use wildcard characters. Click Next.
- View a sample of the data set to confirm this is the fixed width data source you want to use. Use the checkboxes to select which fields to import into Spectrum.
- From the Define Fields portion of the wizard, you can adjust your fixed width file to align properly, as well as update the data types for each field and specify the format for the date field.
- Define the schedule details, select whether to replace or append data, and click Next. See Configuring Workbook Settings for information on the schedule details. If you append data with a sliding window, define when the window expires and how many results to keep.
- Select which groups have view, edit, and run access permissions and specify what access permissions all users have.
- Add a description, name the file, click the checkbox to start the import immediately if desired, and click Save. You can also specify notification emails to be sent for error messages received and when a job has successfully run.
Importing Log4j Files#
Log4j is a popular logging package written in Java. One of its distinctive features is the notion of inheritance in loggers. Using a logger hierarchy, it is possible to control which log statements are output at arbitrary granularity. This helps reduce the volume of logged output and minimize the cost of logging.
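As a general illustration of that hierarchy (a standard Log4j 1.x properties sketch, not part of the Spectrum import itself), a child logger inherits the root logger's appenders while overriding its level:
```
# Root logger: only WARN and above, written to the console appender
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %p %c: %m%n
# Child logger: inherits the console appender but logs at DEBUG granularity
log4j.logger.com.example.importer=DEBUG
```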
To import a Log4j file:
- Open Spectrum and create a new Import Job or File Upload.
- Select your connector where the Log4j log file is stored and then select the Log4j Log File data type.
- Select the log file to import. In the Log File Details section, type in the log and timestamp pattern. The log pattern should be provided using keywords such as the following:
  - LOGGER
  - MESSAGE
  - TIMESTAMP
  - NDC
  - LEVEL
  - THREAD
  - CLASS
  - FILE
  - LINE
  - METHOD
  - PROP(<prop_name>)
- Review the sample data.
- Save and run the new import job or file upload.
Examples#
Example parsing Spectrum's conductor log#
Log pattern example:
LOGGER LEVEL [TIMESTAMP] THREAD (FILE:LINE) - MESSAGE
Timestamp pattern example:
yyyy-MM-dd HH:mm:ss.SSS
Example parsing a Hive log#
Log pattern example:
TIMESTAMP LEVEL CLASS: MESSAGE
Timestamp pattern example:
yyyy-MM-dd HH:mm:ss,SSS
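For instance, a hypothetical log line matching this pattern and timestamp format (the class name and message are invented for illustration) would be parsed into TIMESTAMP, LEVEL, CLASS, and MESSAGE fields:
2015-03-10 14:22:31,457 INFO org.apache.hadoop.hive.ql.Driver: Completed compiling command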