Data Field Types in Spectrum

INFO

When importing into Spectrum, it is important to know which type of data you are importing or need to import. When you use your data within Workbooks, certain Spectrum functions work only with certain types of data. Likewise, certain functions return only a specific type of data.

Data Field Types

| Field type | Description | Internal representation |
|---|---|---|
| INTEGER | 64-bit integer value | Java Long |
| BIG_INTEGER | Unlimited integer value | Java BigInteger |
| FLOAT | 64-bit float value | Java Double |
| BIG_DECIMAL | High-precision float value | Java BigDecimal |
| DATE | Date object | Java Date |
| STRING | String object | Java String |
| BOOLEAN | Boolean object | Java Boolean |
| LIST | A collection of multiple values of one data type | |
| NUMBER | Float, big decimal, integer, or big integer | |
| ANY | Float, big decimal, integer, big integer, date, string, list, or Boolean | |

Integer

In mathematics, integers (also known as whole numbers) are made up of the natural numbers including zero (0, 1, 2, 3, ...) along with the negatives of the natural numbers (-1, -2, -3, ...). In computer programming, an integer type needs a defined minimum and maximum value. Spectrum uses a 64-bit integer, which represents whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
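
As a quick illustration of these bounds, the following Python sketch derives the 64-bit signed range and checks whether a value fits in it. The helper name `fits_in_integer` is hypothetical and is not part of Spectrum.

```python
# Bounds of a 64-bit signed integer (the range of a Java Long).
INTEGER_MIN = -(2**63)
INTEGER_MAX = 2**63 - 1


def fits_in_integer(value: int) -> bool:
    """Return True if value lies within the 64-bit signed range."""
    return INTEGER_MIN <= value <= INTEGER_MAX


print(INTEGER_MIN)                       # -9223372036854775808
print(INTEGER_MAX)                       # 9223372036854775807
print(fits_in_integer(INTEGER_MAX + 1))  # False
```

Values that fail this check are candidates for the big integer type described next.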

Big Integer

Big integers are like integers, but they are not limited to 64 bits. They are represented using arbitrary-precision arithmetic and hold only whole numbers. Because Spectrum allows a larger range of values than Hive, big integers are written as strings into a Hive table if you export. By default, the precision for big integers is set to 32 upon import. This can be updated if needed in the custom properties by changing the value of das.big-decimal.precision=.
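
To show why a string is a safe carrier for such values, here is a minimal Python sketch. It only illustrates the round trip of an arbitrary-precision whole number through text and makes no claim about how Spectrum performs the export internally.

```python
# A whole number well outside the 64-bit signed range.
big_value = 2**100

# Python integers are arbitrary precision, so no digits are lost.
as_text = str(big_value)  # serialize as a string
restored = int(as_text)   # parse the string back into a number

print(as_text)                # 1267650600228229401496703205376
print(restored == big_value)  # True
```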

Float

In mathematics, there are real numbers that represent fractions (1/2, 12/68) or numbers with decimal places (12.75, -18.35). Spectrum uses double-precision floating-point representation (also known as float) to manipulate and represent real numbers. The complete range of numbers that can be represented this way is approximately 2^-1022 through (1 + (1 - 2^-52)) × 2^1023. During import/upload, Spectrum automatically recognizes a number with either a single period (.) or a single comma (,) as a decimal separator and defines this data as a float data type. After ingestion, Spectrum stores float and big decimal values using a period (.) character. The auto schema detection for the float data type works with CSV, JSON, XML, and key/value files.
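
The following Python sketch checks the two range bounds against the platform's double-precision limits and illustrates the separator rule. The function `parse_decimal` is a hypothetical helper written for this example; it is not Spectrum's detection logic, only an approximation of the rule described above.

```python
import sys

# Smallest positive normal double and largest finite double.
smallest_normal = 2.0 ** -1022
largest_finite = (1 + (1 - 2.0 ** -52)) * 2.0 ** 1023

print(smallest_normal == sys.float_info.min)  # True
print(largest_finite == sys.float_info.max)   # True


def parse_decimal(text: str) -> float:
    """Parse text with exactly one '.' or ',' as the decimal separator."""
    separators = text.count(".") + text.count(",")
    if separators != 1:
        raise ValueError(f"expected exactly one decimal separator: {text!r}")
    # Normalize to a period, the character used after ingestion.
    return float(text.replace(",", "."))


print(parse_decimal("12,75"))   # 12.75
print(parse_decimal("-18.35"))  # -18.35
```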

Big Decimal

Big decimals are similar to float values. The main advantage of this data field type is that values are exact to the number of decimal places for which they are configured, whereas float values might be inaccurate in certain cases. If a number has more decimal places than big decimal was configured for, the number is rounded. The number of decimal places can be configured in conf/default.properties:

# Maximum precision used for BIG_DECIMAL types. Precision is equal to the maximum number of digits a BigDecimal
# can have.
system.property.das.big-decimal.precision=32

INFO: 32 digits is the default precision used by Spectrum for big decimal values upon import.
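
The effect of a 32-digit precision can be sketched with Python's `decimal` module. This is an illustration only: precision is counted here as total significant digits, as the configuration comment describes, and the rounding mode (`ROUND_HALF_EVEN`) is an assumption, since the rounding mode Spectrum uses is not stated here.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

PRECISION = 32  # mirrors system.property.das.big-decimal.precision=32
ctx = Context(prec=PRECISION, rounding=ROUND_HALF_EVEN)  # rounding mode is assumed

# Binary floats cannot represent 0.1 or 0.2 exactly; decimals can.
float_sum = 0.1 + 0.2
decimal_sum = ctx.add(Decimal("0.1"), Decimal("0.2"))
print(float_sum)    # 0.30000000000000004
print(decimal_sum)  # 0.3

# A value with 38 significant digits is rounded to 32.
long_value = Decimal("1.2345678901234567890123456789012345678")
rounded = ctx.plus(long_value)
print(rounded)      # 1.2345678901234567890123456789012
```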

Date

In Spectrum, data of the DATE primitive data type is always represented in a Gregorian, month-day-year (MDY) format (e.g., "Sep 16, 2010 02:56:39 PM"). Spectrum detects during ingest whether your data should be parsed into the DATE data type. Other data types can also be converted to the DATE primitive data type after ingest using Workbook functions.
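
For readers who need to read or produce this format outside Spectrum, the example value can be parsed with a pattern such as the one below. The pattern string is an assumption inferred from the single example above and relies on English month and AM/PM names (the default C locale in Python).

```python
from datetime import datetime

# Abbreviated month, day, 4-digit year, 12-hour clock with AM/PM marker.
DISPLAY_FORMAT = "%b %d, %Y %I:%M:%S %p"

parsed = datetime.strptime("Sep 16, 2010 02:56:39 PM", DISPLAY_FORMAT)

print(parsed.isoformat())               # 2010-09-16T14:56:39
print(parsed.strftime(DISPLAY_FORMAT))  # Sep 16, 2010 02:56:39 PM
```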

String

Information other than numbers or dates is represented in Spectrum as a string. This includes text, unparsed date patterns, URLs, JSON arrays, etc.

Boolean

Boolean data in computing has two values, either true or false. It is used in many logical expressions and is derived from Boolean algebra created by George Boole in the 19th century.

List

In Spectrum, multiple values can be combined into a list. A list is a series of values of a single data type, indexed starting from zero (0).

Number

In Spectrum, integers, big integers, floats, and big decimals are considered to be numbers.

Any

Some visualizations and functions are able to use data represented by any data field type. The data can be a number, a string, a date, a list, or a Boolean.

Data Mapping in Avro

Import Mapping

When importing Avro data into Spectrum, data types are mapped as follows:

| Avro Schema Type | Spectrum Value Type |
|---|---|
| Null | STRING |
| Boolean | BOOLEAN |
| Int | INTEGER |
| Long | INTEGER |
| Float | FLOAT |
| Double | FLOAT |
| Bytes | STRING |
| Bytes with logical type Decimal | BIG_DECIMAL |
| String | STRING |
| Records | STRING |
| Enums | STRING |
| Arrays | STRING |
| Maps | STRING |
| Unions | STRING |
| Fixed | STRING |
| Fixed with logical type Decimal | BIG_DECIMAL |

Export Mapping

When exporting data to Avro, data types are mapped as follows:

INFO

If a column is marked as "accept empty", a union schema type is created, e.g., union (null, string) for a nullable string type.

| Spectrum Value Type | Avro Schema Type |
|---|---|
| STRING | String |
| BOOLEAN | Boolean |
| INTEGER | Long |
| FLOAT | Double |
| DATE without pattern | Long |
| DATE with pattern | String |
| BIG_INTEGER | String |
| BIG_DECIMAL | String |
| LIST<listtype> | Arrays<converted list type> |
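
The export mapping, including the "accept empty" union, can be sketched as a small lookup in Python. The function `avro_type`, its parameters, and the tuple notation for lists are hypothetical and exist only for this illustration; the output uses standard Avro schema JSON.

```python
import json

# Non-list Spectrum types mapped to Avro primitive types, per the table above.
PRIMITIVE_TO_AVRO = {
    "STRING": "string",
    "BOOLEAN": "boolean",
    "INTEGER": "long",
    "FLOAT": "double",
    "BIG_INTEGER": "string",
    "BIG_DECIMAL": "string",
}


def avro_type(spectrum_type, accept_empty=False, date_has_pattern=False):
    """Return the Avro schema type for a Spectrum type (illustrative only).

    A list is written as ("LIST", element_type); other types are plain strings.
    """
    if isinstance(spectrum_type, tuple) and spectrum_type[0] == "LIST":
        items = avro_type(spectrum_type[1], date_has_pattern=date_has_pattern)
        mapped = {"type": "array", "items": items}
    elif spectrum_type == "DATE":
        mapped = "string" if date_has_pattern else "long"
    else:
        mapped = PRIMITIVE_TO_AVRO[spectrum_type]
    # "Accept empty" columns become a union with null.
    return ["null", mapped] if accept_empty else mapped


print(json.dumps(avro_type("STRING", accept_empty=True)))  # ["null", "string"]
print(json.dumps(avro_type(("LIST", "INTEGER"))))          # {"type": "array", "items": "long"}
```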

Data Mapping in Parquet

Export Data Mapping

When exporting data to Parquet, data types are mapped as follows:

| Spectrum Field Type | Parquet Field Type |
|---|---|
| INTEGER | INT64 |
| DATE | BINARY |
| BIG_INTEGER | BINARY |
| FLOAT | DOUBLE |
| BIG_DECIMAL | BINARY |
| STRING | BINARY UTF8 |
| BOOLEAN | BOOLEAN |
| LIST (integer) | INT64 |
| LIST (float) | DOUBLE |
| LIST (string) | BINARY |
| LIST (Boolean) | BOOLEAN |

Import Data Mapping

When importing data from Parquet, data types are mapped as follows:

| Parquet Field Type | Spectrum Field Type |
|---|---|
| BOOLEAN | BOOLEAN |
| INT32 | INTEGER |
| INT32 DECIMAL | BIG_DECIMAL |
| INT64 | INTEGER |
| INT64 DECIMAL | BIG_DECIMAL |
| INT96 | DATE |
| FLOAT | FLOAT |
| DOUBLE | FLOAT |
| BINARY | STRING |
| BINARY DECIMAL | BIG_DECIMAL |
| FIXED_LEN_BYTE_ARRAY | STRING |
| FIXED_LEN_BYTE_ARRAY DECIMAL | BIG_DECIMAL |

Parquet columns that use the INT96 format are interpreted as timestamps. Spectrum accepts these columns but cuts off the nanoseconds. If the Workbook has Ignore Errors enabled, any error messages are stored in a separate column and the value in the column with the error is NULL. The table below provides additional Parquet storage mapping details.

| Spectrum Type | Parquet Type | Description |
|---|---|---|
| DATE | TIMESTAMP_MILLIS | Stored as INT64 |
| INTEGER | INT_64 | |
| FLOAT | DOUBLE | |
| STRING | UTF8 | BINARY format with UTF-8 encoding |
| BIGDECIMAL | DECIMAL | BINARY format |
| BIGINTEGER | DECIMAL | BINARY format with precision of 1 and scale of 0 |
| LIST | Repeated elements of group of the list element type | Optional group of a repeating group of optional element types. Nested lists are supported. |
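
To make the INT96 truncation concrete, the sketch below decodes a 12-byte INT96 value and drops everything finer than a millisecond. It assumes the layout commonly produced by Hive and Impala writers (8 bytes of nanoseconds within the day, then 4 bytes of Julian day, both little-endian) and treats the result as UTC. The function name is hypothetical, and this is not Spectrum's actual implementation.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int96_to_datetime_millis(raw: bytes) -> datetime:
    """Decode an INT96 timestamp, truncating it to millisecond precision."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    # '<qi': little-endian 8-byte nanos-of-day followed by 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    millis_of_day = nanos_of_day // 1_000_000  # sub-millisecond part is dropped
    return UNIX_EPOCH + timedelta(days=days, milliseconds=millis_of_day)


# 123,456,789 ns after midnight on the Unix epoch day.
sample = struct.pack("<qi", 123_456_789, JULIAN_DAY_OF_UNIX_EPOCH)
print(int96_to_datetime_millis(sample).isoformat())
# 1970-01-01T00:00:00.123000+00:00
```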

Data Mapping in External Systems

Export Data Mapping

When exporting data to external systems, data types are mapped as follows:

| Spectrum Field Type | Hive Field Type |
|---|---|
| STRING | STRING |
| FLOAT | DOUBLE |
| INTEGER | BIGINT |
| DATE | BIGINT |
| BIG_INTEGER | DECIMAL (38,0) |
| BIG_DECIMAL | DECIMAL with copied precision and scale |
| LIST [A] | ARRAY [A] (INFO: recursive types are supported) |
| BOOLEAN | BOOLEAN |
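
The recursive list mapping in the table above can be sketched as follows. The function `hive_type` and the tuple notation for lists and big decimals are hypothetical conventions for this example; the output uses Hive's `ARRAY<...>` and `DECIMAL(p,s)` type syntax.

```python
# Simple Spectrum types mapped to Hive types, per the table above.
PRIMITIVE_TO_HIVE = {
    "STRING": "STRING",
    "FLOAT": "DOUBLE",
    "INTEGER": "BIGINT",
    "DATE": "BIGINT",
    "BIG_INTEGER": "DECIMAL(38,0)",
    "BOOLEAN": "BOOLEAN",
}


def hive_type(spectrum_type) -> str:
    """Return the Hive type name for a Spectrum type (illustrative only).

    Lists are written as ("LIST", element_type) and big decimals as
    ("BIG_DECIMAL", precision, scale); other types are plain strings.
    """
    if isinstance(spectrum_type, tuple):
        kind = spectrum_type[0]
        if kind == "LIST":
            # Recurse so that nested lists become nested arrays.
            return f"ARRAY<{hive_type(spectrum_type[1])}>"
        if kind == "BIG_DECIMAL":
            _, precision, scale = spectrum_type
            return f"DECIMAL({precision},{scale})"
        raise ValueError(f"unsupported type: {spectrum_type!r}")
    return PRIMITIVE_TO_HIVE[spectrum_type]


print(hive_type(("LIST", ("LIST", "INTEGER"))))  # ARRAY<ARRAY<BIGINT>>
print(hive_type(("BIG_DECIMAL", 32, 10)))        # DECIMAL(32,10)
```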