Glossary
Here are some terms that have a specific meaning to Spectrum.
A
Term |
Definition |
administrator |
A user registered in Spectrum with unrestricted access who is responsible for managing the system, i.e., configuring and monitoring the system, adding users, and assigning users to roles and groups. |
AES |
Advanced Encryption Standard (AES) is a symmetric key encryption cipher. This means that the same key used to encrypt the data is used to decrypt it. |
aggregate functions |
Aggregate functions combine and then operate on all the values in a group, i.e., the function returns one value for each group. |
AMI (Amazon Machine Image) |
A virtual machine image used to launch instances in Amazon EC2; the image itself can be stored in Amazon S3. |
analyst |
One of the users in Spectrum with restricted access who can configure data sources, analyze data, and create infographics and reports. |
API (Application Programming Interface) |
Source-code-based specifications used to interact with or add functionality to a program. Examples in Spectrum include custom functions, parsing schemes for import and export jobs, and custom plug-ins. |
argument |
An argument is a constant, a placeholder, or a data field used as input in a function. |
authentication provider |
The system used to authenticate users in Spectrum. Besides the default internal user management, Spectrum ships with plug-ins for LDAP/Active Directory. It is also possible to create custom plug-ins for authentication purposes. |
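As a rough illustration of the aggregate functions entry above, the following Python sketch groups rows by a key and returns exactly one value per group. The column names and the `group_sum` helper are hypothetical and are not part of Spectrum.

```python
from collections import defaultdict


def group_sum(rows, key, value):
    """Return one summed value per group (hypothetical helper, not a Spectrum function)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Illustrative data only.
rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 5},
    {"region": "east", "sales": 7},
]

result = group_sum(rows, "region", "sales")
print(result)
```

Three input rows collapse into two output values, one for each distinct `region`.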
B
Term |
Definition |
big decimal |
One of the primitive data types used in Spectrum. These are also known as high-precision float values. |
big integer |
One of the primitive data types used in Spectrum. These are also known as unlimited integer values. |
blank (blank cell) |
A blank cell can contain an empty string value, a string containing only whitespace, or a null value. |
Boolean |
One of the primitive data types used in Spectrum. Based on Boolean algebra, these are either TRUE or FALSE. |
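The three cases listed in the blank (blank cell) entry can be sketched as a single check. This is a minimal Python illustration that assumes a null value is modeled as `None`; `is_blank` is a hypothetical helper, not a Spectrum function.

```python
def is_blank(value):
    """True for a null (None), an empty string, or a whitespace-only string."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() == ""


print(is_blank(None), is_blank(""), is_blank("   "), is_blank("abc"))
```

Note that a non-string value such as the number 0 is not treated as blank in this sketch.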
C
Term |
Definition |
CBC |
CBC = Cipher Block Chaining. In CBC mode, the first block of plaintext is combined with an initialization vector using exclusive-OR (XOR) before the encryption key is applied; each subsequent block is XOR'd with the previous ciphertext block. XOR is a binary operation that compares two bits and outputs 1 only if they differ. |
configuration ID |
A unique ID for each job that does not change if that job is run again. Once a job has been given a configuration ID, it always holds that number. |
connections |
The location where data is stored, such as a database, a file store such as Amazon S3, or Hive. |
constant |
Static values, e.g., a fixed number or string, used as function arguments, not to be confused with placeholders. |
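The XOR step described in the CBC entry can be shown in isolation. The sketch below only mixes the first plaintext block with an initialization vector; it deliberately performs no encryption, and the 16-byte block size, the sample values, and the `xor_bytes` helper are assumptions made for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings bit by bit."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Assumed 16-byte block size; a real initialization vector should be random.
iv = bytes(range(16))
first_block = b"sixteen byte msg"

# In CBC mode this mixed block is what the block cipher would then encrypt.
mixed = xor_bytes(first_block, iv)

# XOR is its own inverse: applying the same vector again restores the block.
restored = xor_bytes(mixed, iv)
print(restored)
```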
D
Term |
Definition |
DAS/ das.* |
The abbreviation stands for Spectrum Analytics Solution. 'das' is often used as part of property names. |
data ID |
A new ID is created each time a job runs which produces new data. |
data link |
A data link lets you feed data into a Workbook without using an import job. Data links are not imported into HDFS, but are streamed into Spectrum on demand. |
data set |
A collection of data in either tabular or non-tabular form. Data can be structured, semi-structured, or unstructured. In Spectrum, data sets are the source of data, e.g., databases, server error logs, or Twitter feeds. |
date |
One of the data types used in Spectrum. These are dates in a form recognized by Spectrum rather than recognized as strings. |
E
Term |
Definition |
EC2 (Amazon Elastic Compute Cloud) |
Amazon Elastic Compute Cloud is a scalable web service offered by Amazon Web Services for computing data remotely. |
EMR (Amazon Elastic MapReduce) |
Amazon Elastic MapReduce is a hosted Hadoop framework running on Amazon EC2 and Amazon S3. |
empty (empty string) |
An empty string is a string data type value with the length of zero. A cell with an empty string appears blank. |
export job |
A job that exports the results of a Workbook to an external resource, e.g., a file or a database, that can be used independently of Spectrum. Adapters for several remote systems are included out of the box, and others can be added with plug-ins. |
expression |
A complete formula including defined functions and required arguments. An expression can contain multiple (nested) formulas. |
F
Term |
Definition |
field definition |
Field parameters including data field type, name, and acceptance of null values for a given data set. |
fixed-width |
A font whose letters and characters each occupy the same amount of horizontal space. |
float |
One of the primitive data types used in Spectrum. These are 64-bit float values (also called doubles). |
formula |
A formula is created by a data analyst and is similar to macros in other programs. It consists of a function and its required arguments. |
formula builder |
The graphical user interface to create expressions and formulas by selecting functions. |
G
Term |
Definition |
Google BigQuery |
BigQuery is Google's fully managed data warehouse for petabyte analytics. |
group series functions |
Group series functions operate row-wise within a group, i.e., the function is applied to every row and therefore returns a value for every argument in the group. |
Google Cloud Storage |
Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. |
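To contrast the group series functions entry above with aggregate functions, the following Python sketch computes a running total within each group and returns one value per input row. The running-total behavior, the column names, and the `group_running_sum` helper are hypothetical examples and do not correspond to a specific Spectrum function.

```python
def group_running_sum(rows, key, value):
    """Return one running total per row, tracked separately for each group."""
    running = {}
    output = []
    for row in rows:
        group = row[key]
        running[group] = running.get(group, 0) + row[value]
        output.append(running[group])
    return output


# Illustrative data only.
rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 5},
    {"region": "east", "sales": 7},
]

series = group_running_sum(rows, "region", "sales")
print(series)
```

The output has as many values as there are input rows, whereas an aggregate function would return one value per group.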
H
Term |
Definition |
HDFS (Hadoop Distributed File System) |
This is the primary storage system used by Hadoop applications. It is used either in a cluster or as a stand-alone distributed file system. |
Hive |
Apache Hive is an open source data warehouse system for querying and analyzing large data sets stored in Hadoop. |
I
Term |
Definition |
Infographics |
Infographics is a visualization tool that consolidates, aggregates, and arranges measurements and metrics (measurements compared to a goal) in the form of charts, graphs, reports, and sometimes scorecards on a single screen. |
integer |
One of the primitive data types used in Spectrum. These are 64-bit integer values (also called longs). |
import job |
Imports data sets into Spectrum. Many adapters for various connections are available straight out of the box. |
J
Term |
Definition |
Jaccard Distance |
Measures the dissimilarity between sample sets. It is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1 or, equivalently, by dividing the difference between the sizes of the union and the intersection of two sets by the size of the union. |
JDBC (Java Database Connectivity) |
This is a Java-specific API defining how a database may be accessed. |
JDK (Java Development Kit) |
This is a collection of programming tools which can be used to design products with the Java programming language. |
job ID |
A new ID is created each time a job runs whether it produces new data or not. |
JSON (JavaScript Object Notation) |
A lightweight, human-readable text format for exchanging data, commonly used to transmit data from a server to a web application over a network. |
JSON Map |
A data structure that uses a hash function to map identified keys to corresponding values. (See JSON Object below.) |
JSON Object |
An unordered collection of key:value pairs with the ':' character separating the key and the value, comma-separated and enclosed in curly braces; the keys must be strings and should be distinct from each other. |
job (Spectrum) |
A general term for the configuration and executions needed to complete analyses in Spectrum, e.g., import jobs, export jobs, or Workbook jobs. In Spectrum, every job configuration is numbered consecutively and independently of job executions. Spectrum job executions usually correspond to one or more MapReduce jobs. |
job configuration |
The settings necessary to execute a job in Spectrum. Job configurations include, e.g., the file path, character encoding, and schedule details for an import or export job, and the sheet names, formulas, and connections for a Workbook. Every job configuration is numbered consecutively with a unique identifying number, independently of the corresponding job executions. |
job execution |
These are the individual operations performed in Spectrum according to a job configuration. Every job execution is numbered consecutively with a unique identifying number, independently of the corresponding job configurations. |
join |
The strategy used when combining two data sets, based on a given key. |
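The two equivalent formulas in the Jaccard Distance entry can be checked with a short Python sketch. The `jaccard_distance` helper is hypothetical, and returning 0.0 for two empty sets is a convention assumed here, since the entry does not cover that case.

```python
def jaccard_distance(a, b):
    """(|A ∪ B| - |A ∩ B|) / |A ∪ B|, which equals 1 minus the Jaccard coefficient."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return (len(union) - len(a & b)) / len(union)


distance = jaccard_distance({1, 2, 3}, {2, 3, 4})
print(distance)
```

Here the union has four elements and the intersection has two, so the Jaccard coefficient is 2/4 and the distance is 1 - 0.5 = 0.5.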
K
Term |
Definition |
Kerberos |
An authentication protocol that provides mutual authentication and single sign-on capabilities. |
L
Term |
Definition |
list |
In Spectrum, multiple values can be combined into a list. A list is a series of values of a single data type. |
M
Term |
Definition |
MapReduce |
MapReduce is a framework for processing data over a distributed file system. A 'map' step first splits the task into sub-tasks, and the 'reduce' step combines the results of the 'map' tasks into one result. |
My Spectrum |
My Spectrum is a web portal where you log in to and manage your Spectrum account. Here you can renew a subscription, manage data limits, download updates, submit feature requests, submit support tickets, and more. |
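The 'map' and 'reduce' steps from the MapReduce entry can be sketched as a word count. This is a single-process Python illustration of the idea only; a real framework distributes the sub-tasks across a cluster, and the function names `map_step` and `reduce_step` are hypothetical.

```python
from collections import defaultdict


def map_step(line):
    """Sub-task for one input line: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_step(pairs):
    """Combine the pairs from all map sub-tasks into one count per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


lines = ["big data", "big cluster"]

# Each line is mapped independently; the results are then reduced together.
pairs = [pair for line in lines for pair in map_step(line)]
counts = reduce_step(pairs)
print(counts)
```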
N
Term |
Definition |
NMU |
NMU is the abbreviation for Native Multi User, which is a Spectrum grid mode. |
null values (<null>) |
Null values (sometimes represented as ω) indicate that no information is attached to a specific record, or that the specified information was not found within a specified connection. A cell with a null value appears blank. |
O
Term |
Definition |
OLAP (Online Analytical Processing) |
A category of database software providing an interface which users can use to quickly and interactively examine their data and results of processes in various dimensions. |
operator |
Special symbols, e.g., + or <, that are used similarly to functions. |
P
Term |
Definition |
page |
As Spectrum is an analytics tool with a web interface, pages are information resources that can be viewed in a web browser. In Spectrum, all components are embedded in pages, e.g., a Workbook, a data link configuration, or administrator controls. |
partitioning |
Partitioning segments similar data into individually stored, often hierarchical parts. Typically, these represent periods of time, e.g., months, days, or hours. The division of data is typically done for ease of management and for performance reasons. |
permissions |
These describe whether a user is allowed to read, edit, or execute a given page or content, e.g., a data set, infographics, a data link, or a Workbook. |
placeholder |
A placeholder is a symbol that is replaced by a dynamically changing value, e.g., %day% for the current day or %user% for the current user. Placeholders are also known as wildcards or free variables. |
plug-in |
Extensions to Spectrum functionality, e.g., custom import/export adapters, custom functions, or custom infographic widgets. |
plug-in SDK |
An SDK shipped with Spectrum to create custom plug-ins. |
precision |
The total number of significant digits which can be included in a big decimal number. |
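The substitution described in the placeholder entry of this section can be sketched as follows. This is a minimal Python illustration, not how Spectrum resolves placeholders internally; the `fill_placeholders` helper, the sample path, and the date format used for %day% are assumptions.

```python
import re


def fill_placeholders(template, values):
    """Replace %name% tokens with values; unknown placeholders stay unchanged."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"%(\w+)%", substitute, template)


# Hypothetical template and values.
path = fill_placeholders(
    "/data/%user%/%day%.csv",
    {"user": "alice", "day": "2024-01-31"},
)
print(path)
```

Unlike a constant, the same template yields a different result each time the supplied values change.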
Q
R
Term |
Definition |
record |
A data entity corresponding to a row in a table of a specified data set, containing multiple data fields represented as one of the pre-defined data field types available in Spectrum. |
regular expression |
A sequence of characters that can be used to specify and recognize desired strings in a flexible and concise way. |
REST-API (Representational State Transfer - Application Programming Interface) |
REST-style architecture consists of clients and servers where clients initiate requests to servers, and servers process those requests and return appropriate responses. |
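As a small illustration of the regular expression entry in this section, the following Python sketch uses one pattern to recognize every date-like string in a text. The pattern and the sample text are hypothetical, and the regular expression syntax supported by Spectrum may differ from Python's `re` module.

```python
import re

# Four digits, a hyphen, two digits, a hyphen, two digits, e.g. 2024-01-31.
date_pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

text = "Job ran on 2024-01-31 and again on 2024-02-01."
dates = date_pattern.findall(text)
print(dates)
```

The pattern only checks the shape of the string; it does not verify that the match is a valid calendar date.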
S
Term |
Definition |
S3 (Amazon Simple Storage Service) |
This is a scalable web storage service offered by Amazon Web Services used to store data remotely. |
scale |
The number of digits to the right of the decimal point in a big decimal number. |
SDK (Software Development Kit) |
A collection of development tools for creating applications for a software package. |
security |
A broad topic best described as information security, including the use of Spectrum-specific credentials or LDAP/Active Directory when connecting to Spectrum or using secure impersonation when connecting Spectrum to a database. Another tool used for implementing security is setting permissions for individual pages. |
semi-structured data |
A form of structured data that doesn't conform with the formal tables or data models of relational databases. |
sheet |
A page or tab in a Workbook. In Spectrum, there are different types of sheets, e.g., data sheet, formula sheet, join sheet, and union sheet. |
snowflake schema |
A set of tables comprised of a single central fact table surrounded by normalized dimensional hierarchies. |
Spark |
Apache Spark is a unified analytics engine for large-scale data processing. |
string |
One of the primitive data types used in Spectrum. All data that is not a Boolean value, a big decimal, a big integer, a date, a float value, or an integer is considered a string. Strings can contain any type of (unix) character and are used to represent text, URLs, and date patterns. |
star schema |
A star schema is a set of tables comprised of a single, central fact table surrounded by de-normalized dimensions. |
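The precision and scale entries for big decimal numbers can be made concrete with Python's standard `decimal` module. This sketch assumes the usual convention that precision counts all significant digits and scale counts the digits to the right of the decimal point; `precision_and_scale` is a hypothetical helper that handles finite values only.

```python
from decimal import Decimal


def precision_and_scale(text):
    """Return (precision, scale) for a finite decimal number given as a string."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    precision = len(digits)      # total number of significant digits
    scale = max(-exponent, 0)    # digits to the right of the decimal point
    return precision, scale


result = precision_and_scale("123.45")
print(result)
```

For 123.45 the significant digits are 1, 2, 3, 4, and 5, and two of them follow the decimal point.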
T
U
Term |
Definition |
unstructured data |
Any document, file, image, report, form, etc. that has no defined, standard structure that would enable convenient storage in automated processing devices. |
user group |
The group that a user is assigned to, e.g., sales department or research and development. |
user role |
The role a user is assigned to, e.g., administrator or analyst. |
V
W
Term |
Definition |
widget |
An infographic tool to present data. Examples include graphs, pie charts, and maps. |
Workbook |
The spreadsheet-like view used for analyses of data. |
X
Y
Z