General Product Description#
This page provides the general product description for Spectrum.
Spectrum Application at a Glance#
Spectrum is an agile data fabric for building end-to-end, self-service, code-free data pipelines. Between ingesting data and landing it at the destination, Spectrum offers advanced data curation and exploration capabilities in a spreadsheet-style interface. Data pipelines can also be scheduled and executed at scale.
Enterprise security and governance integrations, strong data management and obfuscation capabilities, and automation/orchestration APIs meet IT needs and allow large deployments in a controlled environment that complies with regulatory requirements. Spectrum can run cloud-native on Amazon Web Services (AWS) and Google Cloud Platform. Wherever it is deployed, it ships with deep integrations into the platform. Hybrid deployments can bridge both worlds securely.
Security#
Spectrum integrates with the existing enterprise IT infrastructure. Spectrum can be connected to existing shared user repositories such as LDAP, Active Directory, or Okta. If Spectrum is configured against an Amazon EMR cluster, EMR's IAM and KMS can be leveraged. The same is true for Google DataProc.
Spectrum can obfuscate and encrypt data during ingestion. Spectrum also supports encrypting data both in transit and at rest.
Monitoring & Administration#
Spectrum's EventBus can be leveraged to send events to third-party systems. Audit Logs contain information about user behavior.
Spectrum can also be configured to send notifications for specific events. System administrators or users who want to be notified about specific events can use the EventBus SDK API that Spectrum provides.
Processing & Storage#
Spectrum can execute data transformation pipelines against Amazon EMR and Google DataProc. The results of a data pipeline can be stored in distributed file systems such as Amazon S3, Google Cloud Storage, HDFS, or a compatible DFS.
Web/Mobile Usage#
Spectrum is a web application that can be used with recent versions of Google Chrome (version 84+), Mozilla Firefox (version 80+), Microsoft Edge (version 84+), and Apple Safari (version 14+).
Analytics#
Spectrum provides a spreadsheet-like interface that lets you apply a wide range of analytics to the data in your pipeline.
To name a few examples of data transformations, it is possible to:
- sort or filter data
- aggregate and expand data
- join, union, or pivot data
- de-duplicate data
- apply data science operations such as one-hot encoding or binned encoding
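To make the last item concrete, the following is a minimal plain-Python sketch of what one-hot and binned encoding do to a column. It is not Spectrum's implementation or one of its Workbook functions; the names `one_hot` and `binned`, the bin edges, and the sample values are assumptions chosen for illustration.

```python
from bisect import bisect_left


def one_hot(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))  # deterministic column order
    return [[1 if v == c else 0 for c in categories] for v in values]


def binned(values, edges):
    """Replace each number with the index of the bin it falls into.

    `edges` are ascending, inclusive upper bounds; values above the
    last edge land in one extra overflow bin.
    """
    return [bisect_left(edges, v) for v in values]


colors = ["red", "green", "red", "blue"]
print(one_hot(colors))  # columns are ordered blue, green, red

ages = [5, 17, 30, 64, 90]
print(binned(ages, edges=[18, 65]))  # bins: <=18, <=65, above 65
```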
Spectrum provides more than 300 Workbook functions in addition to a Functions SDK API. The application also provides a point-and-click interface for parsing JSON data. Spectrum also reports data profile information and metrics, such as the number of unique records and the minimum, mean, and maximum values (where the data type supports these operations).
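As a rough illustration of such profile metrics, the sketch below computes them for a single column in plain Python. The function name `profile`, the dictionary keys, and the rule that only numeric columns get a minimum, mean, and maximum are assumptions for this example, not a description of how Spectrum computes its profiles.

```python
from statistics import mean


def profile(column):
    """Return simple profile metrics for one column of values."""
    metrics = {"unique": len(set(column))}
    # Minimum, mean and maximum are only reported for numeric columns.
    numeric = column and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
    )
    if numeric:
        metrics.update(min=min(column), mean=mean(column), max=max(column))
    return metrics


print(profile([3, 1, 4, 1, 5]))
print(profile(["a", "b", "a"]))
```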
Data Sources & Sinks#
Spectrum can ingest data from and export data to a wide range of third-party systems, and provides read/write connectors for:
- cloud native data warehouses, e.g. BigQuery, Snowflake, Redshift, ...
- cloud data lakes, e.g. Google Cloud Storage, Amazon S3, Azure Data Lake Storage, ...
- on-premises systems, e.g. SFTP, Hive, ...
- JDBC-based systems, e.g. Exasol, Netezza, Teradata, ...
- files, e.g. CSV, JSON, Parquet, ORC or Avro, ...
- web services, e.g. Salesforce, Marketo, ...
If a type of data source or data sink is missing, Spectrum provides a pluggable Connector SDK API.
Data Integration & Management#
Once the data pipeline design is complete, Spectrum lets the user schedule single artifacts or entire data pipelines in either a time-based or a data-event-driven fashion.
Spectrum provides the retention policy modes "Append", "Replace", and "Sliding Time Window". Each artifact in Spectrum has a JSON representation and can be versioned. Spectrum has a Git repository integration that helps you manage your different artifact versions.
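The three retention policy modes can be pictured with the small Python sketch below. The semantics are an assumption inferred from the mode names (keep everything, keep only the latest run, keep only records newer than a cutoff); the function `apply_retention`, the `(timestamp, payload)` record shape, and the seven-day default window are hypothetical and not taken from Spectrum.

```python
from datetime import datetime, timedelta


def apply_retention(existing, incoming, mode, now=None, window=timedelta(days=7)):
    """Return the records kept after a run under the given retention mode.

    Records are (timestamp, payload) tuples.
    """
    if mode == "Append":
        return existing + incoming            # keep old results, add new ones
    if mode == "Replace":
        return list(incoming)                 # keep only the latest results
    if mode == "Sliding Time Window":
        cutoff = (now or datetime.now()) - window
        return [r for r in existing + incoming if r[0] >= cutoff]
    raise ValueError(f"unknown retention mode: {mode}")


now = datetime(2024, 1, 10)
old = [(datetime(2024, 1, 1), "a"), (datetime(2024, 1, 8), "b")]
new = [(datetime(2024, 1, 10), "c")]
for mode in ("Append", "Replace", "Sliding Time Window"):
    kept = apply_retention(old, new, mode, now=now)
    print(mode, [payload for _, payload in kept])
```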
A Spectrum Workbook can be configured to run in production mode, which computes only the data transformations that are actually required. This optimizes the Workbook by reducing compute and storage resources.
Spectrum's Open Data Format allows end users to expose the Workbook's data to Hive and Google BigQuery.
Spectrum also comes with strong metadata management features such as tags and search, full lineage analysis, tracking of different metrics of a Spectrum job, and descriptions at the artifact, sheet, and column levels. When Spectrum artifacts are changed, Spectrum informs you about the potential impact on dependent artifacts such as downstream Workbooks.
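Conceptually, finding the artifacts affected by a change is a traversal of the lineage graph. The sketch below shows one way to do that with a breadth-first search in Python; the function `downstream`, the graph layout, and the artifact names are hypothetical and only illustrate the idea, not Spectrum's own lineage implementation.

```python
from collections import deque


def downstream(consumers, changed):
    """Return every artifact that directly or transitively depends on `changed`.

    `consumers` maps an artifact to the artifacts that read its output.
    """
    seen, queue = {changed}, deque([changed])
    while queue:
        for consumer in consumers.get(queue.popleft(), ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen - {changed}


graph = {
    "import_orders": ["wb_clean"],
    "wb_clean": ["wb_report", "export_s3"],
    "wb_report": [],
}
print(sorted(downstream(graph, "import_orders")))
```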
REST Interface#
Tasks can be automated by using Spectrum's REST API. You can start, stop, or monitor jobs, and you can also use the REST API to create or update your artifacts or data pipelines.
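The sketch below shows the general shape of such automation from Python using only the standard library. Everything specific in it is an assumption: the host, the `/jobs/<id>/<action>` path layout, the action names, and the use of HTTP Basic authentication are hypothetical placeholders, so the real endpoints and authentication scheme must be taken from the REST API reference. The helper only builds the request and does not send it.

```python
import base64
import urllib.request


def build_job_request(base_url, job_id, action, user, password):
    """Build (but do not send) an authenticated request for a job action."""
    if action not in {"start", "stop", "status"}:
        raise ValueError(f"unsupported action: {action}")
    method = "GET" if action == "status" else "POST"
    # Hypothetical path layout, used for illustration only.
    url = f"{base_url.rstrip('/')}/jobs/{job_id}/{action}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


req = build_job_request("https://spectrum.example.com/api", 42, "start", "alice", "secret")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it once the placeholder URL and credentials are replaced with real values.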