System Requirements

Datameer Application Server

Recommended hardware for the production environment with a database on the same server as Datameer

Minimum:

  • 1U Server
  • 2 Quad Core CPUs
  • 16+ GB RAM
  • 2 x 1 TB Hard Drives (Recommended available disk space: 250 GB)
  • RAID - 0 striping
  • RAID - 1 mirroring
  • Dual 1GbE network
  • Redundant power
  • Failover requires a standby server with the same configuration

Recommended:

  • 1U Server
  • 2 Octa Core CPUs
  • 16+ GB RAM
  • 2 x 1 TB Hard Drives (Recommended available disk space: 250 GB)
  • RAID - 0 striping
  • RAID - 1 mirroring
  • Dual 10GbE network
  • Redundant power
  • Failover requires a standby server with the same configuration

Required Software:

  • Unix-based operating system (see Supported Operating Systems for more information)

  • Oracle Java 1.8
    or
    OpenJDK 8 (As of Datameer 7.2)
    or
    Amazon Corretto OpenJDK (As of Datameer 7.4)

    For the IBM BigInsights Hadoop distribution, IBM JDK 1.8 is recommended.

    All Data Nodes must run the same distribution and version of the JDK.

    For example: If OpenJDK is used on the Datameer host, all Data Nodes should use the same version of OpenJDK.

  • Installed software: SSH, VI, and MySQL 5.5, 5.6, or 5.7 (the MySQL server and client executables must be available on the shell search path)
    As of Datameer 7.4: MariaDB is supported as an alternative to MySQL.

    MySQL Database

    Datameer strongly recommends using a MySQL database instead of HSQL. The Datameer service depends on this database and writes to it constantly for workbooks, permission changes, job execution, and scheduling, among other things. For Datameer to function properly, the database response time should be between ten and twenty milliseconds.

  • Optional: SMTP server (for email notification)
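
The search-path and JDK requirements in the list above can be checked with a short script. The following is a minimal sketch, not a Datameer tool: the executable names in REQUIRED_TOOLS (in particular "mysqld" for the MySQL server) are assumptions and may differ on your distribution, and the `java -version` text is assumed to have been collected from each host by some other means, such as SSH.

```python
import re
import shutil

# Assumed executable names for the software listed above; adjust as needed.
REQUIRED_TOOLS = ["ssh", "vi", "mysql", "mysqld"]


def missing_tools(tools, path=None):
    """Return the tools that cannot be found on the shell search path."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def java_version(version_output):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def same_jdk(outputs):
    """True if every host reports the same first line of `java -version`.

    The first line names both the distribution (for example "openjdk" or
    "java") and the version, so comparing it covers both requirements.
    """
    first_lines = {out.strip().splitlines()[0] for out in outputs if out.strip()}
    return len(first_lines) == 1


if __name__ == "__main__":
    print("Missing from search path:", missing_tools(REQUIRED_TOOLS))
```

Passing the collected outputs of the Datameer host and of every Data Node to same_jdk gives a quick yes/no answer on whether the JDKs match.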

Datameer Database Server

The Datameer database should be hosted on the same machine as the Datameer application server. It can be located on a hosted database only if the response time for a full write to the database is less than 20 milliseconds. If the database response time can't be guaranteed during database maintenance, gracefully shut down the Datameer service before the maintenance begins.
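
One way to judge whether a hosted database meets the 20 millisecond limit is to time a series of test writes. The sketch below only does the timing: write_once is a hypothetical callable that you supply, for example a function that inserts a row and commits through whichever MySQL driver you use. The sample count of 20 is an arbitrary choice, not a Datameer recommendation.

```python
import time

MAX_WRITE_MS = 20.0  # limit stated above for a full write to the database


def write_latencies_ms(write_once, samples=20):
    """Call write_once() `samples` times and return each duration in ms."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        write_once()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def meets_limit(durations_ms, limit_ms=MAX_WRITE_MS):
    """True only if at least one write was timed and the slowest one
    stayed under the limit."""
    return bool(durations_ms) and max(durations_ms) < limit_ms
```

Checking the slowest sample rather than the average is a deliberately strict choice, since occasional slow writes are what the limit is meant to rule out.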

Hosted databases must use MySQL 5.5 or higher. The recommended database size is at least 5 GB.
As of Datameer 7.4: MariaDB is supported as an alternative to MySQL.

Hadoop cluster

  • One of the supported Hadoop Distributions installed (Datameer installs Hadoop if it isn't already available)
  • Gigabit switches (10 GigE interconnected)
  • Sufficient power supply & cooling
     

The vendor of the Hadoop distribution in use should be able to provide Hadoop cluster design and sizing recommendations.

Hadoop master (NameNode and JobTracker)

Hardware:

  • 1U Server
  • 2 Quad Core CPUs
  • 32+ GB RAM
  • 2x1 TB hard drives
  • RAID - 0 striping
  • RAID - 1 mirroring
  • Dual 1GbE network
  • Redundant power
  • Failover requires a standby server with the same configuration

Software:

  • Unix-based system (e.g., Ubuntu Linux 10.04)
  • Java 1.8 (Oracle recommended)
  • Set JAVA_HOME to the root of your Java installation
  • Installed software: VI, SSH, SSHD, rsync, and SCP
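
The JAVA_HOME requirement in the list above can be verified with a small check. This is a minimal sketch that assumes the usual JDK layout, in which the java executable sits at bin/java under the installation root; the function name is hypothetical.

```python
import os


def java_home_problem(env=None):
    """Return a description of what is wrong with JAVA_HOME, or None if it
    points to a directory containing an executable bin/java."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return f"JAVA_HOME is not a directory: {java_home}"
    java_bin = os.path.join(java_home, "bin", "java")
    if not (os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)):
        return f"no executable found at {java_bin}"
    return None


if __name__ == "__main__":
    print(java_home_problem() or "JAVA_HOME looks usable")
```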
     

Two Hadoop master nodes are required for high availability (HA) testing.

Hadoop slave (data node)

Datameer recommends a minimum of 3 slave/data nodes in addition to the Hadoop master. 

A major feature of Hadoop is data redundancy, which offers multiple benefits including availability, fast run times, and easy scalability.

Because redundancy means the same data is stored multiple times on your hard drives, account for it when sizing your storage.
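
As a rough worked example of the sizing effect, usable capacity can be estimated by dividing raw disk capacity by the replication factor. The sketch below assumes a replication factor of 3, which is the HDFS default for dfs.replication; your cluster may be configured differently. It also ignores space needed for the operating system, logs, and temporary job data, so treat the result as an upper bound.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Estimate usable storage in TB: raw capacity divided by the number
    of copies kept of each block (assumed replication factor)."""
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / replication


if __name__ == "__main__":
    # Three data nodes, each with 4 x 1 TB drives.
    print(usable_capacity_tb(nodes=3, disks_per_node=4, disk_tb=1.0))
```

With three data nodes of 4 x 1 TB each, the raw capacity is 12 TB, which leaves about 4 TB for data at a replication factor of 3.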

Hardware:

  • 1U Server
  • 2 Quad Core CPUs
  • 16 GB RAM (2 GB per Core)
  • 4x1 TB SAS JBOD
  • 1GbE network

Software:

  • Unix-based system (e.g., Ubuntu Linux 10.04)
  • Java 1.8 (Oracle recommended)
  • Set JAVA_HOME to the root of your Java installation
  • Installed software: VI, SSH, SSHD, rsync, and SCP

AWS (Elastic MapReduce) deployments:

  • EC2 access key
  • EC2 secret access key
  • EC2 private key name
  • EC2 private key file
  • One empty S3 bucket