
System requirements for Watson Studio Local

Ensure that your servers meet the hardware and software requirements for Watson Studio Local.

You need to provide each server's IP address and the storage partition name for the installation.

Operating system requirements

For detailed operating system requirements, search for "Watson Studio Local" on the Operating systems for a specific product page and the Software Product Compatibility Reports page.

Requirement: To install packages on Red Hat Enterprise Linux, you must set up repositories for installing Watson Studio Local and install the required RPM packages.

Docker requirements

Watson Studio Local for RHEL requires Docker. Because the Watson Studio Local installer does not include the Docker distribution from RHEL, you must install Docker yourself on all of the nodes before installing Watson Studio Local. Complete the following steps on each node in the cluster:
  1. Enable a Red Hat repo for package installation.
  2. Enable the extras repo so that Docker can be installed, for example: subscription-manager repos --enable rhel-7-server-extras-rpms.
  3. Allocate a raw disk with at least 200 GB on each node for docker storage.
  4. Run the docker_redhat_install.sh script on each node (extract it from the Watson Studio Local installation package by running the package with the --extract-pre-install-scripts parameter) to automatically install Docker from the RHEL repo and to set up devicemapper as the storage driver in "direct-lvm" mode with a 25 GB Docker base container size.
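The four steps above can be sketched as a per-node command sequence. This is an outline, not a verbatim procedure: the installer package filename and the example disk device are placeholders, and the commands must run on a registered RHEL node.

```
# Step 2: enable the RHEL extras repo that provides the docker package.
subscription-manager repos --enable rhel-7-server-extras-rpms

# Step 3: confirm that the raw disk reserved for Docker storage is at least
# 200 GB (/dev/sdc is an example device name; substitute your own).
lsblk -b -d -o NAME,SIZE /dev/sdc

# Step 4: extract the pre-install scripts from the installation package
# (package filename is a placeholder), then run the Docker setup script.
./<watson-studio-local-installer> --extract-pre-install-scripts
./docker_redhat_install.sh
```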

Hardware and software requirements for a seven-node configuration

This configuration requires a minimum of six servers (either physical or virtual machines) and either one or two optional servers for deployment management on Watson Machine Learning.

Recommendation: Install the operating system with a minimal operating system installation package selection.

You need a sudo username and password for each node (this credential needs sudo root access and is used by the installer to lay down files and configure the nodes). The password cannot contain a single quotation mark ('), a double quotation mark ("), a number sign (#), or white space. After installation, Watson Studio Local runs as root.

Alternatively, you can use an SSH key installation by using root's private SSH key, with the corresponding public key copied to each node (for example, by using the ssh-copy-id command).

If you are using a power broker to manage access, run pbrun to become root on the node that you will install from, copy this root private SSH key to all other nodes, and use it for the installation by way of the wdp.conf configuration file.

Ensure that each node has an extra disk partition for the installer files. Each storage node requires another additional disk partition. All of these disk partitions must be mounted to paths (the installer will ask for these paths) and formatted with XFS with ftype functionality enabled. Example command to format each partition: mkfs.xfs -f -n ftype=1 -i size=512 -n size=8192 /dev/sdb1

Recommendation:

To improve performance, add the noatime flag to the mount options in /etc/fstab for both the installer and data storage partitions. Example:

/dev/sdb1       /installer              xfs    defaults,noatime    1 2

As a result, inode access times are not updated on these filesystems.

Minimum server specifications for a seven-node configuration on Red Hat Enterprise Linux (both x86 and z) and POWER8

Control Plane/Storage: 3 servers (BM/VM), 8 cores per node, 48 GB RAM per node. Disk partitions: minimum 300 GB (XFS) for the installer files partition + minimum 500 GB (XFS) for the data storage partition + minimum 200 GB of extra raw disk space for Docker.

Compute: 3 servers (BM/VM), 16 cores per node, 64 GB RAM per node. Disk partitions: minimum 300 GB (XFS) for the installer files partition + minimum 200 GB of extra raw disk space for Docker. If you add more cores, a total of 48-50 cores distributed across multiple nodes is recommended.

Deployment: 1 server (BM/VM), 16 cores per node, 64 GB RAM per node. Disk partitions: minimum 300 GB (XFS) for the installer files partition + minimum 200 GB of extra raw disk space for Docker. If you add more cores, a total of 48-50 cores distributed across multiple nodes is recommended.

Other requirements:

  • The installation requires at least 10 GB on the root partition.
  • If you plan to place /var on its own partition, reserve at least 10 GB for the partition.
  • SPSS Modeler add-on requirement: If you plan to install the SPSS Modeler add-on, add 0.5 CPU and 8 GB of memory for each stream you plan to create.
  • All servers must be synchronized in time (ideally through NTP or Chrony). Ensure that the system time of all the nodes in the cluster is synchronized to within one second. On each node, if NTP or Chrony is installed but the node is not synchronized to within one second, the installer will not allow you to proceed. If NTP and Chrony are not installed, the installer will warn you. If an NTP or Chrony service is running but is not used to synchronize time, stop and disable the service on all nodes before running the installation.
  • SSH between nodes should be enabled.
  • YUM should not be already running.
  • Prerequisites for installing Watson Studio Local with NVIDIA GPU support
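To verify the one-second synchronization requirement on a node, Chrony's tracking report shows the current clock offset (ntpstat is the NTP equivalent); this assumes chronyc is installed:

```
# Report clock synchronization status; the "System time" line shows the
# offset from the reference clock, which must be well under one second.
chronyc tracking
```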
Control plane/Storage
Requires a minimum of three servers: one master node to manage the entire cluster and at least two additional nodes for high availability. The Kubernetes cluster requires either a load balancer or one unused IP address as the HA proxy IP address. The IP address must be static, portable, and in the same subnet as the cluster. The data storage path is used by GlusterFS storage management.
Compute
Requires a minimum of three servers: one primary node and at least two extra nodes for high availability and scaling compute resources. During installation, you can add additional nodes for scaling Compute resources, for example, if you expect to run resource-intensive computations or have many processes that run simultaneously.
Deployment
Requires a minimum of one server: one primary node and one optional extra node for high availability. The Deployment nodes are the production versions of the Compute nodes, and thus have identical requirements.
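The SPSS Modeler sizing rule noted in the requirements above (0.5 CPU and 8 GB of RAM per planned stream) can be turned into a quick calculation; the stream count below is an example:

```shell
# Estimate extra capacity for the SPSS Modeler add-on:
# each planned stream adds 0.5 CPU and 8 GB of RAM.
streams=4                                   # example: four planned streams
extra_cpu=$(awk -v s="$streams" 'BEGIN { printf "%.1f", s * 0.5 }')
extra_ram_gb=$(( streams * 8 ))
echo "Plan for an extra ${extra_cpu} CPU and ${extra_ram_gb} GB of RAM"
```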

Hardware and software requirements for a four-node configuration

This configuration requires a minimum of three servers (either physical or virtual machines) and either one or two optional servers for deployment management on Watson Machine Learning.

You need a sudo username and a password that matches the login password of that user for each node (this credential needs sudo root access and is used by the installer to lay down files and configure the nodes). The password cannot contain a single quotation mark ('), a double quotation mark ("), a number sign (#), or white space. After installation, Watson Studio Local runs as root.

Alternatively, you can use an SSH key installation by using root's private SSH key, with the corresponding public key copied to each node (for example, by using the ssh-copy-id command).

If you are using a power broker to manage access, run pbrun to become root on the node that you install from, copy this root private SSH key to all other nodes, and use it for the installation by way of the wdp.conf configuration file.

Ensure that each node has an extra disk partition for the installer files. Each storage node requires another additional disk partition. All of these disk partitions must be mounted to paths (the installer asks for these paths) and formatted with XFS with ftype functionality enabled. Example command to format each partition: mkfs.xfs -f -n ftype=1 -i size=512 -n size=8192 /dev/sdb1

Recommendation:

To improve performance, add the noatime flag to the mount options in /etc/fstab for both the installer and data storage partitions. Example:

/dev/sdb1       /installer              xfs    defaults,noatime    1 2

As a result, inode access times are not updated on these filesystems.

Minimum server specifications for a four-node configuration on Red Hat Enterprise Linux (x86, POWER and z)

Control Plane/Storage/Compute: 3 servers (BM/VM), 24 cores per node, 64 GB RAM per node. Disk partitions: minimum 300 GB (XFS) for the installer files partition + minimum 500 GB (XFS) for the data storage partition + minimum 200 GB of extra raw disk space for Docker.

Deployment: 1 server (BM/VM), 16 cores per node, 64 GB RAM per node. Disk partitions: minimum 300 GB (XFS) for the installer files partition + minimum 200 GB of extra raw disk space for Docker. If you add more cores, a total of 48-50 cores distributed across multiple nodes is recommended.

Other requirements:

  • The installation requires at least 10 GB on the root partition.
  • If you plan to place /var on its own partition, reserve at least 10 GB for the partition.
  • SPSS Modeler add-on requirement: If you plan to install the SPSS Modeler add-on, add 0.5 CPU and 8 GB of memory for each stream you plan to create.
  • All servers must be synchronized in time (ideally through NTP or Chrony). Ensure that the system time of all the nodes in the cluster is synchronized to within one second. On each node, if NTP or Chrony is installed but the node is not synchronized to within one second, the installer will not allow you to proceed. If NTP and Chrony are not installed, the installer will warn you. If an NTP or Chrony service is running but is not used to synchronize time, stop and disable the service on all nodes before running the installation.
  • SSH between nodes should be enabled.
  • YUM should not be already running.
  • Prerequisites for installing Watson Studio Local with NVIDIA GPU support

Control Plane, Storage, and Compute are all installed on a single node with at least two extra nodes for high availability. Deployment is installed on a single node with an optional extra node for high availability. The Deployment nodes are the production versions of the Compute nodes, and have identical requirements. The Kubernetes cluster requires either a load balancer or one unused IP address as HA proxy IP address. The IP address must be static, portable, and in the same subnet as the cluster. The data storage path is used for file storage and GlusterFS storage management. You can add extra nodes for scaling Compute resources, for example, if you expect to run resource-intensive computations or have many processes that run simultaneously.

Disk requirements

Ensure that the storage has good disk I/O performance.

Disk latency test: dd if=/dev/zero of=<path-to-install-directory>/testfile bs=512 count=1000 oflag=dsync. The result must be comparable to or better than: 512000 bytes (512 kB) copied, 1.7917 s, 286 kB/s

Disk throughput test: dd if=/dev/zero of=<path-to-install-directory>/testfile bs=1G count=1 oflag=dsync. The result must be comparable to or better than: 1073741824 bytes (1.1 GB) copied, 5.14444 s, 209 MB/s

To ensure that your data that is stored within Watson Studio Local is stored securely, you can encrypt your storage partition. If you use Linux Unified Key Setup-on-disk-format (LUKS) for this purpose, then you must enable LUKS and format the partition with XFS before you install Watson Studio Local.

Network requirements

  • Each node needs a working DNS and a gateway specified in its network configuration, regardless of whether that gateway allows outbound network access.
  • A minimum 1 Gb network connection is required between the nodes.
  • The cluster requires a network that it can use for the overlay network within Kubernetes. The network cannot conflict with other networks that might establish a connection to the cluster. Watson Studio Local configures 9.242.0.0/16 as the default network. Use this default only if it does not conflict with other networks that this cluster is connected to.
  • In the /etc/sysctl.conf file, you must set net.ipv4.ip_forward = 1, and load the variable using the command sysctl -p.
  • From the first master node, where the installer runs, verify that you can SSH to every other node by either user ID or SSH key.
  • Verify the DNS setup on every node and ensure that the configured DNS accepts lookup requests. Enter a dig or nslookup command against a name on your network, and ensure that your DNS responds with the correct IP address.
  • Ensure that the IP addresses used for the installation match the host name of each node (host names and IP addresses must be unique across the nodes).
  • Verify that the machine-id is unique on each node by entering the command cat /etc/machine-id. If the IDs are not unique, generate a new one, for example: uuidgen | tr -d '-' > /etc/machine-id (the machine ID must be a 32-character hexadecimal string without hyphens).
  • Ensure that ICMP is enabled between the nodes and that you can ping each node.
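The forwarding and identity checks above can be combined into a short per-node pre-flight. This is a sketch: run it as root, and substitute a real host name from your network for the DNS lookup.

```
# Enable IPv4 forwarding persistently, then load the setting.
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p

# Confirm the kernel setting took effect (prints 1).
cat /proc/sys/net/ipv4/ip_forward

# Confirm this node's machine-id (must be unique across all nodes).
cat /etc/machine-id

# Confirm DNS answers lookup requests (host name is a placeholder).
nslookup some-node.example.com
```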

Proxy IP or load balancer configuration

To provide High Availability for Watson Studio Local, you must use either a proxy IP address or a load balancer.

Option 1: Proxy IP address
Requirements:
  • All of the master nodes must be on the same subnet. The compute and deployment nodes can be on any accessible subnet.
  • A static, unused IP address is required on the same VLAN and subnet as the master nodes. For high availability, this address serves as the failover IP from which Watson Studio Local is accessed: the master nodes share it so that if one master node fails, another takes over the IP and provides fault tolerance. The network administrator must reserve this IP address before you can install Watson Studio Local.
Option 2: Load balancer
For high availability purposes, you can use an external load balancer that is configured on your network. The load balancer does not require the nodes to be on the same subnet and VLAN. The load balancer can be specified only for a Watson Studio Local installation that uses a wdp.conf file.

You can use one or two load balancers for this configuration:

External Traffic Routing
This load balancer must be configured to forward traffic on ports 443 and 6443 to all three control (master) nodes with persistent IP round robin for the cluster to function properly. After Watson Studio Local is installed, you can access it by connecting to the load balancer on port 443 over SSL/HTTPS.
Internal Traffic Routing
This load balancer must be configured before installing Watson Studio Local to forward internal traffic for port 6443 to all three control nodes (or master nodes). All nodes must have access to the Kubernetes API server for the cluster to communicate to itself.
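As an illustration, both routing roles could be covered by a single HAProxy instance configured along these lines. HAProxy is not mandated by Watson Studio Local, and the master node IP addresses are placeholders; `balance source` is used here to approximate the persistent-IP behavior described above.

```
defaults
    mode tcp
    timeout connect 5s
    timeout client  1m
    timeout server  1m

# External traffic routing: user-facing HTTPS, passed through (no SSL
# offloading) with source-IP persistence across the three master nodes.
frontend wsl_https
    bind *:443
    default_backend masters_https
backend masters_https
    balance source
    server master1 10.0.0.11:443 check
    server master2 10.0.0.12:443 check
    server master3 10.0.0.13:443 check

# Internal traffic routing: Kubernetes API server port for the cluster
# to communicate with itself.
frontend wsl_kube_api
    bind *:6443
    default_backend masters_kube_api
backend masters_kube_api
    balance source
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check
    server master3 10.0.0.13:6443 check
```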

Firewall restrictions

  • Kubernetes uses iptables for cluster communication. Because Kubernetes cannot run alongside a server firewall on each node that manipulates the same iptables, host firewalls (for example, firewalld and the iptables service) must be disabled. If an extra firewall is needed, it is recommended that you set it up around the cluster (for example, a Vyatta firewall) and open port 443.
  • SELinux must be in either Enforcing or Permissive mode. Use the getenforce command to get the current SELinux mode. If the command shows "Disabled", then edit /etc/selinux/config and change the SELINUX= line to either SELINUX=permissive or SELINUX=enforcing. Then, restart the node for the change to take effect.
  • Watson Studio Local is exposed externally through one port: 443 (HTTPS), for which access must be permitted.
  • The Watson Studio Local runtime environment components connect to data sources (for example, relational databases, HDFS, and an enterprise LDAP server/port) to support authentication, so access to those servers and ports should be permitted.
  • Ensure that no daemon, script, process, or cron job makes any modification to /etc/hosts, IP tables, routing rules, or firewall settings (like enabling or refreshing firewalld or iptables) during or after install.
  • Ensure every node has at least one localhost entry in the /etc/hosts file corresponding to IP 127.0.0.1.
  • If your cluster uses multiple network interfaces (one with public IP addresses and one with private IP addresses), use only the private IP address in the /etc/hosts file with the short hostnames.
  • Ansible requirement: ensure the libselinux-python package is available.
  • Restriction: Watson Studio Local does not support dnsmasq. Check with your network administrator to make sure that dnsmasq is not enabled.
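A short per-node check for the restrictions above (a sketch; command availability varies by distribution):

```
# Host firewall services must be disabled.
systemctl is-enabled firewalld     # expect "disabled", or the unit to be absent

# SELinux must be Enforcing or Permissive, never Disabled.
getenforce

# /etc/hosts must contain a localhost entry for 127.0.0.1.
grep '^127\.0\.0\.1' /etc/hosts

# Ansible requires the libselinux-python package.
rpm -q libselinux-python
```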

Certificates

Watson Studio Local generates SSL certificates during installation. The certificates are used for inter-cluster communication and must be trusted by users at first access.

IBM Cloud offering requirements

See IBM Cloud documentation for details on ordering resources and performing installation tasks.

  • Set up a minimum of three virtual machines or bare metal servers, choosing the specifications needed for Watson Studio Local. Choose SSD drives when ordering.
  • Ensure the DNS you have configured on each node is working, and can resolve names or IP addresses on the network you are on.
  • Set up a local load balancer and configure it to redirect TCP port 6443 to the three master node instances. Choose a persistent IP, round-robin configuration. For health checks, check only whether the port is open or closed.
  • Install Watson Studio Local by using the wdp.conf file with virtual_ip_address= commented out and this new line added: load_balancer_ip_address=<IP of the network load balancer>. Use the private IP addresses for each of the nodes to ensure that Watson Studio Local is installed over the private network.
  • After the installation completes, create an external load balancer for HTTPS (443) and point it to the three master nodes. Do not use SSL offloading. Use this external load balancer to connect to Watson Studio Local through HTTPS on TCP port 443.

Additional requirements for Microsoft Azure

  • Red Hat Enterprise Linux operating system only.
  • When ordering the VMs, choose Premium SSD.
  • All three master nodes need to be added to the availability set.
  • Use an SSD drive for the installation partition. Use a separate raw disk (rather than a raw partition) for Docker.
  • Use either the root user or the root SSH key installation. Sudo user is not supported.

See Microsoft Azure Documentation for details on ordering resources and performing installation tasks.

Additional requirements for Amazon Web Services

Before installing Watson Studio Local, complete the following steps:

  1. Create an HTTPS "Application" Elastic Load Balancer that forwards traffic on port 443 to the three master nodes. This load balancer is the front-facing URL for users, so you can choose the port it listens on and the certificate that secures the connection through AWS Certificate Manager.
  2. Create a TCP "Network" Elastic Load Balancer that listens on port 6443 and forwards to port 6443 on the three master nodes. This load balancer is used by the cluster to communicate with the Kubernetes API server.

For version 1.2.1.0 or later: Install Watson Studio Local by using the wdp.conf file with virtual_ip_address= commented out and this new line added: load_balancer_fqdn=<FQDN of the TCP load balancer>. Use either the root user or the root SSH key installation. Sudo user is not supported.

For versions earlier than 1.2.1.0: Install Watson Studio Local using the wdp.conf file with virtual_ip_address= commented out and the new line added: load_balancer_ip_address=<static IP of the TCP load balancer>. Use either the root user or the root SSH key installation. Sudo user is not supported.
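For example, the relevant wdp.conf lines for a version 1.2.1.0 or later installation might look like the following; the FQDN value is a placeholder for your Network Load Balancer's DNS name:

```
# virtual_ip_address=          <- left commented out
load_balancer_fqdn=my-nlb-0123456789.elb.us-east-1.amazonaws.com
```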

Hadoop requirements

See Hortonworks Data Platform (HDP) or Cloudera Distribution for Hadoop (CDH).

Supported web browsers

  • Google Chrome (recommended)
  • Mozilla Firefox