
System requirements for Watson Studio Local

Ensure that your servers meet the hardware and software requirements for Watson Studio Local.

You need to provide each server's IP address and the storage partition name for the installation.

Operating system requirements

For detailed operating system requirements, search for Watson Studio Local at the Operating systems for a specific product page and Software Product Compatibility page.

Requirement: To install packages on Red Hat Enterprise Linux, you must set up repositories for installing Watson Studio Local and install the required RPM packages.

Docker requirements

Watson Studio Local for RHEL requires Docker. Because the Watson Studio Local installer does not include the Docker distribution from RHEL, you must install Docker yourself on all of the nodes before installing Watson Studio Local. Complete the following steps on each node in the cluster:
  1. Enable a Red Hat repo for package installation.
  2. Enable the extras repo so that Docker can be installed, for example, subscription-manager repos --enable rhel-7-server-extras-rpms.
  3. If you are installing Docker with the devicemapper storage driver, allocate a raw disk with at least 200 GB on each node for Docker storage.
    Recommendation: Use the overlay2 storage driver for RHEL and CentOS operating systems with kernel version 3.10.0-514 or later. For earlier kernel versions, use the devicemapper storage driver.
  4. If you are installing Docker before a silent installation of Watson Studio Local (available in Version 1.2.3.1 for POWER only), set the DOCKER_SILENT_INSTALL=1 environment variable on every node. For example, to run docker_redhat_install.sh in silent mode from master-1 after copying it to master-2, enter the following command: ssh root@<cluster_name>-master-2 "export DOCKER_SILENT_INSTALL=1; /ibm/docker_redhat_install.sh /ibm", where /ibm represents the installation directory.
  5. Run docker_redhat_install.sh {INSTALL_PATH} on each node (you can extract the script from the Watson Studio Local installation package by running the package with the --extract-pre-install-scripts parameter) to automatically install Docker from the RHEL repo and configure the storage driver.
    • If you specify only an {INSTALL_PATH} value, for example docker_redhat_install.sh /ibm, then by default Watson Studio Local installs Docker with the overlay2 storage driver (no raw disks needed). Ensure that you are using a supported kernel version before installing Docker with overlay2; otherwise, the overlay2 storage driver might taint the kernel. You can verify the installed storage driver as shown after this list.
    • If you specify all three installation options {RAW_DISK_PATH} {INSTALL_PATH} {DOCKER_BASE_CONTAINER_SIZE}, for example docker_redhat_install.sh /dev/vdd /ibm 25G, then Watson Studio Local installs Docker with the devicemapper storage driver in "direct-lvm" mode.
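For example, after the script completes, you can confirm the kernel version and the storage driver that Docker is using (a minimal check; the exact output wording can vary by Docker version):

  uname -r                                 # must be 3.10.0-514 or later for overlay2
  docker info | grep -i "storage driver"   # expect overlay2 or devicemapper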

Hardware and software requirements for a seven-node configuration

This configuration requires a minimum of six servers (either physical or virtual machines) and either one or two optional servers for deployment management on Watson Machine Learning.

Recommendation: Install the operating system with a minimal operating system installation package selection.

You need a sudo username and password for each node (this credential needs sudo root access and is used by the installer to lay down files and perform configuration).

Warning: The hostname for nodes cannot be mixed case or all-capitalized on Watson Studio Local Version 1.2.3.1, or any other version that uses Weave. Capital letters in the hostname cause the installation to fail because of the method that Weave uses to determine deleted nodes. Use lowercase only when you set the hostname for nodes.
RHEL requirement: On each node for RHEL, ensure that the sudo user can run the following commands and any command in the following directories (they can be added as command aliases in the /etc/sudoers file): /usr/bin/kubectl, /usr/bin/kubelet, /usr/bin/docker, /usr/bin/ls, /usr/bin/grep, /usr/bin/python, /usr/bin/tail, /usr/bin/ssh, /usr/bin/mkdir, /usr/bin/rm, /usr/bin/ln, /usr/bin/sed, /usr/bin/scp, /usr/bin/cat, /usr/bin/echo, /usr/bin/awk, /usr/bin/cut, /usr/bin/readlink, /usr/bin/df, /usr/bin/sh, /usr/bin/which, /usr/bin/ntpstat, /usr/bin/mv, /usr/bin/cp, /usr/bin/curl, /sbin/ip, /sbin/getenforce, /sbin/xfs_info, /sbin/sysctl, /sbin/gluster, /sbin/route, /bin/ping, /sbin/iptables, /bin/rpm, /usr/bin/yum, /sbin/service, /sbin/chkconfig, /usr/bin/systemctl, /sbin/fdisk, /sbin/sfdisk, /sbin/parted, /sbin/partprobe, /bin/mount, /bin/umount, /bin/chown, /bin/sudo, /bin/chmod, /bin/kill, /usr/bin/kill, /usr/bin/killall, /usr/bin/touch, ${INSTALL_DIR}/, /wdp/utils/, /wdp/scripts/, /tmp/. Replace ${INSTALL_DIR} with the directory that contains the Watson Studio Local installation files.
Ubuntu requirement: On each node for Ubuntu, ensure that the sudo user can run all of the commands listed under RHEL requirement (note that command paths might differ from RHEL, for example, /usr/bin/ls for RHEL and /bin/ls for Ubuntu). Ubuntu requires one additional command: /usr/bin/apt-get.
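For example, a minimal sketch of how these permissions might be granted in /etc/sudoers through a command alias (the user name wsladmin and the /ibm installation directory are placeholders; include the full command and directory list above in the alias):

  # illustrative only: list every command and directory from the requirement above
  Cmnd_Alias WSL_CMDS = /usr/bin/kubectl, /usr/bin/kubelet, /usr/bin/docker, /usr/bin/systemctl, /ibm/, /wdp/utils/, /wdp/scripts/, /tmp/
  wsladmin ALL=(root) WSL_CMDS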

The password cannot contain a single quotation mark ('), a double quotation mark ("), a pound sign (#), or white space ( ). After installation, Watson Studio Local will run as root.

Alternatively, you can use an SSH key installation with root's private SSH key that has been copied to each node (for example, by using the ssh-copy-id command).

If you are using PowerBroker to manage access, run pbrun to become root on the node that you will install from, copy this root private SSH key to all other nodes, and use this key for the installation by way of the wdp.conf configuration file.

Ensure that each node has an extra disk partition for the installer files. Each storage node requires an additional disk partition. All of these disk partitions must be mounted to paths (the installer asks for these paths) and formatted with XFS with ftype functionality enabled. Example command to format each partition: mkfs.xfs -f -n ftype=1 -i size=512 -n size=8192 /dev/sdb1
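For example, after you mount the partition, you can confirm that ftype is enabled (assuming the installer partition is mounted at /installer):

  xfs_info /installer | grep ftype    # the output must include ftype=1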

Recommendation:

To improve performance, add the noatime flag to the mount options in /etc/fstab for both the installer and data storage partitions. Example:

/dev/sdb1       /installer              xfs     defaults,noatime    1 2
As a result, inode access times will not be updated on the filesystems.
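If the partitions are already mounted, you can apply the option without a reboot, for example:

  mount -o remount,noatime /installer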

Minimum server specifications for a seven-node configuration on Red Hat Enterprise Linux (x86, POWER and z)

Node type | Number of servers (BM/VM) | CPU per node | RAM per node | Disk partitions | IP addresses
Control plane/storage | 3 | 8 cores | 48 GB | Minimum 300 GB with XFS format for the installer files partition. GlusterFS requires a minimum of 500 GB with XFS format for the data storage partition. Docker with the devicemapper storage driver requires a minimum of 200 GB of extra raw disk space. |
Compute | 3 | 16 cores | 64 GB | Minimum 300 GB with XFS format for the installer files partition. Docker with the devicemapper storage driver requires a minimum of 200 GB of extra raw disk space. If you add cores, a total of 48-50 cores distributed across multiple nodes is recommended. |
Deployment | 1 | 16 cores | 64 GB | Minimum 300 GB with XFS format for the installer files partition. Docker with the devicemapper storage driver requires a minimum of 200 GB of extra raw disk space. If you add cores, a total of 48-50 cores distributed across multiple nodes is recommended. |

Other requirements:

  • The installation requires at least 10 GB on the root partition.
  • If you plan to place /var on its own partition, reserve at least 10 GB for the partition.
  • SPSS Modeler add-on requirement: If you plan to install the SPSS Modeler add-on, add 0.5 CPU and 8 GB of memory for each stream that you plan to create.
  • All servers must be synchronized in time (ideally through NTP or Chrony). Ensure that the system time of all the nodes in the cluster is synchronized to within one second. On each node, if NTP or Chrony is installed but the node is not synchronized to within one second, the installer does not allow you to proceed. If NTP and Chrony are not installed, the installer warns you. If an NTP or Chrony service is running but is not used to synchronize time, stop and disable the service on all nodes before you run the installation. You can check the synchronization state as shown in the example after this list.
  • SSH between nodes should be enabled.
  • YUM should not be already running.
  • If you plan to use NVIDIA GPUs, see Prerequisites for installing Watson Studio Local with NVIDIA GPU support.
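For example, to check time synchronization on a node, run whichever command matches the service in use (both report the current offset from the time source):

  ntpstat            # for NTP
  chronyc tracking   # for Chrony; check the "System time" line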
Control plane/Storage
Requires a minimum of three servers: one master node to manage the entire cluster and at least two additional nodes for high availability. The Kubernetes cluster requires either a load balancer or one unused IP address as the HA proxy IP address. The IP address must be static, portable, and in the same subnet as the cluster. For production deployments, NFS storage management requires the NFS server IP address and the exported folder name.
For test deployments, GlusterFS requires the data storage path.
Compute
Requires a minimum of three servers: one primary node and at least two extra nodes for high availability and scaling compute resources. During installation, you can add additional nodes for scaling Compute resources, for example, if you expect to run resource-intensive computations or have many processes that run simultaneously.
Deployment
Requires a minimum of one server: one primary node and one optional extra node for high availability. The Deployment nodes are the production versions of the Compute nodes, and thus have identical requirements.

Hardware and software requirements for a four-node configuration

This configuration requires a minimum of three servers (either physical or virtual machines) and either one or two optional servers for deployment management on Watson Machine Learning.

You need a sudo username and a password that matches the login password of that user for each node (this credential needs sudo root access and is used by the installer to lay down files and perform configuration).

Warning: The hostname for nodes cannot be mixed case or all-capitalized on Watson Studio Local Version 1.2.3.1, or any other version that uses Weave. Capital letters in the hostname cause the installation to fail because of the method that Weave uses to determine deleted nodes. Use lowercase only when you set the hostname for nodes.
RHEL requirement: On each node for RHEL, ensure that the sudo user can run the following commands and any command in the following directories (they can be added as command aliases in the /etc/sudoers file): /usr/bin/kubectl, /usr/bin/kubelet, /usr/bin/docker, /usr/bin/ls, /usr/bin/grep, /usr/bin/python, /usr/bin/tail, /usr/bin/ssh, /usr/bin/mkdir, /usr/bin/rm, /usr/bin/ln, /usr/bin/sed, /usr/bin/scp, /usr/bin/cat, /usr/bin/echo, /usr/bin/awk, /usr/bin/cut, /usr/bin/readlink, /usr/bin/df, /usr/bin/sh, /usr/bin/which, /usr/bin/ntpstat, /usr/bin/mv, /usr/bin/cp, /usr/bin/curl, /sbin/ip, /sbin/getenforce, /sbin/xfs_info, /sbin/sysctl, /sbin/gluster, /sbin/route, /bin/ping, /sbin/iptables, /bin/rpm, /usr/bin/yum, /sbin/service, /sbin/chkconfig, /usr/bin/systemctl, /sbin/fdisk, /sbin/sfdisk, /sbin/parted, /sbin/partprobe, /bin/mount, /bin/umount, /bin/chown, /bin/sudo, /bin/chmod, /bin/kill, /usr/bin/kill, /usr/bin/killall, /usr/bin/touch, ${INSTALL_DIR}/, /wdp/utils/, /wdp/scripts/, /tmp/. Replace ${INSTALL_DIR} with the directory that contains the Watson Studio Local installation files.
Ubuntu requirement: On each node for Ubuntu, ensure that the sudo user can run all of the commands listed under RHEL requirement (note that command paths might differ from RHEL, for example, /usr/bin/ls for RHEL and /bin/ls for Ubuntu). Ubuntu requires one additional command: /usr/bin/apt-get.

The password cannot contain a single quotation mark ('), a double quotation mark ("), a pound sign (#), or white space ( ). After installation, Watson Studio Local will run as root.

Alternatively, you can use an SSH key installation by using root's private SSH key that is copied to each node (for example, by using the ssh-copy-id command).

If you are using PowerBroker to manage access, run pbrun to become root on the node that you install from, copy this root private SSH key to all other nodes, and use this key for the installation by way of the wdp.conf configuration file.

Ensure that each node has an extra disk partition for the installer files. Each storage node requires an additional disk partition. All of these disk partitions must be mounted to paths (the installer asks for these paths) and formatted with XFS with ftype functionality enabled. Example command to format each partition: mkfs.xfs -f -n ftype=1 -i size=512 -n size=8192 /dev/sdb1

Recommendation:

To improve performance, add the noatime flag to the mount options in /etc/fstab for both the installer and data storage partitions. Example:

/dev/sdb1       /installer              xfs     defaults,noatime    1 2
As a result, inode access times will not be updated on the filesystems.

Minimum server specifications for a four-node configuration on Red Hat Enterprise Linux (x86, POWER and z)

Node type | Number of servers (BM/VM) | CPU per node | RAM per node | Disk partitions | IP addresses
Control plane/storage/compute | 3 | 24 cores | 64 GB | Minimum 300 GB with XFS format for the installer files partition. GlusterFS requires a minimum of 500 GB with XFS format for the data storage partition. Docker with the devicemapper storage driver requires a minimum of 200 GB of extra raw disk space. |
Deployment | 1 | 16 cores | 64 GB | Minimum 300 GB with XFS format for the installer files partition. Docker with the devicemapper storage driver requires a minimum of 200 GB of extra raw disk space. If you add cores, a total of 48-50 cores distributed across multiple nodes is recommended. |

Other requirements:

  • The installation requires at least 10 GB on the root partition.
  • If you plan to place /var on its own partition, reserve at least 10 GB for the partition.
  • SPSS Modeler add-on requirement: If you plan to install the SPSS Modeler add-on, add 0.5 CPU and 8 GB of memory for each stream that you plan to create.
  • All servers must be synchronized in time (ideally through NTP or Chrony). Ensure that the system time of all the nodes in the cluster is synchronized to within one second. On each node, if NTP or Chrony is installed but the node is not synchronized to within one second, the installer does not allow you to proceed. If NTP and Chrony are not installed, the installer warns you. If an NTP or Chrony service is running but is not used to synchronize time, stop and disable the service on all nodes before you run the installation.
  • SSH between nodes should be enabled.
  • YUM should not be already running.
  • If you plan to use NVIDIA GPUs, see Prerequisites for installing Watson Studio Local with NVIDIA GPU support.

Control Plane, Storage, and Compute are all installed on a single node with at least two extra nodes for high availability. Deployment is installed on a single node with an optional extra node for high availability. The Deployment nodes are the production versions of the Compute nodes, and have identical requirements. The Kubernetes cluster requires either a load balancer or one unused IP address as HA proxy IP address. The IP address must be static, portable, and in the same subnet as the cluster. For production deployments, NFS storage management requires server IP and folder name. For test deployments, GlusterFS requires the data storage path. You can add extra nodes for scaling Compute resources, for example, if you expect to run resource-intensive computations or have many processes that run simultaneously.

CPU requirements

  • For a Linux x86-64 cluster, use a CPU that supports SSE 4.2.
  • Enable the AVX/AVX2 instruction set on the processor for the compute nodes where the GPU runtime can be started.

Disk requirements

Ensure that the storage has good disk I/O performance.

Disk latency test: dd if=/dev/zero of=/<path-to-install-directory>/testfile bs=512 count=1000 oflag=dsync. The result must be comparable to or better than: 512000 bytes (512 kB) copied, 1.7917 s, 286 kB/s

Disk throughput test: dd if=/dev/zero of=/<path-to-install-directory>/testfile bs=1G count=1 oflag=dsync. The result must be comparable to or better than: 1073741824 bytes (1.1 GB) copied, 5.14444 s, 209 MB/s

To ensure that your data that is stored within Watson Studio Local is stored securely, you can encrypt your storage partition. If you use Linux Unified Key Setup-on-disk-format (LUKS) for this purpose, then you must enable LUKS and format the partition with XFS before you install Watson Studio Local.
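For example, a minimal sketch of preparing a LUKS-encrypted data partition before installation (the partition /dev/sdc1 and the mapping name wsl_data are placeholders):

  cryptsetup luksFormat /dev/sdc1                                  # encrypt the raw partition
  cryptsetup luksOpen /dev/sdc1 wsl_data                           # open the encrypted device
  mkfs.xfs -f -n ftype=1 -i size=512 -n size=8192 /dev/mapper/wsl_data   # format with XFS before installing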

NFS requirements

NFS storage is required for all production deployments and it can also be used for test deployments. Watson Studio Local supports NFS version 4.x through a wdp.conf installation. The Watson Studio Local installer does not install NFS or set up NFS.

Before you start the Watson Studio Local installation, ensure you meet the following requirements:

  • A minimum of 500 GB reserved for storage on the NFS server, and a minimum of 1 Gbit/sec network bandwidth between the NFS server and the Watson Studio Local nodes. This is the minimum requirement for the system to work with a low workload; the bandwidth that you actually need depends on the workload created by you and your scheduled tasks. The amount of storage that you need might also vary from the minimum because it depends on what you will be doing. For example, if you are analyzing data in databases, you might not need much storage; however, if you will be working with large CSV files or large image files, you might need more than 500 GB of storage.
  • On the NFS server, ensure that the /etc/exports file contains the following example configuration so that all nodes on the Watson Studio Local cluster are exported:
    /<directory-being-exported> xxx.xx.xx.xxx(rw,sync,no_root_squash) xxx.xx.xx.xxx(rw,sync,no_root_squash) xxx.xx.xx.xxx(rw,sync,no_root_squash)
    where xxx.xx.xx.xxx represents the IP addresses of the nodes of the cluster.
  • For a cluster with NFS storage, the following two parameters must be present in the wdp.conf file:
    nfs_server=xxx.xx.xx.x, where xxx.xx.xx.x represents the IP address of the NFS server.
    nfs_dir=/<directory-being-exported>
    Do not include entries with _data_ in the wdp.conf file.
  • Optional: Verify that the NFS setup can add mount points by creating a directory on a node:
    mkdir -p /mnt/nfs/home
    mount -t nfs <NFS server IP>:<directory-being-exported> /mnt/nfs/home
    You should see no errors.
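If you edited /etc/exports while the NFS service was already running, re-export the file systems on the NFS server, and remove the test mount afterward, for example:

  exportfs -ra           # run on the NFS server after changing /etc/exports
  umount /mnt/nfs/home   # run on the node after the mount check succeeds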

When you are deciding how to store user data and Docker images, see the related storage documentation to help you determine whether to use an enterprise NFS service or GlusterFS. Options for storing Docker images and local storage requirements on your nodes are also provided there.

GlusterFS requirements

GlusterFS can only be used for test deployments. If you are creating a small test system and want to bypass requesting NFS storage, you can use GlusterFS to store the user data on local disks spread across the nodes in a cluster.

Network requirements

  • Each node needs a working DNS configuration and a gateway that is specified within the network configuration, regardless of whether this gateway allows outbound network access.
  • A minimum of 1 Gbit/sec network bandwidth is required between the nodes.
  • The cluster requires a network that it can use for the overlay network within Kubernetes. The network cannot conflict with other networks that might establish a connection to the cluster. Watson Studio Local configures 9.242.0.0/16 as the default network. Use this default only if it does not conflict with other networks that this cluster is connected to.
  • In the /etc/sysctl.conf file, you must set net.ipv4.ip_forward = 1, and load the setting by using the sysctl -p command (see the example after this list).
  • From the first master node, where the installer runs, verify that you can SSH to every other node, either by user ID or by SSH key.
  • Verify the DNS you have set up on every node and ensure the DNS that you configured actually accepts DNS lookup requests. Enter a dig or nslookup command against a name on your network, and ensure your DNS correctly responds with an IP address.
  • Ensure the IP addresses being used for the installation match the host name for each node (hostnames and IP addresses need to be unique across the nodes).
  • Verify the machine-id is unique on each node by entering the command: cat /etc/machine-id. If they are not unique, you can generate new IDs with the following command: uuidgen > /etc/machine-id.
  • Ensure that ICMP is enabled between the nodes and that you can ping each of the nodes.
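For example, the IP forwarding, DNS, and machine-id checks in this list might look like the following on each node (the host name is a placeholder):

  sysctl -p                                # reload settings after editing /etc/sysctl.conf
  sysctl net.ipv4.ip_forward               # must print net.ipv4.ip_forward = 1
  nslookup <a-hostname-on-your-network>    # the configured DNS must return an IP address
  cat /etc/machine-id                      # the ID must be unique on every node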

Proxy IP or load balancer configuration

To provide High Availability for Watson Studio Local, you must use either a proxy IP address or a load balancer.

Option 1: Proxy IP address
Requirements:
  • All of the master nodes must be on the same subnet. The compute and deployment nodes can be on any accessible subnet.
  • A static, unused IP address on the network is required on the same VLAN and subnet as the master nodes. For high availability purposes, this address is used as a failover source IP from which Watson Studio Local is accessed. The master nodes use this IP address so that if one of the master nodes fails, another node takes over the IP address and provides fault tolerance. The network administrator must reserve this IP address before you can install Watson Studio Local.
Option 2: Load balancer
For high availability purposes, you can use an external load balancer that is configured on your network. The load balancer does not require the nodes to be on the same subnet and VLAN. The load balancer can be specified only for a Watson Studio Local installation that uses a wdp.conf file.

You can use one or two load balancers for this configuration:

External Traffic Routing
This load balancer must be configured to forward traffic for ports 6443 and 443 to all three control nodes (or master nodes) with persistent IP round robin for the cluster to function properly. After Watson Studio Local is installed, you can access it by connecting to the load balancer on port 443 via SSL or HTTPS.
Internal Traffic Routing
This load balancer must be configured before you install Watson Studio Local to forward internal traffic for port 6443 to all three control nodes (or master nodes). All nodes must have access to the Kubernetes API server for the cluster to communicate with itself.
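Watson Studio Local does not mandate a particular load balancer product. As an illustration only, assuming HAProxy is used for both routing roles, a TCP round-robin configuration with source-IP persistence might look like the following (the master host names and IP addresses are placeholders):

  # External traffic routing: HTTPS access to Watson Studio Local
  frontend wsl_https
      bind *:443
      mode tcp
      default_backend wsl_masters_443

  backend wsl_masters_443
      mode tcp
      balance roundrobin
      stick-table type ip size 200k expire 30m
      stick on src
      server master-1 <master-1-IP>:443 check
      server master-2 <master-2-IP>:443 check
      server master-3 <master-3-IP>:443 check

  # Internal traffic routing: Kubernetes API server
  frontend wsl_kube_api
      bind *:6443
      mode tcp
      default_backend wsl_masters_6443

  backend wsl_masters_6443
      mode tcp
      balance roundrobin
      server master-1 <master-1-IP>:6443 check
      server master-2 <master-2-IP>:6443 check
      server master-3 <master-3-IP>:6443 check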

Firewall restrictions

  • Kubernetes uses IP tables for cluster communication. Because Kubernetes cannot run a server firewall on each node in combination with the IP tables that it is using, server firewalls (for example, firewalld and iptables services) must be disabled. If an extra firewall is needed, it is recommended that you set up the firewall around the cluster (for example, a vyatta firewall) and open port 443.
  • SELinux must be in either Enforcing or Permissive mode. Use the getenforce command to get the current SELinux mode. If the command shows "Disabled", edit /etc/selinux/config and change the SELINUX= line to either SELINUX=permissive or SELINUX=enforcing, as shown in the example after this list. Then, restart the node for the change to take effect.
  • Watson Studio Local is exposed externally through one port: 443 (HTTPS), for which access must be permitted.
  • The Watson Studio Local runtime environment components connect to data sources (for example, relational databases and HDFS) and to the enterprise LDAP server and port to support authentication; access to these must be permitted.
  • Ensure that no daemon, script, process, or cron job makes any modification to /etc/hosts, IP tables, routing rules, or firewall settings (like enabling or refreshing firewalld or iptables) during or after install.
  • Ensure every node has at least one localhost entry in the /etc/hosts file corresponding to IP 127.0.0.1.
  • If your cluster uses multiple network interfaces (one with public IP addresses and one with private IP addresses), use only the private IP address in the /etc/hosts file with the short hostnames.
  • Ansible requirement: ensure the libselinux-python package is available.
  • Restriction: Watson Studio Local does not support dnsmasq. Check with your network administrator to make sure that dnsmasq is not enabled.
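For example, a minimal way to check and correct the SELinux mode (the change takes effect after a restart):

  getenforce
  # if the output is "Disabled", switch to permissive (or enforcing) mode and reboot:
  sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
  reboot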

Certificates

Watson Studio Local generates SSL certificates during installation. The certificates are used for inter-cluster communication and must be trusted during first time access by users.

IBM Cloud offering requirements

See IBM Cloud documentation for details on ordering resources and performing installation tasks.

  • Set up a minimum of three virtual machines or bare metal servers, choosing the specifications that are needed for Watson Studio Local. Choose SSD drives when ordering.
  • Ensure the DNS you have configured on each node is working, and can resolve names or IP addresses on the network you are on.
  • Set up a local load balancer and configure it to redirect TCP port 6443 to the three master node instances. Choose persistent IP and round robin configuration. For health checks, check whether the port is open.
  • Install Watson Studio Local using the wdp.conf file with virtual_ip_address= commented out and the new line added: load_balancer_ip_address=<IP of the network load balancer>. Use the private IPs for each of the nodes to ensure Watson Studio Local is installed using the private network.
  • After the installation completes, create an external load balancer for HTTPS (443) and point it to the three master nodes. Do not use SSL offloading. Use this external load balancer to connect to Watson Studio Local through HTTPS on TCP port 443.

Additional requirements for Microsoft Azure

  • Red Hat Enterprise Linux operating system only.
  • When ordering the VMs, choose Premium SSD.
  • All three master nodes must be added to the availability set.
  • Use an SSD drive for the installation partition. Use a separate raw disk (rather than a raw partition) for Docker.
  • Use either the root user or the root SSH key installation. Sudo user is not supported.

See Microsoft Azure Documentation for details on ordering resources and performing installation tasks.

Additional requirements for Amazon Web Services

Before installing Watson Studio Local, complete the following steps:

  1. Create an HTTPS "Application" Elastic Load Balancer that forwards traffic on port 443 to the three master nodes. This load balancer will be the front-facing URL for users, so you can choose the port to listen on and the certificate to secure the connection with, by way of AWS Certificate Manager.
  2. Create a TCP "Network" Elastic Load Balancer that listens on port 6443 and forwards to port 6443 on the three master nodes. This load balancer is used by the cluster to communicate with the Kubernetes API server.

For version 1.2.1.0 or later: Install Watson Studio Local using the wdp.conf file with virtual_ip_address= commented out and the new line added: load_balancer_fqdn=<FQDN of the TCP load balancer>. Use either the root user or the root SSH key installation. Sudo user is not supported.

For versions earlier than 1.2.1.0: Install Watson Studio Local using the wdp.conf file with virtual_ip_address= commented out and the new line added: load_balancer_ip_address=<static IP of the TCP load balancer>. Use either the root user or the root SSH key installation. Sudo user is not supported.

Hadoop requirements

See Hortonworks Data Platform (HDP) or Cloudera Distribution for Hadoop (CDH).

Supported web browsers

  • Google Chrome (recommended)
  • Mozilla Firefox