Prerequisites for Installing DSX Local with NVIDIA GPU support
Before you install DSX Local on servers that have NVIDIA GPUs, ensure that the servers meet the following requirements.
- Graphics Processing Unit System Preparation
- Azure Specific Pre-install Instructions
- General Pre-install Instructions
- More requirements for POWER8 with NVIDIA P100 GPUs
- More requirements for POWER9 with NVIDIA P100 GPUs
Graphics Processing Unit System Preparation
DSX Local supports NVIDIA GPUs on Azure, AWS, and SoftLayer. If your servers have NVIDIA GPUs, you must complete the following steps before you install DSX Local.
Azure Specific Pre-install Instructions
If you use Azure, note that Azure is the only environment with a specific kernel version requirement. The default kernel level of a non-HPC node is not compatible with DSX, for example:
uname -r
3.10.0-514.28.1.el7.x86_64
To install the supported kernel level, run the following commands:
yum install kernel-3.10.0-514.21.1.el7.x86_64
reboot
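After the reboot, it is worth confirming that the node actually booted into the supported kernel. The following is a minimal sketch, assuming bash on the node; the SUPPORTED_KERNEL variable and the kernel_is_supported and report_kernel functions are hypothetical names, and the version string is the one from the yum command above.

```shell
#!/usr/bin/env bash
# Sketch: compare a kernel release string with the kernel level installed
# above for Azure. SUPPORTED_KERNEL and both functions are hypothetical names.

SUPPORTED_KERNEL="3.10.0-514.21.1.el7.x86_64"

# Return 0 if the given release (default: the running kernel) equals
# SUPPORTED_KERNEL, 1 otherwise.
kernel_is_supported() {
  local release="${1:-$(uname -r)}"
  [ "$release" = "$SUPPORTED_KERNEL" ]
}

# Print a one-line status message for the given release.
report_kernel() {
  local release="${1:-$(uname -r)}"
  if kernel_is_supported "$release"; then
    echo "OK: running supported kernel $release"
  else
    echo "WARNING: kernel $release is not $SUPPORTED_KERNEL"
  fi
}
```

For example, run report_kernel with no arguments after the reboot to check the running kernel.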
General Pre-install Instructions
Follow these steps and the example commands to install the required packages and modules and to disable the default video driver before you install DSX. For instructions on downloading the NVIDIA drivers, see the section Download and Install NVIDIA GPU.
Install the pciutils package:
yum install pciutils
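As an optional check that is not part of the documented procedure, the lspci command from pciutils can show whether the NVIDIA GPUs are visible on the PCI bus. This is a sketch under stated assumptions: count_nvidia_gpus is a hypothetical name, and it counts only lspci lines whose device class is "3D controller" or "VGA compatible controller" with an NVIDIA vendor string.

```shell
#!/usr/bin/env bash
# Sketch: count NVIDIA GPUs in lspci output read from stdin.
# count_nvidia_gpus is a hypothetical helper name.
count_nvidia_gpus() {
  # Match only display-class devices so NVIDIA audio functions are not counted.
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps status 0.
  grep -ciE '(3D|VGA compatible) controller: NVIDIA' || true
}
```

Usage: lspci | count_nvidia_gpus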
Check whether the default nouveau video driver is loaded. If the command prints no output, the driver is not loaded.
lsmod | grep -i nouveau
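If you want a yes/no answer for use in a script rather than visually inspecting grep output, the same check can be wrapped in a function. This is a minimal sketch; nouveau_loaded is a hypothetical name, and it relies on lsmod printing one module per line with the module name in the first column.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if the lsmod output on stdin lists the nouveau module.
# nouveau_loaded is a hypothetical helper name.
nouveau_loaded() {
  # Compare the first column exactly, so modules that merely mention
  # nouveau in the "Used by" column are not counted as nouveau itself.
  awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}
```

Usage: lsmod | nouveau_loaded && echo "nouveau is loaded"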
Disable the default nouveau driver: (a) update the GRUB configuration, (b) blacklist the driver in blacklist.conf, and (c) reboot. Depending on the image you use, the driver might already be disabled; check the output of the lsmod command above.
a. Update GRUB to blacklist the nouveau driver by appending rd.driver.blacklist=nouveau nouveau.modeset=0 to the GRUB_CMDLINE_LINUX line, as shown below.
vi /etc/default/grub
Change:
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto"
to:
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto rd.driver.blacklist=nouveau nouveau.modeset=0"
Complete the update by running the following command:
grub2-mkconfig -o /boot/grub2/grub.cfg
b. Edit or create the blacklist.conf file to blacklist the nouveau driver.
c. Reboot the system to activate the changes.
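Steps a and b can also be scripted instead of edited by hand. The sketch below makes several assumptions that are not stated in this section: the GRUB_CMDLINE_LINUX value is double-quoted (as in the example above), the blacklist file path is passed in by you because the exact path is not given here (a file such as /etc/modprobe.d/blacklist.conf is only an assumed example), and the file uses the standard modprobe.d directives "blacklist nouveau" and "options nouveau modeset=0". Both function names are hypothetical. Run grub2-mkconfig afterward and reboot as in step c.

```shell
#!/usr/bin/env bash
# Sketch for steps a and b. Function names are hypothetical.

# Step a: append the nouveau parameters to GRUB_CMDLINE_LINUX in the given
# GRUB defaults file (for example /etc/default/grub), keeping a .bak copy.
# Assumes the value is enclosed in double quotes.
add_nouveau_grub_params() {
  local file="$1"
  local params="rd.driver.blacklist=nouveau nouveau.modeset=0"
  # Do nothing if the parameters are already present, so re-runs are safe.
  if grep -q 'rd.driver.blacklist=nouveau' "$file"; then
    return 0
  fi
  # Insert the parameters just before the closing double quote.
  sed -i.bak "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 $params\"/" "$file"
}

# Step b: make sure the given modprobe configuration file blacklists nouveau.
# The path is an argument because the exact path is not shown above.
write_nouveau_blacklist() {
  local file="$1" line
  touch "$file"
  # Append each directive only if that exact line is not already present.
  for line in "blacklist nouveau" "options nouveau modeset=0"; do
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}
```

For example: add_nouveau_grub_params /etc/default/grub, then write_nouveau_blacklist with the blacklist file path you use, then grub2-mkconfig -o /boot/grub2/grub.cfg.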
Install the kernel-devel and kernel-headers packages whose version matches the running kernel:
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Install the gcc compiler:
yum install gcc
Install the dkms package from the EPEL external repository:
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install dkms
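DKMS builds the NVIDIA kernel module against the installed kernel development files, so kernel-devel and kernel-headers packages that do not match the running kernel are a common source of driver build failures. The sketch below checks this before you install the driver; devel_matches_kernel is a hypothetical name, and it assumes the package list comes from rpm -q, whose default output is name-version-release.arch (for example kernel-devel-3.10.0-514.21.1.el7.x86_64).

```shell
#!/usr/bin/env bash
# Sketch: return 0 if the package list on stdin (for example the output of
#   rpm -q kernel-devel kernel-headers
# ) contains both packages for the given kernel release (default: running
# kernel). devel_matches_kernel is a hypothetical helper name.
devel_matches_kernel() {
  local release="${1:-$(uname -r)}"
  local pkgs
  pkgs=$(cat)
  # Exact whole-line matches, so a different kernel build does not count.
  printf '%s\n' "$pkgs" | grep -qxF "kernel-devel-$release" &&
    printf '%s\n' "$pkgs" | grep -qxF "kernel-headers-$release"
}
```

Usage: rpm -q kernel-devel kernel-headers | devel_matches_kernel || echo "kernel-devel/kernel-headers do not match $(uname -r)"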
Download and Install NVIDIA GPU
Follow these steps to download and install the NVIDIA driver:
Download the NVIDIA 8.0 drivers from the Download Drivers page.
Figure 1: Example of selected driver
Install the local NVIDIA driver repository package:
rpm -i nvidia-diag-driver-local-repo-rhel7-384.66-1.0-1.x86_64.rpm
Install the drivers, then reboot. The CUDA-enabled NVIDIA 8.0 driver must be installed on the host operating system of every compute node that has a GPU.
yum install cuda-drivers
reboot
Verify the installation:
nvidia-smi
Tip: If the command is slow, persistence mode might be disabled, so the driver must initialize the GPUs on every call. Enable persistence mode with the following command:
nvidia-smi -pm 1
Figure 2: Example of verification results
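To check the result in a script rather than by reading the nvidia-smi table, you can query only the fields you need with nvidia-smi --query-gpu=name,driver_version --format=csv,noheader. The sketch below is illustrative: version_ge and gpus_ok are hypothetical names, the minimum version 384.66 is an assumption taken from the repository package name above, and version comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Sketch: validate nvidia-smi query output against a minimum driver version.
# version_ge and gpus_ok are hypothetical helper names.

# Return 0 if version $1 >= version $2, using GNU `sort -V` ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Read "name, driver_version" lines from stdin, as printed by
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# Return 0 only if at least one GPU is listed and each driver version >= $1.
gpus_ok() {
  local min="$1" count=0 name ver
  while IFS=, read -r name ver; do
    ver=$(printf '%s' "$ver" | tr -d ' ')   # strip the space after the comma
    [ -n "$ver" ] || continue               # skip blank lines
    count=$((count + 1))
    version_ge "$ver" "$min" || return 1
  done
  [ "$count" -gt 0 ]
}
```

Usage: nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | gpus_ok 384.66 && echo "driver OK"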
You are now ready to install DSX Local and take advantage of GPU-accelerated compute.
More requirements for POWER8 with NVIDIA P100 GPUs
If you have a POWER8 "Minsky" system with NVIDIA P100 GPUs, you must install the NVIDIA GPU drivers on the system before you install DSX Local.
To install GPU drivers on POWER8 hosts, install CUDA 9 from a local repo by following the NVIDIA documentation. The following example uses CUDA 9.2.
- Clean up CUDA libraries from any prior installations:
yum list installed | grep -i cuda
yum remove "cuda-*"
yum remove dkms.noarch
yum remove epel-release
yum remove "nvidia-kmod*"
Get CUDA 9.2 libraries and install them on the system:
cd /tmp
wget http://developer.download.nvidia.com/compute/cuda/repos/rhel7/ppc64le/cuda-repo-rhel7-9.2.88-1.ppc64le.rpm
rpm -i cuda-repo-rhel7-9.2.88-1.ppc64le.rpm
yum clean all
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install cuda
# Verify that the GPUs can be seen
nvidia-smi
# Verify that the device file has been created
ls /dev/nvidia-uvm
# If the device file is not found, download the utility (https://www.ibm.com/support/knowledgecenter/en/SSBS6K_220.127.116.11/manage_cluster/verify_gpu.html) and run it:
./cudaInit_ppc64le
# Verify that the device file is created
ls /dev/nvidia-uvm
# Verify that the device log file exists
ls /var/lib/docker/volumes/
# If the device log file is missing, create the directory nvidia_driver_xxx.xx,
# where xxx.xx is the installed driver version (396.26 in this example)
cd /var/lib/docker/volumes
mkdir nvidia_driver_396.26
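The last checks above can be collected into reusable functions so that the volume directory name always follows the installed driver version instead of a hard-coded 396.26. This is a sketch under assumptions: the function names are hypothetical, the default paths are the ones used above, and the default driver version is read from nvidia-smi --query-gpu=driver_version for the first GPU.

```shell
#!/usr/bin/env bash
# Sketch of the POWER8 post-install checks above. Function names are
# hypothetical; default paths are the ones used in the commands above.

# Return 0 if the NVIDIA UVM device node exists.
uvm_device_present() {
  [ -e "${1:-/dev/nvidia-uvm}" ]
}

# Print the volume directory path for a driver version, for example
# 396.26 -> /var/lib/docker/volumes/nvidia_driver_396.26
driver_volume_dir() {
  local version="$1" base="${2:-/var/lib/docker/volumes}"
  printf '%s/nvidia_driver_%s\n' "$base" "$version"
}

# Create the volume directory for the given driver version (default: the
# version nvidia-smi reports for the first GPU) and print its path.
ensure_driver_volume_dir() {
  local version base dir
  version="${1:-$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)}"
  base="${2:-/var/lib/docker/volumes}"
  dir=$(driver_volume_dir "$version" "$base")
  mkdir -p "$dir"
  printf '%s\n' "$dir"
}
```

For example: uvm_device_present || ./cudaInit_ppc64le, then ensure_driver_volume_dir with no arguments.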
More requirements for POWER9 with NVIDIA P100 GPUs
See IBM PowerAI Releases.