Install global libraries and packages on the cluster

A DSX Local administrator can install Python or R packages in global directories. These packages are available to all users of the cluster.

Tasks for installing global libraries and packages:

To install a global Python library

  1. Log in to DSX Local as admin and create a Python notebook.
  2. Use the pip package installer to install Python libraries into the global directory. For example, run the following command in a code cell to install the prettyplotlib library:
    !pip install --target /user-home/_global_/python-2.7 prettyplotlib
    

The installed packages can be used by all notebook users that use the same Python version in the Spark service. Notebook users can now use the Python import command to import the library components. For example, users can run the following command in a code cell:

import prettyplotlib as ppl
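
To confirm that a notebook picks up the library from the global directory rather than from some other location, a user can check where the imported module was loaded from. The following is a minimal sketch, not part of DSX Local itself: the helper names `module_location` and `is_under` are hypothetical, and `GLOBAL_DIR` is assumed to be the target directory from the pip command above.

```python
import importlib
import os

# Assumption: the global target directory used in the pip command above.
GLOBAL_DIR = "/user-home/_global_/python-2.7"


def module_location(name):
    """Import a module by name and return the absolute path of its source file."""
    module = importlib.import_module(name)
    return os.path.abspath(module.__file__)


def is_under(path, directory):
    """Return True if path lies inside directory (plain prefix check, no symlink resolution)."""
    directory = os.path.abspath(directory)
    return os.path.abspath(path).startswith(directory + os.sep)


if __name__ == "__main__":
    try:
        location = module_location("prettyplotlib")
        print(location)
        print("Loaded from global directory:", is_under(location, GLOBAL_DIR))
    except ImportError:
        # The library is not visible to this kernel.
        print("prettyplotlib is not importable from this kernel")
```

If the check reports that the module is not under the global directory, the kernel is probably resolving a different copy of the library first.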

To install a global Python library when the cluster is not connected to the internet

  1. Access the shared volume on the host. As root, do the following actions on the master node:
    1. Create a directory on the master node to mount the user-home volume. For example:
      mkdir -p /mnt/shared-user-home
      
    2. Find a storage node:
      kubectl get nodes -l is_storage=true
      
      Example output:
      NAME                                STATUS    AGE
      dev06-kube-storage-1.fyre.ibm.com   Ready     31d
      dev06-kube-storage-2.fyre.ibm.com   Ready     31d
      dev06-kube-storage-3.fyre.ibm.com   Ready     31d
      
      Pick one of the nodes in the output. In this example, you might pick dev06-kube-storage-1.fyre.ibm.com.
    3. Mount the user-home volume:
      mount -t glusterfs <storagehost>:/user-home  <mount-point>
      
      For example:
      mount -t glusterfs dev06-kube-storage-1.fyre.ibm.com:/user-home /mnt/shared-user-home/
      
  2. From a computer that has access to the internet and that has pip and Python v2.7 installed, run the following command to download the module and its dependencies:
    pip download -d tmp/piptest/prettyplotlib --no-binary :all: prettyplotlib
    
  3. Use tar or zip to create an archive of the downloaded files:
    tar -cf downloadedModule.tar tmp/piptest/prettyplotlib
    
  4. Copy the archive to the cluster master node:
    scp downloadedModule.tar root@dev06-kube-master-1:
    
  5. On the cluster master node, unpack the archive into the _global_ directory of the shared volume that you mounted in step 1:
    cd /mnt/shared-user-home/_global_/
    tar -xf ~/downloadedModule.tar
    
    Note the location of the directory and the module file:
    [root@dbl164-master-1 _global_]# tar -tf ~/downloadedModule.tar
    tmp/piptest/prettyplotlib/
    tmp/piptest/prettyplotlib/brewer2mpl-1.4.1.zip
    tmp/piptest/prettyplotlib/functools32-3.2.3-2.zip
    tmp/piptest/prettyplotlib/pyparsing-2.2.0.tar.gz
    tmp/piptest/prettyplotlib/cycler-0.10.0.tar.gz
    tmp/piptest/prettyplotlib/python-dateutil-2.6.0.tar.gz
    tmp/piptest/prettyplotlib/six-1.10.0.tar.gz
    tmp/piptest/prettyplotlib/pytz-2017.2.zip
    tmp/piptest/prettyplotlib/matplotlib-2.0.2.tar.gz
    tmp/piptest/prettyplotlib/subprocess32-3.2.7.tar.gz
    tmp/piptest/prettyplotlib/numpy-1.12.1.zip
    tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
    
    In this example, the location of the module file is:
    /mnt/shared-user-home/_global_/tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
    
    On the pod that is running the notebook server, this location is:
    /user-home/_global_/tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
    
  6. From DSX Local, log in as admin, create a new Python notebook, and enter the following command in a cell:
    !pip install --target /user-home/_global_/python-2.7 --no-index --find-links=/user-home/_global_/tmp/piptest/prettyplotlib /user-home/_global_/tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
    

Troubleshooting: If the module does not install, repeat these instructions, but remove the --no-binary :all: option from the pip download step.
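
The two pip commands in this procedure differ only in a few arguments, so it can help to see them assembled side by side. The sketch below builds the argument lists for the download step and the offline install step, including the troubleshooting variant without `--no-binary :all:`. The helper names `pip_download_args` and `pip_install_args` are hypothetical; the flags are the ones used in the commands above, and the example paths assume that the archive was unpacked under `_global_` as in step 5.

```python
def pip_download_args(package, dest_dir, source_only=True):
    """Build the argument list for the pip download step (step 2)."""
    args = ["pip", "download", "-d", dest_dir]
    if source_only:
        # Restrict the download to source archives, as in the original command.
        args += ["--no-binary", ":all:"]
    args.append(package)
    return args


def pip_install_args(target_dir, links_dir, archive_name):
    """Build the argument list for the offline pip install step (step 6)."""
    archive_path = links_dir.rstrip("/") + "/" + archive_name
    return [
        "pip", "install",
        "--target", target_dir,
        "--no-index",                 # do not contact a package index
        "--find-links=" + links_dir,  # resolve dependencies from this directory
        archive_path,
    ]


if __name__ == "__main__":
    download_dir = "tmp/piptest/prettyplotlib"
    # Default download command, then the troubleshooting variant.
    print(" ".join(pip_download_args("prettyplotlib", download_dir)))
    print(" ".join(pip_download_args("prettyplotlib", download_dir, source_only=False)))
    # Install command as seen from the notebook pod (paths are assumptions).
    print(" ".join(pip_install_args(
        "/user-home/_global_/python-2.7",
        "/user-home/_global_/tmp/piptest/prettyplotlib",
        "prettyplotlib-0.1.7.tar.gz",
    )))
```

In a notebook cell, the printed install command would be run with a leading `!`, as shown in step 6.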

The installed packages can be used by all notebook users that use the same Python version in the Spark service. Notebook users can now use the Python import command to import the library components. For example, users can run the following command in a code cell:

import prettyplotlib as ppl

To install a global R package

  1. Log in to DSX Local as admin and create an R notebook.

  2. Use the R install.packages() function to install new R packages. For example, run the following command in a code cell to install the ggplot2 package for plotting functions:

     install.packages("ggplot2")
    

    The installed package can be used by all R notebooks that are running in the Spark service.

Now, users can use the R library() function to load the installed package. For example, a user can run the following command in a code cell:

library("ggplot2")

After running this command, users can call plotting functions from the ggplot2 package in their notebook.

To install a global R package when the cluster is not connected to the internet

  1. Access the shared volume on the host. As root, do the following actions on the master node:
    1. Create a directory on the master node to mount the user-home volume. For example:
      mkdir -p /mnt/shared-user-home
      
    2. Find a storage node:
      kubectl get nodes -l is_storage=true
      
      Example output:
      NAME                            STATUS   AGE
      dev06-kube-storage-1.ibm.com    Ready    31d
      dev06-kube-storage-2.ibm.com    Ready    31d
      dev06-kube-storage-3.ibm.com    Ready    31d
      
      Pick one of the nodes in the output. In this example, you might pick dev06-kube-storage-1.ibm.com.
    3. Mount the user-home volume:
      mount -t glusterfs <storagehost>:/<namespace>-user-home <mount-point>
      
      For example:
      mount -t glusterfs dev06-kube-storage-1.ibm.com:/dsx-user-home /mnt/shared-user-home/
      
  2. From a computer that has access to the internet, find the package on the R CRAN website and download its TAR file, either directly from the browser or from the command line as follows.

    First, create the destination folder:

    mkdir -p tmp-r
    

    Then use wget or curl to download the package from the URL listed on the CRAN website. For example, with wget:

    wget https://cran.r-project.org/src/contrib/ggplot2_2.2.1.tar.gz --directory-prefix=tmp-r
    

    Alternatively, if R is installed on this computer, download the package from an R session:

    download.packages('ggplot2',destdir='tmp-r')
    

    A TAR file for the package is downloaded to the tmp-r folder:

    $ ls tmp-r
    ggplot2_2.2.1.tar.gz
    
  3. Copy the tmp-r directory to the shared volume on the cluster master node:
    scp -r tmp-r root@dev06-kube-master-1:/mnt/shared-user-home/
    
  4. On the cluster master node, check the uploaded file or files. In this example, the location of the module file is:
    /mnt/shared-user-home/tmp-r/ggplot2_2.2.1.tar.gz
    
    On the pod that is running the notebook server, this location is:
    /user-home/tmp-r/ggplot2_2.2.1.tar.gz
    
  5. From DSX Local, sign in as admin, create a new R notebook, and enter the following command in a cell:
    install.packages('/user-home/tmp-r/ggplot2_2.2.1.tar.gz', repos=NULL)
    

The installed packages can be used by all notebook users that use the same R version in the Spark service. Notebook users can now use the R library() command to load the library components. For example, users can run the following command in a code cell:

library(ggplot2)
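
The offline procedures rely on the same file being visible under two different paths: one under the mount point on the master node and one under /user-home inside the notebook pod. The sketch below shows that mapping as a small helper. The function name `host_to_pod_path` is hypothetical, and the defaults assume the mount point and pod path used in the examples above.

```python
import posixpath


def host_to_pod_path(host_path,
                     mount_point="/mnt/shared-user-home",
                     pod_root="/user-home"):
    """Translate a path under the host mount point to the path seen in the notebook pod."""
    host_path = posixpath.normpath(host_path)
    mount_point = posixpath.normpath(mount_point)
    # Reject paths that are not on the shared volume at all.
    if host_path != mount_point and not host_path.startswith(mount_point + "/"):
        raise ValueError("%s is not under %s" % (host_path, mount_point))
    relative = posixpath.relpath(host_path, mount_point)
    if relative == ".":
        return pod_root
    return posixpath.join(pod_root, relative)


if __name__ == "__main__":
    # Path of the uploaded package on the master node, from step 4.
    print(host_to_pod_path("/mnt/shared-user-home/tmp-r/ggplot2_2.2.1.tar.gz"))
```

The returned path is the one to pass to install.packages() in the notebook, since the notebook server only sees the volume under /user-home.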
