Table of contents

Migrate data in Data Science Experience Local

Migrate data from Version 1.1.3 to Version 1.2.0.3

Restriction: During the cloudant import, you might see error code 412 (missing_stub) or 413 (too_large). These errors are expected and do not affect the functionality of DSX Local.

Method 1: Migrate data from a Version 1.1.3 cluster to a Version 1.2.0.3 cluster

Requirement: A separate target cluster with DSX Local 1.2.0.3 installed must be accessible from the source DSX Local 1.1.3 cluster.

Step 1: From the DSX Local 1.1.3 master node, export cloudant, custom images, and mongodb from the source cluster:

  1. Copy the DSX Local 1.2.0.3 installer to the installer partition on one of the master nodes.
  2. Run the installer with the flag --extract-migration-scripts. This extracts the migration scripts into a directory named migration on the installer partition.
  3. From the migration directory, run the export.sh script. Confirm that there is a newly created directory backup_data in the migration directory populated with three sub-directories: cloudant, image-management, and mongodump.
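As a sanity check after export.sh completes, the backup_data layout can be verified with a short shell function. This is a sketch, not part of the product; pass it the path to your migration/backup_data directory:

```shell
# check_backup: confirm that export.sh created the three expected
# sub-directories under backup_data. Prints one line per directory.
check_backup() {
  local base="$1" d
  for d in cloudant image-management mongodump; do
    if [ -d "$base/$d" ]; then
      echo "ok: $d"
    else
      echo "missing: $d"
    fi
  done
}

# Example: check_backup /path/to/migration/backup_data
```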

Step 2: From the DSX Local 1.2.0.3 master node, import the cloudant and mongodb data exported from the Version 1.1.3 cluster:

  1. Copy the migration directory (including backup_data and its sub-directories) from the Version 1.1.3 cluster into any desired location on the master node of the Version 1.2.0.3 cluster.
  2. From the migration directory copied in the previous step, run the import.sh script.
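The two substeps can be scripted from the source cluster. The function below is a dry-run sketch that only prints the scp/ssh commands it would run; the user, host, and paths in the example are placeholders, not product values:

```shell
# stage_import: print the commands that would copy the migration directory
# to the 1.2.0.3 master node and run import.sh there (dry run; nothing
# is executed remotely).
stage_import() {
  local src_dir="$1" target="$2" dest_dir="$3"
  echo "scp -r $src_dir $target:$dest_dir"
  echo "ssh $target 'cd $dest_dir/migration && ./import.sh'"
}

# Example: stage_import /tmp/migration root@192.0.2.10 /root
```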

Step 3: Migrate user-home from the Version 1.1.3 cluster to the Version 1.2.0.3 cluster:

  1. Copy the file exclusion_list.txt from the migration directory extracted in Step 1 to any desired location on a storage node of the Version 1.1.3 cluster, and note the path to the file.
  2. On a storage node of the Version 1.1.3 cluster, run the following command to pause the user-home gluster volume:
    echo y | gluster volume stop user-home
    
  3. On a master node of the Version 1.2.0.3 cluster, run the following command to pause the dsx-user-home gluster volume:
    echo y | kubectl exec -i -n kube-system $(kubectl get po -n kube-system | grep glusterfs | grep Running | head -n 1 | awk '{print $1}') -- gluster volume stop dsx-user-home
    
  4. Run the following command, replacing each placeholder with the required value and omitting the angled brackets (<>). The command copies the source Version 1.1.3 user-home volume over the network to the target Version 1.2.0.3 cluster's dsx-user-home volume, excluding a few files that must not be migrated. Important: Keep the trailing slashes exactly as they appear in the command:
    rsync -av --exclude-from=<PATH_TO_exclusion_list.txt> /<SRC_DATA_PARTITION_DIR>/user-home/ <TARGET_USER>@<TARGET_MASTER_IP>:/<TARGET_DATA_PARTITION_DIR>/dsx-user-home
    
  5. On a storage node of the Version 1.1.3 cluster, run the following command to resume the user-home gluster volume:
    gluster volume start user-home
    
  6. On a master node of the Version 1.2.0.3 cluster, run the following command to resume the dsx-user-home gluster volume:
    kubectl exec -i -n kube-system $(kubectl get po -n kube-system | grep glusterfs | grep Running | head -n 1 | awk '{print $1}') -- gluster volume start dsx-user-home
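The trailing-slash rule in the rsync step is easy to get wrong. This local demonstration (throwaway temporary directories, no cluster involved) shows why the source path needs the slash and the destination does not:

```shell
# With a trailing slash, rsync copies the *contents* of user-home into the
# destination; without it, rsync would create a nested user-home directory
# inside dsx-user-home instead.
demo=$(mktemp -d)
mkdir -p "$demo/user-home" "$demo/dsx-user-home"
echo notebook > "$demo/user-home/file.txt"
rsync -a "$demo/user-home/" "$demo/dsx-user-home"
ls "$demo/dsx-user-home"    # file.txt lands directly under dsx-user-home
```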
    

Method 2: Migrate data from Version 1.1.3 to Version 1.2.0.3 on the same cluster

Requirement: Additional storage matching the size of the data partitions must be mounted on one of the storage nodes in the cluster.

Step 1: From the DSX Local 1.1.3 master node, export cloudant, custom images, and mongodb from the source cluster:

  1. Copy the DSX Local 1.2.0.3 installer to the installer partition on one of the master nodes.
  2. Run the installer with the flag --extract-migration-scripts. This extracts the migration scripts into a directory named migration on the installer partition.
  3. From the migration directory, run the export.sh script. Confirm that there is a newly created directory backup_data in the migration directory populated with three sub-directories: cloudant, image-management, and mongodump.

Step 2: From the DSX Local 1.1.3 storage node, back up all user data:

  1. Run the following command to stop the user-home gluster volume:
    gluster volume stop user-home
    
  2. From the storage node, run the following command, replacing <DATA_PARTITION> and <PATH_TO_STORAGE> with the required values. Important: Keep the trailing slashes exactly as they appear in the command:
    rsync -av --exclude=.glusterfs /<DATA_PARTITION>/user-home /<PATH_TO_STORAGE>/
    
  3. Run the following command to start the user-home gluster volume:
    gluster volume start user-home
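A local sketch of the backup copy shows the effect of --exclude=.glusterfs, which keeps gluster's internal metadata out of the backup. The temporary directories below stand in for /<DATA_PARTITION> and /<PATH_TO_STORAGE>:

```shell
# Without a trailing slash on the source, rsync copies the user-home
# directory itself into the storage path; the .glusterfs metadata
# directory is skipped entirely.
part=$(mktemp -d)     # stands in for /<DATA_PARTITION>
store=$(mktemp -d)    # stands in for /<PATH_TO_STORAGE>
mkdir -p "$part/user-home/.glusterfs"
echo data > "$part/user-home/notebook.ipynb"
rsync -av --exclude=.glusterfs "$part/user-home" "$store/"
```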
    

Step 3: Uninstall DSX Local 1.1.3 without deleting any data

  1. Run the following script to uninstall DSX 1.1.3 without deleting any data. Important: Ensure you specify the --keep-data option:
    /wdp/utils/uninstall.sh --keep-data
    
  2. On all of the storage nodes, rename the user-home gluster volume directory to dsx-user-home, specifying the <DATA_PARTITION>:
    mv /<DATA_PARTITION>/user-home /<DATA_PARTITION>/dsx-user-home
    
  3. On all of the storage nodes, remove the following unneeded gluster volume directories and files from the data partition:
    /<DATA_PARTITION>/cloudant-repo
    /<DATA_PARTITION>/docker-registry
    /<DATA_PARTITION>/elasticsearch-storage
    /<DATA_PARTITION>/mongo-repo-00 
    /<DATA_PARTITION>/mongo-repo-01
    /<DATA_PARTITION>/mongo-repo-02
    /<DATA_PARTITION>/prometheus-storage
    /<DATA_PARTITION>/spark-metrics-repo
    /<DATA_PARTITION>/redis-repo
    /<DATA_PARTITION>/update-storage
    /<DATA_PARTITION>/dsx-user-home/_global_/config/dsxl_version.txt
    /<DATA_PARTITION>/dsx-user-home/_global_/.custom-images/temp-images
    /<DATA_PARTITION>/dsx-user-home/_global_/.custom-images/metadata
    /<DATA_PARTITION>/dsx-user-home/_global_/.builtin-metadata
    /<DATA_PARTITION>/dsx-user-home/_global_/config/.runtime-definitions
    /<DATA_PARTITION>/dsx-user-home/_global_/internal-nginx-conf.d
    /<DATA_PARTITION>/dsx-user-home/_global_/nginx-conf.d
    /<DATA_PARTITION>/dsx-user-home/_global_/config/addons/spss-modeler-streams.json
    /<DATA_PARTITION>/dsx-user-home/_global_/spark/jars
    /<DATA_PARTITION>/dsx-user-home/deployPkgs
    /<DATA_PARTITION>/dsx-user-home/.scripts
    
  4. Reboot all the nodes to fully complete the uninstallation.
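The rename and cleanup in substeps 2 and 3 can be scripted per storage node. The function below is a sketch; the data partition layout is assumed, and the removal list mirrors the documented paths (note that some entries are files, not directories):

```shell
# cleanup_partition: rename user-home to dsx-user-home, then remove the
# obsolete gluster volume directories and files listed in the uninstall
# instructions. The argument is the node's data partition path.
cleanup_partition() {
  local dp="$1" p
  mv "$dp/user-home" "$dp/dsx-user-home"
  for p in cloudant-repo docker-registry elasticsearch-storage \
           mongo-repo-00 mongo-repo-01 mongo-repo-02 \
           prometheus-storage spark-metrics-repo redis-repo update-storage \
           dsx-user-home/_global_/config/dsxl_version.txt \
           dsx-user-home/_global_/.custom-images/temp-images \
           dsx-user-home/_global_/.custom-images/metadata \
           dsx-user-home/_global_/.builtin-metadata \
           dsx-user-home/_global_/config/.runtime-definitions \
           dsx-user-home/_global_/internal-nginx-conf.d \
           dsx-user-home/_global_/nginx-conf.d \
           dsx-user-home/_global_/config/addons/spss-modeler-streams.json \
           dsx-user-home/_global_/spark/jars \
           dsx-user-home/deployPkgs \
           dsx-user-home/.scripts; do
    rm -rf "$dp/$p"
  done
}
```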

Step 4: Install DSX Local 1.2.0.3 on the same cluster.

Step 5: From the DSX Local 1.2.0.3 master node, import all cloudant and admin settings:

  1. Copy the migration directory (including backup_data and its sub-directories) from the initial backup into any desired location on the master node.
  2. From the migration directory copied in the previous step, run the import.sh script.

Upgrade migrated Version 1.1.3 data to work in Version 1.2.0.3

Step 1: Move all custom database JDBC driver JARs from /user-home/_global_/spark/jars to /user-home/_global_/dbdrivers.
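This move can be scripted as follows; a sketch, assuming the mounted user-home root is the path you pass in (the function name and argument are illustrative, not product-defined):

```shell
# move_jdbc_jars: move custom JDBC driver JARs from the old Spark jars
# directory to the new dbdrivers directory. The argument is the mounted
# user-home root (an assumed path).
move_jdbc_jars() {
  local uh="$1" jar
  mkdir -p "$uh/_global_/dbdrivers"
  for jar in "$uh"/_global_/spark/jars/*.jar; do
    [ -e "$jar" ] || continue   # skip when no JARs match the glob
    mv "$jar" "$uh/_global_/dbdrivers/"
  done
}
```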

Step 2: Upgrade your Version 1.1.3 notebooks to 1.2.0.3:

  1. When migrating a custom Jupyter 2.7 image from 1.1.3 to 1.2.0.3, DSX Local does not provide an Insert to code option for R notebooks, for either remote or local data sets. You must write this code manually. See Create a notebook for guidance.
  2. When migrating a project from 1.1.3 to 1.2.0.3 that contains a custom image, the project Environments tab does not default to the custom image. You must select the image manually.
  3. The dsxl-ml namespace must be removed from all endpoints that reference it. Example:

    Before:

    "val scoringURL=s"http://dsx-scripted-ml-python2-svc.dsxl-ml:7300/api/v1/score/unpublished/${projectName}/${modelName}"
    

    After:

    "val scoringURL=s"http://dsx-scripted-ml-python2-svc:7300/api/v1/score/unpublished/${projectName}/${modelName}"
    

Step 3: Upgrade your Version 1.1.3 models to 1.2.0.3:

  1. Because existing model deployments are not preserved, you must tag a new project release and redeploy your models using IBM Deployment Manager. See Model Management and Deployment for details.
  2. Untrained models do not work in 1.2.0.3; you must recreate them. If an untrained WML model reports a kernel not found error, recreate the model in the visual model builder.
  3. Generate new batch scoring and evaluation scripts for all of your models.
  4. You cannot create a new version of a migrated model. Instead, you must either change the model name or delete the existing model.

Step 4: Because job runs and job schedules are not preserved, you must recreate them.