
Troubleshoot the Data Science Experience Local installation

If an installation page displays improperly due to a network issue, refresh the page.

Retry or skip a failing installation step

If an installation step fails, view the log and, if necessary, contact IBM support. To avoid unexpected issues, resolve the problem reported in the log before you retry or skip the step.

Figure: Log file for a failed installation step with the retry or skip option


Tip: If you are installing with wdp.conf and a step fails due to a timeout, you can extend the step timeout by setting the step_timeout parameter, for example, step_timeout=30000 (a timeout of 30,000 seconds).

View progress for a dead session

If you accidentally disconnect from the installation session, you can still follow the progress in the log file InstallPackage/tmp/SetupCluster.out, which records the steps that are running in the docker container named wdp-ansible. When the installation finishes successfully, the last line of the file displays WDP_FINISH|0|9.30.54.24, where 9.30.54.24 represents the web address to sign in to your DSX client, for example, https://9.30.54.24/dsx-admin. If the log stops on an installation step failure, for example, Step fail - Wait for user's response and Wait for action file to show up, enter either InstallPackage/action.sh retry to retry the step or InstallPackage/action.sh skip to skip the step.
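The check on the last log line can be scripted. The following is a minimal sketch, assuming the last line has exactly the form WDP_FINISH|0|<address> described above; the function name dsx_admin_url is hypothetical and is not part of the installation package.

```shell
# Print the sign-in URL if the last line of the log reports a successful finish.
# Assumption: the last line has the form WDP_FINISH|0|<address>.
# Returns 1 if the installation has not finished (or did not finish with 0).
dsx_admin_url() {
    log_file="$1"
    last_line="$(tail -n 1 "$log_file")"
    case "$last_line" in
        'WDP_FINISH|0|'?*)
            # The address is the third field when the line is split on "|".
            address="$(printf '%s\n' "$last_line" | cut -d'|' -f3)"
            printf 'https://%s/dsx-admin\n' "$address"
            ;;
        *)
            return 1
            ;;
    esac
}
```

For example, dsx_admin_url InstallPackage/tmp/SetupCluster.out prints the address only after a successful finish, so it can be polled from a new session.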

Resume installation from a specific step

If the log file shows no new output for more than 2 hours, the installation or the container has probably failed (for example, the node that is running the installation package might have been rebooted during the installation). Identify the failing step in the InstallPackage/tmp/SetupCluster.out log file, and either clean up or manually finish that step. Then create a wdp.conf file (ensuring that all parameters match your previous installation) with an added line jump_install=2, where 2 represents the step number to continue the installation from, and rerun the installation from the command line so that it detects the wdp.conf file.
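Adding the jump_install line can be done with a small helper. This is a sketch only, assuming wdp.conf holds one key=value setting per line, as the step_timeout and jump_install examples suggest; the function name set_jump_install is hypothetical.

```shell
# Set or replace the jump_install line in an existing wdp.conf.
# Usage: set_jump_install <path-to-wdp.conf> <step-number>
# Assumption: wdp.conf contains one key=value setting per line.
set_jump_install() {
    conf_file="$1"
    step="$2"
    [ -f "$conf_file" ] || { echo "no such file: $conf_file" >&2; return 1; }
    case "$step" in
        ''|*[!0-9]*) echo "step must be a number" >&2; return 1 ;;
    esac
    tmp_file="$(mktemp)" || return 1
    # Drop any earlier jump_install line, keep every other parameter as it was.
    grep -v '^jump_install=' "$conf_file" > "$tmp_file"
    printf 'jump_install=%s\n' "$step" >> "$tmp_file"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}
```

Replacing an existing jump_install line, instead of appending a second one, keeps the file unambiguous if the installation has to be resumed more than once.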

Common problems

Validate the following requirements to prevent an installation failure:

  • Ensure that all servers can ping each other by hostname.
  • Ensure that a default gateway is set on each server (check by using the route command).
  • If you used the SSH key installation, ensure that you can SSH without a password from the node you are installing from to the other nodes in the cluster.
  • Ensure that the required partitions are present, and that the partitions have sufficient space on them.
  • Ensure that the proxy IP is unused: it cannot be pinged and is not associated with any other system.
  • Ensure that SELinux is set correctly on each node to either permissive or enforcing, for example, by using the sestatus command to check status on each node.
  • Ensure that time syncing is set up and working on each node, for example, use ntpq -p for NTP and use chronyc tracking for chrony. If the nodes are not synced, the pre-validation script displays the following message:

    ===== NTP configuration summary =====
    172.16.209.60 is synced to NTP server 132.163.4.102,
    172.16.209.61 NTP is not synced
    172.16.209.62 NTP is not synced
    =====================================
    
  • Ensure that firewalld and iptables are not running or enabled (check with systemctl status firewalld and systemctl status iptables).
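Some of the checks above can be scripted before the installation starts. The following is a rough sketch, not part of the product: the function names are hypothetical, the summary parser assumes the exact wording "NTP is not synced" shown in the sample message, and the ping options assume the Linux iputils version of ping.

```shell
# List the nodes that a pre-validation NTP summary reports as not synced.
# Reads the summary text on standard input, prints one address per line.
unsynced_nodes() {
    awk '/NTP is not synced/ { print $1 }'
}

# Succeed only if the named service is neither running nor enabled.
service_is_off() {
    if systemctl is-active --quiet "$1" 2> /dev/null ||
       systemctl is-enabled --quiet "$1" 2> /dev/null; then
        return 1
    fi
    return 0
}

# Succeed only if every host name given as an argument answers one ping.
check_hosts_reachable() {
    failed=0
    for host in "$@"; do
        # -c 1: send one packet; -W 2: wait at most 2 seconds (iputils ping).
        if ! ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
            echo "cannot ping $host" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A possible use is to run check_hosts_reachable with the cluster host names, then service_is_off firewalld and service_is_off iptables on each node, and to pipe the saved pre-validation output through unsynced_nodes to see which nodes still need time syncing fixed.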