Troubleshoot the Watson Studio Local installation
You can troubleshoot a failed Watson Studio Local installation in a variety of ways.
If an installation page displays improperly due to a network issue, refresh the page.
- Retry or skip a failing installation step
- View progress for a dead session
- Resume installation from a specific step
- Monitor system performance during the installation process
- Common problems
Retry or skip a failing installation step
If an installation step fails, you can view the log and contact IBM support. To avoid unexpected issues, resolve the issue that the log reports before you retry or skip the step.
Figure: Log file for a failed installation step with the retry or skip option
View progress for a dead session
If you accidentally disconnect from the installation session, you can still view the progress through the log output in InstallPackage/tmp/SetupCluster.out (it records the steps that run in the Docker container named wdp-ansible). When the installation finishes successfully, the last line of the file displays WDP_FINISH|0|<IP_address>, where <IP_address> is the web address for signing in to your Watson Studio Local client, for example, https://<IP_address>/dsx-admin. If the log stops on an installation step failure, for example, Step fail - Wait for user's response and Wait for action file to show up, you can enter either InstallPackage/action.sh retry to retry the step or InstallPackage/action.sh skip to skip the step.
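As a sketch, the success line can be checked and the sign-in address extracted with standard shell tools. The log path and the WDP_FINISH|0|<IP_address> format come from the description above; the sample log created here (and its IP address) is a placeholder so the snippet is self-contained:

```shell
#!/bin/sh
# Sketch: parse the final WDP_FINISH line of the installation log.
# On a real cluster, point LOG at InstallPackage/tmp/SetupCluster.out;
# here we create a sample log so the snippet runs on its own.
LOG=$(mktemp)
printf 'WDP_FINISH|0|203.0.113.10\n' > "$LOG"   # 203.0.113.10 is a placeholder IP

last_line=$(tail -n 1 "$LOG")
status=$(printf '%s' "$last_line" | cut -d'|' -f2)
address=$(printf '%s' "$last_line" | cut -d'|' -f3)

if [ "$status" = "0" ]; then
  echo "Installation finished; sign in at https://${address}/dsx-admin"
else
  echo "Installation still running or failed; check the log, then retry or skip the step"
fi
```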
Resume installation from a specific step
If the log file hangs for more than 2 hours, then either the installation or the container probably failed (for example, the node that runs the installation package might have been rebooted during the installation). Verify the failing step in the InstallPackage/tmp/SetupCluster.out log file, and either clean up or manually finish that step. Then, create a wdp.conf file (ensuring that all parameters match your previous installation) that includes the line jump_install=2, where 2 is the number of the step to continue the installation from, and rerun the installation from the command line so that it detects the wdp.conf file.
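For example, a wdp.conf that resumes from step 2 carries all of the parameters from your previous installation plus the jump_install line (the entries other than jump_install are whatever your original wdp.conf contained):

```
# wdp.conf -- all parameters must match the previous installation
# (keep your original entries here unchanged)
jump_install=2
```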
Monitor system performance during the installation process
During the installation process, Watson Studio Local automatically runs a monitoring tool in the background to collect system performance statistics on every node in the cluster. The monitoring tool automatically terminates once the installation has either succeeded or failed. All data is written to a monitor_results.csv file located in the installation directory on each node. The tool also generates graphs in each installation directory, for example, cpu.png and memory_buff.png, to visualize monitoring results so that users can detect any performance issues from the images.
While the installation runs, the monitoring tool appears in the process list, for example:

root 3393 0.0 0.0 18052 1560 ? S 11:20 0:00 bash -c monitor_install '/wdp' '/wdp/AnsiblePlaybooks/hosts' '--become --become-method=sudo' '7200'

To run the monitoring tool manually, enter:

./dpctl monitor --config=wdp.conf
The monitor_results.csv file contains the following output:
- Output from the vmstat command, used to monitor and collect virtual memory stats including memory, swap, system and CPU information.
- Output from the iostat command, used to collect input and output statistics for both the installation and the data paths.
- Output from the sar command, used to collect network interface information.
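The same data sources can be sampled manually to spot-check a node. A minimal sketch (the 1-second interval and 2-sample count are arbitrary choices, and each command is skipped if it is not installed on the node):

```shell
#!/bin/sh
# Sketch: run the commands that monitor_results.csv is built from,
# skipping any that are not installed on this node.
summary=$(
  for cmd in 'vmstat 1 2' 'iostat -x 1 2' 'sar -n DEV 1 2'; do
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
      echo "== $cmd =="
      $cmd
    else
      echo "== $cmd == (not installed)"
    fi
  done
)
printf '%s\n' "$summary"
```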
Validate the following requirements to prevent an installation failure:
- Ensure that all servers can ping each other by hostname.
- Ensure that a default gateway is set on each server (for example, by using the route command).
- If you used the SSH key installation, ensure that you can SSH without a password from the node you are installing from to the other nodes in the cluster.
- Ensure that the required partitions show, and that the partitions have sufficient space on them.
- Ensure that the proxy IP address is not in use: it cannot be pinged and is not associated with any other system.
- Ensure that SELinux is set correctly on each node to either permissive or enforcing, for example, by using the sestatus command to check status on each node.
- Ensure that time syncing is set up and working on each node; for example, use ntpq -p for NTP or chronyc tracking for chrony. If the nodes are not synced, the pre-validation script displays output such as the following:

===== NTP configuration summary =====
172.16.209.60 is synced to NTP server <NTP_server_address>
172.16.209.61 NTP is not synced
172.16.209.62 NTP is not synced
=====================================
- Ensure that firewalld and iptables are not running or enabled (systemctl status firewalld, systemctl status iptables).
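A few of the checks above can be scripted so that every node reports a pass/fail summary. A minimal sketch (the check list is illustrative, not exhaustive, and node hostnames are omitted because they are site-specific):

```shell
#!/bin/sh
# Sketch: run each pre-validation check and report OK/FAIL without stopping.
check() {
  desc=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $desc"
  else
    echo "FAIL $desc"
  fi
}

results=$(
  check "default gateway is set"          sh -c "ip route 2>/dev/null | grep -q '^default'"
  check "SELinux permissive or enforcing" sh -c "sestatus | grep -Eq 'permissive|enforcing'"
  check "firewalld is not active"         sh -c "! systemctl is-active --quiet firewalld"
)
printf '%s\n' "$results"
```

Running one such script per node (or pushing it over SSH to every node in the cluster) surfaces the same problems the pre-validation script reports, before the installation starts.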