Troubleshoot the Watson Studio Local installation
You can troubleshoot a failed Watson Studio Local installation in a variety of ways.
If an installation page displays improperly due to a network issue, refresh the page.
- Retry or skip a failing installation step
- View progress for a dead session
- Resume installation from a specific step
- Monitor system performance during the installation process
- Common problems
Retry or skip a failing installation step
If an installation step fails, you can view the log and contact IBM support. To avoid unexpected issues, make sure to resolve the issue in the log before you retry or skip the step.
Figure: Log file for a failed installation step with the retry or skip option
step_timeout=14400 (times out after 4 hours). Alternatively, before the installation begins, you can extend the timeout for all of the Helm installation steps by specifying a value in seconds for the helm_request_timeout parameter.
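Both timeout parameters take seconds, so 4 hours is 4 × 3600 = 14400. The following Python sketch only illustrates that conversion. The function name timeout_lines is hypothetical, and it assumes that helm_request_timeout is written in the same key=value form as step_timeout=14400; check your installer's documentation for where these lines belong.

```python
# Hypothetical helper: formats the timeout parameters as key=value lines.
# Assumption: helm_request_timeout uses the same key=value form as
# step_timeout=14400; confirm against your installer's documentation.
SECONDS_PER_HOUR = 3600


def timeout_lines(step_hours: float, helm_hours: float) -> list[str]:
    """Convert hour values into the whole-second values both parameters take."""
    return [
        f"step_timeout={int(step_hours * SECONDS_PER_HOUR)}",
        f"helm_request_timeout={int(helm_hours * SECONDS_PER_HOUR)}",
    ]


if __name__ == "__main__":
    # 4 hours -> 14400 seconds, matching the step_timeout value in the text.
    for line in timeout_lines(4, 2):
        print(line)
```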
View progress for a dead session
If you accidentally disconnect from the installation session, you can still view the progress through the log output in InstallPackage/tmp/SetupCluster.out, which records the steps that run in the Docker container named wdp-ansible. When the installation finishes successfully, the last line of the file displays
184.108.40.206 represents the web address to sign in to your Watson Studio Local client, for example, https://184.108.40.206/dsx-admin. If the log stops on an installation step failure, for example with the messages Step fail - Wait for user's response and Wait for action file to show up, you can enter either InstallPackage/action.sh retry to retry the step or InstallPackage/action.sh skip to skip the step.
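To check a disconnected session for this state without reading the whole log, you could scan the end of SetupCluster.out for the two wait messages. The following Python sketch assumes that those messages appear verbatim in the log as quoted above and that it runs from the directory that contains InstallPackage; the function name and the 20-line window are hypothetical choices. It only prints the retry and skip commands; it does not run them, because the problem in the log should be resolved first.

```python
from pathlib import Path

# Log path and wait messages as quoted in this section; this sketch assumes
# the messages appear verbatim in InstallPackage/tmp/SetupCluster.out.
LOG_PATH = Path("InstallPackage/tmp/SetupCluster.out")
WAIT_MARKERS = (
    "Step fail - Wait for user's response",
    "Wait for action file to show up",
)


def waiting_for_action(log_text: str, tail_lines: int = 20) -> bool:
    """Return True if any of the last tail_lines lines contains a wait marker."""
    tail = log_text.splitlines()[-tail_lines:]
    return any(marker in line for line in tail for marker in WAIT_MARKERS)


if __name__ == "__main__":
    if LOG_PATH.exists() and waiting_for_action(LOG_PATH.read_text(errors="replace")):
        # Resolve the problem shown in the log first, then choose one action.
        print("Installer is waiting for a decision. Run one of:")
        print("  InstallPackage/action.sh retry")
        print("  InstallPackage/action.sh skip")
```

Checking only the tail avoids reacting to an earlier failure that was already retried or skipped.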
Resume installation from a specific step
If the log file hangs for more than 2 hours, either the installation or the container has probably failed (for example, the node that runs the installation package might have been rebooted during the installation). Identify the failing step in the InstallPackage/tmp/SetupCluster.out log file, and either clean up or manually finish that step. Then, create a wdp.conf file (make sure that all parameters match your previous installation) with a new line
2 represents the step number to continue the installation from, and then rerun the installation from the command line so that it detects the wdp.conf file.
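Because a hung installation stops writing to the log, one way to apply the 2-hour rule is to compare the log file's last modification time with the current time. The following Python sketch assumes that approach; it is not an IBM tool, and the function names are hypothetical. The comparison is split into a pure function so that it can be checked without a real log file.

```python
import time
from pathlib import Path
from typing import Optional

# This section treats a log that stops changing for more than 2 hours as a
# sign that the installation or its container has failed.
LOG_PATH = Path("InstallPackage/tmp/SetupCluster.out")
HANG_THRESHOLD_SECONDS = 2 * 60 * 60


def is_stale(last_modified: float, now: float,
             threshold: float = HANG_THRESHOLD_SECONDS) -> bool:
    """Return True if more than threshold seconds passed since last_modified."""
    return now - last_modified > threshold


def log_seems_hung(log_path: Path = LOG_PATH, now: Optional[float] = None) -> bool:
    """Compare the log file's modification time with the current time."""
    current = time.time() if now is None else now
    return is_stale(log_path.stat().st_mtime, current)


if __name__ == "__main__":
    if LOG_PATH.exists() and log_seems_hung():
        print("No log output for over 2 hours; check the last step in", LOG_PATH)
```

A stale log only suggests a failure; the failing step still has to be confirmed in SetupCluster.out before you resume.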
Monitor system performance during the installation process
During the installation process, Watson Studio Local automatically runs a monitoring tool in the background to collect system performance statistics on every node in the cluster. The tool stops automatically when the installation succeeds or fails. All data is written to a monitor_results.csv file in the installation directory on each node. The tool also generates graphs in each installation directory, for example, cpu.png and memory_buff.png, that visualize the results so that you can spot performance issues at a glance.
For example, the running monitoring process appears in ps aux output similar to the following:
root 3393 0.0 0.0 18052 1560 ? S 11:20 0:00 bash -c monitor_install '/wdp' '/wdp/AnsiblePlaybooks/hosts' '--become --become-method=sudo' '7200'
./dpctl monitor --config=wdp.conf
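To confirm that the background monitor is running on a node, you could look for monitor_install in the process list. The following Python sketch assumes the 11-column ps aux layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) that the sample entry in this section follows; the function name monitor_pids is hypothetical.

```python
import subprocess

# Sample entry from this section, in `ps aux` column order:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
SAMPLE = ("root 3393 0.0 0.0 18052 1560 ? S 11:20 0:00 bash -c monitor_install "
          "'/wdp' '/wdp/AnsiblePlaybooks/hosts' '--become --become-method=sudo' '7200'")


def monitor_pids(ps_output: str) -> list[int]:
    """Return PIDs whose COMMAND field mentions monitor_install."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its internal spaces.
        fields = line.split(None, 10)
        if len(fields) == 11 and "monitor_install" in fields[10]:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    try:
        result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=False)
    except FileNotFoundError:
        print("ps is not available on this system")
    else:
        print("monitor_install PIDs:", monitor_pids(result.stdout))
```

Matching only the COMMAND field, rather than the whole line, keeps a user name or TTY value from producing a false match.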
The monitor_results.csv file contains the following output:
- Output from the vmstat command, used to monitor and collect virtual memory statistics, including memory, swap, system, and CPU information.
- Output from the iostat command, used to collect input and output statistics for both the installation and the data paths.
- Output from the sar command, used to collect network interface information.
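Besides the generated graphs, you could scan monitor_results.csv for peak values. The following Python sketch assumes the file has a header row followed by one row per sample; the exact layout is not documented in this section, so the sketch hard-codes no column names and reports the maximum of every column whose cells are all numeric. The function name column_peaks is hypothetical.

```python
import csv
from pathlib import Path

# Assumption: monitor_results.csv has a header row followed by one row per
# sample. Column names are not documented here, so the sketch does not
# hard-code any; it reports the peak of every fully numeric column.
RESULTS = Path("monitor_results.csv")


def column_peaks(csv_path: Path) -> dict[str, float]:
    """Return the maximum value for each column whose cells all parse as numbers."""
    peaks: dict[str, float] = {}
    non_numeric: set = set()
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            for name, cell in row.items():
                if name in non_numeric:
                    continue
                try:
                    value = float(cell)
                except (TypeError, ValueError):
                    # One non-numeric cell disqualifies the whole column.
                    non_numeric.add(name)
                    peaks.pop(name, None)
                    continue
                peaks[name] = max(value, peaks.get(name, value))
    return peaks


if __name__ == "__main__":
    # Run from the installation directory on a node.
    if RESULTS.exists():
        for name, peak in sorted(column_peaks(RESULTS).items()):
            print(f"{name}: {peak}")
```

If the real file mixes the vmstat, iostat, and sar sections under different headers, this sketch would need to split those sections first.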
Common problems
Validate the following requirements to prevent an installation failure:
- Do not change the host name or IP address of a Watson Studio Local node after installation. This is not supported: the cluster will have issues, and some pods cannot run.
- Ensure that all servers can ping each other by hostname.
- Ensure that a default gateway is set on each server (check by using the route command).
- If you used the SSH key installation, ensure that you can SSH without a password from the node you are installing from to the other nodes in the cluster.
- Ensure that the required partitions exist and have sufficient space on them.
- Ensure that the proxy IP is unused: it cannot be pinged and is not associated with any other system.
- Ensure that SELinux is set correctly on each node to either permissive or enforcing, for example, by using the sestatus command to check the status on each node.
- Ensure that time synchronization is set up and working on each node, for example, use ntpq -p for NTP and chronyc tracking for chrony. If the nodes are not synced, the pre-validation script displays the following output:
===== NTP configuration summary =====
172.16.209.60 is synced to NTP server 18.104.22.168,
172.16.209.61 NTP is not synced
172.16.209.62 NTP is not synced
=====================================
- Ensure that firewalld and iptables are neither running nor enabled (check with systemctl status firewalld and systemctl status iptables).
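Two of these checks lend themselves to a quick script: reading the NTP summary for unsynced nodes, and asking systemd about the firewall services. The following Python sketch assumes the summary lines follow the "IP NTP is not synced" pattern shown above and uses the standard systemctl is-active and is-enabled queries instead of parsing systemctl status output. The function names are hypothetical, and the script checks only the node it runs on.

```python
import re
import subprocess

# Matches entries such as "172.16.209.61 NTP is not synced" in the
# pre-validation NTP configuration summary.
NOT_SYNCED = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}) NTP is not synced")


def unsynced_nodes(summary: str) -> list[str]:
    """Return the node IPs that the summary reports as not synced."""
    return NOT_SYNCED.findall(summary)


def firewall_ok(active: str, enabled: str) -> bool:
    """Pass only if systemctl reports the unit as not active and not enabled*."""
    return active != "active" and not enabled.startswith("enabled")


def service_state(name: str) -> tuple[str, str]:
    """Query systemctl is-active and is-enabled for one unit on this node."""
    def ask(verb: str) -> str:
        out = subprocess.run(["systemctl", verb, name],
                             capture_output=True, text=True, check=False)
        return out.stdout.strip()
    return ask("is-active"), ask("is-enabled")


if __name__ == "__main__":
    for unit in ("firewalld", "iptables"):
        try:
            active, enabled = service_state(unit)
        except FileNotFoundError:
            print("systemctl not found; check", unit, "manually")
            continue
        status = "OK" if firewall_ok(active, enabled) else "must be stopped and disabled"
        print(f"{unit}: active={active or 'n/a'} enabled={enabled or 'n/a'} -> {status}")
```

A unit that is not installed prints nothing on standard output for these queries, so the sketch treats it as passing.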