Monitor cluster nodes
Your Watson Studio Local deployment consists of three types of nodes.
- Control nodes
  - Nodes that manage your Kubernetes cluster and your Watson Studio Local deployment.
  - By default, the cluster has three control nodes. If you notice that a node is down, attempt to restore it to prevent outages. The cluster can continue to run if one node fails. However, if two nodes fail, your cluster fails.
  - In the Admin Console, control node names include the term manage.
- Storage nodes
  - Nodes where Watson Studio Local metadata and any data that you load into Watson Studio Local are stored.
  - By default, the cluster has three storage nodes. The data on these nodes is replicated across each node, so that if a node fails, you can still access the data. Watson Studio Local can continue to run if two nodes fail.
  - Tip: If you run out of space on your storage nodes, add XFS-formatted disks to each node and extend the Logical Volume Management (LVM) partition to include the disks. If possible, ensure that the disks are the same size.
  - In the Admin Console, storage node names include the term storage.
- Compute nodes
  - Nodes where Watson Studio Local services, such as Spark, run.
  - Unlike storage nodes, compute nodes are not replicated. When a new process starts, Kubernetes determines which node has sufficient capacity to run the process. Watson Studio Local can continue to run when multiple compute nodes fail. However, you might notice that performance decreases when multiple nodes are down.
  - Additionally, if a node fails, Kubernetes attempts to bring any active processes up on another node. While Kubernetes attempts to bring up the processes, you might experience an outage. If Kubernetes cannot bring the processes up on another node and the outage continues, contact IBM Software Support.
  - In the Admin Console, compute node names include the term compute.
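The failure tolerances described above can be summarized in a short sketch. The Python below is illustrative only: the function name `cluster_status`, its arguments, and the returned strings are hypothetical and are not part of Watson Studio Local. It encodes the default three-node figures from this section, and it assumes that losing all three storage nodes makes the data unavailable, which the text above does not state explicitly.

```python
def cluster_status(control_down: int, storage_down: int, compute_down: int) -> str:
    """Summarize the expected state of a default deployment
    (three control nodes and three storage nodes)."""
    # Per the text: the cluster survives one failed control node, but not two.
    if control_down >= 2:
        return "failed"
    # Per the text: the deployment keeps running with two failed storage nodes.
    # Assumption: with all three down, no replica of the data remains.
    if storage_down >= 3:
        return "failed"
    # Any other failure, including compute nodes, leaves the deployment
    # running, possibly with reduced performance.
    if control_down or storage_down or compute_down:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    print(cluster_status(control_down=1, storage_down=0, compute_down=2))
```

A "degraded" result here only means that at least one node is down. It does not model the temporary outage that can occur while Kubernetes moves processes off a failed compute node.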
Monitor node health
If you want a high-level overview of the status of your cluster, you can monitor the health of your cluster nodes from the Dashboard page. You can access the Dashboard page from the menu icon.
Specifically, you can monitor:
- CPU usage
- Memory usage
- Disk usage
For compute nodes, the usage of CPU and memory is measured against the CPU and memory that the Watson Studio Local users reserved.
Each card on the Dashboard page shows the average usage across all of the nodes. However, you can expand a card to see the specific usage for each node.
This data is refreshed every 10 seconds.
By default, Kubernetes attempts to balance the load across servers.
Contact IBM Software Support if you notice that all of the nodes in a group are overloaded for an extended period. A node is overloaded when it runs above 90% usage.
Nodes can become overloaded when:
- You have more users than your cluster configuration can handle. For example, your cluster doesn't have sufficient CPU, memory, or storage.
- A node fails and other nodes need to handle requests that would normally be handled by that node.
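The overload rule above can be expressed as a small check. This is a minimal sketch, not a Watson Studio Local API: the function `group_summary`, the node names, and the input dictionary are hypothetical, and the usage figures would have to be read from the Dashboard page or another monitoring source. It mirrors the Dashboard cards by reporting both the group average and the per-node outliers.

```python
from statistics import mean
from typing import Dict, List, Union

OVERLOAD_THRESHOLD = 90.0  # percent; nodes above this are considered overloaded


def group_summary(usage_by_node: Dict[str, float]) -> Dict[str, Union[float, List[str], bool]]:
    """Summarize one usage metric (CPU, memory, or disk) for a node group."""
    if not usage_by_node:
        raise ValueError("usage_by_node must contain at least one node")
    # Nodes whose usage is strictly above the 90% threshold.
    overloaded = [name for name, pct in usage_by_node.items()
                  if pct > OVERLOAD_THRESHOLD]
    return {
        "average": mean(usage_by_node.values()),
        "overloaded": overloaded,
        # True only when every node in the group is above the threshold.
        "all_overloaded": len(overloaded) == len(usage_by_node),
    }


if __name__ == "__main__":
    # Hypothetical CPU readings for a group of compute nodes.
    print(group_summary({"compute-1": 95.0, "compute-2": 92.0, "compute-3": 71.5}))
```

The sketch evaluates a single snapshot. To match the guidance above, `all_overloaded` would need to stay true across many consecutive readings before the situation counts as an extended overload.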
Monitor network usage
If you encounter an issue with Watson Studio Local, you can view the recent network traffic in your cluster on the Dashboard page. You can view the number of megabytes that were sent and received across the nodes of the cluster over the last 20 minutes.
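If you collect traffic figures yourself, the same 20-minute view can be approximated with a rolling window. The sketch below is an assumption-laden illustration: the class `TrafficWindow`, its methods, and the sample format are hypothetical, and it does not describe how the Dashboard page computes its figures. It also assumes that samples arrive in increasing timestamp order.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW_SECONDS = 20 * 60  # the 20-minute window described above


class TrafficWindow:
    """Keep traffic samples from the last 20 minutes and total them."""

    def __init__(self) -> None:
        # Each sample is (timestamp_seconds, sent_mb, received_mb).
        self._samples: Deque[Tuple[float, float, float]] = deque()

    def add(self, timestamp: float, sent_mb: float, received_mb: float) -> None:
        self._samples.append((timestamp, sent_mb, received_mb))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop samples that are 20 minutes old or older.
        while self._samples and self._samples[0][0] <= now - WINDOW_SECONDS:
            self._samples.popleft()

    def totals(self, now: float) -> Tuple[float, float]:
        """Return (sent_mb, received_mb) summed over the current window."""
        self._evict(now)
        sent = sum(sample[1] for sample in self._samples)
        received = sum(sample[2] for sample in self._samples)
        return sent, received


if __name__ == "__main__":
    window = TrafficWindow()
    window.add(0, 5.0, 2.0)
    window.add(600, 1.0, 1.0)
    print(window.totals(600))
```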