Cluster Management
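This guide covers the full lifecycle of a KubeAuto cluster: starting, suspending, halting, destroying, and re-provisioning nodes, plus health monitoring and recovery from common issues.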
Lifecycle Commands
| Operation | Command | When to Use |
|---|---|---|
| Start / Resume | vagrant up | Begin a session |
| Suspend | vagrant suspend | Pause for the day |
| Halt | vagrant halt | Clean shutdown before reboot |
| Destroy | vagrant destroy -f | Full reset, delete everything |
| Re-provision | vagrant provision <node> | Re-run setup scripts on a node |
Recommended Workflow
Daily Development
# Morning — resume
vagrant up
# Work with your cluster...
vagrant ssh controlplane
kubectl get nodes
# Evening — save state
vagrant suspend
Before Host Reboot
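The lifecycle table lists `vagrant halt` as the clean shutdown to use before a reboot. A minimal sketch of that sequence follows, wrapped in two shell functions so it can be sourced into a session. The function names are illustrative and not part of KubeAuto.

```shell
# Hypothetical helpers (names are not part of KubeAuto).
# Run them from the directory that contains the Vagrantfile.

# Before rebooting the host: shut every VM down cleanly.
pre_host_reboot() {
  vagrant halt
}

# After the host is back up: boot the VMs again.
post_host_reboot() {
  vagrant up
}
```

Call `pre_host_reboot` before restarting the host and `post_host_reboot` once it is back.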
Complete Reset
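Going by the lifecycle table, a complete reset is `vagrant destroy -f` followed by a fresh `vagrant up`. A minimal sketch, with an illustrative function name:

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Deletes every VM defined in the Vagrantfile, then rebuilds the cluster.
complete_reset() {
  # -f skips Vagrant's confirmation prompt; rebuild only if the destroy succeeded
  vagrant destroy -f && vagrant up
}
```

Everything inside the VMs is lost, so copy out anything you need first.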
Re-provisioning Nodes
If a node's provisioning failed or you want to re-run the setup scripts:
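The lifecycle table gives `vagrant provision <node>` for this. A small sketch that wraps it and refuses to run without a node name (the function name is illustrative):

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Re-runs the provisioners on one node without recreating its VM.
reprovision_node() {
  if [ -z "$1" ]; then
    echo "usage: reprovision_node <node>" >&2
    return 2
  fi
  vagrant provision "$1"
}
```

For example, `reprovision_node controlplane` re-runs the setup scripts on the control plane VM.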
Not Idempotent (v0.1.0)
The current provisioning scripts are not fully idempotent. Re-provisioning a fully configured node may fail because kubeadm init detects an existing cluster. For a clean re-setup, destroy and recreate the node instead:
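A minimal sketch of destroy-and-recreate for a single node, using the standard per-machine forms of `vagrant destroy` and `vagrant up`. The function name is illustrative.

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Throws away one node's VM and builds it again from scratch.
recreate_node() {
  if [ -z "$1" ]; then
    echo "usage: recreate_node <node>" >&2
    return 2
  fi
  # Bring the node back up only if the destroy succeeded
  vagrant destroy -f "$1" && vagrant up "$1"
}
```

Recreating `controlplane` this way would presumably leave the workers joined to a cluster that no longer exists, so a full reset is likely the safer choice in that case.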
Monitoring Cluster Health
Quick Health Check
vagrant ssh controlplane
# All nodes Ready?
kubectl get nodes
# All system pods Running?
kubectl get pods -n kube-system
# Calico healthy?
kubectl get tigerastatus
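The first two checks above can be folded into one pass/fail helper. The sketch below assumes it runs inside the `controlplane` VM, where `kubectl` is already configured. The function name is illustrative. It parses the STATUS column of `kubectl get ... --no-headers` and does not look at Calico or at per-container readiness.

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Run inside the controlplane VM.
quick_health_check() {
  # Nodes whose STATUS column does not start with "Ready" (e.g. NotReady)
  bad_nodes=$(kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }')
  # kube-system pods whose STATUS is neither Running nor Completed
  bad_pods=$(kubectl get pods -n kube-system --no-headers \
    | awk '$3 != "Running" && $3 != "Completed" { print $1 }')

  if [ -z "$bad_nodes" ] && [ -z "$bad_pods" ]; then
    echo "OK: all nodes Ready, all kube-system pods Running"
    return 0
  fi
  # Unquoted on purpose: joins the names onto one line
  [ -n "$bad_nodes" ] && echo "Nodes not Ready:" $bad_nodes
  [ -n "$bad_pods" ] && echo "Pods not Running:" $bad_pods
  return 1
}
```

The return code makes it usable in scripts, for example `quick_health_check || echo "cluster needs attention"`.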
Detailed Diagnostics
# Node conditions and events
kubectl describe node controlplane
# Recent cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
# kubelet logs (run inside the affected node, after vagrant ssh <node>)
sudo journalctl -u kubelet --no-pager -n 50
Recovering from Common Issues
Nodes Show NotReady After Resume
Wait 30–60 seconds. If a node doesn't recover:
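One common remedy, assumed here rather than taken from KubeAuto, is to restart the kubelet on the affected node. The sketch below runs on the host. It polls the node list through `vagrant ssh controlplane -c` for a grace period, then restarts the kubelet on whatever is still not Ready. It assumes the Vagrant machine names match the Kubernetes node names. The function name and its two parameters are illustrative.

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Run on the host, from the directory that contains the Vagrantfile.
# Usage: recover_notready_nodes [timeout_seconds] [poll_interval_seconds]
recover_notready_nodes() {
  timeout="${1:-60}"
  interval="${2:-10}"
  waited=0
  while :; do
    # tr strips carriage returns that an SSH session may add to the output
    nodes=$(vagrant ssh controlplane -c "kubectl get nodes --no-headers" 2>/dev/null | tr -d '\r')
    not_ready=$(printf '%s\n' "$nodes" | awk 'NF && $2 !~ /^Ready/ { print $1 }')
    # Done once the API answered and every node reports Ready
    if [ -n "$nodes" ] && [ -z "$not_ready" ]; then
      return 0
    fi
    [ "$waited" -ge "$timeout" ] && break
    sleep "$interval"
    waited=$((waited + interval))
  done
  # Grace period is over: restart the kubelet on each node that is still not Ready
  for node in $not_ready; do
    echo "Restarting kubelet on $node"
    vagrant ssh "$node" -c "sudo systemctl restart kubelet"
  done
  return 1
}
```

A non-zero return means the grace period expired. Re-run the function, or check `kubectl get nodes`, after a minute to confirm the restart helped.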
Recreating a Single Worker
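A sketch of one way to do this from the host: drain the worker and delete its stale Node object on the control plane, then destroy and rebuild the VM. It assumes the Vagrant machine name equals the Kubernetes node name, and that KubeAuto's provisioning joins the rebuilt worker to the cluster again. Neither is confirmed here. The function name is illustrative.

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Run on the host, from the directory that contains the Vagrantfile.
recreate_worker() {
  node="$1"
  if [ -z "$node" ] || [ "$node" = "controlplane" ]; then
    echo "usage: recreate_worker <worker-node>" >&2
    return 2
  fi
  # Evict workloads, then remove the old Node object.
  # Errors are ignored because the node may already be unreachable or gone.
  vagrant ssh controlplane -c "kubectl drain $node --ignore-daemonsets --delete-emptydir-data --force --timeout=60s" || true
  vagrant ssh controlplane -c "kubectl delete node $node" || true
  # Destroy the VM and build it again from scratch
  vagrant destroy -f "$node" && vagrant up "$node"
}
```

Afterwards, `kubectl get nodes` on the control plane should list the worker again once it reports Ready.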
Full Cluster Reset
This is the nuclear option: it deletes every VM and all cluster state, so use it only when the cluster is unrecoverable.
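Because the reset is destructive, a guarded wrapper can help avoid running it by accident. The sketch below asks for an explicit `yes` before calling `vagrant destroy -f` and `vagrant up`. The function name and the prompt wording are illustrative.

```shell
# Hypothetical helper (name is not part of KubeAuto).
# Run on the host, from the directory that contains the Vagrantfile.
full_cluster_reset() {
  printf 'This deletes every VM and all cluster state. Type yes to continue: '
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted."
    return 1
  fi
  # Rebuild only if the destroy succeeded
  vagrant destroy -f && vagrant up
}
```

Any answer other than `yes` leaves the cluster untouched.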