Restore to a new Cluster
If a cluster fails, the following steps can be used to re-create the cluster and restore the backup into it.
Using the same installer/aks directory from the original installation, follow the steps below:
- Run ‘./delete-cluster.sh’ to delete any remaining parts of the existing cluster, resource group, and service principal.
- Run ‘./create-cluster.sh’ to create the new cluster.
- Install Istio by running ‘./install-istio.sh’.
- Go to the Azure Portal and navigate to Storage Accounts.
- Click on the Storage Account created during the Velero install, then click on Containers under Data storage.
- Click on the container that was created during the Velero install.
- In the root of the container, look for the file named <cluster>_<rsrcGrp>.tar.gz. Download this file and transfer it to the installer/aks folder.
- Unzip the file by running ‘gzip -d <cluster>_<rsrcGrp>.tar.gz’.
- Untar the file by running ‘tar -xvf <cluster>_<rsrcGrp>.tar’.
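The download-and-extract steps above can be sketched as a small shell snippet. This is an illustration, not one of the product scripts; the storage account and container names in the usage comment are placeholders for the ones created during the Velero install.

```shell
# Build the <cluster>_<rsrcGrp>.tar.gz name used by the backup scripts.
backup_archive_name() {
    printf '%s_%s.tar.gz\n' "$1" "$2"
}

# Mirror the two manual steps: gzip -d, then tar -xvf.
extract_backup() {
    archive="$1"
    gzip -d "$archive"
    tar -xvf "${archive%.gz}"
}

# Example usage (account/container names are placeholders):
#   az storage blob download --account-name <storageAccount> \
#     --container-name <container> \
#     --name "$(backup_archive_name mycluster myrg)" \
#     --file "$(backup_archive_name mycluster myrg)"
#   extract_backup mycluster_myrg.tar.gz
```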
- Run ‘chmod +x restore.sh’ to give the restore script execute permissions.
- To configure Velero in the cluster for the restore, run the script ‘./restore.sh’.
- Verify that the Velero pod has started by running ‘kubectl get pods -n velero’.
- To see the available backups, run ‘velero backup get’.
- To restore the istio-system namespace, execute ‘velero restore create --from-backup <backup name> --include-namespaces istio-system’.
- The restore will run for several minutes. To monitor progress, run ‘velero restore describe <restore name>’. The restore is done and successful when the ‘Phase’ status shows ‘Completed’.
- To restore the connect and connect-agent namespaces, execute ‘velero restore create --from-backup <backup name> --include-namespaces connect,connect-agent’. Ensure that this restore also reaches the ‘Completed’ phase.
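The describe-and-wait step can be scripted instead of polled by hand. This is a sketch, not one of the product scripts; the function name and the ten-second interval are this example's choices.

```shell
# Poll 'velero restore describe' until the restore leaves its in-progress
# phase; returns 0 on Completed, 1 on Failed/PartiallyFailed.
wait_for_restore() {
    name="$1"
    while :; do
        phase=$(velero restore describe "$name" | awk '/^Phase:/ {print $2}')
        echo "Phase: $phase"
        case "$phase" in
            Completed) return 0 ;;
            Failed|PartiallyFailed) return 1 ;;
        esac
        sleep 10
    done
}

# Usage: wait_for_restore <restore name>
```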
- After the restores are completed, you must restart the Istio Ingress. Execute the following commands:
POD_ID=$(kubectl get --no-headers=true pods -o name -n istio-system | awk -F "/" '{print $2}' | grep ingressgateway)
kubectl -n istio-system delete pod $POD_ID
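If the gateway runs as the default ‘istio-ingressgateway’ Deployment (an assumption; check your install), the same restart can be done in one step with a rollout restart instead of deleting the pod by name:

```shell
# One-step equivalent of deleting the ingress pod: the Deployment's
# controller replaces all gateway pods after the rollout restart.
restart_ingressgateway() {
    kubectl -n istio-system rollout restart deployment istio-ingressgateway
}
```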
- Check that all pods are in a running state:
- Run ‘kubectl get pods -n <CONNECT_NAMESPACE>’.
- Run ‘kubectl get pods -n <AGENT_NAMESPACE>’.
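Rather than eyeballing ‘kubectl get pods’, the check can block until every pod in a namespace reports Ready by using ‘kubectl wait’; the five-minute timeout below is this sketch's choice, not a product requirement.

```shell
# Block until all pods in the given namespace are Ready (or time out).
all_pods_ready() {
    ns="$1"
    kubectl -n "$ns" wait --for=condition=Ready pod --all --timeout=300s
}

# Usage: all_pods_ready <CONNECT_NAMESPACE> && all_pods_ready <AGENT_NAMESPACE>
```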
- If the IP address of the Ingress Gateway has changed, the Agent pod will not start successfully, since it cannot connect to the config server. To fix this:
- First, run ‘kubectl get services -n istio-system’ to get the new external Ingress Gateway IP address.
- To edit the Ingress IP address in the agent deployment, run ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’.
- Find the old Ingress IP address in the file and change it to the new one.
- Save the file; the Agent will be re-deployed and will start successfully.
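Fetching the new external IP can be scripted with a jsonpath query. The ‘istio-ingressgateway’ Service name is the Istio default, and the CONFIG_SERVER_HOST variable in the usage comment is a placeholder, not the real setting — inspect the ‘kubectl edit’ output to see where the IP actually appears in your agent Deployment.

```shell
# Print the external IP of the Istio ingress gateway Service.
new_ingress_ip() {
    kubectl -n istio-system get service istio-ingressgateway \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage sketch (CONFIG_SERVER_HOST is a placeholder name):
#   kubectl -n <AGENT_NAMESPACE> set env deployment/agent \
#     CONFIG_SERVER_HOST="$(new_ingress_ip)"
```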
- The restore process puts Velero into a read-only mode. To allow future backups, run the script ‘./reset-velero-after-restore.sh’ to put Velero back into read-write mode.
- Access the Connect UI to verify that Connect is fully restored and check the Health Status.
Restore to an existing Cluster
When a namespace or container is damaged but the cluster and the resource group still exist, a restore can be done to the existing cluster.
Using the same installer/aks directory from the original installation, follow the steps below:
- Run ‘./uninstall-agent.sh’ to uninstall the Connect Agent.
- Run ‘./uninstall-connect.sh’ to uninstall Connect.
- Run ‘./uninstall-velero.sh’ to uninstall Velero.
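The three uninstall scripts can be chained so the sequence stops at the first failure; this is just a convenience sketch, run from the installer/aks directory.

```shell
# Run the uninstall scripts in order; '&&' stops at the first failure
# so a broken uninstall does not cascade into the next one.
uninstall_all() {
    ./uninstall-agent.sh && ./uninstall-connect.sh && ./uninstall-velero.sh
}
```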
- Go to the Azure Portal and navigate to Storage Accounts.
- Click on the Storage Account created during the Velero install, then click on Containers under Data storage.
- Click on the container that was created during the Velero install.
- In the root of the container, look for the file named <cluster>_<rsrcGrp>.tar.gz. Download this file and transfer it to the installer/aks folder.
- Unzip the file by running ‘gzip -d <cluster>_<rsrcGrp>.tar.gz’.
- Untar the file by running ‘tar -xvf <cluster>_<rsrcGrp>.tar’.
- To configure Velero in the cluster for the restore, run the script ‘./restore.sh’.
- Verify that the Velero pod has started by running ‘kubectl get pods -n velero’.
- To see the available backups, run ‘velero backup get’.
- To restore the istio-system namespace, execute ‘velero restore create --from-backup <backup name> --include-namespaces istio-system’.
- The restore will run for several minutes. To monitor progress, run ‘velero restore describe <restore name>’. The restore is done and successful when the ‘Phase’ status shows ‘Completed’.
- To restore the connect and connect-agent namespaces, execute ‘velero restore create --from-backup <backup name> --include-namespaces connect,connect-agent’. Ensure that this restore also reaches the ‘Completed’ phase.
- After the restores are completed, you must restart the Istio Ingress. Execute the following commands:
POD_ID=$(kubectl get --no-headers=true pods -o name -n istio-system | awk -F "/" '{print $2}' | grep ingressgateway)
kubectl -n istio-system delete pod $POD_ID
- Check that all pods are in a running state:
- Run ‘kubectl get pods -n <CONNECT_NAMESPACE>’.
- Run ‘kubectl get pods -n <AGENT_NAMESPACE>’.
- If the IP address of the Ingress Gateway has changed, the Agent pod will not start successfully, since it cannot connect to the config server. To fix this:
- First, run ‘kubectl get services -n istio-system’ to get the new external Ingress Gateway IP address.
- To edit the Ingress IP address in the agent deployment, run ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’.
- Find the old Ingress IP address in the file and change it to the new one.
- Save the file; the Agent will be re-deployed and will start successfully.
- The restore process puts Velero into a read-only mode. To allow future backups, run the script ‘./reset-velero-after-restore.sh’ to put Velero back into read-write mode.
- Access the Connect UI to verify that Connect is fully restored.