Restore to a new Cluster
If a cluster fails, the following steps can be used to re-create the cluster and restore the backup into it.
Using the same installer/aks directory from the original installation, follow the steps below:
- Run ‘./delete-cluster.sh’ to delete any remaining parts of the existing cluster, resource group, and service principal.
- Run ‘./create-cluster.sh’ to create the new cluster.
- Install Istio by running ‘./install-istio.sh’.
- Go to the Azure Portal and navigate to Storage Accounts.
- Click on the Storage Account created during the Velero install, then click on Containers under Data storage.
- Click on the container that was created during the Velero install.
- In the root of the container, look for the file named <cluster>_<rsrcGrp>.tar.gz. Download this file and transfer it to the installer/aks folder.
- Unzip the file by running ‘gzip -d <cluster>_<rsrcGrp>.tar.gz’.
- Untar the file by running ‘tar -xvf <cluster>_<rsrcGrp>.tar’.
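The download-and-extract steps above can be sketched as a small shell snippet. This is an illustration, not one of the product scripts; the storage account and container names in the usage comment are placeholders for the ones created during the Velero install.

```shell
# Build the <cluster>_<rsrcGrp>.tar.gz name used by the backup scripts.
backup_archive_name() {
    printf '%s_%s.tar.gz\n' "$1" "$2"
}

# Mirror the two manual steps: gzip -d, then tar -xvf.
extract_backup() {
    archive="$1"
    gzip -d "$archive"
    tar -xvf "${archive%.gz}"
}

# Example usage (account/container names are placeholders):
#   az storage blob download --account-name <storageAccount> \
#     --container-name <container> \
#     --name "$(backup_archive_name mycluster myrg)" \
#     --file "$(backup_archive_name mycluster myrg)"
#   extract_backup mycluster_myrg.tar.gz
```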
- Run ‘chmod +x restore.sh’ to give the restore script execute permissions.
- To configure Velero in the cluster for the restore, run the script ‘./restore.sh’.
- Verify that the Velero pod has started by running ‘kubectl get pods -n velero’.
- To see the available backups, run ‘velero backup get’.
- To restore the istio-system namespace, execute ‘velero restore create --from-backup <backup name> --include-namespaces istio-system’.
- The restore will run for several minutes. To monitor progress, run ‘velero restore describe <restore name>’. The restore is done and successful when the ‘Phase’ status shows ‘Completed’.
- To restore the connect and connect-agent namespaces, execute ‘velero restore create --from-backup <backup name> --include-namespaces connect,connect-agent’. Ensure that this restore also reaches the ‘Completed’ phase.
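The describe-and-wait step can be scripted instead of polled by hand. This is a sketch, not one of the product scripts; the function name and the ten-second interval are this example's choices.

```shell
# Poll 'velero restore describe' until the restore leaves its in-progress
# phase; returns 0 on Completed, 1 on Failed/PartiallyFailed.
wait_for_restore() {
    name="$1"
    while :; do
        phase=$(velero restore describe "$name" | awk '/^Phase:/ {print $2}')
        echo "Phase: $phase"
        case "$phase" in
            Completed) return 0 ;;
            Failed|PartiallyFailed) return 1 ;;
        esac
        sleep 10
    done
}

# Usage: wait_for_restore <restore name>
```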
- After the restores are completed, you must restart the Istio Ingress. Execute the following commands:
POD_ID=$(kubectl get --no-headers=true pods -o name -n istio-system | awk -F "/" '{print $2}' | grep ingressgateway)
kubectl -n istio-system delete pod $POD_ID
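If the gateway runs as the default ‘istio-ingressgateway’ Deployment (an assumption; check your install), the same restart can be done in one step with a rollout restart instead of deleting the pod by name:

```shell
# One-step equivalent of deleting the ingress pod: the Deployment's
# controller replaces all gateway pods after the rollout restart.
restart_ingressgateway() {
    kubectl -n istio-system rollout restart deployment istio-ingressgateway
}
```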
- Check that all pods are in a running state:
- Run ‘kubectl get pods -n <CONNECT_NAMESPACE>’.
- Run ‘kubectl get pods -n <AGENT_NAMESPACE>’.
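Rather than eyeballing ‘kubectl get pods’, the check can block until every pod in a namespace reports Ready by using ‘kubectl wait’; the five-minute timeout below is this sketch's choice, not a product requirement.

```shell
# Block until all pods in the given namespace are Ready (or time out).
all_pods_ready() {
    ns="$1"
    kubectl -n "$ns" wait --for=condition=Ready pod --all --timeout=300s
}

# Usage: all_pods_ready <CONNECT_NAMESPACE> && all_pods_ready <AGENT_NAMESPACE>
```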
- If the IP address of the Ingress Gateway has changed, the Agent pod will not start successfully, since it cannot connect to the config server. To fix this:
- First, run ‘kubectl get services -n istio-system’ to get the new external Ingress Gateway IP address.
- To edit the Ingress IP address in the agent deployment, run ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’.
- Find the old Ingress IP address in the file and change it to the new one.
- Save the file; the Agent will be re-deployed and will start successfully.
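Fetching the new external IP can be scripted with a jsonpath query. The ‘istio-ingressgateway’ Service name is the Istio default, and the CONFIG_SERVER_HOST variable in the usage comment is a placeholder, not the real setting — inspect the ‘kubectl edit’ output to see where the IP actually appears in your agent Deployment.

```shell
# Print the external IP of the Istio ingress gateway Service.
new_ingress_ip() {
    kubectl -n istio-system get service istio-ingressgateway \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage sketch (CONFIG_SERVER_HOST is a placeholder name):
#   kubectl -n <AGENT_NAMESPACE> set env deployment/agent \
#     CONFIG_SERVER_HOST="$(new_ingress_ip)"
```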
- The restore process puts Velero into a read-only mode. To allow future backups, run the script ‘./reset-velero-after-restore.sh’ to put Velero back into read-write mode.
- Access the Connect UI to verify that Connect is fully restored and check the Health Status.
Restore to an existing Cluster
When a namespace or container is damaged but the cluster and the resource group still exist, a restore can be done to the existing cluster.
Using the same installer/aks directory from the original installation, follow the steps below:
- Run ‘./uninstall-agent.sh’ to uninstall the Connect Agent.
- Run ‘./uninstall-connect.sh’ to uninstall Connect.
- Run ‘./uninstall-velero.sh’ to uninstall Velero.
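The three uninstall scripts can be chained so the sequence stops at the first failure; this is just a convenience sketch, run from the installer/aks directory.

```shell
# Run the uninstall scripts in order; '&&' stops at the first failure
# so a broken uninstall does not cascade into the next one.
uninstall_all() {
    ./uninstall-agent.sh && ./uninstall-connect.sh && ./uninstall-velero.sh
}
```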
- Go to the Azure Portal and navigate to Storage Accounts.
- Click on the Storage Account created during the Velero install, then click on Containers under Data storage.
- Click on the container that was created during the Velero install.
- In the root of the container, look for the file named <cluster>_<rsrcGrp>.tar.gz. Download this file and transfer it to the installer/aks folder.
- Unzip the file by running ‘gzip -d <cluster>_<rsrcGrp>.tar.gz’.
- Untar the file by running ‘tar -xvf <cluster>_<rsrcGrp>.tar’.
- To configure Velero in the cluster for the restore, run the script ‘./restore.sh’.
- Verify that the Velero pod has started by running ‘kubectl get pods -n velero’.
- To see the available backups, run ‘velero backup get’.
- To restore the istio-system namespace, execute ‘velero restore create --from-backup <backup name> --include-namespaces istio-system’.
- The restore will run for several minutes. To monitor progress, run ‘velero restore describe <restore name>’. The restore is done and successful when the ‘Phase’ status shows ‘Completed’.
- To restore the connect and connect-agent namespaces, execute ‘velero restore create --from-backup <backup name> --include-namespaces connect,connect-agent’. Ensure that this restore also reaches the ‘Completed’ phase.
- After the restores are completed, you must restart the Istio Ingress. Execute the following commands:
POD_ID=$(kubectl get --no-headers=true pods -o name -n istio-system | awk -F "/" '{print $2}' | grep ingressgateway)
kubectl -n istio-system delete pod $POD_ID
- Check that all pods are in a running state:
- Run ‘kubectl get pods -n <CONNECT_NAMESPACE>’.
- Run ‘kubectl get pods -n <AGENT_NAMESPACE>’.
- If the IP address of the Ingress Gateway has changed, the Agent pod will not start successfully, since it cannot connect to the config server. To fix this:
- First, run ‘kubectl get services -n istio-system’ to get the new external Ingress Gateway IP address.
- To edit the Ingress IP address in the agent deployment, run ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’.
- Find the old Ingress IP address in the file and change it to the new one.
- Save the file; the Agent will be re-deployed and will start successfully.
- The restore process puts Velero into a read-only mode. To allow future backups, run the script ‘./reset-velero-after-restore.sh’ to put Velero back into read-write mode.
- Access the Connect UI to verify that Connect is fully restored.