Restore to a new/existing Cluster - EcoSys - 3.0 - Installation & Upgrade - Hexagon

EcoSys Connect Installation and Configuration (Azure Kubernetes Service)


Restore to a new Cluster

If a cluster fails, use the following steps to re-create it and restore the Velero backup into the new cluster.

From the same installer/aks directory used for the original installation, follow these steps:

  1. Run ‘./delete-cluster.sh’ to delete any remaining parts of the existing cluster, resource group, and service principal.

  2. Run ‘./create-cluster.sh’ to create the new cluster.

  3. Install Istio by running ‘./install-istio.sh’.

  4. Go to the Azure Portal and navigate to the Storage Accounts.

  5. Click on the Storage Account created during the Velero install and then click on Containers under Data storage.

  6. Click on the container that was created during the Velero install.

  7. In the root of the container, find the file named <cluster>_<rsrcGrp>.tar.gz. Download it and copy it to the installer/aks folder.

  8. Decompress the file by running ‘gzip -d <cluster>_<rsrcGrp>.tar.gz’.

  9. Extract the archive by running ‘tar -xvf <cluster>_<rsrcGrp>.tar’.

  10. Run ‘chmod +x restore.sh’ to provide restore script execute permissions.

  11. To configure Velero in the cluster for the restore, run the script ‘./restore.sh’.

  12. Confirm that the Velero pod has started by running ‘kubectl get pods -n velero’.

  13. To list the available backups, run ‘velero backup get’.

  14. Restore the istio-system namespace by running ‘velero restore create --from-backup <backup name> --include-namespaces istio-system’.

  15. The restore will run for several minutes. To monitor the process, run ‘velero restore describe <restore name>’. The restore is done and successful when the ‘Phase’ status shows ‘Completed’.

  16. Restore the connect and connect-agent namespaces by running ‘velero restore create --from-backup <backup name> --include-namespaces connect,connect-agent’. Ensure that this restore also reaches the ‘Completed’ phase.

  17. After the restores are completed, you must restart the Istio Ingress. Execute the following commands:

    POD_ID=$(kubectl get --no-headers=true pods -o name -n istio-system | awk -F "/" '{print $2}' |grep ingressgateway)

    kubectl -n istio-system delete pod $POD_ID

  18. Check that all pods are in a running state.

    1. Run ‘kubectl get pods -n <CONNECT_NAMESPACE>’.

    2. Run ‘kubectl get pods -n <AGENT_NAMESPACE>’.

      1. If the IP address of the Ingress Gateway has changed, the Agent pod will not start successfully because it cannot connect to the config server.

      2. First, run ‘kubectl get services -n istio-system’ to get the new external IP address of the Ingress Gateway.

      3. To edit the Ingress IP address in the agent deployment, run ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’

      4. Find the old Ingress IP address in the file and change it to the new one.

      5. Save the file; the Agent is then redeployed and starts successfully.

  19. The restore process puts Velero into a Read-Only mode. To allow future backups, run the script ‘./reset-velero-after-restore.sh’ to put Velero back into Read-Write mode.

  20. Access the Connect UI to verify that Connect is fully restored and check the Health Status.
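
The monitoring in steps 14 to 16 can also be scripted. The sketch below is a hypothetical helper, not one of the installer scripts: it creates a restore with an explicit name and then polls that Restore object's ‘status.phase’ field through kubectl until Velero reports ‘Completed’ or a failure phase. It assumes Velero is installed in the ‘velero’ namespace; the function names, the restore names you pass in, and the default timeout (1800 seconds) and polling interval (15 seconds) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the EcoSys installer): create a Velero
# restore and wait until it finishes. Assumes Velero runs in the "velero"
# namespace.

# Print the current phase of a Velero restore (empty if not reported yet).
restore_phase() {
  kubectl -n velero get restore "$1" -o jsonpath='{.status.phase}' 2>/dev/null
}

# wait_for_restore NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 when the phase is Completed, 1 on a failure phase or timeout.
wait_for_restore() {
  local name=$1 timeout=${2:-1800} interval=${3:-15} waited=0 phase
  while [ "$waited" -le "$timeout" ]; do
    phase=$(restore_phase "$name")
    case "$phase" in
      Completed)
        echo "Restore $name completed"
        return 0 ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "Restore $name ended in phase $phase" >&2
        return 1 ;;
    esac
    # Any other phase (New, InProgress, ...) means the restore is still running.
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Timed out waiting for restore $name (last phase: ${phase:-unknown})" >&2
  return 1
}

# restore_namespaces RESTORE_NAME BACKUP_NAME NAMESPACES
# NAMESPACES is comma-separated, e.g. "connect,connect-agent".
restore_namespaces() {
  velero restore create "$1" --from-backup "$2" --include-namespaces "$3" &&
    wait_for_restore "$1"
}
```

For example, ‘restore_namespaces connect-restore <backup name> connect,connect-agent’ would cover step 16 and return a non-zero status if the restore does not reach ‘Completed’. An explicit restore name is passed because Velero otherwise generates one, which the script would then have to look up; ‘velero restore describe <restore name>’ is still the place to read any warnings or errors.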

Restore to an existing Cluster

When a namespace or container is damaged but the cluster and resource group still exist, you can restore to the existing cluster.

From the same installer/aks directory used for the original installation, follow these steps:

  1. Run ‘./uninstall-agent.sh’ to uninstall Connect Agent.

  2. Run ‘./uninstall-connect.sh’ to uninstall Connect.

  3. Run ‘./uninstall-velero.sh’ to uninstall Velero.

  4. Go to the Azure Portal and navigate to the Storage Accounts.

  5. Click on the Storage Account created during the Velero install and then click on Containers under Data storage.

  6. Click on the container that was created during the Velero install.

  7. In the root of the container, find the file named <cluster>_<rsrcGrp>.tar.gz. Download it and copy it to the installer/aks directory.

  8. Decompress the file by running ‘gzip -d <cluster>_<rsrcGrp>.tar.gz’.

  9. Extract the archive by running ‘tar -xvf <cluster>_<rsrcGrp>.tar’.

  10. To configure Velero in the cluster for the restore, run the script ‘./restore.sh’.

  11. Confirm that the Velero pod has started by running ‘kubectl get pods -n velero’.

  12. To see the available backups, run ‘velero backup get’

  13. Restore the istio-system namespace by running ‘velero restore create --from-backup <backup name> --include-namespaces istio-system’.

    The restore will run for several minutes. To monitor the process, run ‘velero restore describe <restore name>’. The restore is done and successful when the ‘Phase’ status shows ‘Completed’.

  14. Restore the connect and connect-agent namespaces by running ‘velero restore create --from-backup <backup name> --include-namespaces connect,connect-agent’. Ensure that this restore also reaches the ‘Completed’ phase.

  15. After the restores have completed, you must restart the Istio Ingress. Execute the following commands:

    POD_ID=$(kubectl get --no-headers=true pods -o name -n istio-system | awk -F "/" '{print $2}' |grep ingressgateway)

    kubectl -n istio-system delete pod $POD_ID

  16. Check that all pods are in a running state.

    1. Run ‘kubectl get pods -n <CONNECT_NAMESPACE>’.

    2. Run ‘kubectl get pods -n <AGENT_NAMESPACE>’.

      1. If the IP address of the Ingress Gateway has changed, the Agent pod will not start successfully because it cannot connect to the config server.

      2. First, run ‘kubectl get services -n istio-system’ to get the new external IP address of the Ingress Gateway.

      3. To edit the Ingress IP address in the agent deployment, run ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’.

      4. Find the old Ingress IP address in the file and change it to the new one.

      5. Save the file; the Agent is then redeployed and starts successfully.

  17. The restore process puts Velero into a Read-Only mode. To allow future backups, run the script ‘./reset-velero-after-restore.sh’ to put Velero back into Read-Write mode.

  18. Access the Connect UI to verify that Connect is fully restored.
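
The Agent check in step 16 can be partly automated. The sketch below is a hypothetical helper, not an installer script: it reads the Ingress Gateway's external IP with a kubectl JSONPath query and reports whether that IP appears in the agent Deployment manifest. It assumes the gateway Service uses the Istio default name ‘istio-ingressgateway’ (confirm with ‘kubectl get services -n istio-system’) and that the config server address is stored in the Deployment as a literal IP; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the EcoSys installer). Assumptions:
# the gateway Service is "istio-ingressgateway" (Istio default) and the
# agent Deployment is named "agent", as in the edit step above.

# Print the external IP assigned to the Ingress Gateway LoadBalancer Service.
ingress_ip() {
  kubectl -n istio-system get service istio-ingressgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# agent_uses_ip AGENT_NAMESPACE IP
# Succeeds if IP appears as a whole word in the agent Deployment manifest.
agent_uses_ip() {
  kubectl -n "$1" get deployment agent -o yaml | grep -qwF -- "$2"
}

# check_agent_ingress AGENT_NAMESPACE
# Returns 0 if the agent already uses the current IP, 1 if it needs an
# edit, and 2 if the gateway has no external IP yet.
check_agent_ingress() {
  local ip
  ip=$(ingress_ip)
  if [ -z "$ip" ]; then
    echo "Ingress Gateway has no external IP yet" >&2
    return 2
  fi
  if agent_uses_ip "$1" "$ip"; then
    echo "Agent already references Ingress IP $ip"
  else
    echo "Agent does not reference $ip; edit it with: kubectl -n $1 edit deployment agent"
    return 1
  fi
}
```

Running ‘check_agent_ingress <AGENT_NAMESPACE>’ only reports whether an edit is needed; the edit itself is still made with ‘kubectl -n <AGENT_NAMESPACE> edit deployment agent’. The grep ‘-w’ flag keeps an address such as 10.0.0.1 from matching inside 10.0.0.10.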