Disaster Recovery Exercise

Sep 29, 2025 by Thibault Debatty


https://cylab.be/blog/447/disaster-recovery-exercise

Today we are organizing a disaster recovery exercise: one of our services is down and we have to recover from a backup as quickly as possible.


This is a very interesting exercise: it is an opportunity to test the quality of our documentation, our own skills, and the time required to restore services. It is also a great opportunity to learn how we can make our architecture more resilient.

Finally, it is a chance to learn how to learn: organizing a cyber incident exercise is itself full of challenges. From a high-level perspective, the goal is to simulate an incident as realistically as possible while keeping the actual production services running. This leads to a series of sub-questions:

  • Should we restore the service from backup in the production environment, or should we use an isolated testing environment?
  • If we use an isolated environment, which other services, such as the backup server, should we duplicate to make the exercise possible?
  • Is it even feasible to duplicate the backup server which, by definition, holds massive quantities of data?
  • If we use a duplicated backup server that stores fake data, or a subset of the real data, is the exercise still representative? Restoring a 1 GB service or database is not the same as restoring a 1 TB database!
  • Should we use dedicated domain names? Some services may be complex or even impossible to restore if we don’t use the same domain names.
  • If we have to restore the service using the same domain name as the production service (which is still running), we may have to use an isolated environment with a controlled DNS server, with the constraints mentioned above.
  • If we use the same domain names for the restored services, how do we handle TLS certificates? This probably means using a controlled (simulated) certificate authority (CA).
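
To make the question about data size more concrete, here is a minimal back-of-the-envelope sketch in Python. It is not part of our exercise tooling: the function name, the 100 MB/s restore throughput, and the 5 minutes of fixed overhead (provisioning, configuration) are all assumptions chosen purely for illustration.

```python
def restore_seconds(size_bytes: float,
                    throughput_bytes_per_s: float,
                    fixed_overhead_s: float = 0.0) -> float:
    """Rough restore duration: fixed overhead plus raw transfer time."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return fixed_overhead_s + size_bytes / throughput_bytes_per_s


GB = 10**9
TB = 10**12

# Assumed values, for illustration only (not measured on our infrastructure).
THROUGHPUT = 100 * 10**6   # 100 MB/s sustained restore throughput
OVERHEAD = 300             # 5 minutes of fixed setup work

small = restore_seconds(1 * GB, THROUGHPUT, OVERHEAD)
large = restore_seconds(1 * TB, THROUGHPUT, OVERHEAD)

print(f"1 GB: {small:.0f} s (about {small / 60:.1f} min)")
print(f"1 TB: {large:.0f} s (about {large / 3600:.1f} h)")
```

Under these assumptions, the 1 GB restore is dominated by the fixed overhead (310 s in total), while the 1 TB restore is dominated by the data transfer (10300 s, close to 3 hours). An exercise run on a small subset of the data would therefore say little about the real recovery time.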

The exercise is still ongoing, but we can already conclude that it serves two purposes: it enhances our operational readiness, and it will help us create higher-quality exercises for our students in the future.

This blog post is licensed under CC BY-SA 4.0
