Azure Database for MySQL zone and region disaster recovery #
An Azure Database for MySQL instance can be scaled up and down to match your organization's requirements without downtime. This article describes the failure events that the Azure Database for MySQL service handles without administrator intervention. These include all types of failures except region-level failures. Azure Database for MySQL offers zone-level redundancy at the database layer. The following sections discuss the zone and region disaster recovery options for Azure Database for MySQL.
There are generally three types of redundancy in Azure for any Azure resource:
- Local-level redundancy. This means that the service is running on a single instance within the boundary of a single datacenter and relies upon fault domains and update domains. Failure of the single service instance will result in downtime.
- Zone-level redundancy. This means that the service is running on multiple instances which are spread, and in some cases continuously replicated, across Azure availability zones. An availability zone is a group of at least one datacenter. Each Azure region comprises at least three availability zones, with each zone comprising at least one fault-tolerant datacenter. In zone-level redundancy scenarios, the service therefore runs on at least three (3) instances, and a failure of the service in one zone does not cause any downtime for the service.
- Region-level redundancy. This is the most complex and the most expensive, but also the most highly available scenario. In this scenario, an Azure service is replicated across various availability zones in at least two Azure regions. This means that even in the unlikely event of a disaster which incapacitates a whole Azure region, the service will continue to run uninterrupted.
This article provides a high-level description of disaster recovery scenarios for Azure Database for MySQL. Azure Database for MySQL offers four (4) basic disaster recovery options:
- Point in Time Restore (PITR) from backup.
- Automatic zone-level DR.
- Geo-restore from geo-replicated backups. The Azure Database for MySQL service offers geo-redundant storage (GRS) backups, which in turn allow for region-level redundancy, with some expected downtime for restoring database backups from another region in case of a region failure. Azure Database for MySQL automatically creates server backups and stores them in user-configured geo-redundant storage. The service backs up both the data files and the transaction log, which allows you to restore a server to any point in time within your configured backup retention period.
- Read replicas. These are continuously replicated MySQL server instances which run in another region and can take over as the active, writable server if the primary read-write MySQL server fails.
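The point-in-time restore window described in the list above can be illustrated with a minimal sketch. The function names and the 7-day retention value are hypothetical and only model the rule "any point in time within the configured backup retention period". The actual service may apply further limits (for example, the server creation time or the most recent log backup) that are not modeled here.

```python
from datetime import datetime, timedelta, timezone


def earliest_restore_point(now: datetime, retention_days: int) -> datetime:
    """Return the oldest point in time that is still inside the retention window."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return now - timedelta(days=retention_days)


def is_within_retention(target: datetime, now: datetime, retention_days: int) -> bool:
    """Check whether a requested restore time lies inside the retention window."""
    return earliest_restore_point(now, retention_days) <= target <= now


if __name__ == "__main__":
    # Hypothetical example values; the retention period is an assumption.
    now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
    wanted = datetime(2024, 1, 25, 0, 0, tzinfo=timezone.utc)
    print(is_within_retention(wanted, now, retention_days=7))
```

A check like this is useful before starting a restore, because a restore request for a time outside the retention window cannot be satisfied from the retained backups.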
Azure Database for MySQL disaster recovery cases #
In the case of a zone-level disaster, no administrator intervention is needed. The Azure Database for MySQL service automatically carries out all necessary steps to recover the service from another availability zone in the affected region. The same applies to planned service downtime: the MySQL instance recovers automatically. Azure Database for MySQL runs as a container service within Azure, which allows for quick automatic failover to a healthy VM if the VM currently in production fails.
The following Microsoft diagram describes the automatic zone-level DR protection scenario for a MySQL server to which various MySQL clients and MariaDB clients connect. If data is corrupted, the MySQL instance is attached to another copy of the storage, because Microsoft always keeps 3 copies for the MySQL service in geo-redundant storage. These processes normally take a couple of seconds.

In the case of a region-level disaster, you should apply one of the region-level recovery options listed earlier, that is, geo-restore from geo-replicated backups or promotion of a read replica. Geo-restore is only possible if you provisioned the server with geo-redundant backup storage. If you wish to switch from locally redundant to geo-redundant backups for an existing server, you must take a dump of your existing server using mysqldump and restore it to a newly created server configured with geo-redundant backups.
Failure of a region is a rare event. If it occurs and MySQL read replicas are running, you can manually promote the read replica configured in the other region to be your production database server. All you have to do is stop MySQL replication. As soon as you stop replication, the replica becomes a standalone server and your MySQL operations can continue, so there is no need for a geo-restore. Details about creating and managing read replicas can be found in this article.
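The mysqldump step for moving an existing server to geo-redundant backups could be prepared as sketched below. This is only an illustration: the helper name, host name, user name, database names, and file name are hypothetical placeholders. The sketch builds the command without running it, so that it can be reviewed first.

```python
import shlex
from typing import List, Sequence


def build_mysqldump_command(
    host: str, user: str, databases: Sequence[str], result_file: str
) -> List[str]:
    """Build a mysqldump argument list for a logical dump of the given databases."""
    if not databases:
        raise ValueError("at least one database name is required")
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",            # no value given: mysqldump prompts for the password
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={result_file}",
        "--databases",           # all following arguments are database names
        *databases,
    ]


if __name__ == "__main__":
    # Hypothetical source server and database names.
    cmd = build_mysqldump_command(
        "old-server.example.com", "dbadmin", ["shop", "crm"], "dump.sql"
    )
    print(shlex.join(cmd))  # review the command, then run it, e.g. with subprocess.run(cmd)
```

The resulting dump file can then be loaded into the newly created geo-redundant server with the mysql command-line client, reading the file from standard input.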
In the case of logical or user errors in the database schema or data, such as accidentally dropped tables or incorrectly updated data, you should perform a point-in-time recovery (PITR), restoring and recovering the data up to the time just before the error occurred. If you want to restore only a subset of databases or specific tables rather than all databases on the database server, you can restore the database server to a new instance, export the table(s) via mysqldump, and then import those tables into your original database.
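The recovery cases in this section can be summarized as a small decision helper. This is a sketch of the guidance in this article, not an Azure API: the enum, the function name, and the returned strings are hypothetical.

```python
from enum import Enum


class FailureScope(Enum):
    ZONE = "zone"        # zone-level failure or planned downtime
    REGION = "region"    # failure of a whole Azure region
    LOGICAL = "logical"  # user error, e.g. dropped table or bad update


def recommend_recovery(
    scope: FailureScope,
    has_read_replica: bool = False,
    has_geo_backup: bool = False,
) -> str:
    """Map a failure scope to the recovery option described in this article."""
    if scope is FailureScope.ZONE:
        return "no action: the service fails over automatically"
    if scope is FailureScope.LOGICAL:
        return "point-in-time restore to just before the error"
    # Region-level failure: a read replica avoids the need for a geo-restore.
    if has_read_replica:
        return "stop replication to promote the read replica"
    if has_geo_backup:
        return "geo-restore from geo-redundant backups"
    return "no region-level recovery option configured"


if __name__ == "__main__":
    print(recommend_recovery(FailureScope.REGION, has_geo_backup=True))
```

The read replica is checked before the geo-redundant backup because, as described above, a running replica in another region makes a geo-restore unnecessary.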
Sources #
https://docs.microsoft.com/en-us/azure/mysql/concepts-business-continuity
Backup and restore - Azure Database for MySQL | Microsoft Docs
https://docs.microsoft.com/en-us/azure/mysql/concepts-high-availability