I have came across Percona xtrabackup but I am curious what is the best production deployment best practices and tools that are actually used by companies.
Probably want the selfhosted or sysadmin community for that.
In my experience xtrabackup is a bit hacky and risky but it does work fairly well. I think MariaDB also comes with it, but you must use the exact matching MySQL server for it to work. So Percona Server 8 with the matching xtrabackup, or MariaDB and its xtrabackup. You also need the exact same server version as the backup was made on to be able to restore it correctly.
As for incrementals, it’s up to you to track the LSN and then provide it back to xtrabackup when time comes to do the incremental. You also need to make sure the data structure is the same as when you did the original backup. If you have ran an alter table or whatever after the first backup, you have to make a full backup because the table have effectively been rewritten, and it will backup but not restore. It doesn’t need the last backup to exist at all to backup, but that also means it knows nothing about the previous backup.
It also only deals with putting the files in place. If you’re restoring a single database or a single table, you also need to manage deleting the database before restoring and also reloading the tablespace afterwards so the server rediscovers the new db/table that just popped into existence under its nose.
But if you can, taking a VM snapshot or filesystem snapshot is better if you can because usually the DB is able to recover from such a “crash”.
The problem with backing up a live database is that without help from the filesystem (ie. snapshot), you’re copying files as it’s writing to them so it can end up corrupted. What xtrabackup does is essentially also track and replay what the live server is doing so that it can make a copy that’s consistent up to the exact last transaction as if the server shut down cleanly. And that’s why the version must match exactly, it runs some bits of actual MySQL code to make it work.
If you want something more robust, I’d also recommend taking a regular mysqldump once in a while, so that if you have backup problems, you have a more universal backup that will restore well on most MySQL versions and forks. Longer possible loss, but better than losing everything. Of course, test your backups. Untested backups is no backup.
I don’t know if there’s tools for this already. My use case required a custom tool to manage it and integrate it with other automation for restores and adding replicas and whatnot. It’s really not that bad even as a simple bash script.
Hey I can’t tell you how important this response is to me. Thanks for helping out.
My tried-and-tested method has saved my (company’s clients) ass a few times.
Every Mysql/MariaDB server has at least one replication target. This replicant is not used for access by the infra, and can be paused, restarted, etc with no issue and is configured with this in mind.
We run a mysqldump on the replicant. Depending on the resiliency required, we store the dump on the replicant and/or a third location.
The tools differ, but the practice applies to pretty much every database system and the database has the benefit of not being interrupted during the backup (replication is paused during the backup, and resumed after completion). This also has the benefit of already having replication configured, and adding a secondary redundant instance you can swap out for the master (or using the backup replicant in a pinch) means disaster recovery is much faster.
Also, I dislike many things about Azure’s offerings, but their Flexible Database for MySQL does the above for you as one nicely packaged solution for a reasonable-but-not-cheap price.
Wow that’s amazing experience! Thanks for sharing it. 🙌🙌🙌
Veeam backup and recovery, setup application aware backups.
As far as physical backups go, a smaller company will send a tape copy home with an employee as an off-site option.
My experience: regular snapshots plus binlogs being copied to another ec2. So, depending on the retention of the snapshots and logs, I can do a Point In Time Recovery to any second in the last N days. In my case, I set for 2 days. I don’t send anything to S3.