Pythian Blog: Technical Track

MongoDB incremental backups using the oplog – Part 2

This is the second post in a series of blog posts on MongoDB replica set backups. I’ll show you how to properly run incremental backups using the oplog.rs collection. Restores with point-in-time recovery (PITR) using the full and incremental backups will be covered in the third post. If you have not read the first post, which explains how to run a full MongoDB backup using an LVM snapshot, please visit this page.

What is the Oplog?

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database. Any secondary member can import oplog entries from any other member.

Each operation in the oplog is idempotent. That is, oplog operations produce the same results whether applied once or multiple times to the target dataset.

To read more about the oplog, see my previous blog post, where I explore it in more detail.

 

Incremental Backup

 

In part 1 of this series, we took a full backup using an LVM snapshot. Before taking the full backup, we briefly locked the database for writes and captured the latest position of the oplog. If you were following the commands, you should have something like this:

 

{"position":{"$timestamp":{"t":1666355398,"i":1}}}

 

Because this position matches the time of our last full backup, we can take incremental backups starting from it. To do that, we can use the --query=<json> or --queryFile=<path> option of mongodump, a utility that creates a binary export of a database’s contents.

--query=<json>, -q=<json> Provides a JSON document as a query that optionally limits the documents included in the output of mongodump.

To use the query option, you must also specify the collection option.

--queryFile=<path> Specifies the path to a file containing a JSON document to use as a query filter that limits the documents included in the output of mongodump.

 

Let’s see the sequence of commands we should run to start taking incremental backups. 

 

First, let’s build the query file that will be used as the query filter. Note that the content of the --queryFile can instead be passed inline with --query if you prefer. We have the position information shown above, so our query file should have the following format:

 

{"ts": { "$gt": {"$timestamp":{"t":1666355398,"i":1}}}}

 

The above query lets us run mongodump on the oplog collection using the $gt operator against the timestamp “t”:1666355398. If we run it one hour after the full backup was taken, it will dump only the last hour of the operations log. If you are parsing the file saved earlier, note that we take only the value of the .position key between the curly brackets {}:

 

cat oplog_position | jq .position

{

  "$timestamp": {

    "t": 1666355398,

    "i": 1

  }

}
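If you prefer not to edit the query file by hand, it can be generated directly from the saved position with jq. A minimal sketch, assuming jq is installed and the position was saved to a file named oplog_position as in part 1 (the echo line below just recreates that file for illustration):

```shell
# Recreate the position file saved during the full backup (illustrative only)
echo '{"position":{"$timestamp":{"t":1666355398,"i":1}}}' > oplog_position

# Wrap the saved position in a {"ts": {"$gt": ...}} filter for mongodump
jq -c '{ts: {"$gt": .position}}' oplog_position > incremental_query.json

cat incremental_query.json
# {"ts":{"$gt":{"$timestamp":{"t":1666355398,"i":1}}}}
```

Generating the file this way avoids typos in the nested $timestamp document, which would silently change what the filter matches.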

 

The full mongodump command will look like this:

 

mongodump --quiet -u<username> -p<password> --port <port> --authenticationDatabase=admin  -d local -c oplog.rs --queryFile="$QUERY" -o /backups/incremental_1

-d Specifies the database to back up. We are using the local database.

-c Specifies the collection to back up. We are using the oplog.rs collection.

--queryFile Specifies the path to the JSON file containing the query.

-o Specifies the directory where mongodump will write BSON files for the dumped databases.

 

If you are curious how this looks using the --query parameter instead, see below:

 

mongodump --quiet -u<username> -p<password> --port <port> --authenticationDatabase=admin  -d local -c oplog.rs --query='{"ts": { "$gt": {"$timestamp":{"t":1666355398,"i":1}}}}' -o /backups/incremental_1

 

You will most likely save the incremental backup alongside your previous full backup, so adjust the -o parameter to point where you want the output. The output directory will contain files as shown below:

 

ls -l /backup/mongo_20221024/incremental_1/local/

total 32

-rw-r--r-- 1 root root 24720 Oct 24 13:10 oplog.rs.bson

-rw-r--r-- 1 root root   185 Oct 24 13:10 oplog.rs.metadata.json

 

Using the same command and query file format, we can run incremental backups at a custom frequency (hourly, once every 3 hours, twice a day, etc.). However, always starting the next incremental backup from the time of the full backup is not very efficient. Although we can do that, it is more efficient to start each incremental backup from the previous incremental backup. To do so, we just need to save the latest oplog position from the last incremental backup and start from there. We can use another MongoDB utility, bsondump, to get that information directly from the oplog.rs.bson file.

 

bsondump --quiet /backup/mongo_20221024/incremental_1/local/oplog.rs.bson | tail -1 | jq -c .ts

{"$timestamp":{"t":1666616998,"i":1}}

 

We now have the timestamp for the next incremental backup, which will be the starting point of our next query. Remember, we use the $gt operator. The content of the next query file will be:

 

{"ts": { "$gt": {"$timestamp":{"t":1666616998,"i":1}}}}

 

We just need to repeat this until we run the next full backup. One caveat: for the last incremental backup, we need to set a query filter that includes the oplog entries from the previous incremental backup up to the start of the next full backup. The query file should look like this:

 

{"ts": { "$gt": {"$timestamp":{"t":1666616998,"i":1}}, "$lte": {"$timestamp":{"t":1666617998,"i":1}}}}

 

We now have two operators: $gt, for entries since the last incremental backup, and $lte, for entries up to and including the start of the next full backup.

In the next and final blog of this series, we will cover restores with point-in-time recovery, using a full backup and incremental backups to recover to just before an erroneous operation.


 

Conclusion

Running full daily backups followed by incremental backups allows you to restore your database to a point just before an erroneous operation. Between full backups, you can run as many incremental backups as you like by saving the last oplog position each time. MongoDB’s mongodump utility lets you run these incremental backups against the oplog.rs collection.
