How to Setup a Google Cloud SQL Replica from an External MySQL Primary

5 min read

Oct 26, 2023 4:00:23 PM

This is the first in a series of blog posts on how to migrate a MySQL instance from a VM to Google Cloud SQL.

Scenario

You are a DBA and were asked to migrate a MySQL instance running version 5.7 from a virtual machine on Google Compute Engine to Google Cloud SQL, also using MySQL 5.7. The current primary also has two replicas, one of which is used for backups only. This is your first time using Google Cloud SQL, and you have decided to use the API calls and learn more about the details of the product.

You have been requested to do the migration with a minimal amount of downtime. Due to this request, you plan to set up replication between the current MySQL environment and the target environment in Cloud SQL. After replication has been established and the data is up-to-date on the target, you can plan to cut over to the new environment running in Cloud SQL.

Prerequisites

We will first need to install Cloud SDK and have an account in Google Cloud with the required permissions. Next, we need to ensure that our current primary is prepared to replicate to a Cloud SQL instance. First, we need to check a few global variables and make sure that the proper options are in place:

GCE primary global variable options:
- log_bin is ON
- expire_log_days is 7
- server_id is unique within the replication topology
- binlog_fomat is ROW
- GTID_mode is ON
- enforce_gtid_consistency is ON

Next, we need to create a user account on the current primary, for the migration, which will be used for replication.

CREATE USER 'migration_user'@'%' IDENTIFIED BY 'PASSWORD';

GRANT SELECT, RELOAD, SHOW VIEW, REPLICATION CLIENT, REPLICATION SLAVE, EXECUTE ON *.* TO migration_user'@'%';

FLUSH PRIVILEGES;

At this point, we check our source schema to make sure that it fits the requirements needed by Cloud SQL. We verify that all tables to migrate use the InnoDB storage engine, and we check the definer on views, stored procedures, and triggers. Cloud SQL requires that the “definer” user is present before the migration.

Create the topology

The first part of our setup is to create a Source Representation instance in Cloud SQL, to represent the external primary. To do this, we need to provide some information about this instance, including its external IP address, MySQL version, GCP region, project, and the credentials from the user we created above.

We will create a JSON file, the “request data”, called source.json to run the API call. In this example, I used the hostname as the logical name for this instance.

{
      "name": "db-prd-gce",
      "region": "us-east1",
      "databaseVersion": "MYSQL_5_7",
      "onPremisesConfiguration": {
        "hostPort": "35.192.211.34:3306",
        "username": "migration_user",
        "password": "PASSWORD"
      }
    }

Next, I ran the API call to Cloud SQL using this file. I had to authenticate to Cloud SQL first. The authentication is usually active for about 30 minutes.

gcloud auth login

ACCESS_TOKEN="$(gcloud auth print-access-token)"

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
     --header "Content-Type: application/json" \
     --data @./source.json \
     -X POST    https://sqladmin.googleapis.com/sql/v1beta4/projects/db-prd/instances

At this point, we are ready to create the replica in Cloud SQL, which will point to the external primary, above. Using the same process as above, we will first create the JSON request file, called replica.json. It is necessary to create the Cloud SQL instance using the API method, as that is the only way we have the option to identify the external primary associated with this instance.

For this scenario, we decided to use the same tier and disk size that we have on our current primary instance. To be able to replicate from the external primary, we name the primary instance using the “masterInstanceName”, defined in the preceding step.

To find specific values allowed for Cloud SQL fields, see the API reference at the end of this blog.

{
        "settings": {
            "tier": "db-custom-8-53248",
            "dataDiskSizeGb": "900",
            “dataDiskType”: PD_SSD,
            “ipConfiguration": {
                ipv4Enabled: true,
                "privateNetwork": "projects/db-prd/global/networks/db-prd-net"
             },
              locationPreference: {
                 kind: sql#locationPreference,
                 secondaryZone: us-east1-c,
                 zone: us-east1-b
             },
        },
        "availabilityType": REGIONAL,
        "masterInstanceName": "db-prd-gce",
        "region": "us-east1",
        "databaseVersion": "MYSQL_5_7",
        "name": "db-prd-primary"
    }

Now we run the API call using the file just created. Note that the command is similar to the one used when creating the external primary. We are running a “POST” to add the instance to Cloud SQL.

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
     --header "Content-Type: application/json" \
     --data @./replica.json \
     -X POST      https://sqladmin.googleapis.com/sql/v1beta4/projects/db-prd/instances

Allow incoming connections on external instance

At this point, we need to set up network connectivity between our new Cloud SQL “replica”, and the external primary. We first need to find the outgoing external IP address of the Cloud SQL replica that we just created. To do this, we will check the instance in the GCP console.

Next, we will create a firewall rule for the GCE instance in the console. Navigate to the VM.

For the VM:

View Network Details
Choose vpc-firewall-rules
VPC Network Menu:
- Choose FIREWALL
- Create a firewall rule
Name: db-prd-to-cloud-sql
Network: db-prd-net
Ingress/ Allow
Target tags: cms-db-prd

Add IP address to Source IPv4 ranges: 35.192.211.34

Protocol and port: TCP/ 3306

Click CREATE

Seed the Cloud SQL Replica and Establish Replication

To start the seeding process, we will first create a backup of the primary external MySQL instance. Then we will copy the backup file to the bucket.

# cd /var/db/mysql/backups/prd   
# read -s pass
# nohup mysqldump -upythian -p$pass -hdb-prd-gce --set-gtid-purged=OFF --single-transaction --master-data=1 --hex-blob --events --routines --default-character-set=utf8mb4 --no-autocommit -–compress --databases  marketing content  > dbs_gcp_backup.sql &

Next, we need to make Cloud SQL aware of our backup file in the bucket, by updating its properties. The request data file is called updateSource.json.

{
      onPremisesConfiguration: {
        dumpFilePath: 'gs://db-prd-gcs-bucket/dbs_gcp_backup.sql'
      }
    }

Run the API call to make the update. The “PATCH” command will do the update.

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
     --header "Content-Type: application/json" \
     --data @./updateSource.json \
     -X PATCH      https://sqladmin.googleapis.com/sql/v1beta4/projects/db-prd/instances/db-prd-gce

At this point, we are nearly ready to go. We are going to verify that the external primary is prepared to replicate to Cloud SQL. This includes requirements on the source schema, network connectivity, as well as the replication settings described above. Our request file is called verify.json, below.

{
      syncMode: 'online'
}

Now, we will run the “verify” using the API.

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
     --header "Content-Type: application/json" \
     --data @./verify.json \
     -X POST      https://sqladmin.googleapis.com/sql/v1beta4/projects/db-prd/instances/db-prd-primary/verifyExternalSyncSettings

When the verify comes back clean, we can start the sync process. First create a new request file, called sync.json.

{
      "syncMode": "online",
      "skipVerification": "false"
    }

Start the sync process with the API call.

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
     --header “Content-Type: application/json” \
     --data @./sync.json \
     -X POST      https://sqladmin.googleapis.com/sql/v1beta4/projects/db-prd/instances/db-prd-primary/startExternalSync

Check the status on the Cloud SQL console.

Conclusion

I found the process of setting up replication from an external primary to be a little different than I had expected. I had gone through a similar scenario a few times with AWS and found the high-level architecture and process to be similar, but the implementation was a bit different. I decided to use the API calls to get a better understanding of what is going on behind the scenes and allow for more flexibility, rather than using Database Migration Services.

The next blog in this series will cover cutting over from the external primary to the Google Cloud SQL replica. Stay tuned.