
Scaling ProxySQL Rapidly in Kubernetes

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on November 26, 2019.

It’s not uncommon these days for us to use a high availability stack for MySQL consisting of Orchestrator, Consul and ProxySQL. You can read more details about this stack in Matthias Crauwels’ blog post How to Autoscale ProxySQL in the Cloud as well as Ivan Groenwold’s post on MySQL High Availability With ProxySQL, Consul and Orchestrator. The high-level concept is that Orchestrator monitors the state of the MySQL replication topology and reports changes to Consul, which in turn can update ProxySQL hosts using a tool called consul-template.

Until now we’ve typically implemented the ProxySQL portion of this stack using an autoscaling group of sorts, due to the high levels of CPU usage that can be associated with ProxySQL. It’s better to be able to scale up and down as traffic increases and decreases; this ensures you’re not paying for resources you don’t need. This approach, however, comes with a few disadvantages. The first is the amount of time it takes to scale up. If you're using an autoscaling group and it launches a new instance, it will need to take the following steps:
  1. There will be a request to your cloud service provider for a new VM instance.
  2. Once the instance is up and running as part of the group, it will need to install ProxySQL along with supporting packages such as consul (agent) and consul-template.
  3. Once the packages are installed, they'll need to be configured to work with the consul server nodes as well as the ProxySQL nodes that are participating in the ProxySQL cluster.
  4. The new ProxySQL host will announce to Consul that it’s available, which in turn will update all the other participating nodes in the ProxySQL cluster.
This can take time. Provisioning a new VM instance usually happens fairly quickly, normally within a couple of minutes, but sometimes there can be unexpected delays. You can speed up package installation by using a custom machine image, but since there's operational overhead in keeping images up to date with the latest versions of the installed packages, it may be easier to do this using a script that always installs the latest versions. All in all, you can expect a scale-up to take more than a couple of minutes.

The next issue is how deterministic this solution is. If you're not using a custom machine image, you'll need to pull down your config and template files from somewhere, most likely a storage bucket, and there's a chance those files could be overwritten. This means the next time the autoscaler launches an instance, it may not have the same configuration as the rest of the hosts participating in the ProxySQL cluster.

We can take this already impressive stack and go a step further using Docker containers and Kubernetes. For those unfamiliar with containerization: a container is similar to a virtual machine snapshot, but it isn't a full snapshot that would include the OS. Instead, it contains just the binaries required to run your process. You create this image using a Dockerfile, typically starting from a specified Linux distribution, then using verbs like RUN, COPY and USER to specify what should be included in your container "image." Once this image is built, it can be centrally located in a repository and made available to machines running a containerization platform like Docker. This method of deployment has become more and more popular in recent years because containers are lightweight, and you know that if a container works on one system it will work exactly the same way when it's moved to a different system. This reduces common issues like dependency and configuration variations from host to host.

Given that we want to be able to scale up and down, it's safe to say we're going to want to run more than one container. That's where Kubernetes comes into play. Kubernetes is a container management platform that operates on an array of hosts (virtual or physical) and distributes containers on them as specified by your configuration, typically a YAML-format Kubernetes deployment file. If you're using Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP), this is even easier, as the vast majority of the work in creating a Kubernetes deployment (referred to as a 'workload' in GKE) YAML is handled for you via a simple UI in the GCP Console. If you want to learn more about Docker or Kubernetes, I highly recommend Nigel Poulton's video content on Pluralsight. For now, let's stick to learning about ProxySQL on this platform.

If we want ProxySQL to run in Kubernetes and operate with our existing stack with Consul and Orchestrator, we're going to need to keep best practices in mind for our containers:
  1. Each container should run only a single process. We know we’re working with ProxySQL, consul (agent), and consul-template, so these will all need to be in their own containers.
  2. The primary process running in each container should run as PID 1.
  3. The primary process running in each container should not run as root.
  4. Log output from the primary process in the container should be sent to STDOUT so that it can be collected by Docker logs.
  5. Containers should be as deterministic as possible — meaning they should run the same (or at least as much as possible) regardless of what environment they are deployed in.
The first thing on the list above that stands out is the need to have ProxySQL, consul-template and consul (agent) isolated within their own containers. These processes need to work together: consul (agent) acts as our communication conduit back to the consul (server) hosts, and consul-template is what updates ProxySQL based on changes to keys and values in Consul. So how can they work together if they're in separate containers?

Kubernetes provides the solution. When you're thinking about Docker, the smallest computational unit is the container; when you're thinking about Kubernetes, the smallest computational unit is the pod, which can contain one or more containers. Any containers operating within the same pod can communicate with one another over localhost ports. So in this case, assuming you're using default ports, the consul-template container can communicate with the consul (agent) container on localhost port 8500, and with the ProxySQL container on localhost port 6032, given that these three containers will be working together in the same pod.

So let's start looking at some code, starting with the simplest container and working our way to the most complex.

Consul (Agent) Container

Below is a generic version of the Dockerfile I’m using for consul (agent). The objective is to install Consul then instruct it to connect as an agent to the Consul cluster comprised of the consul (server) nodes.
FROM centos:7
RUN yum install -q -y unzip wget && \
    yum clean all
RUN groupadd consul && \
    useradd -r -g consul -d /var/lib/consul consul
RUN mkdir /opt/consul && \
    mkdir /etc/consul && \
    mkdir /var/log/consul && \
    mkdir /var/lib/consul && \
    chown -R consul:consul /opt/consul && \
    chown -R consul:consul /etc/consul && \
    chown -R consul:consul /var/log/consul && \
    chown -R consul:consul /var/lib/consul
RUN wget -q -O /opt/consul/consul.zip https://releases.hashicorp.com/consul/1.6.1/consul_1.6.1_linux_amd64.zip && \
    unzip /opt/consul/consul.zip -d /opt/consul/ && \
    rm -f /opt/consul/consul.zip && \
    ln -s /opt/consul/consul /usr/local/bin/consul
COPY supportfiles/consul.conf.json /etc/consul/
USER consul
ENTRYPOINT ["/usr/local/bin/consul", "agent", "--config-file=/etc/consul/consul.conf.json"]
Simply put, the code above follows these instructions:
  1. Start from CentOS 7. This is a personal preference of mine. There are probably more lightweight distributions that can be considered, such as Alpine as recommended by Google, but I’m not the best OS nerd out there so I wanted to stick with what I know.
  2. Install our dependencies, which in this case are unzip and wget.
  3. Create our consul user, group and directory structure.
  4. Install consul.
  5. Copy over the consul config file from the host where the Docker build is being performed.
  6. Switch to the consul user.
  7. Start consul (agent).
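With this Dockerfile saved next to a supportfiles/ directory containing consul.conf.json, the image can be built and smoke-tested locally before being pushed anywhere. The image name and tag below are illustrative choices of mine, not from the original setup:

# Build the consul (agent) image from the directory containing the Dockerfile.
docker build -t consul-agent:1.0 .

# Run it locally to confirm Consul starts as PID 1 and logs to STDOUT.
# (It will keep retrying to join the Consul servers named in its config file if
# they're unreachable from your workstation, which is fine for a quick smoke test.)
docker run --rm consul-agent:1.0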
Now let’s check the code and see if it matches best practices.
  • Container runs a single process:
    • The ENTRYPOINT runs Consul directly, meaning nothing else is being run. Keep in mind that ENTRYPOINT specifies what should be run when the container starts. The container doesn’t have to install anything at startup because the packages are baked into the image by the Dockerfile; all that’s left is to launch Consul when the container starts.
  • Process should be PID 1:
    • Any process run by ENTRYPOINT will run as PID 1.
  • Process should not be run as root:
    • We switched to the Consul user prior to starting the ENTRYPOINT.
  • Log output should go to STDOUT:
    • If you run Consul using the command noted in the ENTRYPOINT, you’ll see log output goes to STDOUT.
  • Should be as deterministic as possible:
    • We’ve copied the configuration file into the container, meaning the container doesn’t have to get support files from anywhere else before Consul starts. The only way the nature of Consul will change is if we recreate the container image with a new configuration file.
There’s really nothing special about the Consul configuration file that gets copied into the container. You can see an example of this by checking the aforementioned blog posts by Matthias or Ivan for this particular HA stack.
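That said, for orientation, a minimal agent configuration of the general shape used in this stack might look something like the following; the datacenter name and server addresses are placeholders rather than values from any real environment:

{
  "datacenter": "us-central1",
  "data_dir": "/var/lib/consul",
  "log_level": "INFO",
  "server": false,
  "client_addr": "0.0.0.0",
  "retry_join": ["consul-server-1.example.com", "consul-server-2.example.com", "consul-server-3.example.com"]
}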

ProxySQL Container

Below is a generic version of the Dockerfile I'm using for ProxySQL. The objective is to install ProxySQL and make it available to receive traffic on 6033 for writes, 6034 for reads and 6032 for the admin console, which is how consul-template will interface with ProxySQL.
FROM centos:7
RUN groupadd proxysql && \
    useradd -r -g proxysql proxysql
RUN yum install -q -y https://github.com/sysown/proxysql/releases/download/v2.0.6/proxysql-2.0.6-1-centos67.x86_64.rpm mysql curl && \
    yum clean all
COPY supportfiles/* /opt/supportfiles/
COPY startstop/* /opt/
RUN chmod +x /opt/entrypoint.sh
RUN chown proxysql:proxysql /etc/proxysql.cnf
USER proxysql
ENTRYPOINT ["/opt/entrypoint.sh"]
Simply put, the code above follows these instructions:
  1. Start from CentOS 7.
  2. Create our ProxySQL user and group.
  3. Install ProxySQL and its dependencies, which in this case are the mysql client and curl. curl will be used to poll the GCP metadata API in order to determine which region the ProxySQL cluster is in. We’ll cover this in more detail below.
  4. Move our configuration files and ENTRYPOINT script to the container.
  5. Make sure the ProxySQL config file is readable by ProxySQL.
  6. Switch to the ProxySQL user.
  7. Start ProxySQL via the ENTRYPOINT script provided with the container.
In my use case, I have multiple ProxySQL clusters, one per GCP region. They have to be logically grouped together so that they route read traffic to replicas within their local region but send write traffic to the master regardless of which region it’s in. In my solution, each region gets its own hostgroup for its read replicas, so the mysql_query_rules table needs to be configured accordingly: the MySQL hosts are added to different hostgroups depending on region, but the routing to each hostgroup remains consistent. Given that these rules are highly unlikely to change, I have mysql_query_rules configured in the configuration file (a sketch of what this can look like appears at the end of this section). This means I need to select the correct configuration file based on my region before starting ProxySQL, and this is where my ENTRYPOINT script comes into play. Let’s have a look at a simplified and more generic version of my code:
#!/bin/bash
# Determine which region this container is running in by querying the GCP
# metadata server. The zone comes back as projects/<project-number>/zones/<zone>
# (for example projects/123456789/zones/us-central1-a), so strip the path and
# the zone suffix to leave the region (us-central1).
dataCenter=$(curl -s http://metadata.google.internal/computeMetadata/v1/instance/zone -H "Metadata-Flavor: Google" | awk -F "/" '{print $NF}' | cut -d- -f1,2)
...
# Select the region-specific ProxySQL configuration file.
case $dataCenter in
  us-central1)
    cp -f /opt/supportfiles/proxysql-us-central1.cnf /etc/proxysql.cnf
    ;;
  us-east1)
    cp -f /opt/supportfiles/proxysql-us-east1.cnf /etc/proxysql.cnf
    ;;
esac
...
# exec replaces this shell, so ProxySQL runs in the foreground as PID 1.
exec proxysql -c /etc/proxysql.cnf -f -D /var/lib/proxysql
The script starts by polling the GCP API to determine what region the container has been launched in. Based on the result, it will copy the correct config file to the appropriate location, then start ProxySQL. Let's see how the combination of the Dockerfile and the ENTRYPOINT script allows us to meet best practices.
  • Container runs a single process:
    • ENTRYPOINT calls the entrypoint.sh script, which does some conditional logic based on the regional location of the container, then ends by running ProxySQL. This means at the end of the process ProxySQL will be the only process running.
  • Process should be PID 1:
    • The command “exec” at the end of the ENTRYPOINT script will start ProxySQL as PID 1.
  • Process should not be run as root:
    • We switched to the ProxySQL user prior to starting the ENTRYPOINT.
  • Log output should go to STDOUT:
    • If you run ProxySQL using the command noted at the end of the ENTRYPOINT script you’ll see that log output goes to STDOUT.
  • Should be as deterministic as possible:
    • We’ve copied the potential configuration files into the container. Unlike Consul, there are multiple configuration files and we need to determine which will be used based on the region the container lives in, but the configuration files themselves will not change unless the container image itself is updated. This ensures that all containers running within the same region will behave the same.
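To make the per-region routing a bit more concrete, here is a rough sketch of the kind of static mysql_query_rules block a regional proxysql.cnf could carry. The rule IDs, hostgroup numbers and match patterns are illustrative only and not taken from the original configuration:

mysql_query_rules =
(
    {
        rule_id=1
        active=1
        match_digest="^SELECT.*FOR UPDATE"
        destination_hostgroup=10   # writer hostgroup (current master)
        apply=1
    },
    {
        rule_id=2
        active=1
        match_digest="^SELECT"
        destination_hostgroup=20   # reader hostgroup for this region's replicas
        apply=1
    }
)

Because rules like these are baked into each regional config file, the ENTRYPOINT script only has to pick the right file at startup; consul-template then keeps the contents of the mysql_servers table for each hostgroup up to date at runtime.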

Consul-template container

Below is a generic version of the Dockerfile I'm using for consul-template. The objective is to install consul-template and have it act as the bridge between Consul (via the consul (agent) container) and ProxySQL, updating ProxySQL as needed when keys and values change in Consul.
FROM centos:7
RUN yum install -q -y unzip wget mysql nmap-ncat curl && \
    yum clean all
RUN groupadd consul && \
    useradd -r -g consul -d /var/lib/consul consul
RUN mkdir /opt/consul-template && \
    mkdir /etc/consul-template && \
    mkdir /etc/consul-template/templates && \
    mkdir /etc/consul-template/config && \
    mkdir /opt/supportfiles && \
    mkdir /var/log/consul/ && \
    chown -R consul:consul /etc/consul-template && \
    chown -R consul:consul /etc/consul-template/templates && \
    chown -R consul:consul /etc/consul-template/config && \
    chown -R consul:consul /var/log/consul
RUN wget -q -O /opt/consul-template/consul-template.zip https://releases.hashicorp.com/consul-template/0.22.0/consul-template_0.22.0_linux_amd64.zip && \
    unzip /opt/consul-template/consul-template.zip -d /opt/consul-template/ && \
    rm -f /opt/consul-template/consul-template.zip && \
    ln -s /opt/consul-template/consul-template /usr/local/bin/consul-template
RUN chown -R consul:consul /opt/consul-template
COPY supportfiles/* /opt/supportfiles/
COPY startstop/* /opt/
RUN chmod +x /opt/entrypoint.sh
USER consul
ENTRYPOINT ["/opt/entrypoint.sh"]
Simply put, the code above follows these instructions:
  1. Start from CentOS 7.
  2. Install our dependencies which are unzip, wget, mysql (client), nmap-ncat and curl.
  3. Create our Consul user and group.
  4. Create the consul-template directory structure.
  5. Download and install consul-template.
  6. Copy the configuration file, template files and ENTRYPOINT script to the container.
  7. Make the ENTRYPOINT script executable.
  8. Switch to the Consul user.
  9. Start consul-template via the ENTRYPOINT script that’s provided with the container.
Much like our ProxySQL container, we really need to look at the ENTRYPOINT here in order to get the whole story. Remember, this is multi-region so there is additional logic that has to be considered when working with template files.
#!/bin/bash
# Determine the region, exactly as in the ProxySQL ENTRYPOINT script.
dataCenter=$(curl -s http://metadata.google.internal/computeMetadata/v1/instance/zone -H "Metadata-Flavor: Google" | awk -F "/" '{print $NF}' | cut -d- -f1,2)
...
cp /opt/supportfiles/consul-template-config /etc/consul-template/config/consul-template.conf.json
# Select the region-specific mysql_servers template.
case $dataCenter in
  us-central1)
    cp /opt/supportfiles/template-mysql-servers-us-central1 /etc/consul-template/templates/mysql_servers.tpl
    ;;
  us-east1)
    cp /opt/supportfiles/template-mysql-servers-us-east1 /etc/consul-template/templates/mysql_servers.tpl
    ;;
esac
cp /opt/supportfiles/template-mysql-users /etc/consul-template/templates/mysql_users.tpl
### Ensure that ProxySQL has started
while ! nc -z localhost 6032; do
  sleep 1;
done
### Ensure that consul agent has started
while ! nc -z localhost 8500; do
  sleep 1;
done
# exec replaces this shell, so consul-template runs as PID 1.
exec /usr/local/bin/consul-template --config=/etc/consul-template/config/consul-template.conf.json
This code is very similar to the ENTRYPOINT file used for ProxySQL in the sense that it checks which region the container is in, then moves configuration and template files into the appropriate locations. However, there is some additional logic here that checks that ProxySQL is up and listening on port 6032 and that consul (agent) is up and listening on port 8500. The reason for this is that consul-template needs to be able to communicate with both of these services. You have no real assurance of the order in which the containers in a pod will start, so to avoid excessive errors in the consul-template log, I have it wait until it knows its dependent services are running. (A sketch of what one of the template files might look like follows the checklist below.) Let’s go through our best practices checklist one more time against our consul-template container code.
  • Container runs a single process:
    • ENTRYPOINT calls the entrypoint.sh script, which does some conditional logic based on the regional location of the container, then ends by running consul-template. This means at the end of the process consul-template will be the only process running.
  • Process should be PID 1:
    • The command “exec” at the end of the ENTRYPOINT script will start consul-template as PID 1.
  • Process should not be run as root:
    • We switched to the consul user prior to starting the ENTRYPOINT.
  • Log output should go to STDOUT:
    • If you run consul-template using the command noted at the end of the ENTRYPOINT script, you’ll see log output goes to STDOUT.
  • Should be as deterministic as possible:
    • Just like ProxySQL and consul (agent), all the supporting files are packaged with the container. Yes, there is logic to determine what files should be used, but you have the assurance that the files won’t change unless you create a new version of the container image.
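As a rough illustration of what these templates do (the Consul key prefix, hostgroup number and column list are placeholders of my own rather than the original template contents), a mysql_servers.tpl for one region could render SQL along these lines:

DELETE FROM mysql_servers WHERE hostgroup_id = 20;
{{ range tree "mysql/mycluster/replicas/us-central1" }}
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '{{ .Key }}', 3306);
{{ end }}
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

The consul-template configuration file then points source at this template and destination at a rendered .sql file, and uses its command hook to apply that file through the ProxySQL admin interface on localhost port 6032 using the mysql client installed in the container. That is exactly the "generate SQL, then run it against ProxySQL" flow summarized in the next section.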

Putting it all together

Okay, we have three containers representing the three processes we need to package together so ProxySQL can work as part of our HA stack. Now we need to put it all together in a pod so Kubernetes can have it run against our resources. In my use case, I’m running this on GCP, meaning once my containers have been built they're going to need to be pushed up to the Google Container Registry. After this we can create our workload to run our pod and specify how many pods we want to run. Getting this up and running can be done with just a few short and simple steps:
  1. Create a Kubernetes cluster if you don’t already have one. This is what provisions the Cloud Compute VMs the pods will run on.
  2. Push your three Docker images to the Google Container Registry. This makes the images available for use by the Kubernetes engine (example commands for this and the previous step are sketched after this list).
  3. Create your Kubernetes workload, which can be done simply via the user interface in the GCP console. All that’s required is selecting the latest version of the three containers you’ve pushed up to the registry, optionally applying some metadata like an application name, Kubernetes namespace, and labels, then selecting which cluster you want to run the workload on.
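For the first two steps, a hedged sketch of the commands involved might look like this; the project ID, cluster name, zone and image names are all placeholders:

# Create a GKE cluster to host the pods (sizing and zone are arbitrary here).
gcloud container clusters create proxysql-cluster --zone us-central1-a --num-nodes 3

# Authenticate docker against the Google Container Registry, then tag and push
# each of the three images built earlier.
gcloud auth configure-docker
docker tag consul-agent:1.0 gcr.io/my-project/consul-agent:1.0
docker push gcr.io/my-project/consul-agent:1.0
docker tag proxysql:1.0 gcr.io/my-project/proxysql:1.0
docker push gcr.io/my-project/proxysql:1.0
docker tag consul-template:1.0 gcr.io/my-project/consul-template:1.0
docker push gcr.io/my-project/consul-template:1.0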
Once you click deploy, the containers will spin up and, assuming there are no issues bringing the containers online, you’ll quickly have a functioning ProxySQL pod in Kubernetes that follows these high-level steps:
  1. The pod is started.
  2. The three containers will start. Pods are atomic in this respect: either all of the containers start without error, or the pod does not consider itself started.
  3. The consul-template container will poll consul (agent) and ProxySQL on their respective ports until it’s confirmed those processes have started, then consul-template will start.
  4. Consul-template will create the new SQL files meant to configure ProxySQL based on the contents of the Consul key / value store.
  5. Consul-template will run the newly created SQL files against ProxySQL via its admin interface.
  6. The pod is now ready to receive traffic.
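Once the workload is deployed, the usual kubectl commands can confirm that the pods are healthy. The label and container names below are placeholders that match the earlier examples:

# Check that the expected number of pods are running and ready.
kubectl get pods -l app=proxysql

# Tail the logs of an individual container inside one of the pods.
kubectl logs <pod-name> -c proxysql
kubectl logs <pod-name> -c consul-template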

The YAML

During the process of creating your workload, or even after the fact, you’ll be able to see the YAML you’d normally have to create with standard Kubernetes deployments. Let’s have a look at the general shape that YAML takes for a pod like this.
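The exact YAML GKE generates will vary by environment; as a minimal hand-written sketch, a deployment of this shape looks roughly like the following (the labels, image paths and replica count are placeholder values):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: proxysql
  labels:
    app: proxysql
spec:
  replicas: 3
  selector:
    matchLabels:
      app: proxysql
  template:
    metadata:
      labels:
        app: proxysql
    spec:
      containers:
      - name: proxysql
        image: gcr.io/my-project/proxysql:1.0
        ports:
        - containerPort: 6033   # writes
        - containerPort: 6034   # reads
        - containerPort: 6032   # admin interface, used by consul-template
      - name: consul-agent
        image: gcr.io/my-project/consul-agent:1.0
      - name: consul-template
        image: gcr.io/my-project/consul-template:1.0

All three containers share the pod's network namespace, which is what allows consul-template to reach ProxySQL on localhost port 6032 and consul (agent) on localhost port 8500, exactly as described earlier.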
