
How Linux Hugepages work to improve Oracle performance

This post is not intended to be an in-depth discussion of how Linux Hugepages work, but rather a cursory explanation. In discussing this topic with many DBAs, I have found that while they often know how to configure Hugepages, they frequently do not know how they work. Understanding how Hugepages work is key to understanding the benefits of using them. Please note that this discussion covers static Hugepages only. Red Hat Enterprise Linux 6 (and derived distros) introduced dynamic, or Transparent, Hugepages. Oracle strongly recommends disabling Transparent Hugepages, and in fact they are disabled by default in Oracle distributions of Linux 6 and 7. See Oracle Support Note ALERT: Disable Transparent HugePages on SLES11, RHEL6, RHEL7, OL6, OL7, and UEK2 and above (Doc ID 1557478.1)
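
A quick way to check whether Transparent Hugepages are active is shown below (a minimal example; the exact path and output vary by kernel and distribution). The bracketed value is the current setting, so on this example system they are disabled:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
 always madvise [never]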

Linux Memory Allocation

Linux manages memory in pages of a standard size, with the pagesize usually set to 4k:
$ getconf PAGESIZE
 4096
 
 
When an Oracle instance starts up, it requests all of the shared memory for the SGA. For this demonstration, let's assume the SGA size is a rather modest 16 GiB. When the SGA memory is made available to Oracle via standard memory allocation, it is allocated in chunks of the pagesize seen earlier with getconf PAGESIZE. How many chunks of memory is that? With a 4k pagesize:

( 16 * 1G ) / ( 4k ) or ( 16 * 2**30 ) / ( 4 * 2**10 ) = 4194304

That is 4,194,304 chunks of memory to be managed for our rather modest SGA. Each of those pages needs its own page table entry, so managing that many discrete chunks of memory adds significant processing overhead. What can be done to reduce that overhead? If the size of the pages is increased, the number of pages is reduced, and with it the overhead required to manage the memory. Let's consider what happens when we use Hugepages instead of the standard pagesize. First, look at the Hugepages info:
 $ grep Huge /proc/meminfo
 HugePages_Total: 800
 HugePages_Free: 2
 HugePages_Rsvd: 0
 HugePages_Surp: 0
 Hugepagesize: 2048 kB
 
The Hugepagesize here is the Linux standard of 2M. Redoing the arithmetic with a 2M pagesize:

( 16 * 1G ) / ( 2M ) or ( 16 * 2**30 ) / ( 2 * 2**20 ) = 8192

4194304 / 8192 = 512

With Hugepages there are 512 times fewer pages to manage! Whatever time was being spent managing that memory has just been reduced by a factor of 512. You can see where the benefit is.
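
If you want to sanity-check the arithmetic, shell arithmetic is enough (a quick sketch using the same SGA size and pagesizes as above):
$ SGA=$(( 16 * 2**30 ))
$ echo "4k pages: $(( SGA / (4 * 2**10) ))   2M pages: $(( SGA / (2 * 2**20) ))"
 4k pages: 4194304   2M pages: 8192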

An Extreme Demonstration

Following is a little bash script to illustrate the difference between 4K and 2M pagesizes. The script prints the address of every page allocated for our 16G SGA. Commenting out the second definition of PAGESIZE causes the script to run with the 4k pagesize rather than the 2M pagesize.
#!/bin/bash

# print the address of every page in a 16G SGA
# the second PAGESIZE definition (2M) wins;
# comment it out to run with the 4k pagesize instead
SGA_SIZE=16

(( PAGESIZE = 4 * (2**10) ))  # 4k
(( PAGESIZE = 2 * (2**20) ))  # 2M

for sgaGig in $(seq 1 $SGA_SIZE)
do

   (( chunk = 2**30 ))  # walk through 1 GiB at a time

   while [[ $chunk -gt 0 ]]
   do
      (( chunk -= PAGESIZE ))
      printf "0x%08x\n" $chunk
   done

done
 
Here is an example of the output for 2M pagesize:
$ ./sga-page-count.sh | head -10
 0x3fe00000
 0x3fc00000
 0x3fa00000
 0x3f800000
 0x3f600000
 0x3f400000
 0x3f200000
 0x3f000000
 0x3ee00000
 0x3ec00000
 
The full output is rather boring and will not be shown here. It does become interesting, though, when you see how much time is required to run the script for each pagesize.

2M pagesize

$ time ./sga-page-count.sh | wc -l
 8192
 
 real 0m0.069s
 user 0m0.064s
 sys 0m0.000s
 

4K pagesize

$ time ./sga-page-count.sh | wc -l
 4194304
 
 real 0m39.687s
 user 0m29.268s
 sys 0m8.136s
 
The script required roughly 575 times longer to run with the 4K pagesize than with the 2M pagesize: 39.687 / 0.069 ≈ 575. This is a bit of an oversimplification, as we are only counting the memory pages, not actually managing memory. However, this simple explanation and demonstration makes it easy to understand the performance benefit of Hugepages. Note: as a bonus, Hugepages are also not subject to swapping.
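
A closing practical note: the same page count drives a static Hugepages configuration. For the 16 GiB SGA above with a 2M Hugepagesize, at least 8192 Hugepages must be reserved, and real configurations add a small margin for shared memory segment overhead. A minimal sketch (the value shown is simply the figure computed above, not a recommendation for any particular system):
# reserve 8192 x 2M pages; on a busy system the pages may need
# to be reserved at boot time so contiguous memory is available
$ sudo sysctl -w vm.nr_hugepages=8192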
