
Oracle RAC: Network Performance with Jumbo Frames

Introduction

When working with Oracle RAC, it is strongly advised to use Jumbo Frames on the private interconnect network between nodes. Because the nodes send many blocks back and forth across this network, the larger the frame size, the fewer frames need to be sent. The usual block size for an Oracle database is 8192 bytes, while the standard Ethernet MTU (Maximum Transmission Unit) is 1500 bytes. Sending an 8k Oracle block therefore requires splitting it across six frames. If Jumbo Frames (9000 bytes) are used, the entire block fits neatly into a single frame.

-----
Note: Observing the mechanical effects of MTU directly requires a fair amount of effort to set up a SPAN or port mirror and capture the traffic from the wire. That was not done for these tests. Why mention it? Because, as shown below, the reported packet size will be ~8k even though the MTU is set to 1500. Since we cannot see the effects of MTU directly on the client or server, those effects are inferred from other data.

"Frame" and "packet" are terms that often appear to be used interchangeably. However, they are context-sensitive: they occupy different layers of the OSI model.
-----

On with the story. Recently, I was working with a two-node Oracle RAC system running in a VMWare ESXi 6.5 environment. It was thought that, due to the optimizations VMWare performs in the virtual network stack, Jumbo Frames were unnecessary. However, that does not seem to be the case. After testing throughput with both the standard 1500-byte MTU and the 9000-byte Jumbo Frame MTU, the larger MTU resulted in a 22% increase in throughput. Why did that happen? Well, keep reading to find out.
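Before getting into the tests, here is a rough shell sketch of that six-frames-versus-one arithmetic. It assumes roughly 52 bytes of IPv4/TCP header overhead per frame (with TCP timestamps), which is an assumption on my part and not something these tests measure directly:

# Approximate TCP payload per frame: MTU minus ~52 bytes of IP/TCP headers and options
echo $(( (8192 + 1448 - 1) / 1448 ))   # frames needed for an 8k block at MTU 1500 -> 6
echo $(( (8192 + 8948 - 1) / 8948 ))   # frames needed for an 8k block at MTU 9000 -> 1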

The Test Lab

Though the VMWare testing was done on Oracle Linux 7.8, the following experiments are being performed on Ubuntu 20.04 LTS. As there was no need to run Oracle, Ubuntu works just fine for these tests. Following are the two servers created:
ubuntu-test-mule-01: tcp test client - 192.168.1.226
 ubuntu-test-mule-02: tcp test server - 192.168.1.227
Some version information:
root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '^(NAME=|VERSION=)' /etc/os-release
 NAME="Ubuntu"
 VERSION="20.04 LTS (Focal Fossa)"
 

Network Configuration

Other than having different IP addresses, ubuntu-test-mule-01 and ubuntu-test-mule-02 are set up exactly the same way. Because this version of Ubuntu uses netplan to configure the interfaces, we modified the /etc/netplan/00-installer-config.yaml file to configure the two test interfaces. The interfaces used for the testing are enp0s8 and enp0s9. Then, netplan apply was used to enable the changes.
root@ubuntu-mule-01:~/perl-sockets/packet-test# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp0s3:
      dhcp4: true
    enp0s8:
      dhcp4: false
      addresses: [192.168.154.4/24]
      mtu: 9000
    enp0s9:
      dhcp4: false
      addresses: [192.168.199.35/24]
  version: 2
 
The results:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:b8:5c:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.227/24 brd 192.168.1.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feb8:5cdc/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:ab:d5:44 brd ff:ff:ff:ff:ff:ff
    inet 192.168.154.5/24 brd 192.168.154.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feab:d544/64 scope link
       valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:22:fc:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.199.36/24 brd 192.168.199.255 scope global enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe22:fca2/64 scope link
       valid_lft forever preferred_lft forever
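As a quick sanity check (not part of the original test run), ping from ubuntu-mule-01 can confirm that a full 9000-byte frame actually passes end-to-end between the two mules. The 8972-byte payload assumes the usual 20-byte IP and 8-byte ICMP headers:

# 8972 bytes of payload + 28 bytes of IP/ICMP headers = one unfragmented 9000-byte frame
ping -M do -s 8972 -c 3 192.168.154.5
# The same probe over the 1500 MTU interface should fail with "message too long"
ping -M do -s 8972 -c 3 192.168.199.36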

Configure the TCP Test

Some time ago, I put together some Perl scripts for network throughput testing. There were several reasons for this, including:
  • You can copy and paste the code if necessary.
  • It is easy to modify for different tests.
The following was run on each of the test mule servers:
# git clone https://github.com/jkstill/perl-sockets.git

On the ubuntu-mule-02 Server:

The following changes were made to server.pl.

First, disable the dot reporting. By default, a "." is printed for every 256 packets received. This can be disabled with the following line:

my $displayReport = 0; # set to 0 to disable reporting

Now, set the listening address. Default code:
my $sock = IO::Socket::INET->new(
  #LocalAddr => '192.168.1.255', # uncomment and edit adddress if needed
  LocalPort => $port,
  Proto => $proto,
  Listen => 1,
  Reuse => 1
 ) or die "Cannot create socket: $@";
We changed this to reflect the network interfaces that would be used for the testing on the server-side.
my $sock = IO::Socket::INET->new(
  LocalAddr => '192.168.154.5', # MTU 9000
  #LocalAddr => '192.168.199.36', # MTU 1500
  LocalPort => $port,
  Proto => $proto,
  Listen => 1,
  Reuse => 1
 ) or die "Cannot create socket: $@";
 
The appropriate interface was used for each test.
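Switching between the two tests means editing server.pl by hand. Purely as a convenience sketch (not part of the original workflow), the swap could also be scripted with sed, assuming the two LocalAddr lines look exactly as shown above:

# Hypothetical helper: comment out the 9000 MTU address and enable the 1500 MTU one
sed -i -e "s/^\(\s*\)LocalAddr => '192.168.154.5'/\1#LocalAddr => '192.168.154.5'/" \
       -e "s/^\(\s*\)#LocalAddr => '192.168.199.36'/\1LocalAddr => '192.168.199.36'/" server.pl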

On the ubuntu-mule-01 Client:

Some test data was created. Piping /dev/urandom through gzip makes it unlikely that the resulting data can be compressed any further. This is something I learned to do for quick throughput tests over ssh, as ssh compresses data. It's probably not necessary in this case, but then again, it doesn't hurt to ensure the test data is non-compressible.
root # cd perl-sockets/packet-test
 root # dd if=/dev/urandom bs=1048576 count=101 | gzip - | dd iflag=fullblock bs=1048576 count=100 of=testdata-100M.dat
 root # dd if=/dev/urandom bs=1048576 count=1025 | gzip - | dd iflag=fullblock bs=1048576 count=1024 of=testdata-1G.dat
 
 root@ubuntu-mule-01:~/perl-sockets/packet-test# ls -l testdata-1*
 -rw-r--r-- 1 root root 104857600 May 11 16:05 testdata-100M.dat
 -rw-r--r-- 1 root root 1073741824 May 11 16:04 testdata-1G.dat
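As an optional spot check (not in the original post), re-compressing the random test data should save almost nothing, which confirms it really is non-compressible:

# The gzipped size should stay close to the original 104857600 bytes
gzip -c testdata-100M.dat | wc -c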

The Driver Script

We used the packet-driver.sh script to run each of the tests from the client-side. This script simply runs a throughput test 23 times in succession, using the specified MTU size.
#!/usr/bin/env bash
 
 : ${1:?Call with 'packet-driver.sh <SIZE> '!}
 : ${mtu:=$1}
 
 if ( echo $mtu | grep -vE '1500|9000' ); then
  echo Please use 1500 or 9000
  exit 1
 fi
 
 declare -A localHosts
 declare -A remoteHosts
 
 localHosts[9000]=192.168.154.4
 localHosts[1500]=192.168.199.35
 
 remoteHosts[9000]=192.168.154.5
 remoteHosts[1500]=192.168.199.36
 
 blocksize=8192
 testfile=testdata-1G.dat
 
 cmd="./client.pl --remote-host ${remoteHosts[$mtu]} --local-host ${localHosts[$mtu]} --file $testfile --buffer-size $blocksize"
 
 for i in {0..22}
 do
  echo "executing: $cmd"
  $cmd
 done
 

Perform the Tests

For each test, we enabled the correct interface in server.pl, and then started the server. On the client-side, we called the packet-driver.sh script with the required MTU size. The MTU size passed on the command line determines which interface is used on the client-side.

1500 MTU

On the server-side, make sure the address in server.pl is set for the 1500 byte MTU interface. Then, start the server:
root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '\s+LocalAddr' server.pl
 LocalAddr => '192.168.199.36', # MTU 1500
 
 root@ubuntu-mule-02:~/perl-sockets/packet-test# ./server.pl | tee mtu-1500.log
 Initial Receive Buffer is 425984 bytes
 Server is now listening ...
 Initial Buffer size set to: 2048
On the client-side, run packet-driver.sh:
root@ubuntu-mule-01:~/perl-sockets/packet-test# ./packet-driver.sh 1500
 executing: ./client.pl --remote-host 192.168.199.36 --local-host 192.168.199.35 --file testdata-1G.dat --buffer-size 8192
 
 remote host: 192.168.199.36
 port: 4242
 bufsz: 8192
 simulated latency: 0
 
 bufsz: 8192
 Send Buffer is 425984 bytes
 Connected to 192.168.199.36 on port 4242
 Sending data...

9000 MTU

On the server-side, make sure the address in server.pl is set for the 9000 byte MTU interface. Then, start the server:
root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '\s+LocalAddr' server.pl
 LocalAddr => '192.168.154.5', # MTU 9000
 
 root@ubuntu-mule-02:~/perl-sockets/packet-test# ./server.pl | tee mtu-9000.log
 Initial Receive Buffer is 425984 bytes
 Server is now listening ...
 Initial Buffer size set to: 2048
Now, run packet-driver.sh on the client:
root@ubuntu-mule-01:~/perl-sockets/packet-test# ./packet-driver.sh 9000
 executing: ./client.pl --remote-host 192.168.154.5 --local-host 192.168.154.4 --file testdata-1G.dat --buffer-size 8192
 
 remote host: 192.168.154.5
 port: 4242
 bufsz: 8192
 simulated latency: 0
 
 bufsz: 8192
 Send Buffer is 425984 bytes
 Connected to 192.168.154.5 on port 4242
 Sending data...
 

Reporting

When all tests are complete, use packet-averages.pl to calculate the averages across all tests per MTU size.
root@ubuntu-mule-02:~/perl-sockets/packet-test# ./packet-averages.pl < mtu-1500.log
 key/avg: Bytes Received 1073733637.000000
 key/avg: Avg Packet Size 7898.147391
 key/avg: Packets Received 135948.304348
 key/avg: Average milliseconds 0.043824
 key/avg: Avg Megabytes/Second 172.000000
 key/avg: Avg milliseconds/MiB 5.818500
 key/avg: Total Elapsed Seconds 6.850447
 key/avg: Network Elapsed Seconds 5.958098
 
 root@ubuntu-mule-02:~/perl-sockets/packet-test# ./packet-averages.pl < mtu-9000.log
 key/avg: Bytes Received 1073733637.000000
 key/avg: Avg Packet Size 7519.793478
 key/avg: Packets Received 142790.217391
 key/avg: Average milliseconds 0.033652
 key/avg: Avg Megabytes/Second 213.165217
 key/avg: Avg milliseconds/MiB 4.692753
 key/avg: Total Elapsed Seconds 5.495095
 key/avg: Network Elapsed Seconds 4.805343
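Comparing the two runs with a quick bc check (the commands are mine; the numbers are taken straight from the reports above):

echo 'scale=3; 5.495095 / 6.850447' | bc   # ~0.802: 9000 MTU total elapsed time vs 1500 MTU
echo 'scale=3; 213.165217 / 172.0' | bc    # ~1.239: roughly 24% higher MB/s with Jumbo Frames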
The average Total Elapsed Seconds for the 9000 MTU tests is only 80% of the time required for the 1500 MTU tests. From these results, it appears as if using Jumbo Frames is a pretty clear winner, even in a virtualized environment. This result might seem somewhat surprising, as the tests are not sending any data over a physical wire. In this case, the "network" is only composed of VirtualBox host network adapters. So, why then are Jumbo Frames still so much faster than the standard 1500 MTU size?

Performance Profiling

This time, we'll run a single test for each MTU size. We'll use the perf profiler to gather process profile information on the client-side. First, the 1500 MTU size:
perf record --output perf-mtu-1500.data ./client.pl --remote-host 192.168.199.36 --local-host 192.168.199.35 --file testdata-1G.dat --buffer-size 8192
 
Now the 9000 MTU size:
perf record --output perf-mtu-9000.data ./client.pl --remote-host 192.168.154.5 --local-host 192.168.154.4 --file testdata-1G.dat --buffer-size 8192
Let's see some reports from perf.

1500 MTU

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-1500.data --stats | grep TOTAL
 TOTAL events: 28648
 
 root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-1500.data | head -20
 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 28K of event 'cpu-clock:pppH'
 # Event count (approx.): 7137500000
 #
 # Overhead Command Shared Object Symbol
 # ........ ....... .................. ....................................
 #
 56.35% perl [kernel.kallsyms] [k] e1000_xmit_frame
 21.00% perl [kernel.kallsyms] [k] e1000_alloc_rx_buffers
 9.71% perl [kernel.kallsyms] [k] e1000_clean
 2.22% perl [kernel.kallsyms] [k] __softirqentry_text_start
 1.74% perl [kernel.kallsyms] [k] __lock_text_start
 0.61% perl [kernel.kallsyms] [k] copy_user_generic_string
 0.58% perl [kernel.kallsyms] [k] clear_page_rep
 0.39% perl libpthread-2.31.so [.] __libc_read
 0.30% perl [kernel.kallsyms] [k] do_syscall_64

9000 MTU

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-9000.data --stats | grep TOTAL
 TOTAL events: 25259
 root@ubuntu-mule-01:~/perl-sockets/packet-test#
 root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-9000.data | head -20
 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 25K of event 'cpu-clock:pppH'
 # Event count (approx.): 6290500000
 #
 # Overhead Command Shared Object Symbol
 # ........ ....... .................. ........................................
 #
 64.15% perl [kernel.kallsyms] [k] e1000_xmit_frame
 16.92% perl [kernel.kallsyms] [k] e1000_alloc_jumbo_rx_buffers
 9.03% perl [kernel.kallsyms] [k] e1000_clean
 1.79% perl [kernel.kallsyms] [k] __softirqentry_text_start
 1.66% perl [kernel.kallsyms] [k] __lock_text_start
 0.41% perl [kernel.kallsyms] [k] clear_page_rep
 0.39% perl [kernel.kallsyms] [k] copy_user_generic_string
 0.26% perl [kernel.kallsyms] [k] do_syscall_64
 0.24% perl libpthread-2.31.so [.] __libc_read
A conclusion we might draw from these reports: when using an MTU of 9000, the test program spent more time sending data (e1000_xmit_frame) and less time in overhead. Note that 16.92% of the time was spent allocating jumbo-sized receive buffers via e1000_alloc_jumbo_rx_buffers in the MTU 9000 test, versus 21% spent in e1000_alloc_rx_buffers in the MTU 1500 test.

The reason for the performance increase, in this case, seems to be simply that Jumbo Frames require less work: rather than splitting each 8192-byte buffer across six 1500-byte frames, a Jumbo Frame can carry it all in a single frame.

Though these tests were run on servers virtualized with VirtualBox, the results are quite similar to those seen on servers running in VMWare ESXi. The fact that the servers are virtual does not reduce the need to ensure that RAC nodes get the fastest possible throughput on the private network used for the interconnect... And that means using Jumbo Frames.
