
Round Trip Network Latency Tests for your Remote Database Service

If you are using a remote Oracle database service, such as Oracle’s Autonomous Transaction Processing database, or an AWS RDS database, how do you measure the network latency?

The usual method of using ‘ping’ or other tools will not work, as there is no access to the underlying server of the database service.

Wouldn’t it be great if there were a way to measure the round trip latency from your location directly to the remote database service?

Well, now there is. Here’s how you can do it.


Database Link

Two databases are used for this testing: the local database that is the source of the testing, and the remote database that is the target of the testing.

A database link is created in the local database, and as you have likely already realized, the database link connects to the remote database.

For best results, run the sqlplus session for this test on the server hosting the local database.  If the test is run from a client on another machine, the results can be skewed by the overhead of the connection between the client (such as your laptop) and the local database.

If the client machine and local database are physically close, the skew would likely be only a fraction of a millisecond, which may be negligible for most testing.

For this testing, the target database is a standard Oracle database created on a virtual server located in San Francisco, California, USA.  San Francisco is approximately 700 miles (1126 km) from my location.

Here’s the database link as created in my local database:

create database link droplet_link connect to pingtest identified by XXXX using
        'localhost:6789/ORCLPDB1'

This particular database link is a bit of a special case. The database is running on a remote virtual machine from a low-cost vendor.  As I don't want to open a database port to the internet, the connection to this database is via an ssh tunnel: local port 6789 is forwarded to port 1521 on the database server.
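For reference, the tunnel is just a standard ssh local port forward, something along these lines (the user and host shown here are placeholders, not the actual server):

$ ssh -N -L 6789:localhost:1521 someuser@db-server

The -L option forwards local port 6789 to port 1521 on the remote server, and -N keeps the session open without running a remote command.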

The reason for creating this test database in a remote virtual machine is so that latency may also be measured with ‘ping’ for comparison.

Now that the database link is created, the first test can be performed.

 

Singleton Database Calls

This test consists of running a single SQL statement 5 times and measuring the difference between the time the SQL is executed and the time the results are received.

This is the SQL statement used:

select systimestamp at local from dual@dblink

The code used is actually a PL/SQL block, which can be seen at ping-remote-db.sql

The SQL is run 5 times at 2-second intervals.  As the first execution of the SQL may include the time required to connect to the database referred to by the database link, the first result is ignored.
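The essence of the block is something like the following sketch. This is not the exact contents of ping-remote-db.sql, and it assumes Oracle 18c or later for dbms_session.sleep (on older releases, dbms_lock.sleep works the same way):

set serveroutput on

declare
   t_begin   timestamp with time zone;
   t_remote  timestamp with time zone;
   l_elapsed interval day to second;
begin
   for i in 1 .. 5 loop
      t_begin := systimestamp;
      -- a single round trip across the database link
      select systimestamp at local into t_remote from dual@droplet_link;
      l_elapsed := systimestamp - t_begin;
      dbms_output.put_line('Round Trip: ' || to_char(extract(second from l_elapsed),'fm990.000000'));
      dbms_output.put_line('==============================');
      dbms_session.sleep(2);
   end loop;
end;
/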

 

$ echo exit | sql -S -L jkstill/XXX@orcl/pdb1 @ping-remote-db.sql
Local Seconds Begin: 1641640285.703832
Local Seconds End: 1641640285.799489
Round Trip: 0.095657
==============================
Local Seconds Begin: 1641640287.864372
Local Seconds End: 1641640288.054133
Round Trip: 0.189761
==============================
Local Seconds Begin: 1641640290.103683
Local Seconds End: 1641640290.471617
Round Trip: 0.367934
==============================
Local Seconds Begin: 1641640292.537824
Local Seconds End: 1641640292.671595
Round Trip: 0.133771
==============================
Local Seconds Begin: 1641640294.711450
Local Seconds End: 1641640295.176477
Round Trip: 0.465027
==============================
Iterations: 5

The latency ranges from 95 to 465 ms.

How does that compare to ‘ping’?

$  ping -i 2 -c 5 137.nnn.nn.nnn
PING 137.184.84.204 (137.184.84.204) 56(84) bytes of data.
64 bytes from 137.184.84.204: icmp_seq=1 ttl=48 time=31.4 ms
64 bytes from 137.184.84.204: icmp_seq=2 ttl=48 time=623 ms
64 bytes from 137.184.84.204: icmp_seq=3 ttl=48 time=194 ms
64 bytes from 137.184.84.204: icmp_seq=4 ttl=48 time=179 ms
64 bytes from 137.184.84.204: icmp_seq=5 ttl=48 time=66.5 ms

--- 137.nnn.nn.nnn ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 8004ms
rtt min/avg/max/mdev = 31.441/219.196/623.788/211.862 ms

The first few times I tested this, the database ping times were approximately 2x the standard ping times.  Testing on other days shows less of a difference between the two.

There is extra overhead for the database ping as compared to the ping utility, so it would be expected to take more time.

Just how much more time seems to vary with internet performance, as the remote databases are accessed via the internet.

Multiple Row Set Ping

If the extra time is due to database overhead, we should be able to devise a test that minimizes the database overhead.

One way to do this is to use a single SELECT statement that returns many rows, so that only 1 SQL execution is required.

Craft the SQL so that each row requires an entire TCP packet.  Then get the total time required to retrieve the rows, and calculate the average latency.

The following SQL was used:

select systimestamp at local as ping_timestamp, rpad('X',1500-35-28,'X') as filler from dual@dblink_name connect by level <= 10;

The standard Ethernet MTU, and therefore the typical maximum packet size, is 1500 bytes; the timestamp string returned by `systimestamp at local` is 35 bytes; and 28 bytes is allowed for header overhead, the size of the IP and ICMP headers used by ping (on Linux anyway).
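As a quick sanity check of the filler size, the arithmetic can be verified locally:

-- 1500 (MTU) - 35 (timestamp string) - 28 (header allowance) = 1437 filler bytes
select length(rpad('X',1500-35-28,'X')) + 35 as approx_payload_bytes from dual;

The result is 1472 bytes of payload per row, which leaves 28 bytes for headers in a 1500 byte packet.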

When this new test is run, the average time per row should get close to the times seen with the 'ping' utility as the number of packets increases.

This script was used to do that:  ping-remote-db-multirow.sql
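As with the singleton test, the idea can be sketched in a few lines of PL/SQL. This is only a sketch of the approach, not the actual ping-remote-db-multirow.sql; it fetches the padded rows one at a time and times each fetch:

set serveroutput on

declare
   cursor c_ping is
      select systimestamp at local as ping_timestamp,
             rpad('X',1500-35-28,'X') as filler
        from dual@droplet_link
     connect by level <= 10;
   r_ping    c_ping%rowtype;
   t_begin   timestamp with time zone;
   l_seconds number;
   l_total   number := 0;
   l_rows    pls_integer := 0;
begin
   open c_ping;
   loop
      t_begin := systimestamp;
      fetch c_ping into r_ping;
      exit when c_ping%notfound;
      l_seconds := extract(second from (systimestamp - t_begin));
      l_total   := l_total + l_seconds;
      l_rows    := l_rows + 1;
      dbms_output.put_line('Round Trip: ' || to_char(l_seconds,'fm990.000000'));
   end loop;
   close c_ping;
   dbms_output.put_line('Round Trip Avg: ' || to_char(l_total/l_rows,'fm990.000000'));
   dbms_output.put_line('Iterations: ' || l_rows);
end;
/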

Results of the multirow db-ping test:

$ echo exit | sql -S -L jkstill/XXX@orcl/pdb1 @ping-remote-db-multirow.sql
Local Seconds Begin: 1641644528.779681
Local Seconds End: 1641644528.849177
Round Trip: 0.069496
==============================
Local Seconds Begin: 1641644528.849205
Local Seconds End: 1641644529.350788
Round Trip: 0.501583
==============================
Local Seconds Begin: 1641644529.350817
Local Seconds End: 1641644530.000492
Round Trip: 0.649675
==============================
Local Seconds Begin: 1641644530.000530
Local Seconds End: 1641644530.042419
Round Trip: 0.041889
==============================
Local Seconds Begin: 1641644530.042447
Local Seconds End: 1641644530.146593
Round Trip: 0.104146
==============================
Local Seconds Begin: 1641644530.146621
Local Seconds End: 1641644530.264522
Round Trip: 0.117901
==============================
Local Seconds Begin: 1641644530.264549
Local Seconds End: 1641644530.625663
Round Trip: 0.361114
==============================
Local Seconds Begin: 1641644530.625691
Local Seconds End: 1641644531.093010
Round Trip: 0.467319
==============================
Local Seconds Begin: 1641644531.093038
Local Seconds End: 1641644531.397555
Round Trip: 0.304517
==============================
Local Seconds Begin: 1641644531.397584
Local Seconds End: 1641644531.448949
Round Trip: 0.051365
==============================
Connect Time: 3.449366
Round Trip Avg: 0.289347
Iterations: 10

And now a ping test with a count of 10:

$  ping -c 10 137.nnn.nn.nnn
PING 137.184.84.204 (137.184.84.204) 56(84) bytes of data.
64 bytes from 137.184.84.204: icmp_seq=1 ttl=48 time=207 ms
64 bytes from 137.184.84.204: icmp_seq=2 ttl=48 time=124 ms
64 bytes from 137.184.84.204: icmp_seq=3 ttl=48 time=1281 ms
64 bytes from 137.184.84.204: icmp_seq=4 ttl=48 time=249 ms
64 bytes from 137.184.84.204: icmp_seq=5 ttl=48 time=383 ms
64 bytes from 137.184.84.204: icmp_seq=6 ttl=48 time=58.7 ms
64 bytes from 137.184.84.204: icmp_seq=7 ttl=48 time=588 ms
64 bytes from 137.184.84.204: icmp_seq=8 ttl=48 time=84.0 ms
64 bytes from 137.184.84.204: icmp_seq=9 ttl=48 time=74.8 ms
64 bytes from 137.184.84.204: icmp_seq=10 ttl=48 time=81.1 ms

--- 137.nnn.nn.nnn ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9043ms
rtt min/avg/max/mdev = 58.797/313.416/1281.181/360.031 ms, pipe 2

With an average ping time of 313 ms and an average db ping time of 289 ms, the results of the two tests are quite close.

These results are quite poor.  While writing this blog over the course of a few days, I have seen the round trip latency for this database vary from 25-500 ms.

If this were a real production database rather than a throwaway test database, it would be a good time to call in a network engineer.

Now that we know the db ping test is fairly accurate, we can trust the results when used with an ATP database.

Here are the results when testing against an ATP database that is in an Oracle OCI data center in Phoenix, Arizona, USA, which is about 1200 miles (1931 km) from my location.

Here is the SQL for the database link:

-- for a dblink the oracle server must have access to wallets
-- so the wallets must be available on the db server

create database link oci_link connect to  jkstill identified by XXXX using
'(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=****************atp21c01_high.adb.oraclecloud.com))(security=(ssl_server_dn_match=true)(ssl_server_cert_dn="CN=****.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")(my_wallet_directory=/u01/app/oracle/wallets/atp21c01)))'
/

And here are the results:

$ echo exit | sql -S -L jkstill/XXX@orcl/pdb1 @ping-remote-db-multirow.sql

Local Seconds Begin: 1641644841.938675
Local Seconds End: 1641644842.365383
Round Trip: 0.426708
==============================
Local Seconds Begin: 1641644842.365437
Local Seconds End: 1641644842.603871
Round Trip: 0.238434
==============================
Local Seconds Begin: 1641644842.603898
Local Seconds End: 1641644843.064586
Round Trip: 0.460688
==============================
Local Seconds Begin: 1641644843.064614
Local Seconds End: 1641644843.749951
Round Trip: 0.685337
==============================
Local Seconds Begin: 1641644843.749978
Local Seconds End: 1641644844.319656
Round Trip: 0.569678
==============================
Local Seconds Begin: 1641644844.319686
Local Seconds End: 1641644844.550935
Round Trip: 0.231249
==============================
Local Seconds Begin: 1641644844.550964
Local Seconds End: 1641644845.017424
Round Trip: 0.466460
==============================
Local Seconds Begin: 1641644845.017453
Local Seconds End: 1641644845.678804
Round Trip: 0.661351
==============================
Local Seconds Begin: 1641644845.678831
Local Seconds End: 1641644845.868892
Round Trip: 0.190061
==============================
Local Seconds Begin: 1641644845.868920
Local Seconds End: 1641644846.417399
Round Trip: 0.548479
==============================
Connect Time: 2.455469
Round Trip Avg: 0.483373
Iterations: 10

The average latency is 483 ms, which is an excessive time for this location.

The time required to connect is being measured as well.  In this case the connection took nearly 2.5 seconds.

Previous tests on a different day had much better results, so clearly there are currently some network performance issues.

If there were complaints of slow performance retrieving results from this database, the network performance would certainly be a prime suspect.

Refinements

The results appear good enough for general use, but there are some refinements that could be made.

TCP Packet Size

The calculation in the SQL for packet size does not take into account the overhead required to organize the data that is returned.  The ICMP overhead and the size of the timestamp string were accounted for, but a few more bytes are used to organize that data, so each row returned likely required 2 TCP packets.

To correct this, the 'tcpdump' utility could be used to analyze the size of the data and overhead, and the size of the filler string used in the SQL could then be reduced accordingly.
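For example, capturing a short sample of traffic on the forwarded local port would show the actual packet sizes on the wire (adjust the interface and port for your own environment):

$ tcpdump -i any -nn -c 100 'tcp port 6789'

The captured lengths would show how much to trim from the filler so that each row fits in a single packet.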

Local Network Latency

No attempt has been made to correct for local network latency.

For instance, my testing was performed from a client that is in the same rack as the local database. There is a small amount of latency between those machines, about 0.000250 seconds, or 0.25 ms.  The timestamps are being returned from the local machine via 'systimestamp at local', where local is the client machine.

To make this more accurate, the latency could be calculated and accounted for.

However, that is really too much work, as the simpler solution is to just run these tests directly from the server where the local database is running.

Even without those refinements, the results are acceptable for most use cases.

Now that you have the means, you can try this out on your own.

The full code is found at db-ping

 
