Pythian Blog: Technical Track

Adding Networks to Exadata: Fun with Policy Routing

I’ve noticed that Exadata servers are now configured to use Linux policy routing. Peeking at My Oracle Support, I’ve noticed that note 1306154.1 goes in a bit more detail about this configuration. It’s apparently delivered by default with factory build images 11.2.2.3.0 and later. The note goes on to explain that this configuration was implemented because of asymetric routing problems associated with the management network:

Database servers are deployed with 3 logical network interfaces configured: management network (typically eth0), client access network (typically bond1 or bondeth0), and private network (typically bond0 or bondib0). The default route for the system uses the client access network and the gateway for that network. All outbound traffic that is not destined for an IP address on the management or private networks is sent out via the client access network. This poses a problem for some connections to the management network in some customer environments.


It goes on to mention a bug where this was reported:

@ BUG:11725389 – TRACK112230: MARTIAN SOURCE REPORTED ON DB NODES BONDETH0 INTERFACE

The bug is not public, but the title does show the type of error messages that would appear if a packet with a non-local source address comes out.

This configuration is implemented using RedHat Oracle Linux-style /etc/sysconfig/network-scripts files, with matched rule- and route- files for each interface.

A sample configuration, where the management network is in the 10.10.10/24 subnet, is:

[root@exa1db01 network-scripts]# cat rule-eth0
from 10.10.10.93 table 220
to 10.10.10.93 table 220
[root@exa1db01 network-scripts]# cat route-eth0
10.10.10.0/24 dev eth0 table 220
default via 10.10.10.1 dev eth0 table 220

This configuration tells traffic originating from the 10.10.10.93 IP (which is the management interface IP on this particular machine) and traffic destined to this address to be directed away from the regular system routing table to a special routing table 220. Route-eth0 configures table 220 with two routers: one for the local network and a default route through a router on the 10.10.10.1 network.

This contrasts with the default gateway of the machine itself:

[root@exa1db01 network-scripts]# grep GATEWAY /etc/sysconfig/network
GATEWAYDEV=bondeth0
GATEWAY=10.50.50.1

The difference between this type of policy routing and regular routing is that traffic with the _source_ address of 10.10.10.93 will automatically go through default gateway 10.10.10.1, regardless of the destination. (The bible for Linux routing configuration is the Linux Advanced Routing and Traffic Control HOWTO, for those looking for more details.)

I ran into an issue with this configuration when adding a second external network on the bondeth1 interface. I set up the additional interface configuration for a network, 10.50.52.0/24:

[root@exa1db01 network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1

I also added rule and route entries:

[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.104 table 211
to 10.50.52.104 table 211
[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211

This was a dedicated data guard network to a remote server, IP 10.100.52.10.

The problem with this configuration was that it didn’t work. Using tcpdump, I could see incoming requests come in on the bondeth1 interface, but the replies came out the system default route on bondeth0 and did not reach their destination. After some digging, I did find the cause of the problem: In order to determine the packet source IP, the kernel was looking up the destination in the default routing table (table 255). And the route for the 10.100.52.0 network was in non-default table 211. So the packet followed the default route instead, got a source address in the client-access network, and never matched any of the routing rules for the data guard network.

The solution ended up being rather simple: Take out the “table 211” for the data guard network route, effectively putting it in the default routing table:

[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1

And then we ran into a second issue: The main interface IP could now be reached, but not the virtual IP (VIP). This is because the rule configuration, taken from the samples, didn’t list the VIP address at all. To avoid this issue, and in the case of VIP addresses migrating from other cluster nodes, we set up a netmask in the rule file, making all addresses in the data guard network use this particular routing rule:

[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211

So to sum up, when setting up interfaces in a policy-routed Exadata system remember to:

  • Set up the interface itself and any bonds using ifcfg- files.
  • Create a rule- file for the interface, encompassing every possible address the interface could have. I added the entire IP subnet. Add “from” and “to” lines with a unique routing table number.
  • Create a route- file for the interface, listing a local network route and a default route with the default router of the subnet, all using the table number defined on the previous step.
  • Add to the route- file any static routes required on this interface, but don’t add a table qualifier.

The final configuration:

[root@exa1db01 network-scripts]# cat ifcfg-eth8
DEVICE=eth8
HOTPLUG=no
IPV6INIT=no
HWADDR=00:1b:21:xx:xx:xx
ONBOOT=yes
MASTER=bondeth1
SLAVE=yes
BOOTPROTO=none
[root@exa1db01 network-scripts]# cat ifcfg-eth12
DEVICE=eth12
HOTPLUG=no
IPV6INIT=no
HWADDR=00:1b:21:xx:xx:xx
ONBOOT=yes
MASTER=bondeth1
SLAVE=yes
BOOTPROTO=none
[root@exa1db01 network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1
[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211
[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1

No Comments Yet

Let us know what you think

Subscribe by email