Redundant first-hop router with Raspberry Pi

Building VRRP-based first-hop router redundancy with a hairpin router on Raspberry Pi

2020-12-30

In my house, I use a general-purpose Linux machine as a router. This is much more flexible than a dedicated device and brings numerous benefits, but it poses a unique challenge: my router is actually a VM, not a physical machine. The host’s LAN carries multiple VLANs, and the router/firewall VM just passes traffic between them.

This works great, but compared to a dedicated hardware router it introduces several additional points of failure, the VM host being the biggest one. The host needs to be updated and patched from time to time, and I didn’t want to end up without Internet, DNS or DHCP while that happens. Hence the idea of using a Raspberry Pi as a “backup” router for when my main one is down for whatever reason.

L2 networking

The backbone of my home LAN is a Brocade (Ruckus) ICX6450 switch. You can find it relatively cheap on eBay: it’s a managed switch with 48 1 GbE 802.3at (PoE+) ports and 4 10 GbE SFP+ ports, and it is relatively quiet and power-efficient. Any other smart or managed switch would work, though; the key is that it must support VLANs.

I configured the port where my Internet modem is connected as a VLAN 99 access port, and all my internal home devices are on VLANs 8-16: I keep office, kids and IoT traffic separate, with separate firewall rules on the router.

Here’s a diagram of how my network is set up:

[Figure: Home Networking Diagram]

L3 networking

Both the VM and the Raspberry Pi 3 are connected to “trunk” ports on the switch. These ports carry all the VLANs, tagged with 802.1Q. The Raspberry Pi’s network adapter is VLAN-aware and is trivially configurable using normal OS means.

/etc/network/interfaces should have something like this (IPv6 and multiple internal VLANs omitted for brevity):

auto lo
iface lo inet loopback

# Do not assign any addresses to the trunk interfaces themselves
iface eth0 inet manual
iface eth0 inet6 manual

# Firewall interface, LAN facing, a "private" address for this router instance
auto eth0.8
iface eth0.8 inet static
  address 10.8.8.10
  netmask 255.255.255.0
  metric 4096
  gateway 10.8.8.100

# Internet modem interface, externally facing, down by default
# No "auto" line for eth0.99: keepalived's scripts bring it up when needed
iface eth0.99 inet dhcp
  hwaddress 00:50:aa:bb:cc:dd

Note that:

I configured the “floating” router IP 10.8.8.100 as the gateway on the internal interface, with a high metric. When the WAN interface is down, local traffic uses that gateway. When the WAN interface is up, the default route obtained over DHCP is used instead, because it has a lower (default) metric.
I configured the WAN interface to be down by default, and to use the same MAC address on both routers. This is crucial for seamless failover: the cable modem never realizes it is talking to two machines alternately. Because of the shared MAC address, they look like a single machine to it. Keepalived (see below) ensures that only one is active at a time.
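
The metric logic from the first note can be checked with a small helper. This is only a sketch: `pick_default_route` is a hypothetical name, and it assumes the usual `ip route` output format, where a route without an explicit `metric` field has metric 0.

```shell
# pick_default_route: read `ip route show default` output on stdin and
# print the route with the lowest metric (no "metric" field means 0).
pick_default_route() {
  awk '{
    m = 0
    for (i = 1; i < NF; i++) if ($i == "metric") m = $(i + 1) + 0
    if (best == "" || m < bestm) { best = $0; bestm = m }
  } END { if (best != "") print best }'
}
```

On a router, `ip -4 route show default | pick_default_route` should print the route via eth0.99 while the WAN interface is up, and the route via 10.8.8.100 otherwise.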

Keepalived

Since I wanted the “first-hop router” role to be redundant and to float between the VM and the Pi router, I set up a “floating IP” address that is dynamically assigned to only one of these machines, using the VRRP protocol and keepalived. The Keepalived documentation states:

Virtual Router Redundancy Protocol (VRRP) is a fundamental brick for router failover. In addition, Keepalived implements a set of hooks to the VRRP finite state machine providing low-level and high-speed protocol interactions. In order to offer fastest network failure detection, Keepalived implements the Bidirectional Forwarding Detection (BFD) protocol. VRRP state transition can take into account BFD hints to drive fast state transition. Keepalived frameworks can be used independently or all together to provide resilient infrastructures.

This is how I configured Keepalived on the “main” router:

global_defs {
  router_id DEFAULT_ROUT_ID
  enable_script_security
  script_user root root
}

vrrp_instance VI_ROUTER {
  interface eth0.8
  virtual_router_id 10
  state BACKUP
  priority 100
  authentication {
    auth_type PASS
    auth_pass "keep-this-secret"
  }
  virtual_ipaddress {
    10.8.8.100/24 brd 10.8.8.255 dev eth0.8
  }
  notify_master /etc/keepalived/master.sh
  notify_backup /etc/keepalived/backup.sh
}

On the secondary router, the configuration is the same; the only difference is a lower priority value.
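
For illustration, the secondary router’s instance might look like the fragment below. The priority of 50 is an assumed value, not taken from my actual configuration; any number below the main router’s 100 should work.

```
vrrp_instance VI_ROUTER {
  interface eth0.8
  virtual_router_id 10        # must match the main router
  state BACKUP
  priority 50                 # assumed value, lower than the main router's 100
  authentication {
    auth_type PASS
    auth_pass "keep-this-secret"
  }
  virtual_ipaddress {
    10.8.8.100/24 brd 10.8.8.255 dev eth0.8
  }
  notify_master /etc/keepalived/master.sh
  notify_backup /etc/keepalived/backup.sh
}
```

Both instances start in the BACKUP state, so the VRRP election, driven by priority, decides which one becomes master.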

The master.sh and backup.sh scripts take care of the proper transition between the roles:

master.sh:

#!/bin/sh
# Keepalived script to transition into the ACTIVE state (acting as an active router)

# Activate the network interface that is connected to the Internet modem
killall dhclient
ifup --force eth0.99

# Restart DNS resolver so the data is fresh
systemctl restart pdns-recursor

# Restart the firewall, to make sure traffic shaping policies are applied!
shorewall restart
shorewall6 restart

backup.sh:

#!/bin/sh
# Keepalived script to transition into the BACKUP state (no longer acting as a router)

# Delete the link first, to make sure our DHCP lease is not released (the other router might already have it!)
ip link delete eth0.99
ifdown --force eth0.99
killall dhclient
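
To confirm which router currently holds the floating address after a transition, something like the following helper can be used. It is a sketch only: `holds_vip` is a hypothetical name, and it assumes the one-line output format of `ip -o -4 addr show`.

```shell
# holds_vip ADDR: read `ip -o -4 addr show` output on stdin and succeed
# if ADDR is one of the assigned addresses.
holds_vip() {
  # Escape the dots so they match literally in the regular expression
  grep -qE "inet $(printf '%s' "$1" | sed 's/\./\\./g')/"
}
```

For example, `ip -o -4 addr show dev eth0.8 | holds_vip 10.8.8.100 && echo "this router is active"`.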

Dealing with a singular WAN interface

Obviously, my cable modem does not expect two routers accessing it at the same time. So we need to make them appear indistinguishable from the cable modem’s point of view. This is achieved by:

Assigning the WAN interfaces on both routers the same MAC address (see the /etc/network/interfaces definition above). I just copied the VM’s auto-generated MAC address to the router config on the Raspberry Pi.
Having only one router’s interface active at a time. Keepalived’s master/backup scripts take care of that.
Making sure the DHCP lease on that interface is not released when the interface goes down on one of the routers. By doing this, I ensure that the other router gets the same IP address, and all the network connections remain active. Again, keepalived scripts do that.
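
A quick sanity check for the first point is to compare the MAC address of the live WAN interface with the one in the configuration. The helper below is a sketch; `same_mac` is a hypothetical name, and the comparison simply ignores letter case.

```shell
# same_mac A B: succeed if two MAC addresses are equal, ignoring case.
same_mac() {
  a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ "$a" = "$b" ]
}
```

On whichever router is currently active, `same_mac "$(cat /sys/class/net/eth0.99/address)" 00:50:aa:bb:cc:dd` should succeed. On the standby router the eth0.99 link has been deleted, so there is nothing to compare.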

DNS

I used PowerDNS recursive resolver as a DNS server on both routers. DNS could be configured in two ways:

Specify both servers’ private addresses as local DNS servers. Simpler configuration, but if one of the servers goes down, this would create significant delays during DNS resolution. I wanted to avoid that.
Specify the floating IP address as the local DNS resolver address, and use the local firewall configuration to redirect DNS traffic addressed to the floating IP to the private address of each router. Since I’m using shorewall, this is as simple as this (in /etc/shorewall/rules):
```
# Direct traffic to the floating IP to this server
DNAT-           all             10.8.8.11 tcp     53      -       10.8.8.100
DNAT-           all             10.8.8.11 udp     53      -       10.8.8.100
```

DHCP

I use ISC DHCP server on both routers. ISC DHCP supports failover, and this is how I configured it. Both servers are active and running at the same time, and the internal failover algorithm resolves any IP assignment conflicts. Note that it’s crucial that the “floating” IP address is specified in the DHCP configuration as the gateway address, not the “internal” IP address of each router. The DNS resolver address should be specified according to whatever path you chose above, during the DNS server configuration.
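
As a rough sketch, a failover declaration in dhcpd.conf on the primary server could look like this. The peer name, the port, the timing values and the address range are illustrative assumptions, not my actual settings; the peer addresses assume the two routers’ private addresses are 10.8.8.10 and 10.8.8.11, as in the examples above.

```
failover peer "home-failover" {
  primary;                       # the other server uses "secondary;"
  address 10.8.8.10;             # this router's private address
  port 647;
  peer address 10.8.8.11;        # the other router's private address
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  mclt 3600;                     # primary only
  split 128;                     # primary only
  load balance max seconds 3;
}

subnet 10.8.8.0 netmask 255.255.255.0 {
  option routers 10.8.8.100;              # the floating IP, not a private one
  option domain-name-servers 10.8.8.100;  # if you chose the floating-IP DNS option
  pool {
    failover peer "home-failover";
    range 10.8.8.150 10.8.8.200;          # assumed range
  }
}
```

The secondary server carries the same declaration with `secondary;`, the two addresses swapped, and without the `mclt` and `split` lines.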

Conclusion

I’ve been using this configuration for over a year, and its performance has been exemplary. Failover (from master to backup) takes less than a second, and even time-sensitive applications like Skype and Google Meet calls are minimally affected: nothing more than a sub-second freeze, and calls are not dropped.

Network performance is also great: my Internet connection is 1 Gbps, and the VM router (with its 10 GbE LAN card) can handle it without any problem. This is obviously way above the Raspberry Pi’s ability, but I still get 10-15 Mbps of bandwidth through my backup Raspberry Pi router, which is good enough for a backup.