It seems that the new Open Virtual Network (OVN) setup in OpenStack can cause significant route leaks.

The reason is relatively simple: OVN relies on OVS as its dataplane, and OVS uses its own kernel driver to simply “inject” packets into whichever network interface has been selected based on kernel routes.

OVS has no notion of VRFs and by default scrapes all routing tables (version 2.17 limits this to the default table only!).

This means that any network attached to the default table (or to any table prior to version 2.17) could receive forged packets from OVS, even if the outgoing interface is in a different VRF and should not have a route to it.
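Which of the two behaviours applies depends on the installed OVS version. The following is a minimal sketch of a check, not a definitive tool: the helper name `scrapes_all_tables` is hypothetical, it assumes a `sort` that supports `-V`, and it assumes the first line of `ovs-vsctl --version` ends with the version number. `ovs-appctl ovs/route/show` then prints the routes OVS has cached.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given OVS version
# predates 2.17, i.e. when OVS would scrape all routing tables.
scrapes_all_tables() {
    ver="$1"
    # sort -V orders version strings; if 2.17 is not the lowest of the
    # pair, the given version is older than 2.17.
    lowest=$(printf '%s\n%s\n' "$ver" "2.17" | sort -V | head -n 1)
    [ "$lowest" != "2.17" ]
}

# Only query the live system when the OVS tools are installed.
if command -v ovs-vsctl >/dev/null 2>&1; then
    ver=$(ovs-vsctl --version | awk 'NR == 1 { print $NF }')
    if scrapes_all_tables "$ver"; then
        echo "OVS $ver: routes from all tables are scraped"
    else
        echo "OVS $ver: only the default table is scraped"
    fi
    # Show the routes OVS has actually picked up.
    ovs-appctl ovs/route/show
fi
```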

The ultimate solution was to filter it all out through iptables. In this example all packets originate from the ONLINK VRF.
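The exact rules are not given here, so the following is only a sketch of what such a filter might look like. It prints candidate rules instead of applying them. The helper name `gen_drop_rules` is hypothetical, and so is the choice of the `FORWARD` chain and of `ONLINK` as the input interface: depending on the kernel, `-i` may need to name the VRF device or the member interface, so check the rule counters before relying on it. `br-mgmt` and `br-vxlan` are the default-table bridges from the setup in this post.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not apply) one DROP rule per
# protected outgoing interface for traffic entering via $1.
gen_drop_rules() {
    src_if="$1"
    shift
    for dst_if in "$@"; do
        # -I inserts at the top of the chain so the rule takes
        # precedence over any existing ACCEPT rules.
        printf 'iptables -I FORWARD -i %s -o %s -j DROP\n' "$src_if" "$dst_if"
    done
}

# Assumption: traffic from the ONLINK VRF must never leave via the
# OSA management or tunnel bridges.
gen_drop_rules ONLINK br-mgmt br-vxlan
```

Review the printed rules first; piping them to `sh` as root would apply them.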

The Setup

VRFs:

  • ONLINK:

    All FloatingIPs and other OnLink Resources’ vlan interfaces

    br-fip = OVS bridge for Floating IPs, has vlan3012 as slave

    vlan3012 = Floating IP Range

  • IBGP:

    Multipath full BGP tables; contains only the interfaces for the IBGP mesh (929572 routes).

    No RFC 1918 (private) routes present, only the public routing table and router loopbacks (which are also public addresses).

    vlan4094 = Interconnect between RT1 and RT2

    bond0 = Parent interface for vlan4094

  • MGMT:

    This is the management VRF, containing mostly Management vlan interfaces and bridges.

    vlan2 = Cross-Site management vlan shared by all devices. This is NOT br-mgmt or OSA management.

  • default:

    This is the default/main routing table equivalent to no-VRFs.

    br-mgmt = OSA management bridge, has vlan20 as slave (10.20.0.0/22)

    vlan20 = Physical OSA management interface

    br-vxlan = OSA tunnel bridge (for OVS VTEPs), has vlan52 as slave

    vlan52 = Physical OSA tunnel interface

    br-int = OVS bridge with VTEPs, cannot be slaved to any VRF, has no physical ports to br-vxlan but works somehow

    genev_sys_6081 = OVS tunnel magic interface (the name indicates Geneve on UDP port 6081 rather than vxlan), cannot be slaved to VRFs. This is where tunnel packets materialize and enter the router

    lxcbr0 = lxc internal bridge for internet (10.0.3.1/32)

    bond0 = Parent interface for vlan3012 & vlan4094, No VRF since it has no routes nor IPs on itself.

    mlag15 = LACP/MLAG interface parent for vlan2,vlan20,vlan52,…, no VRF since it has no routes nor IPs on itself.
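A layout like the one above can be cross-checked against the running system. A minimal sketch, assuming the usual one-line format of `ip -o link show` (index, name, flags, then keyword/value pairs including `master`); the helper name `list_masters` is hypothetical. It prints each interface with its master device, or `-` when it has none. Note that the master shown may be a bridge or OVS device rather than the VRF itself.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip -o link show` output on stdin and
# prints "<interface> <master>" per line ("-" when there is no master).
list_masters() {
    awk '{
        name = $2
        sub(/:$/, "", name)    # strip the trailing colon
        sub(/@.*/, "", name)   # strip the "@parent" suffix of vlan interfaces
        master = "-"
        for (i = 3; i < NF; i++)
            if ($i == "master") master = $(i + 1)
        print name, master
    }'
}

# Only query the live system when iproute2 is installed.
if command -v ip >/dev/null 2>&1; then
    ip -o link show | list_masters
fi
```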

TCPDump

A VM with a floating IP assigned attempts to ping a device on 10.20.0.0/22 (br-mgmt), which should not be possible. 10.20.0.11 is the management IP of a hypervisor, which shows that the traffic does not just reach addresses local to the router but is actually forwarded all the way.

rt2 ~ # tcpdump -neli any host 10.20.0.11 and icmp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:37:35.520683 genev_sys_6081 P   ifindex 73 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 10.100.0.182 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521441 vlan3012 In  ifindex 76 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521468 br-mgmt Out ifindex 19 4e:ec:15:8a:e7:7f ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521473 vlan20 Out ifindex 37 4e:ec:15:8a:e7:7f ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521474 mlag15 Out ifindex 21 4e:ec:81:00:00:14 ethertype IPv4 (0x0800), length 108: IP14 (invalid)
07:37:35.521477 enp61s0f3 Out ifindex 7 4e:ec:81:00:00:14 ethertype IPv4 (0x0800), length 108: IP14 (invalid)
07:37:35.521733 enp61s0f3 In  ifindex 7 b8:27:81:00:0b:b7 ethertype IPv4 (0x0800), length 108: IP10 (invalid)
07:37:35.521733 mlag15 In  ifindex 21 b8:27:81:00:0b:b7 ethertype IPv4 (0x0800), length 108: IP10 (invalid)
07:37:35.521733 vlan2999 In  ifindex 32 b8:27:eb:70:aa:cd ethertype IPv4 (0x0800), length 104: 10.20.0.11 > 185.243.23.86: ICMP echo reply, id 18, seq 1, length 64
07:37:35.521750 vlan3012 Out ifindex 76 7e:8a:a4:b3:89:41 ethertype IPv4 (0x0800), length 104: 10.20.0.11 > 185.243.23.86: ICMP echo reply, id 18, seq 1, length 64
07:37:35.522472 genev_sys_6081 Out ifindex 73 fa:16:3e:34:19:d0 ethertype IPv4 (0x0800), length 104: 10.20.0.11 > 10.100.0.182: ICMP echo reply, id 18, seq 1, length 64
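To make the path through the router easier to read, a capture like the one above can be reduced to its interface/direction pairs. A minimal sketch, assuming the LINUX_SLL2 line layout shown here (timestamp, interface, direction); the helper name `trace_path` is hypothetical, and the two sample lines are copied from the capture above.

```shell
#!/bin/sh
# Hypothetical helper: reads `tcpdump -neli any` output on stdin and
# prints "<interface> <direction>" for every captured packet line.
trace_path() {
    # Header lines do not have In/Out/P in the third field and are skipped.
    awk '$3 == "In" || $3 == "Out" || $3 == "P" { print $2, $3 }'
}

# Demo on two lines from the capture; on a live system, pipe tcpdump
# into trace_path instead.
trace_path <<'EOF'
07:37:35.520683 genev_sys_6081 P   ifindex 73 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 10.100.0.182 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521441 vlan3012 In  ifindex 76 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
EOF
# prints:
# genev_sys_6081 P
# vlan3012 In
```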

IPRoute2

rt2 ~ # ip route show to match 10.20.0.11/32 vrf ONLINK
rt2 ~ # ip route show to match 10.20.0.11/32 vrf IBGP
rt2 ~ # ip route show to match 10.20.0.11/32 vrf MGMT
rt2 ~ # ip route show to match 10.20.0.11/32 vrf default
default via 10.20.0.1 dev br-mgmt proto static
10.20.0.0/22 dev br-mgmt proto kernel scope link src 10.20.0.6

Diagnostic/Solution Attempts:

Attempt: remove the default route with next-hop-vrf IBGP from ONLINK and leak the full tables into ONLINK, so that a default route cannot cause in-kernel routing issues. Result: no notable difference; 10.20.0.11 is still pingable from the VM.

Attempt: ping 10.20.8.11, which is the VTEP IP of a hypervisor, from the VM. Result: 10.20.8.11 is NOT pingable, but the packet is still routed/forwarded into br-vxlan, showing the same underlying issue.

Attempt: create OVS VRF and slave br-int and br-vxlan to it. Result: OVS shows all VTEP destinations as offline: {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}

Attempt: move everything but br-vxlan away from the default VRF. Result: OVS no longer pushes packets into br-mgmt, but lxcbr0 starts misbehaving, which breaks Neutron.