OVN Oddities, Possible Route Leaks?
It seems that the new Open Virtual Network (OVN) setup in OpenStack can cause some big route leaks.
The reason is relatively simple: OVN relies on OVS as its dataplane, and OVS uses its own kernel driver to simply “inject” packets into the network interfaces that have been selected based on kernel routes.
OVS has no notion of VRFs and by default will scrape all routing tables (version 2.17 limits this to the default table only!).
This means that any network attached to the default table (or to any table prior to version 2.17) could receive forged packets from OVS, even if the outgoing interface was in a different VRF and should not have a route to it.
The ultimate solution was to filter it all out through iptables. In this example all packets originate from the ONLINK VRF.
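The actual filter rules are not shown in this post, so the following is only a minimal sketch of what such a filter could look like. It prints candidate iptables rules instead of applying them. The interface names are taken from the setup described below; the choice of the FORWARD chain, the list of interfaces, and the idea of matching both the enslaved interface and the VRF device are assumptions (with VRFs, netfilter may see the VRF device rather than the enslaved interface as the input interface).

```shell
#!/bin/sh
# Sketch only: prints candidate rules, it does NOT touch the live ruleset.
# The match criteria are an assumption, not the ruleset used by the author.

# Input side: the floating IP VLAN and the ONLINK VRF device itself
# (netfilter may report either one as the input interface).
ONLINK_IN="vlan3012 ONLINK"

# Output side: interfaces in the default table that ONLINK must never reach.
DEFAULT_OUT="br-mgmt br-vxlan"

gen_rules() {
    for in_if in $ONLINK_IN; do
        for out_if in $DEFAULT_OUT; do
            # Drop anything forwarded from ONLINK into the default table.
            echo "iptables -A FORWARD -i $in_if -o $out_if -j DROP"
        done
    done
}

gen_rules
```

Reviewing the printed rules before piping them into a shell keeps the sketch safe to run on a production router.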
The Setup
VRFs:
- ONLINK:
  All floating IPs and other on-link resources’ VLAN interfaces.
  - br-fip = OVS bridge for floating IPs, has vlan3012 as slave
  - vlan3012 = Floating IP range
- IBGP:
  Multipath full BGP tables, contains only interfaces for the IBGP mesh (929572 routes).
  No RFC routes present, only the public routing table and router loopbacks (which are also public addresses).
  - vlan4094 = Interconnect between RT1 and RT2
  - bond0 = Parent interface for vlan4094
- MGMT:
  This is the management VRF, containing mostly management VLAN interfaces and bridges.
  - vlan2 = Cross-site management VLAN shared by all devices. This is NOT br-mgmt or OSA management.
- default:
  This is the default/main routing table, equivalent to no VRFs.
  - br-mgmt = OSA management bridge, has vlan20 as slave (10.20.0.0/22)
  - vlan20 = Physical OSA management interface
  - br-vxlan = OSA tunnel bridge (for OVS VTEPs), has vlan52 as slave
  - vlan52 = Physical OSA tunnel interface
  - br-int = OVS bridge with VTEPs, cannot be slaved to any VRF, has no physical ports to br-vxlan but works somehow
  - genev_sys_6081 = OVS vxlan magic interface, cannot be slaved to VRFs. This is where vxlan packets are manifested and enter the router
  - lxcbr0 = lxc internal bridge for internet (10.0.3.1/32)
  - bond0 = Parent interface for vlan3012 & vlan4094, no VRF since it has no routes nor IPs on itself.
  - mlag15 = LACP/MLAG interface parent for vlan2, vlan20, vlan52, …, no VRF since it has no routes nor IPs on itself.
TCPDump
This is a VM with a floating IP assigned that attempts to ping a device on 10.20.0.0/22 (br-mgmt), which should not be possible.
10.20.0.11 is the management IP of a hypervisor, showing that the traffic does not just match local devices but is actually forwarded entirely.
rt2 ~ # tcpdump -neli any host 10.20.0.11 and icmp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:37:35.520683 genev_sys_6081 P ifindex 73 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 10.100.0.182 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521441 vlan3012 In ifindex 76 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521468 br-mgmt Out ifindex 19 4e:ec:15:8a:e7:7f ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521473 vlan20 Out ifindex 37 4e:ec:15:8a:e7:7f ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521474 mlag15 Out ifindex 21 4e:ec:81:00:00:14 ethertype IPv4 (0x0800), length 108: IP14 (invalid)
07:37:35.521477 enp61s0f3 Out ifindex 7 4e:ec:81:00:00:14 ethertype IPv4 (0x0800), length 108: IP14 (invalid)
07:37:35.521733 enp61s0f3 In ifindex 7 b8:27:81:00:0b:b7 ethertype IPv4 (0x0800), length 108: IP10 (invalid)
07:37:35.521733 mlag15 In ifindex 21 b8:27:81:00:0b:b7 ethertype IPv4 (0x0800), length 108: IP10 (invalid)
07:37:35.521733 vlan2999 In ifindex 32 b8:27:eb:70:aa:cd ethertype IPv4 (0x0800), length 104: 10.20.0.11 > 185.243.23.86: ICMP echo reply, id 18, seq 1, length 64
07:37:35.521750 vlan3012 Out ifindex 76 7e:8a:a4:b3:89:41 ethertype IPv4 (0x0800), length 104: 10.20.0.11 > 185.243.23.86: ICMP echo reply, id 18, seq 1, length 64
07:37:35.522472 genev_sys_6081 Out ifindex 73 fa:16:3e:34:19:d0 ethertype IPv4 (0x0800), length 104: 10.20.0.11 > 10.100.0.182: ICMP echo reply, id 18, seq 1, length 64
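To make the path through the router easier to read, a capture like the one above can be reduced to just the interface and direction of each hop. This is a small sketch, assuming the `tcpdump -neli any` output format shown here, where the second field is the interface name and the third is the direction; `trace_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Reduce `tcpdump -neli any` output to "interface(direction)" per packet line.
# Assumes the LINUX_SLL2 format above: field 2 = interface, field 3 = direction.
trace_path() {
    awk '$3 == "In" || $3 == "Out" || $3 == "P" { printf "%s(%s)\n", $2, $3 }'
}

# Demo on two lines copied from the capture above.
trace_path <<'EOF'
07:37:35.521441 vlan3012 In  ifindex 76 fa:16:3e:0f:e1:56 ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
07:37:35.521468 br-mgmt Out ifindex 19 4e:ec:15:8a:e7:7f ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.0.11: ICMP echo request, id 18, seq 1, length 64
EOF
# prints:
# vlan3012(In)
# br-mgmt(Out)
```

On a live system the same function can be fed directly, for example `tcpdump -neli any host 10.20.0.11 and icmp | trace_path`. The two demo lines show the leak itself: the request enters on vlan3012 (ONLINK) and leaves on br-mgmt (default table).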
IPRoute2
rt2 ~ # ip route show to match 10.20.0.11/32 vrf ONLINK
rt2 ~ # ip route show to match 10.20.0.11/32 vrf IBGP
rt2 ~ # ip route show to match 10.20.0.11/32 vrf MGMT
rt2 ~ # ip route show to match 10.20.0.11/32 vrf default
default via 10.20.0.1 dev br-mgmt proto static
10.20.0.0/22 dev br-mgmt proto kernel scope link src 10.20.0.6
Diagnostic/Solution Attempts:
Attempt: remove the default route next-hop-vrf IBGP from ONLINK and leak full tables into ONLINK to prevent a default route from causing in-kernel routing issues.
Result: No notable difference, 10.20.0.11 is still pingable from the VM.
Attempt: ping 10.20.8.11, which is the VTEP IP of a hypervisor, from the VM.
Result: 10.20.8.11 is NOT pingable, but the packet is routed/forwarded into br-vxlan, showing the same underlying issue.
Attempt: create an OVS VRF and slave br-int and br-vxlan to it.
Result: OVS shows all VTEP destinations as offline: {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
Attempt: move everything but br-vxlan away from the default VRF.
Result: OVS no longer pushes packets into br-mgmt, but lxcbr0 starts acting up, breaking Neutron.