cRPD EVPN VXLAN Type 5

I needed a quick, simple and reproducible setup with a working EVPN VXLAN tunnel using Type-5 routes advertised via BGP. Type-5 routes are interesting because they carry everything needed to send VXLAN packets in the data plane (the Linux kernel): the tunnel endpoint, the VNI and the router MAC address, the latter via an extended community.

Juniper cRPD 21.2R1 was published at the end of June 2021 and, among other things, contains a very nice feature: layer 3 configuration of existing Linux interfaces via the Junos configuration. I use this not only to configure a loopback IP address, but also the IP addresses of all the links between the two cRPD instances r1 and r2 and towards the connected clients (c1 .. c4).
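
To give an idea of what this looks like, here is a sketch of the kind of interface statements involved on r1, in set format. The addresses are taken from or inferred from the outputs later in this post, the prefix lengths on the core links are my assumption, and the authoritative configuration files live in the repo.

set interfaces lo0 unit 0 family inet address 10.0.0.12/32
set interfaces net1 unit 0 family inet address 198.18.1.1/24
set interfaces net2 unit 0 family inet address 198.18.2.1/24
set interfaces net3 unit 0 family inet address 1.1.1.1/30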

The source for the docker-compose file with configs and validation scripts can be found in my repo on GitHub: https://github.com/mwiget/crpd-evpn-vxlan-type5

Topology

+----+   +------+        +------+   +----+
| c1 |---|      |--------|      |---| c2 |
+----+   |  r1  |  BGP   |  r2  |   +----+
+----+   |      |  ISIS  |      |   +----+
| c3 |---|      |--------|      |---| c4 |
+----+   +------+        +------+   +----+

The docker-compose.yml file contains the definitions to deploy the two cRPD containers r1 and r2, four Ubuntu-based client containers (c1..c4) and a short-lived container, links, which stitches all containers together according to the diagram above via veth pairs named net1..netn.

I deliberately avoided normal Docker networks for these links, so no Linux bridges get created and no IP addresses are auto-assigned.
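
Conceptually, what the links container does for each link is create a veth pair on the host and move the two ends into the containers' network namespaces. A rough shell sketch for one of the r1-r2 links is shown below; the real implementation is a small Python script (see the link-containers repo in the references), and the temporary interface names here are purely illustrative.

# find the network namespaces of r1 and r2 via their PIDs
pid_r1=$(docker inspect -f '{{.State.Pid}}' r1)
pid_r2=$(docker inspect -f '{{.State.Pid}}' r2)
# create a veth pair on the host with the desired MTU
ip link add name r1net1 mtu 3000 type veth peer name r2net1 mtu 3000
# move each end into its container
ip link set r1net1 netns $pid_r1
ip link set r2net1 netns $pid_r2
# rename both ends to net1 and bring them up
nsenter -t $pid_r1 -n ip link set r1net1 name net1
nsenter -t $pid_r1 -n ip link set net1 up
nsenter -t $pid_r2 -n ip link set r2net1 name net1
nsenter -t $pid_r2 -n ip link set net1 up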

r1 and r2 are interconnected via two links and run ISIS to share routes to their loopback addresses, plus BGP to exchange routes from two routing instances, blue and yellow, via EVPN VXLAN with VNIs 10002 and 10003.
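
On the Junos side, the EVPN Type-5 part boils down to per-VRF routing-instance statements along these lines for the blue instance on r1 (a sketch only: the route distinguisher for r1 is assumed, the vrf-target and VNI match the BGP output shown further below, and the exact configuration is in the repo):

set routing-instances blue instance-type vrf
set routing-instances blue interface net3
set routing-instances blue route-distinguisher 10.0.0.12:1
set routing-instances blue vrf-target target:11:11
set routing-instances blue protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances blue protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances blue protocols evpn ip-prefix-routes vni 10002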

c1 and c2 are connected to the blue network, while c3 and c4 are connected to the yellow network, all with static IP addressing.
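
The clients' startup scripts (the /bin/bash /config/network-... commands visible in the docker-compose ps output further down) do little more than assign the static address and a route towards the local cRPD. For c1 this amounts to roughly the following sketch; the /30 mask and the gateway 1.1.1.1 are taken from r1's VRF routing table shown later, while the choice of a default route is my assumption.

# c1's static network setup (sketch)
ip addr add 1.1.1.2/30 dev net1
ip link set net1 up
# default route via r1; a specific route to the remote prefix would work as well
ip route add default via 1.1.1.1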

Deploy

Assuming you have downloaded the cRPD container image and license key, you can simply run ‘make’ in the top-level directory of the cloned repository to deploy the containers and kick off a validation script that queries ISIS, BGP, cRPD and kernel routes for the EVPN networks.

Please check the content of the Makefile to see the individual steps taken to deploy the topology. It mainly consists of these steps:

– docker-compose up -d: deploy containers
– add-license-key.sh: add license key from file junos_sfnt.lic to r1 and r2
– validate.sh: validate the topology, see next paragraph for a detailed description

To stop the topology, simply run ‘docker-compose down’ or ‘make down’.

How it works

Check that the containers are all running with

$ docker-compose ps
Name     Command                          State    Ports
-------------------------------------------------------------------------------------------------------------------------------
c1       /bin/bash /config/network- ...   Up
c2       /bin/bash /config/network- ...   Up
c3       /bin/bash /config/network- ...   Up
c4       /bin/bash /config/network- ...   Up
links    /usr/bin/python3 /add_link ...   Exit 0
r1       /sbin/runit-init.sh              Up       179/tcp, 22/tcp, 3784/tcp, 4784/tcp, 50051/tcp, 6784/tcp, 7784/tcp, 830/tcp
r2       /sbin/runit-init.sh              Up       179/tcp, 22/tcp, 3784/tcp, 4784/tcp, 50051/tcp, 6784/tcp, 7784/tcp, 830/tcp

The links container terminated with ‘Exit 0’, which means it completed its task of stitching the containers together.
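
If you are curious what it actually did, its output is still available via the compose logs (the script itself comes from the link-containers repo referenced at the end of this post):

$ docker-compose logs links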

You can log into either cRPD instance using

$ docker-compose exec r1 bash

===>
Containerized Routing Protocols Daemon (CRPD)
Copyright (C) 2020, Juniper Networks, Inc. All rights reserved.
<===

root@r1:/# cli show version
Hostname: r1
Model: cRPD
Junos: 21.2R1.10
cRPD package version : 21.2R1.10 built by builder on 2021-06-21 14:13:43 UTC

Check the existence of eth0 and net1 .. net4 in r1 with

root@r1:/# ip l |grep ' eth\|net'
9: net1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 3000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 92:a6:db:bd:13:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 2
11: net2@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 3000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 9a:ad:ca:6b:bc:e1 brd ff:ff:ff:ff:ff:ff link-netnsid 2
13: net3@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master __crpd-vrf1 state UP mode DEFAULT group default qlen 1000
link/ether 76:1c:b4:fb:24:1c brd ff:ff:ff:ff:ff:ff link-netnsid 3
15: net4@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master __crpd-vrf2 state UP mode DEFAULT group default qlen 1000
link/ether 06:df:4c:aa:7d:88 brd ff:ff:ff:ff:ff:ff link-netnsid 4
1634: eth0@if1635: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:c0:a8:20:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0

You can spot the VRF membership of the interfaces: net3 is enslaved to the VRF device __crpd-vrf1 and net4 to __crpd-vrf2 (the ‘master’ field in the output above). These VRF devices are auto-created by cRPD based on the Junos configuration of the routing instances blue and yellow. eth0 remains connected to the default Docker network, allowing Internet access from all the containers, e.g. to install additional packages for testing.
The links between r1 and r2 are named net1 and net2 and have a larger MTU of 3000. This accounts for the roughly 50 bytes of VXLAN encapsulation overhead (outer Ethernet, IP, UDP and VXLAN headers), and then some; the MTU is set via arguments given to the links container in the docker-compose.yml file.

Side note: if you increase the MTU of the links to the client containers, make sure to also increase the MTU of the irb interfaces to avoid IP fragmentation!
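
For a quick, non-persistent test of that, the irb MTU can be bumped straight from the shell inside r1 and r2, e.g. to 2000 if the client-facing links were raised to 2000 (the proper place for this is of course the configuration or the compose file arguments):

root@r1:/# ip link set dev irb mtu 2000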

Another important interface is the irb, found in r1 and r2:

root@r1:/# ip l show dev irb
10: irb: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 92:d1:86:8c:d2:17 brd ff:ff:ff:ff:ff:ff
root@r2:/# ip l show dev irb
11: irb: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether ce:5e:b9:b5:42:fb brd ff:ff:ff:ff:ff:ff

We will find these MAC addresses again further down, in the BGP route and in the VXLAN-encapsulated Ethernet frames.

Let’s now check ISIS and the reachability between the loopback IPs:

root@r1:/# cli show route | head 

inet.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.0.0.12/32 *[Direct/0] 03:01:12
> via lo.0
10.0.0.13/32 *[IS-IS/18] 03:00:24, metric 10
> to 198.18.1.2 via net1
to 198.18.2.2 via net2
192.168.32.0/20 *[Direct/0] 03:01:12
cli: remote side unexpectedly closed connection
root@r1:/# ping 10.0.0.13
PING 10.0.0.13 (10.0.0.13) 56(84) bytes of data.
64 bytes from 10.0.0.13: icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from 10.0.0.13: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 10.0.0.13: icmp_seq=3 ttl=64 time=0.075 ms
^C
--- 10.0.0.13 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2027ms
rtt min/avg/max/mdev = 0.033/0.059/0.075/0.018 ms
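
Besides looking at the routing table, you can also check the ISIS adjacencies and BGP sessions directly with the usual show commands (output omitted here):

root@r1:/# cli show isis adjacency
root@r1:/# cli show bgp summary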

Great! Now we can check BGP with its EVPN routes:

root@r1:/# cli show route protocol bgp table blue.evpn.0 detail 

blue.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
5:10.0.0.13:1::0::1.1.2.0::30/248 (1 entry, 1 announced)
*BGP Preference: 170/-101
Route Distinguisher: 10.0.0.13:1
Next hop type: Indirect, Next hop index: 0
Address: 0x5623d5c2779c
Next-hop reference count: 8
Source: 10.0.0.13
Protocol next hop: 10.0.0.13
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Secondary Active Int Ext>
Peer AS: 64512
Age: 3:04:23 Metric2: 10
Validation State: unverified
ORR Generation-ID: 0
Task: BGP_64512_64512.10.0.0.13
Announcement bits (1): 0-blue-EVPN-L3-context
AS path: I
Communities: target:11:11 encapsulation:vxlan(0x8) router-mac:ce:5e:b9:b5:42:fb
Import Accepted
Route Label: 10002
Overlay gateway address: 0.0.0.0
ESI 00:00:00:00:00:00:00:00:00:00
Localpref: 100
Router ID: 10.0.0.13
Primary Routing Table: bgp.evpn.0
Thread: junos-main

5:10.0.0.13:1::0::abcd::101:200::126/248 (1 entry, 1 announced)
*BGP Preference: 170/-101
Route Distinguisher: 10.0.0.13:1
Next hop type: Indirect, Next hop index: 0
Address: 0x5623d5c2779c
Next-hop reference count: 8
Source: 10.0.0.13
Protocol next hop: 10.0.0.13
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Secondary Active Int Ext>
Peer AS: 64512
Age: 3:04:23 Metric2: 10
Validation State: unverified
ORR Generation-ID: 0
Task: BGP_64512_64512.10.0.0.13
Announcement bits (1): 0-blue-EVPN-L3-context
AS path: I
Communities: target:11:11 encapsulation:vxlan(0x8) router-mac:ce:5e:b9:b5:42:fb
Import Accepted
Route Label: 10002
Overlay gateway address: ::
ESI 00:00:00:00:00:00:00:00:00:00
Localpref: 100
Router ID: 10.0.0.13
Primary Routing Table: bgp.evpn.0
Thread: junos-main

You can find the irb MAC address of the remote cRPD instance in the received route:

encapsulation:vxlan(0x8) router-mac:ce:5e:b9:b5:42:fb

That, together with the protocol next hop 10.0.0.13 (the loopback IP of the remote instance), is sufficient to send encapsulated packets on their way. We can check the routing table for the VRF in Linux with

root@r1:/# ip vrf list      
Name              Table
-----------------------
__crpd-vrf1           1
__crpd-vrf2           2

root@r1:/# ip route show vrf __crpd-vrf1
1.1.1.0/30 dev net3 proto kernel scope link src 1.1.1.1
1.1.1.2 via 1.1.1.2 dev net3 proto 22
1.1.2.0/30 encap ip id 10002 src 10.0.0.12 dst 10.0.0.13 ttl 255 tos 0 via 10.0.0.13 dev irb proto 22 onlink

First, we list the Linux VRF devices and their routing tables; blue corresponds to __crpd-vrf1, whose routing table we then display. The last entry is the VXLAN encap route with VNI 10002, source IP 10.0.0.12 and destination IP 10.0.0.13. “proto 22” indicates that cRPD installed the route.
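
If you want to dig a bit deeper into how the kernel models this, the detailed views of both the VRF routing table and the irb device are worth a look (output omitted here):

root@r1:/# ip -d route show vrf __crpd-vrf1
root@r1:/# ip -d link show dev irb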

Before we ping from c1 to c2, start tcpdump in one terminal on the link between r1 and r2, filtering on UDP packets only:

root@r1:/# tcpdump -n -i net1 -e udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
07:18:33.638474 92:a6:db:bd:13:d0 > ea:d8:db:7a:d3:07, ethertype IPv4 (0x0800), length 148: 10.0.0.12.37485 > 10.0.0.13.4789: VXLAN, flags [I] (0x08), vni 10002
92:d1:86:8c:d2:17 > ce:5e:b9:b5:42:fb, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 1.1.2.2: ICMP echo request, id 51, seq 1, length 64
07:18:33.638580 ea:d8:db:7a:d3:07 > 92:a6:db:bd:13:d0, ethertype IPv4 (0x0800), length 148: 10.0.0.13.37485 > 10.0.0.12.4789: VXLAN, flags [I] (0x08), vni 10002
ce:5e:b9:b5:42:fb > 92:d1:86:8c:d2:17, ethertype IPv4 (0x0800), length 98: 1.1.2.2 > 1.1.1.2: ICMP echo reply, id 51, seq 1, length 64

Then from another terminal, issue a ping from c1 to c2:

$ docker-compose exec c1 bash
root@c1:~# ip r get 1.1.2.2
1.1.2.2 via 1.1.1.1 dev net1 src 1.1.1.2 uid 0
cache
root@c1:~# ping -c1 1.1.2.2
PING 1.1.2.2 (1.1.2.2) 56(84) bytes of data.
64 bytes from 1.1.2.2: icmp_seq=1 ttl=62 time=0.247 ms

--- 1.1.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.247/0.247/0.247/0.000 ms

Looking at the captured packets, you’ll find VNI 10002 and the MAC addresses of the local and remote irb interfaces in the encapsulated payload.

Finally, let’s do a quick iperf3 test between c1 and c2: launch the server in c2 and the client in c1. Option ‘-T’ disables pseudo-TTY allocation, which is handy when scripting.

$ docker-compose exec -T c2 iperf3 -s

and the client in c1:

$ docker-compose exec c1 iperf3 -c 1.1.2.2
Connecting to host 1.1.2.2, port 5201
[ 5] local 1.1.1.2 port 38048 connected to 1.1.2.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 3.72 GBytes 32.0 Gbits/sec 0 416 KBytes
[ 5] 1.00-2.00 sec 3.80 GBytes 32.7 Gbits/sec 0 594 KBytes
[ 5] 2.00-3.00 sec 3.84 GBytes 33.0 Gbits/sec 0 696 KBytes
[ 5] 3.00-4.00 sec 3.80 GBytes 32.7 Gbits/sec 0 731 KBytes
[ 5] 4.00-5.00 sec 3.79 GBytes 32.6 Gbits/sec 0 806 KBytes
[ 5] 5.00-6.00 sec 3.78 GBytes 32.4 Gbits/sec 0 806 KBytes
[ 5] 6.00-7.00 sec 3.76 GBytes 32.3 Gbits/sec 0 1.02 MBytes
[ 5] 7.00-8.00 sec 3.76 GBytes 32.3 Gbits/sec 0 1.02 MBytes
[ 5] 8.00-9.00 sec 3.78 GBytes 32.4 Gbits/sec 7 847 KBytes
[ 5] 9.00-10.00 sec 3.80 GBytes 32.6 Gbits/sec 0 969 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 37.8 GBytes 32.5 Gbits/sec 7 sender
[ 5] 0.00-10.00 sec 37.8 GBytes 32.5 Gbits/sec receiver

iperf Done.

All set!

Stop the running containers with

$ docker-compose down
Stopping r2 ... done
Stopping c4 ... done
Stopping r1 ... done
Stopping c2 ... done
Stopping c3 ... done
Stopping c1 ... done
Removing r2 ... done
Removing c4 ... done
Removing r1 ... done
Removing links ... done
Removing c2 ... done
Removing c3 ... done
Removing c1 ... done
Removing network crpd-evpn-vxlan-type5_default

References

– Source repo: https://github.com/mwiget/crpd-evpn-vxlan-type5

– Day One eBook: Data Center Deployment with EVPN/VXLAN by Deepti Chandra: https://www.juniper.net/documentation/en_US/day-one-books/TW_DCDeployment.v2.pdf

– Juniper cRPD technical documentation: https://www.juniper.net/documentation/product/us/en/crpd

– Original repo for the links container: https://github.com/mwiget/link-containers
