Hello and welcome to Juniper’s Campus Fabric EVPN Multi-Homing overview. We’ll discuss why Campus Fabric EVPN Multi-Homing is being deployed and what its benefits are, dive deep into the building blocks that make up the Campus Fabric EVPN Multi-Homing architecture, and then wrap up with the Juniper hardware platforms that support the technology.
Although this is not an exhaustive list, these are the top four technical challenges we’ve experienced in legacy campus networks over the past couple of years. We’ll start with micro-segmentation. There are many reasons why customers segment traffic within a broadcast domain: it could be legacy equipment that doesn’t understand a layer 3 boundary, or application requirements for layer 2. The list is fairly extensive.
At any rate, isolating traffic within a broadcast domain itself is very difficult and can really only be done through private VLANs. Private VLANs are problematic for a number of reasons: they’re difficult to configure, they’re difficult to interoperate between third-party devices, and they lack scale. The second challenge is inefficient ACL usage. Most customers place ACLs, or firewall filters if you prefer, everywhere to further segment.
So access, distribution, core, and then, of course, your firewalls. What happens is you get ACL sprawl, which becomes an operational challenge. I haven’t talked to a customer that doesn’t need to extend at least a couple of VLANs across their campus network, even if they’re routing at the access layer. And when that happens, you have to plumb VLANs from access to access across your network, which increases your blast radius and effectively exposes your entire network to large broadcast domains, spanning tree loops, and so forth.
And the last challenge here is really a lack of standards. Even where standards exist, most of the time they’re not exactly adhered to. A perfect example is the distribution layer here: those two switches are interconnected, which is a very typical MC-LAG deployment. I have not yet seen a customer deployment with multi-vendor MC-LAG; it’s always vendor-specific. And with multichassis LAG you can’t scale past two devices, so horizontal scale, or scale in general, becomes problematic.
Juniper’s Campus Fabric, based on EVPN-VXLAN, solves these campus problems. Starting with micro-segmentation, remember the conversation around private VLANs and their challenges. EVPN-VXLAN addresses this with group-based policy. Group-based policy rides on VXLAN: the VXLAN header includes a 16-bit group-based policy identifier, and that identifier can be used to build micro-segmentation capabilities. So imagine a device plugs into the access layer, it authenticates through RADIUS, and that authentication assigns a scalable group tag, the identifier carried in the VXLAN header.
Now at the access layer, within the switch, I can micro-segment using that SGT, that scalable group tag. I can place critical applications in SGT 100, other critical applications in SGT 200, and then forbid them from communicating. So group-based policy replaces private VLANs both within a switch and across an EVPN-VXLAN fabric. What’s important to understand here, though, is that EVPN Multi-Homing today does not support group-based policy. I want to make that distinction.
Group-based policy is supported with our Campus Fabric IP Clos. So Campus Fabric in general, as a technology, addresses micro-segmentation at the access layer through Campus Fabric IP Clos. Efficient ACL usage can now be realized through group-based policy: I can apply the same type of firewall filter across all of my access layer switches, which makes it operationally friendly. To extend VLANs across my network, remember the challenge we talked about earlier, I don’t have to plumb VLANs. I can use VXLAN as a tunneling methodology to extend layer 2 across a layer 3 network, and in doing that I can leverage my equal-cost multipath underlay.
I don’t have to worry about broadcast and unknown unicast and all the flooding challenges that come with that type of traffic; I’m using my control plane, which is eBGP, to manage the learning and withdrawal of MAC addresses across this IP network. So it’s a very flexible environment. And EVPN-VXLAN is a standards-based methodology. We’ve taken this technology from our data center practice, where it has been very successful, and applied it here to our Campus Fabric.
Juniper’s Campus Fabric EVPN Multi-Homing addresses a couple of key problem areas in today’s legacy networks. When we’re dealing with collapsed core, think of two devices that are physically interconnected; we talked about this at a very high level earlier. Those two devices are probably running MC-LAG. MC-LAG has been a very popular approach over the last 10 to 12 years: customers bring all of their technology into what we call a collapsed core, two devices, and interconnect them. That interconnect is proprietary, typically an ICCP link and an ICL link, and those two devices essentially share the same brain; you synchronize traffic between them, you synchronize heartbeats.
And so it becomes very difficult for those two vendor-specific devices to interoperate with other devices in the same manner. So the scale of a collapsed core is, for the most part, two devices, and scale is challenging. The other challenge is that these technologies are very long in the tooth, so customers going through a technology refresh right now are looking for a new collapsed core. This is exactly what EVPN Multi-Homing addresses.
So first of all, EVPN Multi-Homing is a standard; EVPN-VXLAN is the standards-based technology we talked about earlier. The two devices communicate across an IP underlay and overlay traffic on top of it using EVPN-VXLAN. And the collapsed core technology within Juniper, that Multi-Homing technology, can extend past two devices: if I wanted to bring in a third or fourth device, I’d just add them to the ecosystem. Now, from a layer 2 perspective, from a southbound standpoint, this is really where some of the cost savings of EVPN Multi-Homing can be realized.
First of all, I can connect any type of device. We often see a Virtual Chassis at the access layer. Now, Virtual Chassis is a Juniper technology, but this could be any kind of switch stack that supports standard LACP. You’ll see the LAG from the access layer switch multi-homed to two devices; that’s why we call it Multi-Homing. It is just a standard LAG. So I don’t have to upgrade my access layer, I don’t have to build new technologies there, and I don’t have to add software packages. The LAG concept, LACP, is a standard. As long as that’s supported on the access layer device, those two links behave as one, and the access switch sees one MAC address northbound. That’s very, very important to understand.
So I don’t have to retrofit or replace any of my access devices. Active-active Multi-Homing is supported by default, so I can load-balance traffic across those two links, versus having one link active and one link standby. And, as we talked about earlier, this approach gives me horizontal scale, where I can extend past two physical devices.
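As a point of reference, here is a minimal sketch of what that standard access-side LAG might look like on a Junos switch. The interface names, the ae0 bundle number, and the VLAN membership are hypothetical, the # annotations are notes rather than CLI input, and a third-party switch would simply use its own standard LACP bundle configuration.

set chassis aggregated-devices ethernet device-count 1
set interfaces xe-0/0/46 ether-options 802.3ad ae0    # uplink to collapsed core 1
set interfaces xe-0/0/47 ether-options 802.3ad ae0    # uplink to collapsed core 2
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members all

From the access switch’s point of view there is nothing EVPN-specific here, which is the point: the two uplinks terminate on different collapsed cores, yet LACP sees a single partner.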
Juniper’s validated design spans all Campus Fabric architectures; in this presentation we focus on EVPN Multi-Homing. The underlying technology is ESI-LAG, which effectively means the collapsed core devices that are physically connected to the southbound access switches look like one system. This is analogous, to a degree, to MC-LAG, although it is a standard and much more scalable. It targets small to medium campuses, is perfect for north-south traffic patterns, and eliminates spanning tree. Once again, it is a very easy migration path simply because the access layer switches don’t have to change their technology; this is a standard LAG, whether it’s a Juniper switch or a third-party switch.
The Campus Fabric building blocks, built on EVPN-VXLAN and common to all Campus Fabric architectures, are the underlay, the overlay, the layer 2 VXLAN gateway, the layer 3 VXLAN gateway, and the LAG to the fabric. In the case of EVPN Multi-Homing you’ll hear the term ESI-LAG; that is the connection from the fabric to the southbound access switches. EVPN Multi-Homing, illustrated here, addresses the collapsed core deployment model. Both switches are interconnected by high-speed links in an active-active ECMP model. Notice the term ESI-LAG, a common term within EVPN Multi-Homing that describes the active-active connection between the southbound switching apparatus and the collapsed core.
The southbound switches can be any type of layer 2 switch, which is another benefit of the technology: all it requires is a LAG, an LACP bundle. First, we build a very simple IP fabric underlay between the two collapsed cores. Notice there is no requirement for spanning tree or technologies like multichassis LAG. This is a topology-agnostic deployment model in which we can scale the architecture out to more switches in the underlay, depending on scale requirements, without having to build a completely separate network. We utilize OSPF or eBGP between both collapsed cores for loopback reachability, and as we build multiple links, we can utilize ECMP, equal-cost multipath, for load balancing.
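To make that concrete, here is a minimal sketch of an eBGP underlay on one of the two collapsed cores. The interface names, addressing, AS numbers, and policy names are hypothetical, and OSPF is an equally valid choice for loopback reachability.

set interfaces et-0/0/0 unit 0 family inet address 10.255.0.0/31    # point-to-point link to the peer core
set interfaces lo0 unit 0 family inet address 192.168.255.1/32
set routing-options router-id 192.168.255.1
set routing-options autonomous-system 65001
set policy-options policy-statement EXPORT-LO0 term loopback from interface lo0.0
set policy-options policy-statement EXPORT-LO0 term loopback then accept
set policy-options policy-statement ECMP then load-balance per-packet
set routing-options forwarding-table export ECMP
set protocols bgp group UNDERLAY type external
set protocols bgp group UNDERLAY peer-as 65002
set protocols bgp group UNDERLAY neighbor 10.255.0.1
set protocols bgp group UNDERLAY export EXPORT-LO0
set protocols bgp group UNDERLAY multipath

The only job of this underlay is to advertise each core’s loopback to the other; everything else rides on top of it.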
The overlay control plane is managed using multiprotocol BGP with the EVPN address family. This is our first instantiation of EVPN, which is our control plane; we’ll talk about VXLAN in subsequent slides. EVPN is what manages our MAC tables and layer 2 extensibility when I want to extend a VLAN from one part of the network to another. Here we’re using BGP to manage MAC learning and MAC withdrawal, as opposed to legacy technologies that flood that information out of every port. We’re using a control plane, so there’s no broadcast; it mitigates the need for flooding.
So it’s much more scalable and a lot quieter. We use eBGP between both collapsed core switches. There’s a concept called a VTEP: a software interface, tied to the loopback, that terminates the VXLAN tunnels providing layer 2 extensibility across the IP network. Each switch has a unique AS number because we’re using eBGP, and since the switches are directly connected, there is no need for route reflectors.
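A minimal sketch of that overlay peering on one collapsed core might look like the following, reusing the hypothetical loopbacks and AS numbers from the underlay sketch above.

set protocols bgp group OVERLAY type external
set protocols bgp group OVERLAY multihop
set protocols bgp group OVERLAY local-address 192.168.255.1
set protocols bgp group OVERLAY family evpn signaling
set protocols bgp group OVERLAY peer-as 65002
set protocols bgp group OVERLAY neighbor 192.168.255.2

Because the peering runs between loopbacks, it stays up as long as any underlay path between the two cores remains available.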
So for the VXLAN gateway, we talk about two types: a layer 2 gateway and a layer 3 gateway. The layer 2 case is present in any EVPN-VXLAN deployment; if I’m extending a tunnel, at a minimum it has to be a layer 2 tunnel. We’ll cover the layer 3 difference shortly. You’ll see a couple of terms here. We mentioned the VTEP earlier; you’ll now also hear the term VNI. In a standard layer 2 network you have the concept of a VLAN: VLAN 100, VLAN 200, and so on, with a limit of 2 to the 12th, typically 4,094 usable VLANs.
VXLAN was adopted for the most part through VMware. A number of years ago VMware asked: in the data center world, how are we going to extend VLANs across an IP-routed network? We can’t plumb VLANs everywhere, because there’s a limit to how many we have, and we just talked about the challenges of plumbing VLANs, with broadcast, multicast, and everything else that can happen across multiple links.
So to prevent that, we use this tunneling technology and simply route the tunnel across the IP network. The IP network doesn’t even know it’s carrying a tunnel; it just sees the endpoints. With VXLAN and the VNI, I have 2 to the 24th identifiers, roughly 16 million VXLAN segments. In a campus that’s never going to be an issue, but in public data centers supporting hundreds of thousands of customers, you can see where this is beneficial.
So between the two collapsed cores, we take the VLANs that are configured and apply a VNI to each VLAN. A best practice: if I have VLAN 100, VLAN 200, and VLAN 300, and they’re all part of the same construct, in other words the same routing instance, then I might prepend the same digit to each of those VLANs. So for VLAN 100 the VNI becomes 1100, for VLAN 200 the VNI becomes 1200, and for VLAN 300 the VNI becomes 1300.
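Here is a minimal sketch of that VLAN-to-VNI mapping and the rest of the layer 2 VXLAN gateway pieces on one collapsed core, again using the hypothetical loopback above; the VLAN names, route distinguisher, and route target are likewise illustrative.

set vlans VLAN100 vlan-id 100 vxlan vni 1100
set vlans VLAN200 vlan-id 200 vxlan vni 1200
set vlans VLAN300 vlan-id 300 vxlan vni 1300
set switch-options vtep-source-interface lo0.0
set switch-options route-distinguisher 192.168.255.1:1
set switch-options vrf-target target:65000:1
set protocols evpn encapsulation vxlan
set protocols evpn extended-vni-list all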
So you add a digit to the front of the VLAN ID and that becomes the VNI, and you use the same leading digit for all VLANs that belong to the same routing instance, or VRF; we’ll explain what that means later. That’s the concept of layer 2 VXLAN. Layer 3 VXLAN, in turn, provides layer 3 capabilities. Let’s start at the southbound portion of this network. We’ve got devices plugged into our EX4300, the layer 2 switch. The EX4300 looks northbound and sees a single MAC address, thanks to the LAG and the ESI concept.
When a user boots up, it’s going to get an address through DHCP and then ARP for its default gateway, and so forth. Once that happens, the ARP entry goes northbound up the ESI-LAG, and the gateway IP address is shared across the two collapsed cores. The beauty of layer 3 VXLAN is that I can now provide what we call an IRB; this is a Juniper term. The IRB is the default gateway, think of it as the subnet’s gateway address, and both devices share it. So they share the same IP address.
That is important. It’s a concept some people call a virtual gateway; in other iterations you’ll hear the term anycast. We won’t get into the distinction here, but the idea is that both devices share the same IP address so that either of them can respond to ARP and provide default gateway routing for that particular VLAN.
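A minimal sketch of that shared gateway on one collapsed core, assuming the hypothetical VLAN 100 above and an illustrative subnet; this shows the virtual-gateway form, and Junos also supports an anycast model where both cores configure an identical IRB address and MAC.

set interfaces irb unit 100 family inet address 10.1.100.2/24 virtual-gateway-address 10.1.100.1
set vlans VLAN100 l3-interface irb.100

The second collapsed core would use its own physical IRB address, for example 10.1.100.3/24, with the same virtual-gateway-address, so both cores answer ARP for 10.1.100.1.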
Now, another benefit of EVPN-VXLAN, beyond the micro-segmentation group-based policy piece we talked about, is routing instances. I can create routing instances, or VRFs; the terms are analogous. It’s called a routing instance in Junos, and everybody else knows it as a VRF. So I can place my PCI traffic and my guest Wi-Fi in independent routing instances, or VRFs, and by doing that I completely isolate that traffic across the network, across the collapsed core. Customers certainly do that if they have PCI-type traffic; they absolutely want that layer of isolation. Some customers put all of their traffic in one VRF, just the default VRF, and everybody can communicate and everybody’s happy. But you can absolutely segment traffic on a VLAN-by-VLAN basis, or by groups of VLANs.
So once again, before we go back to the VXLAN gateway discussion: how you number your VNIs should correspond to which VLANs are placed in which routing instance. That way, when I look at a VNI and see its leading digit, I know which routing instance, which isolation domain, it belongs to. All subnets within a VRF have reachability to one another; that’s the way it works. VRF-to-VRF communication really happens through a firewall. So where I want to isolate my traffic, I can still provide default gateway capabilities and route within that routing instance for all of its VLANs.
And if those VLANs have to bleed out and talk to VLANs in other routing instances, I push that up to a firewall and let it decide whether that should happen. That’s a very popular approach. So imagine I’ve got a firewall, or a firewall cluster, northbound; I connect it with an ESI-LAG. In fact, that firewall could be the gateway of last resort for every VLAN that resides below. So this can operate very flexibly.
So this is our layer 3 VXLAN gateway concept. For customers who wish to provide traffic isolation, EVPN-VXLAN supports the concept of a VRF, or in Junos a routing instance, which is analogous. This is a very common methodology. Notice we have employee, guest, and IoT: three separate routing instances. Communication among all VLANs within a routing instance is supported within the collapsed core itself. What’s very common is for the collapsed core to pass this traffic to a cluster of firewalls northbound, and if there needs to be communication between the VRFs, that happens at the firewall. By default, the collapsed core does not provide any communication between these routing instances; they’re completely isolated. Once again, a benefit of EVPN-VXLAN.
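As an illustration, the guest routing instance on a collapsed core could be sketched roughly as follows; the instance name, IRB unit, route distinguisher, route target, and the firewall next-hop address are all hypothetical.

set routing-instances GUEST instance-type vrf
set routing-instances GUEST interface irb.200
set routing-instances GUEST route-distinguisher 192.168.255.1:200
set routing-instances GUEST vrf-target target:65000:200
set routing-instances GUEST routing-options static route 0.0.0.0/0 next-hop 10.200.0.254

The static default route toward a hypothetical firewall address is one way to express the gateway-of-last-resort pattern described above; any inter-VRF traffic then has to pass through that firewall.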
Okay, the final building block is interconnecting our southbound switching platforms, which can be Juniper or third party. Notice these are standard LAGs. If I have a Virtual Chassis, I take a connection from the top switch and the bottom switch, or from disparate switches within the stack, and multi-home them to each collapsed core; if I have an individual switch, once again I multi-home it to both collapsed cores.
From the switch’s perspective, this is a standard LAG. It sees a single LACP system ID, shared by the collapsed cores, and it sees a single MAC address, so there’s no need for spanning tree. That also means the switch will hash and load-balance, so active-active Multi-Homing is supported by default. I can also scale this out if I want to; remember that with multichassis LAG you’re stuck with two devices.
So with multichassis LAG, if you’re scaling past two devices, you’ve got to build another network, which is a pain. In this case, you just add another collapsed core, or two, interconnect them, and then you could literally multi-home an access switch to all of the collapsed cores, if that’s supported within your fiber ecosystem. There’s no ICL link required and no proprietary communication; this is a layer 2, layer 3 implementation.
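For reference, the collapsed core side of one of those southbound ESI-LAGs might be sketched as follows; the ESI value, LACP system ID, member interface, and VLAN list are hypothetical, and the same ESI and LACP system ID would be configured on both collapsed cores so the access switch sees a single LACP partner.

set interfaces xe-0/0/10 ether-options 802.3ad ae1
set interfaces ae1 esi 00:00:00:00:00:00:00:00:00:01
set interfaces ae1 esi all-active
set interfaces ae1 aggregated-ether-options lacp active
set interfaces ae1 aggregated-ether-options lacp system-id 00:00:00:00:00:01
set interfaces ae1 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae1 unit 0 family ethernet-switching vlan members [ VLAN100 VLAN200 VLAN300 ]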
And what’s really important here, once again: with any access layer switch, I can implement this over a maintenance window. I don’t have to make any changes at all at my access layer. If a LAG is already built, that LAG doesn’t have to change. There might be some parameters within that LAG, such as one gig versus 10 gig, that have to change if you’re upgrading infrastructure bandwidth. But if you’re keeping your fiber infrastructure the same, at the same speed, there may be no need to touch your access layer switches at all, which is a huge benefit.
Okay, so let’s talk about the different configuration steps and where they’re applied with respect to EVPN Multi-Homing. First, we build our underlay; remember, that can be OSPF or eBGP, and all we care about is loopback reachability. Then we apply the overlay configuration, which is multiprotocol BGP. This is an eBGP connection, but it effectively supports our VXLAN layer 2 tunneling capability and provides us with the EVPN control plane. Then we configure the layer 2 and layer 3 gateway capabilities within the collapsed core.
All of these configuration steps happen at the collapsed core. Once again, the access layer switches themselves really don’t have to be touched at all unless you’re upgrading infrastructure, changing cables, and so forth. There might be some configuration changes to the LAG, but if none of that’s happening, all I’m really doing is replacing my collapsed core, plugging in the existing cabling, and configuring through these particular steps.
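Once those steps are in place, a few standard Junos operational commands are a reasonable way to sanity-check each building block; this is a suggested checklist rather than an exhaustive one, and the ae1 bundle name is the hypothetical one used earlier.

show bgp summary                                        # underlay and overlay peerings established
show route table bgp.evpn.0                             # EVPN routes being exchanged
show ethernet-switching vxlan-tunnel-end-point remote   # remote VTEP and VNI reachability
show evpn database                                      # MAC addresses learned via the control plane
show lacp interfaces ae1                                # ESI-LAG members collecting and distributing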
Here we summarize the three switching platforms that support the EVPN Multi-Homing collapsed core: the QFX5120, the EX4650, and the chassis-based EX9200. As we mentioned before, any access layer switch is supported; here we list the Juniper access layer switches.
Here we summarize the entire Juniper switching portfolio that is part of the AI-Driven Enterprise business unit, broken down into access and distribution/core. With respect to EVPN Multi-Homing specifically, at the distribution/core level, remember the collapsed core, that would be the EX4650, the QFX5120, and the EX9200.
Thank you for attending, and we hope this session was valuable.