One of the most effective ways to build stable and scalable routing designs is to summarize topology and reachability information in routing protocols. Summarization can help networks converge faster and limit the number of routers that must perform route calculations when an event such as a link flap occurs. However, there are tradeoffs. When detailed routing information is removed from the routing system, you can end up with suboptimal paths or even blackholed traffic. In this article, we’ll review some of the ways summarization helps scalability and some of the dangers to consider when thinking about routing design.
Topological Summarization
Summarization of topology information happens by default in distance vector routing protocols, simply by the nature of distance vector routing: routers only pass along reachability and metric information, and no topological information is advertised from one router to another. In link state protocols, all routers within an area or flooding domain have complete topology information once the network is converged. I’ll use OSPF terminology to describe link state for the rest of this article, but similar principles apply to IS-IS. OSPF networks can be broken up into multiple areas, and every non-backbone area must connect to area 0. The specific details of an area’s topology are not advertised beyond its area boundaries.
In this demo network, R4 and R5 are area border routers (ABRs) between area 0 and area 1. (See screenshot below.) The OSPF router IDs are based on the hostnames: 1.1.1.1 for R1, 2.2.2.2 for R2, and so on. R1’s link state database has two type 3 summary LSAs for the LAN network hanging off R5. These type 3 LSAs simply tell R1 that 10.1.1.0/24 is reachable via R4 and R5, the area border routers. The metric in a type 3 LSA is the metric from the ABR to the destination network. Routers in area 0 use router LSAs to calculate the cost to reach the ABRs, then add the type 3 LSA’s metric to come up with the total cost to interarea destinations. Area 0 routers have no idea what the topology of area 1 is.
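What makes R4 and R5 ABRs is simply that they have OSPF interfaces in both area 0 and area 1; no dedicated configuration is needed to hide the topology. As a rough sketch, R5’s OSPF configuration might look something like the following (the network statements and link subnets are assumptions for illustration, not taken from the lab):

router ospf 1
 router-id 5.5.5.5
 ! link toward the backbone, area 0
 network 10.0.0.8 0.0.0.3 area 0
 ! link toward R6 and the 10.1.1.0/24 LAN, both in area 1
 network 10.1.0.4 0.0.0.3 area 1
 network 10.1.1.0 0.0.0.255 area 1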
Hiding topology in link state routing helps speed up convergence. SPF runs are faster and less resource intensive when the topology is summarized. Calculating an interarea route only requires finding the shortest path to the ABR and adding the type 3 LSA’s metric; the topology of the destination area does not matter. In this demo, area 1 is small, but imagine if area 1 contained hundreds of routers in a partial mesh. If the topology of those hundreds of interconnected routers was not summarized by OSPF’s area boundary logic, it would add a considerable amount of extra information for the backbone routers to process.
R1’s type 3 LSAs for the Area 1 LAN network:
R1#show ip ospf database summary 10.1.1.0
OSPF Router with ID (1.1.1.1) (Process ID 1)
Summary Net Link States (Area 0)
LS age: 52
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 10.1.1.0 (summary Network Number)
Advertising Router: 4.4.4.4
LS Seq Number: 80000002
Checksum: 0x8994
Length: 28
Network Mask: /24
MTID: 0 Metric: 2
LS age: 356
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 10.1.1.0 (summary Network Number)
Advertising Router: 5.5.5.5
LS Seq Number: 80000001
Checksum: 0x63B8
Length: 28
Network Mask: /24
MTID: 0 Metric: 1
Reachability Summarization
Anyone who’s worked on an IP network has worked with summarized reachability information, even if they didn’t realize it. Unless you have individual host routes (/32s in v4 and /128s in v6) for every endpoint in your network, you have summarized routing information. 10.1.1.0/24 summarizes the range of addresses from 10.1.1.0 to 10.1.1.255. Summarizing reachability information in the routing protocol just extends this idea.
For example, 10.1.0.0/24 and 10.1.1.0/24 can be summarized as 10.1.0.0/23. The default route, 0.0.0.0/0, is the ultimate summary route: it covers all possible addresses for the given address family. Benefits of reachability summarization include hiding unstable prefixes from the rest of the network, smaller routing tables, and faster convergence (fewer routes to calculate).
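To see why the first two prefixes collapse into a single /23, compare them in binary: the first 23 bits are identical, and only bit 24 differs.

10.1.0.0/24  =  00001010.00000001.00000000.xxxxxxxx
10.1.1.0/24  =  00001010.00000001.00000001.xxxxxxxx
10.1.0.0/23  =  00001010.00000001.0000000x.xxxxxxxx  (covers both)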
Suppose that in the demo network we configure the ABRs to only advertise summary route 10.1.0.0/16 into area 0. With this summary in place, if the area 1 LAN network of 10.1.1.0/24 flaps frequently because of a damaged cable, routers in area 0 will not know and thus will not have to perform any route calculations. The instability and specific prefixes within area 1 are hidden from the rest of the network.
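On Cisco IOS, a minimal sketch of that summary configuration on R4 and R5 might look like this (the area range command is the standard way to summarize intra-area prefixes at an ABR; the process ID of 1 matches the earlier output):

router ospf 1
 ! advertise a single 10.1.0.0/16 type 3 LSA into other areas in place
 ! of the individual area 1 prefixes
 area 1 range 10.1.0.0 255.255.0.0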
Suboptimal Routing Example
OSPF stub area types are a form of reachability summarization. Stub area ABRs do not advertise OSPF external routes into the stub area; instead, a default route is advertised. Totally stubby areas go further and also omit type 3 summaries from being advertised into the area.
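On an IOS ABR, the difference between the two stub types is a single keyword. A minimal sketch of the totally stubby variant is below (the plain stub variant is shown in the CLI output section later); this is illustrative rather than part of the lab configuration:

router ospf 1
 ! "no-summary" makes area 1 totally stubby on this ABR: external routes
 ! and type 3 summaries are withheld, and only a default route is sent in
 area 1 stub no-summary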
In our sample topology, R2 is learning prefix 198.51.100.0/24 via eBGP and redistributing the route into OSPF with metric type 1. With area 1 configured as a normal, non-stubby area, R6 will receive the type 5 external LSA from R2. The ABRs will inject a type 4 LSA into area 1 to tell area 1 routers how to reach R2 (the ASBR) and thus provide reachability to the external prefix. With this information present, R6 will choose the shortest path to R2, via R5.
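A rough sketch of what that redistribution might look like on R2 is below. The local BGP AS number (65500 here) is an assumption, since it isn’t shown in the article; metric-type 1 means the internal OSPF cost to the ASBR is added to the external metric, which is what lets routers with full information pick the best exit.

router ospf 1
 ! redistribute BGP-learned routes as type 1 externals; the local AS
 ! number 65500 is assumed for illustration
 redistribute bgp 65500 subnets metric-type 1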
Suppose that we reconfigure area 1 as a stub area. R6 will no longer receive the type 5 LSA that provides specific routing information for 198.51.100.0/24. Instead, the ABRs will inject a default route into area 1. This introduces the possibility of suboptimal routing from R6 to the external prefix. R6 will have two equal cost default routes toward the ABRs. OSPF stub has reduced the granularity of routing information within area 1; R6 now has no idea which path is best to reach the external route. If R6’s ECMP hashing decides that a packet toward the external destination should traverse the link to R4, R4 will have to route the packet to R1 to reach R2, a higher cost path than routing through R5.
Nondeterministic traffic flows can make troubleshooting more difficult, and suboptimal routing can degrade application performance. The situation described also makes asymmetric routing likely. Flows that route from R6 to the external prefix via R4 will be asymmetric: R2 still has specific routing information to reach networks attached to R6, so R2 will choose the shortest path through R5. Asymmetric routing isn’t necessarily a problem unless there is an inline device that relies on flow state, such as a stateful firewall. If a stateful firewall were installed between R2 and R5, asymmetric flows would be dropped. Asymmetry also makes troubleshooting more difficult.
CLI Output
With area 1 operating as a normal OSPF area, R6 has a route to 198.51.100.0/24 via R5, the shortest path.
R6#show ip route 198.51.100.0
Routing entry for 198.51.100.0/24
Known via "ospf 1", distance 110, metric 3
Tag 65512, type extern 1
Last update from 10.1.0.5 on GigabitEthernet0/0, 00:02:22 ago
Routing Descriptor Blocks:
* 10.1.0.5, from 2.2.2.2, 00:02:22 ago, via GigabitEthernet0/0
Route metric is 3, traffic share count is 1
Route tag 65512
R6#show cdp neighbors gigabitEthernet 0/0
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
R5 Gig 0/0 147 R B Gig 0/0
Total cdp entries displayed : 1
After verifying normal area operation, R4, R5 and R6 were configured for OSPF stub routing.
R6(config)#router ospf 1
R6(config-router)#area 1 stub
R6(config-router)#end
As a stub, R6 no longer has a specific route to the external prefix. Default routes to each ABR are installed instead. The metric of both default routes is the same, so equal cost multipath load balancing will be used. This means that some traffic toward 198.51.100.0/24 will take the longer path through R4.
R6#show ip route 198.51.100.0
% Network not in table
R6#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
Known via "ospf 1", distance 110, metric 2, candidate default path, type inter area
Last update from 10.1.0.1 on GigabitEthernet0/1, 00:01:08 ago
Routing Descriptor Blocks:
* 10.1.0.5, from 5.5.5.5, 00:01:08 ago, via GigabitEthernet0/0
Route metric is 2, traffic share count is 1
10.1.0.1, from 4.4.4.4, 00:01:08 ago, via GigabitEthernet0/1
Route metric is 2, traffic share count is 1
R4’s next hop to reach the external prefix is R1. If the link between R4 and R5 were in area 0, R4 would have two equal cost paths to load balance across.
R4#show ip route 198.51.100.1
Routing entry for 198.51.100.0/24
Known via "ospf 1", distance 110, metric 3
Tag 65512, type extern 1
Last update from 10.0.0.1 on GigabitEthernet0/2, 00:02:08 ago
Routing Descriptor Blocks:
* 10.0.0.1, from 2.2.2.2, 00:02:08 ago, via GigabitEthernet0/2
Route metric is 3, traffic share count is 1
Route tag 65512
R4#show cdp neighbors g0/2
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
R1 Gig 0/2 136 R B Gig 0/0
Total cdp entries displayed : 1
From R2’s perspective, the best path to reach R6’s loopback is via R5. This would cause asymmetric routing for flows that took the suboptimal path through R1 to reach the external prefix.
R2#show ip route 10.1.6.6
Routing entry for 10.1.6.6/32
Known via "ospf 1", distance 110, metric 3, type inter area
Last update from 10.0.0.10 on GigabitEthernet0/0, 00:03:10 ago
Routing Descriptor Blocks:
* 10.0.0.10, from 5.5.5.5, 00:03:10 ago, via GigabitEthernet0/0
Route metric is 3, traffic share count is 1
R2#show cdp nei g0/0
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
R5 Gig 0/0 145 R B Gig 0/2
Total cdp entries displayed : 1
Blackhole Routing Example
Consider the network below as a dual hub phase 1 DMVPN where each hub resides in a different data center. This network uses EIGRP, and all spoke routers are configured as EIGRP stubs. All spoke LAN subnets fall within 10.0.0.0/8. To increase network stability and speed up convergence, the hub routers are configured to send only a 10.0.0.0/8 summary route into the data center networks. In the example, routers DC1 and DC2 represent the entire data center network, with data center resources and a data center interconnect link hanging off these routers.
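A rough sketch of the relevant EIGRP configuration is below. On each spoke, the stub feature keeps the spoke from being used as transit; on each hub, the summary is applied on the interface facing the data center router. The interface name and the classic-mode EIGRP syntax are assumptions for illustration.

! on each spoke (e.g. BRANCH1_RTR)
router eigrp 1
 ! advertise only connected and summary routes; never offer transit
 eigrp stub connected summary

! on each hub (e.g. DC1_HUB), on the interface facing the data center router
interface GigabitEthernet0/1
 ! send a single 10.0.0.0/8 summary instead of the individual spoke prefixes
 ip summary-address eigrp 1 10.0.0.0 255.0.0.0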
Router DC1 receives the summary route for all spoke LANs from DC1_Hub, so any traffic destined for 10.0.0.0/8 will be forwarded to DC1_Hub. DC1 also receives this summary advertisement from DC2, but that path has a higher metric, so it’s not installed in the RIB.
DC1#show ip route 10.1.1.1
Routing entry for 10.0.0.0/8
Known via "eigrp 1", distance 90, metric 26880512, type internal
Redistributing via eigrp 1
Last update from 172.17.0.1 on GigabitEthernet0/0, 00:08:04 ago
Routing Descriptor Blocks:
* 172.17.0.1, from 172.17.0.1, 00:08:04 ago, via GigabitEthernet0/0
Route metric is 26880512, traffic share count is 1
Total delay is 50020 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 1
DC1#show cdp neighbors gigabitEthernet 0/0
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
DC1_HUB Gig 0/0 141 R B Gig 0/1
Total cdp entries displayed : 1
Branch1’s routes to the DC1 networks:
BRANCH1_RTR#show ip route 172.16.0.0
Routing entry for 172.16.0.0/16
Known via "eigrp 1", distance 90, metric 26880512, type internal
Redistributing via eigrp 1
Last update from 192.168.0.1 on Tunnel1, 00:13:08 ago
Routing Descriptor Blocks:
* 192.168.0.1, from 192.168.0.1, 00:13:08 ago, via Tunnel1
Route metric is 26880512, traffic share count is 1
Total delay is 50020 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 2
Branch1’s routes to the DC2 networks:
BRANCH1_RTR#show ip route 172.31.0.0
Routing entry for 172.31.0.0/16
Known via "eigrp 1", distance 90, metric 26880512, type internal
Redistributing via eigrp 1
Last update from 192.168.1.1 on Tunnel2, 00:13:13 ago
Routing Descriptor Blocks:
* 192.168.1.1, from 192.168.1.1, 00:13:13 ago, via Tunnel2
Route metric is 26880512, traffic share count is 1
Total delay is 50020 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 2
At this stage, the network is running fine, and everyone is happy. Now, suppose that some ISP issue causes the tunnel from Branch1 to the DC1 hub to go down. Branch1 still has good connectivity to the DC2 hub. I’ll simulate this by shutting down Branch1’s tunnel interface to DC1.
BRANCH1_RTR(config)#interface tunnel 1
BRANCH1_RTR(config-if)#shutdown
The routing table on Branch1_RTR converges around the tunnel outage. For Branch1 to reach DC1’s networks, 172.16.0.0/16, it’ll simply route the traffic to DC2 and traverse the data center interconnect link. What about the return path from 172.16.0.0/16 to Branch1?
BRANCH1_RTR#show ip route 172.16.0.0
Routing entry for 172.16.0.0/16
Known via "eigrp 1", distance 90, metric 26880768, type internal
Redistributing via eigrp 1
Last update from 192.168.1.1 on Tunnel2, 00:00:39 ago
Routing Descriptor Blocks:
* 192.168.1.1, from 192.168.1.1, 00:00:39 ago, via Tunnel2
Route metric is 26880768, traffic share count is 1
Total delay is 50030 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 3
It’s the return path from DC1 to Branch1 that is a problem. DC1_Hub is still advertising the 10.0.0.0/8 summary to router DC1. This summary will attract traffic destined for the Branch1 LAN despite DC1_Hub’s lack of connectivity to the spoke. Because the spokes are configured as EIGRP stubs, DC1_Hub cannot use Branch2 as a transit network to reach Branch1. This renders DC1_Hub a blackhole for traffic from the DC1 networks destined for Branch1’s LAN subnet, 10.1.1.0/24. The discard route created by EIGRP summarization will drop the packets to prevent routing loops.
DC1#show ip route 10.1.1.1
Routing entry for 10.0.0.0/8
Known via "eigrp 1", distance 90, metric 26880512, type internal
Redistributing via eigrp 1
Last update from 172.17.0.1 on GigabitEthernet0/0, 00:19:14 ago
Routing Descriptor Blocks:
* 172.17.0.1, from 172.17.0.1, 00:19:14 ago, via GigabitEthernet0/0
Route metric is 26880512, traffic share count is 1
Total delay is 50020 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 1
DC1_HUB#show ip route 10.1.1.1
Routing entry for 10.0.0.0/8
Known via "eigrp 1", distance 5, metric 26880256, type internal
Redistributing via eigrp 1
Routing Descriptor Blocks:
* directly connected, via Null0
Route metric is 26880256, traffic share count is 1
Total delay is 50010 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 0
The red path represents the path from Branch1 to DC1 subnets when the Branch1 to DC1_Hub tunnel is down. The aqua colored path represents the blackhole created by the DC1_Hub summary configuration while Branch1 has no connectivity to DC1_Hub.
Blackhole Solution
A method to prevent the blackhole condition is to connect the two routers that are performing summarization. In this situation, the summarizing routers are in two geographically separated data centers, so dark fiber or a point-to-point WAN link may be too expensive. However, a GRE tunnel between DC1_Hub and DC2_Hub may be a viable option. The idea is to connect the routers “behind” the summary point so that DC1_Hub can tunnel traffic destined for Branch1 to DC2_Hub to provide reachability.
To demonstrate this, I configured a GRE tunnel between loopback interfaces of the hub routers. While Branch1’s connectivity to DC1_Hub is down, the DC1 hub will use the tunnel to DC2_Hub to reach Branch1’s LAN.
DC1_HUB#show run int tu255
Building configuration...
Current configuration : 130 bytes
!
interface Tunnel255
ip address 172.17.254.1 255.255.255.252
tunnel source Loopback255
tunnel destination 172.17.255.254
end
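DC2_Hub’s side of the tunnel would mirror this configuration. The sketch below is illustrative: DC1_Hub’s Loopback255 address is not shown above, so the 172.17.255.253 destination is purely an assumption, and the tunnel subnet also needs to be covered by an EIGRP network statement (if it isn’t already) so the hubs form an adjacency across the tunnel.

interface Tunnel255
 ip address 172.17.254.2 255.255.255.252
 tunnel source Loopback255
 ! DC1_Hub's Loopback255 address is assumed for illustration
 tunnel destination 172.17.255.253
!
router eigrp 1
 ! ensure the tunnel subnet is enabled for EIGRP so the hubs peer over GRE
 network 172.17.254.0 0.0.0.3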
DC1_HUB#show ip route 10.1.1.1
Routing entry for 10.1.1.0/24
Known via "eigrp 1", distance 90, metric 28160256, type internal
Redistributing via eigrp 1
Last update from 172.17.254.2 on Tunnel255, 00:00:27 ago
Routing Descriptor Blocks:
* 172.17.254.2, from 172.17.254.2, 00:00:27 ago, via Tunnel255
Route metric is 28160256, traffic share count is 1
Total delay is 100010 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1476 bytes
Loading 1/255, Hops 2
The aqua path is the path from Branch1 to DC1 subnets, the red dashed annotation represents the GRE tunnel between the DMVPN hubs, and the yellow path represents the return path from DC1 subnets to the Branch1 LAN. The GRE tunnel provides a backup path to prevent blackhole routing.
This approach has some tradeoffs. It adds complexity in the form of an additional, rarely used alternate path that may be easily forgotten when troubleshooting. Tunneling can cause MTU issues unless the data center backbone and GRE tunnel MTU values are adjusted to allow for the GRE header overhead. It increases state exchange and interaction between the hub routers, since the tunnel adds an additional EIGRP neighbor to maintain. Finally, the placement of security functions such as firewalls should be considered; you don’t want traffic to bypass security measures when the failover occurs.
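As a rough example of the MTU accommodation mentioned above, the tunnel interface’s IP MTU and TCP MSS can be lowered to absorb the 24 bytes of GRE/IP encapsulation; the values below are commonly used conservative numbers assuming a 1500-byte transport path, not values taken from the lab.

interface Tunnel255
 ! leave headroom for the additional GRE/IP headers
 ip mtu 1400
 ! clamp TCP MSS so TCP sessions fit within the tunnel MTU
 ip tcp adjust-mss 1360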