Don’t Have a Bad Day – routingloop.net

Anycast RP with Multicast Source Discovery Protocol (MSDP) is a great way to provide fault tolerance to your multicast network. Configure another router with the RP address, bang out a few MSDP commands on the RP routers, and you’re all set for a basic anycast RP design. It’s important, though, not to get tunnel vision and forget about your unicast IGP. In this article we’ll explore the bad things that might happen if you do.

Demo Network

This is a reuse of a topology I used for a previous multicast-related article. There is one multicast source and two multicast receivers, denoted by their hostnames in the image below. The network is using a basic PIM sparse mode setup where the original rendezvous point (RP) is on the first hop router (FHR). The FHR is router RP_Primary. In the demo we’ll configure Transit_RTR_2 to be a second RP and use MSDP to synchronize active source information between the RPs. This network uses EIGRP for unicast routing.

How to Have a Bad Day

The focus of this article isn’t specifically on multicast or MSDP, but on how you can accidentally break your IGP with an anycast RP deployment.

Before breaking stuff, let’s verify that multicast routing and forwarding is working. The multicast receivers are statically configured to join group 239.240.240.240. I sent a ping to the interesting group from MCAST_SRC and received a reply.

MCAST_SRC#ping 239.240.240.240
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.240.240.240, timeout is 2 seconds:

Reply to request 0 from 192.168.1.1, 68 ms

Since there was a reply to the ping and LHR_1 has (S,G) state, multicast seems to be working.

LHR_1#show ip mroute
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report, 
       Z - Multicast Tunnel, z - MDT-data group sender, 
       Y - Joined MDT-data group, y - Sending to MDT-data group, 
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute, 
       N - Received BGP Shared-Tree Prune, n - BGP C-Mroute suppressed, 
       Q - Received BGP S-A Route, q - Sent BGP S-A Route, 
       V - RD & Vector, v - Vector, p - PIM Joins on route, 
       x - VxLAN group
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.240.240.240), 00:15:20/stopped, RP 172.31.255.254, flags: SJC
  Incoming interface: GigabitEthernet0/1, RPF nbr 10.0.0.17
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:15:20/00:02:31

(172.16.0.2, 239.240.240.240), 00:00:06/00:02:53, flags: JT
  Incoming interface: GigabitEthernet0/1, RPF nbr 10.0.0.17
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:00:06/00:02:53

(*, 224.0.1.40), 00:15:35/00:02:29, RP 172.31.255.254, flags: SJCL
  Incoming interface: GigabitEthernet0/1, RPF nbr 10.0.0.17
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:15:34/00:02:29

LHR_1#

On LHR_1, we can see that the RP address is 172.31.255.254:

LHR_1#show ip pim rp 
Group: 239.240.240.240, RP: 172.31.255.254, uptime 00:17:07, expires never
Group: 224.0.1.40, RP: 172.31.255.254, uptime 00:17:23, expires never

With service confirmed, we can configure the network for anycast RP. The hostname of Transit_RTR_2 was changed to RP_Backup first.

Anycast RP configuration on RP_Backup:

RP_Backup#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
RP_Backup(config)#int lo1
RP_Backup(config-if)#ip address 172.31.255.254 255.255.255.255
RP_Backup(config-if)#int lo2
RP_Backup(config-if)#ip address 10.33.33.2 255.255.255.255
RP_Backup(config-if)#exit
RP_Backup(config)#ip msdp peer 10.33.33.1 connect-source loopback 2
RP_Backup(config)#end
RP_Backup#

Anycast RP configuration on RP_Primary:

RP_Primary#conf t
Enter configuration commands, one per line. End with CNTL/Z.
RP_Primary(config)#interface loopback 2
RP_Primary(config-if)#ip address 10.33.33.1 255.255.255.255
RP_Primary(config-if)#exit
RP_Primary(config)#ip msdp peer 10.33.33.2 connect-source loopback 2
RP_Primary(config)#end
RP_Primary#

After configuring anycast RP, I enabled MSDP debugging on RP_Backup and sent another multicast ping from MCAST_SRC. The debug output below indicates that MSDP source active messaging is probably working.

RP_Backup(config)#
*Mar 31 22:33:35.978: %MSDP-5-PEER_UPDOWN: Session to peer 10.33.33.1 going up
*Mar 31 22:33:35.978: MSDP(0): 10.33.33.1: TCP connection established
*Mar 31 22:33:36.969: MSDP(0): 10.33.33.1: Received 3-byte msg 0 from peer
*Mar 31 22:33:36.970: MSDP(0): 10.33.33.1: Keepalive TLV
*Mar 31 22:33:37.234: MSDP(0): 10.33.33.1: Building SA message from SA cache
*Mar 31 22:33:37.969: MSDP(0): 10.33.33.1: Received 20-byte msg 1 from peer
*Mar 31 22:33:37.970: MSDP(0): 10.33.33.1: SA TLV, len: 20, ec: 1, RP: 172.31.255.254 
*Mar 31 22:33:37.971: MSDP(0): 10.33.33.1: Peer RPF check passed for single peer
*Mar 31 22:33:37.972: MSDP(0): WAVL Insert SA Source 172.16.0.2 Group 239.240.240.240 RP 172.31.255.254 Successful 
*Mar 31 22:33:49.235: MSDP(0): 10.33.33.1: Sending Keepalive message to peer

Now, we can call it a day and consider this a job well done. Multicast is working and we now have RP redundancy. Everything is fine until a scheduled power outage causes RP_Backup to lose power. The following day, you’re flooded with calls reporting network issues. What happened?

Remember that EIGRP (and OSPF, for that matter) elects its router ID when the routing process starts. If no static RID is configured, the highest loopback IP address is preferred over the addresses of other local interfaces. Adding a new interface with a higher IP address later will not preempt the current RID and cause a new one to be elected; the new address only comes into play the next time the protocol starts, which is exactly what happened when RP_Backup powered back on. Below we can see that RP_Backup’s highest loopback address is on loopback 1, the interface acting as the now redundant RP. RP_Primary’s highest loopback address is also 172.31.255.254.

RP_Backup#show ip int brief
Interface                  IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0         10.0.0.14       YES NVRAM  up                    up      
GigabitEthernet0/1         10.0.0.10       YES NVRAM  up                    up      
GigabitEthernet0/2         unassigned      YES NVRAM  administratively down down    
GigabitEthernet0/3         unassigned      YES NVRAM  administratively down down    
Loopback1                  172.31.255.254  YES NVRAM  up                    up      
Loopback2                  10.33.33.2      YES NVRAM  up                    up      
Tunnel0                    10.0.0.10       YES unset  up                    up      
Tunnel1                    172.31.255.254  YES unset  up                    up
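
The election rule described above can be sketched in a few lines of Python. This is a simplified model and not how IOS implements it: the function name `elect_router_id` is made up for illustration, interface state is ignored, and the tunnel interfaces from the output are left out. The static RID of 10.33.33.2 in the second call is only an assumed example value.

```python
import ipaddress
from typing import Optional


def elect_router_id(interfaces: dict[str, Optional[str]],
                    configured_rid: Optional[str] = None) -> Optional[str]:
    """Simplified RID election: static RID, else highest loopback, else highest interface."""
    if configured_rid is not None:
        return configured_rid
    # Keep only interfaces that actually have an address assigned.
    addressed = {name: ipaddress.IPv4Address(ip)
                 for name, ip in interfaces.items() if ip is not None}
    loopbacks = [ip for name, ip in addressed.items() if name.startswith("Loopback")]
    # Loopbacks win over every other interface, regardless of address value.
    candidates = loopbacks or list(addressed.values())
    return str(max(candidates)) if candidates else None


# Addresses taken from the RP_Backup output above (tunnels omitted).
rp_backup = {
    "GigabitEthernet0/0": "10.0.0.14",
    "GigabitEthernet0/1": "10.0.0.10",
    "GigabitEthernet0/2": None,
    "Loopback1": "172.31.255.254",
    "Loopback2": "10.33.33.2",
}

print(elect_router_id(rp_backup))                # 172.31.255.254
print(elect_router_id(rp_backup, "10.33.33.2"))  # 10.33.33.2 (assumed static RID)
```

With no static RID, the anycast address on Loopback1 wins the election, which is the same address RP_Primary ends up with.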

RP_Primary and RP_Backup now have the same EIGRP router ID. EIGRP uses the router ID for protection against routing information loops. The router ID of the router that originates information is carried in the EIGRP update. If an EIGRP router receives an update with its own router ID as the originating router, it discards the route.

We can confirm this by viewing the EIGRP events using the command shown below. This output doesn’t specify which routes are rejected, but it gives us a hint that there are issues.

RP_Backup#show ip eigrp events | include rid
5    22:43:09.524 Ignored route, dup routerid int: 172.31.255.254 
7    22:43:09.524 Ignored route, dup routerid int: 172.31.255.254 
9    22:43:09.524 Ignored route, dup routerid int: 172.31.255.254 
35   22:43:09.291 Ignored route, dup routerid int: 172.31.255.254 
48   22:43:09.290 Ignored route, dup routerid int: 172.31.255.254 
50   22:43:09.290 Ignored route, dup routerid int: 172.31.255.254 
52   22:43:09.290 Ignored route, dup routerid int: 172.31.255.254 
RP_Backup#
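
If you have this output saved from several routers, a small script can summarize it. The sketch below assumes the event lines look exactly like the ones above; the function name `count_duplicate_rid_events` is hypothetical, and the regular expression may need adjusting for other software versions.

```python
import re
from collections import Counter

# Matches the event text shown above and captures the offending router ID.
DUP_RID = re.compile(r"Ignored route, dup routerid int:\s*(\S+)")


def count_duplicate_rid_events(output: str) -> Counter:
    """Count 'dup routerid' events per router ID in saved event output."""
    counts: Counter = Counter()
    for line in output.splitlines():
        match = DUP_RID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
5    22:43:09.524 Ignored route, dup routerid int: 172.31.255.254
7    22:43:09.524 Ignored route, dup routerid int: 172.31.255.254
9    22:43:09.524 Ignored route, dup routerid int: 172.31.255.254
35   22:43:09.291 Ignored route, dup routerid int: 172.31.255.254
48   22:43:09.290 Ignored route, dup routerid int: 172.31.255.254
50   22:43:09.290 Ignored route, dup routerid int: 172.31.255.254
52   22:43:09.290 Ignored route, dup routerid int: 172.31.255.254
"""

for rid, hits in count_duplicate_rid_events(sample).items():
    print(f"{rid}: {hits} ignored routes")
```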

Now that we understand the issue, let’s hop in the time machine and travel back to a time before RP_Backup was rebooted. The topology table output below displays a route originating from Router ID 172.31.255.254, aka RP_Primary.

RP_Backup#show ip eigrp topology 172.16.0.0/30
EIGRP-IPv4 Topology Entry for AS(1)/ID(10.33.33.2) for 172.16.0.0/30
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 3072
  Descriptor Blocks:
  10.0.0.9 (GigabitEthernet0/1), from 10.0.0.9, Send flag is 0x0
      Composite metric is (3072/2816), route is Internal
      Vector metric:
        Minimum bandwidth is 1000000 Kbit
        Total delay is 20 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1500
        Hop count is 1
        Originating router is 172.31.255.254

If we fast forward back to reality where duplicate router IDs exist and check the topology table again, we see that RP_Backup no longer has an entry for 172.16.0.0/30. RP_Backup rejected the route because it saw its own router ID in the update. There are other routes not shown here that were also rejected.

RP_Backup#show ip eigrp topology 172.16.0.0/30
EIGRP-IPv4 Topology Entry for AS(1)/ID(172.31.255.254)
%Entry 172.16.0.0/30 not in topology table
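
The before and after behavior can be modeled with a tiny Python sketch. It is only an illustration of the originating-router check described in this article, not EIGRP's actual implementation; `Update` and `accept_route` are hypothetical names, and the router IDs are the ones shown in the two topology outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """The two fields of a received route that matter for this check."""
    prefix: str
    originating_router: str


def accept_route(local_rid: str, update: Update) -> bool:
    """Reject any route whose originating router ID equals our own RID."""
    return update.originating_router != local_rid


# 172.16.0.0/30 as originated by RP_Primary (router ID 172.31.255.254).
update = Update("172.16.0.0/30", "172.31.255.254")

# Before the reboot, RP_Backup's RID was 10.33.33.2, so the route is accepted.
print(accept_route("10.33.33.2", update))      # True
# After the reboot, RP_Backup's RID is 172.31.255.254, so the route is dropped.
print(accept_route("172.31.255.254", update))  # False
```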

This is an example of why manual configuration of unique router IDs can be a good thing. It takes a little more planning up front and one extra line of configuration during provisioning (for EIGRP on IOS, the eigrp router-id command under the routing process), but it can save you a headache or 12 in the future. A quick(ish) solution to the outage described in this article is to assign RP_Backup a unique EIGRP router ID and reload it so that EIGRP starts up with a unique ID.
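
As part of that up-front planning, a quick sanity check over your intended router IDs can catch collisions before a reboot does. This is a minimal sketch: `find_duplicate_rids` is a made-up helper, and the RID listed for LHR_1 is a hypothetical value that does not come from the demo network.

```python
from collections import defaultdict


def find_duplicate_rids(router_ids: dict[str, str]) -> dict[str, list[str]]:
    """Return every router ID that is claimed by more than one hostname."""
    by_rid: dict[str, list[str]] = defaultdict(list)
    for hostname, rid in router_ids.items():
        by_rid[rid].append(hostname)
    return {rid: sorted(hosts) for rid, hosts in by_rid.items() if len(hosts) > 1}


# The RIDs each router would end up with after its next restart.
planned = {
    "RP_Primary": "172.31.255.254",
    "RP_Backup": "172.31.255.254",
    "LHR_1": "10.255.0.3",  # hypothetical value for illustration
}

for rid, hosts in find_duplicate_rids(planned).items():
    print(f"Duplicate router ID {rid}: {', '.join(hosts)}")
```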