PIM RP Redundancy that (kinda) Works

I spent a few minutes thinking about anycast RP for PIM sparse-mode deployments and came up with a way to provide RP redundancy without Auto-RP or BSR to learn RPs dynamically, and without MSDP to synchronize active-source information between anycast RPs. There is a major pitfall to this approach, but it can work.

Demo

I built the topology below and gave the nodes descriptive names. All routers participating in multicast routing have the RP address 172.31.255.254 statically configured. MCAST_SRC and both MCAST_RCVRs act as end hosts and have only a default route. All transit routers run OSPFv2. The multicast receivers are statically configured to join group 239.240.240.240.
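The static RP and receiver setup described above might look roughly like the following. This is a sketch, not the configuration actually used in the lab: the RP address and group come from the text, LHR_1's interface names are taken from its show output later in the post, and the receiver's interface name is an assumption (as is the receiver being an IOS device).

```
! On every router participating in multicast routing (LHR_1 shown)
ip multicast-routing
ip pim rp-address 172.31.255.254
!
interface GigabitEthernet0/1
 ip pim sparse-mode
!
interface GigabitEthernet0/0
 ip pim sparse-mode
!
! On each receiver (interface name is an assumption)
interface GigabitEthernet0/0
 ip igmp join-group 239.240.240.240
```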

RP_Primary has a loopback interface configured with 172.31.255.254/32. This interface participates in OSPF via a network statement.

RP_Primary#show running-config interface loopback 1
Building configuration...

Current configuration : 95 bytes
!
interface Loopback1
 description Primary RP 
 ip address 172.31.255.254 255.255.255.255
end
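The network statement itself does not appear in the output above. A minimal sketch of what it could look like, assuming OSPF process 1 and area 0 (both borrowed from the RP_Backup configuration shown later, not confirmed for RP_Primary):

```
router ospf 1
 ! Advertise only the loopback's host address (process ID and area are assumptions)
 network 172.31.255.254 0.0.0.0 area 0
```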

RP_Backup has a loopback configured with 172.31.255.254/30. Because OSPF advertises loopback interfaces as /32 host routes regardless of the configured mask, I did not place this loopback under a network statement. Instead, I redistributed connected routes into OSPF and used a prefix list and route-map so that only the loopback's /30 prefix is redistributed.

RP_Backup#show running-config interface loopback 1
Building configuration...

Current configuration : 93 bytes
!
interface Loopback1
 description RP Backup
 ip address 172.31.255.254 255.255.255.252
end

RP_Backup#show run | section ospf
router ospf 1
 router-id 5.5.5.5
 redistribute connected metric-type 1 subnets route-map RP
 network 10.0.0.8 0.0.0.3 area 0
 network 10.0.0.12 0.0.0.3 area 0

RP_Backup#show route-map
route-map RP, permit, sequence 10
  Match clauses:
    ip address prefix-lists: RP 
  Set clauses:
  Policy routing matches: 0 packets, 0 bytes

RP_Backup#show ip prefix-list
ip prefix-list RP: 1 entries
   seq 5 permit 172.31.255.252/30
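For reference, the show output above corresponds to configuration along these lines. The prefix-list entry, route-map sequence, and redistribute statement are reconstructed from that output; the exact lines that were entered are not shown, so treat this as a sketch.

```
! Match only the /30 that contains the RP address
ip prefix-list RP seq 5 permit 172.31.255.252/30
!
route-map RP permit 10
 match ip address prefix-list RP
!
router ospf 1
 ! Redistribute connected routes as E1, filtered down to the loopback /30
 redistribute connected metric-type 1 subnets route-map RP
```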

Using a shorter subnet mask (/30) on the backup RP router lets multicast routers prefer the primary RP through longest-match routing: while both routes are advertised, the /32 from RP_Primary is more specific than the /30 from RP_Backup. If the primary RP goes down, the /32 route is withdrawn from OSPF and the /30 route to RP_Backup takes over.
We can verify this by checking the routing table on either of the last hop routers (LHR).

LHR_1#show ip route 172.31.255.254
Routing entry for 172.31.255.254/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 10.0.0.17 on GigabitEthernet0/1, 00:21:44 ago
  Routing Descriptor Blocks:
  * 10.0.0.17, from 172.31.255.255, 00:21:44 ago, via GigabitEthernet0/1
      Route metric is 3, traffic share count is 1
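To see both candidate routes side by side while the primary is still up, one option is to ask for the /30 and everything more specific beneath it. With both RPs advertising, this would be expected to list the /30 external route and the /32 intra-area route. The output is not reproduced here because it was not captured in the original test.

```
! Run on LHR_1 (or any transit router) before shutting down the primary loopback
show ip route 172.31.255.252 255.255.255.252 longer-prefixes
```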

With the primary RP route verified, I shut down the RP interface on RP_Primary.

RP_Primary#configure terminal 
Enter configuration commands, one per line.  End with CNTL/Z.
RP_Primary(config)#interface loopback 1
RP_Primary(config-if)#shutdown

The /30 route to the RP_Backup loopback interface is now the best (and only) route in the routing table to reach the statically configured RP address.

LHR_1#show ip route 172.31.255.254
Routing entry for 172.31.255.252/30
  Known via "ospf 1", distance 110, metric 22, type extern 1
  Last update from 10.0.0.17 on GigabitEthernet0/1, 00:43:48 ago
  Routing Descriptor Blocks:
  * 10.0.0.17, from 5.5.5.5, 00:43:48 ago, via GigabitEthernet0/1
      Route metric is 22, traffic share count is 1

Next, I validated that multicast still works via the backup RP.

MCAST_SRC#ping 239.240.240.240 repeat 5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 239.240.240.240, timeout is 2 seconds:

Reply to request 0 from 192.168.1.1, 66 ms
Reply to request 1 from 192.168.1.1, 15 ms
Reply to request 1 from 192.168.1.1, 30 ms
Reply to request 1 from 192.168.1.1, 21 ms
Reply to request 2 from 192.168.1.1, 10 ms
Reply to request 3 from 192.168.1.1, 7 ms
Reply to request 4 from 192.168.1.1, 12 ms

LHR_1#show ip mroute 239.240.240.240
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report, 
       Z - Multicast Tunnel, z - MDT-data group sender, 
       Y - Joined MDT-data group, y - Sending to MDT-data group, 
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute, 
       N - Received BGP Shared-Tree Prune, n - BGP C-Mroute suppressed, 
       Q - Received BGP S-A Route, q - Sent BGP S-A Route, 
       V - RD & Vector, v - Vector, p - PIM Joins on route, 
       x - VxLAN group
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.240.240.240), 00:41:19/stopped, RP 172.31.255.254, flags: SJC
  Incoming interface: GigabitEthernet0/1, RPF nbr 10.0.0.17
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:41:19/00:02:15

(172.16.0.2, 239.240.240.240), 00:01:13/00:01:46, flags: JT
  Incoming interface: GigabitEthernet0/1, RPF nbr 10.0.0.17
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:01:13/00:02:15
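Two more checks could help confirm how LHR_1 reaches the RP address after the failover. Both are standard IOS show commands; their output was not captured in the original test, so none is shown here.

```
! On LHR_1: confirm the static group-to-RP mapping is still in place
show ip pim rp mapping
! On LHR_1: confirm the RPF interface and neighbor used toward the RP address
show ip rpf 172.31.255.254
```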

The Pitfall of This Approach

The major issue with this approach is that the router acting as the backup RP cannot be part of the multicast distribution tree while the primary RP is active. The backup RP has a connected route for the RP address because the address is configured on its own loopback, so it treats itself as the RP and will not use the primary RP. I had to deliberately build a topology that keeps the backup RP out of the tree when the primary RP is available. If the link between Transit_RTR and RP_Primary fails, multicast routing breaks.

To prove this, I disconnected MCAST_RCVR2 from the network and then disconnected the link between Transit_RTR and RP_Primary. I then tried to ping the subscribed multicast group: no responses were received, and no (S,G) state was populated on LHR_1.

MCAST_SRC#ping 239.240.240.240 repeat 5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 239.240.240.240, timeout is 2 seconds:
.....

LHR_1#show ip mroute 239.240.240.240
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report, 
       Z - Multicast Tunnel, z - MDT-data group sender, 
       Y - Joined MDT-data group, y - Sending to MDT-data group, 
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute, 
       N - Received BGP Shared-Tree Prune, n - BGP C-Mroute suppressed, 
       Q - Received BGP S-A Route, q - Sent BGP S-A Route, 
       V - RD & Vector, v - Vector, p - PIM Joins on route, 
       x - VxLAN group
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.240.240.240), 00:47:18/00:02:18, RP 172.31.255.254, flags: SJC
  Incoming interface: GigabitEthernet0/1, RPF nbr 10.0.0.17
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:47:18/00:02:18
