OSPF in Phase 1 DMVPN Networks

When designing hub and spoke networks, most architects will opt for distance or path vector routing protocols when given the choice. I don’t blame them; distance/path vector protocols provide several benefits that work well in hub and spoke topologies. EIGRP-based hub and spoke networks can scale to thousands of spokes without much work. Route summarization on the hub, combined with Cisco’s EIGRP stub feature on the spokes, helps with the scalability and stability of the network. BGP provides similar benefits with route summarization and allows for simple traffic engineering in multi-hub networks. Suppose, though, that for some business reason you’re cornered into deploying OSPF in a hub and spoke network. If your network is small enough, any sane routing design will probably work. If you need greater scalability in a hub and spoke network with link state routing, the design deserves careful consideration.
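
For contrast, here is a minimal sketch of the EIGRP pattern just described. The AS number and the summary prefix are hypothetical and not part of this lab:

! Hub: advertise one summary toward the spokes
! instead of every individual LAN prefix
interface Tunnel1
 ip summary-address eigrp 1 10.0.0.0 255.0.0.0
!
! Spokes: stub routers are excluded from the query scope,
! which keeps convergence events local as the spoke count grows
router eigrp 1
 eigrp stub connected summary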

Disclaimers:

I can’t provide a magic number of routers or prefixes that must be reached to consider a network “large” or at what point more careful network design is needed. Every design decision in computer networks has tradeoffs. Throughout my networking career, I’ve only worked in operations. I’ve never designed any portion of a production network so please don’t consider anything here to be authoritative. No warranty expressed or implied.

Challenges of OSPF Hub and Spoke:

For this article and demonstration, I’m going to focus on challenges and considerations for using OSPF in a phase 1 DMVPN network. Phase 1 means that all spoke-to-spoke traffic must traverse a hub router; no dynamic spoke-to-spoke tunnels are established. Most link state routing deployments require that all routers within an area or flooding domain have “eventually consistent” copies of the link state database. DMVPN uses a single tunnel interface on the hub to connect to all spokes, and since an interface can belong to only one area, this NBMA property forces the hub and all spokes into the same OSPF area. With all routers in the same area, there is no opportunity for IP prefix summarization, so every spoke typically maintains LSDB information for every other spoke. More memory is consumed by LSDB entries, and a spoke going up or down may trigger SPF calculations throughout the area. Other points to consider are the selection of the appropriate OSPF network type on the tunnel interfaces and the integration of the hub and spoke with the rest of the network.

Demo Network:

I’m using a simple four-node network for demonstration: one hub, two spokes, and an intermediate node that provides IP connectivity between the DMVPN routers. Each router connects to the intermediate node with routed point-to-point links in unique subnets.
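
The tunnel configuration isn’t shown in the outputs, so here is a minimal sketch consistent with the addressing that appears later (hub tunnel 172.16.0.1/16, hub NBMA 192.0.2.1). The tunnel source interfaces and the NHRP network-id are assumptions, as is the classic phase 1 choice of point-to-point GRE on the spokes:

! Hub (R1): a single mGRE interface terminates every spoke
interface Tunnel1
 ip address 172.16.0.1 255.255.0.0
 ip nhrp map multicast dynamic
 ip nhrp network-id 1
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
!
! Spoke (R2): point-to-point GRE aimed at the hub's NBMA address,
! which forces all spoke-to-spoke traffic through the hub
interface Tunnel1
 ip address 172.16.0.2 255.255.0.0
 ip nhrp network-id 1
 ip nhrp map 172.16.0.1 192.0.2.1
 ip nhrp nhs 172.16.0.1
 tunnel source GigabitEthernet0/0
 tunnel destination 192.0.2.1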

Router IDs:

R1 (hub): 1.1.1.1
R2: 2.2.2.2
R3: 3.3.3.3
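
These were presumably pinned explicitly under the OSPF process (process ID 1 matches the outputs that follow); for example on R1:

router ospf 1
 router-id 1.1.1.1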

Output from hub with two dynamic neighbors:

RoutingLoop_R1#show dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
        N - NATed, L - Local, X - No Socket
        T1 - Route Installed, T2 - Nexthop-override
        C - CTS Capable, I2 - Temporary
        # Ent --> Number of NHRP entries with same NBMA peer
        NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
        UpDn Time --> Up or Down Time for a Tunnel
==========================================================================

Interface: Tunnel1, IPv4 NHRP Details
Type:Hub, NHRP Peers:2,

 # Ent  Peer NBMA Addr Peer Tunnel Add State  UpDn Tm Attrb
 ----- --------------- --------------- ----- -------- -----
     1 198.51.100.1         172.16.0.2    UP 02:48:23     D
     1 203.0.113.1          172.16.0.3    UP 02:46:46     D

OSPF Network Types:

This was the most fun part of this lab. I didn’t try every possible network type, or mismatched types with timer adjustments. Using DMVPN with network types that lack dynamic neighbor discovery (non-broadcast and point-to-multipoint non-broadcast, which require static neighbor statements) seemed maddening, so I didn’t try those. I experimented with point-to-point, broadcast, and point-to-multipoint, and I only recommend one of these settings for phase 1. This section only pertains to the network type on the tunnel interfaces.
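
In each case the network type is set the same way, on the tunnel interface of every router; shown here for point-to-point, substituting broadcast or point-to-multipoint as needed:

interface Tunnel1
 ip ospf network point-to-point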

Point-to-Point:
This network type works fine if you only have the hub and one spoke (if that even counts as hub and spoke?). Things quickly fall apart as soon as the hub tries to hold two OSPF neighbors on a point-to-point interface. Since only one neighbor is allowed, the hub oscillates between trying to form an adjacency with each spoke. The CPU utilization on my hub router quickly reached 99%.

*Jun 21 10:17:40.559: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:40.563: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:40.567: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Tunnel1 from EXSTART to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:40.571: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:40.575: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:41.415: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:41.415: %OSPF-4-NONEIGHBOR: Received database description from unknown neighbor 3.3.3.3
*Jun 21 10:17:42.127: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 21 10:17:42.987: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Tunnel1 from EXCHANGE to DOWN, Neighbor Down: Adjacency forced to reset

RoutingLoop_R1#show proc cpu history

RoutingLoop_R1   10:19:12 AM Wednesday Jun 21 2023 UTC




                          9999999999999999999999999999999999999999
      111112222211111     5555599999999999999999999999999999999999
  100                     ****************************************
   90                     ****************************************
   80                     ****************************************
   70                     ****************************************
   60                     ****************************************
   50                     ****************************************
   40                     ****************************************
   30                     ****************************************
   20                     ****************************************
   10                     ****************************************
     0....5....1....1....2....2....3....3....4....4....5....5....6
               0    5    0    5    0    5    0    5    0    5    0
               CPU% per second (last 60 seconds)

Broadcast:
This was the most interesting of the bunch. Remember that this is running over a non-broadcast network. I started by intentionally configuring R2 to become a designated router; this probably wasn’t necessary given the router IDs mentioned previously. Because this is NBMA, R2 and R3 cannot send multicast OSPF packets to each other, so R2 and R3 each became a designated router within the same IP subnet. This puts R1 in a weird spot: it cannot accept having two designated routers on the same link. It also causes reachability issues. Both spokes think they’re the DR, but neither spoke can flood LSAs to the other. Route installation on the hub was inconsistent.
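
Forcing R2 to win the election was presumably done with interface priority; the value 255 shows up in R2’s output below:

interface Tunnel1
 ip ospf priority 255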

RoutingLoop_R2#show ip ospf interface tunnel 1
Tunnel1 is up, line protocol is up
  Internet Address 172.16.0.2/16, Area 0, Attached via Network Statement
  Process ID 1, Router ID 2.2.2.2, Network Type BROADCAST, Cost: 5
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           5         no          no            Base
  Transmit Delay is 1 sec, State DR, Priority 255
  Designated Router (ID) 2.2.2.2, Interface address 172.16.0.2
  Backup Designated router (ID) 1.1.1.1, Interface address 172.16.0.1
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40
    Hello due in 00:00:08
  Supports Link-local Signaling (LLS)
  Cisco NSF helper support enabled
  IETF NSF helper support enabled
  Index 1/5/5, flood queue length 0
  Next 0x0(0)/0x0(0)/0x0(0)
  Last flood scan length is 0, maximum is 1
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 1, Adjacent neighbor count is 1
    Adjacent with neighbor 1.1.1.1  (Backup Designated Router)
  Suppress hello for 0 neighbor(s)



RoutingLoop_R3#show ip ospf interface tunnel 1
Tunnel1 is up, line protocol is up
  Internet Address 172.16.0.3/16, Area 0, Attached via Network Statement
  Process ID 1, Router ID 3.3.3.3, Network Type BROADCAST, Cost: 5
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           5         no          no            Base
  Transmit Delay is 1 sec, State DR, Priority 1
  Designated Router (ID) 3.3.3.3, Interface address 172.16.0.3
  Backup Designated router (ID) 1.1.1.1, Interface address 172.16.0.1
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40
    Hello due in 00:00:03
  Supports Link-local Signaling (LLS)
  Cisco NSF helper support enabled
  IETF NSF helper support enabled
  Index 1/1/1, flood queue length 0
  Next 0x0(0)/0x0(0)/0x0(0)
  Last flood scan length is 0, maximum is 1
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 1, Adjacent neighbor count is 1
    Adjacent with neighbor 1.1.1.1  (Backup Designated Router)
  Suppress hello for 0 neighbor(s)

When OSPF first came up with the broadcast network type on all tunnels, the hub installed routes only for R2.

RoutingLoop_R1#show ip route | begin Gateway
Gateway of last resort is 192.0.2.2 to network 0.0.0.0

S*    0.0.0.0/0 [1/0] via 192.0.2.2, GigabitEthernet0/0
      10.0.0.0/24 is subnetted, 4 subnets
O        10.2.1.0 [110/101] via 172.16.0.2, 00:00:56, Tunnel1
O        10.2.2.0 [110/101] via 172.16.0.2, 00:00:56, Tunnel1
O        10.2.3.0 [110/101] via 172.16.0.2, 00:00:56, Tunnel1
O        10.2.4.0 [110/101] via 172.16.0.2, 00:00:56, Tunnel1
      172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
C        172.16.0.0/16 is directly connected, Tunnel1
L        172.16.0.1/32 is directly connected, Tunnel1
      192.0.2.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.0.2.0/30 is directly connected, GigabitEthernet0/0
L        192.0.2.1/32 is directly connected, GigabitEthernet0/0

I cleared the OSPF process on the hub (clear ip ospf process) and, magically, routes for both spokes were installed in the RIB. Routing still did not work, though: the spokes had no OSPF routes in their RIBs.

RoutingLoop_R1#show ip route ospf | begin Gateway 
Gateway of last resort is 192.0.2.2 to network 0.0.0.0

      10.0.0.0/24 is subnetted, 8 subnets
O        10.2.1.0 [110/101] via 172.16.0.2, 00:01:32, Tunnel1
O        10.2.2.0 [110/101] via 172.16.0.2, 00:01:32, Tunnel1
O        10.2.3.0 [110/101] via 172.16.0.2, 00:01:32, Tunnel1
O        10.2.4.0 [110/101] via 172.16.0.2, 00:01:32, Tunnel1
O        10.3.1.0 [110/101] via 172.16.0.3, 00:01:32, Tunnel1
O        10.3.2.0 [110/101] via 172.16.0.3, 00:01:32, Tunnel1
O        10.3.3.0 [110/101] via 172.16.0.3, 00:01:32, Tunnel1
O        10.3.4.0 [110/101] via 172.16.0.3, 00:01:32, Tunnel1

I then set the spoke OSPF priority to 0 and the hub’s to 255, then reset OSPF from the hub (sketched below). These settings consistently provided full routing information propagation and reachability. It’s not a design I would recommend for a phase 1 DMVPN, but it works. One downside is that neighbor adjacency formation is needlessly slow, since a DR election must take place. Another unfavorable property is the IP next hop for spoke-to-spoke traffic. Broadcast implies that all routers on the segment are data link layer adjacent and thus directly reachable, so R2’s routes to R3’s LAN have a next hop of R3’s tunnel address and vice versa. The traffic must still go through the hub because of the spokes’ phase 1 tunnel configuration. Traffic forwarding seems to work fine, but it’s not intuitive and may confuse network operators. This would probably be the best option for OSPF in a phase 2 DMVPN, though.
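
A minimal sketch of the working broadcast arrangement, using the priority values described above:

! Hub: always win the DR election
interface Tunnel1
 ip ospf priority 255
!
! Each spoke: never participate in the DR election
interface Tunnel1
 ip ospf priority 0

Then reset OSPF on the hub so the election re-runs:

RoutingLoop_R1#clear ip ospf process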

Point-to-Multipoint:

Point-to-Multipoint on all tunnel interfaces is my recommendation for phase 1 DMVPNs. It just works, with no tweaks or knobs turned: dynamic neighbor discovery between the hub and spokes succeeds and no DR is elected. It also gives destinations reachable via the hub “next hop self” style behavior, so routes from spoke A to spoke B have the hub’s tunnel address as the next hop. Note the /32 entries in the output below; point-to-multipoint advertises a host route for each tunnel endpoint.

RoutingLoop_R2#show ip route ospf | begin Gateway
Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 12 subnets, 2 masks
O        10.3.1.0/24 [110/106] via 172.16.0.1, 00:01:48, Tunnel1
O        10.3.2.0/24 [110/106] via 172.16.0.1, 00:01:48, Tunnel1
O        10.3.3.0/24 [110/106] via 172.16.0.1, 00:01:48, Tunnel1
O        10.3.4.0/24 [110/106] via 172.16.0.1, 00:01:48, Tunnel1
      172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O        172.16.0.1/32 [110/5] via 172.16.0.1, 00:01:48, Tunnel1
O        172.16.0.3/32 [110/105] via 172.16.0.1, 00:01:48, Tunnel1

LSDB Synchronization:

The most obvious problem with link state routing in hub and spoke topologies is link state database synchronization. As touched on earlier, the NBMA properties of DMVPN require all spokes to be in the same OSPF area. This restriction means that we can’t use summarization on the hub as a method of information hiding. By default, every spoke must maintain LSDB and routing table information for all other spokes. Any time there is a topology change, the update must be flooded to all routers in the area, and SPF calculations may be triggered.

If your OSPF hub and spoke connects to other OSPF-speaking parts of the network, consider putting the DMVPN portion in its own non-backbone area. Doing so provides the opportunity for bidirectional summarization and/or configuration of the hub and spoke area as a stub or totally stubby area (sketched below). Remember that there is no free lunch: techniques that remove or hide routing information can result in suboptimal paths.
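
A sketch of that alternative, assuming the tunnels are moved to a hypothetical area 1 with the hub acting as ABR; only the relevant lines are shown:

! Hub (ABR): totally stubby area plus a summary toward the backbone
router ospf 1
 network 172.16.0.0 0.0.255.255 area 1
 area 1 stub no-summary
 area 1 range 10.0.0.0 255.0.0.0
!
! Each spoke: the stub flag must match on all routers in the area
router ospf 1
 network 172.16.0.0 0.0.255.255 area 1
 area 1 stub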

The option that should allow for the largest number of OSPF routers in a hub and spoke is to not synchronize the LSDB at all. This sounds like we’re breaking the link state rules, but it works. The approach requires filtering all outbound LSAs on the hub and using static routes on the spokes for upstream forwarding. It works well if you have only one hub and all outbound traffic should transit it, or if you have multiple hubs with an addressing design that aggregates nicely. For this single-hub demo, I used a simple static default route toward the hub and filtered all LSAs outbound from the hub. Longest-match techniques and/or object tracking can be used for static route failover in multi-hub networks; a sketch appears after the configuration below.

RoutingLoop_R2(config)#ip route 0.0.0.0 0.0.0.0 tunnel 1 172.16.0.1

RoutingLoop_R1(config)#interface tunnel 1
RoutingLoop_R1(config-if)#ip ospf database-filter all out

This implementation means that no OSPF routing information is propagated to the spokes, but the hub still receives LSAs from the spokes like normal. If R3 goes down, R2 will never be informed. This isn’t all that different from using EIGRP summarization on the hub.
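
For the multi-hub failover idea mentioned above, here is a hedged sketch using IP SLA and object tracking on a spoke. It assumes a dual-hub layout where both hubs are reachable on the same tunnel; the second hub’s address (172.16.0.254) and the SLA/track numbers are hypothetical:

! Probe the primary hub's tunnel address
ip sla 1
 icmp-echo 172.16.0.1 source-interface Tunnel1
ip sla schedule 1 life forever start-time now
track 1 ip sla 1 reachability
!
! Primary default via the hub while the probe succeeds;
! floating static (AD 250) via the second hub otherwise
ip route 0.0.0.0 0.0.0.0 Tunnel1 172.16.0.1 track 1
ip route 0.0.0.0 0.0.0.0 Tunnel1 172.16.0.254 250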

Hub learning spoke routes with outbound LSA filtering enabled:

RoutingLoop_R1#show ip route ospf | begin Gateway
Gateway of last resort is 192.0.2.2 to network 0.0.0.0

      10.0.0.0/24 is subnetted, 8 subnets
O        10.2.1.0 [110/101] via 172.16.0.2, 00:12:52, Tunnel1
O        10.2.2.0 [110/101] via 172.16.0.2, 00:12:52, Tunnel1
O        10.2.3.0 [110/101] via 172.16.0.2, 00:12:52, Tunnel1
O        10.2.4.0 [110/101] via 172.16.0.2, 00:12:52, Tunnel1
O        10.3.1.0 [110/101] via 172.16.0.3, 00:01:02, Tunnel1
O        10.3.2.0 [110/101] via 172.16.0.3, 00:01:02, Tunnel1
O        10.3.3.0 [110/101] via 172.16.0.3, 00:01:02, Tunnel1
O        10.3.4.0 [110/101] via 172.16.0.3, 00:01:02, Tunnel1
      172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O        172.16.0.2/32 [110/100] via 172.16.0.2, 00:12:52, Tunnel1
O        172.16.0.3/32 [110/100] via 172.16.0.3, 00:01:02, Tunnel1
