This morning I received a ticket to check a network-down issue...
According to the description, the customer had an isolated site, represented in the diagram below by router R3 (by the way, I'm working with a fictitious network here, to avoid showing our customer's data):
First things first... it could be a link failure (there's only one exit from R3 to the core network, S1/0, and since the site was isolated, any engineer would check the link first), and that was exactly what I did. To my surprise, the link was UP!
OK, so the link is up... I was able to ping the other side of the link, also a good sign... I did a quick check of the router configuration and found that it's running OSPF.
I checked the neighbor relationship between R3 and its neighbor, router R2. Again, everything looked pretty much fine so far!
The real problem began to reveal itself when I checked R3's routing table... It only had its connected networks, plus the network 192.168.1.0... Hmmm... it seems to be an OSPF-related issue!
    R3#sh ip route
    .
    output omitted
    .
    Gateway of last resort is not set
         20.0.0.0/24 is subnetted, 1 subnets
    C       20.3.3.0 is directly connected, Loopback1
         10.0.0.0/32 is subnetted, 1 subnets
    C       10.3.3.3 is directly connected, Loopback0
    O    192.168.1.0/24 [110/128] via 192.168.3.2, 00:00:02, Serial1/0
    C    192.168.3.0/24 is directly connected, Serial1/0
         30.0.0.0/24 is subnetted, 1 subnets
    C       30.3.3.0 is directly connected, Loopback2
I quickly jumped to R2 to check its routing table... I found only entries for R1's networks, plus its directly connected networks; nothing related to router R3.
    R2#sh ip route
    .
    output omitted
    .
    Gateway of last resort is not set
         20.0.0.0/32 is subnetted, 1 subnets
    O       20.1.1.1 [110/65] via 192.168.1.1, 00:00:02, Serial1/0
         10.0.0.0/32 is subnetted, 1 subnets
    O       10.1.1.1 [110/65] via 192.168.1.1, 00:00:02, Serial1/0
    C    192.168.1.0/24 is directly connected, Serial1/0
    C    192.168.3.0/24 is directly connected, Serial1/1
         30.0.0.0/32 is subnetted, 1 subnets
    O       30.1.1.1 [110/65] via 192.168.1.1, 00:00:02, Serial1/0
OK, now I've found the problem... take a look at the neighbor relationships on R2 (using the show ip ospf neighbor command):
    R2#sh ip ospf neighbor
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    10.1.1.1          0   FULL/  -        00:00:33    192.168.3.3     Serial1/1
    10.1.1.1          0   FULL/  -        00:00:37    192.168.1.1     Serial1/0
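Can you see the problem?! Of course you can! Both routers (R1 and R3) have the SAME ROUTER ID! OSPF identifies the originator of every LSA by its router ID, so with two routers claiming 10.1.1.1 their router LSAs keep overwriting each other in the database. That's why the adjacencies look fine while the routes are missing.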
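If you'd rather not rely on your eyes, the same check can be scripted. Below is a minimal Python sketch that scans the text of a show ip ospf neighbor output for a neighbor ID that appears more than once. The function name, the regular expression and the column spacing of the sample text are my own assumptions for illustration, not part of any Cisco tool. Keep in mind that the same RID seen on two interfaces can also be legitimate (parallel links to one router), so treat a hit as a hint to investigate, not as proof.

```python
import re
from collections import defaultdict

# Sample text modelled on the R2 output above (column spacing is illustrative).
NEIGHBOR_OUTPUT = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.1.1          0   FULL/  -        00:00:33    192.168.3.3     Serial1/1
10.1.1.1          0   FULL/  -        00:00:37    192.168.1.1     Serial1/0
"""

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
# A data row starts with the neighbor ID and ends with "<peer address> <interface>".
ROW = re.compile(rf"^\s*({IPV4})\s+.*\s({IPV4})\s+(\S+)\s*$")


def find_duplicate_router_ids(output):
    """Return {router_id: [(address, interface), ...]} for IDs seen more than once."""
    seen = defaultdict(list)
    for line in output.splitlines():
        match = ROW.match(line)
        if match:  # header and prompt lines do not match and are skipped
            rid, address, interface = match.groups()
            seen[rid].append((address, interface))
    return {rid: peers for rid, peers in seen.items() if len(peers) > 1}


if __name__ == "__main__":
    for rid, peers in find_duplicate_router_ids(NEIGHBOR_OUTPUT).items():
        print(f"Router ID {rid} seen on several adjacencies: {peers}")
```

Run against the table above, it reports 10.1.1.1 together with both peer addresses (192.168.3.3 and 192.168.1.1), which is exactly the clue that gave the problem away.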
I couldn't believe what I saw! The customer told me that "nobody" had touched the router, at least for the last couple of weeks!
OK, just by running debug ip ospf hello on R2 you can see the same issue... both routers (R1 and R3) using 10.1.1.1 as their router ID.
    R2#debug ip ospf hello
    *Mar 1 00:11:16.039: OSPF: Rcv hello from 10.1.1.1 area 0 from Serial1/0 192.168.1.1
    *Mar 1 00:11:16.043: OSPF: End of hello processing
    *Mar 1 00:11:16.403: OSPF: Rcv hello from 10.1.1.1 area 0 from Serial1/1 192.168.3.3
    *Mar 1 00:11:16.407: OSPF: End of hello processing
Looking at the router IDs, I found R1 RID: 10.1.1.1, R2 RID: 10.1.1.2 and R3 RID: 10.1.1.1. I asked the customer whether I could change R3's RID to 10.1.1.3 or whether he had another RID he'd prefer, and received his GO-AHEAD to change it!
After entering the router-id 10.1.1.3 command in R3's OSPF configuration and, of course, running clear ip ospf process (on R3) so the new RID took effect, things looked a lot better!
Take a look at R2's neighbor table now:
    Rack1R2#sh ip ospf neighbor
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    10.1.1.3          0   FULL/  -        00:00:33    192.168.3.3     Serial1/1
    10.1.1.1          0   FULL/  -        00:00:37    192.168.1.1     Serial1/0
And finally, R2's routing table with all possible routes:
    Rack1R2#sh ip route
    .
    output omitted
    .
    Gateway of last resort is not set
         20.0.0.0/32 is subnetted, 2 subnets
    O       20.1.1.1 [110/65] via 192.168.1.1, 00:00:05, Serial1/0
    O       20.3.3.3 [110/65] via 192.168.3.3, 00:00:05, Serial1/1
         10.0.0.0/32 is subnetted, 2 subnets
    O       10.3.3.3 [110/65] via 192.168.3.3, 00:00:05, Serial1/1
    O       10.1.1.1 [110/65] via 192.168.1.1, 00:00:05, Serial1/0
    C    192.168.1.0/24 is directly connected, Serial1/0
    C    192.168.3.0/24 is directly connected, Serial1/1
         30.0.0.0/32 is subnetted, 2 subnets
    O       30.3.3.3 [110/65] via 192.168.3.3, 00:00:05, Serial1/1
    O       30.1.1.1 [110/65] via 192.168.1.1, 00:00:05, Serial1/0
So... after 5 minutes of troubleshooting everything was fine! But this little issue kept our customer's network down for a while, all because of a mistake some "ghost" made in their router configuration!
You might be asking how the RID is selected / configured:
1 - If you configure the RID with the command router-id <address> (example: router-id 10.1.1.1) under OSPF configuration mode, this manually configured RID is used;
2 - If no RID is configured, the router uses the highest loopback IP address it finds when the OSPF process starts;
3 - If no loopbacks are configured, the router uses the highest IP address configured on any physical interface;
4 - If no RID is configured and the router has no interface with an IP address, the OSPF process cannot start!
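The four rules above can be sketched in a few lines of Python. This is only a model of the selection order as described here, not router code: the function name and its parameters are hypothetical, and it ignores details such as whether an interface is up. Note that the addresses are compared numerically (via the standard ipaddress module), not as strings.

```python
import ipaddress


def select_router_id(configured_rid=None, loopback_ips=(), physical_ips=()):
    """Model the RID selection order described in rules 1 to 4 above."""
    # Rule 1: a manually configured router-id always wins.
    if configured_rid is not None:
        return str(ipaddress.IPv4Address(configured_rid))
    # Rule 2, then rule 3: highest loopback address, else highest physical address.
    for candidates in (loopback_ips, physical_ips):
        if candidates:
            return str(max(ipaddress.IPv4Address(ip) for ip in candidates))
    # Rule 4: nothing to choose from, so the process cannot start.
    raise RuntimeError("no router ID available: OSPF process cannot start")


if __name__ == "__main__":
    # R3's addresses as seen in the routing tables of this post.
    r3_loopbacks = ["10.3.3.3", "20.3.3.3", "30.3.3.3"]
    r3_physical = ["192.168.3.3"]
    print(select_router_id(None, r3_loopbacks, r3_physical))
    print(select_router_id("10.1.1.3", r3_loopbacks, r3_physical))
```

With R3's loopbacks and no router-id configured, the sketch picks 30.3.3.3, even though 192.168.3.3 on the serial link is numerically higher. That suggests the 10.1.1.1 seen on R3 did not come from automatic selection; somebody typed it in.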
Just be careful with the OSPF RID: it has to be unique in your network! Otherwise, you can have huge problems! Keep your network well documented, and do not make changes in a live environment without studying the side effects first!