M4300 replies to ARP for an external IP with the lowest VRRP VMAC even if that VRID is in BACKUP state
I found this while investigating a packet loss issue at a site with the following setup:
- 3 identical XSM4316S switches connected to each other in a triangle: sw1, sw2, sw3
- STP is used. Normally STP chooses to block the sw1-sw3 link, leaving sw1-sw2 and sw2-sw3 operational
- sw1 has an internet connection
- sw3 has a separate internet connection
- sw1 is Master for VRRP IP with VRID 10
- sw3 is Backup for VRRP IP with VRID 10
- sw3 is Master for VRRP IP with VRID 20
- sw1 is Backup for VRRP IP with VRID 20
In this setup, if someone sends an ARP request for an address external to this network, say 126.96.36.199, both routers (sw1 and sw3) will reply. sw1 replies with VMAC_FOR_VRID_10, and the bad thing is that sw3 also replies with VMAC_FOR_VRID_10, despite the fact that VRID 10 is in BACKUP state on sw3!
This is a problem for hosts behind sw2, as both VMACs are non-local to it. When sw2 sees those two replies with the same source MAC (VMAC_FOR_VRID_10) on different ports, its address table is polluted for a short period if sw3's reply arrives last.
This short period is when pings are lost for hosts behind sw2 that use VRID 10 as their gateway. Strictly speaking they are not lost: they are forwarded to the wrong port (the port facing sw3 instead of the port facing sw1) and discarded by sw3.
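The address table pollution described above can be illustrated with a toy model of a learning switch. This is only a sketch: the port names are made up for illustration, and it assumes the simplest "last frame seen wins" learning rule rather than anything specific to the M4300.

```python
# Toy model of sw2's MAC address table: the last frame seen wins.
def learn(frames):
    """frames: iterable of (src_mac, ingress_port); returns the final table."""
    table = {}
    for src_mac, port in frames:
        table[src_mac] = port  # a learning switch overwrites the previous port
    return table

VMAC_10 = "00:00:5e:00:01:0a"  # VMAC for VRID 10, as seen in the dumps

# Both sw1 (Master) and sw3 (Backup) answer with the same source MAC.
# Port names are hypothetical labels, not real interface names.
replies = [
    (VMAC_10, "port_to_sw1"),  # reply from the VRID 10 Master
    (VMAC_10, "port_to_sw3"),  # reply from the VRID 10 Backup, arriving last
]

table = learn(replies)
print(table[VMAC_10])  # port_to_sw3
```

With the replies in the opposite order the table ends up pointing at sw1, which would match the loss only showing up when sw3's reply comes last.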
Removing VRID 10 or making it inactive on sw3 makes sw3 reply with VMAC_FOR_VRID_20, and no problem occurs.
Considering that sw1 and sw3 are identical and have similar configs, and that sw1 replies with the correct VMAC, my guess is that the switch replies with the lowest VMAC it has, without taking into account that the VMAC's VRID is in BACKUP state.
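To make the guess concrete, here is a small sketch comparing the suspected selection rule with the one that would be expected. Both functions are hypothetical models of the behaviour, not the switch's actual logic; the VMAC format 00:00:5e:00:01:{VRID} is the standard VRRP IPv4 virtual MAC.

```python
def vmac(vrid):
    """Standard VRRP IPv4 virtual MAC: 00:00:5e:00:01:{VRID in hex}."""
    return f"00:00:5e:00:01:{vrid:02x}"

def suspected_reply_vmac(vrids):
    """Guessed behaviour: lowest configured VRID wins, state is ignored."""
    return vmac(min(vrids))

def expected_reply_vmac(vrids):
    """Expected behaviour: only VRIDs in MASTER state are candidates."""
    masters = [vrid for vrid, state in vrids.items() if state == "MASTER"]
    return vmac(min(masters)) if masters else None

sw1 = {10: "MASTER", 20: "BACKUP"}
sw3 = {10: "BACKUP", 20: "MASTER"}

print(suspected_reply_vmac(sw1))  # 00:00:5e:00:01:0a
print(suspected_reply_vmac(sw3))  # 00:00:5e:00:01:0a (same as sw1, the problem)
print(expected_reply_vmac(sw3))   # 00:00:5e:00:01:14
```

Under the suspected rule both switches answer with the same VMAC, while under the expected rule each answers with the VMAC it actually owns. The model also reproduces the workaround: with VRID 10 removed from sw3, the suspected rule yields the VRID 20 VMAC.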
Relevant dumps recorded with port mirroring:
- Both VRIDs active on sw1 and sw3, VRID 10 is in BACKUP state on sw3. "arping -i br0 188.8.131.52" on host behind sw2, dump on sw2, port facing sw3, Rx:
17:43:43.288507 00:00:5e:00:01:0a > MAC_X, ethertype 802.1Q (0x8100), length 64: vlan 10, p 0, ethertype ARP, Reply 184.108.40.206 is-at 00:00:5e:00:01:0a, length 46
- VRID 10 set inactive on sw3. "arping -i br0 220.127.116.11" on host behind sw2, dump on sw2, port facing sw3, Rx:
17:40:32.996518 00:00:5e:00:01:14 > MAC_X, ethertype 802.1Q (0x8100), length 64: vlan 10, p 0, ethertype ARP, Reply 18.104.22.168 is-at 00:00:5e:00:01:14, length 46
In both cases above, on the host where arping is issued I always see 2 ARP replies to each ARP request. The difference is that in the first case both replies have an identical source MAC, and in the second case the source MACs are different.
The real-world source of such ARP requests for an external address was a macOS host with an absent or bad gateway; in my case the gateway was set to the host's own IP by mistake.
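For anyone reading the dumps: the last octet of a VRRP IPv4 virtual MAC is the VRID in hex, so the two source MACs can be mapped back to VRIDs. The helper below is a hypothetical convenience for reading captures, not part of any tool used here.

```python
VRRP_IPV4_PREFIX = "00:00:5e:00:01:"

def vrid_from_vmac(mac):
    """Return the VRID encoded in a VRRP IPv4 virtual MAC, or None otherwise."""
    mac = mac.lower()
    if len(mac) != 17 or not mac.startswith(VRRP_IPV4_PREFIX):
        return None
    return int(mac[-2:], 16)  # last octet is the VRID in hexadecimal

print(vrid_from_vmac("00:00:5e:00:01:0a"))  # 10 (first dump)
print(vrid_from_vmac("00:00:5e:00:01:14"))  # 20 (second dump)
```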
My questions are:
1. Should a router reply with its MAC to an ARP request for an external address? This seems like a router discovery method to me, but I can't find information describing it.
2. Can this behaviour (question 1) be disabled?
3. Is it a bug that the switch replies with the VMAC of a VRID that is in BACKUP state?