DHCP Response not traversing stacked switches

rand__
Aspirant

Hi,

Hopefully this is only another user error, but at the moment I am out of ideas again.

I have 2 S3300's (FW 6.6.17) [A1,A2] connected via stacking in 2 different rooms.

On one of the 10G Ports another switch (B) is connected via Trunk, on that switch a bunch of boxes are connected.

I have issues getting DHCP to work across the two stacked switches; all regular traffic (non-broadcast, I suppose) is fine.

 

Local DHCP within Switch B is working fine, as is DHCP using only A1 (I have not tested "A2 only"), but not the routes "Client->A1->A2->B" or "Client->B->A2->A1".

 

I can see DHCP requests cross the link and reach my DHCP servers (one 2012 DHCP, one Linux-based one) in each room, but no actual DHCP replies make it back.

 

I have one client and one Server in A1, one Client and one Server in B (VMs)

 

I have set up port mirroring on the uplink port 1/28 (which is on A2 and connected to Switch B) and can capture DHCP requests going B->A1, but the DHCP server on A1 does not see them.

I have set up a client on A1 requesting from B and can see requests arriving on B, but the answer never arrives back at A1.

 

So basically traffic flows as follows:

The client (in either room) sends out a DHCP request, which is received by the DHCP server in the other room.

The DHCP server replies with an OFFER, but the OFFER never reaches the client.

 

I used wireshark/tcpdump/port mirroring to verify this behaviour but can't really explain it 😞

 

Easiest would be to break the stack and just link them via trunk but that can't be it...

 

Any idea how to troubleshoot this?

Thanks

 

 

Model: S3300-28X-PoE+ (GS728TXP)|ProSAFE 24-port Stackable Smart Switches with PoE+
Message 1 of 14

All Replies
Hopchen
Prodigy

Re: DHCP Response not traversing stacked switches

Hi,

 

Thanks for the detailed description. Very helpful. 

 

I need to get the obvious out of the way 🙂 

You have two DHCP servers. Those two are active at the same time? If so, they are in different VLANs? The reason for asking is because you should never have two DHCP servers assigning addresses to clients, within the same VLAN. It will cause a bunch of problems. 

 

So, before we dig any deeper, can you verify how that setup is?

 

Cheers 

 

Message 2 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

In normal operation only the AD DHCP server is active; the Linux one is for troubleshooting only :)

But of course the two are not active at the same time:)

 

Message 3 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

Also, this is all on VLAN 1 i.e. native/untagged traffic

Message 4 of 14
Hopchen
Prodigy

Re: DHCP Response not traversing stacked switches

Alright, that is good! Thanks for clarifying.

I think we will focus on one direction for now (not the music band). Since your Windows server will be the DHCP server in the end, we can use that one and leave the Linux one turned off/disconnected for now.

I am not sure where the Windows server is connected, but let's just assume that it is connected to A1 in the stack and your client is connected to switch B. Connection flow is like this then:
DHCP server --> A1-stack --> A2-stack --> Switch B --> DHCP client.

You have already done some good troubleshooting with Wireshark. From my understanding, you have port mirrored and Wiresharked the link between A2-stack and switch B?

- You see the DHCP discover sent from the client?
- You also see the DHCP offer from the server?
- Hereafter, you see nothing further?

Just to clarify how DHCP works:
1. Client sends a DHCP discover
2. Server sends a DHCP offer, offering an IP address to the client
3. Client checks that the offered address is not in use by anyone else (gratuitous ARP)
  a. If the offered address is in use by someone else, the client won't use the offered address and sends a DHCP decline message to the server. Hereafter the client starts the DHCP discover process all over again.

  b. If the offered address is not in use, then the client will accept the address by sending a DHCP request to the server, i.e. a request to obtain the offered address.

4. The server will acknowledge to the client with a DHCP ACK packet, and then the server will register the DHCP entry in its DHCP binding table.
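The happy-path order described above can be written down as a tiny lookup, which is handy when reading a capture and asking "what should come next?". This is only a minimal sketch: `HANDSHAKE` and `next_expected` are hypothetical names, not part of any DHCP library, and renewals, NAKs and relays are ignored.

```python
from typing import Optional

# Simplified happy-path order of DHCP messages, as listed in the steps above.
HANDSHAKE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def next_expected(last_seen: Optional[str]) -> Optional[str]:
    """Return the message that should follow last_seen, or None when done."""
    if last_seen is None:
        # Nothing seen yet: the client starts with a discover.
        return HANDSHAKE[0]
    if last_seen == "DECLINE":
        # After a decline the client starts the discover process again.
        return "DISCOVER"
    idx = HANDSHAKE.index(last_seen)
    return HANDSHAKE[idx + 1] if idx + 1 < len(HANDSHAKE) else None


if __name__ == "__main__":
    print(next_expected("DISCOVER"))
```

If the last message visible in a capture is a DISCOVER, the next thing to look for is the OFFER, which is exactly the message that goes missing in this thread.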

 

So, we gotta find out exactly where this goes wrong. That will be important.

 

Let's try this:

- Port mirror the uplink between A2-stack and switch B, as you have done already. Run Wireshark to capture the traffic on that uplink.

- Packet capture the server as well, at the same time.

- Packet capture the client as well, at the same time.

So, we have 3 packet captures running now.

 

- Now, connect the DHCP client to switch B and see what happens. Filter the Wireshark captures by: bootp (Wireshark 3.0 and later use dhcp instead). This is so that you only see DHCP traffic.
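To illustrate what that filter is matching on, here is a rough Python sketch that pulls the DHCP message type (option 53) out of a raw BOOTP/DHCP UDP payload. The function name `dhcp_message_type` is made up for this example; the offsets and option codes follow the RFC 2131 / RFC 2132 layout, and error handling is kept minimal.

```python
from typing import Optional

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}


def dhcp_message_type(payload: bytes) -> Optional[str]:
    """Return the DHCP message type from a BOOTP/DHCP UDP payload, if any."""
    # The fixed BOOTP header is 236 bytes, followed by the 4-byte magic cookie.
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        return None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option, a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(payload):
            break  # truncated option
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return MESSAGE_TYPES.get(value[0])
        i += 2 + length
    return None
```

A payload whose options contain `53, 1, 2` is an OFFER, so a capture that never shows that value on the uplink has not seen the server's reply.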

 

What happens exactly?

The client sends the discover. The discover reaches the server?

The server replies with an offer? If so, does the offer reach the client?

etc.

 

As you have 3 Wiresharks running, you can see exactly where the problem lies. For example:

If the server sees the DHCP discover and sends the offer and the client sees the offer but never sends a request --> then we need to look into why the client didn't send the request.

or

If the server sees the DHCP discover and sends the offer, and you see the offer being forwarded on the link between A2-stack and switch B, but the client never sees the offer --> then we have to investigate why the offer was not forwarded to the client, by switch B.

 

etc.
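The comparison across the three captures can be sketched in a few lines of Python. This is just an illustration of the reasoning: `locate_loss`, the capture point labels and the sample data are hypothetical, and the capture points are assumed to be listed in the order the message travels.

```python
from typing import Dict, List, Set

# Capture points in the order an OFFER travels from server to client
# (labels are illustrative, chosen to match the setup in this thread).
SERVER_TO_CLIENT = ["server", "uplink A2-B", "client"]


def locate_loss(message: str, path: List[str], seen: Dict[str, Set[str]]) -> str:
    """Report between which two capture points `message` disappears."""
    previous = None
    for point in path:
        if message not in seen.get(point, set()):
            if previous is None:
                return f"{message} never seen at {point} (not sent)"
            return f"{message} lost between {previous} and {point}"
        previous = point
    return f"{message} seen at every capture point"


if __name__ == "__main__":
    # Example: the server sends the offer, but the uplink mirror never shows it.
    seen = {
        "server": {"DISCOVER", "OFFER"},
        "uplink A2-B": {"DISCOVER"},
        "client": {"DISCOVER"},
    }
    print(locate_loss("OFFER", SERVER_TO_CLIENT, seen))
```

The first capture point that is missing the message tells you which segment (or device) to investigate next.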

 

I hope that makes sense? It is a bit of work, but this will give you a much better picture, because until we have the full picture we can only make educated guesses 🙂

 

Any questions, give me a shout!

 

 

Cheers

Message 5 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

So it looks like I was mistaken - at the moment I receive no OFFERs on the Netgear switchport/uplink to switch B.

I thought this was the case yesterday when I opened this, but maybe I confused it with packets going the other way round.

 

I have been testing Client->A1->A2->B->Server now. I still see incoming/outgoing requests on the server, but the mirror port on the A2->B uplink does not see offers.

So it looks like Switch B is not passing on (or A2 is dropping) the packets. I checked obvious things like DHCP snooping and discarded packets to see whether I could find out if it's A2 or B, but current indications point to B (as I can't find anything on A2).

B is a similar-to-Cisco switch without support so I'll have to dig around there.

 

Happy to get pointers, but of course it's not your job 🙂

Thanks for your help, I assume we can close this for now (or keep open for now in case I have to follow up, whatever you prefer)

Message 6 of 14
Hopchen
Prodigy

Re: DHCP Response not traversing stacked switches

Hi,

We, the community as a whole, are always happy to give a helping hand wherever we can 🙂 I understand that switch B is not a Netgear, but I think we can still double-check things to be sure of where the issue lies.

So, you are connected in this way now: Client--> A1-stack--> A2-stack--> switch B --> Server. You are saying that, on the server, you see the DHCP discover coming in and you also see the DHCP offer going out. However, on the uplink between A2-stack and switch B, you don't see the offer traverse back towards the client.


It could be a problem on switch B, but let's check a few things to be sure:

1. Please ensure that your port mirror of the link between A2-stack and switch B is set to mirror traffic in both directions (RX and TX).

2. Is the port mirror working correctly? The DHCP discover is a broadcast so it will be seen by all, whereas the DHCP offer is a unicast. So, if the port mirror is not working correctly then you will still see the discover (all devices on the network see those), but you won't see the offer as that is a unicast. To confirm that the port mirror is OK, send a ping or other unicast traffic across. Does the probe see those on the uplink? If yes, then the port mirror is OK.

3. You said that if the client and server are both on switch B, then it works. It is odd, right? Why would it not work across the uplink, but indeed work if client and server are both on switch B? I am wondering if the address table is not being populated correctly on switch B, for some reason. When the DHCP discover has been sent from the client, switch B should pick up the client's MAC address and that address should be linked to the uplink port that connects to A2-stack. How does the address table look on switch B?

4. Switch B, is it in production? Any chance you can reset it? Is there a lot of config on it? Just to see if the same occurs on a factory defaulted unit, with no additional settings. Since you are running on VLAN 1, out of the box this should work on the switch as all switches use VLAN 1 by default. Maybe even try a different switch, as a test, if you have another one.
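The address table behaviour described in point 3 can be modelled with a toy learning switch. This is a simplified sketch, not how any particular switch is implemented: `LearningSwitch`, the port names and the MAC addresses are invented for illustration, and VLANs and ageing are ignored.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy layer 2 switch: learns source MACs, floods unknown or broadcast."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}  # MAC address -> port it was seen on

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Process one frame and return the list of ports it is sent out of."""
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        if dst_mac == BROADCAST or dst_mac not in self.mac_table:
            # Flood to every port except the one the frame arrived on.
            return [p for p in self.ports if p != in_port]
        out_port = self.mac_table[dst_mac]
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    switch_b = LearningSwitch(["uplink", "p1", "p2"])
    client, server = "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"
    # The broadcast discover arrives via the uplink from the stack.
    print(switch_b.receive("uplink", client, BROADCAST))
    # The unicast offer from the server on p1 should now leave via the uplink.
    print(switch_b.receive("p1", server, client))
```

If the client's MAC is in the table on the uplink port, as it should be after the discover, a unicast offer has exactly one place to go.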

One more thing. If a client does not get the offer message, it just sends more discover messages. Can you confirm that you see this? i.e. client sends discover --> server sends offer (offer never reaches client) --> client re-sends the discover. This cycle should happen over and over until the client gives up. Can you see that cycle in the pcap?
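Spotting that retry cycle in a capture taken at the client can be sketched as follows. The function `unanswered_discovers` and the event format (client MAC, message type) are assumptions made for this example; real captures would first need to be reduced to such a list.

```python
from typing import Iterable, Tuple


def unanswered_discovers(events: Iterable[Tuple[str, str]], client_mac: str) -> int:
    """Count DISCOVERs for client_mac that saw no OFFER before the next DISCOVER.

    `events` is a sequence of (client MAC, message type) pairs in capture order.
    """
    unanswered = 0
    pending = False  # True while the latest DISCOVER has not been answered
    for mac, msg_type in events:
        if mac != client_mac:
            continue
        if msg_type == "DISCOVER":
            if pending:
                unanswered += 1
            pending = True
        elif msg_type == "OFFER":
            pending = False
    # A trailing DISCOVER without an OFFER also counts.
    return unanswered + (1 if pending else 0)


if __name__ == "__main__":
    mac = "aa:aa:aa:aa:aa:aa"
    capture = [(mac, "DISCOVER"), (mac, "DISCOVER"), (mac, "DISCOVER")]
    print(unanswered_discovers(capture, mac))
```

A count that keeps growing while the server-side capture shows offers going out confirms that the offers are being lost somewhere in between.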


Cheers

Message 7 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

Thanks a lot for your willingness to help, really appreciate it:)

 

1. Please ensure that your port mirror of the link between A2-stack and switch B is set to mirror traffic in both directions (RX and TX).

Checked.

2. Is the port mirror working correctly? The DHCP discover is a broadcast so it will be seen by all, whereas the DHCP offer is a unicast. So, if the port mirror is not working correctly then you will still see the discover (all devices on the network see those), but you won't see the offer as that is a unicast. To confirm that the port mirror is OK, send a ping or other unicast traffic across. Does the probe see those on the uplink? If yes, then the port mirror is OK.

Tested OK.

3. You said that if the client and server are both on switch B, then it works. It is odd, right? Why would it not work across the uplink, but indeed work if client and server are both on switch B? I am wondering if the address table is not being populated correctly on switch B, for some reason. When the DHCP discover has been sent from the client, switch B should pick up the client's MAC address and that address should be linked to the uplink port that connects to A2-stack. How does the address table look on switch B?

The client MAC address is listed on the associated uplink port.

4. Switch B, is it in production? Any chance you can reset it? Is there a lot of config on it? Just to see if the same occurs on a factory defaulted unit, with no additional settings. Since you are running on VLAN 1, out of the box this should work on the switch as all switches use VLAN 1 by default. Maybe even try a different switch, as a test, if you have another one.

Tried playing around and caused a mess :p. Restoring an older config version did not fix the issue, and neither did a reboot.

One more thing. If a client does not get the offer message, it just sends more discover messages. Can you confirm that you see this? i.e. client sends discover --> server sends offer (offer never reaches client) --> client re-sends the discover. This cycle should happen over and over until the client gives up. Can you see that cycle in the pcap?

Yes, the cycle is visible.

 

Can this be an MTU issue? I have mixed MTUs (due to mixed interface usage of VLAN/non-VLAN traffic).

All clients/servers are on 1500, the ESX switches at 9k, the Cisco-like switch at 9218 and the S3300 at 9198.

I assume that since the client and server initiate at 1500 it should be no problem, but I wanted to bring it up for discussion. I of course ran ping with 9k size to verify, all fine.
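The reasoning here (endpoints at 1500, every switch in between larger) can be checked with a trivial sketch. The hop labels are illustrative and "9k" for the ESX switches is assumed to mean 9000; `smallest_mtu` and `fits` are hypothetical helper names.

```python
from typing import Dict

# MTU values as described in the post; hop names are illustrative labels.
HOP_MTUS: Dict[str, int] = {
    "client": 1500,
    "S3300 stack": 9198,
    "Cisco-like switch B": 9218,
    "ESX vSwitch": 9000,  # "9k" in the post, assumed to mean 9000
    "server": 1500,
}


def smallest_mtu(hop_mtus: Dict[str, int]) -> int:
    """The smallest MTU along the path limits the packet size."""
    return min(hop_mtus.values())


def fits(packet_bytes: int, hop_mtus: Dict[str, int]) -> bool:
    """True when a packet of this size passes every hop unfragmented."""
    return packet_bytes <= smallest_mtu(hop_mtus)


if __name__ == "__main__":
    print(smallest_mtu(HOP_MTUS), fits(1500, HOP_MTUS))
```

Since the endpoints never emit anything larger than 1500 bytes and DHCP messages are typically only a few hundred bytes, the larger switch MTUs should not get in the way.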

 

I also found that the DHCP offer is indeed a unicast message (not broadcast as originally thought) since there is only L2 involved.

This makes it even more weird since other unicast traffic (ping) flows freely between client and server.
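Whether the offer goes out as unicast or broadcast follows from a few fields in the client's own message (RFC 2131, section 4.1). Below is a minimal Python sketch of that decision, assuming a raw BOOTP/DHCP payload as input; `reply_delivery` is a made-up name and the returned strings are only labels.

```python
import struct

BROADCAST_FLAG = 0x8000  # most significant bit of the BOOTP 'flags' field
ZERO_ADDR = b"\x00" * 4


def reply_delivery(client_payload: bytes) -> str:
    """Classify how a server delivers OFFER/ACK for this client message."""
    if len(client_payload) < 28:
        raise ValueError("payload too short for a BOOTP header")
    # Header layout: flags at offset 10, ciaddr at 12, giaddr at 24.
    (flags,) = struct.unpack_from("!H", client_payload, 10)
    ciaddr = client_payload[12:16]
    giaddr = client_payload[24:28]
    if giaddr != ZERO_ADDR:
        return "unicast to relay agent"
    if ciaddr != ZERO_ADDR:
        return "unicast to ciaddr"
    if flags & BROADCAST_FLAG:
        return "broadcast"
    return "unicast to client MAC"


if __name__ == "__main__":
    discover = bytearray(236)  # all-zero header: no relay, no ciaddr, flag clear
    print(reply_delivery(bytes(discover)))
    discover[10] = 0x80  # client sets the broadcast flag
    print(reply_delivery(bytes(discover)))
```

On a flat layer 2 network with no relay, a client that leaves the broadcast flag clear gets its offer as a unicast frame to its MAC address, which matches what was observed here.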

 

Message 8 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

Also interesting is that I see ARP requests from some VMs on the same ESX host arriving at the DHCP client, but none originating from the DHCP server VM, although all use the same physical NIC and thus the same route.

I set up another DHCP server on the same ESX box. It is not working either, so it is not VM related.

Just weird:p

 

Message 9 of 14
Hopchen
Prodigy

Re: DHCP Response not traversing stacked switches

Hi,

It is weird indeed! I don't think MTU would play a role here. As you said, you see other traffic from the client VM to the DHCP server VM.

I can't tell you why switch B is not passing those DHCP offers back to the client, however, at least you know where the issue lies. The fact that you tested the port mirror and it shows it is working, but you still don't see the offers traverse across that uplink - that tells us switch B is not forwarding them.

ACLs could be the issue, if they are implemented on switch B. DHCP snooping is an obvious suspect, but you have checked that this is turned off on switch B. You have narrowed the issue down, but I am not sure how to solve this. I have seen similar issues in the past. "Strange" behaviour that was hard to make sense of. In most of those cases it was a corrupt config that was the issue. This is why I suggested to factory reset the switch, if possible.

If this was me, I would try and connect a different switch to the Netgear stack and move my DHCP server to that switch. If it works fine with that setup, I would start to move devices to that new switch temporarily and afterwards go "brute force" on switch B - i.e. reset it and reconfigure it from scratch.

But yes, it is a really tough one. I understand why you were perplexed here.


Cheers

Message 10 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

So I might have a solution (if it is that then stupid user error) but I don't really understand it.

I was working on using an alternate NIC in one of the boxes to circumvent Switch B (instead directly attach to 10G Port).

Then I remembered that at some point in time (weeks ago when I started looking at it) I had DHCP working on a physical box attached to A1 but not a VM hosted on a box attached to A1.

Back then I checked the switch config and found that I was actively tagging VLAN 1 on the path between Switch B and A1, i.e. the A2 uplink to B was tagging VLAN 1, as was the stack port.

(Side question - do I need to manage the stacking port at all with vlans etc or is that config agnostic?)

 

So after reading and searching I found that the native VLAN (VLAN 1 on Netgear) is basically untagged traffic. Untagged traffic gets associated with VLAN 0 on ESX. So naturally I removed the tagging of VLAN 1, since I needed untagged traffic to flow into my ESX dvSwitch to reach the correct portGroup. I left it that way ever since, until I remembered this today.

 

So I reconfigured tagging for VLAN 1 on these 2 interfaces only - *not* on the interface going to the ESX box - and now suddenly both the physical and the virtual client receive IPs?

I think I had changed the dvSwitch config during debugging to a non-trunked value - maybe that was the actual fix here.

 

Further tests:

I have just added VLAN 1 tagging on the A1 side (to the ESX box) - now it's not working any more. Remove it from the ESX port - works.

So it seems that I need Vlan1 Tagging on uplink Port to physical Switch B but must not use it on ESX dvSwitch uplink ports...

Message 11 of 14
Hopchen
Prodigy

Re: DHCP Response not traversing stacked switches

Hi,

I am glad to hear you have made progress.

However, I will agree that the VLAN should not have been the issue - based on your testing. I understand that you are confused with that part. See, if the VLAN was incorrect then the server would never even see the DHCP discover from the client - let alone respond with an offer. So, that is odd. But I am no VMware expert and I am not sure how those virtual NICs handle the traffic.

I am very surprised the VLAN settings on the port to the VMs seem to have been the issue. Surely, if that was the case then you wouldn't be able to ping across or even see each other's broadcasts. But, whatever, it is working and that is good! 🙂

The tagging vs untagging comes down to this:
1. If the device in the other end is VLAN-aware, then Tag ("T") the traffic.
2. If the device in the other end is NOT VLAN-aware, then UNtag ("U") the traffic and set corresponding PVID.


So, between switches I would always Tag my traffic. Tagging the traffic also allows for multiple VLANs on a port, called a trunk. You can let VLAN 1 run untagged across trunk-links if you prefer though (letting VLAN 1 run as what is called a Native VLAN - as you mention yourself). It is a matter of preference really.

As regards your switch-ports connecting to the VMs: you can Tag ("T") those for VLAN 1 if you want, but you need to make sure that the VMs are also set to Tag themselves for VLAN 1. It has to match.
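The "it has to match" rule can be shown with a deliberately strict toy model in Python. `Port` and `link_carries` are hypothetical names, and the model assumes both ends must agree on whether a VLAN is tagged or untagged on the wire; real switches can be more lenient on ingress, so treat this as an illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Port:
    """VLAN membership of one port: 'T' (tagged) and 'U' (untagged) sets."""
    tagged: Set[int] = field(default_factory=set)
    untagged: Set[int] = field(default_factory=set)

    def wire_format(self, vlan: int) -> Optional[str]:
        """How this port represents `vlan` on the wire, or None if not a member."""
        if vlan in self.tagged:
            return f"tag {vlan}"
        if vlan in self.untagged:
            return "untagged"
        return None


def link_carries(a: Port, b: Port, vlan: int) -> bool:
    """True when both ends of a link agree on the wire format for `vlan`."""
    fmt_a, fmt_b = a.wire_format(vlan), b.wire_format(vlan)
    return fmt_a is not None and fmt_a == fmt_b


if __name__ == "__main__":
    esx_untagged = Port(untagged={1})     # port group expecting untagged frames
    switch_port_tagged = Port(tagged={1})
    switch_port_untagged = Port(untagged={1})
    print(link_carries(switch_port_tagged, esx_untagged, 1))
    print(link_carries(switch_port_untagged, esx_untagged, 1))
```

In this model a switch port that tags VLAN 1 facing an ESX port group that expects untagged frames does not carry VLAN 1, while an untagged switch port does, which lines up with the behaviour reported in the thread.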

 

 

Cheers

Message 12 of 14
rand__
Aspirant

Re: DHCP Response not traversing stacked switches

Hi,

First of all, thank you very much for your help. It's always good to have someone double-check and order the approach (and confirm one is not going crazy if it's not working :p).

 

Second - it's totally correct that I must not tag the traffic to the ESX switch, since only untagged traffic will be handled as such and reach the target portgroup (which accepts only untagged traffic).

The weird thing is that I needed to tag traffic from Switch B to A although both switches treat VLAN 1 as native. But it might relate to the trunk itself, in that either switch is not able to trunk *and* accept untagged traffic at the same time.

If so, then I'd assume that the tagged Vlan1 traffic is untagged since PVID matches at that port. Thus I'd need to have the mixed tagging settings (contrary to seemingly logical approach to have native vlan all-tagged or none-tagged ).

 

While I have you, would you mind answering the question regarding the need of configuring the stacking port at all?

Thanks a lot,

cheers:)

Message 13 of 14
Hopchen
Prodigy

Re: DHCP Response not traversing stacked switches

Hey,

 

No problem at all!

 

Sorry, I missed that question. No, you do not need to manage the stacking port at all. 

 

As for the trunk, the NTGR switches can indeed Trunk with native VLAN, meaning all VLANs are tagged except one (the native VLAN) which is untagged + PVID. Not sure how switch B handles this though, but as you can tag all VLANs across, I would do that. I don't normally see much reason to run with a "native VLAN" on the Trunk between switches. I'd rather tag them all.

 

Any doubts, reach out! 🙂

 

 

Thanks

Message 14 of 14