Forum Discussion

JonAB
Aspirant
Dec 11, 2025

Issues with LACP with VLANs

Hi all,

Network newbie here. I am having some trouble with VLANs combined with LACP. The setup I have at the moment is purely for educating myself. It consists of:

One Netgear GS724Tv4


One pfSense router virtualized in Proxmox. The LAN interface of the router is a separate NIC where all ports are in the same bond, which is configured for LACP. This bond has a dedicated bridge, which is assigned as the LAN interface for the router. This NIC is connected to ports 1-4 on the switch, which are configured as "lag1".

 

One Synology RackStation with an SSD. The LAN interface is configured to use all four ports in LACP. It is connected to ports 5-8 on the switch, which are configured as "lag2".

I am using two computers to test the bandwidth of the system, just by simply copying a large file from each computer to the NAS. These are either connected to ports 19 and 20, or 21 and 22 depending on which VLAN I want to use.

When connecting the computers to the same VLAN as the NAS everything seems to work just fine:

Ports 19 and 20 are receiving ~2050000 packets each. Lag2, the LAG to the NAS, is transmitting ~4100000 packets. These packets are distributed between ports 6 and 8. This means that the LACP to the NAS is working, right?

But when moving the computers to another VLAN the trouble begins:


Ports 21 and 22 are receiving ~2050000 packets each. These are transmitted to the router on lag1, and the switch seems to have decided to use ports 2 and 3 for the task.

The router is routing the packets from the VLAN with the computers to the VLAN with the NAS. The switch is receiving the packets on lag1 again, but this time on ports 1 and 3.

So far so good? It seems like the LACP between the switch and the router is working well too?

The switch is transmitting the packets on lag2 to the NAS, but why does it only use port 8 now!?

The consequence is obvious when looking at the speeds of the file transfer...

What am I missing? 


8 Replies

  • StephenB
    Guru - Experienced User
    JonAB wrote:

    What am I missing? 

    LACP is designed so each data flow only runs over one port of the LAG.  The traffic for that flow is not split over multiple ports.  This approach prevents buffer overruns on the final destination link (which is presumed not to be a LAG).  It also ensures that the packets arrive in the order they were sent.

     

    You have two data flows, so on a 2-port LAG there is a 50-50 chance they will be transmitted on the same port.  Odds are better with a 4-port LAG (1 in 4).  Still, with two 4-port LAGs in the path, there is close to a 50-50 chance (7 in 16) that this would happen on at least one of them.
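
To put numbers on these odds, here is a minimal sketch. It assumes the hash spreads flows uniformly and independently over the member ports, which real switches only approximate. The function names `collision_probability` and `any_lag_collides` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def collision_probability(ports: int, flows: int = 2) -> Fraction:
    """Chance that at least two of `flows` land on the same port of one LAG,
    assuming each flow is mapped to a port uniformly and independently."""
    outcomes = list(product(range(ports), repeat=flows))
    collisions = sum(1 for outcome in outcomes if len(set(outcome)) < flows)
    return Fraction(collisions, len(outcomes))


def any_lag_collides(ports: int, lags: int, flows: int = 2) -> Fraction:
    """Chance that the flows collide on at least one of `lags` independent LAGs."""
    no_collision = (1 - collision_probability(ports, flows)) ** lags
    return 1 - no_collision


print(collision_probability(2))  # 1/2
print(collision_probability(4))  # 1/4
print(any_lag_collides(4, 2))    # 7/16
```

The uniform assumption matters: a deterministic hash over a handful of fixed addresses gives the same answer every time, so an individual setup can sit at 0 % or 100 % rather than at these averages.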

     

    In general, LACP does a reasonable job of load balancing when there is traffic from a lot of devices running over the LAG.  But as you are finding, the load balancing doesn't perform well if you only have a couple of data flows.

  • Hi Stephen and thank you for your answer!

    I get your point regarding the distribution of the traffic. This is why I tested with two dataflows. Of course, I can set up two more computers to test with four dataflows... But the thing that is bothering me is this:

    When the sources and destination are on the same VLAN, i.e. the flow is only going through the switch once, the LACP is performing flawlessly. I have tested many times, and a different port is always chosen for each flow.

    But when the sources and destination are on different VLANs, i.e. the flow has to go through the router, it is always the same story. The dataflow on the LAG to the router seems to work just fine. The data transmitted from the switch to the router is distributed over two ports, and the data the switch receives from the router also always arrives on two ports (in this specific test case, that is).

    So far, everything seems to behave very well!

    The data flow, sent by two computers, is received by the switch. It is re-transmitted by a lag to the router, by two different ports. The router is transmitting the same flows, but on the right VLAN, which the switch receives on two different ports. 

    But now, the switch is going to transmit the flows to the NAS. And the switch always chooses to transmit both flows on the same port, even though I have confirmed that the switch and NAS know how to properly make use of two ports. This of course compromises the bandwidth...

    I get the feeling that there is something that I am missing in the setup, but I cannot find it...

    OK, I get that there is a 50/50 chance that one of my lags would choose to make it this way. But now, lag1 (router - switch) is choosing a good way 100 % of the time, and lag2 (switch - NAS) is choosing to transmit everything on the same port 100 % of the time...

    • StephenB
      Guru - Experienced User
      JonAB wrote:

      OK, I get that there is a 50/50 chance that one of my lags would choose to make it this way. But now, lag1 (router - switch) is choosing a good way 100 % of the time, and lag2 (switch - NAS) is choosing to transmit everything on the same port 100 % of the time...

      The decision on which port to use is based on a hash of the source and destination MAC addresses in the Ethernet frame.  So it will be deterministic for a specific LAG.  In your particular case, the router is likely rewriting the source MAC address.
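
The effect can be sketched as follows. The thread does not say which hash function the GS724T uses, so the XOR-and-modulo scheme below is purely an assumption, and every MAC address and the helper names `mac_to_int` and `pick_port` are made up for illustration.

```python
from typing import Sequence


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def pick_port(src_mac: str, dst_mac: str, ports: Sequence[int]) -> int:
    """Hypothetical layer-2 policy: XOR both MACs, then take modulo the port count."""
    index = (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % len(ports)
    return ports[index]


# Made-up addresses, for illustration only.
PC1 = "02:00:00:00:00:11"
PC2 = "02:00:00:00:00:12"
ROUTER = "02:00:00:00:00:01"
NAS = "02:00:00:00:00:a0"
LAG2 = [5, 6, 7, 8]  # switch ports facing the NAS

# Same VLAN: the switch sees each PC's own source MAC.
same_vlan = {pick_port(pc, NAS, LAG2) for pc in (PC1, PC2)}

# Routed: both flows leave the router carrying the router's source MAC.
routed = {pick_port(ROUTER, NAS, LAG2) for _ in (PC1, PC2)}

print(sorted(same_vlan))  # [6, 7]
print(sorted(routed))     # [6]
```

In the routed case both flows also target the same NAS, so the whole (source, destination) pair is identical and any deterministic MAC hash, whatever its exact form, has to pick the same port for both.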

       

      Note that the transmit decision is made only by the sending device, so flows in the opposite direction can use different ports.

       

      While some implementations give you a couple of alternative ways to make the decision, the GS724 does not.    I guess you could try changing to a static lag, and see if the traffic flow changes.  But likely the switch will use the same policy. Another thing you could try is removing one port from the router LAG.  Then (if you are lucky) these flows will use different ports. But different flows (from different devices) will still end up with sub-optimal load-balancing, so these hacks are only useful if you are trying to optimize these specific flows.

       

      Multigig NICs and switches are the cleanest path, but of course are expensive.

       

      Is there a reason you want local data flows to be routed?  Also, what is your ISP internet speed?

       

  • I will start answering your last questions. There is no real reason. At the moment, I am using the setup purely for educational purposes. I want to try different setups to test my understanding, and when I do run into troubles like this, take the chance to learn something! My ISP is nothing special, 300/300...

    "The decision on which port to use is based on a hash of the source and destination MAC addresses in the Ethernet frame.  So it will be deterministic for a specific LAG.  In your particular case, the router is likely rewriting the source MAC address."

    I think this is it! When the router is transmitting the flows, the MACs and IPs are changed. Instead of having unique MACs and IPs, both flows now have the same MAC and IP. This makes the switch decide to transmit them on the same port...

    All right. What should I do instead? Let's say that I want a segmented network for increased security. Let's say that I have different VLANs for different clients. Let's say that I have a NAS serving multiple purposes (of course with multiple pools for the corresponding client groups). Of course I want to make use of the NAS's four gigabit connections for maximum bandwidth with multiple clients.

    • StephenB
      Guru - Experienced User
      JonAB wrote:

      I think this is it! When the router is transmitting the flows, the MACs and IPs are changed. Instead of having unique MACs and IPs, both flows now have the same MAC and IP. This makes the switch decide to transmit them on the same port...

      I agree that is likely it.  But I do want to point out that the hash uses both the source and destination MAC addresses in the packet.  Both flows would have the same source MAC leaving the router, but they would still have different destination MACs.

       

      JonAB wrote:

      All right. What should I do instead?

      At the end of the day,  LAGs on the switch won't load-balance perfectly, no matter what you do in the config.

       

      If you set up a static lag on the switch <-> synology, you could probably also set up round-robin on the synology.  That would fully load-balance on the outbound path from the synology (but not the inbound path).  But you could end up with buffer overruns in your clients (or in other LAGs), which will result in packet loss.
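
The difference between the two policies can be sketched like this. It is a toy model under stated assumptions: the port numbers are hypothetical, and `per_flow` only stands in for a hash-based policy rather than reproducing what the switch or the Synology actually does.

```python
from collections import Counter
from itertools import cycle

PORTS = [5, 6, 7, 8]  # hypothetical member ports of the NAS-side LAG


def round_robin(packets, ports):
    """Send each packet on the next port in rotation, regardless of flow."""
    rotation = cycle(ports)
    return [next(rotation) for _ in packets]


def per_flow(packets, ports):
    """Pin every packet of a flow to one port (stand-in for a hash policy)."""
    return [ports[flow_id % len(ports)] for flow_id, _seq in packets]


# One flow (id 0) of eight packets, e.g. a single file copy from the NAS.
packets = [(0, seq) for seq in range(8)]

rr = Counter(round_robin(packets, PORTS))
pf = Counter(per_flow(packets, PORTS))
print(dict(rr))  # {5: 2, 6: 2, 7: 2, 8: 2}
print(dict(pf))  # {5: 8}
```

Round-robin lets a single flow use all four links at once, which is exactly why it can overrun a client that only has one gigabit link, and why packets of the same flow may arrive out of order.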

       

       

    • schumaku
      Guru - Experienced User

      The crux is that LACP and the underlying LACPDU were never intended to go beyond a single LAG connection, e.g. between a host and a switch, or for that matter between switch and switch. In other words, the LACPDU will never leave the connection making up a LAG, or aggregation for that matter; it remains between the two participants. Citing some parts from a public document, Link Aggregation Control Protocol by Mick Seaman:

       

      ===

      ...

      This note is not a ballot comment, but provides background for ballot comments in the usual form.

      ...

       

      This revision summarizes my (Mick Seaman) understanding of the P802.3ad Link Aggregation Control Protocol described in D1.0 and proposes some minor changes. 

       

      When it is clear that protocol exchanges between participants in separate systems are being discussed (rather than the aggregate behaviour of participants in a single system) the term “participants” refers to the local participant, sometimes called the “actor” for clarity, and his remote “partner”.

      Protocol Data Units

      A single message type, the LACPDU, is transmitted by protocol participants. It comprises the following information both for the transmitting actor, and its remote partner : the partner information being the actor’s current view of its partners parameters.

       

      • Port Number
      • System ID
      • Key
      • Status

       

      The Status information comprises the following flags:

       

      • LACP_Activity
      • LACP_Timeout
      • Aggregate
      • Synchronization
      • Collecting
      • Distributing

       

      ....

      ===
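
As a rough illustration of the information the quoted note lists, the structure could be modelled as below. This is not the wire format: the class names, field types and example values are all assumptions made for the sketch, and only the field names come from the quote.

```python
from dataclasses import dataclass, field


@dataclass
class Status:
    """The six status flags listed in the quoted note."""
    lacp_activity: bool = False
    lacp_timeout: bool = False
    aggregate: bool = False
    synchronization: bool = False
    collecting: bool = False
    distributing: bool = False


@dataclass
class ParticipantInfo:
    """Port Number, System ID, Key and Status for one end of the link."""
    port_number: int
    system_id: str
    key: int
    status: Status = field(default_factory=Status)


@dataclass
class Lacpdu:
    """One message: the actor's own info plus its current view of the partner."""
    actor: ParticipantInfo
    partner: ParticipantInfo


# Hypothetical values for a switch port talking to a NAS port.
switch_port = ParticipantInfo(port_number=5, system_id="8000-02:00:00:00:00:01", key=2)
nas_port = ParticipantInfo(port_number=1, system_id="8000-02:00:00:00:00:a0", key=1)
pdu = Lacpdu(actor=switch_port, partner=nas_port)
print(pdu.actor.port_number, pdu.partner.system_id)
```

Every field describes one of the two ends of a single link; nothing in the message refers to devices further along the path.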

       

      The rewriting of the MAC addresses is intentional on an LACP LAG:

       

      LACP Link Aggregation Group (LAG) uses a single virtual MAC address (System ID), formed by combining a unique System Priority and the switch's base MAC address, to act as one logical link, allowing seamless failover and load-balancing; this virtual MAC identifies the entire bundle, not individual ports, ensuring consistent LACP negotiation and traffic flow across the aggregated link, crucial for systems like MC-LAG (Multi-Chassis LAG) to maintain a single identity. 

       

      I can't find anywhere in the 802.3 documentation, including the P802.3ad Link Aggregation Control Protocol note, that the LACPDU could be distributed further than the two end points of the aggregation bundle.

       

      This is why the switch on the next hop does not treat the virtual MAC address specially; it does not arrive together with the LACPDU. The next-hop switch has no knowledge of the aggregation on the previous hop, so it won't process and distribute these frames the way one might expect.

       

      Several community members came to the community before, querying about the possibility of cascading multiple switches using LACP LAGs. 

       

      Even re-distributing the aggregated traffic to a static LAG won't change anything in my understanding.

       

      Does this clarify the ghosts you are trying to fight?

       

       

  • Thank you so much Stephen and Schumaku!

    This definitely helps me with the ghosts I am trying to fight! 

    I thought LACP and a couple of parallel 1 gig connections was the answer for connecting several switches and a router to the core switch. At least in the case where the cost of multigig equipment is not justified.

    Now I know that it most probably is not my configuration that is faulty, just my former understanding of LACP.

    • StephenB
      Guru - Experienced User
      JonAB wrote:

      I thought LACP and a couple of parallel 1 gig connections was the answer for connecting several switches and a router to the core switch. At least in the case where the cost of multigig equipment is not justified.

      Well, it can work if there are enough data flows.  But there almost never are on a home network.

       

      FWIW, I had it enabled on my ReadyNAS Pro-6 <-> GS724 for a while, but decided it wasn't worth the bother, so I switched back to one gigabit connection.
