

zwavoo
Aspirant
Oct 21, 2016

Configuring RSTP to stop my network from frequently dying

We have severe problems, which I believe are caused by someone in the office cross-connecting network ports and creating a loop somewhere. This has happened a few times now, and on most occasions I've managed to track the culprit down after a few hours. On other occasions, the problem lasts all day, then seems to be OK the next.

 

I'm reasonably confident that we need STP configured to help us, but there is a lot of conflicting advice on the best way to do this. I find myself turning to the community for assistance, as Netgear's documentation is not all that good on this topic.

 

Our network comprises two halves of an office, with a single GSM7328Sv2 in each, linked by 2 x 10Gb fibre connections in a LAG. Connected directly to each of these switches are a number of GS724/48 and FS728TP switches, plus a few servers and wireless APs (Meraki). The GS switches are all on VLAN 1, as they are just for staff network access. The FS728TP switches are all in VLAN 5, since these are for our phones; that really is the only isolation we have. There should not be any other switches on the network, so essentially our "tree" is only one level deep.

 

Each time we have had a problem and I have been able to find it, I discover a cable connecting one switch to another: a simple fix. I understood, though, that STP or RSTP would detect this problem and block the ports, thereby isolating the errant switch. What I see instead is that both of the GSM switches "go dark". Effectively they reboot, so the logs are useless. Our SNMP logging server doesn't seem to show a lot either.
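For reference, here is a simplified Python model of what STP/RSTP is expected to do with such a stray cable: elect the bridge with the lowest bridge ID (priority, then MAC) as root, keep one path from every other switch towards it, and block every remaining link. It is a sketch under stated assumptions, not the full 802.1D algorithm: all links have equal cost, the topology is connected, and a link's position in the list stands in for the port ID. The switch names and MAC addresses are hypothetical.

```python
from collections import deque


def spanning_tree(bridges, links):
    """Return (root, blocked_links) for a simplified STP model.

    Assumptions (not the full 802.1D algorithm): every link has the same
    path cost and the topology is connected. The root is the bridge with the
    lowest bridge ID (priority, MAC). Every other bridge keeps one root port
    towards a neighbour one hop closer to the root, preferring the lower
    neighbour bridge ID, then the lower link index (a stand-in for port ID).
    Every link not chosen as a root-port link is blocked.
    """
    root = min(bridges, key=bridges.get)
    adjacency = {name: [] for name in bridges}
    for index, (a, b) in enumerate(links):
        adjacency[a].append((b, index))
        adjacency[b].append((a, index))

    # Breadth-first search from the root gives each bridge its hop count.
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour, _ in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)

    # Each non-root bridge forwards only on its single root-port link.
    forwarding = set()
    for name in bridges:
        if name == root:
            continue
        candidates = [
            (bridges[neighbour], index)
            for neighbour, index in adjacency[name]
            if hops[neighbour] == hops[name] - 1
        ]
        forwarding.add(min(candidates)[1])

    blocked = [link for index, link in enumerate(links) if index not in forwarding]
    return root, blocked


# Hypothetical switches, all at the default priority 32768, so the lowest MAC wins.
bridges = {
    "core-a": (32768, "00:00:00:00:00:01"),
    "core-b": (32768, "00:00:00:00:00:02"),
    "access-1": (32768, "00:00:00:00:00:10"),
    "access-2": (32768, "00:00:00:00:00:20"),
}
links = [
    ("core-a", "core-b"),      # inter-building LAG, modelled as one logical link
    ("core-a", "access-1"),
    ("core-b", "access-2"),
    ("access-1", "access-2"),  # stray patch cable that closes a loop
]

root, blocked = spanning_tree(bridges, links)
print(root, blocked)  # core-a [('access-1', 'access-2')]
```

Appending a second ("core-a", "access-1") entry to `links` models a duplicate cable between the same two switches; the model blocks that one as well.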

 

If anyone has any advice on how to go about setting up RSTP in our environment, I would be grateful for the assistance.


  • Spanning tree is fairly simple to implement in terms of overall topology, but preventing certain activity in the network depends entirely on your overall network implementation.

     

    From the basic information given so far, you seem to have a top-down topology. Here are a few ideas to get you started. Beyond this, it would be helpful if you posted your text configuration (with the password lines removed) and your network topology diagram.

     

    1. If you have a basic downstream implementation like most networks, enforce specific rules to prevent root changes, which have a much larger negative impact. Give the top switch a lower bridge priority value. The default is 32768 or 16384. Set one of the top GSM switches to 0 and the other to 16384.

     

    1. Since you are using multiple VLANs, I will make an assumption here and ask that you use MSTP where the switch offers it; otherwise, stick with RSTP. By default, an MSTP configuration behaves just like RSTP, since there is only one instance running, tied to all the VLANs.

     

    1. If the issue is someone creating a loop, there are a few things you can do to help isolate it. If you are running into broadcast floods affecting the network, I suggest taking basic action with broadcast control under "Storm Control" in the Security menu. Implement a stricter broadcast setting than the default. If given a choice of PPS or %, choose PPS on edge ports (user ports) and limit it to 100 pps. For the uplink ports that connect switches together, stick with % but limit it to 1-2% at most. Your topology is flat based on what you described, so consider segmenting the network further if possible. This change will affect users whose ports get shut down, so make sure to set up an SNMP trap server to get alerts when ports are shut down. If you have a spare virtual machine or server, install NMS200, which is free for up to 200 devices, and spend a day or two configuring it for alerts, monitoring, and notification.

     

    1. Depending on the model (see its datasheet), each switch has specific features to address certain BPDU-related behavior. I could elaborate on those, but I suggest posting the two items I mentioned (your text configuration and topology diagram) before we start a discussion on them.

     

    1. Any edge port (meaning one expected to have a user's desktop connected) should be set with "spanning-tree edgeport".

     

    1. To debug spanning tree quickly when the issue occurs, connect via serial to one of the top switches and enable spanning-tree BPDU debugging with the command:

    debug spanning-tree bpdu

    You can now look at the debug-level logging output (you may need to enable that as well with "logging console 7" to see any output). This should give you much more useful insight into what is going on instead of flying blind. Make sure to undo the debug when you are done:

    "no debug spanning-tree"

     

    1. Quickly dealing with flooding. There are a few ways to do it; here are some examples:
       a. Review the broadcast rate on each port. If needed, clear the port statistics and check them again quickly. The ports with the highest counts point you to the next uplink, and you can follow them all the way down to the source port to find the culprit.
       b. This one is a bit harder, so I won't explain it step by step, only how most people implement it: use packet captures (pcap) to understand the culprit. Set up a port on the top-level switch to act as your eyes on the network. Most people set aside a spare port as the monitor (mirror) destination on their top switch and typically set the source to the uplink port going to the firewall or edge router. This is useful for snooping traffic going in and out of the network to spot bad activity. Snoop the traffic, determine the likely endpoint, and track that endpoint across your network port map.
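    The broadcast-rate suggestions above (100 pps on edge ports, 1-2% on uplinks, and checking which ports show the most broadcasts) can be sketched in Python: take two counter samples, turn them into packets per second, and list the ports over their limit, busiest first, so you know which link to follow next. This is a sketch, not how the switch computes storm control: the percent-to-pps conversion assumes gigabit line rate in minimum-size frames, and the port names and counter values are hypothetical.

```python
# Hypothetical port names and counter values; real numbers would come from the
# switch's port statistics or SNMP (e.g. IF-MIB ifHCInBroadcastPkts).

# Assumption: "%" is taken as a share of gigabit line rate in minimum-size
# frames (64 bytes + 20 bytes of preamble and inter-frame gap = 672 bits).
GIGABIT_LINE_RATE_PPS = 1_000_000_000 // 672  # 1,488,095 packets per second


def broadcast_pps(before, after, interval_s):
    """Broadcast packets per second per port between two counter samples."""
    return {port: (after[port] - before[port]) / interval_s for port in after}


def over_limit(rates, uplinks, edge_limit_pps=100, uplink_limit_pct=2.0):
    """Ports above their limit, busiest first: a fixed pps limit on edge
    ports, a percentage of line rate on uplinks."""
    uplink_limit_pps = GIGABIT_LINE_RATE_PPS * uplink_limit_pct / 100
    flagged = []
    for port, pps in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        limit = uplink_limit_pps if port in uplinks else edge_limit_pps
        if pps > limit:
            flagged.append((port, pps))
    return flagged


# Two samples of the broadcast counters taken 10 seconds apart.
before = {"0/1": 1000, "0/2": 500, "0/24": 10000}
after = {"0/1": 1200, "0/2": 90500, "0/24": 400000}

rates = broadcast_pps(before, after, interval_s=10)
print(over_limit(rates, uplinks={"0/24"}))  # [('0/24', 39000.0), ('0/2', 9000.0)]
```

    Here the uplink 0/24 is over its limit, so the next step would be the switch at the other end of that uplink; edge port 0/2 is also well above 100 pps.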

     

    Hope this is helpful; I apologize for any spelling mistakes.

     

     

     

    • zwavoo
      Aspirant

      Wow. This is a wealth of information. I've spent days searching the internet and haven't managed to find nearly half as much as you've offered here, and I'm grateful for the assistance.

       

      Our topology is fairly basic, as the attached diagram (GFNetwork.png) shows.

      The office is made up of two buildings, physically joined and networked with 2 x 10Gb fibre connections (shown in blue in the diagram). The green lines represent gigabit uplinks from all of the switches to the two "core" devices. Connected to the switches are the users' PCs, printers, and phones (on the FS728TP).

       

      I'm not sure which config files you need (or from which devices), but I'm happy to provide them.

      • Jedi_Exile
        NETGEAR Expert

        The topology seems simple enough.

         

        For building Wing A, change the bridge priority value on the GSM7328Sv2 to 0 in the spanning tree configuration.

        For building Wing B, change the bridge priority value on the GSM7328Sv2 to 16384 in the spanning tree configuration.

        That should take care of root changes immediately. Make sure to do this outside production hours. Make sure all other switches are left at the default bridge priority of 32768.
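        To see why these values settle the root, here is a minimal Python sketch of the root election: the bridge with the lowest bridge ID (priority first, then MAC address) becomes root, and the next lowest takes over if it fails. It assumes the 802.1D-2004 rule that bridge priority is set in steps of 4096 from 0 to 61440; the switch names and MAC addresses are hypothetical.

```python
# Hypothetical inventory: switch names and MAC addresses are made up.
# Priorities follow the advice above: Wing A core 0, Wing B core 16384,
# everything else left at the default 32768.


def check_priority(priority):
    """Reject values the 802.1D-2004 bridge priority field cannot hold
    (4-bit priority in steps of 4096; the low 12 bits are the system ID extension)."""
    if not 0 <= priority <= 61440 or priority % 4096 != 0:
        raise ValueError(f"bridge priority must be a multiple of 4096 in 0..61440, got {priority}")
    return priority


def bridge_id(priority, mac):
    """Bridge ID as a comparable tuple: priority first, then MAC as an integer."""
    return (check_priority(priority), int(mac.replace(":", ""), 16))


def root_order(bridges):
    """Names sorted by bridge ID; index 0 is the root, index 1 the backup root."""
    return sorted(bridges, key=lambda name: bridge_id(*bridges[name]))


# The access switches deliberately have the lowest MACs: with every switch at
# 32768, gs724-1 would win the election instead of a core switch.
bridges = {
    "wing-a-core": (0, "a0:00:00:00:00:01"),
    "wing-b-core": (16384, "b0:00:00:00:00:02"),
    "gs724-1": (32768, "00:00:00:00:00:03"),
    "fs728tp-1": (32768, "00:00:00:00:00:04"),
}

order = root_order(bridges)
print(order[0], order[1])  # wing-a-core wing-b-core

# If the Wing A core goes down, the Wing B core takes over as root.
del bridges["wing-a-core"]
print(root_order(bridges)[0])  # wing-b-core
```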

        Make sure all switches are switched to MSTP, or fall back to RSTP on switches that do not support it.

         

        1. Make sure the uplink between the two core switches is Ethernet and not stacking. Stacking would be OK in this specific circumstance if you want a unified switch setup, but a LAG is preferred to prevent split-brain routing, since all routing will end up terminating in building A.

         

        2. Log into the switch via telnet, SSH, or serial console. At the enable prompt, type the "show tech-support" command to generate the support file. Send me this file by PM for both core switches.

         

         
