1

Topic: Beta CobraNet firmware for NION

I have new CM-1 module firmware that looks like it has solved the NION HF2 related error problem.
Please let me know if you are interested in receiving a Beta version of this for testing.
Send me an E-mail at sgray@peavey-eu.com requesting inclusion in the Beta program.

Nihilism is best done by professionals

2

Re: Beta CobraNet firmware for NION

Excellent. Any thoughts on what the cause is? From your other post it seems to be unrelated to STP and its variants. Does it have to do with incoming non-CobraNet traffic? How about the amount of CobraNet traffic (in either direction) handled by the CM-1?

3

Re: Beta CobraNet firmware for NION

We can induce a failure by introducing a loop in a network. The root cause is a stack overflow induced by large amounts of traffic (a data storm) on the network. The rapid reception of Ethernet frames appears to cause nested interrupts in the firmware, which overflow the stack. The firmware then loses its mind and stops functioning properly.
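
To picture the mechanism, here is a toy model. It is an illustration only, not the CM-1 firmware: it assumes each received frame raises an interrupt that preempts whatever handler is still running, and that each nested handler holds its stack frame until it finishes. The function name, the timings, and the last-in-first-out preemption rule are all hypothetical.

```python
def max_nesting_depth(arrival_times_us, handler_time_us):
    """Return the deepest interrupt nesting reached in the toy model.

    Assumption: a new frame always preempts the running handler, and a
    preempted handler makes no progress until the newer one completes.
    """
    pending = []   # remaining run time of each nested handler, innermost last
    deepest = 0
    now = 0
    for arrival in arrival_times_us:
        budget = arrival - now          # CPU time available before this frame
        while pending and budget > 0:
            if pending[-1] <= budget:   # innermost handler finishes and returns
                budget -= pending.pop()
            else:                       # innermost handler is interrupted again
                pending[-1] -= budget
                budget = 0
        now = arrival
        pending.append(handler_time_us)  # new interrupt nests on top
        deepest = max(deepest, len(pending))
    return deepest


# Hypothetical numbers: a 20 us handler, frames every 100 us versus every 5 us.
normal = max_nesting_depth([i * 100 for i in range(50)], 20)
storm = max_nesting_depth([i * 5 for i in range(50)], 20)
```

When frames arrive more slowly than the handler runs, the nesting depth stays flat. When they arrive faster, the depth grows with every frame, so any fixed-size stack is eventually exhausted.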

There are more than a few ways that data storms can appear on a network. The behavior of RSTP and MSTP can allow brief data storms to occur when network topologies change, particularly when a previously down link is restored and causes a loop that is not removed quickly enough.

So a data storm would be the event that can induce the problem. But the root cause is a firmware bug that we hope is now fixed in the Beta firmware.


4

Re: Beta CobraNet firmware for NION

Thanks for the insight. Unfortunately, it still points to network traffic being the cause. In my particular instance I haven't found any such traffic as yet.

A quick note regarding your .pdf document addressing this issue. You mention that loops may be created for a short time with STP. That is not the case. Here is a link that has a pretty good explanation of the various states that a switch port progresses through:

http://www.cisco.com/univercd/cc/td/doc … tocid69353
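
As a rough sketch of those port states, the snippet below models a port under classic 802.1D STP after it has been selected to forward. It assumes the standard 15-second default forward delay and skips the blocking and disabled states. The function names and the `fast` flag (standing in for an edge-style port that skips the wait) are hypothetical.

```python
def stp_port_state(seconds_since_selected, forward_delay_s=15, fast=False):
    """Return the state of a port that STP has chosen to bring into service.

    Simplification: the port is assumed to enter 'listening' at time zero;
    the 'blocking' and 'disabled' states are not modelled.
    """
    if fast:
        return "forwarding"                  # edge-style port skips the timers
    if seconds_since_selected < forward_delay_s:
        return "listening"                   # BPDUs only, no learning yet
    if seconds_since_selected < 2 * forward_delay_s:
        return "learning"                    # builds MAC table, still no forwarding
    return "forwarding"


def forwards_user_frames(state):
    """Only a forwarding port passes ordinary traffic."""
    return state == "forwarding"


# With the defaults, a normal port passes no user frames for the first 30 s.
states = [stp_port_state(t) for t in (0, 15, 30)]
```

The point of the delay is that a port which would close a loop has time to be blocked before it ever forwards a frame.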

5

Re: Beta CobraNet firmware for NION

This tallies with our observations over the last year or two.

We frequently use looped networks for redundancy, using HP Procurves and STP.

If the network ports that connect to Nions and CABs are not set correctly, then an STP renegotiation triggered by removing a backbone network link produces a broadcast storm that will lock up CAB 8i's and CAB 8o's and often reboot Nions immediately.

....good to know there is reason for it after all!

6

Re: Beta CobraNet firmware for NION

Jason,

Thanks for your last post. It has forced me to go back and dig deeper into how STP, RSTP and MSTP work. I've got some new things to try. I'll post more on this as soon as I can.
One thing that I was just told earlier today by a person trying the Beta firmware is that it worked great in a system with three switches but started to fail again when he added two more.
This points to a possible issue with the RSTP BPDU frames themselves, as only their quantity would be a meaningful change in that scenario. So he configured his switches to block BPDU frames (EtherType 0x0000) on all the edge ports, and the system stopped failing with HF2 errors. There is more to investigate here. Either the presence or the frequency of BPDU frames at the CobraNet port is looking like an issue. I have contacted Cirrus about this and they are looking at it. More to come as we find out more. I really appreciate all the great feedback and participation on this topic from everyone.
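
For anyone checking a capture for BPDUs, here is a minimal sketch of how a frame can be recognised. As far as I know, standard 802.1D BPDUs do not carry an EtherType at all: they are 802.3 frames sent to the bridge group address 01:80:C2:00:00:00 with an LLC header of 0x42 0x42 0x03, so a filter would normally match on those fields. The function name is hypothetical, and the sketch assumes an untagged frame (no 802.1Q header) and ignores vendor variants that use other addresses.

```python
STP_BRIDGE_GROUP_MAC = bytes.fromhex("0180c2000000")
STP_LLC_HEADER = bytes.fromhex("424203")   # DSAP 0x42, SSAP 0x42, control 0x03


def is_stp_bpdu(frame: bytes) -> bool:
    """Return True if a raw, untagged Ethernet frame looks like an 802.1D BPDU."""
    if len(frame) < 17:                    # 14-byte MAC header + 3-byte LLC header
        return False
    dest_mac = frame[0:6]
    length_or_type = int.from_bytes(frame[12:14], "big")
    llc = frame[14:17]
    return (
        dest_mac == STP_BRIDGE_GROUP_MAC
        and length_or_type <= 1500         # 802.3 length field, not an EtherType
        and llc == STP_LLC_HEADER
    )
```

A frame with an EtherType in bytes 12 to 13 (a value of 0x0600 or above) fails the length check and is not counted as a BPDU.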


7

Re: Beta CobraNet firmware for NION

cobraguy wrote:

One thing that I was just told earlier today by a person trying the Beta firmware is that it worked great in a system with three switches but started to fail again when he added two more.

Now THAT is interesting. The system modification we did that seemed to be the start of our problems did indeed include the addition of some switches. However I have STP completely disabled. So far when I've monitored the failing n3's Cobranet ports, they are completely clean of non-Cobranet traffic.

Thanks again for your help.

8

Re: Beta CobraNet firmware for NION

tucan wrote:

We frequently use looped networks for redundancy, using HP Procurves and STP.

If the network ports that connect to Nions and CABs are not set correctly, then an STP renegotiation triggered by removing a backbone network link produces a broadcast storm that will lock up CAB 8i's and CAB 8o's and often reboot Nions immediately.

Hmm, I am curious about what precisely is required to 'set the network ports correctly'. I have heard that HP only implements one instance of STP for the entire network, as opposed to per-VLAN, which can result in loss of connectivity, but a storm I'm not so sure about. I have little experience with HP's products and therefore little knowledge of the details of their STP implementations, but correctly configured, STP should prevent exactly what you describe.

9

Re: Beta CobraNet firmware for NION

We use STP on our HP Procurve systems rather than RSTP.

Using 'HP terminology' we set the edge ports in the system to 'FAST' and the backbone ports to 'NORM'. With the default 'NORM' setting on all ports, some Nions will often reboot immediately when there is an STP renegotiation.

What this FAST setting does is exclude these ports from the spanning tree negotiations, so they never have to be monitored for a network loop situation.

In the past I have experienced this mostly with Nware versions 1.2.4 and 1.2.5.

10

Re: Beta CobraNet firmware for NION

NIONs rebooting is a different issue from the HF2 errors on the CM-1. The NION reboots when the control Ethernet port is flooded. We are investigating this as well. The problem with the HF2 error is that the systems do not automatically restart when it occurs. Hence, it is a higher priority. Also, NIONs rebooting seems to be rarer, and only appears to occur when there is a real, persistent problem on the network.

11

Re: Beta CobraNet firmware for NION

I took a look at HP's 'fast' port setting; it appears to be quite similar to the portfast setting that Cisco uses. Both allow the port to advance to the forwarding state without waiting to transition through the listening and learning states. Now, why these settings can cause a Nion to reboot is interesting. The documentation states that a port in fast mode still listens for STP messages (BPDU frames), but I wonder whether BPDUs are transmitted via such ports. If not, I may have a theory. When STP is calculating a topology change, more BPDUs are sent than when the network is in a stable state. From what Cobraguy is finding, CM-1s may be quite sensitive to BPDUs. These additional BPDUs generated by the topology change may be what is faulting the CM-1, and not a storm.

Do you have access to this network which causes repeatable failures with the various STP port configurations? If so, it would be interesting to hook up a network sniffer and monitor one of the crashing CM-1 ports while trying the fast and normal port modes to see if you are actually experiencing a flood of frames or simply an increased rate of BPDUs during a topology change.
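
One way to compare the two cases from a capture is to bucket frames by second and look at the peak rates. The sketch below is only an illustration: it assumes the sniffer output has already been exported as (timestamp in seconds, is-BPDU flag) pairs, and the function name and sample data are hypothetical.

```python
from collections import Counter


def peak_rates(capture):
    """Return (peak BPDUs per second, peak other frames per second).

    capture: iterable of (timestamp_seconds, is_bpdu) pairs.
    """
    bpdu_per_second = Counter()
    other_per_second = Counter()
    for timestamp, is_bpdu in capture:
        second = int(timestamp)            # one-second buckets
        if is_bpdu:
            bpdu_per_second[second] += 1
        else:
            other_per_second[second] += 1
    return (
        max(bpdu_per_second.values(), default=0),
        max(other_per_second.values(), default=0),
    )


# Made-up sample: a few BPDUs and a small burst of other traffic.
sample = [(0.1, True), (0.6, True), (1.2, True),
          (1.3, False), (1.4, False), (1.5, False)]
bpdu_peak, other_peak = peak_rates(sample)
```

A true storm should show up as a very large peak in the second number, while a topology change alone should raise only the first.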

12

Re: Beta CobraNet firmware for NION

Unfortunately this system is on site at the moment and stable. We have enough politics surrounding this project that I can't really take it offline without a lot of people watching, and the sight of Nions rebooting by themselves would not do their confidence any good at all.

There may be a chance to try this in a few weeks. I will bear it in mind, as I am fairly confident that I can induce a reboot.

13

Re: Beta CobraNet firmware for NION

Regarding Jason's post #4 again. I have researched this matter more and have changed the text (not yet updated online) that covers his point to now read:

Many Ethernet switches contain some variant of the Spanning Tree Protocol (STP, RSTP or MSTP) that detects and logically removes loops. Standard STP should not allow a connection to be made until it is sure that the connection will not cause a loop. MSTP and RSTP can behave a little differently. If a new connection is made through a port that the protocol previously considered to be an 'edge' port (i.e. one that can't be connected to another switch), then the port will be immediately enabled. If this connection is such that it can create a loop, then a data storm can occur. Explicitly setting ports within a managed switch to be 'edge' or 'bridge' ports, depending on their role, may alleviate this problem.

Unfortunately I do not have a switch with any flavor of STP with which I can test and verify this. Does anyone out there have the ability to test this scenario and verify it before I update the app note?
I am particularly interested in observing the behavior of bridge and edge ports in the presence of loops when using RSTP and MSTP.

Thanks,
Steve


14

Re: Beta CobraNet firmware for NION

Is this firmware compatible with Nware 1.2.4 projects?

15

Re: Beta CobraNet firmware for NION

Yes. It is CM-1 firmware and is completely compatible with previous versions.
