1

Topic: Looking for HF2 error data - Important

Hello all,

I am trying to help track down and fix the cause of HF2 errors in NION. This is an error condition in which the CM-1 CobraNet module stops responding to the NION host processor and appears dead.
It may be accompanied by a blinking light sequence of 7, 6, 2 on the CobraNet Ethernet port.


I am looking for any and all info from the community of the following nature:

1) Do you know of a way to consistently reproduce this error?
2) Under what conditions have you seen this error?
3) If you have seen this error, what do you know that can be done to minimize or eliminate its occurrence?

Thank you in advance for your help

Nihilism is best done by professionals

2

Re: Looking for HF2 error data - Important

We have this problem. We have 12 NIONs, with every 3 NIONs in a VLAN, and they use XDAB between them. We have four switches, all with STP enabled. If the STP topology changes, some of the NIONs have this problem. The problem is not on the same NION every time. We don't know how to eliminate it. If CAB4N is included in the problem conditions, it's OK.

3

Re: Looking for HF2 error data - Important

By the way, if only the NION's LAN port is working and the CobraNet port is not working, the problem does not occur. If a NION loses XDAB, the problem also occurs.

Last edited by zhangye (2008-06-28 05:57:07)

4

Re: Looking for HF2 error data - Important

Thank you, zhangye. Can you tell me whether you are using Spanning Tree or are actually using Rapid Spanning Tree?

Nihilism is best done by professionals

5

Re: Looking for HF2 error data - Important

I am also currently experiencing this problem with one of our systems. The system previously worked without issue. This problem started after an upgrade where we added a CAB and two switches.

In response to your questions:

1. I currently have no idea as to the cause, much less how to reproduce it.

2. The fault can occur two times a day, or not for several days in a row. I have made no progress in determining the conditions which cause it.

3. Same as above. I have made several attempts to change the behavior of the errors, if not to fix them outright, then at least to make them move to the other n3 in the system.

The only consistent symptom I have found is that the fault always occurs with the same n3. I've moved bundles around and even replaced the unit altogether to no avail. I currently have swapped that n3's Cobranet connection with another unit to see if the problem moves. If I get feedback showing no change in the problem, my next step is to temporarily install a laptop with a network monitoring program to try and capture all non-Cobranet traffic on the port connected to the faulting n3's Cobranet port.

If you have more specific questions regarding the network, system, etc. I can provide more detail as you require.

6

Re: Looking for HF2 error data - Important

We are using MSTP (Multiple Spanning Tree Protocol) in the LAN. When we first saw this problem, we thought MSTP caused it. After the switches completed STP, the problem still occurred. So we think MSTP is not the cause of this problem.

7

Re: Looking for HF2 error data - Important

Zhangye,

MSTP is a newer variant of STP that allows separation of spanning tree domains and allows greater efficiency. However, the core protocol used underneath this scheme is RSTP. RSTP has been proven to be a cause of the HF2 error. If you can, try disabling MSTP and/or RSTP and use standard spanning tree and see if this eliminates HF2 errors. This is (hopefully) a temporary fix. We are working with Cirrus right now to try to get this problem fixed.

Nihilism is best done by professionals

8

Re: Looking for HF2 error data - Important

cobraguy wrote:

RSTP has been proven to be a cause of the HF2 error.

Really.....? Is this directly due to the increased non-Cobranet traffic received by the CM-1 or indirectly by the reconfiguration of the LAN during reconvergence? Either way, it is an interesting bit of knowledge. However, I have disabled STP in my system, so it can't be causing the error there.

9

Re: Looking for HF2 error data - Important

We have tried disabling MSTP, and the problem still occurs. Our test result is that the problem occurs when the STP topology changes. We use H3C switches (H3C is a combination of Huawei and 3Com). We did not enable RSTP. Our test setup has three core switches and one edge switch. The core switches have dual fiber optic cables. The 12 NIONs connect to the edge switch. The edge switch has two fiber optic cables connecting it to two of the core switches.

10

Re: Looking for HF2 error data - Important

Reply to #8 and #9

MSTP behaves much like RSTP, and both act much faster than standard STP. There is very fast convergence, which generates more traffic. And from what I have been told by a network guru, the philosophy of STP vs. RSTP is different. In STP, a new connection will not be allowed to become active until STP knows it will not create a loop. In RSTP and MSTP, a new connection will be allowed right away and is then taken out if a loop is detected. I have not verified this behavior myself yet and am just relating what I have been told.

In any case, if the problems you are seeing are not STP related, then you must look elsewhere. Can you get any statistics from the switch or use a snooper like Wireshark to identify sources of high burst traffic?

Last edited by cobraguy (2008-07-05 16:47:33)

Nihilism is best done by professionals

11

Re: Looking for HF2 error data - Important

Interesting. I guess M/RSTP actually alters the sequence of modes which a port goes through upon startup, something I'll have to learn more about. It kind of leaves the door open to a loop created upon the activation of a port, but it will be short-lived.

As far as stats for the port related to the device exhibiting failure, I don't believe I have anything useful yet. I have of course sniffed the port, but because the faults occur so infrequently (sometimes several days between errors) and I am currently unable to associate the faults with any other events, I haven't seen anything unusual with the network sniffer. I only see the Cobranet traffic that I would expect, and some very infrequent CDP frames. If my current tests don't reveal anything useful, my next plan is to grab a company laptop with Wireshark and leave it at the site on a port mirrored with the failing CM-1. If I set up Wireshark's filter to exclude Cobranet traffic, protocol 0x8819, it should be capable of running for several days. I would hope to see something interesting with the same time stamp as a failure.
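
A minimal sketch of the filtering idea described above, assuming raw Ethernet frames are already available as byte strings (for example, read back from a saved capture). The function names are hypothetical; only the 0x8819 protocol number comes from the post. In libpcap capture-filter syntax the equivalent should be `not ether proto 0x8819`, and as a Wireshark display filter `eth.type != 0x8819`.

```python
import struct

COBRANET_ETHERTYPE = 0x8819  # protocol number quoted in the post above
VLAN_TAG_ETHERTYPE = 0x8100  # 802.1Q tag; the real EtherType follows the tag


def ethertype(frame: bytes) -> int:
    """Return the EtherType/length field, skipping one 802.1Q tag if present."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (value,) = struct.unpack_from("!H", frame, 12)
    if value == VLAN_TAG_ETHERTYPE and len(frame) >= 18:
        (value,) = struct.unpack_from("!H", frame, 16)
    return value


def is_cobranet(frame: bytes) -> bool:
    """True if the frame carries the CobraNet protocol number."""
    return ethertype(frame) == COBRANET_ETHERTYPE


def non_cobranet(frames):
    """Keep only the frames that are NOT CobraNet traffic."""
    return [f for f in frames if not is_cobranet(f)]
```

Dropping CobraNet frames at capture time, rather than afterwards, is what keeps a multi-day capture small enough to be practical.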

12

Re: Looking for HF2 error data - Important

cobraguy,
   Have you tested the NION with STP? What was the result?
We ran a test with STP. We have 12 NIONs, with every three NIONs in a VLAN, and they use XDAB. We used 3 switches with only STP enabled. We still had this problem. Our switch engineer thinks BPDU packets may cause this problem. BPDUs are used for STP negotiation between switches.
We also used sniffer software to capture packets. We captured many CobraNet packets, fewer BPDU packets, and UDP packets on port 1234. I think the UDP packets are sent by pandad.

13

Re: Looking for HF2 error data - Important

Zhangye,

I have not done any new testing with STP. I have been working with Cirrus to find the root cause of the problem, which is a stack overflow, and correct it. I think we have a fix. Please see the announcements section.
BTW, STP, MSTP, and RSTP are not the root cause but can contribute to this error. Usually it is not STP; MSTP and RSTP can contribute to the problem when the network topology changes.

Nihilism is best done by professionals

14

Re: Looking for HF2 error data - Important

cobraguy,
What you said matches our test results. Tonight we will test whether blocking BPDU packets can fix it.

15

Re: Looking for HF2 error data - Important

CobraGuy has written a NioNote about this issue, and it can be found here:

http://downloads.peavey.com/mm/index.cf … umentation

It will be placed on the public site and incorporated into the help files when it is ready. But we thought it might be helpful to the conversation to make the draft version available to forum members.

Make it intuitive, never leave them guessing.

16

Re: Looking for HF2 error data - Important

A recent post in another thread has forced me to go back and dig deeper into how STP, RSTP and MSTP work. I've got some new things to try. I'll post more on this as soon as I can.
One thing that I was just told early today by a person trying the beta firmware is that it worked great in a system with three switches but started to fail again when he added two more.
This points to a possible issue with the RSTP BPDU frames themselves, as only their quantity would be a meaningful change in that scenario. So he configured his switches to block BPDU frames (EtherType 0x0000) on all the edge ports, and the system stopped failing with HF2 errors. There is more to investigate here. Either the presence or frequency of BPDU frames at the CobraNet port is looking like an issue. I have contacted Cirrus about this and they are looking at it. More to come as we find out more. I really appreciate all the great feedback and participation on this topic from everyone.
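
For anyone checking whether BPDUs actually reach a CobraNet port, here is a hedged sketch of how standard spanning tree BPDUs can be recognized in a capture. It assumes standard IEEE framing: destination MAC 01:80:C2:00:00:00, an 802.2 LLC header of 0x42 0x42 0x03, then a 2-byte Protocol Identifier of 0x0000 followed by a version byte (0 for STP, 2 for RSTP, 3 for MSTP). The 0x0000 value mentioned above may refer to that Protocol Identifier field rather than to an EtherType; that reading is an assumption. Vendor-specific variants such as Cisco PVST+ use different framing and are not covered. The function name is hypothetical.

```python
STP_MULTICAST = bytes.fromhex("0180c2000000")  # bridge group address
LLC_STP = bytes([0x42, 0x42, 0x03])            # DSAP, SSAP, control
VERSIONS = {0: "STP", 2: "RSTP", 3: "MSTP"}    # protocol version byte


def bpdu_version(frame: bytes):
    """Return 'STP', 'RSTP', 'MSTP' or 'unknown' for a BPDU, else None."""
    # 14-byte Ethernet header + 3-byte LLC + 4-byte minimum BPDU (a TCN)
    if len(frame) < 21:
        return None
    if frame[0:6] != STP_MULTICAST:
        return None
    if frame[14:17] != LLC_STP:
        return None
    if frame[17:19] != b"\x00\x00":  # Protocol Identifier
        return None
    return VERSIONS.get(frame[19], "unknown")
```

Counting the non-None results per second on a mirrored port would give a rough measure of BPDU frequency at the CM-1, which is the quantity in question here.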

Nihilism is best done by professionals

17

Re: Looking for HF2 error data - Important

Steve, just read the NIONote HF2: fantastic for an old audio guy still trying to catch up on networking finesse!!
More documentation like this would certainly help us avoid unnecessary grief.
Something in a similar vein that pulled the network "specification" out of the depths of the Programmers Reference Guide, and could be handed to network administrators would be great!

PS It's my role in life not to have to learn all about every related field: that's what other specialists are for!

"The single biggest problem in communication is the illusion that it has taken place."
                                                                                        - George Bernard Shaw

18

Re: Looking for HF2 error data - Important

I've been playing around with trying to cause an HF2 error some more by using a little utility I wrote to blast layer 2 frames onto the net through a gigabit port, including BPDU frames. So far, using the beta firmware, I have not been able to induce an HF2 error.

So we know that a broadcast storm can cause a problem. We know that using RSTP or MSTP vs. STP seems to allow the problem to occur.
And we know that the new CM-1 beta firmware offers an improvement but does not ensure a fix in all cases, according to Zhangye.


What we need is a reproducible method of causing the problem to appear.
Can anyone help with this? Does anyone out there have a solidly reproducible way (using a minimum of equipment and the beta firmware) to cause the HF2 error to occur? Zhangye?
Please let me know. I need to be able to consistently reproduce the error and then snoop the net and find out what is going on.

Thanks

Last edited by cobraguy (2008-07-18 16:28:58)

Nihilism is best done by professionals

19

Re: Looking for HF2 error data - Important

Has anyone ever seen the HF-2 error occur in a system that does not contain at least one CAB-4n populated with a CM-2 module?

Nihilism is best done by professionals

20

Re: Looking for HF2 error data - Important

I'm currently in the field for the next month or so, but when I get back I can check the system I'm having problems with. It has a bunch of 4ns, but I'm not sure what's in them.

21

Re: Looking for HF2 error data - Important

cobraguy wrote:

I've been playing around with trying to cause an HF2 error some more by using a little utility I wrote to blast layer 2 frames onto the net through a gigabit port, including BPDU frames. So far, using the beta firmware, I have not been able to induce an HF2 error.

If you have your packet generator connected to a switch via a gigabit port and a CM-1 connected to some other port, are you sure that BPDUs generated by your host are forwarded on to the CM-1? BPDUs are not forwarded through a switch in the same manner as other frames. BPDUs are normally processed internally, and the switch may then generate its own BPDUs, which may in turn be propagated to other ports depending on configuration. Different types of BPDUs also travel in specific directions with respect to the root switch. If the BPDUs you generate are injected into a port not expecting such a message, the switch may ignore them completely. For example, by definition your generator's port is not a root port and should not receive BPDUs that would normally originate from the root switch.

22

Re: Looking for HF2 error data - Important

My switch does not have STP so it wouldn't know what to do with a BPDU anyway.
But to answer your question specifically, I turned on port mirroring and observed the BPDUs being forwarded to the target CM-1.

Nihilism is best done by professionals

23

Re: Looking for HF2 error data - Important

I've returned from an out-of-town project and had a chance to load the firmware into the project in which I had first experienced the HF-2 errors. It's been a week running the new firmware and so far no problems ;)

24

Re: Looking for HF2 error data - Important

Jason,

That's good news. Please let us know after a while if the beta firmware continues to mitigate this problem over time.

Nihilism is best done by professionals

25

Re: Looking for HF2 error data - Important

Looks like I spoke too soon. Two failures within the last four days. Logs show the same error on the exact same unit.