1

Topic: n3 cm1 failure

In the last two weeks I've started having system failures. The system, composed of two n3s and a ConMan node, stopped passing audio, and the n3 in question required a reboot. Here are the suspect n3's log entries beginning at the time of the error:

10/28/2007 7:40:54    12408    note    mcp/processes    shutting down gracefully
10/28/2007 7:34:27    12407    note    piond/role_manager    role is stopped
10/28/2007 7:34:24    12406    note    project    user logged off: pwadmin
10/28/2007 7:34:24    12405    note    project    user logged off: etech
10/28/2007 7:34:24    12404    note    piond/role_manager    role is running
10/28/2007 7:34:24    12403    fault    piond/fault_policy    more than one error in less than one minute; stopping engine
10/28/2007 7:34:20    12402    note    project    user logged on: pwadmin
10/28/2007 7:34:20    12401    note    project    user logged on: etech
10/28/2007 7:33:58    12400    error    piond/cm1    cm1 not detected : /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:58    12399    error    piond/cm1    peek aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:57    12398    note    piond/role_manager    restarting role : USF SS new/DSP-01/JFb7-bfKY0Jcl7tj8I3pKUZ6uV8/xkD1T16dBJ-l9g11zxr5QbowFDS
10/28/2007 7:33:55    12397    note    project    user logged off: pwadmin
10/28/2007 7:33:55    12396    note    project    user logged off: etech
10/28/2007 7:33:55    12395    error    piond/cm1    peek aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55    12394    error    piond/cm1    mute assertion failed: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55    12393    error    piond/cm1    poke aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55    12392    error    piond/cm1    poke/peek driver exception : /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55    12391    note    piond/mute    muted: menu command
10/28/2007 7:33:55    12390    error    piond/fault_policy    restarting audio engine
10/28/2007 7:33:55    12389    error    piond/cm1    peek aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high


The graceful shutdown was initiated by the technicians to bring the system back online.

As mentioned above, this has happened at least twice: the same errors from the same n3, about 10 days apart. The other n3's log just contains the normal complaints about losing XDAB at the time the above errors occurred. Its log entries are as follows:

10/28/2007 7:40:55    3889    note    mcp/processes    shutting down gracefully
10/28/2007 7:40:37    3888    note    project    user logged off: pwadmin
10/28/2007 7:34:04    3887    note    piond/xdab/leader    arbitration done; ring is incomplete in redundant failed mode
10/28/2007 7:33:55    3886    error    piond/xdab/leader    communication failure
10/28/2007 7:33:54    3885    note    piond/xdab/leader    poll returned false: 'DSP-01'
10/28/2007 7:33:54    3884    note    piond/mute    muted: xdab loss of clock signal

2

Re: n3 cm1 failure

The "timeout waiting for HF2 to go high" indicates that the CM-1 CobraNet interface has crashed.  CM-1's can crash when they receive too much Ethernet traffic.  Typically, this happens when there is an Ethernet "storm".  Storms occur when there is a loop on the Ethernet network.

Another possible cause is excessive broadcast traffic on the network. We've also seen this happen with certain fast spanning tree network configurations. Finally, any port bandwidth throttling (sometimes called "storm control") can cause problems as well.

So, check for any changes on your network.  By the way, Cirrus (keeper of all things CobraNet) is aware of this problem.  They may have suggestions as well.
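If you want to put a rough number on the broadcast load, something like the sketch below will do it. This is purely an illustration, assuming Python with the scapy library on a PC whose switch port is mirrored to the CobraNet VLAN; the interface name is a placeholder.

    # Count broadcast frames per second on a monitor port (rough sketch).
    # Assumes Python 3 with scapy installed; replace "eth0" with the capture NIC.
    from scapy.all import sniff, Ether

    IFACE = "eth0"      # placeholder capture interface
    WINDOW = 10         # seconds to sample

    counts = {"broadcast": 0, "total": 0}

    def tally(pkt):
        if Ether in pkt:
            counts["total"] += 1
            if pkt[Ether].dst.lower() == "ff:ff:ff:ff:ff:ff":
                counts["broadcast"] += 1

    sniff(iface=IFACE, prn=tally, store=False, timeout=WINDOW)
    print("%.1f broadcast frames/s (%d of %d frames in %ds)"
          % (counts["broadcast"] / WINDOW, counts["broadcast"],
             counts["total"], WINDOW))

On a healthy, isolated CobraNet VLAN that rate should be close to zero; a storm will show orders of magnitude more.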

3

Re: n3 cm1 failure

Well, the n3 that is crashing is the only n3 receiving Cobranet bundles, so as far as inbound traffic goes, it definitely handles the most. But it's still only about nine bundles at three channels each.

The possibility of a loop is there, I guess, however remote. I could go through and disable all the unused ports just to make sure no one is making patches.

I am not using RSTP.

As for Cirrus, do you know any way to contact them? As an end user, I've never been able to get a response from them on the few occasions that I've tried.

I did recently add two switches, a ConMan node, and a CAB to the network, but the Cobranet VLAN is completely isolated, even from the Nion control network. I guess I can take a look at that Nion's port with a network sniffer, but I'm pretty sure I'm only going to find Cobranet frames. With the exception of the ConMan's second NIC for CAB control, there are no non-Cobranet devices on this VLAN.
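When I do put a sniffer on it, I'll probably just tally frames by EtherType, something like the rough sketch below (assuming Python with scapy on a mirrored port; the interface name is a placeholder). Anything other than the Cobranet EtherType 0x8819 showing up in volume would be a red flag.

    # Tally frames by EtherType on the mirrored Nion port (rough sketch).
    # Assumes Python 3 with scapy installed; replace "eth0" with the capture NIC.
    from collections import Counter
    from scapy.all import sniff, Ether

    IFACE = "eth0"      # placeholder capture interface
    ethertypes = Counter()

    def tally(pkt):
        if Ether in pkt:
            ethertypes[hex(pkt[Ether].type)] += 1

    sniff(iface=IFACE, prn=tally, store=False, timeout=30)
    for etype, count in ethertypes.most_common():
        print(etype, count)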

4

Re: n3 cm1 failure

Hmm, now that you've put me on to this inbound traffic problem, I think I might have an idea. That n3 is actually receiving nine bundles. All the bundles are from CAB 4ns. I'm only picking off 3 channels from most of them, but the CABs are still sending four, which makes for a total of 36 inbound channels.
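Just to sanity-check the math on what that one CM-1 is being asked to take in — rough figures only, assuming Cobranet's usual 48 kHz / 20-bit audio and ignoring packet overhead:

    # Back-of-envelope inbound load on the CM-1 (illustration only).
    # Assumes 48 kHz / 20-bit audio; real bundles carry extra packet overhead.
    bundles = 9
    channels_on_wire = 4        # the CAB 4n sends 4 even if I only use 3
    sample_rate = 48000         # Hz
    bits_per_sample = 20

    channels = bundles * channels_on_wire
    audio_mbps = channels * sample_rate * bits_per_sample / 1e6
    print(channels, "channels,", round(audio_mbps, 1), "Mbit/s of audio payload")
    # -> 36 channels, 34.6 Mbit/s of audio payload

Not huge on a 100 Mbit link, but it all lands on the one CM-1.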

5

Re: n3 cm1 failure

I think you've got it, Jason. Even though you're only using 3 of the 4 channels in a bundle, it still has to receive all 4 channels, as the Conductor allocates enough bandwidth for all 4. We are working on advanced subchannel mapping in the CAB4n so you will be able to send a 3-channel bundle, so stay tuned. Bear in mind, however, that the CM-2 module in the CAB4n only has 4 transmitters to work with.

I have seen situations where the CM-1 gets "pounded" by Crest CKi amps with Cobranet inputs, causing it to crash. Crest has new firmware to correct that. In the meantime, we have found that moving the Conductor to a CAB takes enough load off the CM-1 for it to function correctly.

The only true wisdom is in knowing you know nothing. -Socrates

6

Re: n3 cm1 failure

Jason, 

You mentioned... "With the exception of the ConMan's second NIC for CAB control, there are no non-Cobranet devices on this VLAN"

...out of curiosity, what type of control are you doing with the CABs from this NIC?

Thanks,

Joe

7

Re: n3 cm1 failure

To Ivor: I split the receive bundles across the two n3s last week. So far, no problems. From what cwa said about too much incoming traffic, I'm pretty positive that was the problem. My stupid mistake.

To Joe: One of the reasons I added a ConMan node was to manage the CAB devices along with the scripts, leaving the n3s for audio only. Normal CAB control traffic is carried in a Cobranet frame (Ethernet type 0x8819), which of course must be on the same LAN as Cobranet. The ConMan node has to have a NIC connected to the control LAN, which carries the normal NioNode control traffic but is isolated from the Cobranet network. A second NIC was added to give the ConMan node access to both LANs, which now allows ConMan to communicate normally with the NioNodes and natively with the CABs.

As a side note, I found out when I added the second NIC, which was an Intel, that it is capable of being 802.1Q (VLAN) aware. One could simply use one of these physical network interfaces to connect to many VLANs through a trunked switch port.
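To illustrate what that 802.1Q awareness buys you: on a trunked switch port each frame just carries a VLAN tag, so one physical NIC can sit on several VLANs at once. Here's a rough sketch of what the tagged frames look like, assuming Python with scapy; the VLAN numbers and addresses are made up for the example.

    # Two frames leaving one physical NIC on a trunked port: the 802.1Q tag
    # (VLAN ID plus the encapsulated EtherType) is all that separates the
    # Cobranet VLAN from the control VLAN. VLAN IDs and addresses are made up.
    from scapy.all import Ether, Dot1Q, Raw

    cobranet_frame = (Ether(dst="01:60:2b:00:00:01") /      # placeholder multicast MAC
                      Dot1Q(vlan=10, type=0x8819) /         # hypothetical Cobranet VLAN
                      Raw(b"cobranet payload stub"))
    control_frame = (Ether(dst="ff:ff:ff:ff:ff:ff") /
                     Dot1Q(vlan=20, type=0x0800) /          # hypothetical control VLAN, IP payload
                     Raw(b"control traffic stub"))

    cobranet_frame.show()
    control_frame.show()

Whether that's worth doing instead of just adding the second NIC is another question, of course.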

8

Re: n3 cm1 failure

Right, I was curious if you were doing SNMP control of the CABs or something else.  Thanks for the details.

Thanks,

Joe

9

Re: n3 cm1 failure

I looked into SNMP initially, until I found out that ConMan can do native control. As you probably know, the catch with SNMP control is that it requires IP. It seems that you use BootP to give the CABs IP addresses, then use SNMP to control the CABs from there. However, there are some hoops to jump through because BootP won't always give the same IP address to the same MAC. Some folks from Peak Audio gave me the rundown on making that association with various (highly undocumented) devices. It would probably work, but it comes with a lot of headaches, including requiring the maintenance staff to be aware of MAC address changes when replacing equipment.
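For reference, here's roughly what SNMP control would have looked like once BootP had handed a CAB an address — a minimal sketch using Python's pysnmp, just reading the standard sysDescr object. The IP and community string are placeholders, and real control would go through the Cobranet enterprise MIB rather than this generic object.

    # Minimal SNMP GET against a CAB once it has an IP (illustration only).
    # The address and community string are placeholders.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, getCmd)

    errorIndication, errorStatus, errorIndex, varBinds = next(getCmd(
        SnmpEngine(),
        CommunityData('public', mpModel=0),                 # SNMPv1
        UdpTransportTarget(('192.0.2.50', 161)),            # placeholder CAB IP
        ContextData(),
        ObjectType(ObjectIdentity('1.3.6.1.2.1.1.1.0'))))   # sysDescr.0

    if errorIndication:
        print(errorIndication)
    else:
        for name, value in varBinds:
            print(name, "=", value)

Native ConMan control sidesteps all of that, which is why I went that way.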

10

Re: n3 cm1 failure

It seems that this problem has not been fixed after all. I changed the program so that the failing n3 is now only receiving 20 channels: five bundles from various CAB 4ns at four channels each. Since then we have experienced two new failures, four days apart. The logs show the same problem as in the original post, with the same NioNode. We do have a spare n3 if the NioNode itself is suspect.

As a side note, this problem seems to have cropped up after a system modification we recently completed, during which we added a single CAB 4n, two switches, and a ConMan node. Before the change I have logs covering at least a year that contain no such error. The new CAB is used for output only. No additional CobraNet input bundles were added to the NioNodes during this modification, which is interesting because the original system had already been running with the 'overloaded' CM1. In fact, the NioNodes were also handling the CAB control traffic at that time.

11

Re: n3 cm1 failure

Hi jvalenzuela, I'm now getting the same trouble you had. In my system I use Crest Ci20x8 amplifiers, and I have a CAB4n and ConMan too. I use 12 Nions to handle the audio processing and ConMan to handle the control scripts. After I did that, I started getting this error every day, which is a nightmare. I'm afraid the Ci20x8s generate a lot of network traffic. Have you solved this problem yet?

love peace and music
Tibet is one part of China!!!

12

Re: n3 cm1 failure

Unfortunately I have not yet solved this problem, although my failures seem to be much less frequent than yours. I am interested to know exactly what type of traffic these amplifiers produce, in order to see if/how it could affect a NioNode. It is quite possible on a switched network, depending on the traffic type, that a NioNode would never see this traffic on its Cobranet interfaces.

13

Re: n3 cm1 failure

I think a network broadcast storm will cause the CM-1 to crash. We did some tests with a network engineer: our network switches run MSTP, and if a switch restarts, the root has to be rebuilt, and there is a network storm while that happens. So a broadcast storm is one reason that will cause the CM-1 to crash.

love peace and music
Tibet is one part of China!!!

14

Re: n3 cm1 failure

I'm not sure what 'MSTP' is; perhaps you mean 'STP', as in Spanning Tree Protocol. With STP, if a switch goes down, STP will not cause a loop and the associated storm(s) while the network converges on a new topology. STP's whole purpose in life is to prevent network loops and the general unhappiness they bring.

15

Re: n3 cm1 failure

Yes, MSTP means 'Multiple Spanning Tree Protocol'.

love peace and music
Tibet is one part of China!!!

16

Re: n3 cm1 failure

Jason, did you ever get to the bottom of this? Did the problem mysteriously disappear?

This CM1/Piond crash is still popping up fairly frequently (speaking for the UK and Europe, of course). Just trying to establish a pattern right now.

All energy flows according to the whims of the Great Magnet. What a fool I was to defy him.

17

Re: n3 cm1 failure

Nope, I'm still working down the list of things that changed since the problem started. Since the failure tends to be rather infrequent, one or two failures a week (sometimes more, sometimes less), the process is a little slow. As soon as I get a clue, I'll be sure to post my findings...

18

Re: n3 cm1 failure

Jason, did you ever get to the bottom of this? I am getting similar errors on a large system.

19

Re: n3 cm1 failure

Unfortunately not. Last time I visited the site, they had installed a watchdog monitor to reboot the Nions when a failure was detected.