2017-07-28 01:43:30

by Greg Maitz

Subject: Problem seen with wireless mesh

Hi

I have two wireless products forming a wireless mesh. One (let's call
it A) runs a 2.6.37 kernel, while the other (let's call it B) runs a
3.18 kernel. The IEEE 802.11s-based mesh (mac80211 module) runs
successfully.

I see a problem where communication from A to B freezes
intermittently, each time for between one and two minutes. Product A
is the one at fault. During this period, no ping from a PC to A
succeeds. Using a serial terminal on A, I investigated and found:

1) top doesn't show anything suspicious when the issue occurs. The
same processes are always present.

2) The watchdog on A does not trigger when the issue occurs.

3) It's not a firewall issue; I tried clearing all iptables entries.

4) I checked netstat, but it gave no further clues.

5) I added debug printk()s throughout the mesh code in mac80211, but
nothing I've seen so far gives strong evidence.

Any tips on how to debug this issue? Thanks.
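One way to make the freeze windows precise enough to line up with the printk output on A is to ping A from the PC at a fixed rate (for example once per second) and collapse the results into outage intervals. A minimal Python sketch, assuming the samples have already been collected as (timestamp, reachable) pairs; the function name outage_windows and the min_len threshold are hypothetical, not part of any existing tool:

```python
from typing import List, Optional, Tuple


def outage_windows(
    samples: List[Tuple[float, bool]], min_len: float = 0.0
) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) ping samples into (start, end) outages.

    An outage starts at the first failed sample and ends at the first
    successful sample after it. Outages shorter than min_len are dropped.
    """
    ordered = sorted(samples)
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, ok in ordered:
        if not ok and start is None:
            start = ts  # first failed ping of a new outage
        elif ok and start is not None:
            if ts - start >= min_len:
                windows.append((start, ts))
            start = None  # target reachable again
    # An outage still in progress at the end is closed at the last sample.
    if start is not None:
        end = ordered[-1][0]
        if end - start >= min_len:
            windows.append((start, end))
    return windows
```

With min_len set to something like 30 seconds, isolated packet losses are filtered out and only the long freezes remain, which makes it easier to pick the matching stretch of the kernel log on A.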


2017-07-28 08:29:38

by Stam, Michel

Subject: RE: Problem seen with wireless mesh

Hello George,

Do you happen to be running AuthSAE for authenticated mesh? I've seen
this go wrong with ath9k cards using hardware encryption; in that
particular case, you may want to load the module with nohwcrypt=1, it
should work then.

Kind regards,

Michel Stam
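If A does use an ath9k card, a reload with that parameter (for example modprobe -r ath9k followed by modprobe ath9k nohwcrypt=1) can be confirmed by reading the value back from sysfs, where loaded modules expose readable parameters under /sys/module/<module>/parameters/. A small Python sketch, assuming the ath9k module and its nohwcrypt parameter are present on the target; the helper names are hypothetical:

```python
from pathlib import Path
from typing import Optional


def read_module_param(
    module: str, param: str, sysfs_root: str = "/sys/module"
) -> Optional[str]:
    """Return a loaded module's parameter value, or None if it can't be read.

    Readable parameters of loaded modules appear as
    /sys/module/<module>/parameters/<param>.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:  # module not loaded, parameter absent, or not readable
        return None


def hw_crypto_disabled(sysfs_root: str = "/sys/module") -> bool:
    # Assumption: ath9k reports nohwcrypt as an integer, "1" meaning the
    # driver falls back to software encryption in mac80211.
    return read_module_param("ath9k", "nohwcrypt", sysfs_root) == "1"
```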

2017-08-01 04:55:53

by Greg Maitz

Subject: Re: Problem seen with wireless mesh

Hi Michel,

I didn't have SAE authentication enabled; the mesh was running in open
mode. The latest logs give some clues: the mesh appears to have gone
down, which is possibly the reason for the ping failures. After a
while, mesh discovery starts again and restores the mesh. On product
B, I see the dropped_frames_no_route statistic incrementing during the
period of ping failure. I'm investigating why the mesh became inactive.
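To correlate the no-route drops on B with the ping outages, the counter can be polled and each increase timestamped. A sketch in Python, assuming B's kernel was built with CONFIG_MAC80211_DEBUGFS, debugfs is mounted at /sys/kernel/debug, and the mesh interface is mesh0 on phy0 (all three are assumptions; adjust the path to the actual device). The clock and sleep arguments exist only so the loop can be exercised without real hardware:

```python
import time
from pathlib import Path
from typing import Callable, List, Tuple

# Assumed location of the mac80211 mesh statistic; phy0 and mesh0 are placeholders.
DEFAULT_COUNTER = (
    "/sys/kernel/debug/ieee80211/phy0/netdev:mesh0/mesh_stats/dropped_frames_no_route"
)


def read_counter(path: str) -> int:
    """Read a single integer counter from a debugfs file."""
    return int(Path(path).read_text().strip())


def watch_counter(
    path: str = DEFAULT_COUNTER,
    interval: float = 1.0,
    samples: int = 120,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, int]]:
    """Poll the counter and return (timestamp, delta) for each interval it grew in."""
    increases: List[Tuple[float, int]] = []
    prev = read_counter(path)
    for _ in range(samples):
        sleep(interval)
        cur = read_counter(path)
        if cur > prev:
            ts = clock()
            increases.append((ts, cur - prev))
            stamp = time.strftime("%H:%M:%S", time.localtime(ts))
            print(f"{stamp} dropped_frames_no_route +{cur - prev}", flush=True)
        # Always track the latest value so a counter reset (for example the
        # interface being recreated) does not yield a bogus delta later.
        prev = cur
    return increases
```

Running watch_counter() on B while the PC pings A gives wall-clock times for the drops that can be compared directly with the outage windows seen from the PC side.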

Thanks for your response.