2012-11-23 10:30:24

by Felix Liao

[permalink] [raw]
Subject: Badness at net/mac80211/offchannel.c:264

Hi Johannesh,

I'm Felix Liao from WatchGuard, we encountered a warning call trace when update the compat-wireless driver to the latest stable version 3.6.6-1, but it always dump the call trace at line 264 of offchannel.c:
[ 82.205559] ------------[ cut here ]------------
[ 82.210188] Badness at /opt/compat-wireless-3.6.6-1/net/mac80211/offchannel.c:264
[ 82.210203] NIP: e69df0c0 LR: e69df298 CTR: 00000001
[ 82.210215] REGS: d9761c30 TRAP: 0700 Tainted: P (2.6.35.12)
[ 82.210225] MSR: 00029000 <EE,ME,CE> CR: 20222484 XER: 20000000
[ 82.210248] TASK = d972cdb0[1578] 'hostapd' THREAD: d9760000
[ 82.210257] GPR00: 00000001 d9761ce0 d972cdb0 d95b0a80 00007b2a ffffffff d9761c3e 00007b29
[ 82.210280] GPR08: 00007af0 d95b1358 00000039 00000000 80222484 100885c4 00000000 10059568
[ 82.210304] GPR16: 00000000 00000000 00000000 10081334 df9f7500 e6a12ad0 e6a12a20 00000001
[ 82.210327] GPR24: d95b1358 d95b1358 dfb83300 d95b1148 d95b1358 d9737480 d95b0a80 d9737480
[ 82.210429] NIP [e69df0c0] ieee80211_start_next_roc+0x38/0x198 [mac80211]
[ 82.210462] LR [e69df298] ieee80211_roc_purge+0x78/0x248 [mac80211]
[ 82.210471] Call Trace:
[ 82.210591] [d9761ce0] [00000001] 0x1 (unreliable)
[ 82.210632] [d9761d00] [e69df298] ieee80211_roc_purge+0x78/0x248 [mac80211]
[ 82.210672] [d9761d50] [e69e5cd0] ieee80211_do_stop+0xb8/0x60c [mac80211]
[ 82.210708] [d9761d80] [e69e623c] ieee80211_stop+0x18/0x2c [mac80211]
[ 82.210734] [d9761d90] [c02d8ec4] __dev_close+0x8c/0xe0
[ 82.210752] [d9761db0] [c02d5b24] __dev_change_flags+0x138/0x190
[ 82.210768] [d9761dd0] [c02d8ca4] dev_change_flags+0x24/0x74
[ 82.210785] [d9761df0] [c0338480] devinet_ioctl+0x7a0/0x8e8
[ 82.210801] [d9761e60] [c0339d58] inet_ioctl+0xcc/0xf8
[ 82.210818] [d9761e70] [c02c39f8] sock_ioctl+0x7c/0x330
[ 82.210842] [d9761e90] [c00c3bb8] vfs_ioctl+0x34/0x8c
[ 82.210858] [d9761ea0] [c00c3ddc] do_vfs_ioctl+0x94/0x7e0
[ 82.210874] [d9761f10] [c00c4578] sys_ioctl+0x50/0x94
[ 82.210895] [d9761f40] [c0010e78] ret_from_syscall+0x0/0x3c
[ 82.210933] --- Exception: c01 at 0xfbfaaf4
[ 82.210938] LR = 0xfc85bcc
[ 82.210945] Instruction dump:
[ 82.210954] 7c691b78 93c10018 7c7e1b78 90010024 9361000c 93810010 93a10014 93e1001c
[ 82.210980] 87e908d8 7f9f4800 419e0060 881f0048 <0f000000> 2f800000 409e002c 81230058

the device is running as AP and the driver I used is ath9k, that is, the local->ops->remain_on_channel is null to use, besides, one important point is I use the ACS(http://linuxwireless.org/en/users/Documentation/acs) patch in the hostapd.

One feature of this issue is it only happens when hostapd start, but in the hostapd running time, it doesn't happen again(yes, I replace the WARN_ON_ONCE with WARN_ON in line 264 to get this result). Another feature is after I disable the ACS function, this issue doesn't happen again.

Let me present the process in my mind, when I start hostapd in the shell, it begin to run the ACS functions, which will send the REMAIN_ON_CHANNEL command(see hostapd_drv_remain_on_channel in hostapd source) to cfg80211, and then will call the function ieee80211_remain_on_channel() in mac80211, which will call the function ieee80211_start_roc_work(), which will queue some works ieee80211_sw_roc_work() on the local->workqueue. The work duration 60 is set by hostapd_config_acs_defaults() provided by ACS patch.

But on the other side, the hostapd also send SIOCSIFFLAGS command(see linux_set_iface_flags in hostapd source) to net core device, which will call the function dev_change_flags() to close the device, the call trace is dev_change_flags->ieee80211_stop->ieee80211_do_stop->ieee80211_roc_purge.

I verify that the roc->started was set to true in ieee80211_sw_roc_work:368, the value of roc->started change road is(hostapd and phy0 are gived by current->comm):

hostapd->ieee80211_start_roc_work[2146] local->roc_list: <empty>
hostapd->ieee80211_start_roc_work[2300] local->roc_list: <1>
roc d90e7480 started 0 abrt 0 hw_beg 0 notified 0 hw_start 0 dur 60(60)
phy0->ieee80211_sw_roc_work[333] local->roc_list: <1>
roc d90e7480 started 0 abrt 0 hw_beg 0 notified 0 hw_start 0 dur 60(60)
phy0->ieee80211_sw_roc_work[392] local->roc_list: <1>
roc d90e7480 started 1 abrt 0 hw_beg 0 notified 1 hw_start 0 dur 60(60)
hostapd->ieee80211_roc_purge[465] local->roc_list: <1>
roc d90e7480 started 1 abrt 0 hw_beg 0 notified 1 hw_start 0 dur 60(60)
hostapd->ieee80211_start_next_roc[264] roc d90e7480 cause call trace!
hostapd->ieee80211_roc_purge[480] local->roc_list: <1>
roc d90e7480 started 1 abrt 0 hw_beg 0 notified 1 hw_start 0 dur 60(60)

I suspect the root cause of this issue is after ieee80211_sw_roc_work set roc->start to true and queue it again, the hostapd just call linux_set_iface_flags right now. I just increase the delay time to magnify the probability of this issue, then it is more easier to reproduce this issue. So I think we should schedule the work as soon as possible, and should set the delay time to zero, that is,

--- a/net/mac80211/offchannel.c
+++ b/net/mac80211/offchannel.c
@@ -367,7 +367,7 @@ void ieee80211_sw_roc_work(struct work_struct *work)

roc->started = true;
- ieee80211_queue_delayed_work(&local->hw, &roc->work,
- msecs_to_jiffies(roc->duration));
+ ieee80211_queue_delayed_work(&local->hw, &roc->work, 0);
} else {
/* finish this ROC */
finish:

this issue is hardly to reproduce with this fix, but since ieee80211_roc_purge() and ieee80211_sw_roc_work are asynchronous, it still happen sometimes. So I take a try to delete the roc from list after queue to avoid the others access it by list, this change requires some other fix,

At last, to be save I think we should use a static tmp_list or a global tmp_list to save the abort roc in ieee80211_roc_purge.

--------------
the complete patch to fix this issue as below:

$ git diff
diff --git a/net/mac80211/offchannel.c b/net/mac80211/offchannel.c
index 83608ac..0a9bb58 100644
--- a/net/mac80211/offchannel.c
+++ b/net/mac80211/offchannel.c
@@ -336,14 +336,6 @@ void ieee80211_sw_roc_work(struct work_struct *work)
if (roc->abort)
goto finish;

- if (WARN_ON(list_empty(&local->roc_list)))
- goto out_unlock;
-
- if (WARN_ON(roc != list_first_entry(&local->roc_list,
- struct ieee80211_roc_work,
- list)))
- goto out_unlock;
-
if (!roc->started) {
struct ieee80211_roc_work *dep;

@@ -361,17 +353,18 @@ void ieee80211_sw_roc_work(struct work_struct *work)
list_for_each_entry(dep, &roc->dependents, list)
ieee80211_handle_roc_started(dep);

+ list_del(&roc->list); /* delete from the local->roc_list */
/* if it was pure TX, just finish right away */
if (!roc->duration)
goto finish;

roc->started = true;
- ieee80211_queue_delayed_work(&local->hw, &roc->work,
- msecs_to_jiffies(roc->duration));
+ ieee80211_queue_delayed_work(&local->hw, &roc->work, 0);
} else {
/* finish this ROC */
finish:
- list_del(&roc->list);
+ if (roc->abort)
+ list_del(&roc->list); /* delete from the tmp_list */
started = roc->started;
ieee80211_roc_notify_destroy(roc);

@@ -443,7 +436,7 @@ void ieee80211_roc_purge(struct ieee80211_sub_if_data *sdata)
{
struct ieee80211_local *local = sdata->local;
struct ieee80211_roc_work *roc, *tmp;
- LIST_HEAD(tmp_list);
+ static LIST_HEAD(tmp_list);

mutex_lock(&local->mtx);
list_for_each_entry_safe(roc, tmp, &local->roc_list, list) {

--------------
I had test this patch using the same testcase above 10 times, it runs very well and never dump the warning calltrace, can you give a review for this patch? If ok, how can I submit this patch into the repository?

Thanks,
- Felix



2012-11-24 01:08:33

by Felix Liao

[permalink] [raw]
Subject: RE: Badness at net/mac80211/offchannel.c:264

SGkgSm9oYW5uZXMsDQpUaGFua3MgZm9yIHlvdXIgaGVscCwgSSBqdXN0IG1ha2Ugc29tZSB0ZXN0
cyB3aXRoIHRoZSBwYXRjaCB5b3UgcHJvdmlkZWQuIA0KWWVzLCB0aGVyZSBpcyBub25lIHdhcm5p
bmcgY2FsbHRyYWNlIGFnYWluLCBhbmQgbm9uZSBvdGhlciBjYWxsdHJhY2UgdG9vLg0KDQpCdXQg
Zm9yIHRoZSB0bXBfbGlzdCwgd2hlbiB3ZSBhYm9ydCBhIHJvYyBhbmQgYWRkIHRvIHRoZSB0YWls
IG9mIHRtcF9saXN0LCANCldlIHdpbGwgZGVsZXRlIGl0IGZyb20gdG1wX2xpc3QgaW4gdGhlIHdv
cmsgYXN5bmMsIHdlIG5lZWQgdG8gbW9kaWZ5IHRoZSANCnByZXYgYW5kIG5leHQgcG9pbnRlciBv
ZiB0aGUgaGVhZCBvZiB0bXBfbGlzdCwgSSB0aGluayB0aGV5IGFyZSBmcmVlZCANCmZyb20gdGhl
IHN0YWNrIG9mIGZ1bmN0aW9uIGllZWU4MDIxMV9yb2NfcHVyZ2UoKSwgaXQgaXMgZGFuZ2Vyb3Vz
Lg0KDQpGZWxpeCBMaWFvDQoNCi0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQpGcm9tOiBKb2hh
bm5lcyBCZXJnIFttYWlsdG86am9oYW5uZXNAc2lwc29sdXRpb25zLm5ldF0gDQpTZW50OiBTYXR1
cmRheSwgTm92ZW1iZXIgMjQsIDIwMTIgNzoyMCBBTQ0KVG86IEZlbGl4IExpYW8NCkNjOiBsaW51
eC13aXJlbGVzc0B2Z2VyLmtlcm5lbC5vcmc7IE9jZWFuIFN1OyBKb2UgUWlhbzsgQnJ5YW4gUGhp
bGxpcHBlDQpTdWJqZWN0OiBSZTogQmFkbmVzcyBhdCBuZXQvbWFjODAyMTEvb2ZmY2hhbm5lbC5j
OjI2NA0KDQpIaSBGZWxpeCwNCg0KVGhhbmtzIGZvciB0aGUgcmVwb3J0IGFuZCB0aGUgZGV0YWls
ZWQgZGVzY3JpcHRpb24gYW5kIGFuYWx5c2lzLg0KDQo+IGhvc3RhcGQtPmllZWU4MDIxMV9zdGFy
dF9yb2Nfd29ya1syMTQ2XSBsb2NhbC0+cm9jX2xpc3Q6IDxlbXB0eT4gDQo+IGhvc3RhcGQtPmll
ZWU4MDIxMV9zdGFydF9yb2Nfd29ya1syMzAwXSBsb2NhbC0+cm9jX2xpc3Q6IDwxPg0KPiAgICAg
cm9jIGQ5MGU3NDgwIHN0YXJ0ZWQgMCBhYnJ0IDAgaHdfYmVnIDAgbm90aWZpZWQgMCBod19zdGFy
dCAwIGR1ciANCj4gNjAoNjApDQo+IHBoeTAtPmllZWU4MDIxMV9zd19yb2Nfd29ya1szMzNdIGxv
Y2FsLT5yb2NfbGlzdDogPDE+DQo+ICAgICByb2MgZDkwZTc0ODAgc3RhcnRlZCAwIGFicnQgMCBo
d19iZWcgMCBub3RpZmllZCAwIGh3X3N0YXJ0IDAgZHVyIA0KPiA2MCg2MCkNCj4gcGh5MC0+aWVl
ZTgwMjExX3N3X3JvY193b3JrWzM5Ml0gbG9jYWwtPnJvY19saXN0OiA8MT4NCj4gICAgIHJvYyBk
OTBlNzQ4MCBzdGFydGVkIDEgYWJydCAwIGh3X2JlZyAwIG5vdGlmaWVkIDEgaHdfc3RhcnQgMCBk
dXIgDQo+IDYwKDYwKQ0KPiBob3N0YXBkLT5pZWVlODAyMTFfcm9jX3B1cmdlWzQ2NV0gbG9jYWwt
PnJvY19saXN0OiA8MT4NCj4gICAgIHJvYyBkOTBlNzQ4MCBzdGFydGVkIDEgYWJydCAwIGh3X2Jl
ZyAwIG5vdGlmaWVkIDEgaHdfc3RhcnQgMCBkdXIgDQo+IDYwKDYwKQ0KPiBob3N0YXBkLT5pZWVl
ODAyMTFfc3RhcnRfbmV4dF9yb2NbMjY0XSByb2MgZDkwZTc0ODAgY2F1c2UgY2FsbCB0cmFjZSEN
Cj4gaG9zdGFwZC0+aWVlZTgwMjExX3JvY19wdXJnZVs0ODBdIGxvY2FsLT5yb2NfbGlzdDogPDE+
DQo+ICAgICByb2MgZDkwZTc0ODAgc3RhcnRlZCAxIGFicnQgMCBod19iZWcgMCBub3RpZmllZCAx
IGh3X3N0YXJ0IDAgZHVyIA0KPiA2MCg2MCkNCg0KSSB0aGluayBpdCB3b3VsZCBoYXZlIGJlZW4g
aW5zdHJ1bWVudGFsIGhlcmUgdG8gcHJpbnQgb3V0IHRoZSBkaWZmZXJlbnQgaW50ZXJmYWNlcy4g
SXQgc2VlbXMgdG8gbWUgdGhhdCB0aGUgcm9jX3B1cmdlKCkgaXMgY2FsbGVkIGZvciBhIGRpZmZl
cmVudCBpbnRlcmZhY2UsIGFuZCBkb2Vzbid0IGFjdHVhbGx5IHJlbW92ZSBhbnkgUk9DIHdvcmsg
aXRlbSBhdCBhbGwuDQoNCj4gSSBzdXNwZWN0IHRoZSByb290IGNhdXNlIG9mIHRoaXMgaXNzdWUg
aXMgYWZ0ZXIgaWVlZTgwMjExX3N3X3JvY193b3JrIA0KPiBzZXQgcm9jLT5zdGFydCB0byB0cnVl
IGFuZCBxdWV1ZSBpdCBhZ2FpbiwgdGhlIGhvc3RhcGQganVzdCBjYWxsIA0KPiBsaW51eF9zZXRf
aWZhY2VfZmxhZ3MgcmlnaHQgbm93Lg0KDQpZZXMgYW5kIG5vLCBob3N0YXBkIGlzIGNoYW5naW5n
IGFuIGludGVyZmFjZSB0byBiZSBkb3duLCBidXQgaXQgd291bGQgYXBwZWFyIHRvIGJlIHNvbWUg
b3RoZXIgaW50ZXJmYWNlLCBub3QgdGhlIG9uZSB0aGF0J3MgZXhlY3V0aW5nIHRoZSBST0MuDQoN
Cj4gICBJIGp1c3QgaW5jcmVhc2UgdGhlIGRlbGF5IHRpbWUgdG8gbWFnbmlmeSB0aGUgcHJvYmFi
aWxpdHkgb2YgdGhpcyANCj4gaXNzdWUsIHRoZW4gaXQgaXMgbW9yZSBlYXNpZXIgdG8gcmVwcm9k
dWNlIHRoaXMgaXNzdWUuDQoNCk1ha2VzIHNlbnNlLg0KDQo+IFNvIEkgdGhpbmsgd2Ugc2hvdWxk
IHNjaGVkdWxlIHRoZSB3b3JrIGFzIHNvb24gYXMgcG9zc2libGUsIGFuZCBzaG91bGQgDQo+IHNl
dCB0aGUgZGVsYXkgdGltZSB0byB6ZXJvLCB0aGF0IGlzLA0KDQo+IEBAIC0zNjcsNyArMzY3LDcg
QEAgdm9pZCBpZWVlODAyMTFfc3dfcm9jX3dvcmsoc3RydWN0IHdvcmtfc3RydWN0IA0KPiAqd29y
aykNCj4gIA0KPiAgICAgICAgICAgICAgICAgcm9jLT5zdGFydGVkID0gdHJ1ZTsNCj4gLSAgICAg
ICAgICAgICAgIGllZWU4MDIxMV9xdWV1ZV9kZWxheWVkX3dvcmsoJmxvY2FsLT5odywgJnJvYy0+
d29yaywNCj4gLSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgbXNl
Y3NfdG9famlmZmllcyhyb2MtPmR1cmF0aW9uKSk7DQo+ICsgICAgICAgICAgICAgICBpZWVlODAy
MTFfcXVldWVfZGVsYXllZF93b3JrKCZsb2NhbC0+aHcsICZyb2MtPndvcmssIA0KPiArIDApOw0K
DQpkZWZpbml0ZWx5IG5vdCwgbm8sIHRoaXMgd291bGQgZGVmZWF0IHRoZSBwdXJwb3NlIG9mIHRo
ZSBST0MsIHRvIHN0YXkgb24gdGhlIGdpdmVuIGNoYW5uZWwgZm9yIHRoZSBnaXZlbiB0aW1lIC0t
IG5vdyB5b3UndmUgY3V0IHRoZSB0aW1lIHRvIHplcm8gd2hpY2ggaXNuJ3Qgd2hhdCB3ZSB3YW50
IGF0IGFsbC4NCg0KPiBBdCBsYXN0LCB0byBiZSBzYXZlIEkgdGhpbmsgd2Ugc2hvdWxkIHVzZSBh
IHN0YXRpYyB0bXBfbGlzdCBvciBhIA0KPiBnbG9iYWwgdG1wX2xpc3QgdG8gc2F2ZSB0aGUgYWJv
cnQgcm9jIGluIGllZWU4MDIxMV9yb2NfcHVyZ2UuDQoNCk5vLCB0aGF0IHRvdGFsbHkgZG9lc24n
dCBzZWVtIHJpZ2h0LCBhIHN0YXRpYyBsaXN0IGhlcmUgd291bGQgY29tcGxldGVseSBicmVhayBp
ZiB0aGUgZnVuY3Rpb24gd2FzIGNhbGxlZCBvbiB0d28gQ1BVcyBhdCB0aGUgc2FtZSB0aW1lICh0
aGlzIGNhbid0IGhhcHBlbiByaWdodCBub3csIGJ1dCBzdGlsbCkNCg0KDQo+IEkgaGFkIHRlc3Qg
dGhpcyBwYXRjaCB1c2luZyB0aGUgc2FtZSB0ZXN0Y2FzZSBhYm92ZSAxMCB0aW1lcywgaXQgcnVu
cyANCj4gdmVyeSB3ZWxsIGFuZCBuZXZlciBkdW1wIHRoZSB3YXJuaW5nIGNhbGx0cmFjZSwNCg0K
SSBjYW4gYmVsaWV2ZSB0aGF0LCBidXQgaXQgdG90YWxseSBkZWZlYXRzIHRoZSBwdXJwb3NlIG9m
IHRoZSBBQ1MgcGF0Y2ggeW91J3JlIHVzaW5nIDotKQ0KDQpJIHRoaW5rIHRoZSByb290IGNhdXNl
IG9mIHRoZSBpc3N1ZSBpcyB0aGF0IGluIGllZWU4MDIxMV9yb2NfcHVyZ2UoKSB3ZSB1bmNvbmRp
dGlvbmFsbHkgY2FsbCBpZWVlODAyMTFfc3RhcnRfbmV4dF9yb2MoKSwgZXZlbiBpZiBub3RoaW5n
IHdhcyByZW1vdmVkLg0KDQpTbyBJIHRoaW5rIHdoYXQgaGFwcGVucyBvdmVyYWxsIGlzIHRoaXM6
DQoNCiogcm9jIGlzIGFkZGVkIG9uLCBlLmcuLCB3bGFuMA0KKiByb2Mgc3RhcnRzIHJ1bm5pbmcg
KHJvYy0+c3RhcnRlZCBiZWNvbWVzIHRydWUpDQoqIGhvc3RhcGQgZGVjaWRlcyB0byB0dXJuIG9m
ZiBzb21lIG90aGVyIGludGVyZmFjZSAoc2F5IG1vbmkud2xhbjA/KQ0KKiB0aGVyZWZvcmUsIGll
ZWU4MDIxMV9yb2NfcHVyZ2UoKSBpcyBjYWxsZWQgX2ZvciB0aGF0IG90aGVyIGludGVyZmFjZV8N
CiogdGhlIGxvb3AgaW4gaWVlZTgwMjExX3JvY19wdXJnZSgpIGRvZXMgbm90aGluZyBzaW5jZSB0
aGVyZSdzIG5vIHJvYw0KICBmb3IgdGhlIG90aGVyIGludGVyZmFjZSBhbmQgdGhlIG9yaWdpbmFs
IHJvYyBzdGF5cyBhdCB0aGUgZnJvbnQgb2YgdGhlDQogIGxpc3QNCiogd2UgY2FsbCBpZWVlODAy
MTFfc3RhcnRfbmV4dF9yb2MoKSBhbmQgaXQgKGNvcnJlY3RseSkgd2FybnMNCg0KU28gdGhlIGZp
cnN0IHRoaW5nIHRoYXQgY29tZXMgdG8gbWluZCBpcyB0aGF0IHdlIG11c3Qgbm90IGNhbGwNCmll
ZWU4MDIxMV9zdGFydF9uZXh0X3JvYygpIHdoZW4gdGhlcmUgd2FzIG5vIFJPQyBpdGVtIHJlbW92
ZWQgZnJvbSB0aGUgbGlzdC4gSG93ZXZlciwgaXQgdHVybnMgb3V0IHdlIGNhbiBjb21wbGV0ZWx5
IHJlbW92ZSB0aGUgY2FsbCwgc2luY2UgdGhlIHNlY29uZCBsb29wIChvdmVyIHRoZSB0bXBfbGlz
dCkgd2lsbCBhY3R1YWxseSBjbGVhciBhbGwgdGhlIFJPQyBpdGVtcywgYW5kIHJ1biBpZWVlODAy
MTFfc3dfcm9jX3dvcmsoKSBmb3IgZWFjaCBpdGVtLiBJZiB0aGUgUk9DIHdhcyBzdGFydGVkLCB0
aGVuIHRoaXMgZnVuY3Rpb24gd2lsbCBjYWxsIGllZWU4MDIxMV9zdGFydF9uZXh0X3JvYygpLg0K
DQpTbyB0aGlzIHNpbXBsZSBwYXRjaCBzaG91bGQgZml4IHlvdXIgaXNzdWU6DQoNCmRpZmYgLS1n
aXQgYS9uZXQvbWFjODAyMTEvb2ZmY2hhbm5lbC5jIGIvbmV0L21hYzgwMjExL29mZmNoYW5uZWwu
YyBpbmRleCA2MzVjMzI1Li4yMTM4ZGMzIDEwMDY0NA0KLS0tIGEvbmV0L21hYzgwMjExL29mZmNo
YW5uZWwuYw0KKysrIGIvbmV0L21hYzgwMjExL29mZmNoYW5uZWwuYw0KQEAgLTQ1Myw4ICs0NTMs
NiBAQCB2b2lkIGllZWU4MDIxMV9yb2NfcHVyZ2Uoc3RydWN0IGllZWU4MDIxMV9zdWJfaWZfZGF0
YSAqc2RhdGEpDQogCQlsaXN0X21vdmVfdGFpbCgmcm9jLT5saXN0LCAmdG1wX2xpc3QpOw0KIAkJ
cm9jLT5hYm9ydCA9IHRydWU7DQogCX0NCi0NCi0JaWVlZTgwMjExX3N0YXJ0X25leHRfcm9jKGxv
Y2FsKTsNCiAJbXV0ZXhfdW5sb2NrKCZsb2NhbC0+bXR4KTsNCiANCiAJbGlzdF9mb3JfZWFjaF9l
bnRyeV9zYWZlKHJvYywgdG1wLCAmdG1wX2xpc3QsIGxpc3QpIHsNCg0KDQpQbGVhc2UgbGV0IG1l
IGtub3cgaWYgaXQgZG9lc24ndCwgdGhhdCB3b3VsZCBtZWFuIG15IGFuYWx5c2lzIGlzIGNvbXBs
ZXRlbHkgd3JvbmcuDQoNCmpvaGFubmVzDQoNCg==

2012-11-26 04:09:00

by Felix Liao

[permalink] [raw]
Subject: RE: Badness at net/mac80211/offchannel.c:264

VGhhbmtzIGZvciB5b3VyIGV4cGxhbmF0aW9uLCBJIGNhbiB1bmRlcnN0YW5kIG5vdy4NCg0KLSBG
ZWxpeA0KDQotLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KRnJvbTogSm9oYW5uZXMgQmVyZyBb
bWFpbHRvOmpvaGFubmVzQHNpcHNvbHV0aW9ucy5uZXRdIA0KU2VudDogU2F0dXJkYXksIE5vdmVt
YmVyIDI0LCAyMDEyIDY6MzYgUE0NClRvOiBGZWxpeCBMaWFvDQpDYzogbGludXgtd2lyZWxlc3NA
dmdlci5rZXJuZWwub3JnOyBPY2VhbiBTdTsgSm9lIFFpYW87IEJyeWFuIFBoaWxsaXBwZQ0KU3Vi
amVjdDogUmU6IEJhZG5lc3MgYXQgbmV0L21hYzgwMjExL29mZmNoYW5uZWwuYzoyNjQNCg0KT24g
U2F0LCAyMDEyLTExLTI0IGF0IDAxOjE5ICswMDAwLCBGZWxpeCBMaWFvIHdyb3RlOg0KPiBIaSBK
b2hhbm5lcywNCj4gVGhhbmtzIGZvciB5b3VyIGhlbHAsIEkganVzdCBtYWtlIHNvbWUgdGVzdHMg
d2l0aCB0aGUgcGF0Y2ggeW91IHByb3ZpZGVkLiANCj4gWWVzLCB0aGVyZSBpcyBub25lIHdhcm5p
bmcgY2FsbHRyYWNlIGFnYWluLCBhbmQgbm9uZSBvdGhlciBjYWxsdHJhY2UgdG9vLg0KPiANCj4g
QnV0IGZvciB0aGUgdG1wX2xpc3QsIHdoZW4gd2UgYWJvcnQgYSByb2MgYW5kIGFkZCB0byB0aGUg
dGFpbCBvZiANCj4gdG1wX2xpc3QsIFdlIHdpbGwgZGVsZXRlIGl0IGZyb20gdG1wX2xpc3QgaW4g
dGhlIHdvcmsgYXN5bmMsIHdlIG5lZWQgDQo+IHRvIG1vZGlmeSB0aGUgcHJldiBhbmQgbmV4dCBw
b2ludGVyIG9mIHRoZSBoZWFkIG9mIHRtcF9saXN0LCBJIHRoaW5rIA0KPiB0aGV5IGFyZSBmcmVl
ZCBmcm9tIHRoZSBzdGFjayBvZiBmdW5jdGlvbiBpZWVlODAyMTFfcm9jX3B1cmdlKCksIGl0IGlz
IA0KPiBkYW5nZXJvdXMuDQoNCk5vLCBpdCdzIGZpbmUgYmVjYXVzZSB3ZSBmbHVzaCB0aGUgYXN5
bmNocm9ub3VzIHdvcmsgdGhlcmUgYnkgY2FsbGluZw0KZmx1c2hfZGVsYXllZF93b3JrKCkgc28g
aXQncyBub3QgYWN0dWFsbHkgYXN5bmNocm9ub3VzIGJ1dCBzeW5jaHJvbm91cy4NCg0KVGhlIHJl
YXNvbiBJIGRpZG4ndCBqdXN0IG1ha2UgaXQgY2FsbCB0aGUgaWVlZTgwMjExX3N3X3JvY193b3Jr
KCkgZnVuY3Rpb24gaXMgdGhhdCBpdCBtYXkgYWxyZWFkeSBiZSBzY2hlZHVsZWQsIGFuZCB3ZSBu
ZWVkIHRvIGNhdGNoIHRoYXQgY2FzZSBhcyB3ZWxsLg0KDQpqb2hhbm5lcw0KDQo=

2012-11-24 01:19:34

by Felix Liao

[permalink] [raw]
Subject: RE: Badness at net/mac80211/offchannel.c:264

SGkgSm9oYW5uZXMsDQpUaGFua3MgZm9yIHlvdXIgaGVscCwgSSBqdXN0IG1ha2Ugc29tZSB0ZXN0
cyB3aXRoIHRoZSBwYXRjaCB5b3UgcHJvdmlkZWQuIA0KWWVzLCB0aGVyZSBpcyBub25lIHdhcm5p
bmcgY2FsbHRyYWNlIGFnYWluLCBhbmQgbm9uZSBvdGhlciBjYWxsdHJhY2UgdG9vLg0KDQpCdXQg
Zm9yIHRoZSB0bXBfbGlzdCwgd2hlbiB3ZSBhYm9ydCBhIHJvYyBhbmQgYWRkIHRvIHRoZSB0YWls
IG9mIHRtcF9saXN0LCBXZSB3aWxsIGRlbGV0ZSBpdCBmcm9tIHRtcF9saXN0IGluIHRoZSB3b3Jr
IGFzeW5jLCB3ZSBuZWVkIHRvIG1vZGlmeSB0aGUgcHJldiBhbmQgbmV4dCBwb2ludGVyIG9mIHRo
ZSBoZWFkIG9mIHRtcF9saXN0LCBJIHRoaW5rIHRoZXkgYXJlIGZyZWVkIGZyb20gdGhlIHN0YWNr
IG9mIGZ1bmN0aW9uIGllZWU4MDIxMV9yb2NfcHVyZ2UoKSwgaXQgaXMgZGFuZ2Vyb3VzLg0KDQpG
ZWxpeCBMaWFvDQo=

2012-11-23 23:19:08

by Johannes Berg

[permalink] [raw]
Subject: Re: Badness at net/mac80211/offchannel.c:264

Hi Felix,

Thanks for the report and the detailed description and analysis.

> hostapd->ieee80211_start_roc_work[2146] local->roc_list: <empty>
> hostapd->ieee80211_start_roc_work[2300] local->roc_list: <1>
> roc d90e7480 started 0 abrt 0 hw_beg 0 notified 0 hw_start 0 dur 60(60)
> phy0->ieee80211_sw_roc_work[333] local->roc_list: <1>
> roc d90e7480 started 0 abrt 0 hw_beg 0 notified 0 hw_start 0 dur 60(60)
> phy0->ieee80211_sw_roc_work[392] local->roc_list: <1>
> roc d90e7480 started 1 abrt 0 hw_beg 0 notified 1 hw_start 0 dur 60(60)
> hostapd->ieee80211_roc_purge[465] local->roc_list: <1>
> roc d90e7480 started 1 abrt 0 hw_beg 0 notified 1 hw_start 0 dur 60(60)
> hostapd->ieee80211_start_next_roc[264] roc d90e7480 cause call trace!
> hostapd->ieee80211_roc_purge[480] local->roc_list: <1>
> roc d90e7480 started 1 abrt 0 hw_beg 0 notified 1 hw_start 0 dur 60(60)

I think it would have been instrumental here to print out the different
interfaces. It seems to me that the roc_purge() is called for a
different interface, and doesn't actually remove any ROC work item at
all.

> I suspect the root cause of this issue is after ieee80211_sw_roc_work
> set roc->start to true and queue it again, the hostapd just call
> linux_set_iface_flags right now.

Yes and no, hostapd is changing an interface to be down, but it would
appear to be some other interface, not the one that's executing the ROC.

> I just increase the delay time to magnify the probability of this
> issue, then it is more easier to reproduce this issue.

Makes sense.

> So I think we should schedule the work as soon as possible, and should
> set the delay time to zero, that is,

> @@ -367,7 +367,7 @@ void ieee80211_sw_roc_work(struct work_struct *work)
>
> roc->started = true;
> - ieee80211_queue_delayed_work(&local->hw, &roc->work,
> - msecs_to_jiffies(roc->duration));
> + ieee80211_queue_delayed_work(&local->hw, &roc->work, 0);

definitely not, no, this would defeat the purpose of the ROC, to stay on
the given channel for the given time -- now you've cut the time to zero
which isn't what we want at all.

> At last, to be save I think we should use a static tmp_list or a
> global tmp_list to save the abort roc in ieee80211_roc_purge.

No, that totally doesn't seem right, a static list here would completely
break if the function was called on two CPUs at the same time (this
can't happen right now, but still)


> I had test this patch using the same testcase above 10 times, it runs
> very well and never dump the warning calltrace,

I can believe that, but it totally defeats the purpose of the ACS patch
you're using :-)

I think the root cause of the issue is that in ieee80211_roc_purge() we
unconditionally call ieee80211_start_next_roc(), even if nothing was
removed.

So I think what happens overall is this:

* roc is added on, e.g., wlan0
* roc starts running (roc->started becomes true)
* hostapd decides to turn off some other interface (say moni.wlan0?)
* therefore, ieee80211_roc_purge() is called _for that other interface_
* the loop in ieee80211_roc_purge() does nothing since there's no roc
for the other interface and the original roc stays at the front of the
list
* we call ieee80211_start_next_roc() and it (correctly) warns

So the first thing that comes to mind is that we must not call
ieee80211_start_next_roc() when there was no ROC item removed from the
list. However, it turns out we can completely remove the call, since the
second loop (over the tmp_list) will actually clear all the ROC items,
and run ieee80211_sw_roc_work() for each item. If the ROC was started,
then this function will call ieee80211_start_next_roc().

So this simple patch should fix your issue:

diff --git a/net/mac80211/offchannel.c b/net/mac80211/offchannel.c
index 635c325..2138dc3 100644
--- a/net/mac80211/offchannel.c
+++ b/net/mac80211/offchannel.c
@@ -453,8 +453,6 @@ void ieee80211_roc_purge(struct ieee80211_sub_if_data *sdata)
list_move_tail(&roc->list, &tmp_list);
roc->abort = true;
}
-
- ieee80211_start_next_roc(local);
mutex_unlock(&local->mtx);

list_for_each_entry_safe(roc, tmp, &tmp_list, list) {


Please let me know if it doesn't, that would mean my analysis is
completely wrong.

johannes


2012-11-24 10:35:17

by Johannes Berg

[permalink] [raw]
Subject: Re: Badness at net/mac80211/offchannel.c:264

On Sat, 2012-11-24 at 01:19 +0000, Felix Liao wrote:
> Hi Johannes,
> Thanks for your help, I just make some tests with the patch you provided.
> Yes, there is none warning calltrace again, and none other calltrace too.
>
> But for the tmp_list, when we abort a roc and add to the tail of
> tmp_list, We will delete it from tmp_list in the work async, we need
> to modify the prev and next pointer of the head of tmp_list, I think
> they are freed from the stack of function ieee80211_roc_purge(), it is
> dangerous.

No, it's fine because we flush the asynchronous work there by calling
flush_delayed_work() so it's not actually asynchronous but synchronous.

The reason I didn't just make it call the ieee80211_sw_roc_work()
function is that it may already be scheduled, and we need to catch that
case as well.

johannes