LinuxLists.cc - [PATCH 1/4] cfg80211: stop sched scan only when needed

2013-12-05 09:21:38

Subject: [PATCH 1/4] cfg80211: stop sched scan only when needed

From: Barak Bercovitz <[email protected]>

cfg80211_leave stops sched scan when any station vif
is leaving. Add an explicit check and call it only
when the relevant vif (the one we scan on) is leaving.

Signed-off-by: Barak Bercovitz <[email protected]>
[Eliad - changed the commit message a bit]
Signed-off-by: Eliad Peller <[email protected]>
---
net/wireless/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/wireless/core.c b/net/wireless/core.c
index 32af857..a8e48c7 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -775,7 +775,8 @@ void cfg80211_leave(struct cfg80211_registered_device *rdev,
break;
case NL80211_IFTYPE_P2P_CLIENT:
case NL80211_IFTYPE_STATION:
- __cfg80211_stop_sched_scan(rdev, false);
+ if (rdev->sched_scan_req && dev == rdev->sched_scan_req->dev)
+ __cfg80211_stop_sched_scan(rdev, false);

wdev_lock(wdev);
#ifdef CONFIG_CFG80211_WEXT
--
1.8.5.rc1

2013-12-05 15:52:31

by Arik Nemtsov

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, Dec 5, 2013 at 5:45 PM, Johannes Berg <[email protected]> wrote:
> On Thu, 2013-12-05 at 17:39 +0200, Eliad Peller wrote:
>> On Thu, Dec 5, 2013 at 4:43 PM, Johannes Berg <[email protected]> wrote:
>> > On Thu, 2013-12-05 at 16:36 +0200, Eliad Peller wrote:
>> >> On Thu, Dec 5, 2013 at 4:31 PM, Johannes Berg <[email protected]> wrote:
>> >> > On Thu, 2013-12-05 at 11:21 +0200, Eliad Peller wrote:
>> >> >
>> >> >> @@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
>> >> >> * the scan request or not ... if it accesses the dev
>> >> >> * in there (it shouldn't anyway) then it may crash.
>> >> >> */
>> >> >> - if (!leak)
>> >> >> - kfree(request);
>> >> >> + if (leak) {
>> >> >> + request->pending_cleanup = true;
>> >> >> + return;
>> >> >
>> >> > This seems insufficient, if the driver never indicates completion, we'd
>> >> > never clear rdev->scan_req?
>> >> >
>> >> right, but i think it somehow makes sense (i.e. the driver must
>> >> indicate completion...)?
>> >
>> > But the whole thing was intended to catch buggy drivers :)
>> >
>> yeah, you have a point here :)
>> anyway, i guess it's either leaking scan_req and hoping the driver
>> really forgot about it, or keeping it and hoping the driver will
>> finally indicate completion.
>>
>> since i don't think this is a real-world scenario, i'm ok with
>> dropping this patch.
>
> Well, it can be made to crash, so ...
>
> Can we maybe avoid the crash in a different way? Disallow a new scan
> somehow?

Maybe we should drop the whole netdev-notified doing ___cfg80211_scan_done?
I mean if a workaround for buggy drivers is causing bugs for
legitimate drivers..

Something simple for buggy drivers would be doing this in the notifier
- BUG_ON(!rdev->scan_req->notified)

Just my 2c.

Arik

2013-12-05 14:31:25

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, 2013-12-05 at 11:21 +0200, Eliad Peller wrote:

> @@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
> * the scan request or not ... if it accesses the dev
> * in there (it shouldn't anyway) then it may crash.
> */
> - if (!leak)
> - kfree(request);
> + if (leak) {
> + request->pending_cleanup = true;
> + return;

This seems insufficient, if the driver never indicates completion, we'd
never clear rdev->scan_req?

johannes

2013-12-05 09:21:40

by Eliad Peller

[permalink] [raw]

Subject: [PATCH 2/4] mac80211: determine completed scan type by defined ops

In some cases, determining the completed scan type was
done by testing the SCAN_HW_SCANNING flag.

However, this doesn't take care for the case in which
the hw scan was requested, but hasn't started yet (e.g.
due to active remain_on_channel).

Replace this test by checking whether ops->hw_scan is
defined.

This solves the following warning:

WARNING: CPU: 0 PID: 3552 at net/mac80211/offchannel.c:156 __ieee80211_scan_completed+0x1b4/0x2dc [mac80211]()
[<c001cd38>] (unwind_backtrace+0x0/0xf0)
[<c00181d0>] (show_stack+0x10/0x14)
[<c05c0d8c>] (dump_stack+0x78/0x94)
[<c0047c08>] (warn_slowpath_common+0x68/0x8c)
[<c0047c48>] (warn_slowpath_null+0x1c/0x24)
[<bf4d4504>] (__ieee80211_scan_completed+0x1b4/0x2dc [mac80211])
[<bf4d5a74>] (ieee80211_scan_cancel+0xe8/0x190 [mac80211])
[<bf4df970>] (ieee80211_do_stop+0x63c/0x79c [mac80211])
[<bf4dfae0>] (ieee80211_stop+0x10/0x18 [mac80211])
[<c0504d84>] (__dev_close_many+0x84/0xcc)
[<c0504df4>] (__dev_close+0x28/0x3c)
[<c0509708>] (__dev_change_flags+0x78/0x144)
[<c0509854>] (dev_change_flags+0x10/0x48)
[<c055fe3c>] (devinet_ioctl+0x614/0x6d0)
[<c04f22a0>] (sock_ioctl+0x5c/0x2a4)
[<c0124eb4>] (do_vfs_ioctl+0x7c/0x5d8)
[<c012547c>] (SyS_ioctl+0x6c/0x7c)

Signed-off-by: Eliad Peller <[email protected]>
---
net/mac80211/scan.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index 618267b..e4baa53 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -271,10 +271,10 @@ static bool ieee80211_prep_hw_scan(struct ieee80211_local *local)
return true;
}

-static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted,
- bool was_hw_scan)
+static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted)
{
struct ieee80211_local *local = hw_to_local(hw);
+ bool hw_scan = local->ops->hw_scan;

lockdep_assert_held(&local->mtx);

@@ -290,7 +290,7 @@ static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted,
if (WARN_ON(!local->scan_req))
return;

- if (was_hw_scan && !aborted && ieee80211_prep_hw_scan(local)) {
+ if (hw_scan && !aborted && ieee80211_prep_hw_scan(local)) {
int rc;

rc = drv_hw_scan(local,
@@ -316,7 +316,7 @@ static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted,
/* Set power back to normal operating levels. */
ieee80211_hw_config(local, 0);

- if (!was_hw_scan) {
+ if (!hw_scan) {
ieee80211_configure_filter(local);
drv_sw_scan_complete(local);
ieee80211_offchannel_return(local);
@@ -751,7 +751,7 @@ void ieee80211_scan_work(struct work_struct *work)
container_of(work, struct ieee80211_local, scan_work.work);
struct ieee80211_sub_if_data *sdata;
unsigned long next_delay = 0;
- bool aborted, hw_scan;
+ bool aborted;

mutex_lock(&local->mtx);

@@ -838,8 +838,7 @@ void ieee80211_scan_work(struct work_struct *work)
goto out;

out_complete:
- hw_scan = test_bit(SCAN_HW_SCANNING, &local->scanning);
- __ieee80211_scan_completed(&local->hw, aborted, hw_scan);
+ __ieee80211_scan_completed(&local->hw, aborted);
out:
mutex_unlock(&local->mtx);
}
@@ -977,7 +976,7 @@ void ieee80211_scan_cancel(struct ieee80211_local *local)
*/
cancel_delayed_work(&local->scan_work);
/* and clean up */
- __ieee80211_scan_completed(&local->hw, true, false);
+ __ieee80211_scan_completed(&local->hw, true);
out:
mutex_unlock(&local->mtx);
}
--
1.8.5.rc1

2013-12-05 14:36:24

by Eliad Peller

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, Dec 5, 2013 at 4:31 PM, Johannes Berg <[email protected]> wrote:
> On Thu, 2013-12-05 at 11:21 +0200, Eliad Peller wrote:
>
>> @@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
>> * the scan request or not ... if it accesses the dev
>> * in there (it shouldn't anyway) then it may crash.
>> */
>> - if (!leak)
>> - kfree(request);
>> + if (leak) {
>> + request->pending_cleanup = true;
>> + return;
>
> This seems insufficient, if the driver never indicates completion, we'd
> never clear rdev->scan_req?
>
right, but i think it somehow makes sense (i.e. the driver must
indicate completion...)?

Eliad.

2013-12-05 16:32:29

by Eliad Peller

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, Dec 5, 2013 at 6:14 PM, Johannes Berg <[email protected]> wrote:
> On Thu, 2013-12-05 at 17:39 +0200, Eliad Peller wrote:
>
>> > Btw, should any of this go to 3.13?
>> maybe the first one. it's the only "real" issue.
>
> Actually, I'm not going to take that into 3.13 - we defined the API so
> that the driver is always allowed to drop the scheduled scan, this isn't
> really any different, is it?
>
it doesn't getting stopped in normal cases (by the driver), and
wpa_supplicant doesn't seem to handle it well.
anyway, i guess we can wait with this patch as well.

Eliad.

2013-12-05 14:43:30

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, 2013-12-05 at 16:36 +0200, Eliad Peller wrote:
> On Thu, Dec 5, 2013 at 4:31 PM, Johannes Berg <[email protected]> wrote:
> > On Thu, 2013-12-05 at 11:21 +0200, Eliad Peller wrote:
> >
> >> @@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
> >> * the scan request or not ... if it accesses the dev
> >> * in there (it shouldn't anyway) then it may crash.
> >> */
> >> - if (!leak)
> >> - kfree(request);
> >> + if (leak) {
> >> + request->pending_cleanup = true;
> >> + return;
> >
> > This seems insufficient, if the driver never indicates completion, we'd
> > never clear rdev->scan_req?
> >
> right, but i think it somehow makes sense (i.e. the driver must
> indicate completion...)?

But the whole thing was intended to catch buggy drivers :)

Btw, should any of this go to 3.13?

johannes

2013-12-05 09:21:42

by Eliad Peller

[permalink] [raw]

Subject: [PATCH 3/4] mac80211: start_next_roc only if scan was actually running

On scan completion we try start any pending roc.

However, if scan was just pending (and not actually started)
there is no point in trying to start the roc, as it might
have started already.

This solves the following warning:
WARNING: CPU: 0 PID: 3552 at net/mac80211/offchannel.c:269 ieee80211_start_next_roc+0x164/0x204 [mac80211]()
[<c001cd38>] (unwind_backtrace+0x0/0xf0)
[<c00181d0>] (show_stack+0x10/0x14)
[<c05c0d8c>] (dump_stack+0x78/0x94)
[<c0047c08>] (warn_slowpath_common+0x68/0x8c)
[<c0047c48>] (warn_slowpath_null+0x1c/0x24)
[<bf4d6660>] (ieee80211_start_next_roc+0x164/0x204 [mac80211])
[<bf4d5a74>] (ieee80211_scan_cancel+0xe8/0x190 [mac80211])
[<bf4df970>] (ieee80211_do_stop+0x63c/0x79c [mac80211])
[<bf4dfae0>] (ieee80211_stop+0x10/0x18 [mac80211])
[<c0504d84>] (__dev_close_many+0x84/0xcc)
[<c0504df4>] (__dev_close+0x28/0x3c)
[<c0509708>] (__dev_change_flags+0x78/0x144)
[<c0509854>] (dev_change_flags+0x10/0x48)
[<c055fe3c>] (devinet_ioctl+0x614/0x6d0)
[<c04f22a0>] (sock_ioctl+0x5c/0x2a4)
[<c0124eb4>] (do_vfs_ioctl+0x7c/0x5d8)
[<c012547c>] (SyS_ioctl+0x6c/0x7c)

Signed-off-by: Eliad Peller <[email protected]>
---
net/mac80211/scan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index e4baa53..d887b25 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -275,6 +275,7 @@ static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted)
{
struct ieee80211_local *local = hw_to_local(hw);
bool hw_scan = local->ops->hw_scan;
+ bool was_scanning = local->scanning;

lockdep_assert_held(&local->mtx);

@@ -327,7 +328,8 @@ static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted)
ieee80211_mlme_notify_scan_completed(local);
ieee80211_ibss_notify_scan_completed(local);
ieee80211_mesh_notify_scan_completed(local);
- ieee80211_start_next_roc(local);
+ if (was_scanning)
+ ieee80211_start_next_roc(local);
}

void ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted)
--
1.8.5.rc1

2013-12-05 15:45:33

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, 2013-12-05 at 17:39 +0200, Eliad Peller wrote:
> On Thu, Dec 5, 2013 at 4:43 PM, Johannes Berg <[email protected]> wrote:
> > On Thu, 2013-12-05 at 16:36 +0200, Eliad Peller wrote:
> >> On Thu, Dec 5, 2013 at 4:31 PM, Johannes Berg <[email protected]> wrote:
> >> > On Thu, 2013-12-05 at 11:21 +0200, Eliad Peller wrote:
> >> >
> >> >> @@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
> >> >> * the scan request or not ... if it accesses the dev
> >> >> * in there (it shouldn't anyway) then it may crash.
> >> >> */
> >> >> - if (!leak)
> >> >> - kfree(request);
> >> >> + if (leak) {
> >> >> + request->pending_cleanup = true;
> >> >> + return;
> >> >
> >> > This seems insufficient, if the driver never indicates completion, we'd
> >> > never clear rdev->scan_req?
> >> >
> >> right, but i think it somehow makes sense (i.e. the driver must
> >> indicate completion...)?
> >
> > But the whole thing was intended to catch buggy drivers :)
> >
> yeah, you have a point here :)
> anyway, i guess it's either leaking scan_req and hoping the driver
> really forgot about it, or keeping it and hoping the driver will
> finally indicate completion.
>
> since i don't think this is a real-world scenario, i'm ok with
> dropping this patch.

Well, it can be made to crash, so ...

Can we maybe avoid the crash in a different way? Disallow a new scan
somehow?

> > Btw, should any of this go to 3.13?
> maybe the first one. it's the only "real" issue.

Thanks.

johannes

2013-12-05 15:39:17

by Eliad Peller

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, Dec 5, 2013 at 4:43 PM, Johannes Berg <[email protected]> wrote:
> On Thu, 2013-12-05 at 16:36 +0200, Eliad Peller wrote:
>> On Thu, Dec 5, 2013 at 4:31 PM, Johannes Berg <[email protected]> wrote:
>> > On Thu, 2013-12-05 at 11:21 +0200, Eliad Peller wrote:
>> >
>> >> @@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
>> >> * the scan request or not ... if it accesses the dev
>> >> * in there (it shouldn't anyway) then it may crash.
>> >> */
>> >> - if (!leak)
>> >> - kfree(request);
>> >> + if (leak) {
>> >> + request->pending_cleanup = true;
>> >> + return;
>> >
>> > This seems insufficient, if the driver never indicates completion, we'd
>> > never clear rdev->scan_req?
>> >
>> right, but i think it somehow makes sense (i.e. the driver must
>> indicate completion...)?
>
> But the whole thing was intended to catch buggy drivers :)
>
yeah, you have a point here :)
anyway, i guess it's either leaking scan_req and hoping the driver
really forgot about it, or keeping it and hoping the driver will
finally indicate completion.

since i don't think this is a real-world scenario, i'm ok with
dropping this patch.

> Btw, should any of this go to 3.13?
maybe the first one. it's the only "real" issue.

Eliad.

2013-12-05 15:53:53

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, 2013-12-05 at 17:52 +0200, Arik Nemtsov wrote:

> >> > But the whole thing was intended to catch buggy drivers :)
> >> >
> >> yeah, you have a point here :)
> >> anyway, i guess it's either leaking scan_req and hoping the driver
> >> really forgot about it, or keeping it and hoping the driver will
> >> finally indicate completion.
> >>
> >> since i don't think this is a real-world scenario, i'm ok with
> >> dropping this patch.
> >
> > Well, it can be made to crash, so ...
> >
> > Can we maybe avoid the crash in a different way? Disallow a new scan
> > somehow?
>
> Maybe we should drop the whole netdev-notified doing ___cfg80211_scan_done?
> I mean if a workaround for buggy drivers is causing bugs for
> legitimate drivers..
>
> Something simple for buggy drivers would be doing this in the notifier
> - BUG_ON(!rdev->scan_req->notified)

BUG_ON is probably a bit heavy-handed, but yeah, I suppose we can drop
this. We used to have more bugs with drivers and even mac80211, but that
should be a thing of the past.

johannes

2013-12-05 16:15:06

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

On Thu, 2013-12-05 at 17:39 +0200, Eliad Peller wrote:

> > Btw, should any of this go to 3.13?
> maybe the first one. it's the only "real" issue.

Actually, I'm not going to take that into 3.13 - we defined the API so
that the driver is always allowed to drop the scheduled scan, this isn't
really any different, is it?

johannes

2013-12-05 16:18:08

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH 1/4] cfg80211: stop sched scan only when needed

Applied 1-3, still discussing patch 4.

johannes

2013-12-05 09:21:43

by Eliad Peller

[permalink] [raw]

Subject: [PATCH 4/4] cfg80211: prevent race condition on scan request cleanup

___cfg80211_scan_done() can be called in some cases
(e.g. on NETDEV_DOWN) before the low level driver
notified scan completion (which is indicated by
passing leak=true).

Clearing rdev->scan_req in this case is buggy, as
scan_done_wk might have already being queued/running
(and can't be flushed as it takes rtnl()).

If a new scan will be requested at this stage, the
scan_done_wk will try freeing it (instead of the
previous scan), and this will later result in
a use after free.

Solve it by freeing scan_req (and clearing it) only
when leak=false. Otherwise, instead of freeing it
mark the request as pending_cleanup, and free it
later on (by the work).

An example backtrace after such crash:
Unable to handle kernel paging request at virtual address fffffee5
pgd = c0004000
[fffffee5] *pgd=9fdf6821, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
PC is at cfg80211_scan_done+0x28/0xc4 [cfg80211]
LR is at __ieee80211_scan_completed+0xe4/0x2dc [mac80211]
[<bf0077b0>] (cfg80211_scan_done+0x28/0xc4 [cfg80211])
[<bf0973d4>] (__ieee80211_scan_completed+0xe4/0x2dc [mac80211])
[<bf0982cc>] (ieee80211_scan_work+0x94/0x4f0 [mac80211])
[<c005fd10>] (process_one_work+0x1b0/0x4a8)
[<c0060404>] (worker_thread+0x138/0x37c)
[<c0066d70>] (kthread+0xa4/0xb0)

Signed-off-by: Eliad Peller <[email protected]>
---
i encountered this one while adding some intentional delays in order
to reproduce a different issue. this is probably not a real-world scenario.

include/net/cfg80211.h | 2 +-
net/wireless/scan.c | 13 ++++++++++---
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 3f4eff0..c12259e 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1376,7 +1376,7 @@ struct cfg80211_scan_request {
/* internal */
struct wiphy *wiphy;
unsigned long scan_start;
- bool aborted, notified;
+ bool aborted, notified, pending_cleanup;
bool no_cck;

u32 min_dwell;
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index d960e4a..1ec43a8 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -176,6 +176,9 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
if (!request)
return;

+ if (request->pending_cleanup)
+ goto free_request;
+
wdev = request->wdev;

/*
@@ -209,7 +212,6 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
if (wdev->netdev)
dev_put(wdev->netdev);

- rdev->scan_req = NULL;

/*
* OK. If this is invoked with "leak" then we can't
@@ -219,8 +221,13 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool leak)
* the scan request or not ... if it accesses the dev
* in there (it shouldn't anyway) then it may crash.
*/
- if (!leak)
- kfree(request);
+ if (leak) {
+ request->pending_cleanup = true;
+ return;
+ }
+free_request:
+ rdev->scan_req = NULL;
+ kfree(request);
}

void __cfg80211_scan_done(struct work_struct *wk)
--
1.8.5.rc1