2014-05-07 07:23:12

by David Herrmann

[permalink] [raw]
Subject: [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

ah->caldata may be NULL if no channel is selected. Check for that before
accessing it.

Signed-off-by: David Herrmann <[email protected]>
---
Hi

This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
is freed while we run in hw_per_calibration(). However, this patch fixes serious
kernel panics with wifi-P2P on my machine.

I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
correct fix would be to synchronously stop any running hw-calibration before
setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
this small workaround.

Thanks
David

drivers/net/wireless/ath/ath9k/ar9002_calib.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
index cdc7400..4667ab9 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
@@ -99,14 +99,16 @@ static bool ar9002_hw_per_calibration(struct ath_hw *ah,
}

currCal->calData->calPostProc(ah, numChains);
- caldata->CalValid |= currCal->calData->calType;
+ if (caldata)
+ caldata->CalValid |= currCal->calData->calType;
+
currCal->calState = CAL_DONE;
iscaldone = true;
} else {
ar9002_hw_setup_calibration(ah, currCal);
}
}
- } else if (!(caldata->CalValid & currCal->calData->calType)) {
+ } else if (caldata && !(caldata->CalValid & currCal->calData->calType)) {
ath9k_hw_reset_calibration(ah, currCal);
}

--
1.9.2



2014-05-13 06:50:00

by David Herrmann

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

Hi

On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <[email protected]> wrote:
> I looked into it again, the scenario where I assumed that this problem
> could occur didn't turn out to be true. I have no idea how this crash
> can occur.

The only path that can set ah->caldata to NULL is through:

ieee80211_hw_config()
ath9k_htc_config()
ath9k_htc_set_channel()
ath9k_hw_reset()

This happens whenever IEEE80211_CONF_OFFCHANNEL is set.

Now mac80211 is way to big for me to review right now and
ieee80211_hw_config() is used quite often. Given that the described
call-path does no synchronization against ath9k_htc_ani_work(), all
the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
ani-work is running. Is that intentional? I cannot see any of those
functions calling into ath9k_htc_stop_ani(). This might of course be
implicit.

One call-path I see is:
ieee80211_scan_cancel()
cancel_delayed_work()

We cannot use cancel_delayed_work_sync() here due to locking issues.
However, this obviously races against any following
set_channel(OFFCHANNEL) request.

If there's anything I can do to debug this, let me know. I tried
adding some printk()'s into the hot-path and it turns out to no longer
fail then. So this really seems to be a quite small race (given that a
bunch of simple printk()s is slow enough to work around it).

Thanks
David

2014-05-07 20:03:22

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Wed, May 7, 2014 at 12:54 PM, John W. Linville
<[email protected]> wrote:
> Is there any hope for getting a more complete fix from the ath9k guys
> in short order?

Wait, who are the ath9k folks now exactly ?

Luis

2014-05-12 18:00:15

by John W. Linville

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Wed, May 07, 2014 at 10:15:53PM +0200, Felix Fietkau wrote:
> On 2014-05-07 21:54, John W. Linville wrote:
> > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
> >> ah->caldata may be NULL if no channel is selected. Check for that before
> >> accessing it.
> >>
> >> Signed-off-by: David Herrmann <[email protected]>
> >> ---
> >> Hi
> >>
> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
> >> is freed while we run in hw_per_calibration(). However, this patch fixes serious
> >> kernel panics with wifi-P2P on my machine.
> >>
> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
> >> correct fix would be to synchronously stop any running hw-calibration before
> >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
> >> this small workaround.
> >>
> >> Thanks
> >> David
> >
> > Is there any hope for getting a more complete fix from the ath9k guys
> > in short order?
> This looks easy to fix. I'll send a patch soon.

Ping?

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2014-05-07 20:16:04

by Felix Fietkau

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On 2014-05-07 21:54, John W. Linville wrote:
> On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> ah->caldata may be NULL if no channel is selected. Check for that before
>> accessing it.
>>
>> Signed-off-by: David Herrmann <[email protected]>
>> ---
>> Hi
>>
>> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> kernel panics with wifi-P2P on my machine.
>>
>> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> correct fix would be to synchronously stop any running hw-calibration before
>> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> this small workaround.
>>
>> Thanks
>> David
>
> Is there any hope for getting a more complete fix from the ath9k guys
> in short order?
This looks easy to fix. I'll send a patch soon.

- Felix


2014-05-13 10:41:14

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Tue, May 13, 2014 at 11:09:48AM +0200, David Herrmann wrote:
> Hi
>
> On Tue, May 13, 2014 at 11:00 AM, Rajkumar Manoharan
> <[email protected]> wrote:
> > On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote:
> >> Hi
> >>
> >> On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <[email protected]> wrote:
> >> > I looked into it again, the scenario where I assumed that this problem
> >> > could occur didn't turn out to be true. I have no idea how this crash
> >> > can occur.
> >>
> >> The only path that can set ah->caldata to NULL is through:
> >>
> >> ieee80211_hw_config()
> >> ath9k_htc_config()
> >> ath9k_htc_set_channel()
> >> ath9k_hw_reset()
> >>
> >> This happens whenever IEEE80211_CONF_OFFCHANNEL is set.
> >>
> >> Now mac80211 is way to big for me to review right now and
> >> ieee80211_hw_config() is used quite often. Given that the described
> >> call-path does no synchronization against ath9k_htc_ani_work(), all
> >> the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
> >> ani-work is running. Is that intentional? I cannot see any of those
> >> functions calling into ath9k_htc_stop_ani(). This might of course be
> >> implicit.
> >>
> >> One call-path I see is:
> >> ieee80211_scan_cancel()
> >> cancel_delayed_work()
> >>
> >> We cannot use cancel_delayed_work_sync() here due to locking issues.
> >> However, this obviously races against any following
> >> set_channel(OFFCHANNEL) request.
> >>
> >> If there's anything I can do to debug this, let me know. I tried
> >> adding some printk()'s into the hot-path and it turns out to no longer
> >> fail then. So this really seems to be a quite small race (given that a
> >> bunch of simple printk()s is slow enough to work around it).
> >
> > David,
> >
> > Are you using USB devices? What is the PID/VID? I have not looked at
> > HTC driver perspective.
>
> Yes, I'm using the htc driver. I thought that was clear by pointing at
> ar9002, but I was wrong, sorry. lsusb information is:
> 'Bus 003 Device 056: ID 0411:017f BUFFALO INC. (formerly MelCo.,
> Inc.) Sony UWA-BR100 802.11abgn Wireless Adapter [Atheros
> AR7010+AR9280]'
>
Unlike ath9k driver, the ani work is stopped in sw_scan_start callback
for htc driver. So during scan process, ani work is stopped well before
calling set_channel. But in case of p2p_listen operation, set_channel can be
called by sw_roc without stopping ani.

Can you please test with attached change?

-Rajkumar


Attachments:
(No filename) (2.41 kB)
ani.patch (830.00 B)
Download all attachments

2014-05-08 18:18:07

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
> ah->caldata may be NULL if no channel is selected. Check for that before
> accessing it.
>
> Signed-off-by: David Herrmann <[email protected]>
> ---
> Hi
>
> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
> is freed while we run in hw_per_calibration(). However, this patch fixes serious
> kernel panics with wifi-P2P on my machine.
>
> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
> correct fix would be to synchronously stop any running hw-calibration before
> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
> this small workaround.
>
David,

Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in
hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped
synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe
the panics. Is there a easiest way to reproduce the problem. Are you
using wireless-testing tree? Thanks for reporting the problem. Will try
to fix asap.

-Rajkumar

2014-05-12 18:43:42

by Felix Fietkau

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On 2014-05-12 19:49, John W. Linville wrote:
> On Wed, May 07, 2014 at 10:15:53PM +0200, Felix Fietkau wrote:
>> On 2014-05-07 21:54, John W. Linville wrote:
>> > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> >> ah->caldata may be NULL if no channel is selected. Check for that before
>> >> accessing it.
>> >>
>> >> Signed-off-by: David Herrmann <[email protected]>
>> >> ---
>> >> Hi
>> >>
>> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> >> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> >> kernel panics with wifi-P2P on my machine.
>> >>
>> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> >> correct fix would be to synchronously stop any running hw-calibration before
>> >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> >> this small workaround.
>> >>
>> >> Thanks
>> >> David
>> >
>> > Is there any hope for getting a more complete fix from the ath9k guys
>> > in short order?
>> This looks easy to fix. I'll send a patch soon.
>
> Ping?
I looked into it again, the scenario where I assumed that this problem
could occur didn't turn out to be true. I have no idea how this crash
can occur.

- Felix

2014-05-07 20:00:25

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
> ah->caldata may be NULL if no channel is selected. Check for that before
> accessing it.
>
> Signed-off-by: David Herrmann <[email protected]>
> ---
> Hi
>
> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
> is freed while we run in hw_per_calibration(). However, this patch fixes serious
> kernel panics with wifi-P2P on my machine.
>
> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
> correct fix would be to synchronously stop any running hw-calibration before
> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
> this small workaround.
>
> Thanks
> David

Is there any hope for getting a more complete fix from the ath9k guys
in short order?

John

>
> drivers/net/wireless/ath/ath9k/ar9002_calib.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
> index cdc7400..4667ab9 100644
> --- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c
> +++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
> @@ -99,14 +99,16 @@ static bool ar9002_hw_per_calibration(struct ath_hw *ah,
> }
>
> currCal->calData->calPostProc(ah, numChains);
> - caldata->CalValid |= currCal->calData->calType;
> + if (caldata)
> + caldata->CalValid |= currCal->calData->calType;
> +
> currCal->calState = CAL_DONE;
> iscaldone = true;
> } else {
> ar9002_hw_setup_calibration(ah, currCal);
> }
> }
> - } else if (!(caldata->CalValid & currCal->calData->calType)) {
> + } else if (caldata && !(caldata->CalValid & currCal->calData->calType)) {
> ath9k_hw_reset_calibration(ah, currCal);
> }
>
> --
> 1.9.2
>
>

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2014-05-13 09:00:35

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote:
> Hi
>
> On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <[email protected]> wrote:
> > I looked into it again, the scenario where I assumed that this problem
> > could occur didn't turn out to be true. I have no idea how this crash
> > can occur.
>
> The only path that can set ah->caldata to NULL is through:
>
> ieee80211_hw_config()
> ath9k_htc_config()
> ath9k_htc_set_channel()
> ath9k_hw_reset()
>
> This happens whenever IEEE80211_CONF_OFFCHANNEL is set.
>
> Now mac80211 is way to big for me to review right now and
> ieee80211_hw_config() is used quite often. Given that the described
> call-path does no synchronization against ath9k_htc_ani_work(), all
> the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
> ani-work is running. Is that intentional? I cannot see any of those
> functions calling into ath9k_htc_stop_ani(). This might of course be
> implicit.
>
> One call-path I see is:
> ieee80211_scan_cancel()
> cancel_delayed_work()
>
> We cannot use cancel_delayed_work_sync() here due to locking issues.
> However, this obviously races against any following
> set_channel(OFFCHANNEL) request.
>
> If there's anything I can do to debug this, let me know. I tried
> adding some printk()'s into the hot-path and it turns out to no longer
> fail then. So this really seems to be a quite small race (given that a
> bunch of simple printk()s is slow enough to work around it).

David,

Are you using USB devices? What is the PID/VID? I have not looked at
HTC driver perspective.

-Rajkumar

2014-05-13 09:09:49

by David Herrmann

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

Hi

On Tue, May 13, 2014 at 11:00 AM, Rajkumar Manoharan
<[email protected]> wrote:
> On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote:
>> Hi
>>
>> On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <[email protected]> wrote:
>> > I looked into it again, the scenario where I assumed that this problem
>> > could occur didn't turn out to be true. I have no idea how this crash
>> > can occur.
>>
>> The only path that can set ah->caldata to NULL is through:
>>
>> ieee80211_hw_config()
>> ath9k_htc_config()
>> ath9k_htc_set_channel()
>> ath9k_hw_reset()
>>
>> This happens whenever IEEE80211_CONF_OFFCHANNEL is set.
>>
>> Now mac80211 is way to big for me to review right now and
>> ieee80211_hw_config() is used quite often. Given that the described
>> call-path does no synchronization against ath9k_htc_ani_work(), all
>> the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
>> ani-work is running. Is that intentional? I cannot see any of those
>> functions calling into ath9k_htc_stop_ani(). This might of course be
>> implicit.
>>
>> One call-path I see is:
>> ieee80211_scan_cancel()
>> cancel_delayed_work()
>>
>> We cannot use cancel_delayed_work_sync() here due to locking issues.
>> However, this obviously races against any following
>> set_channel(OFFCHANNEL) request.
>>
>> If there's anything I can do to debug this, let me know. I tried
>> adding some printk()'s into the hot-path and it turns out to no longer
>> fail then. So this really seems to be a quite small race (given that a
>> bunch of simple printk()s is slow enough to work around it).
>
> David,
>
> Are you using USB devices? What is the PID/VID? I have not looked at
> HTC driver perspective.

Yes, I'm using the htc driver. I thought that was clear by pointing at
ar9002, but I was wrong, sorry. lsusb information is:
'Bus 003 Device 056: ID 0411:017f BUFFALO INC. (formerly MelCo.,
Inc.) Sony UWA-BR100 802.11abgn Wireless Adapter [Atheros
AR7010+AR9280]'

Thanks
David

2014-05-13 18:21:42

by David Herrmann

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

Hi

On Tue, May 13, 2014 at 12:41 PM, Rajkumar Manoharan
<[email protected]> wrote:
> On Tue, May 13, 2014 at 11:09:48AM +0200, David Herrmann wrote:
> Unlike ath9k driver, the ani work is stopped in sw_scan_start callback
> for htc driver. So during scan process, ani work is stopped well before
> calling set_channel. But in case of p2p_listen operation, set_channel can be
> called by sw_roc without stopping ani.
>
> Can you please test with attached change?

I cannot reproduce it anymore and my traces show that the
recalibration is called a _lot_ less often. In fact, during 10
successful p2p-connect attempts the per_calib helper is only called
once.

I will keep looking, but looks good so far:
Tested-by: David Herrmann <[email protected]>

Thanks
David

2014-05-08 20:16:45

by David Herrmann

[permalink] [raw]
Subject: Re: [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

Hi

On Thu, May 8, 2014 at 8:18 PM, Rajkumar Manoharan
<[email protected]> wrote:
> On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> ah->caldata may be NULL if no channel is selected. Check for that before
>> accessing it.
>>
>> Signed-off-by: David Herrmann <[email protected]>
>> ---
>> Hi
>>
>> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> kernel panics with wifi-P2P on my machine.
>>
>> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> correct fix would be to synchronously stop any running hw-calibration before
>> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> this small workaround.
>>
> David,
>
> Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in
> hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped
> synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe
> the panics. Is there a easiest way to reproduce the problem. Are you
> using wireless-testing tree? Thanks for reporting the problem. Will try
> to fix asap.

Reproducing it is actually quite easy on my machine. Whenever I start
a P2P-connect from my Android-phone to my linux-host and _immediately_
accept it (via p2p_connect on wpas), I get the kernel-panic. Adding
the NULL-protection fixes this.

However, if I delay accepting the connection (ie, issuing p2p_connect
by hand instead of automatically), I cannot see the bug. Furthermore,
on my slower Intel Core 2 Duo, the bug happens much less likely. On my
ARM machine I never saw this happening. Given that my main machine is
an Intel hsw quad-core, I guess it's a simple race-condition.

I also added a printk() whenever caldata is NULL and noticed that it
fires only during the first 2 or 3 runs. After that, it never happened
again.

The bug happens on all linux kernels I tested (starting with 3.9ish up
to linux-next). However, if I apply my fix, anything after 3.13-stable
fails to transmit DHCP data. I can connect properly but DHCP always
times out. I'm not sure why that happens and I'm still debugging this,
but it's quite likely a separate issue. (if I find some time, I will
bisect this)

I now looked at the ath9k code and I couldn't see any locking around
the hw_reset at all. I don't know whether the wifi-core / nl80211
locks this, but what happens if two hw_resets race each other? Just a
guess.. I will try to look into it tomorrow.

Thanks
David

2014-05-07 20:45:11

by John W. Linville

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH] ath9k: fix NULL-deref in hw_per_calibration() for ar9002

On Wed, May 07, 2014 at 01:03:00PM -0700, Luis R. Rodriguez wrote:
> On Wed, May 7, 2014 at 12:54 PM, John W. Linville
> <[email protected]> wrote:
> > Is there any hope for getting a more complete fix from the ath9k guys
> > in short order?
>
> Wait, who are the ath9k folks now exactly ?

A better question may be, who are the qualcomm folks now exactly? ;-)

Anyway, I hope that the ath9k guys know who they are and will respond.
If there aren't any, then I guess we'll have to start slapping bubble
gum and popsicle sticks onto ath9k...

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.