2009-11-04 06:00:25

by Larry Finger

[permalink] [raw]
Subject: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

As reported by Rick Farina ([email protected]), removing the RTL8187 USB
stick, or unloading the driver rtl8187 using rmmod will cause a kernel oops.
There are at least two forms of the failure, (1) BUG: Scheduling while atomic,
and (2) a fatal kernel page fault. This problem is reported in Bugzilla #14539.

This problem does not occur for kernel 2.6.31, but does for 2.6.32-rc2, thus
it is technically a regression; however, bisection did not locate any faulty
patch. The fix was found by comparing the faulty code in rtl8187 with p54usb.
My interpretation is that the handling of work queues in mac80211 changed
enough to the LEDs to be unregistered before tasks on the work queues are
cancelled. Previously, these actions could be done in either order.

Signed-off-by: Larry Finger <[email protected]>
Reported-and-tested by: Rick Farina <[email protected]>
---

John,

This is 2.6.32 material. Sorry to take so long to get a patch, but it was
difficult for me to locate the problem. Fortunately, I had the postings of the
two flame wars to amuse me while all the kernel compilations were happening.

Larry
---

Index: wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
===================================================================
--- wireless-testing.orig/drivers/net/wireless/rtl818x/rtl8187_leds.c
+++ wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
@@ -210,10 +210,10 @@ void rtl8187_leds_exit(struct ieee80211_

/* turn the LED off before exiting */
ieee80211_queue_delayed_work(dev, &priv->led_off, 0);
- cancel_delayed_work_sync(&priv->led_off);
- cancel_delayed_work_sync(&priv->led_on);
rtl8187_unregister_led(&priv->led_rx);
rtl8187_unregister_led(&priv->led_tx);
+ cancel_delayed_work_sync(&priv->led_off);
+ cancel_delayed_work_sync(&priv->led_on);
}
#endif /* def CONFIG_RTL8187_LED */



2009-11-05 04:55:06

by Sid Hayn

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

Larry Finger wrote:
> On 11/04/2009 06:14 PM, Herton Ronaldo Krzesinski wrote:
>
>> Hi, I checked here and the code above in led_classdev_unregister is the same in
>> 2.6.31, so I think the patch from Larry should also be applied/sent to 2.6.31.x
>> stable, as the bug could happen there too.
>>
>
> Herton,
>
> Technically you are correct; however, I did extensive testing of
> 2.6.31 and _NEVER_ got the failure. The sequencing is a bug waiting to
> happen, but something in the post 2.6.31 merge actually enabled the
> bug to happen. I tried bisection to determine which change actully did
> that, but was not successful.
>
>
Using kernel 2.6.29 and compat-wireless-stable 2.6.31* I am able to make
it flake. I may be the only one lucky enough to make this explode with
a high degree of accuracy but I can't see why we wouldn't fix what we
know is a bug waiting to happen. My vote is to add this patch to
2.6.31.x as well.

Thanks,
Rick Farina
> My feeling is that stable can be left alone. Of course, if bug reports
> of kernel panics on unload of rtl8187 start occurring, we will know
> how to fix it.
>
> Larry
>
>


2009-11-04 19:00:13

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On Wed, Nov 04, 2009 at 12:13:02PM -0600, Larry Finger wrote:
> On 11/04/2009 10:54 AM, John W. Linville wrote:
> > On Wed, Nov 04, 2009 at 09:54:37AM -0600, Larry Finger wrote:
> >
> >> I will try once more to get netconsole working to capture the backtrace.
> >
> > No need, I think I understand it now. The new order still sorta
> > looks/feels "wrong", but it seems fine. Maybe an alternative would
> > be to make the brightness_set routine aware of the shutdown and not
> > queue the work? Maybe even ieee80211_queue_delayed_work could be
> > made a bit smarter here?
>
> Are either of these questions a request, or are they musings?

They are musings -- but feel free to be inspired! :-)

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-11-04 17:00:14

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On Wed, Nov 04, 2009 at 04:30:19PM +0100, Christian Lamparter wrote:
> On Wednesday 04 November 2009 16:11:33 John W. Linville wrote:

> > This seems like a band-aid. If anything, the original order would
> > seem to make more sense.
>
> really?
>
> take this code from led-class.c
>
> void led_classdev_unregister(struct led_classdev *led_cdev)
> {
> device_remove_file(led_cdev->dev, &dev_attr_max_brightness);
> device_remove_file(led_cdev->dev, &dev_attr_brightness);
> #ifdef CONFIG_LEDS_TRIGGERS
> device_remove_file(led_cdev->dev, &dev_attr_trigger);
> down_write(&led_cdev->trigger_lock);
> if (led_cdev->trigger)
> led_trigger_set(led_cdev, NULL);
> up_write(&led_cdev->trigger_lock);
> #endif
>
> device_unregister(led_cdev->dev);
>
> down_write(&leds_list_lock);
> list_del(&led_cdev->node);
> up_write(&leds_list_lock);
> }
>
> as you can see the led is switched-off right before the device is unregistered.
> but rtl8187, p54 & ar9170 led-triggers are timed & asynchronous. So
> we really need a cancel_delayed_work_sync after the unregister routine
> finished... else the timed trigger might fire when the device/module
> is _faded_ from memory.

OK, I got it...the unregister can queue-up more work. Thanks for
the explanation.

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-11-04 15:30:22

by Christian Lamparter

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On Wednesday 04 November 2009 16:11:33 John W. Linville wrote:
> On Wed, Nov 04, 2009 at 12:00:25AM -0600, Larry Finger wrote:
> > As reported by Rick Farina ([email protected]), removing the RTL8187 USB
> > stick, or unloading the driver rtl8187 using rmmod will cause a kernel oops.
> > There are at least two forms of the failure, (1) BUG: Scheduling while atomic,
> > and (2) a fatal kernel page fault. This problem is reported in Bugzilla #14539.
> >
> > This problem does not occur for kernel 2.6.31, but does for 2.6.32-rc2, thus
> > it is technically a regression; however, bisection did not locate any faulty
> > patch. The fix was found by comparing the faulty code in rtl8187 with p54usb.
> > My interpretation is that the handling of work queues in mac80211 changed
> > enough to the LEDs to be unregistered before tasks on the work queues are
> > cancelled. Previously, these actions could be done in either order.
> >
> > Signed-off-by: Larry Finger <[email protected]>
> > Reported-and-tested by: Rick Farina <[email protected]>
> > ---
> >
> > John,
> >
> > This is 2.6.32 material. Sorry to take so long to get a patch, but it was
> > difficult for me to locate the problem. Fortunately, I had the postings of the
> > two flame wars to amuse me while all the kernel compilations were happening.
> >
> > Larry
> > ---
> >
> > Index: wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
> > ===================================================================
> > --- wireless-testing.orig/drivers/net/wireless/rtl818x/rtl8187_leds.c
> > +++ wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
> > @@ -210,10 +210,10 @@ void rtl8187_leds_exit(struct ieee80211_
> >
> > /* turn the LED off before exiting */
> > ieee80211_queue_delayed_work(dev, &priv->led_off, 0);
> > - cancel_delayed_work_sync(&priv->led_off);
> > - cancel_delayed_work_sync(&priv->led_on);
> > rtl8187_unregister_led(&priv->led_rx);
> > rtl8187_unregister_led(&priv->led_tx);
> > + cancel_delayed_work_sync(&priv->led_off);
> > + cancel_delayed_work_sync(&priv->led_on);
> > }
> > #endif /* def CONFIG_RTL8187_LED */
> >
>
> This seems like a band-aid. If anything, the original order would
> seem to make more sense.

really?

take this code from led-class.c

void led_classdev_unregister(struct led_classdev *led_cdev)
{
device_remove_file(led_cdev->dev, &dev_attr_max_brightness);
device_remove_file(led_cdev->dev, &dev_attr_brightness);
#ifdef CONFIG_LEDS_TRIGGERS
device_remove_file(led_cdev->dev, &dev_attr_trigger);
down_write(&led_cdev->trigger_lock);
if (led_cdev->trigger)
led_trigger_set(led_cdev, NULL);
up_write(&led_cdev->trigger_lock);
#endif

device_unregister(led_cdev->dev);

down_write(&leds_list_lock);
list_del(&led_cdev->node);
up_write(&leds_list_lock);
}

as you can see the led is switched-off right before the device is unregistered.
but rtl8187, p54 & ar9170 led-triggers are timed & asynchronous. So
we really need a cancel_delayed_work_sync after the unregister routine
finished... else the timed trigger might fire when the device/module
is _faded_ from memory.


2009-11-04 15:54:38

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On 11/04/2009 09:11 AM, John W. Linville wrote:
> On Wed, Nov 04, 2009 at 12:00:25AM -0600, Larry Finger wrote:
>> As reported by Rick Farina ([email protected]), removing the RTL8187 USB
>> stick, or unloading the driver rtl8187 using rmmod will cause a kernel oops.
>> There are at least two forms of the failure, (1) BUG: Scheduling while atomic,
>> and (2) a fatal kernel page fault. This problem is reported in Bugzilla #14539.
>>
>> This problem does not occur for kernel 2.6.31, but does for 2.6.32-rc2, thus
>> it is technically a regression; however, bisection did not locate any faulty
>> patch. The fix was found by comparing the faulty code in rtl8187 with p54usb.
>> My interpretation is that the handling of work queues in mac80211 changed
>> enough to the LEDs to be unregistered before tasks on the work queues are
>> cancelled. Previously, these actions could be done in either order.
>>
>> Signed-off-by: Larry Finger <[email protected]>
>> Reported-and-tested by: Rick Farina <[email protected]>
>> ---
>>
>> John,
>>
>> This is 2.6.32 material. Sorry to take so long to get a patch, but it was
>> difficult for me to locate the problem. Fortunately, I had the postings of the
>> two flame wars to amuse me while all the kernel compilations were happening.
>>
>> Larry
>> ---
>>
>> Index: wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
>> ===================================================================
>> --- wireless-testing.orig/drivers/net/wireless/rtl818x/rtl8187_leds.c
>> +++ wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
>> @@ -210,10 +210,10 @@ void rtl8187_leds_exit(struct ieee80211_
>>
>> /* turn the LED off before exiting */
>> ieee80211_queue_delayed_work(dev, &priv->led_off, 0);
>> - cancel_delayed_work_sync(&priv->led_off);
>> - cancel_delayed_work_sync(&priv->led_on);
>> rtl8187_unregister_led(&priv->led_rx);
>> rtl8187_unregister_led(&priv->led_tx);
>> + cancel_delayed_work_sync(&priv->led_off);
>> + cancel_delayed_work_sync(&priv->led_on);
>> }
>> #endif /* def CONFIG_RTL8187_LED */
>>
>
> This seems like a band-aid. If anything, the original order would
> seem to make more sense.
>
> Do you have a link to the original backtrace? I don't see one in
> the bugzilla entry.

I agree that the original order makes more sense, which is why I coded
it that way in the first place; however, something changed during the
post-2.6.31 merge period. I tried to bisect the regression, but gave
up after 4 days of trying. I kept ending up where all the remaining
commits referred to drivers I'm not even using.

I don't have a full backtrace as I have had no success with
netconsole. My hand notes have only limited trace info, but I did note
that none of the rtl8187 or mac80211 routines are mentioned in any
trace I've seen. In the one in my notes, the process that crashed was
ifdown with a "scheduling while atomic" BUG.

I will try once more to get netconsole working to capture the backtrace.

Larry

2009-11-05 05:16:11

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On 11/04/2009 10:55 PM, Richard Farina wrote:
> Using kernel 2.6.29 and compat-wireless-stable 2.6.31* I am able to make
> it flake. I may be the only one lucky enough to make this explode with
> a high degree of accuracy but I can't see why we wouldn't fix what we
> know is a bug waiting to happen. My vote is to add this patch to
> 2.6.31.x as well.

With compat-wireless-stable 2.6.31, you essentially have the wireless
drivers of 2.6.32, which does have the problem. I could not make the
problem happen with vanilla 2.6.31, which is why I do not think we
need to send the fix to 2.6.31.Y. Of course, your system is much more
sensitive than mine. If you get the kernel panic with 2.6.31, then we
will need to reconsider. As openSUSE 11.2 will be out in a little over
a week using the 2.6.31 kernel, the problem may show up there.

Larry

Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

Em Qua 04 Nov 2009, ?s 14:49:38, John W. Linville escreveu:
> On Wed, Nov 04, 2009 at 04:30:19PM +0100, Christian Lamparter wrote:
> > On Wednesday 04 November 2009 16:11:33 John W. Linville wrote:
>
> > > This seems like a band-aid. If anything, the original order would
> > > seem to make more sense.
> >
> > really?
> >
> > take this code from led-class.c
> >
> > void led_classdev_unregister(struct led_classdev *led_cdev)
> > {
> > device_remove_file(led_cdev->dev, &dev_attr_max_brightness);
> > device_remove_file(led_cdev->dev, &dev_attr_brightness);
> > #ifdef CONFIG_LEDS_TRIGGERS
> > device_remove_file(led_cdev->dev, &dev_attr_trigger);
> > down_write(&led_cdev->trigger_lock);
> > if (led_cdev->trigger)
> > led_trigger_set(led_cdev, NULL);
> > up_write(&led_cdev->trigger_lock);
> > #endif
> >
> > device_unregister(led_cdev->dev);
> >
> > down_write(&leds_list_lock);
> > list_del(&led_cdev->node);
> > up_write(&leds_list_lock);
> > }
> >
> > as you can see the led is switched-off right before the device is unregistered.
> > but rtl8187, p54 & ar9170 led-triggers are timed & asynchronous. So
> > we really need a cancel_delayed_work_sync after the unregister routine
> > finished... else the timed trigger might fire when the device/module
> > is _faded_ from memory.
>
> OK, I got it...the unregister can queue-up more work. Thanks for
> the explanation.

Hi, I checked here and the code above in led_classdev_unregister is the same in
2.6.31, so I think the patch from Larry should also be applied/sent to 2.6.31.x
stable, as the bug could happen there too.

>
> John
>

--
[]'s
Herton

2009-11-05 06:00:23

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On 11/04/2009 12:47 PM, John W. Linville wrote:
>
> They are musings -- but feel free to be inspired! :-)
>

I was somewhat inspired and tried the following alternative patch:

++++++++++++++++++++++++++++++++++++++++++++++++\

Index: wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
===================================================================
--- wireless-testing.orig/drivers/net/wireless/rtl818x/rtl8187_leds.c
+++ wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
@@ -107,6 +107,9 @@ static void rtl8187_led_brightness_set(s
struct ieee80211_hw *hw = led->dev;
struct rtl8187_priv *priv = hw->priv;

+ /* detect shutting down */
+ if (priv->leds_off)
+ return;
if (brightness == LED_OFF) {
ieee80211_queue_delayed_work(hw, &priv->led_off, 0);
/* The LED is off for 1/20 sec so that it just blinks. */
@@ -179,6 +182,7 @@ void rtl8187_leds_init(struct ieee80211_
ledpin = LED_PIN_GPIO0;
}

+ priv->leds_off = false;
INIT_DELAYED_WORK(&priv->led_on, led_turn_on);
INIT_DELAYED_WORK(&priv->led_off, led_turn_off);

@@ -210,6 +214,7 @@ void rtl8187_leds_exit(struct ieee80211_

/* turn the LED off before exiting */
ieee80211_queue_delayed_work(dev, &priv->led_off, 0);
+ priv->leds_off = true;
cancel_delayed_work_sync(&priv->led_off);
cancel_delayed_work_sync(&priv->led_on);
rtl8187_unregister_led(&priv->led_rx);
Index: wireless-testing/drivers/net/wireless/rtl818x/rtl8187.h
===================================================================
--- wireless-testing.orig/drivers/net/wireless/rtl818x/rtl8187.h
+++ wireless-testing/drivers/net/wireless/rtl818x/rtl8187.h
@@ -134,6 +134,7 @@ struct rtl8187_priv {
__le32 bits32;
} *io_dmabuf;
bool rfkill_off;
+ bool leds_off;
};

void rtl8187_write_phy(struct ieee80211_hw *dev, u8 addr, u32 data);

++++++++++++++++++++++++++++++++++++++++++++++

Yes, I know it is white-space damaged. I do not want it being applied!

By adding a new variable that is set whenever the LED system is being
shutdown, the kernel no longer panics; however, the LED is not turned
off on shutdown, further confirmation that USB transfers take a long
time to happen. I could add an msleep() between queuing the led_off
work and setting the leds_off flag, but this just seems like extra
complexity. I think we should go with the counter-intuitive order on
the shutdown.

Larry

2009-11-04 15:15:10

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On Wed, Nov 04, 2009 at 12:00:25AM -0600, Larry Finger wrote:
> As reported by Rick Farina ([email protected]), removing the RTL8187 USB
> stick, or unloading the driver rtl8187 using rmmod will cause a kernel oops.
> There are at least two forms of the failure, (1) BUG: Scheduling while atomic,
> and (2) a fatal kernel page fault. This problem is reported in Bugzilla #14539.
>
> This problem does not occur for kernel 2.6.31, but does for 2.6.32-rc2, thus
> it is technically a regression; however, bisection did not locate any faulty
> patch. The fix was found by comparing the faulty code in rtl8187 with p54usb.
> My interpretation is that the handling of work queues in mac80211 changed
> enough to the LEDs to be unregistered before tasks on the work queues are
> cancelled. Previously, these actions could be done in either order.
>
> Signed-off-by: Larry Finger <[email protected]>
> Reported-and-tested by: Rick Farina <[email protected]>
> ---
>
> John,
>
> This is 2.6.32 material. Sorry to take so long to get a patch, but it was
> difficult for me to locate the problem. Fortunately, I had the postings of the
> two flame wars to amuse me while all the kernel compilations were happening.
>
> Larry
> ---
>
> Index: wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
> ===================================================================
> --- wireless-testing.orig/drivers/net/wireless/rtl818x/rtl8187_leds.c
> +++ wireless-testing/drivers/net/wireless/rtl818x/rtl8187_leds.c
> @@ -210,10 +210,10 @@ void rtl8187_leds_exit(struct ieee80211_
>
> /* turn the LED off before exiting */
> ieee80211_queue_delayed_work(dev, &priv->led_off, 0);
> - cancel_delayed_work_sync(&priv->led_off);
> - cancel_delayed_work_sync(&priv->led_on);
> rtl8187_unregister_led(&priv->led_rx);
> rtl8187_unregister_led(&priv->led_tx);
> + cancel_delayed_work_sync(&priv->led_off);
> + cancel_delayed_work_sync(&priv->led_on);
> }
> #endif /* def CONFIG_RTL8187_LED */
>

This seems like a band-aid. If anything, the original order would
seem to make more sense.

Do you have a link to the original backtrace? I don't see one in
the bugzilla entry.

Thanks,

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-11-05 04:57:58

by Sid Hayn

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

John W. Linville wrote:
> On Wed, Nov 04, 2009 at 12:13:02PM -0600, Larry Finger wrote:
>
>> On 11/04/2009 10:54 AM, John W. Linville wrote:
>>
>>> On Wed, Nov 04, 2009 at 09:54:37AM -0600, Larry Finger wrote:
>>>
>>>
>>>> I will try once more to get netconsole working to capture the backtrace.
>>>>
>>> No need, I think I understand it now. The new order still sorta
>>> looks/feels "wrong", but it seems fine. Maybe an alternative would
>>> be to make the brightness_set routine aware of the shutdown and not
>>> queue the work? Maybe even ieee80211_queue_delayed_work could be
>>> made a bit smarter here?
>>>
>> Are either of these questions a request, or are they musings?
>>
>
> They are musings -- but feel free to be inspired! :-)
>
>
I can't speak for "proper coding style" or "preferred fixes" or whatever
you awesome coders call things, but I can say that myself and a few
others have extensively test _this_ fix and can no longer crash the
kernel. As the original reporter I am more than satisfied that this is
fixed. Just my 0.02$

Thanks,
Rick Farina


2009-11-04 18:13:03

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On 11/04/2009 10:54 AM, John W. Linville wrote:
> On Wed, Nov 04, 2009 at 09:54:37AM -0600, Larry Finger wrote:
>
>> I will try once more to get netconsole working to capture the backtrace.
>
> No need, I think I understand it now. The new order still sorta
> looks/feels "wrong", but it seems fine. Maybe an alternative would
> be to make the brightness_set routine aware of the shutdown and not
> queue the work? Maybe even ieee80211_queue_delayed_work could be
> made a bit smarter here?

Are either of these questions a request, or are they musings?

Larry

2009-11-04 17:00:16

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On Wed, Nov 04, 2009 at 09:54:37AM -0600, Larry Finger wrote:

> I will try once more to get netconsole working to capture the backtrace.

No need, I think I understand it now. The new order still sorta
looks/feels "wrong", but it seems fine. Maybe an alternative would
be to make the brightness_set routine aware of the shutdown and not
queue the work? Maybe even ieee80211_queue_delayed_work could be
made a bit smarter here?

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-11-05 02:34:11

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] rtl8187: Fix kernel oops when device is removed when LEDS enabled (Bugzilla #14539)

On 11/04/2009 06:14 PM, Herton Ronaldo Krzesinski wrote:
>
> Hi, I checked here and the code above in led_classdev_unregister is the same in
> 2.6.31, so I think the patch from Larry should also be applied/sent to 2.6.31.x
> stable, as the bug could happen there too.

Herton,

Technically you are correct; however, I did extensive testing of
2.6.31 and _NEVER_ got the failure. The sequencing is a bug waiting to
happen, but something in the post 2.6.31 merge actually enabled the
bug to happen. I tried bisection to determine which change actully did
that, but was not successful.

My feeling is that stable can be left alone. Of course, if bug reports
of kernel panics on unload of rtl8187 start occurring, we will know
how to fix it.

Larry