2009-07-29 10:01:48

by Alan Jenkins

[permalink] [raw]
Subject: Re: eeepc_hotkey rmmod issues

On 7/28/09, Corentin Chary <[email protected]> wrote:
> On Tue, Jul 28, 2009 at 9:19 PM, Alan
> Jenkins<[email protected]> wrote:
>> On 7/28/09, Luciano Rocha <[email protected]> wrote:
>>> On Tue, Jul 28, 2009 at 05:50:26PM +0100, Luciano Rocha wrote:
>>>> On Mon, Jul 27, 2009 at 10:45:14PM +0300, Pekka Enberg wrote:
>>>> > (Adding Corentin to cc)
>>>> >
>>>> > On Mon, Jul 27, 2009 at 10:27 PM, Luciano Rocha<[email protected]>
>>>> > wrote:
>>>> > > Also, a "rmmod eeepc_hotkeys" resulted in a kernel panic. If asked,
>>>> > > I'll
>>>> > > try to replicate it.
>>>> >
>>>> > Yes, please.
>>>>
>>>> Hm, rebooted without i2c_i801, browsed some, then did a rmmod
>>>> eeepc_laptop:
>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>>
>>>> Two equal lines, yes. What does it mean?
>>>
>>> Nevermind, the wireless driver didn't like that the hardware
>>> disappeared.
>>
>> Thanks for the bug report anyway :-).
>>
>> So presumably this is what caused your oops earlier. I assume the
>> wireless toggle button doesn't normally cause any errors.
>>
>> The new rfkill core in 2.6.31 should avoid triggering this bug. The
>> new core won't disable the wireless when the eeepc-laptop module is
>> removed.
>>
>> But we should still fix the underlying problem. It sounds like
>> there's a narrow danger window on module unload. And it's still there
>> in 2.6.31-rc4:
>>
>> 1019 static void eeepc_rfkill_exit(void)
>> 1020 {
>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>> 1023 if (ehotk->wlan_rfkill)
>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>
>> Really we need to perform these unregistrations "at the same time".
>> The rfkill device relies on the notifier, but the notifier callback
>> also uses the rfkill device. I guess we will need to a mutex to
>> synchronize unregistration (and registration).
>
> I think 2.6.31 is ok,

> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
> rfkill_free, which was an error because
> the notifier callback uses the rfkill device.

Ok. I don't see how that causes Luciano's errors. So I guess he was
right to blame the wireless driver.

> But I believe that the rfkill device can work without the notifier
> (which is an acpi notifier).

I don't think it can.

If the rfkill device is set to "soft blocked", the pci device is
removed. If the acpi notifier is not called, the pci driver (e.g.
ath5k) won't realise the device is gone. The network device (e.g.
wlan0) will remain present, but it won't work.

So I believe there's a circular dependency which we need to resolve.
Would you like me to write a patch for it?

Regards
Alan


2009-07-29 10:26:28

by Corentin Chary

[permalink] [raw]
Subject: Re: eeepc_hotkey rmmod issues

On Wed, Jul 29, 2009 at 12:01 PM, Alan
Jenkins<[email protected]> wrote:
> On 7/28/09, Corentin Chary <[email protected]> wrote:
>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>> Jenkins<[email protected]> wrote:
>>> On 7/28/09, Luciano Rocha <[email protected]> wrote:
>>>> On Tue, Jul 28, 2009 at 05:50:26PM +0100, Luciano Rocha wrote:
>>>>> On Mon, Jul 27, 2009 at 10:45:14PM +0300, Pekka Enberg wrote:
>>>>> > (Adding Corentin to cc)
>>>>> >
>>>>> > On Mon, Jul 27, 2009 at 10:27 PM, Luciano Rocha<[email protected]>
>>>>> > wrote:
>>>>> > > Also, a "rmmod eeepc_hotkeys" resulted in a kernel panic. If asked,
>>>>> > > I'll
>>>>> > > try to replicate it.
>>>>> >
>>>>> > Yes, please.
>>>>>
>>>>> Hm, rebooted without i2c_i801, browsed some, then did a rmmod
>>>>> eeepc_laptop:
>>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>>>
>>>>> Two equal lines, yes. What does it mean?
>>>>
>>>> Nevermind, the wireless driver didn't like that the hardware
>>>> disappeared.
>>>
>>> Thanks for the bug report anyway :-).
>>>
>>> So presumably this is what caused your oops earlier. ?I assume the
>>> wireless toggle button doesn't normally cause any errors.
>>>
>>> The new rfkill core in 2.6.31 should avoid triggering this bug. ?The
>>> new core won't disable the wireless when the eeepc-laptop module is
>>> removed.
>>>
>>> But we should still fix the underlying problem. ?It sounds like
>>> there's a narrow danger window on module unload. ?And it's still there
>>> in 2.6.31-rc4:
>>>
>>> 1019 static void eeepc_rfkill_exit(void)
>>> 1020 {
>>> 1021 ? ? ? ? eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>> 1022 ? ? ? ? eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>> 1023 ? ? ? ? if (ehotk->wlan_rfkill)
>>> 1024 ? ? ? ? ? ? ? ? rfkill_unregister(ehotk->wlan_rfkill);
>>>
>>> Really we need to perform these unregistrations "at the same time".
>>> The rfkill device relies on the notifier, but the notifier callback
>>> also uses the rfkill device. ?I guess we will need to a mutex to
>>> synchronize unregistration (and registration).
>>
>> I think 2.6.31 is ok,
>
>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>> rfkill_free, which was an error because
>> the notifier callback uses the rfkill device.
>
> Ok. ?I don't see how that causes Luciano's errors. ?So I guess he was
> right to blame the wireless driver.

If he was using 2.6.30, then :
eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
And the callback was still registered after rfkill_unregister(), *Ooops*

In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
should works.

>> But I believe that the rfkill device can work without the notifier
>> (which is an acpi notifier).
>
> I don't think it can.
>
> If the rfkill device is set to "soft blocked", the pci device is
> removed. ?If the acpi notifier is not called, the pci driver (e.g.
> ath5k) won't realise the device is gone. ?The network device (e.g.
> wlan0) will remain present, but it won't work.

Hum, there is a misunderstanding here. What I mean is : I think
eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).

And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop

Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
change in rfkill code.

> So I believe there's a circular dependency which we need to resolve.
> Would you like me to write a patch for it?

It's possible that I miss the issue here, so go ahead :)


--
Corentin Chary
http://xf.iksaif.net - http://uffs.org

2009-07-29 12:02:22

by Alan Jenkins

[permalink] [raw]
Subject: Re: eeepc_hotkey rmmod issues

On 7/29/09, Corentin Chary <[email protected]> wrote:
> On Wed, Jul 29, 2009 at 12:01 PM, Alan
> Jenkins<[email protected]> wrote:
>> On 7/28/09, Corentin Chary <[email protected]> wrote:
>>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>>> Jenkins<[email protected]> wrote:

>>>> But we should still fix the underlying problem. It sounds like
>>>> there's a narrow danger window on module unload. And it's still there
>>>> in 2.6.31-rc4:
>>>>
>>>> 1019 static void eeepc_rfkill_exit(void)
>>>> 1020 {
>>>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>>> 1023 if (ehotk->wlan_rfkill)
>>>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>>>
>>>> Really we need to perform these unregistrations "at the same time".
>>>> The rfkill device relies on the notifier, but the notifier callback
>>>> also uses the rfkill device. I guess we will need to a mutex to
>>>> synchronize unregistration (and registration).
>>>
>>> I think 2.6.31 is ok,
>>
>>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>>> rfkill_free, which was an error because
>>> the notifier callback uses the rfkill device.
>>
>> Ok. I don't see how that causes Luciano's errors. So I guess he was
>> right to blame the wireless driver.
>
> If he was using 2.6.30, then :
> eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
> And the callback was still registered after rfkill_unregister(), *Ooops*
>
> In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
> should works.
>
>>> But I believe that the rfkill device can work without the notifier
>>> (which is an acpi notifier).
>>
>> I don't think it can.
>>
>> If the rfkill device is set to "soft blocked", the pci device is
>> removed. If the acpi notifier is not called, the pci driver (e.g.
>> ath5k) won't realise the device is gone. The network device (e.g.
>> wlan0) will remain present, but it won't work.
>
> Hum, there is a misunderstanding here. What I mean is : I think
> eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
>
> And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
>
> Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
> change in rfkill code.
>
>> So I believe there's a circular dependency which we need to resolve.
>> Would you like me to write a patch for it?
>
> It's possible that I miss the issue here, so go ahead :)

Thanks :)

Here is a test case to show the race I am talking about

diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
index ec560f1..c478db5 100644
--- a/drivers/platform/x86/eeepc-laptop.c
+++ b/drivers/platform/x86/eeepc-laptop.c
@@ -1020,6 +1020,17 @@ static void eeepc_rfkill_exit(void)
{
eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
+
+ //
+ // Simulated error
+ // Imagine that userspace set the wifi to "soft blocked" at this exact moment
+ // (or the wireless toggle key was pressed)
+ //
+ // The PCI device will disappear, but we will not see any notification
+ //
+ set_acpi(CM_ASL_WLAN, 0);
+ rfkill_set_sw_state(ehotk->wlan_rfkill, true);
+
if (ehotk->wlan_rfkill)
rfkill_unregister(ehotk->wlan_rfkill);
if (ehotk->bluetooth_rfkill)



If you unload eeepc-laptop with this simulated race, the wireless
interface stays around but stops working.

[ 191.391155] ath5k phy0: can't reset hardware (-5)
[ 191.432983] ath5k phy0: failed to wakeup the MAC Chip
[ 196.940835] __ratelimit: 21 callbacks suppressed

Alan

2009-07-29 12:15:59

by Corentin Chary

[permalink] [raw]
Subject: Re: eeepc_hotkey rmmod issues

On Wed, Jul 29, 2009 at 2:02 PM, Alan
Jenkins<[email protected]> wrote:
> On 7/29/09, Corentin Chary <[email protected]> wrote:
>> On Wed, Jul 29, 2009 at 12:01 PM, Alan
>> Jenkins<[email protected]> wrote:
>>> On 7/28/09, Corentin Chary <[email protected]> wrote:
>>>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>>>> Jenkins<[email protected]> wrote:
>
>>>>> But we should still fix the underlying problem. ?It sounds like
>>>>> there's a narrow danger window on module unload. ?And it's still there
>>>>> in 2.6.31-rc4:
>>>>>
>>>>> 1019 static void eeepc_rfkill_exit(void)
>>>>> 1020 {
>>>>> 1021 ? ? ? ? eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>>>> 1022 ? ? ? ? eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>>>> 1023 ? ? ? ? if (ehotk->wlan_rfkill)
>>>>> 1024 ? ? ? ? ? ? ? ? rfkill_unregister(ehotk->wlan_rfkill);
>>>>>
>>>>> Really we need to perform these unregistrations "at the same time".
>>>>> The rfkill device relies on the notifier, but the notifier callback
>>>>> also uses the rfkill device. ?I guess we will need to a mutex to
>>>>> synchronize unregistration (and registration).
>>>>
>>>> I think 2.6.31 is ok,
>>>
>>>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>>>> rfkill_free, which was an error because
>>>> the notifier callback uses the rfkill device.
>>>
>>> Ok. ?I don't see how that causes Luciano's errors. ?So I guess he was
>>> right to blame the wireless driver.
>>
>> If he was using 2.6.30, then :
>> eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
>> And the callback was still registered after rfkill_unregister(), *Ooops*
>>
>> In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
>> should works.
>>
>>>> But I believe that the rfkill device can work without the notifier
>>>> (which is an acpi notifier).
>>>
>>> I don't think it can.
>>>
>>> If the rfkill device is set to "soft blocked", the pci device is
>>> removed. ?If the acpi notifier is not called, the pci driver (e.g.
>>> ath5k) won't realise the device is gone. ?The network device (e.g.
>>> wlan0) will remain present, but it won't work.
>>
>> Hum, there is a misunderstanding here. What I mean is : I think
>> eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
>>
>> And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
>>
>> Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
>> change in rfkill code.
>>
>>> So I believe there's a circular dependency which we need to resolve.
>>> Would you like me to write a patch for it?
>>
>> It's possible that I miss the issue here, so go ahead :)
>
> Thanks :)
>
> Here is a test case to show the race I am talking about
>
> diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
> index ec560f1..c478db5 100644
> --- a/drivers/platform/x86/eeepc-laptop.c
> +++ b/drivers/platform/x86/eeepc-laptop.c
> @@ -1020,6 +1020,17 @@ static void eeepc_rfkill_exit(void)
> ?{
> ? ? ? ?eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
> ? ? ? ?eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
> +
> + ? ? ? //
> + ? ? ? // Simulated error
> + ? ? ? // Imagine that userspace set the wifi to "soft blocked" at this exact moment
> + ? ? ? // (or the wireless toggle key was pressed)
> + ? ? ? //
> + ? ? ? // The PCI device will disappear, but we will not see any notification
> + ? ? ? //
> + ? ? ? set_acpi(CM_ASL_WLAN, 0);
> + ? ? ? rfkill_set_sw_state(ehotk->wlan_rfkill, true);
> +
> ? ? ? ?if (ehotk->wlan_rfkill)
> ? ? ? ? ? ? ? ?rfkill_unregister(ehotk->wlan_rfkill);
> ? ? ? ?if (ehotk->bluetooth_rfkill)
>
>
>
> If you unload eeepc-laptop with this simulated race, the wireless
> interface stays around but stops working.
>
> [ ?191.391155] ath5k phy0: can't reset hardware (-5)
> [ ?191.432983] ath5k phy0: failed to wakeup the MAC Chip
> [ ?196.940835] __ratelimit: 21 callbacks suppressed
>
> Alan
>

Indeed :) . Let's serialize that. Do you want me to do it ?
Thanks,

--
Corentin Chary
http://xf.iksaif.net - http://uffs.org

2009-07-29 12:17:17

by Alan Jenkins

[permalink] [raw]
Subject: Re: eeepc_hotkey rmmod issues

Corentin Chary wrote:
> On Wed, Jul 29, 2009 at 2:02 PM, Alan
> Jenkins<[email protected]> wrote:
>
>> On 7/29/09, Corentin Chary <[email protected]> wrote:
>>
>>> On Wed, Jul 29, 2009 at 12:01 PM, Alan
>>> Jenkins<[email protected]> wrote:
>>>
>>>> On 7/28/09, Corentin Chary <[email protected]> wrote:
>>>>
>>>>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>>>>> Jenkins<[email protected]> wrote:
>>>>>
>>>>>> But we should still fix the underlying problem. It sounds like
>>>>>> there's a narrow danger window on module unload. And it's still there
>>>>>> in 2.6.31-rc4:
>>>>>>
>>>>>> 1019 static void eeepc_rfkill_exit(void)
>>>>>> 1020 {
>>>>>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>>>>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>>>>> 1023 if (ehotk->wlan_rfkill)
>>>>>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>>>>>
>>>>>> Really we need to perform these unregistrations "at the same time".
>>>>>> The rfkill device relies on the notifier, but the notifier callback
>>>>>> also uses the rfkill device. I guess we will need to a mutex to
>>>>>> synchronize unregistration (and registration).
>>>>>>
>>>>> I think 2.6.31 is ok,
>>>>>
>>>>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>>>>> rfkill_free, which was an error because
>>>>> the notifier callback uses the rfkill device.
>>>>>
>>>> Ok. I don't see how that causes Luciano's errors. So I guess he was
>>>> right to blame the wireless driver.
>>>>
>>> If he was using 2.6.30, then :
>>> eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
>>> And the callback was still registered after rfkill_unregister(), *Ooops*
>>>
>>> In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
>>> should works.
>>>
>>>
>>>>> But I believe that the rfkill device can work without the notifier
>>>>> (which is an acpi notifier).
>>>>>
>>>> I don't think it can.
>>>>
>>>> If the rfkill device is set to "soft blocked", the pci device is
>>>> removed. If the acpi notifier is not called, the pci driver (e.g.
>>>> ath5k) won't realise the device is gone. The network device (e.g.
>>>> wlan0) will remain present, but it won't work.
>>>>
>>> Hum, there is a misunderstanding here. What I mean is : I think
>>> eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
>>>
>>> And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
>>>
>>> Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
>>> change in rfkill code.
>>>
>>>
>>>> So I believe there's a circular dependency which we need to resolve.
>>>> Would you like me to write a patch for it?
>>>>
>>> It's possible that I miss the issue here, so go ahead :)
>>>
>> Thanks :)
>>
>> Here is a test case to show the race I am talking about
>>
>> diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
>> index ec560f1..c478db5 100644
>> --- a/drivers/platform/x86/eeepc-laptop.c
>> +++ b/drivers/platform/x86/eeepc-laptop.c
>> @@ -1020,6 +1020,17 @@ static void eeepc_rfkill_exit(void)
>> {
>> eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>> eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>> +
>> + //
>> + // Simulated error
>> + // Imagine that userspace set the wifi to "soft blocked" at this exact moment
>> + // (or the wireless toggle key was pressed)
>> + //
>> + // The PCI device will disappear, but we will not see any notification
>> + //
>> + set_acpi(CM_ASL_WLAN, 0);
>> + rfkill_set_sw_state(ehotk->wlan_rfkill, true);
>> +
>> if (ehotk->wlan_rfkill)
>> rfkill_unregister(ehotk->wlan_rfkill);
>> if (ehotk->bluetooth_rfkill)
>>
>>
>>
>> If you unload eeepc-laptop with this simulated race, the wireless
>> interface stays around but stops working.
>>
>> [ 191.391155] ath5k phy0: can't reset hardware (-5)
>> [ 191.432983] ath5k phy0: failed to wakeup the MAC Chip
>> [ 196.940835] __ratelimit: 21 callbacks suppressed
>>
>> Alan
>>
>>
>
> Indeed :) . Let's serialize that. Do you want me to do it ?
> Thanks,
>

It's ok, I'm already working on a fix.