2009-03-24 15:54:20

by TJ

[permalink] [raw]
Subject: iwl3945: Don't queue rfkill_poll work when module is exiting

Tim Gardner suggested I forward this as a possible stable-release
update. I found the problem in our current Ubuntu Jaunty tree - iwlwifi
version 1.2.26k. What follows is a copy of the patch for Jaunty
(2.6.28).

----------
Bug: #345710

When the wireless interface is active and the iwl3945 module is unloaded the
call to ieee80211_unregister_hw() would call iwl3945_mac_stop() which would
restart the delayed workqueue for rfkill_poll. That workqueue had already been
cancelled so when the next work item was run (2 seconds later) the system would
suffer a hard lock-up because the module had been unloaded by then.

This patch implements STATUS_EXIT_PENDING checks in places where the rfkill_poll
work is scheduled, and moves the final workqueue cancellation to occur after the
call to ieee80211_unregister_hw().

Bug discovered, experienced and fix tested on my PC.

Signed-off-by: TJ <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl3945-base.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index bb92db2..acaf038 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -6062,7 +6062,9 @@ static void iwl3945_rfkill_poll(struct work_struct *data)
if (test_bit(STATUS_RF_KILL_HW, &status) != test_bit(STATUS_RF_KILL_HW, &priv->status))
queue_work(priv->workqueue, &priv->rf_kill);

- queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
+ /* only queue if module isn't exiting */
+ if (! test_bit(STATUS_EXIT_PENDING, &priv->status))
+ queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
round_jiffies_relative(2 * HZ));

}
@@ -6588,7 +6590,10 @@ static void iwl3945_mac_stop(struct ieee80211_hw *hw)
flush_workqueue(priv->workqueue);

/* start polling the killswitch state again */
- queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
+
+ /* only queue if module isn't exiting */
+ if (! test_bit(STATUS_EXIT_PENDING, &priv->status))
+ queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
round_jiffies_relative(2 * HZ));

IWL_DEBUG_MAC80211("leave\n");
@@ -8166,7 +8171,6 @@ static void __devexit iwl3945_pci_remove(struct pci_dev *pdev)
sysfs_remove_group(&pdev->dev.kobj, &iwl3945_attribute_group);

iwl3945_rfkill_unregister(priv);
- cancel_delayed_work(&priv->rfkill_poll);
iwl3945_dealloc_ucode_pci(priv);

if (priv->rxq.bd)
@@ -8182,6 +8186,12 @@ static void __devexit iwl3945_pci_remove(struct pci_dev *pdev)
/*netif_stop_queue(dev); */
flush_workqueue(priv->workqueue);

+ /* ieee80211_unregister_hw calls iwl3945_mac_stop which used to restart the rfkill
+ polling. Although that now checks STATUS_EXIT_PENDING do cancel and wait for any
+ pending work to complete */
+ cancel_delayed_work(&priv->rfkill_poll);
+ cancel_work_sync(&priv->rfkill_poll);
+
/* ieee80211_unregister_hw calls iwl3945_mac_stop, which flushes
* priv->workqueue... so we can't take down the workqueue
* until now... */
--
1.6.0.4



2009-03-24 21:22:24

by TJ

[permalink] [raw]
Subject: Re: iwl3945: Don't queue rfkill_poll work when module is exiting

On Tue, 2009-03-24 at 18:51 +0100, Helmut Schaa wrote:
> Am Dienstag, 24. M=C3=A4rz 2009 schrieb TJ:
> > Tim Gardner suggested I forward this as a possible stable-release
> > update. I found the problem in our current Ubuntu Jaunty tree - iwl=
wifi
> > version 1.2.26k. What follows is a copy of the patch for Jaunty
> > (2.6.28).
>=20
> The patch that introduced the rfkill polling did not hit mainline yet=
=2E It
> should pop up in 2.6.30, hence this patch is not needed for any stabl=
e
> series.
>=20
> I wasn't able to reproduce the issue by unloading iwl3945 here. Any h=
ints
> on how to reproduce this issue?

lspci -nn -s 06:00.0
06:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945A=
BG [Golan] Network Connection [8086:4222] (rev 02)

With the interface active and in use do:

sudo modprobe -r iwl3945

Approximately 2 seconds later the PC locks solid and requires a
power-off button restart.

Confirmed by another of our kernel team:

https://launchpad.net/bugs/345710

2009-03-25 03:45:04

by Huaxu Wan

[permalink] [raw]
Subject: Re: iwl3945: Don't queue rfkill_poll work when module is exiting

Hi,

This issue is confirmed here too. But during compile, there is warning =
about
cancel_work_sync(), cancel_delayed_work_sync() should be used instead.

And, without the modification in iwl3945_rfkill_poll() and=20
iwl3945_mac_stop(), just with one line cancel_delayed_work_sync(), this=
issue
is also fixed in my testing.

Best regards
Huaxu

PS: The following is the log when iwl3945 was unloaded before patched.

root@Januty-devel:~# modprobe -r iwl3945
[ 164.840610] iwl3945: U iwl3945_pci_remove *** UNLOAD DRIVER ***
[ 164.846515] iwl3945: U __iwl3945_down iwl3945 is going down
[ 164.852315] iwl3945: U iwl3945_hw_nic_stop_master stop master
[ 164.858634] iwl3945: U iwl3945_clear_free_frames 0 frames on pre-all=
ocated heap on clear.
[ 164.917177] iwl3945: U __iwl3945_down iwl3945 is going down
[ 164.922735] iwl3945: U iwl3945_clear_free_frames 0 frames on pre-all=
ocated heap on clear.
[ 166.804090] BUG: unable to handle kernel paging request at 000077ff8=
b64b8c7
[ 166.808082] IP: [<ffffffff80262b62>] delayed_work_timer_fn+0x32/0x40
[ 166.808082] PGD 0=20
[ 166.808082] Oops: 0000 [#1] SMP=20
[ 166.808082] last sysfs file: /sys/module/cfg80211/refcnt
[ 166.808082] Dumping ftrace buffer:
[ 166.808082] (ftrace buffer empty)
[ 166.808082] CPU 1=20
[ 166.808082] Modules linked in: lp parport arc4 ecb psmouse iTCO_wdt =
video serio_raw pcspkr usbhid intel_agp iTCO_vendor_support output ehci=
_hcd uhci_hcd ]
[ 166.808082] Pid: 0, comm: swapper Not tainted 2.6.28.8 #1
[ 166.808082] RIP: 0010:[<ffffffff80262b62>] [<ffffffff80262b62>] del=
ayed_work_timer_fn+0x32/0x40
[ 166.808082] RSP: 0018:ffff88007b62be80 EFLAGS: 00010246
[ 166.808082] RAX: 000077ff8b64b8bf RBX: ffff8800738df7c8 RCX: ffff880=
0738df7c8
[ 166.808082] RDX: 0000000000000001 RSI: ffff8800738df7c8 RDI: ffff880=
0738df7c8
[ 166.808082] RBP: ffff88007b62be80 R08: ffff8800738df820 R09: 0000000=
000000000
[ 166.808082] R10: ffff88007b625e18 R11: 0000000000000000 R12: 0000000=
000000100
[ 166.808082] R13: ffff88007b614000 R14: ffff88007b62beb0 R15: fffffff=
f80262b30
[ 166.808082] FS: 0000000000000000(0000) GS:ffff88007bc02c00(0000) kn=
lGS:0000000000000000
[ 166.808082] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 166.808082] CR2: 000077ff8b64b8c7 CR3: 0000000000201000 CR4: 0000000=
0000006a0
[ 166.808082] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000=
000000000
[ 166.808082] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000=
000000400
[ 166.808082] Process swapper (pid: 0, threadinfo ffff88007b624000, ta=
sk ffff88007b61c320)
[ 166.808082] Stack:
[ 166.808082] ffff88007b62bef0 ffffffff80259e39 ffff88007b615c18 ffff=
88007b615818
[ 166.808082] ffff88007b615418 ffff88007b615018 ffff88007b62beb0 ffff=
88007b62beb0
[ 166.808082] ffffffff80271efd 0000000000000008 0000000000000001 0000=
000000000100
[ 166.808082] Call Trace:
[ 166.808082] <IRQ> <0> [<ffffffff80259e39>] run_timer_softirq+0x179/=
0x260
[ 166.808082] [<ffffffff80271efd>] ? tick_handle_oneshot_broadcast+0x=
ed/0x100
[ 166.808082] [<ffffffff80254cfc>] __do_softirq+0x9c/0x170
[ 166.808082] [<ffffffff80213cac>] call_softirq+0x1c/0x30
[ 166.808082] [<ffffffff80214f0d>] do_softirq+0x5d/0xa0
[ 166.808082] [<ffffffff80254a7d>] irq_exit+0x8d/0xa0
[ 166.808082] [<ffffffff802151d5>] do_IRQ+0xc5/0x110
[ 166.808082] [<ffffffff80212b13>] ret_from_intr+0x0/0x29
[ 166.808082] <EOI> <0> [<ffffffff8046d906>] ? acpi_idle_enter_simple=
+0x166/0x1a4
[ 166.808082] [<ffffffff8046d8fe>] ? acpi_idle_enter_simple+0x15e/0x1=
a4
[ 166.808082] [<ffffffff805745a5>] ? cpuidle_idle_call+0xa5/0x100
[ 166.808082] [<ffffffff80210e85>] ? cpu_idle+0x65/0xc0
[ 166.808082] [<ffffffff8067e6fd>] ? start_secondary+0x168/0x1bb
[ 166.808082] Code: 65 8b 14 25 24 00 00 00 48 89 e5 48 83 e0 fc 48 8b=
40 38 8b 70 20 48 8b 00 85 f6 0f 45 15 ff c9 72 00 48 f7 d0 48 89 ce 4=
8 63 d2 <48> 8=20
[ 166.808082] RIP [<ffffffff80262b62>] delayed_work_timer_fn+0x32/0x4=
0
[ 166.808082] RSP <ffff88007b62be80>
[ 166.808082] CR2: 000077ff8b64b8c7
[ 167.086916] Kernel panic - not syncing: Fatal exception in interrupt


On 21:22 Tue 24 Mar, TJ wrote:
> On Tue, 2009-03-24 at 18:51 +0100, Helmut Schaa wrote:
> > Am Dienstag, 24. M=E4rz 2009 schrieb TJ:
> > > Tim Gardner suggested I forward this as a possible stable-release
> > > update. I found the problem in our current Ubuntu Jaunty tree - i=
wlwifi
> > > version 1.2.26k. What follows is a copy of the patch for Jaunty
> > > (2.6.28).
> >=20
> > The patch that introduced the rfkill polling did not hit mainline y=
et. It
> > should pop up in 2.6.30, hence this patch is not needed for any sta=
ble
> > series.
> >=20
> > I wasn't able to reproduce the issue by unloading iwl3945 here. Any=
hints
> > on how to reproduce this issue?
>=20
> lspci -nn -s 06:00.0
> 06:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 394=
5ABG [Golan] Network Connection [8086:4222] (rev 02)
>=20
> With the interface active and in use do:
>=20
> sudo modprobe -r iwl3945
>=20
> Approximately 2 seconds later the PC locks solid and requires a
> power-off button restart.
>=20
> Confirmed by another of our kernel team:
>=20
> https://launchpad.net/bugs/345710
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wirel=
ess" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-03-25 04:31:22

by TJ

[permalink] [raw]
Subject: Re: iwl3945: Don't queue rfkill_poll work when module is exiting

On Wed, 2009-03-25 at 11:41 +0800, Huaxu Wan wrote:
> Hi,
>
> This issue is confirmed here too. But during compile, there is warning about
> cancel_work_sync(), cancel_delayed_work_sync() should be used instead.
>
> And, without the modification in iwl3945_rfkill_poll() and
> iwl3945_mac_stop(), just with one line cancel_delayed_work_sync(), this issue
> is also fixed in my testing.

Originally I did try adding just cancel_delayed_work_sync() although I
put it immediately following:

iwl3945_rfkill_unregister(priv);
cancel_delayed_work(&priv->rfkill_poll);

Wondered why that didn't work and hunted down the unnecessary calls to
queue_delayed_work() that were causing the issue and made them
conditional on STATUS_EXIT_PENDING.

Then I decided to be absolutely sure and moved cancel_delayed_work()
further down and for some reason mis-used cancel_work_sync() instead of
cancel_delayed_work_sync() - not paying attention I guess!




2009-03-24 17:51:22

by Helmut Schaa

[permalink] [raw]
Subject: Re: iwl3945: Don't queue rfkill_poll work when module is exiting

Am Dienstag, 24. M=E4rz 2009 schrieb TJ:
> Tim Gardner suggested I forward this as a possible stable-release
> update. I found the problem in our current Ubuntu Jaunty tree - iwlwi=
fi
> version 1.2.26k. What follows is a copy of the patch for Jaunty
> (2.6.28).

The patch that introduced the rfkill polling did not hit mainline yet. =
It
should pop up in 2.6.30, hence this patch is not needed for any stable
series.

I wasn't able to reproduce the issue by unloading iwl3945 here. Any hin=
ts
on how to reproduce this issue?

Thanks,
Helmut

> ----------
> Bug: #345710
>=20
> When the wireless interface is active and the iwl3945 module is unloa=
ded the
> call to ieee80211_unregister_hw() would call iwl3945_mac_stop() which=
would
> restart the delayed workqueue for rfkill_poll. That workqueue had alr=
eady been
> cancelled so when the next work item was run (2 seconds later) the sy=
stem would
> suffer a hard lock-up because the module had been unloaded by then.
>=20
> This patch implements STATUS_EXIT_PENDING checks in places where the =
rfkill_poll
> work is scheduled, and moves the final workqueue cancellation to occu=
r after the
> call to ieee80211_unregister_hw().
>=20
> Bug discovered, experienced and fix tested on my PC.
>=20
> Signed-off-by: TJ <[email protected]>
> ---
> drivers/net/wireless/iwlwifi/iwl3945-base.c | 16 +++++++++++++---
> 1 files changed, 13 insertions(+), 3 deletions(-)
>=20
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/ne=
t/wireless/iwlwifi/iwl3945-base.c
> index bb92db2..acaf038 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -6062,7 +6062,9 @@ static void iwl3945_rfkill_poll(struct work_str=
uct *data)
> if (test_bit(STATUS_RF_KILL_HW, &status) !=3D test_bit(STATUS_RF_KI=
LL_HW, &priv->status))
> queue_work(priv->workqueue, &priv->rf_kill);
> =20
> - queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
> + /* only queue if module isn't exiting */
> + if (! test_bit(STATUS_EXIT_PENDING, &priv->status))
> + queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
> round_jiffies_relative(2 * HZ));
> =20
> }
> @@ -6588,7 +6590,10 @@ static void iwl3945_mac_stop(struct ieee80211_=
hw *hw)
> flush_workqueue(priv->workqueue);
> =20
> /* start polling the killswitch state again */
> - queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
> +
> + /* only queue if module isn't exiting */
> + if (! test_bit(STATUS_EXIT_PENDING, &priv->status))
> + queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
> round_jiffies_relative(2 * HZ));
> =20
> IWL_DEBUG_MAC80211("leave\n");
> @@ -8166,7 +8171,6 @@ static void __devexit iwl3945_pci_remove(struct=
pci_dev *pdev)
> sysfs_remove_group(&pdev->dev.kobj, &iwl3945_attribute_group);
> =20
> iwl3945_rfkill_unregister(priv);
> - cancel_delayed_work(&priv->rfkill_poll);
> iwl3945_dealloc_ucode_pci(priv);
> =20
> if (priv->rxq.bd)
> @@ -8182,6 +8186,12 @@ static void __devexit iwl3945_pci_remove(struc=
t pci_dev *pdev)
> /*netif_stop_queue(dev); */
> flush_workqueue(priv->workqueue);
> =20
> + /* ieee80211_unregister_hw calls iwl3945_mac_stop which used to res=
tart the rfkill
> + polling. Although that now checks STATUS_EXIT_PENDING do cancel =
and wait for any
> + pending work to complete */
> + cancel_delayed_work(&priv->rfkill_poll);
> + cancel_work_sync(&priv->rfkill_poll);
> +
> /* ieee80211_unregister_hw calls iwl3945_mac_stop, which flushes
> * priv->workqueue... so we can't take down the workqueue
> * until now... */

2009-03-26 19:33:43

by Reinette Chatre

[permalink] [raw]
Subject: Re: iwl3945: Don't queue rfkill_poll work when module is exiting

On Tue, 2009-03-24 at 21:31 -0700, TJ wrote:
> On Wed, 2009-03-25 at 11:41 +0800, Huaxu Wan wrote:
> > Hi,
> >
> > This issue is confirmed here too. But during compile, there is warning about
> > cancel_work_sync(), cancel_delayed_work_sync() should be used instead.
> >
> > And, without the modification in iwl3945_rfkill_poll() and
> > iwl3945_mac_stop(), just with one line cancel_delayed_work_sync(), this issue
> > is also fixed in my testing.
>
> Originally I did try adding just cancel_delayed_work_sync() although I
> put it immediately following:
>
> iwl3945_rfkill_unregister(priv);
> cancel_delayed_work(&priv->rfkill_poll);
>
> Wondered why that didn't work and hunted down the unnecessary calls to
> queue_delayed_work() that were causing the issue and made them
> conditional on STATUS_EXIT_PENDING.
>
> Then I decided to be absolutely sure and moved cancel_delayed_work()
> further down and for some reason mis-used cancel_work_sync() instead of
> cancel_delayed_work_sync() - not paying attention I guess!

Sounds like this can be done in one line ... could you please resubmit?

Thanks

Reinette


2009-04-02 22:30:27

by Reinette Chatre

[permalink] [raw]
Subject: Re: iwl3945: Don't queue rfkill_poll work when module is exiting

Hi Huaxu,

On Tue, 2009-03-24 at 20:41 -0700, Huaxu Wan wrote:
> Hi,
>
> This issue is confirmed here too. But during compile, there is warning about
> cancel_work_sync(), cancel_delayed_work_sync() should be used instead.
>
> And, without the modification in iwl3945_rfkill_poll() and
> iwl3945_mac_stop(), just with one line cancel_delayed_work_sync(), this issue
> is also fixed in my testing.

TJ disappeared. We do need this fixed, would you like to submit your
patch instead?

Thanks

Reinette