2008-10-02 07:53:44

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

"Bob Copeland" <[email protected]> wrote:
> On Wed, Oct 1, 2008 at 6:34 PM, Elias Oltmanns <[email protected]> wrote:
>> "Bob Copeland" <[email protected]> wrote:
>
>>> On Wed, Oct 1, 2008 at 5:10 PM, Elias Oltmanns <[email protected]> wrote:
>> Oh, but it is cleaned up by ath5k_stop_hw(). The issue is a different
>> one here:
>>
>> ath5k_init() == ->start()
>> ath5k_stop_hw() == ->stop()
>>
>> Since the mac80211 layer never opened a device, it won't close it
>> either. Thus, ath5k_stop_hw() does not get called on module unload. By
>> calling ath5k_init() on resume, the driver has effectively started the
>> device when it was not supposed to do so and this event is not being
>> tracked by the higher layers.
>
> Ah, yes, good analysis... and ath5k_init is scheduling the timer. I
> doubt ieee80211_opened would fly; probably the better thing to do is
> ensure that the cleanup gets called regardless of whether ath5k_stop
> gets called (it shouldn't matter since the irqs all get created in
> _attach).

Not sure I agree there. Why should calibration take place regularly even
though the interface appears to be shut down from user-space's point of
view? It simply doesn't make sense to start the interface if nobody
intends to use it. Not that I know anything about it, but I imagine that
periodic recalibration might even be a waste of power, something most
laptop users will care about.

So, if higher layers won't provide any information about what interface
is considered opened, then ath5k has to track this sort of state
information itself if it intends to resume to the exact same state of
affairs as observed at the time of going into suspend. For all I know,
other drivers will have similar issues and track the open state which is
why I suggested ieee80211_opened, but you will know much more about
these things than I do. Just as an aside, at least on my system the
following doesn't work anyway:

1. # modprobe ath5k
2. # ifup ath5k
3. # ping <ap>
4. s2ram
5. resume
6. # ping <ap>

The last ping fails with network/host unreachable -- this is with a WPA
configuration and I haven't tested anything else. So, at least I always
have to make sure that I shut down all interfaces before suspending
anyway.

Regards,

Elias



2008-10-02 16:32:21

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

"Bob Copeland" <[email protected]> wrote:
> On Thu, 2 Oct 2008 08:52:58 -0400, Bob Copeland wrote
>> > Not sure I agree there. Why should calibration take place regularly even
>
>> > though the interface appears to be shut down from user-space's point of
>> > view? It simply doesn't make sense to start the interface if nobody
>> > intends to use it.
>>
>> Hmm, yeah, you're right. So, I guess we can just do your patch except
>> with another status bit that says it's active instead of changing
>> mac80211 (b43 does something along those lines).
>
> Something like this? It's kind of ugly, but the state bit needs
> to be in the _init/_stop critical section to avoid the race there.
> This makes it match other drivers' suspend methods, but is only a
> stop-gap until we have mac80211 suspend callbacks.
>
> Jiri, Nick, any comments?
>
> Based on a patch by Elias Oltmanns. We call ath5k_init in resume even
> if we didn't previously open the device. Besides starting up the
> device unnecessarily, this also causes an OOPS on rmmod because
> mac80211 will not invoke ath5k_stop and softirqs are left running after
> the module has been unloaded. Add a new state bit, ATH5K_STAT_STARTED,
> to indicate that we have been started up.

Sorry, but I don't think this is safe. Checking and restoring the
started flag has to be protected too, otherwise there can be races
against ->stop().

Regards,

Elias

2008-10-09 02:15:43

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

[gmail keeps dropping CCs for some reason]

On Tue, Oct 07, 2008 at 10:52:22PM +0200, Elias Oltmanns wrote:
> Bitops on sc->status have to be protected by the sc->lock as soon as
> ieee80211_register_hw() has been called.

Agreed...

> + mutex_lock(&sc->lock);
> + ath5k_init_leds(sc);

I'd rather leave ath5k_init_leds in attach so it somewhat matches up
with detach, but yeah I suppose it could use locking due to a
probe/start race.

The LED flag is not really a status flag compared to the rest, it just
says whether the hardware supports it or not. Another approach would
be to separate it from the others.

> @@ -2881,12 +2883,14 @@ static void ath5k_configure_filter(struct ieee80211_hw *hw,
> AR5K_RX_FILTER_MCAST);
>
> if (changed_flags & (FIF_PROMISC_IN_BSS | FIF_OTHER_BSS)) {
> + mutex_lock(&sc->lock);

Unfortunately, I don't believe configure_filter can sleep :(

--
Bob Copeland %% http://www.bobcopeland.com


2008-10-07 13:06:29

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Tue, Oct
Not sure 07, 2008 at 12:44:58PM +0200, Elias Oltmanns wrote:
> Looking through the code, I'm wondering about the atomicity requirements
> of sc->status. In my opinion, __set_bit() is not permissible in various
> places (including your use case). But since this is a problem that has
> been around before, I will send a separate patch once yours has been
> merged.

Ok, I don't see why it's a problem in this case, but I'll look for
your patch. Note we should be doing sc->status updates under a mutex
already; if not that's a bug.

--
Bob Copeland %% http://www.bobcopeland.com


2008-10-07 10:45:19

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

Bob Copeland <[email protected]> wrote:
[...]
> Subject: [PATCH] ath5k: correct hardware startup sequence in resume
>
> Based on a patch by Elias Oltmanns, we call ath5k_init in resume even
> if we didn't previously open the device. Besides starting up the
> device unnecessarily, this also causes an oops on rmmod because
> mac80211 will not invoke ath5k_stop and softirqs are left running after
> the module has been unloaded. Add a new state bit, ATH_STAT_STARTED,
> to indicate that we have been started up.
>
> Signed-off-by: Bob Copeland <[email protected]>
> ---
> drivers/net/wireless/ath5k/base.c | 28 ++++++++++++++++++++--------
> drivers/net/wireless/ath5k/base.h | 3 ++-
> 2 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
> index c151588..5388de8 100644
> --- a/drivers/net/wireless/ath5k/base.c
> +++ b/drivers/net/wireless/ath5k/base.c
[...]
> @@ -2248,7 +2257,7 @@ ath5k_stop_locked(struct ath5k_softc *sc)
> * stop is preempted).
> */
> static int
> -ath5k_stop_hw(struct ath5k_softc *sc)
> +ath5k_stop_hw(struct ath5k_softc *sc, bool update_status)
^^^^^^^^^^^^^
That should be is_suspend for the sake of consistency.

> {
> int ret;
>
> @@ -2280,6 +2289,9 @@ ath5k_stop_hw(struct ath5k_softc *sc)
> }
> ath5k_txbuf_free(sc, sc->bbuf);
> mmiowb();
> +
> + if (update_status)
> + __clear_bit(ATH_STAT_STARTED, sc->status);

This cannot possibly be right. The condition has to be

if (!is_suspend)

> mutex_unlock(&sc->lock);
>
> del_timer_sync(&sc->calib_tim);
> @@ -2676,12 +2688,12 @@ ath5k_reset_wake(struct ath5k_softc *sc)
>
> static int ath5k_start(struct ieee80211_hw *hw)
> {
> - return ath5k_init(hw->priv);
> + return ath5k_init(hw->priv, false);
> }
>
> static void ath5k_stop(struct ieee80211_hw *hw)
> {
> - ath5k_stop_hw(hw->priv);
> + ath5k_stop_hw(hw->priv, false);
> }
>
> static int ath5k_add_interface(struct ieee80211_hw *hw,

Looking through the code, I'm wondering about the atomicity requirements
of sc->status. In my opinion, __set_bit() is not permissible in various
places (including your use case). But since this is a problem that has
been around before, I will send a separate patch once yours has been
merged.

Regards,

Elias

2008-10-03 19:43:58

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Fri, 03 Oct 2008 16:42:17 +0200, Elias Oltmanns wrote
> I still feel uneasy about this. Granted, I haven't thought this through
> too carefully, but I'd rather not rely on the fact that ath5k_stop_hw()
> will not get called between the check for ATH_STAT_STARTED and the call
> to ath5k_init if I can help it. Perhaps you can add an argument `reinit'
> to ath5k_init() and do something like this under the mutex:
>
> if (reinit && !test_bit(ATH_STAT_STARTED, sc->status)) {
> mutex_unlock(...);
> return;
> }

Shouldn't the freezer take care of this? If userspace is suspended until
after devices are resumed, then, in a hand-wavy argument, I don't believe
->stop() could be called.

Adding a flag to ath5k_init to make sure we do everything in the mutex would
certainly ensure that there's no issues. On the other hand, I don't want to
cruft up the code too much since we know at some point mac80211 is going to
take down the interfaces in its suspend/resume callbacks, at which time the
flags can probably get tossed.

--
Bob Copeland %% http://www.bobcopeland.com



2008-10-07 20:52:34

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

Bob Copeland <[email protected]> wrote:
> On Tue, Oct
> Not sure 07, 2008 at 12:44:58PM +0200, Elias Oltmanns wrote:
>> Looking through the code, I'm wondering about the atomicity requirements
>
>> of sc->status. In my opinion, __set_bit() is not permissible in various
>> places (including your use case). But since this is a problem that has
>> been around before, I will send a separate patch once yours has been
>> merged.
>
> Ok, I don't see why it's a problem in this case, but I'll look for
> your patch. Note we should be doing sc->status updates under a mutex
> already; if not that's a bug.

Alright, in that case __set_bit() is quite sufficient, of course. In
order to enforce this policy, I'd suggest the following patch.

Regards,

Elias
---
From: Elias Oltmanns <[email protected]>
Subject: [PATCH] ath5k: Ensure atomicity of bitops on sc->status

Bitops on sc->status have to be protected by the sc->lock as soon as
ieee80211_register_hw() has been called.

Signed-off-by: Elias Oltmanns <[email protected]>
---

drivers/net/wireless/ath5k/base.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index 0676c6d..2c737da 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -490,6 +490,9 @@ ath5k_pci_probe(struct pci_dev *pdev,
if (ret)
goto err_ah;

+ mutex_lock(&sc->lock);
+ ath5k_init_leds(sc);
+
ATH5K_INFO(sc, "Atheros AR%s chip found (MAC: 0x%x, PHY: 0x%x)\n",
ath5k_chip_name(AR5K_VERSION_VER,sc->ah->ah_mac_srev),
sc->ah->ah_mac_srev,
@@ -541,6 +544,7 @@ ath5k_pci_probe(struct pci_dev *pdev,

/* ready to process interrupts */
__clear_bit(ATH_STAT_INVALID, sc->status);
+ mutex_unlock(&sc->lock);

return 0;
err_ah:
@@ -749,8 +753,6 @@ ath5k_attach(struct pci_dev *pdev, struct ieee80211_hw *hw)
goto err_queues;
}

- ath5k_init_leds(sc);
-
return 0;
err_queues:
ath5k_txq_release(sc);
@@ -2881,12 +2883,14 @@ static void ath5k_configure_filter(struct ieee80211_hw *hw,
AR5K_RX_FILTER_MCAST);

if (changed_flags & (FIF_PROMISC_IN_BSS | FIF_OTHER_BSS)) {
+ mutex_lock(&sc->lock);
if (*new_flags & FIF_PROMISC_IN_BSS) {
rfilt |= AR5K_RX_FILTER_PROM;
__set_bit(ATH_STAT_PROMISC, sc->status);
}
else
__clear_bit(ATH_STAT_PROMISC, sc->status);
+ mutex_unlock(&sc->lock);
}

/* Note, AR5K_RX_FILTER_MCAST is already enabled */

2008-10-02 15:02:42

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Thu, 2 Oct 2008 08:52:58 -0400, Bob Copeland wrote
> > Not sure I agree there. Why should calibration take place regularly even
> > though the interface appears to be shut down from user-space's point of
> > view? It simply doesn't make sense to start the interface if nobody
> > intends to use it.
>
> Hmm, yeah, you're right. So, I guess we can just do your patch except
> with another status bit that says it's active instead of changing
> mac80211 (b43 does something along those lines).

Something like this? It's kind of ugly, but the state bit needs
to be in the _init/_stop critical section to avoid the race there.
This makes it match other drivers' suspend methods, but is only a
stop-gap until we have mac80211 suspend callbacks.

Jiri, Nick, any comments?

Based on a patch by Elias Oltmanns. We call ath5k_init in resume even
if we didn't previously open the device. Besides starting up the
device unnecessarily, this also causes an OOPS on rmmod because
mac80211 will not invoke ath5k_stop and softirqs are left running after
the module has been unloaded. Add a new state bit, ATH5K_STAT_STARTED,
to indicate that we have been started up.

---
drivers/net/wireless/ath5k/base.c | 21 ++++++++++++++++++---
drivers/net/wireless/ath5k/base.h | 3 ++-
2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index e09ed2c..a40b75f 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -626,11 +626,17 @@ ath5k_pci_suspend(struct pci_dev *pdev, pm_message_t state)
{
struct ieee80211_hw *hw = pci_get_drvdata(pdev);
struct ath5k_softc *sc = hw->priv;
+ int is_started;

ath5k_led_off(sc);

+ is_started = test_bit(ATH_STAT_STARTED, sc->status);
ath5k_stop_hw(sc);

+ /* restore started flag for resume */
+ if (is_started)
+ set_bit(ATH_STAT_STARTED, sc->status);
+
free_irq(pdev->irq, sc);
pci_save_state(pdev);
pci_disable_device(pdev);
@@ -666,9 +672,12 @@ ath5k_pci_resume(struct pci_dev *pdev)
goto err_no_irq;
}

- err = ath5k_init(sc);
- if (err)
- goto err_irq;
+ if (test_bit(ATH_STAT_STARTED, sc->status)) {
+ err = ath5k_init(sc);
+ if (err)
+ goto err_irq;
+ }
+
ath5k_led_enable(sc);

/*
@@ -2153,6 +2162,8 @@ ath5k_init(struct ath5k_softc *sc)

mutex_lock(&sc->lock);

+ __clear_bit(ATH_STAT_STARTED, sc->status);
+
ATH5K_DBG(sc, ATH5K_DEBUG_RESET, "mode %d\n", sc->opmode);

/*
@@ -2177,6 +2188,8 @@ ath5k_init(struct ath5k_softc *sc)
if (ret)
goto done;

+ __set_bit(ATH_STAT_STARTED, sc->status);
+
/* Set ack to be sent at low bit-rates */
ath5k_hw_set_ack_bitrate_high(sc->ah, false);

@@ -2269,6 +2282,8 @@ ath5k_stop_hw(struct ath5k_softc *sc)
}
ath5k_txbuf_free(sc, sc->bbuf);
mmiowb();
+
+ __clear_bit(ATH_STAT_STARTED, sc->status);
mutex_unlock(&sc->lock);

del_timer_sync(&sc->calib_tim);
diff --git a/drivers/net/wireless/ath5k/base.h b/drivers/net/wireless/ath5k/base.h
index 9d0b728..06d1054 100644
--- a/drivers/net/wireless/ath5k/base.h
+++ b/drivers/net/wireless/ath5k/base.h
@@ -128,11 +128,12 @@ struct ath5k_softc {
size_t desc_len; /* size of TX/RX descriptors */
u16 cachelsz; /* cache line size */

- DECLARE_BITMAP(status, 4);
+ DECLARE_BITMAP(status, 5);
#define ATH_STAT_INVALID 0 /* disable hardware accesses */
#define ATH_STAT_MRRETRY 1 /* multi-rate retry support */
#define ATH_STAT_PROMISC 2
#define ATH_STAT_LEDSOFT 3 /* enable LED gpio status */
+#define ATH_STAT_STARTED 4 /* opened & irqs enabled */

unsigned int filter_flags; /* HW flags, AR5K_RX_FILTER_* */
unsigned int curmode; /* current phy mode */
--
1.5.4.2.182.gb3092




2008-10-02 09:25:00

by Johannes Berg

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Thu, 2008-10-02 at 09:53 +0200, Elias Oltmanns wrote:

> So, if higher layers won't provide any information about what interface
> is considered opened, then ath5k has to track this sort of state
> information itself if it intends to resume to the exact same state of
> affairs as observed at the time of going into suspend.

Well, if somebody would just fix suspend/resume as I've outlined
multiple times and is on the todo list...

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-10-06 14:39:12

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Mon, 06 Oct 2008 16:23:07 +0200, Johannes Berg wrote
> On Mon, 2008-10-06 at 10:12 -0400, Bob Copeland wrote:
>
> > Nonetheless, I admit not being expert in this area, nor do I know if relying
> > on the freezer is acceptable, so someone more clueful, please chime in.
>
> It's not, it's supposed to be removed.

Great, thanks for that info.

> However, I still think ath5k shouldn't be much concerned with all this
> because mac80211 should just disable all interfaces etc. from it when
> suspending.

Agreed. I guess if no one else gets to it by then, I can take a look at it in
a couple of weeks after my GRE.

--
Bob Copeland %% http://www.bobcopeland.com



2008-10-07 12:19:13

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Tue, Oct 7, 2008 at 6:44 AM, Elias Oltmanns <[email protected]> wrote:
>> -ath5k_stop_hw(struct ath5k_softc *sc)
>> +ath5k_stop_hw(struct ath5k_softc *sc, bool update_status)
> ^^^^^^^^^^^^^
> That should be is_suspend for the sake of consistency.

Crap, I must've sent the wrong version. Same for the broken boolean test,
I had that fixed already. Sigh, one more on the way...

--
Bob Copeland %% http://www.bobcopeland.com

2008-10-07 20:48:36

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

Bob Copeland <[email protected]> wrote:
> On Tue, Oct 07, 2008 at 12:44:58PM +0200, Elias Oltmanns wrote:
>> Bob Copeland <[email protected]> wrote:
>> > -ath5k_stop_hw(struct ath5k_softc *sc)
>> > +ath5k_stop_hw(struct ath5k_softc *sc, bool update_status)
>
> Now with more git-add action...
>
> Subject: [PATCH] ath5k: correct hardware startup sequence in resume
>
> Based on a patch by Elias Oltmanns, we call ath5k_init in resume even
> if we didn't previously open the device. Besides starting up the
> device unnecessarily, this also causes an oops on rmmod because
> mac80211 will not invoke ath5k_stop and softirqs are left running after
> the module has been unloaded. Add a new state bit, ATH_STAT_STARTED,
> to indicate that we have been started up.
>
> Signed-off-by: Bob Copeland <[email protected]>
> ---
> drivers/net/wireless/ath5k/base.c | 28 ++++++++++++++++++++--------
> drivers/net/wireless/ath5k/base.h | 3 ++-
> 2 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
> index c151588..ae1c152 100644
> --- a/drivers/net/wireless/ath5k/base.c
> +++ b/drivers/net/wireless/ath5k/base.c
[...]
> @@ -2280,6 +2289,9 @@ ath5k_stop_hw(struct ath5k_softc *sc)
> }
> ath5k_txbuf_free(sc, sc->bbuf);
> mmiowb();
> +
> + if (!is_suspend)
> + __clear_bit(ATH_STAT_STARTED, sc->status);
> mutex_unlock(&sc->lock);

Personally, I'd prefer keeping the memory barrier right next to the
mutex_unlock(), i.e.

ath5k_txbuf_free(sc, sc->bbuf);
+ if (!is_suspend)
+ __clear_bit(ATH_STAT_STARTED, sc->status);
+
mmiowb();
mutex_unlock(&sc->lock);

even though this is purely a matter of style in this particular case.
Otherwise:

Signed-off-by: Elias Oltmanns <[email protected]>

2008-10-03 14:14:07

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Thu, 2 Oct 2008 14:37:17 -0400, Bob Copeland wrote
> On Thu, Oct 2, 2008 at 12:31 PM, Elias Oltmanns <[email protected]> wrote:
> > Sorry, but I don't think this is safe. Checking and restoring the
> > started flag has to be protected too, otherwise there can be races
> > against ->stop().
>
> Yes, it's a bit of a mess. Even if it were serialized, a ->stop()
> happening between suspend's call to hw_stop and actually powering down
> the device would clear the flag. Ho hum.

Okay, as usual I'm wrong here; it will clear the flag but we don't care
since then we just wouldn't power up on resume.

Since no one else chimed in, here's take two with more chewing gum and
baling wire to fix the suspend/stop race.

---
drivers/net/wireless/ath5k/base.c | 24 +++++++++++++++++-------
drivers/net/wireless/ath5k/base.h | 3 ++-
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index e09ed2c..7e8fa2e 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -335,7 +335,7 @@ static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah,
u32 rstamp)
/* Interrupt handling */
static int ath5k_init(struct ath5k_softc *sc);
static int ath5k_stop_locked(struct ath5k_softc *sc);
-static int ath5k_stop_hw(struct ath5k_softc *sc);
+static int ath5k_stop_hw(struct ath5k_softc *sc, bool update_status);
static irqreturn_t ath5k_intr(int irq, void *dev_id);
static void ath5k_tasklet_reset(unsigned long data);

@@ -629,7 +629,7 @@ ath5k_pci_suspend(struct pci_dev *pdev, pm_message_t state)

ath5k_led_off(sc);

- ath5k_stop_hw(sc);
+ ath5k_stop_hw(sc, false);

free_irq(pdev->irq, sc);
pci_save_state(pdev);
@@ -666,9 +666,12 @@ ath5k_pci_resume(struct pci_dev *pdev)
goto err_no_irq;
}

- err = ath5k_init(sc);
- if (err)
- goto err_irq;
+ if (test_bit(ATH_STAT_STARTED, sc->status)) {
+ err = ath5k_init(sc);
+ if (err)
+ goto err_irq;
+ }
+
ath5k_led_enable(sc);

/*
@@ -2153,6 +2156,8 @@ ath5k_init(struct ath5k_softc *sc)

mutex_lock(&sc->lock);

+ __clear_bit(ATH_STAT_STARTED, sc->status);
+
ATH5K_DBG(sc, ATH5K_DEBUG_RESET, "mode %d\n", sc->opmode);

/*
@@ -2177,6 +2182,8 @@ ath5k_init(struct ath5k_softc *sc)
if (ret)
goto done;

+ __set_bit(ATH_STAT_STARTED, sc->status);
+
/* Set ack to be sent at low bit-rates */
ath5k_hw_set_ack_bitrate_high(sc->ah, false);

@@ -2237,7 +2244,7 @@ ath5k_stop_locked(struct ath5k_softc *sc)
* stop is preempted).
*/
static int
-ath5k_stop_hw(struct ath5k_softc *sc)
+ath5k_stop_hw(struct ath5k_softc *sc, bool update_status)
{
int ret;

@@ -2269,6 +2276,9 @@ ath5k_stop_hw(struct ath5k_softc *sc)
}
ath5k_txbuf_free(sc, sc->bbuf);
mmiowb();
+
+ if (update_status)
+ __clear_bit(ATH_STAT_STARTED, sc->status);
mutex_unlock(&sc->lock);

del_timer_sync(&sc->calib_tim);
@@ -2670,7 +2680,7 @@ static int ath5k_start(struct ieee80211_hw *hw)

static void ath5k_stop(struct ieee80211_hw *hw)
{
- ath5k_stop_hw(hw->priv);
+ ath5k_stop_hw(hw->priv, true);
}

static int ath5k_add_interface(struct ieee80211_hw *hw,
diff --git a/drivers/net/wireless/ath5k/base.h b/drivers/net/wireless/ath5k/base.h
index 9d0b728..06d1054 100644
--- a/drivers/net/wireless/ath5k/base.h
+++ b/drivers/net/wireless/ath5k/base.h
@@ -128,11 +128,12 @@ struct ath5k_softc {
size_t desc_len; /* size of TX/RX descriptors */
u16 cachelsz; /* cache line size */

- DECLARE_BITMAP(status, 4);
+ DECLARE_BITMAP(status, 5);
#define ATH_STAT_INVALID 0 /* disable hardware accesses */
#define ATH_STAT_MRRETRY 1 /* multi-rate retry support */
#define ATH_STAT_PROMISC 2
#define ATH_STAT_LEDSOFT 3 /* enable LED gpio status */
+#define ATH_STAT_STARTED 4 /* opened & irqs enabled */

unsigned int filter_flags; /* HW flags, AR5K_RX_FILTER_* */
unsigned int curmode; /* current phy mode */
--
1.5.4.2.182.gb3092


--
Bob Copeland %% http://www.bobcopeland.com



2008-10-06 14:12:42

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Sun, Oct 5, 2008 at 8:45 AM, Elias Oltmanns <[email protected]> wrote:
> "Bob Copeland" <[email protected]> wrote:
> Well, I'm just not too sure about that. Please note that I'm not
> concerned about user space interfering too early in the resume process,
> which is indeed prevented by the freezer. It's rather that user space
> may initiate iface shutdown immediately before tasks are frozen and due
> to heavy load or other unfortunate circumstances the kernel doesn't get
> round to serve the request before suspend. When the system rsumes, the
> kernel completes the iface shutdown procedure by calling ath5k_init().

>From my reading of the code - kernel/power/process.c and kernel/power/main.c,
all the processes on the run queue (except PF_NOFREEZE tasks) should be
stopped before suspend_devices_and_enter is called, so even in your scenario,
it shouldn't be rescheduled until thaw_processes() in suspend_finish() is
called, at which point ->resume() has already happened.

Nonetheless, I admit not being expert in this area, nor do I know if relying
on the freezer is acceptable, so someone more clueful, please chime in.

Anyway, I don't really have any strong feelings either way, so I can post
an update that adds the reinit flag, then Jiri or Nick can take their pick :)
I probably won't get around to it until tonight, so if you want to go ahead
and prepare a patch, feel free. I think the boolean arguments to _init and
_hw_stop should be symmetric though, so it's probably worth inverting the
sense of the boolean passed to _hw_stop in my previous patch.

--
Bob Copeland %% http://www.bobcopeland.com

2008-10-02 18:37:18

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Thu, Oct 2, 2008 at 12:31 PM, Elias Oltmanns <[email protected]> wrote:
> Sorry, but I don't think this is safe. Checking and restoring the
> started flag has to be protected too, otherwise there can be races
> against ->stop().

Yes, it's a bit of a mess. Even if it were serialized, a ->stop()
happening between suspend's call to hw_stop and actually powering down
the device would clear the flag. Ho hum.

--
Bob Copeland %% http://www.bobcopeland.com

2008-10-05 12:46:21

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

"Bob Copeland" <[email protected]> wrote:
> On Fri, 03 Oct 2008 16:42:17 +0200, Elias Oltmanns wrote
>> I still feel uneasy about this. Granted, I haven't thought this through
>
>> too carefully, but I'd rather not rely on the fact that ath5k_stop_hw()
>> will not get called between the check for ATH_STAT_STARTED and the call
>> to ath5k_init if I can help it. Perhaps you can add an argument `reinit'
>> to ath5k_init() and do something like this under the mutex:
>>
>> if (reinit && !test_bit(ATH_STAT_STARTED, sc->status)) {
>> mutex_unlock(...);
>> return;
>> }
>
> Shouldn't the freezer take care of this? If userspace is suspended until
> after devices are resumed, then, in a hand-wavy argument, I don't believe
> ->stop() could be called.

Well, I'm just not too sure about that. Please note that I'm not
concerned about user space interfering too early in the resume process,
which is indeed prevented by the freezer. It's rather that user space
may initiate iface shutdown immediately before tasks are frozen and due
to heavy load or other unfortunate circumstances the kernel doesn't get
round to serve the request before suspend. When the system rsumes, the
kernel completes the iface shutdown procedure by calling ath5k_init().

>
> Adding a flag to ath5k_init to make sure we do everything in the mutex
>would
> certainly ensure that there's no issues. On the other hand, I don't want to
> cruft up the code too much since we know at some point mac80211 is going to
> take down the interfaces in its suspend/resume callbacks, at which time the
> flags can probably get tossed.

ath5k_init() is declared statically, so I don't see why adding an
argument which may well be dropped again later should be too much
trouble.

Regards,

Elias

2008-10-07 01:35:50

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Sun, Oct 05, 2008 at 02:45:54PM +0200, Elias Oltmanns wrote:
> ath5k_init() is declared statically, so I don't see why adding an
> argument which may well be dropped again later should be too much
> trouble.

Ok here's take 3, I gave it a brief bit of testing, seems to work fine.
If you agree can I get your signed-off-by, Elias? Toralf, do you want
to add your reported-by?

Subject: [PATCH] ath5k: correct hardware startup sequence in resume

Based on a patch by Elias Oltmanns, we call ath5k_init in resume even
if we didn't previously open the device. Besides starting up the
device unnecessarily, this also causes an oops on rmmod because
mac80211 will not invoke ath5k_stop and softirqs are left running after
the module has been unloaded. Add a new state bit, ATH_STAT_STARTED,
to indicate that we have been started up.

Signed-off-by: Bob Copeland <[email protected]>
---
drivers/net/wireless/ath5k/base.c | 28 ++++++++++++++++++++--------
drivers/net/wireless/ath5k/base.h | 3 ++-
2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index c151588..5388de8 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -340,9 +340,9 @@ static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
}

/* Interrupt handling */
-static int ath5k_init(struct ath5k_softc *sc);
+static int ath5k_init(struct ath5k_softc *sc, bool is_resume);
static int ath5k_stop_locked(struct ath5k_softc *sc);
-static int ath5k_stop_hw(struct ath5k_softc *sc);
+static int ath5k_stop_hw(struct ath5k_softc *sc, bool is_suspend);
static irqreturn_t ath5k_intr(int irq, void *dev_id);
static void ath5k_tasklet_reset(unsigned long data);

@@ -640,7 +640,7 @@ ath5k_pci_suspend(struct pci_dev *pdev, pm_message_t state)

ath5k_led_off(sc);

- ath5k_stop_hw(sc);
+ ath5k_stop_hw(sc, true);

free_irq(pdev->irq, sc);
pci_save_state(pdev);
@@ -677,9 +677,10 @@ ath5k_pci_resume(struct pci_dev *pdev)
goto err_no_irq;
}

- err = ath5k_init(sc);
+ err = ath5k_init(sc, true);
if (err)
goto err_irq;
+
ath5k_led_enable(sc);

/*
@@ -2158,12 +2159,17 @@ ath5k_beacon_config(struct ath5k_softc *sc)
\********************/

static int
-ath5k_init(struct ath5k_softc *sc)
+ath5k_init(struct ath5k_softc *sc, bool is_resume)
{
int ret;

mutex_lock(&sc->lock);

+ if (is_resume && !test_bit(ATH_STAT_STARTED, sc->status))
+ goto out_ok;
+
+ __clear_bit(ATH_STAT_STARTED, sc->status);
+
ATH5K_DBG(sc, ATH5K_DEBUG_RESET, "mode %d\n", sc->opmode);

/*
@@ -2188,12 +2194,15 @@ ath5k_init(struct ath5k_softc *sc)
if (ret)
goto done;

+ __set_bit(ATH_STAT_STARTED, sc->status);
+
/* Set ack to be sent at low bit-rates */
ath5k_hw_set_ack_bitrate_high(sc->ah, false);

mod_timer(&sc->calib_tim, round_jiffies(jiffies +
msecs_to_jiffies(ath5k_calinterval * 1000)));

+out_ok:
ret = 0;
done:
mmiowb();
@@ -2248,7 +2257,7 @@ ath5k_stop_locked(struct ath5k_softc *sc)
* stop is preempted).
*/
static int
-ath5k_stop_hw(struct ath5k_softc *sc)
+ath5k_stop_hw(struct ath5k_softc *sc, bool update_status)
{
int ret;

@@ -2280,6 +2289,9 @@ ath5k_stop_hw(struct ath5k_softc *sc)
}
ath5k_txbuf_free(sc, sc->bbuf);
mmiowb();
+
+ if (update_status)
+ __clear_bit(ATH_STAT_STARTED, sc->status);
mutex_unlock(&sc->lock);

del_timer_sync(&sc->calib_tim);
@@ -2676,12 +2688,12 @@ ath5k_reset_wake(struct ath5k_softc *sc)

static int ath5k_start(struct ieee80211_hw *hw)
{
- return ath5k_init(hw->priv);
+ return ath5k_init(hw->priv, false);
}

static void ath5k_stop(struct ieee80211_hw *hw)
{
- ath5k_stop_hw(hw->priv);
+ ath5k_stop_hw(hw->priv, false);
}

static int ath5k_add_interface(struct ieee80211_hw *hw,
diff --git a/drivers/net/wireless/ath5k/base.h b/drivers/net/wireless/ath5k/base.h
index 9d0b728..06d1054 100644
--- a/drivers/net/wireless/ath5k/base.h
+++ b/drivers/net/wireless/ath5k/base.h
@@ -128,11 +128,12 @@ struct ath5k_softc {
size_t desc_len; /* size of TX/RX descriptors */
u16 cachelsz; /* cache line size */

- DECLARE_BITMAP(status, 4);
+ DECLARE_BITMAP(status, 5);
#define ATH_STAT_INVALID 0 /* disable hardware accesses */
#define ATH_STAT_MRRETRY 1 /* multi-rate retry support */
#define ATH_STAT_PROMISC 2
#define ATH_STAT_LEDSOFT 3 /* enable LED gpio status */
+#define ATH_STAT_STARTED 4 /* opened & irqs enabled */

unsigned int filter_flags; /* HW flags, AR5K_RX_FILTER_* */
unsigned int curmode; /* current phy mode */
--
1.5.4.2.182.gb3092


--
Bob Copeland %% http://www.bobcopeland.com


2008-10-07 12:58:09

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Tue, Oct 07, 2008 at 12:44:58PM +0200, Elias Oltmanns wrote:
> Bob Copeland <[email protected]> wrote:
> > -ath5k_stop_hw(struct ath5k_softc *sc)
> > +ath5k_stop_hw(struct ath5k_softc *sc, bool update_status)

Now with more git-add action...

Subject: [PATCH] ath5k: correct hardware startup sequence in resume

Based on a patch by Elias Oltmanns, we call ath5k_init in resume even
if we didn't previously open the device. Besides starting up the
device unnecessarily, this also causes an oops on rmmod because
mac80211 will not invoke ath5k_stop and softirqs are left running after
the module has been unloaded. Add a new state bit, ATH_STAT_STARTED,
to indicate that we have been started up.

Signed-off-by: Bob Copeland <[email protected]>
---
drivers/net/wireless/ath5k/base.c | 28 ++++++++++++++++++++--------
drivers/net/wireless/ath5k/base.h | 3 ++-
2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index c151588..ae1c152 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -340,9 +340,9 @@ static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
}

/* Interrupt handling */
-static int ath5k_init(struct ath5k_softc *sc);
+static int ath5k_init(struct ath5k_softc *sc, bool is_resume);
static int ath5k_stop_locked(struct ath5k_softc *sc);
-static int ath5k_stop_hw(struct ath5k_softc *sc);
+static int ath5k_stop_hw(struct ath5k_softc *sc, bool is_suspend);
static irqreturn_t ath5k_intr(int irq, void *dev_id);
static void ath5k_tasklet_reset(unsigned long data);

@@ -640,7 +640,7 @@ ath5k_pci_suspend(struct pci_dev *pdev, pm_message_t state)

ath5k_led_off(sc);

- ath5k_stop_hw(sc);
+ ath5k_stop_hw(sc, true);

free_irq(pdev->irq, sc);
pci_save_state(pdev);
@@ -677,9 +677,10 @@ ath5k_pci_resume(struct pci_dev *pdev)
goto err_no_irq;
}

- err = ath5k_init(sc);
+ err = ath5k_init(sc, true);
if (err)
goto err_irq;
+
ath5k_led_enable(sc);

/*
@@ -2158,12 +2159,17 @@ ath5k_beacon_config(struct ath5k_softc *sc)
\********************/

static int
-ath5k_init(struct ath5k_softc *sc)
+ath5k_init(struct ath5k_softc *sc, bool is_resume)
{
int ret;

mutex_lock(&sc->lock);

+ if (is_resume && !test_bit(ATH_STAT_STARTED, sc->status))
+ goto out_ok;
+
+ __clear_bit(ATH_STAT_STARTED, sc->status);
+
ATH5K_DBG(sc, ATH5K_DEBUG_RESET, "mode %d\n", sc->opmode);

/*
@@ -2188,12 +2194,15 @@ ath5k_init(struct ath5k_softc *sc)
if (ret)
goto done;

+ __set_bit(ATH_STAT_STARTED, sc->status);
+
/* Set ack to be sent at low bit-rates */
ath5k_hw_set_ack_bitrate_high(sc->ah, false);

mod_timer(&sc->calib_tim, round_jiffies(jiffies +
msecs_to_jiffies(ath5k_calinterval * 1000)));

+out_ok:
ret = 0;
done:
mmiowb();
@@ -2248,7 +2257,7 @@ ath5k_stop_locked(struct ath5k_softc *sc)
* stop is preempted).
*/
static int
-ath5k_stop_hw(struct ath5k_softc *sc)
+ath5k_stop_hw(struct ath5k_softc *sc, bool is_suspend)
{
int ret;

@@ -2280,6 +2289,9 @@ ath5k_stop_hw(struct ath5k_softc *sc)
}
ath5k_txbuf_free(sc, sc->bbuf);
mmiowb();
+
+ if (!is_suspend)
+ __clear_bit(ATH_STAT_STARTED, sc->status);
mutex_unlock(&sc->lock);

del_timer_sync(&sc->calib_tim);
@@ -2676,12 +2688,12 @@ ath5k_reset_wake(struct ath5k_softc *sc)

static int ath5k_start(struct ieee80211_hw *hw)
{
- return ath5k_init(hw->priv);
+ return ath5k_init(hw->priv, false);
}

static void ath5k_stop(struct ieee80211_hw *hw)
{
- ath5k_stop_hw(hw->priv);
+ ath5k_stop_hw(hw->priv, false);
}

static int ath5k_add_interface(struct ieee80211_hw *hw,
diff --git a/drivers/net/wireless/ath5k/base.h b/drivers/net/wireless/ath5k/base.h
index 9d0b728..06d1054 100644
--- a/drivers/net/wireless/ath5k/base.h
+++ b/drivers/net/wireless/ath5k/base.h
@@ -128,11 +128,12 @@ struct ath5k_softc {
size_t desc_len; /* size of TX/RX descriptors */
u16 cachelsz; /* cache line size */

- DECLARE_BITMAP(status, 4);
+ DECLARE_BITMAP(status, 5);
#define ATH_STAT_INVALID 0 /* disable hardware accesses */
#define ATH_STAT_MRRETRY 1 /* multi-rate retry support */
#define ATH_STAT_PROMISC 2
#define ATH_STAT_LEDSOFT 3 /* enable LED gpio status */
+#define ATH_STAT_STARTED 4 /* opened & irqs enabled */

unsigned int filter_flags; /* HW flags, AR5K_RX_FILTER_* */
unsigned int curmode; /* current phy mode */
--
1.5.4.2.182.gb3092

--
Bob Copeland %% http://www.bobcopeland.com


2008-10-03 14:42:34

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

"Bob Copeland" <[email protected]> wrote:
> On Thu, 2 Oct 2008 14:37:17 -0400, Bob Copeland wrote
>> On Thu, Oct 2, 2008 at 12:31 PM, Elias Oltmanns <[email protected]> wrote:
>
>> > Sorry, but I don't think this is safe. Checking and restoring the
>> > started flag has to be protected too, otherwise there can be races
>> > against ->stop().
>>
>> Yes, it's a bit of a mess. Even if it were serialized, a ->stop()
>> happening between suspend's call to hw_stop and actually powering down
>> the device would clear the flag. Ho hum.
>
> Okay, as usual I'm wrong here; it will clear the flag but we don't care
> since then we just wouldn't power up on resume.
>
> Since no one else chimed in, here's take two with more chewing gum and
> baling wire to fix the suspend/stop race.
>
> ---
> drivers/net/wireless/ath5k/base.c | 24 +++++++++++++++++-------
> drivers/net/wireless/ath5k/base.h | 3 ++-
> 2 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
> index e09ed2c..7e8fa2e 100644
> --- a/drivers/net/wireless/ath5k/base.c
> +++ b/drivers/net/wireless/ath5k/base.c
[...]
> @@ -666,9 +666,12 @@ ath5k_pci_resume(struct pci_dev *pdev)
> goto err_no_irq;
> }
>
> - err = ath5k_init(sc);
> - if (err)
> - goto err_irq;
> + if (test_bit(ATH_STAT_STARTED, sc->status)) {
> + err = ath5k_init(sc);

I still feel uneasy about this. Granted, I haven't thought this through
too carefully, but I'd rather not rely on the fact that ath5k_stop_hw()
will not get called between the check for ATH_STAT_STARTED and the call
to ath5k_init if I can help it. Perhaps you can add an argument `reinit'
to ath5k_init() and do something like this under the mutex:

if (reinit && !test_bit(ATH_STAT_STARTED, sc->status)) {
mutex_unlock(...);
return;
}

Regards,

Elias

2008-10-06 14:23:54

by Johannes Berg

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Mon, 2008-10-06 at 10:12 -0400, Bob Copeland wrote:

> Nonetheless, I admit not being expert in this area, nor do I know if relying
> on the freezer is acceptable, so someone more clueful, please chime in.

It's not, it's supposed to be removed.


However, I still think ath5k shouldn't be much concerned with all this
because mac80211 should just disable all interfaces etc. from it when
suspending.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-10-09 10:40:54

by Johannes Berg

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Mon, 2008-10-06 at 10:36 -0400, Bob Copeland wrote:

> > However, I still think ath5k shouldn't be much concerned with all this
> > because mac80211 should just disable all interfaces etc. from it when
> > suspending.
>
> Agreed. I guess if no one else gets to it by then, I can take a look at it in
> a couple of weeks after my GRE.

Cool, see also
http://wireless.kernel.org/en/developers/todo-list#suspend.2BAC8-resumemanagement

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-10-02 12:52:59

by Bob Copeland

[permalink] [raw]
Subject: Re: [ath5k-devel] Oops with current kernel and ath5k

On Thu, Oct 2, 2008 at 3:53 AM, Elias Oltmanns <[email protected]> wrote:
>> Ah, yes, good analysis... and ath5k_init is scheduling the timer. I
>> doubt ieee80211_opened would fly; probably the better thing to do is
>> ensure that the cleanup gets called regardless of whether ath5k_stop
>> gets called (it shouldn't matter since the irqs all get created in
>> _attach).
>
> Not sure I agree there. Why should calibration take place regularly even
> though the interface appears to be shut down from user-space's point of
> view? It simply doesn't make sense to start the interface if nobody
> intends to use it.

Hmm, yeah, you're right. So, I guess we can just do your patch except
with another status bit that says it's active instead of changing
mac80211 (b43 does something along those lines).

--
Bob Copeland %% http://www.bobcopeland.com