2013-07-15 19:55:40

by Ortwin Glück

Subject: [BUG] 3.10 regression: hang on suspend

Hi,

Since this commit (bisected), my Samsung ultrabook hangs when suspending to RAM.
Disabling wifi before suspend works around the issue. Everything works fine with 3.9.y.

12e7f517029dad819c45eca9ca01fdb9ba57616b

Author: Stanislaw Gruszka <[email protected]>

Date: Thu Feb 28 10:55:26 2013 +0100

mac80211: cleanup generic suspend/resume procedures

Since now we disconnect before suspend, various code which save
connection state can now be removed from suspend and resume
procedure. Cleanup on resume side is smaller as ieee80211_reconfig()
is also used for H/W restart.

Signed-off-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>


Hardware:
01:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)

Sorry for not noticing this earlier.

Ortwin


2013-07-16 06:53:44

by Stanislaw Gruszka

Subject: Re: [BUG] 3.10 regression: hang on suspend

Hi

On Mon, Jul 15, 2013 at 09:40:13PM +0200, Ortwin Glück wrote:
> Since this commit (bisected), my Samsung ultrabook hangs when
> suspending to RAM. Disabling wifi before suspend works around the issue.
> Everything works fine with 3.9.y.
>
> 12e7f517029dad819c45eca9ca01fdb9ba57616b
>
> Author: Stanislaw Gruszka <[email protected]>
>
> Date: Thu Feb 28 10:55:26 2013 +0100
>
> mac80211: cleanup generic suspend/resume procedures
>
> Since now we disconnect before suspend, various code which save
> connection state can now be removed from suspend and resume
> procedure. Cleanup on resume side is smaller as ieee80211_reconfig()
> is also used for H/W restart.
>
> Signed-off-by: Stanislaw Gruszka <[email protected]>
> Signed-off-by: Johannes Berg <[email protected]>

Apparently this commit changed the mac80211 suspend procedure, but it's
not obvious to me why it hangs :-(

What is your user space configuration (are you using NM or other
software, or maybe just wpa_supplicant)? Are you using wowlan?
If you add the no_console_suspend boot parameter, does it print any
diagnostic messages during suspend before the hang?

Thanks
Stanislaw

2013-07-16 07:34:54

by Ortwin Glück

Subject: Re: [BUG] 3.10 regression: hang on suspend



On 16.07.2013 08:56, Stanislaw Gruszka wrote:
> Apparently this commit changed the mac80211 suspend procedure, but it's
> not obvious to me why it hangs :-(

Hangs are hard :-) It just sits there with a black screen and a white
cursor in the top left corner...

> What is your user space configuration (are you using NM or other
> software, or maybe just wpa_supplicant)? Are you using wowlan?
> If you add the no_console_suspend boot parameter, does it print any
> diagnostic messages during suspend before the hang?

Yes, I am using NM under KDE, with KDE-triggered suspend. No wowlan
AFAIK. The last thing I see in the log is something from NetworkManager
that sees the device switching off. I can try again tonight and give you
the exact message.

I will also try without NM and bare wpa_supplicant and a plain suspend
through sysfs.

Any debug options you want me to enable? Netconsole won't work however...

Ortwin

2013-07-16 09:49:22

by Arend van Spriel

Subject: Re: [BUG] 3.10 regression: hang on suspend

On 07/16/2013 09:34 AM, Ortwin Glück wrote:
>
>
> On 16.07.2013 08:56, Stanislaw Gruszka wrote:
>> Apparently this commit changed the mac80211 suspend procedure, but it's
>> not obvious to me why it hangs :-(
>
> Hangs are hard :-) It just sits there with a black screen and a white
> cursor in the top left corner...
>
>> What is your user space configuration (are you using NM or other
>> software, or maybe just wpa_supplicant)? Are you using wowlan?
>> If you add the no_console_suspend boot parameter, does it print any
>> diagnostic messages during suspend before the hang?
>
> Yes, I am using NM under KDE, with KDE-triggered suspend. No wowlan
> AFAIK. The last thing I see in the log is something from NetworkManager
> that sees the device switching off. I can try again tonight and give you
> the exact message.
>
> I will also try without NM and bare wpa_supplicant and a plain suspend
> through sysfs.
>
> Any debug options you want me to enable? Netconsole won't work however...

Can you get more debug info if you try the following (as root, e.g. via sudo -i):

echo devices > /sys/power/pm_test
echo mem > /sys/power/state

Regards,
Arend

> Ortwin

2013-07-16 18:23:39

by Ortwin Glück

Subject: Re: [BUG] 3.10 regression: hang on suspend

Without NetworkManager or X, on the console and with plain-jane wpa_supplicant, I did:
echo mem > /sys/power/state

After that, it still responds to keyboard events: I can switch VTs and type on
the consoles, but I cannot log in on a different VT (pressing Enter after the
username doesn't return). So I guess tasks have been frozen, but it hangs while
stopping devices.

Settings for pm_test:
devices: same behaviour as above
freezer: works as expected (stops tasks; sleeps; resumes tasks)

Nothing in the logs. How can I enable more logging?

Ortwin

2013-07-17 12:27:07

by Stanislaw Gruszka

Subject: Re: [BUG] 3.10 regression: hang on suspend

On Tue, Jul 16, 2013 at 08:23:33PM +0200, Ortwin Glück wrote:
> Without NetworkManager or X, on the console and with plain-jane wpa_supplicant, I did:
> echo mem > /sys/power/state
>
> After that, it still responds to keyboard events: I can switch VTs
> and type on the consoles, but I cannot log in on a different VT
> (pressing Enter after the username doesn't return). So I guess tasks
> have been frozen, but it hangs while stopping devices.
>
> Settings for pm_test:
> devices: same behaviour as above
> freezer: works as expected (stops tasks; sleeps; resumes tasks)
>
> Nothing in the logs. How can I enable more logging?

This looks like a livelock, i.e. the kernel waits for a mutex to be released
but that never happens, for example because of a deadlock. Enabling
CONFIG_LOCKDEP=y could help to diagnose this. Other than that, the
SysRq key w (show blocked tasks) could help; see
Documentation/sysrq.txt for instructions on how to use this key.
Also, please remember to set the no_console_suspend boot option to
allow messages to be printed on the tty during suspend.

Stanislaw

2013-07-18 10:53:08

by Stanislaw Gruszka

Subject: Re: [BUG] 3.10 regression: hang on suspend

On Wed, Jul 17, 2013 at 08:25:47PM +0200, Ortwin Glück wrote:
> On 07/17/2013 02:29 PM, Stanislaw Gruszka wrote:
> >This looks like livelock
>
> OK got it. Just a GPF in the suspend path, not a livelock
> fortunately. With no_console_suspend the trace appears. Attached is
> a screenshot, transcription for your convenience is here:
>
> NULL pointer deref at iwlagn_mac_remove_interface+0x43/0x120

Unfortunately I still don't know why this happens. Please run:

objdump -r -d --prefix-addresses net/mac80211/mac80211.ko > mac80211.txt
objdump -r -d --prefix-addresses drivers/net/wireless/iwlwifi/dvm/iwldvm.ko > iwldvm.txt

and send me mac80211.txt & iwldvm.txt files.

Thanks
Stanislaw

2013-07-19 12:06:01

by Stanislaw Gruszka

Subject: Re: [BUG] 3.10 regression: hang on suspend

On Thu, Jul 18, 2013 at 08:57:59PM +0200, Ortwin Glück wrote:
> On 07/18/2013 12:55 PM, Stanislaw Gruszka wrote:
> >objdump -r -d --prefix-addresses net/mac80211/mac80211.ko > mac80211.txt
> >objdump -r -d --prefix-addresses drivers/net/wireless/iwlwifi/dvm/iwldvm.ko > iwldvm.txt
>
> Here you go. Please note that the offsets have changed due to
> .config changes. New trace attached. Transcript:
>
> iwlagn_mac_remove_interface+0x5f/0x160
> __ieee80211_suspend+0x4a1/0xb30
> ieee80211_suspend+0x1a/0x20

The crash happens because we call iwlagn_mac_remove_interface() with
vif_priv->ctx == NULL. I do not see how that could happen other than by
calling ->remove_interface without a previous ->add_interface, but I do
not see how that is possible.

Does the crash happen on every suspend, or only on the second one?

I'm attaching a patch which should prevent the crash (it does not fix the
issue, just works around it); please apply it. If it makes suspend work,
please then reload the iwlwifi module with the debug=0x3 option,
suspend/resume the machine and provide the dmesg output after that.
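For readers following along, here is a minimal userspace sketch of the failure mode described above. It is a hypothetical model, not the contents of iwlwifi_check_ctx_null.patch and not the real iwlwifi structures: all model_* names are invented for illustration. The per-interface ctx pointer is bound only by the add callback, so a remove call for a vif that never went through add sees ctx == NULL; the guard turns that NULL dereference into a warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for driver state; NOT the real iwlwifi or
 * mac80211 structures. */
struct model_ctx {
	int id;
};

struct model_vif_priv {
	struct model_ctx *ctx; /* bound only by model_add_interface() */
};

static struct model_ctx model_bss_ctx = { .id = 0 };

/* Models ->add_interface: the driver binds the vif to a context. */
static void model_add_interface(struct model_vif_priv *priv)
{
	assert(priv->ctx == NULL); /* a vif should be added only once */
	priv->ctx = &model_bss_ctx;
}

/* Models ->remove_interface with a NULL guard. Without the check,
 * priv->ctx->id below would dereference NULL for a vif that never
 * went through model_add_interface(). Returns -1 if rejected. */
static int model_remove_interface(struct model_vif_priv *priv)
{
	if (priv->ctx == NULL) {
		fprintf(stderr, "WARNING: remove_interface without add_interface\n");
		return -1;
	}
	printf("removing vif from ctx %d\n", priv->ctx->id);
	priv->ctx = NULL;
	return 0;
}
```

Such a guard only hides the symptom; the open question at this point in the thread is why ->remove_interface is reached for a vif the driver never added.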

Thanks
Stanislaw

2013-07-19 12:08:14

by Stanislaw Gruszka

Subject: Re: [BUG] 3.10 regression: hang on suspend

On Fri, Jul 19, 2013 at 02:08:42PM +0200, Stanislaw Gruszka wrote:
> On Thu, Jul 18, 2013 at 08:57:59PM +0200, Ortwin Glück wrote:
> > On 07/18/2013 12:55 PM, Stanislaw Gruszka wrote:
> > >objdump -r -d --prefix-addresses net/mac80211/mac80211.ko > mac80211.txt
> > >objdump -r -d --prefix-addresses drivers/net/wireless/iwlwifi/dvm/iwldvm.ko > iwldvm.txt
> >
> > Here you go. Please note that the offsets have changed due to
> > .config changes. New trace attached. Transcript:
> >
> > iwlagn_mac_remove_interface+0x5f/0x160
> > __ieee80211_suspend+0x4a1/0xb30
> > ieee80211_suspend+0x1a/0x20
>
> The crash happens because we call iwlagn_mac_remove_interface() with
> vif_priv->ctx == NULL. I do not see how that could happen other than by
> calling ->remove_interface without a previous ->add_interface, but I do
> not see how that is possible.
>
> Does the crash happen on every suspend, or only on the second one?
>
> I'm attaching a patch which should prevent the crash (it does not fix the
> issue, just works around it); please apply it. If it makes suspend work,
> please then reload the iwlwifi module with the debug=0x3 option,
> suspend/resume the machine and provide the dmesg output after that.

This time really attaching the forgotten patch.

Stanislaw


Attachments:
(No filename) (1.18 kB)
iwlwifi_check_ctx_null.patch (501.00 B)

2013-07-19 12:15:48

by Ortwin Glück

Subject: Re: [BUG] 3.10 regression: hang on suspend



On 19.07.2013 14:08, Stanislaw Gruszka wrote:
> Does the crash happen on every suspend, or only on the second one?

The crash always happens on the first suspend.

Thanks for the patch, I will send results tonight.

Ortwin

2013-07-19 18:47:07

by Ortwin Glück

Subject: Re: [BUG] 3.10 regression: hang on suspend

On 07/19/2013 02:08 PM, Stanislaw Gruszka wrote:
> I'm attaching a patch which should prevent the crash (it does not fix the
> issue, just works around it); please apply it. If it makes suspend work,
> please then reload the iwlwifi module with the debug=0x3 option,
> suspend/resume the machine and provide the dmesg output after that.


Here is the requested dmesg. Please note that two different WARN_ONs trigger here,
one directly after the other.

To me it looks like iwlagn_mac_remove_interface() is called twice, but I am not
familiar with the code.

Thanks,

Ortwin


Attachments:
dmesg.log (164.46 kB)

2013-07-22 11:19:44

by Stanislaw Gruszka

Subject: Re: [BUG] 3.10 regression: hang on suspend

On Fri, Jul 19, 2013 at 08:46:54PM +0200, Ortwin Glück wrote:
> On 07/19/2013 02:08 PM, Stanislaw Gruszka wrote:
> >I'm attaching a patch which should prevent the crash (it does not fix the
> >issue, just works around it); please apply it. If it makes suspend work,
> >please then reload the iwlwifi module with the debug=0x3 option,
> >suspend/resume the machine and provide the dmesg output after that.
>
>
> Here is the requested dmesg. Please note that two different WARN_ONs
> trigger here, one directly after the other.
>
> To me it looks like iwlagn_mac_remove_interface() is called twice,
> but I am not familiar with the code.

We remove an interface that we never added in the driver. I think I found
the reason for that: I removed the code below in the bad commit:

list_for_each_entry(sdata, &local->interfaces, list) {
[snip]
- switch (sdata->vif.type) {
- case NL80211_IFTYPE_AP_VLAN:
- case NL80211_IFTYPE_MONITOR:
- /* skip these */
- continue;

The attached patch should correct that. Please test whether it fixes the
crash.
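To make the reasoning concrete, the sketch below models the per-interface suspend loop in plain C. It assumes, as the thread states, that AP_VLAN and monitor interfaces are never added to the driver, so they must also be skipped on removal. The model_* names and the added_to_driver flag are hypothetical; this is neither the actual mac80211 code nor the contents of mac80211_fix_suspend_crash.patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of interface types; the real values live in
 * enum nl80211_iftype (NL80211_IFTYPE_*). */
enum model_iftype {
	MODEL_IFTYPE_STATION,
	MODEL_IFTYPE_AP_VLAN,
	MODEL_IFTYPE_MONITOR,
};

/* Hypothetical per-interface record, standing in for sdata. */
struct model_sdata {
	const char *name;
	enum model_iftype type;
	int added_to_driver; /* 1 if the driver saw ->add_interface */
};

/* Models the driver's ->remove_interface: returns -1 for a vif the
 * driver never added (the case that crashed iwldvm), 0 otherwise. */
static int model_drv_remove_interface(struct model_sdata *sdata)
{
	if (!sdata->added_to_driver)
		return -1;
	sdata->added_to_driver = 0;
	return 0;
}

/* Models the per-interface loop on suspend, including the skip that
 * the bad commit dropped. Returns the number of interfaces removed,
 * or -1 if a remove was issued for a never-added interface. */
static int model_suspend(struct model_sdata *ifaces, size_t n)
{
	int removed = 0;

	assert(ifaces != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (ifaces[i].type) {
		case MODEL_IFTYPE_AP_VLAN:
		case MODEL_IFTYPE_MONITOR:
			/* skip these: assumed never added to the driver */
			continue; /* continues the for loop, not the switch */
		default:
			break;
		}
		if (model_drv_remove_interface(&ifaces[i]) < 0)
			return -1;
		removed++;
	}
	return removed;
}
```

With a station vif plus a monitor vif, deleting the two case labels makes model_suspend() return -1, which mirrors the report that the crash disappears once the extra monitor interface is removed before suspend.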

Thanks
Stanislaw


Attachments:
(No filename) (1.07 kB)
mac80211_fix_suspend_crash.patch (663.00 B)

2013-07-22 19:05:46

by Ortwin Glück

Subject: Re: [BUG] 3.10 regression: hang on suspend

On 07/22/2013 01:22 PM, Stanislaw Gruszka wrote:
> We remove an interface that we never added in the driver. I think I found
> the reason for that: I removed the code below in the bad commit:
>
> list_for_each_entry(sdata, &local->interfaces, list) {
> [snip]
> - switch (sdata->vif.type) {
> - case NL80211_IFTYPE_AP_VLAN:
> - case NL80211_IFTYPE_MONITOR:
> - /* skip these */
> - continue;

Oh yes, that makes a lot of sense! I have an extra monitoring interface
configured. If I remove that before suspend, the crash does not occur.

And your patch does fix the problem. Very nice!

Thank you,

Ortwin