2008-10-27 17:28:20

by Rafael J. Wysocki

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

[Added some CCs]

On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> Hi,
>
> So I managed to bisect my suspend to RAM regression in 2.6.28-rc2
> to commit 3b7ee69d0caefbdb85a606a98bff841b8c63b97e ("mac80211: disassociate
> when moving to new BSS") by Tomas Winkler (Cc:-ed).
>
> Unfortunately it doesn't revert cleanly so I can't double check it.
>
> My laptop is a Vaio FZ240E Core2Duo@2GHz and I use the iwlagn module
> for the wireless connection.
>
> The symptom is that suspending to RAM from inside X with 'echo mem >
> /sys/power/state' leaves me a black screen when resuming back.
>
> I can test any patches and provide more information of my laptop too.

Carlos, thanks a lot for bisecting this.

Johannes, can you pls have a look?

Thanks,
Rafael


2008-10-27 23:12:31

by Tomas Winkler

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Tue, Oct 28, 2008 at 12:50 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Monday, 27 of October 2008, Tomas Winkler wrote:
>> On Tue, Oct 28, 2008 at 12:28 AM, Carlos R. Mafra <[email protected]> wrote:
>> > On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
>> >>[...]
>> >>
>> >> Can someone try this one (it might be space broken I've just pasted
>> >> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
>> >> (latest linux-2.6.git)
>> >
>> >
>> > Yes, this one also works for me (I applied it manually).
>> Good, this was actually part of some older and bigger patch; I wasn't
>> aware it has this effect. It has some mileage in the iwl5000 branch.
>> RFKill went through some changes in the mainline so it wasn't merged yet.
>
> Are you going to push this patch upstream now? It's quite important to get
> this resume regression fixed ASAP.

Definitely, I will post it to wireless and stable. I would like to
hear from Harvey, though, whether this is somehow related to b43 as well.
I will give it some more testing tomorrow morning; I don't have any HW
right now and need to suspend to bed :)

Again, sorry for the trouble.
Tomas

2008-10-27 22:04:54

by Carlos Mafra

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon 27.Oct'08 at 22:07:42 +0100, Rafael J. Wysocki wrote:
> On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> > On Mon 27.Oct'08 at 21:51:00 +0100, Rafael J. Wysocki wrote:
> > > On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> > > > On Mon 27.Oct'08 at 20:13:43 +0100, Johannes Berg wrote:
> > > > > On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:
> > > > >
> > > > > > > Do you get any kernel messages output? If you do, could you put messages
> > > > > > > into each line of ieee80211_set_disassoc to see where it hangs?
> > > > > >
> > > > > > No messages appear, just a black screen.
> > > > > >
> > > > > > > But I can use the SysRq keys, and when I umount, the
> > > > > > > screen shows the message that umount succeeded. I also tried SysRq+t but
> > > > > > > the messages appear too fast to read.
> > > > >
> > > > > Ok, but that means you _can_ get messages, it would help a lot if you
> > > > > could put a few printks into the set_disassoc function before/after each
> > > > > other function call, so we know where exactly it hangs. Pretty much all
> > > > > of them could possibly hang if there is some sort of locking error
> > > > > happening or anything relies on userspace to be running...
> > > >
> > > > Ok, I humbly tried to do that with the patch at the end of the email,
> > > > but it did not appear to hang in this function, though.
> > > >
> > > > Somehow I could get some messages printed when it was a black screen
> > > > before (I think it has to do with the debug level I set with SysRq...)
> > > > and I could see all the printks I've put there.
> > > >
> > > > The good thing is that I could get the complete syslog from boot until
> > > > it failed after suspending to RAM (in 2.6.28-rc2 with my debug patch
> > > > below applied). The last messages before the laptop became unresponsive
> > > > (except for the SysRq) were these ones:
> > > >
> > > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:radio
> > > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:assoc
> > > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:RX
> > > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:TX
> > > > Oct 27 21:03:06 localhost kernel: before rcu_read_lock
> > > > Oct 27 21:03:06 localhost kernel: before netif_tx_stop_all_queues
> > > > Oct 27 21:03:06 localhost kernel: before netif_carrier_off
> > > > Oct 27 21:03:06 localhost kernel: before ieee80211_sta
> > > > Oct 27 21:03:06 localhost kernel: inside sef_disconnected 1
> > > > Oct 27 21:03:06 localhost kernel: before ieee8021_led_assoc
> > > > Oct 27 21:03:06 localhost kernel: before ieee8021_sta_send_apinfo
> > > > Oct 27 21:03:06 localhost kernel: before sta_info_unlink
> > > > Oct 27 21:03:06 localhost kernel: before rcu_read_unlock
> > > > Oct 27 21:03:06 localhost kernel: before sta_info_destroy
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Starting disk
> > > > Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> > > > Oct 27 21:03:06 localhost kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
> > > > Oct 27 21:03:06 localhost kernel: ata1: irq_stat 0x00000040, connection status changed
> > > > Oct 27 21:03:06 localhost kernel: ata1: hard resetting link
> > > > Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > > Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> > > > Oct 27 21:03:06 localhost kernel: ata1: EH complete
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> > > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > Oct 27 21:03:06 localhost kernel: Restarting tasks ... done.
> > > > Oct 27 21:03:07 localhost ifplugd(wlan0)[3182]: Link beat lost.
> > > > Oct 27 21:03:13 localhost ifplugd(wlan0)[3182]: Executing '/etc/ifplugd/ifplugd.action wlan0 down'.
> > > > Oct 27 21:06:20 localhost kernel: SysRq : Changing Loglevel
> > > > Oct 27 21:06:20 localhost kernel: Loglevel set to 4
> > > > Oct 27 21:06:22 localhost kernel: SysRq : Changing Loglevel
> > > > Oct 27 21:06:22 localhost kernel: Loglevel set to 6
> > > > Oct 27 21:06:32 localhost kernel: ffff80251670>] ? autoremove_wake_function+0x0/0x40
> > > > Oct 27 21:06:32 localhost kernel: [<ffffffff8024e200>] ? worker_thread+0x0/0x110
> > > > Oct 27 21:06:32 localhost kernel: [<ffffffff8025119d>] kthread+0x4d/0x80
> > > > Oct 27 21:06:32 localhost kernel: [<ffffffff8020d1b9>] child_rip+0xa/0x11
> > > >
> > > > and I have the complete trace also. I can try to put it somewhere in the web if it helps
> > > > (I already tried it, but I am new at the institute here and I could not set up my
> > > > webpage yet :-(
> > >
> > > Please attach it to http://bugzilla.kernel.org/show_bug.cgi?id=11845 .
> >
> > Done.
>
> Hmm, this looks like a result of playing with the magic SysRq. Did you try to
> press SysRq-something when the box became unresponsive?

Yes, that long trace is the result of SysRq+t but I never managed to
understand its result :-(

This last hang was somewhat different because I could read the messages
up to the point where the laptop was unresponsive (see the time in the
log, I waited 3 minutes until I decided to hit SysRq+t). I am not
sure why this happened, though.

All the other hangs were with a complete black screen. _But_ if I used
some SysRq combination during the time the screen was black
I could see the result on the screen. I know this because when the resume
failed I always used to sync+umount+reboot using the SysRq, and those messages
from the SysRq I could read ("Emergency Sync", "Emergency umount" or something
like that).

Ok, now I see I have another patch to test from Tomas...will try that.

2008-10-28 16:12:12

by Tomas Winkler

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Tue, Oct 28, 2008 at 5:30 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, 28 of October 2008, Tomas Winkler wrote:
>> On Tue, Oct 28, 2008 at 12:50 AM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Monday, 27 of October 2008, Tomas Winkler wrote:
>> >> On Tue, Oct 28, 2008 at 12:28 AM, Carlos R. Mafra <[email protected]> wrote:
>> >> > On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
>> >> >>[...]
>> >> >>
>> >> >> Can someone try this one (it might be space broken I've just pasted
>> >> >> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
>> >> >> (latest linux-2.6.git)
>> >> >
>> >> >
>> >> > Yes, this one also works for me (I applied it manually).
>> >> Good, this was actually part of some older and bigger patch; I wasn't
>> >> aware it has this effect. It has some mileage in the iwl5000 branch.
>> >> RFKill went through some changes in the mainline so it wasn't merged yet.
>> >
>> > Are you going to push this patch upstream now? It's quite important to get
>> > this resume regression fixed ASAP.
>>
>> Definitely I would post it to wireless and stable.
>
> Please send a CC to me too.
Hmm. It looks like I misspelled your email address when I sent the patch.
I will send it again.
Tomas

2008-10-27 18:44:57

by Johannes Berg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
>
> > > Johannes, can you pls have a look?
>
> I did, and I have no idea. Makes no sense at all.

The only thing I can remotely think of is that iwlwifi doesn't like
being called back from within the call that it did to mac80211, which
obviously happens here. But I have no idea, the code as it stands is
correct, just the interaction with iwlwifi's resume seems to be broken.

Try this patch instead:

--- everything.orig/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:12.000000000 +0100
+++ everything/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:15.000000000 +0100
@@ -2084,7 +2084,6 @@ static void iwl_alive_start(struct iwl_p
 	iwl4965_error_recovery(priv);

 	iwl_power_update_mode(priv, 1);
-	ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);

 	if (test_and_clear_bit(STATUS_MODE_PENDING, &priv->status))
 		iwl4965_set_mode(priv, priv->iw_mode);


johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-10-27 19:22:04

by Jens Axboe

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, Oct 27 2008, Johannes Berg wrote:
> On Mon, 2008-10-27 at 20:13 +0100, Jens Axboe wrote:
>
> > If you want something else tested, just let me know. I don't care a
> > whole lot about how it gets fixed, as long as it does :-). I use STR
> > heavily, so it's quite a burden to have it broken.
>
> I don't really know. Like I said to Carlos, figuring out which of the
> functions called from set_disassoc hangs would be useful, but you'd
> probably want to coordinate with him so you don't both do it. Or if you
> both do, it'd confirm it twice ;)

OK, I can try to do that tomorrow when I'm at the desktop again; I have
the dock there with a serial console.

--
Jens Axboe


2008-10-27 19:17:11

by Johannes Berg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 20:13 +0100, Jens Axboe wrote:

> If you want something else tested, just let me know. I don't care a
> whole lot about how it gets fixed, as long as it does :-). I use STR
> heavily, so it's quite a burden to have it broken.

I don't really know. Like I said to Carlos, figuring out which of the
functions called from set_disassoc hangs would be useful, but you'd
probably want to coordinate with him so you don't both do it. Or if you
both do, it'd confirm it twice ;)

johannes



2008-10-27 21:07:02

by Tomas Winkler

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, Oct 27, 2008 at 10:51 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Monday, 27 of October 2008, Carlos R. Mafra wrote:
>> On Mon 27.Oct'08 at 20:13:43 +0100, Johannes Berg wrote:
>> > On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:
>> >
>> > > > Do you get any kernel messages output? If you do, could you put messages
>> > > > into each line of ieee80211_set_disassoc to see where it hangs?
>> > >
>> > > No messages appear, just a black screen.
>> > >
>> > > But I can use the SysRq keys, and when I umount, the
>> > > screen shows the message that umount succeeded. I also tried SysRq+t but
>> > > the messages appear too fast to read.
>> >
>> > Ok, but that means you _can_ get messages, it would help a lot if you
>> > could put a few printks into the set_disassoc function before/after each
>> > other function call, so we know where exactly it hangs. Pretty much all
>> > of them could possibly hang if there is some sort of locking error
>> > happening or anything relies on userspace to be running...
>>
>> Ok, I humbly tried to do that with the patch at the end of the email,
>> but it did not appear to hang in this function, though.
>>
>> Somehow I could get some messages printed when it was a black screen
>> before (I think it has to do with the debug level I set with SysRq...)
>> and I could see all the printks I've put there.
>>
>> The good thing is that I could get the complete syslog from boot until
>> it failed after suspending to RAM (in 2.6.28-rc2 with my debug patch
>> below applied). The last messages before the laptop became unresponsive
>> (except for the SysRq) were these ones:
>>
>> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:radio
>> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:assoc
>> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:RX
>> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:TX
>> Oct 27 21:03:06 localhost kernel: before rcu_read_lock
>> Oct 27 21:03:06 localhost kernel: before netif_tx_stop_all_queues
>> Oct 27 21:03:06 localhost kernel: before netif_carrier_off
>> Oct 27 21:03:06 localhost kernel: before ieee80211_sta
>> Oct 27 21:03:06 localhost kernel: inside sef_disconnected 1
>> Oct 27 21:03:06 localhost kernel: before ieee8021_led_assoc
>> Oct 27 21:03:06 localhost kernel: before ieee8021_sta_send_apinfo
>> Oct 27 21:03:06 localhost kernel: before sta_info_unlink
>> Oct 27 21:03:06 localhost kernel: before rcu_read_unlock
>> Oct 27 21:03:06 localhost kernel: before sta_info_destroy
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Starting disk
>> Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
>> Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
>> Oct 27 21:03:06 localhost kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
>> Oct 27 21:03:06 localhost kernel: ata1: irq_stat 0x00000040, connection status changed
>> Oct 27 21:03:06 localhost kernel: ata1: hard resetting link
>> Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
>> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
>> Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
>> Oct 27 21:03:06 localhost kernel: ata1: EH complete
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> Oct 27 21:03:06 localhost kernel: Restarting tasks ... done.
>> Oct 27 21:03:07 localhost ifplugd(wlan0)[3182]: Link beat lost.
>> Oct 27 21:03:13 localhost ifplugd(wlan0)[3182]: Executing '/etc/ifplugd/ifplugd.action wlan0 down'.
>> Oct 27 21:06:20 localhost kernel: SysRq : Changing Loglevel
>> Oct 27 21:06:20 localhost kernel: Loglevel set to 4
>> Oct 27 21:06:22 localhost kernel: SysRq : Changing Loglevel
>> Oct 27 21:06:22 localhost kernel: Loglevel set to 6
>> Oct 27 21:06:32 localhost kernel: ffff80251670>] ? autoremove_wake_function+0x0/0x40
>> Oct 27 21:06:32 localhost kernel: [<ffffffff8024e200>] ? worker_thread+0x0/0x110
>> Oct 27 21:06:32 localhost kernel: [<ffffffff8025119d>] kthread+0x4d/0x80
>> Oct 27 21:06:32 localhost kernel: [<ffffffff8020d1b9>] child_rip+0xa/0x11
>>
>> and I have the complete trace also. I can try to put it somewhere in the web if it helps
>> (I already tried it, but I am new at the institute here and I could not set up my
>> webpage yet :-(
>
> Please attach it to http://bugzilla.kernel.org/show_bug.cgi?id=11845 .

Can someone try this one (it might be space broken I've just pasted
that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
(latest linux-2.6.git)

diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
index 24a1aeb..321dbc8 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
@@ -2090,7 +2090,6 @@ static void iwl_alive_start(struct iwl_priv *priv)
 	iwl4965_error_recovery(priv);

 	iwl_power_update_mode(priv, 1);
-	ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);

 	if (test_and_clear_bit(STATUS_MODE_PENDING, &priv->status))
 		iwl4965_set_mode(priv, priv->iw_mode);
@@ -2342,6 +2341,7 @@ static void iwl_bg_alive_start(struct work_struct *data)
 	mutex_lock(&priv->mutex);
 	iwl_alive_start(priv);
 	mutex_unlock(&priv->mutex);
+	ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
 }

 static void iwl4965_bg_rf_kill(struct work_struct *work)


Thanks
Tomas
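
One possible reading of why moving the call helps: if ieee80211_notify_mac() ends up calling back into the driver, and that callback needs priv->mutex, then a notification issued while the mutex is held can never complete. The thread only establishes that moving the call fixes resume, so the failure path shown here is an assumption. The user-space sketch below uses hypothetical names throughout (toy_mutex, fake_priv, fake_notify, fake_driver_callback) and a toy lock that returns -EDEADLK where a real non-recursive mutex would block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock (hypothetical, single-threaded illustration only). */
struct toy_mutex {
    bool held;
};

/* Returns -EDEADLK on a second acquire; a real mutex would hang here. */
static int toy_lock(struct toy_mutex *m)
{
    if (m->held)
        return -EDEADLK;
    m->held = true;
    return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
    assert(m->held);
    m->held = false;
}

/* Stand-in for the driver's private state; not the real struct iwl_priv. */
struct fake_priv {
    struct toy_mutex mutex;
    int notify_result;
};

/* Assumed: the upper layer calls back into the driver, which needs its mutex. */
static int fake_driver_callback(struct fake_priv *priv)
{
    int err = toy_lock(&priv->mutex);

    if (err)
        return err;
    toy_unlock(&priv->mutex);
    return 0;
}

/* Stand-in for the notification call; records what the callback returned. */
static void fake_notify(struct fake_priv *priv)
{
    priv->notify_result = fake_driver_callback(priv);
}

/* Old ordering: notify while the driver mutex is still held. */
static int alive_start_notify_inside(struct fake_priv *priv)
{
    toy_lock(&priv->mutex);
    fake_notify(priv);
    toy_unlock(&priv->mutex);
    return priv->notify_result;
}

/* Ordering after the patch: drop the mutex first, then notify. */
static int alive_start_notify_outside(struct fake_priv *priv)
{
    toy_lock(&priv->mutex);
    /* ... bring-up work that needs the mutex would go here ... */
    toy_unlock(&priv->mutex);
    fake_notify(priv);
    return priv->notify_result;
}
```

With a zero-initialised struct fake_priv, alive_start_notify_inside() reports -EDEADLK while alive_start_notify_outside() returns 0, which matches the shape of the second hunk in the patch.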

2008-10-27 19:09:45

by Johannes Berg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 20:06 +0100, Jens Axboe wrote:
> On Mon, Oct 27 2008, Carlos R. Mafra wrote:
> > On Mon 27.Oct'08 at 19:44:42 +0100, Johannes Berg wrote:
> > > On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> > > > On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
> > > >
> > > > > > Johannes, can you pls have a look?
> > > >
> > > > I did, and I have no idea. Makes no sense at all.
> > >
> > > The only thing I can remotely think of is that iwlwifi doesn't like
> > > being called back from within the call that it did to mac80211, which
> > > obviously happens here. But I have no idea, the code as it stands is
> > > correct, just the interaction with iwlwifi's resume seems to be broken.
> > >
> > > Try this patch instead:
> >
> > Yep, with this patch it also works!
>
> Confirmed here as well, my x60 is happy again.

Thanks. Another alternative I could think of is deferring the
notifications to a work struct, but I think I'd rather see that in the
driver. I'm not sure, though; it could be argued either way.

johannes
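
The deferral idea can be sketched in user space as a tiny queue: the notification is queued during the driver call and only runs later, once the call has returned. This is a minimal illustration under assumptions, not the kernel workqueue API; toy_workqueue, toy_queue_work, toy_flush and toy_driver are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*work_fn)(void *ctx);

/* One deferred call: a function plus its argument. */
struct toy_work {
    work_fn fn;
    void *ctx;
};

#define TOY_QUEUE_MAX 8

/* Fixed-size FIFO of deferred calls (hypothetical stand-in for a workqueue). */
struct toy_workqueue {
    struct toy_work items[TOY_QUEUE_MAX];
    size_t count;
};

/* Queue a call for later; returns -1 if the queue is full. */
static int toy_queue_work(struct toy_workqueue *wq, work_fn fn, void *ctx)
{
    if (wq->count == TOY_QUEUE_MAX)
        return -1;
    wq->items[wq->count].fn = fn;
    wq->items[wq->count].ctx = ctx;
    wq->count++;
    return 0;
}

/* Run everything queued so far, in order; returns how many calls ran. */
static size_t toy_flush(struct toy_workqueue *wq)
{
    size_t ran = 0;

    for (size_t i = 0; i < wq->count; i++) {
        wq->items[i].fn(wq->items[i].ctx);
        ran++;
    }
    wq->count = 0;
    return ran;
}

/* Toy driver state: tracks whether a notification ran inside a driver call. */
struct toy_driver {
    int in_call;
    int notifications;
    int notified_while_in_call;
    struct toy_workqueue wq;
};

/* The deferred notification; counts runs that happen during a driver call. */
static void notify_reassoc(void *ctx)
{
    struct toy_driver *d = ctx;

    d->notifications++;
    if (d->in_call)
        d->notified_while_in_call++;
}

/* Driver entry point: queues the notification instead of calling it directly. */
static void driver_alive_start(struct toy_driver *d)
{
    d->in_call = 1;
    toy_queue_work(&d->wq, notify_reassoc, d);
    d->in_call = 0;
}
```

After driver_alive_start() returns, no notification has run yet; a later toy_flush() delivers it outside the driver call, so notified_while_in_call stays at zero.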



2008-10-27 19:13:52

by Johannes Berg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:

> > Do you get any kernel messages output? If you do, could you put messages
> > into each line of ieee80211_set_disassoc to see where it hangs?
>
> No messages appear, just a black screen.
>
> But I can use the SysRq keys, and when I umount, the
> screen shows the message that umount succeeded. I also tried SysRq+t but
> the messages appear too fast to read.

Ok, but that means you _can_ get messages, it would help a lot if you
could put a few printks into the set_disassoc function before/after each
other function call, so we know where exactly it hangs. Pretty much all
of them could possibly hang if there is some sort of locking error
happening or anything relies on userspace to be running...

johannes
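
The before/after instrumentation asked for here can be sketched with a small wrapper macro. The version below is a user-space approximation: fprintf(stderr, ...) stands in for printk, and TRACE_CALL, fake_set_disassoc and the step_* functions are hypothetical placeholders rather than the real contents of ieee80211_set_disassoc.

```c
#include <assert.h>
#include <stdio.h>

static int trace_count;   /* number of traced calls that completed */
static int steps_done;    /* number of placeholder steps that ran */

/*
 * Print a marker before and after a call. If the call hangs, the
 * "before" line is the last thing seen in the log.
 */
#define TRACE_CALL(call)                            \
    do {                                            \
        fprintf(stderr, "before %s\n", #call);      \
        call;                                       \
        fprintf(stderr, "after %s\n", #call);       \
        trace_count++;                              \
    } while (0)

/* Placeholder steps; the real function calls different helpers. */
static void step_stop_queues(void) { steps_done++; }
static void step_carrier_off(void) { steps_done++; }
static void step_destroy_sta(void) { steps_done++; }

/* Each call is wrapped so the log shows exactly how far execution got. */
static void fake_set_disassoc(void)
{
    TRACE_CALL(step_stop_queues());
    TRACE_CALL(step_carrier_off());
    TRACE_CALL(step_destroy_sta());
}
```

Printing both markers makes the log unambiguous: a "before" line with no matching "after" line points at the call that did not return.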



2008-10-27 18:59:37

by Carlos Mafra

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon 27.Oct'08 at 19:44:42 +0100, Johannes Berg wrote:
> On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> > On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
> >
> > > > Johannes, can you pls have a look?
> >
> > I did, and I have no idea. Makes no sense at all.
>
> The only thing I can remotely think of is that iwlwifi doesn't like
> being called back from within the call that it did to mac80211, which
> obviously happens here. But I have no idea, the code as it stands is
> correct, just the interaction with iwlwifi's resume seems to be broken.
>
> Try this patch instead:

Yep, with this patch it also works!


> --- everything.orig/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:12.000000000 +0100
> +++ everything/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:15.000000000 +0100
> @@ -2084,7 +2084,6 @@ static void iwl_alive_start(struct iwl_p
> iwl4965_error_recovery(priv);
>
> iwl_power_update_mode(priv, 1);
> - ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
>
> if (test_and_clear_bit(STATUS_MODE_PENDING, &priv->status))
> iwl4965_set_mode(priv, priv->iw_mode);
>
>
> johannes



2008-10-27 22:40:58

by Tomas Winkler

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Tue, Oct 28, 2008 at 12:28 AM, Carlos R. Mafra <[email protected]> wrote:
> On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
>>[...]
>>
>> Can someone try this one (it might be space broken I've just pasted
>> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
>> (latest linux-2.6.git)
>
>
> Yes, this one also works for me (I applied it manually).
Good, this was actually part of some older and bigger patch; I wasn't
aware it has this effect. It has some mileage in the iwl5000 branch.
RFKill went through some changes in the mainline so it wasn't merged yet.

commit a91ad840c23a70bc0eabe239e178e0d979a6d44e
Author: Emmanuel Grumbach <[email protected]>
Date: Mon Jun 30 17:48:26 2008 +0300
iwlwifi: bug fixes in RF-kill flows

Tomas
>
>
>> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
>> index 24a1aeb..321dbc8 100644
>> --- a/drivers/net/wireless/iwlwifi/iwl-agn.c
>> +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
>> @@ -2090,7 +2090,6 @@ static void iwl_alive_start(struct iwl_priv *priv)
>> iwl4965_error_recovery(priv);
>>
>> iwl_power_update_mode(priv, 1);
>> - ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
>>
>> if (test_and_clear_bit(STATUS_MODE_PENDING, &priv->status))
>> iwl4965_set_mode(priv, priv->iw_mode);
>> @@ -2342,6 +2341,7 @@ static void iwl_bg_alive_start(struct work_struct *data)
>> mutex_lock(&priv->mutex);
>> iwl_alive_start(priv);
>> mutex_unlock(&priv->mutex);
>> + ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
>> }
>>
>> static void iwl4965_bg_rf_kill(struct work_struct *work)
>>
>>
>> Thanks
>> Tomas
>

2008-10-28 06:33:00

by Christian Borntraeger

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

Am Montag, 27. Oktober 2008 schrieb Tomas Winkler:
> Can someone try this one (it might be space broken I've just pasted
> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
> (latest linux-2.6.git)
>
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
> index 24a1aeb..321dbc8 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
> @@ -2090,7 +2090,6 @@ static void iwl_alive_start(struct iwl_priv *priv)
> iwl4965_error_recovery(priv);
>
> iwl_power_update_mode(priv, 1);
> - ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
>
> if (test_and_clear_bit(STATUS_MODE_PENDING, &priv->status))
> iwl4965_set_mode(priv, priv->iw_mode);
> @@ -2342,6 +2341,7 @@ static void iwl_bg_alive_start(struct work_struct *data)
> mutex_lock(&priv->mutex);
> iwl_alive_start(priv);
> mutex_unlock(&priv->mutex);
> + ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
> }
>
> static void iwl4965_bg_rf_kill(struct work_struct *work)

This one works on my T61p.

2008-10-27 18:31:30

by Johannes Berg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:

> > Johannes, can you pls have a look?

I did, and I have no idea. Makes no sense at all.

> Hmmhh, I still have the problem when removing the mac80211 and other
> wireless modules before s2ram...

And that makes even less sense :)

johannes



2008-10-27 19:03:26

by Johannes Berg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 20:00 +0100, Carlos R. Mafra wrote:
> On Mon 27.Oct'08 at 19:44:42 +0100, Johannes Berg wrote:
> > On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> > > On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
> > >
> > > > > Johannes, can you pls have a look?
> > >
> > > I did, and I have no idea. Makes no sense at all.
> >
> > The only thing I can remotely think of is that iwlwifi doesn't like
> > being called back from within the call that it did to mac80211, which
> > obviously happens here. But I have no idea, the code as it stands is
> > correct, just the interaction with iwlwifi's resume seems to be broken.
> >
> > Try this patch instead:
>
> Yep, with this patch it also works!

> > --- everything.orig/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:12.000000000 +0100
> > +++ everything/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:15.000000000 +0100
> > @@ -2084,7 +2084,6 @@ static void iwl_alive_start(struct iwl_p
> > iwl4965_error_recovery(priv);
> >
> > iwl_power_update_mode(priv, 1);
> > - ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);

Alright. If nothing else turns up soon, we should probably put this
patch in rather than reverting the other one. IMHO mac80211 is doing
the right thing there, and the driver is calling into it at a point
where either mac80211 or the driver cannot handle it.

Do you get any kernel messages output? If you do, could you put messages
into each line of ieee80211_set_disassoc to see where it hangs?

johannes



2008-10-27 19:07:39

by Jens Axboe

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, Oct 27 2008, Carlos R. Mafra wrote:
> On Mon 27.Oct'08 at 19:44:42 +0100, Johannes Berg wrote:
> > On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> > > On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
> > >
> > > > > Johannes, can you pls have a look?
> > >
> > > I did, and I have no idea. Makes no sense at all.
> >
> > The only thing I can remotely think of is that iwlwifi doesn't like
> > being called back from within the call that it did to mac80211, which
> > obviously happens here. But I have no idea, the code as it stands is
> > correct, just the interaction with iwlwifi's resume seems to be broken.
> >
> > Try this patch instead:
>
> Yep, with this patch it also works!

Confirmed here as well, my x60 is happy again.

--
Jens Axboe


2008-10-27 20:51:30

by Carlos Mafra

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon 27.Oct'08 at 21:51:00 +0100, Rafael J. Wysocki wrote:
> On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> > On Mon 27.Oct'08 at 20:13:43 +0100, Johannes Berg wrote:
> > > On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:
> > >
> > > > > Do you get any kernel messages output? If you do, could you put messages
> > > > > into each line of ieee80211_set_disassoc to see where it hangs?
> > > >
> > > > No messages appear, just a black screen.
> > > >
> > > > > But I can use the SysRq keys, and when I umount, the
> > > > > screen shows the message that umount succeeded. I also tried SysRq+t but
> > > > > the messages appear too fast to read.
> > >
> > > Ok, but that means you _can_ get messages, it would help a lot if you
> > > could put a few printks into the set_disassoc function before/after each
> > > other function call, so we know where exactly it hangs. Pretty much all
> > > of them could possibly hang if there is some sort of locking error
> > > happening or anything relies on userspace to be running...
> >
> > > Ok, I humbly tried to do that with the patch at the end of the email,
> > > but it did not appear to hang in this function, though.
> > >
> > > Somehow I could get some messages printed when it was a black screen
> > > before (I think it has to do with the debug level I set with SysRq...)
> > > and I could see all the printks I've put there.
> > >
> > > The good thing is that I could get the complete syslog from boot until
> > > it failed after suspending to RAM (in 2.6.28-rc2 with my debug patch
> > > below applied). The last messages before the laptop became unresponsive
> > (except for the SysRq) were these ones:
> >
> > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:radio
> > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:assoc
> > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:RX
> > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:TX
> > Oct 27 21:03:06 localhost kernel: before rcu_read_lock
> > Oct 27 21:03:06 localhost kernel: before netif_tx_stop_all_queues
> > Oct 27 21:03:06 localhost kernel: before netif_carrier_off
> > Oct 27 21:03:06 localhost kernel: before ieee80211_sta
> > Oct 27 21:03:06 localhost kernel: inside sef_disconnected 1
> > Oct 27 21:03:06 localhost kernel: before ieee8021_led_assoc
> > Oct 27 21:03:06 localhost kernel: before ieee8021_sta_send_apinfo
> > Oct 27 21:03:06 localhost kernel: before sta_info_unlink
> > Oct 27 21:03:06 localhost kernel: before rcu_read_unlock
> > Oct 27 21:03:06 localhost kernel: before sta_info_destroy
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Starting disk
> > Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> > Oct 27 21:03:06 localhost kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
> > Oct 27 21:03:06 localhost kernel: ata1: irq_stat 0x00000040, connection status changed
> > Oct 27 21:03:06 localhost kernel: ata1: hard resetting link
> > Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> > Oct 27 21:03:06 localhost kernel: ata1: EH complete
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Oct 27 21:03:06 localhost kernel: Restarting tasks ... done.
> > Oct 27 21:03:07 localhost ifplugd(wlan0)[3182]: Link beat lost.
> > Oct 27 21:03:13 localhost ifplugd(wlan0)[3182]: Executing '/etc/ifplugd/ifplugd.action wlan0 down'.
> > Oct 27 21:06:20 localhost kernel: SysRq : Changing Loglevel
> > Oct 27 21:06:20 localhost kernel: Loglevel set to 4
> > Oct 27 21:06:22 localhost kernel: SysRq : Changing Loglevel
> > Oct 27 21:06:22 localhost kernel: Loglevel set to 6
> > Oct 27 21:06:32 localhost kernel: ffff80251670>] ? autoremove_wake_function+0x0/0x40
> > Oct 27 21:06:32 localhost kernel: [<ffffffff8024e200>] ? worker_thread+0x0/0x110
> > Oct 27 21:06:32 localhost kernel: [<ffffffff8025119d>] kthread+0x4d/0x80
> > Oct 27 21:06:32 localhost kernel: [<ffffffff8020d1b9>] child_rip+0xa/0x11
> >
> > and I have the complete trace also. I can try to put it somewhere in the web if it helps
> > (I already tried it, but I am new at the institute here and I could not set up my
> > webpage yet :-(
>
> Please attach it to http://bugzilla.kernel.org/show_bug.cgi?id=11845 .

Done.


2008-10-27 22:28:20

by Carlos Mafra

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
>[...]
>
> Can someone try this one (it might be space broken I've just pasted
> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
> (latest linux-2.6.git)


Yes, this one also works for me (I applied it manually).


> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c
> b/drivers/net/wireless/iwlwifi/iwl-agn.c
> index 24a1aeb..321dbc8 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
> @@ -2090,7 +2090,6 @@ static void iwl_alive_start(struct iwl_priv *priv)
> iwl4965_error_recovery(priv);
>
> iwl_power_update_mode(priv, 1);
> - ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
>
> if (test_and_clear_bit(STATUS_MODE_PENDING, &priv->status))
> iwl4965_set_mode(priv, priv->iw_mode);
> @@ -2342,6 +2341,7 @@ static void iwl_bg_alive_start(struct work_struct *data)
> mutex_lock(&priv->mutex);
> iwl_alive_start(priv);
> mutex_unlock(&priv->mutex);
> + ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
> }
>
> static void iwl4965_bg_rf_kill(struct work_struct *work)
>
>
> Thanks
> Tomas
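
The reordering in the patch above can be modelled in userspace. The following is only a toy sketch: the thread never pinned down exactly where the resume path hangs, so treating priv->mutex as the contended lock is an assumption, and every name in it (toy_priv, toy_driver_callback, toy_notify_stack, alive_start_notify_inside, alive_start_notify_after) is hypothetical rather than taken from iwlwifi or mac80211. A plain flag stands in for the mutex so that a re-entrant call is counted instead of hanging the program.

```c
#include <stdbool.h>

/* Toy stand-in for the driver's private data. "locked" models priv->mutex
 * (an assumption about which lock matters); a flag is used instead of a
 * real mutex so that re-entry is counted rather than deadlocking. */
struct toy_priv {
    bool locked;
    int blocked_calls;
};

/* Hypothetical driver entry point that the stack calls back into. */
void toy_driver_callback(struct toy_priv *p)
{
    if (p->locked) {
        /* A real non-recursive mutex would block here forever. */
        p->blocked_calls++;
        return;
    }
    p->locked = true;
    /* ... reconfigure the device ... */
    p->locked = false;
}

/* Hypothetical stand-in for the stack: the notification calls straight
 * back into the driver. */
void toy_notify_stack(struct toy_priv *p)
{
    toy_driver_callback(p);
}

/* Old ordering: notify while the lock is still held. */
int alive_start_notify_inside(struct toy_priv *p)
{
    p->blocked_calls = 0;
    p->locked = true;
    toy_notify_stack(p);
    p->locked = false;
    return p->blocked_calls;
}

/* New ordering: finish the locked section first, then notify. */
int alive_start_notify_after(struct toy_priv *p)
{
    p->blocked_calls = 0;
    p->locked = true;
    /* ... start-up work under the lock ... */
    p->locked = false;
    toy_notify_stack(p);
    return p->blocked_calls;
}
```

In this model the first ordering reports one blocked call and the second reports none, which is the shape of the change the patch makes by moving the notification after mutex_unlock().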

2008-10-27 20:46:35

by Rafael J. Wysocki

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> On Mon 27.Oct'08 at 20:13:43 +0100, Johannes Berg wrote:
> > On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:
> >
> > > > Do you get any kernel messages output? If you do, could you put messages
> > > > into each line of ieee80211_set_disassoc to see where it hangs?
> > >
> > > No messages appear, just a black screen.
> > >
> > > But I can use the SysRq keys, and when I umount the
> > > screen shows the message that umount succeed. I also tried SysRq+t but
> > > the messages appear to fast to read.
> >
> > Ok, but that means you _can_ get messages, it would help a lot if you
> > could put a few printks into the set_disassoc function before/after each
> > other function call, so we know where exactly it hangs. Pretty much all
> > of them could possibly hang if there is some sort of locking error
> > happening or anything relies on userspace to be running...
>
> Ok, I humbly tried to do that with the patch at the end of the email,
> but I did not appear to hang in this function tough.
>
> Somehow I could get some messages printed when it was a black screen
> before (I think it has to do with the debug level I set with SysRq...)
> and I could see all the printks I've put there.
>
> The good thing is that I could get the complete syslog of the boot until
> the it failed after suspending to RAM (in 2.6.28-rc2 with my debug patch
> below applied). The last messages before the laptop become unresponsive
> (except for the SysRq) were these ones:
>
> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:radio
> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:assoc
> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:RX
> Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:TX
> Oct 27 21:03:06 localhost kernel: before rcu_read_lock
> Oct 27 21:03:06 localhost kernel: before netif_tx_stop_all_queues
> Oct 27 21:03:06 localhost kernel: before netif_carrier_off
> Oct 27 21:03:06 localhost kernel: before ieee80211_sta
> Oct 27 21:03:06 localhost kernel: inside sef_disconnected 1
> Oct 27 21:03:06 localhost kernel: before ieee8021_led_assoc
> Oct 27 21:03:06 localhost kernel: before ieee8021_sta_send_apinfo
> Oct 27 21:03:06 localhost kernel: before sta_info_unlink
> Oct 27 21:03:06 localhost kernel: before rcu_read_unlock
> Oct 27 21:03:06 localhost kernel: before sta_info_destroy
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Starting disk
> Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> Oct 27 21:03:06 localhost kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
> Oct 27 21:03:06 localhost kernel: ata1: irq_stat 0x00000040, connection status changed
> Oct 27 21:03:06 localhost kernel: ata1: hard resetting link
> Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> Oct 27 21:03:06 localhost kernel: ata1: EH complete
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Oct 27 21:03:06 localhost kernel: Restarting tasks ... done.
> Oct 27 21:03:07 localhost ifplugd(wlan0)[3182]: Link beat lost.
> Oct 27 21:03:13 localhost ifplugd(wlan0)[3182]: Executing '/etc/ifplugd/ifplugd.action wlan0 down'.
> Oct 27 21:06:20 localhost kernel: SysRq : Changing Loglevel
> Oct 27 21:06:20 localhost kernel: Loglevel set to 4
> Oct 27 21:06:22 localhost kernel: SysRq : Changing Loglevel
> Oct 27 21:06:22 localhost kernel: Loglevel set to 6
> Oct 27 21:06:32 localhost kernel: ffff80251670>] ? autoremove_wake_function+0x0/0x40
> Oct 27 21:06:32 localhost kernel: [<ffffffff8024e200>] ? worker_thread+0x0/0x110
> Oct 27 21:06:32 localhost kernel: [<ffffffff8025119d>] kthread+0x4d/0x80
> Oct 27 21:06:32 localhost kernel: [<ffffffff8020d1b9>] child_rip+0xa/0x11
>
> and I have the complete trace also. I can try to put it somewhere in the web if it helps
> (I already tried it, but I am new at the institute here and I could not set up my
> webpage yet :-(

Please attach it to http://bugzilla.kernel.org/show_bug.cgi?id=11845 .

Thanks,
Rafael

2008-10-27 18:14:09

by Soeren Sonnenburg

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, 2008-10-27 at 18:32 +0100, Rafael J. Wysocki wrote:
> [Added some CCs]
>
> On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> > Hi,
> >
> > So I managed to bisect my suspend to RAM regression in 2.6.27-rc2
> > to commit 3b7ee69d0caefbdb85a606a98bff841b8c63b97e ("mac80211: disassociate
> > when moving to new BSS") by Tomas Winkler (Cc:-ed).
> >
> > Unfortunately it doesn't revert cleanly so I can't double check it.
> >
> > My laptop is a Vaio FZ240E Core2Duo@2GHz and I use the iwlagn module
> > for the wireless connection.
> >
> > The symptom is that suspending to RAM from inside X with 'echo mem >
> > /sys/power/state' leaves me a black screen when resuming back.
> >
> > I can test any patches and provide more information of my laptop too.
>
> Carlos, thanks a lot for bisecting this.
>
> Johannes, can you pls have a look?

Hmmhh, I still have the problem when removing the mac80211 and other
wireless modules before s2ram...

Soeren

2008-10-27 19:10:43

by Carlos Mafra

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon 27.Oct'08 at 20:03:15 +0100, Johannes Berg wrote:
> On Mon, 2008-10-27 at 20:00 +0100, Carlos R. Mafra wrote:
> > On Mon 27.Oct'08 at 19:44:42 +0100, Johannes Berg wrote:
> > > On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> > > > On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
> > > >
> > > > > > Johannes, can you pls have a look?
> > > >
> > > > I did, and I have no idea. Makes no sense at all.
> > >
> > > The only thing I can remotely think of is that iwlwifi doesn't like
> > > being called back from within the call that it did to mac80211, which
> > > obviously happens here. But I have no idea, the code as it stands is
> > > correct, just the interaction with iwlwifi's resume seems to be broken.
> > >
> > > Try this patch instead:
> >
> > Yep, with this patch it also works!
>
> > > --- everything.orig/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:12.000000000 +0100
> > > +++ everything/drivers/net/wireless/iwlwifi/iwl-agn.c 2008-10-27 19:44:15.000000000 +0100
> > > @@ -2084,7 +2084,6 @@ static void iwl_alive_start(struct iwl_p
> > > iwl4965_error_recovery(priv);
> > >
> > > iwl_power_update_mode(priv, 1);
> > > - ieee80211_notify_mac(priv->hw, IEEE80211_NOTIFY_RE_ASSOC);
>
> alright, well, if nothing else turns up soon then we should probably put
> this patch in rather than reverting the other one, imho mac80211 is
> doing the right thing there and the driver is calling into it at a point
> where either mac80211 or the driver cannot handle it.
>
> Do you get any kernel messages output? If you do, could you put messages
> into each line of ieee80211_set_disassoc to see where it hangs?

No messages appear, just a black screen.

But I can use the SysRq keys, and when I umount, the
screen shows the message that umount succeeded. I also tried SysRq+t but
the messages appear too fast to read.


>
> johannes



2008-10-27 21:03:09

by Rafael J. Wysocki

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> On Mon 27.Oct'08 at 21:51:00 +0100, Rafael J. Wysocki wrote:
> > On Monday, 27 of October 2008, Carlos R. Mafra wrote:
> > > On Mon 27.Oct'08 at 20:13:43 +0100, Johannes Berg wrote:
> > > > On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:
> > > >
> > > > > > Do you get any kernel messages output? If you do, could you put messages
> > > > > > into each line of ieee80211_set_disassoc to see where it hangs?
> > > > >
> > > > > No messages appear, just a black screen.
> > > > >
> > > > > But I can use the SysRq keys, and when I umount the
> > > > > screen shows the message that umount succeed. I also tried SysRq+t but
> > > > > the messages appear to fast to read.
> > > >
> > > > Ok, but that means you _can_ get messages, it would help a lot if you
> > > > could put a few printks into the set_disassoc function before/after each
> > > > other function call, so we know where exactly it hangs. Pretty much all
> > > > of them could possibly hang if there is some sort of locking error
> > > > happening or anything relies on userspace to be running...
> > >
> > > Ok, I humbly tried to do that with the patch at the end of the email,
> > > but I did not appear to hang in this function tough.
> > >
> > > Somehow I could get some messages printed when it was a black screen
> > > before (I think it has to do with the debug level I set with SysRq...)
> > > and I could see all the printks I've put there.
> > >
> > > The good thing is that I could get the complete syslog of the boot until
> > > the it failed after suspending to RAM (in 2.6.28-rc2 with my debug patch
> > > below applied). The last messages before the laptop become unresponsive
> > > (except for the SysRq) were these ones:
> > >
> > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:radio
> > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:assoc
> > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:RX
> > > Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:TX
> > > Oct 27 21:03:06 localhost kernel: before rcu_read_lock
> > > Oct 27 21:03:06 localhost kernel: before netif_tx_stop_all_queues
> > > Oct 27 21:03:06 localhost kernel: before netif_carrier_off
> > > Oct 27 21:03:06 localhost kernel: before ieee80211_sta
> > > Oct 27 21:03:06 localhost kernel: inside sef_disconnected 1
> > > Oct 27 21:03:06 localhost kernel: before ieee8021_led_assoc
> > > Oct 27 21:03:06 localhost kernel: before ieee8021_sta_send_apinfo
> > > Oct 27 21:03:06 localhost kernel: before sta_info_unlink
> > > Oct 27 21:03:06 localhost kernel: before rcu_read_unlock
> > > Oct 27 21:03:06 localhost kernel: before sta_info_destroy
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Starting disk
> > > Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> > > Oct 27 21:03:06 localhost kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
> > > Oct 27 21:03:06 localhost kernel: ata1: irq_stat 0x00000040, connection status changed
> > > Oct 27 21:03:06 localhost kernel: ata1: hard resetting link
> > > Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
> > > Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > > Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
> > > Oct 27 21:03:06 localhost kernel: ata1: EH complete
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> > > Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > Oct 27 21:03:06 localhost kernel: Restarting tasks ... done.
> > > Oct 27 21:03:07 localhost ifplugd(wlan0)[3182]: Link beat lost.
> > > Oct 27 21:03:13 localhost ifplugd(wlan0)[3182]: Executing '/etc/ifplugd/ifplugd.action wlan0 down'.
> > > Oct 27 21:06:20 localhost kernel: SysRq : Changing Loglevel
> > > Oct 27 21:06:20 localhost kernel: Loglevel set to 4
> > > Oct 27 21:06:22 localhost kernel: SysRq : Changing Loglevel
> > > Oct 27 21:06:22 localhost kernel: Loglevel set to 6
> > > Oct 27 21:06:32 localhost kernel: ffff80251670>] ? autoremove_wake_function+0x0/0x40
> > > Oct 27 21:06:32 localhost kernel: [<ffffffff8024e200>] ? worker_thread+0x0/0x110
> > > Oct 27 21:06:32 localhost kernel: [<ffffffff8025119d>] kthread+0x4d/0x80
> > > Oct 27 21:06:32 localhost kernel: [<ffffffff8020d1b9>] child_rip+0xa/0x11
> > >
> > > and I have the complete trace also. I can try to put it somewhere in the web if it helps
> > > (I already tried it, but I am new at the institute here and I could not set up my
> > > webpage yet :-(
> >
> > Please attach it to http://bugzilla.kernel.org/show_bug.cgi?id=11845 .
>
> Done.

Hmm, this looks like a result of playing with the magic SysRq. Did you try to
press SysRq-something when the box became unresponsive?

Rafael

2008-10-28 15:25:53

by Rafael J. Wysocki

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Tuesday, 28 of October 2008, Tomas Winkler wrote:
> h
>
> On Tue, Oct 28, 2008 at 12:50 AM, Rafael J. Wysocki <[email protected]> wrote:
> > On Monday, 27 of October 2008, Tomas Winkler wrote:
> >> On Tue, Oct 28, 2008 at 12:28 AM, Carlos R. Mafra <[email protected]> wrote:
> >> > On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
> >> >>[...]
> >> >>
> >> >> Can someone try this one (it might be space broken I've just pasted
> >> >> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
> >> >> (latest linux-2.6.git)
> >> >
> >> >
> >> > Yes, this one also works for me (I applied it manually).
> >> Good, this was actually part of some older and bigger patch I wasn't
> >> aware it has this affect. It has some millage in iwl5000 branch.
> >> RFKIll went through some changes in the mainline so it wasn't merged yet.
> >
> > Are you going to push this patch upstream now? It's quite important to get
> > this resume regression fixed ASAP.
>
> Definitely I would post it to wireless and stable.

Please send a CC to me too.

Thanks,
Rafael

2008-10-27 20:38:52

by Carlos Mafra

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon 27.Oct'08 at 20:13:43 +0100, Johannes Berg wrote:
> On Mon, 2008-10-27 at 20:11 +0100, Carlos R. Mafra wrote:
>
> > > Do you get any kernel messages output? If you do, could you put messages
> > > into each line of ieee80211_set_disassoc to see where it hangs?
> >
> > No messages appear, just a black screen.
> >
> > But I can use the SysRq keys, and when I umount the
> > screen shows the message that umount succeed. I also tried SysRq+t but
> > the messages appear to fast to read.
>
> Ok, but that means you _can_ get messages, it would help a lot if you
> could put a few printks into the set_disassoc function before/after each
> other function call, so we know where exactly it hangs. Pretty much all
> of them could possibly hang if there is some sort of locking error
> happening or anything relies on userspace to be running...

Ok, I humbly tried to do that with the patch at the end of the email,
but it did not appear to hang in this function though.

Somehow I could get some messages printed when it was a black screen
before (I think it has to do with the debug level I set with SysRq...)
and I could see all the printks I've put there.

The good thing is that I could get the complete syslog from boot until
it failed after suspending to RAM (in 2.6.28-rc2 with my debug patch
below applied). The last messages before the laptop became unresponsive
(except for the SysRq) were these ones:

Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:radio
Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:assoc
Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:RX
Oct 27 21:03:06 localhost kernel: Registered led device: iwl-phy0:TX
Oct 27 21:03:06 localhost kernel: before rcu_read_lock
Oct 27 21:03:06 localhost kernel: before netif_tx_stop_all_queues
Oct 27 21:03:06 localhost kernel: before netif_carrier_off
Oct 27 21:03:06 localhost kernel: before ieee80211_sta
Oct 27 21:03:06 localhost kernel: inside sef_disconnected 1
Oct 27 21:03:06 localhost kernel: before ieee8021_led_assoc
Oct 27 21:03:06 localhost kernel: before ieee8021_sta_send_apinfo
Oct 27 21:03:06 localhost kernel: before sta_info_unlink
Oct 27 21:03:06 localhost kernel: before rcu_read_unlock
Oct 27 21:03:06 localhost kernel: before sta_info_destroy
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Starting disk
Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
Oct 27 21:03:06 localhost kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
Oct 27 21:03:06 localhost kernel: ata1: irq_stat 0x00000040, connection status changed
Oct 27 21:03:06 localhost kernel: ata1: hard resetting link
Oct 27 21:03:06 localhost kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd ef/90:03:00:00:00:a0 succeeded
Oct 27 21:03:06 localhost kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Oct 27 21:03:06 localhost kernel: ata1.00: configured for UDMA/100
Oct 27 21:03:06 localhost kernel: ata1: EH complete
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 27 21:03:06 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 27 21:03:06 localhost kernel: Restarting tasks ... done.
Oct 27 21:03:07 localhost ifplugd(wlan0)[3182]: Link beat lost.
Oct 27 21:03:13 localhost ifplugd(wlan0)[3182]: Executing '/etc/ifplugd/ifplugd.action wlan0 down'.
Oct 27 21:06:20 localhost kernel: SysRq : Changing Loglevel
Oct 27 21:06:20 localhost kernel: Loglevel set to 4
Oct 27 21:06:22 localhost kernel: SysRq : Changing Loglevel
Oct 27 21:06:22 localhost kernel: Loglevel set to 6
Oct 27 21:06:32 localhost kernel: ffff80251670>] ? autoremove_wake_function+0x0/0x40
Oct 27 21:06:32 localhost kernel: [<ffffffff8024e200>] ? worker_thread+0x0/0x110
Oct 27 21:06:32 localhost kernel: [<ffffffff8025119d>] kthread+0x4d/0x80
Oct 27 21:06:32 localhost kernel: [<ffffffff8020d1b9>] child_rip+0xa/0x11

and I have the complete trace also. I can try to put it somewhere on the web if it helps
(I already tried, but I am new at the institute here and I could not set up my
webpage yet :-(

This is the patch I applied, I don't know if this helps or not...

---
net/mac80211/mlme.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 87665d7..c6e3338 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -819,10 +819,12 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta;
u32 changed = BSS_CHANGED_ASSOC;

+ printk("before rcu_read_lock\n");
rcu_read_lock();

sta = sta_info_get(local, ifsta->bssid);
if (!sta) {
+ printk("before rcu_read_unlock\n");
rcu_read_unlock();
return;
}
@@ -834,18 +836,24 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
ifsta->assoc_scan_tries = 0;
ifsta->assoc_tries = 0;

+ printk("before netif_tx_stop_all_queues\n");
netif_tx_stop_all_queues(sdata->dev);
+ printk("before netif_carrier_off\n");
netif_carrier_off(sdata->dev);

+ printk("before ieee80211_sta\n");
ieee80211_sta_tear_down_BA_sessions(sdata, sta->sta.addr);

if (self_disconnected) {
- if (deauth)
+ if (deauth) {
+ printk("inside sef_disconnected 1\n");
ieee80211_send_deauth_disassoc(sdata,
IEEE80211_STYPE_DEAUTH, reason);
- else
+ } else {
+ printk("inside sef_disconnected 2\n");
ieee80211_send_deauth_disassoc(sdata,
IEEE80211_STYPE_DISASSOC, reason);
+ }
}

ifsta->flags &= ~IEEE80211_STA_ASSOCIATED;
@@ -858,18 +866,23 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
sdata->bss_conf.ht_conf = NULL;
sdata->bss_conf.ht_bss_conf = NULL;

+ printk("before ieee8021_led_assoc\n");
ieee80211_led_assoc(local, 0);
sdata->bss_conf.assoc = 0;

+ printk("before ieee8021_sta_send_apinfo\n");
ieee80211_sta_send_apinfo(sdata, ifsta);

if (self_disconnected)
ifsta->state = IEEE80211_STA_MLME_DISABLED;

+ printk("before sta_info_unlink\n");
sta_info_unlink(&sta);

+ printk("before rcu_read_unlock\n");
rcu_read_unlock();

+ printk("before sta_info_destroy\n");
sta_info_destroy(sta);
}
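
The same "print before every call" technique can be sketched in userspace with a small macro, so the marker text is generated from the call itself and cannot drift from the function name. This is only an illustration: TRACE_CALL, step_one, step_two and run_traced_sequence are made-up names, and fprintf to stderr stands in for printk.

```c
#include <stdio.h>

/* Print a marker naming the call, flush it, then make the call. If the
 * call never returns, the last marker printed identifies it. */
#define TRACE_CALL(call)                            \
    do {                                            \
        fprintf(stderr, "before %s\n", #call);      \
        fflush(stderr);                             \
        call;                                       \
    } while (0)

/* Hypothetical steps standing in for the calls being traced. */
int steps_run;

void step_one(void) { steps_run++; }
void step_two(void) { steps_run++; }

/* Runs both steps with a marker before each; returns how many completed. */
int run_traced_sequence(void)
{
    steps_run = 0;
    TRACE_CALL(step_one());
    TRACE_CALL(step_two());
    return steps_run;
}
```

In the kernel the markers would go through printk, and whether they reach the console depends on the console loglevel, which matches the observation above that the messages only showed up after the level was changed with SysRq.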






2008-10-27 23:25:11

by Harvey Harrison

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Tue, 2008-10-28 at 01:12 +0200, Tomas Winkler wrote:
> h
>
> On Tue, Oct 28, 2008 at 12:50 AM, Rafael J. Wysocki <[email protected]> wrote:
> > On Monday, 27 of October 2008, Tomas Winkler wrote:
> >> On Tue, Oct 28, 2008 at 12:28 AM, Carlos R. Mafra <[email protected]> wrote:
> >> > On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
> >> >>[...]
> >> >>
> >> >> Can someone try this one (it might be space broken I've just pasted
> >> >> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
> >> >> (latest linux-2.6.git)
> >> >
> >> >
> >> > Yes, this one also works for me (I applied it manually).
> >> Good, this was actually part of some older and bigger patch I wasn't
> >> aware it has this affect. It has some millage in iwl5000 branch.
> >> RFKIll went through some changes in the mainline so it wasn't merged yet.
> >
> > Are you going to push this patch upstream now? It's quite important to get
> > this resume regression fixed ASAP.
>
> Definitely I would post it to wireless and stable. I would like to
> hear from Harvey, though if this is somehow related to b43 as well. I
> will it give some more testing tomorrow morning I don't have any HW
> right now and need to suspend to bed :)

Go ahead and send it, I retried with Linus' HEAD and suspend/resume is fine here,
not sure why it was failing, although it does seem to take longer to resume than
2.6.27, but that's purely subjective, no numbers to back that up.

Harvey


2008-10-27 19:14:15

by Jens Axboe

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Mon, Oct 27 2008, Johannes Berg wrote:
> On Mon, 2008-10-27 at 20:06 +0100, Jens Axboe wrote:
> > On Mon, Oct 27 2008, Carlos R. Mafra wrote:
> > > On Mon 27.Oct'08 at 19:44:42 +0100, Johannes Berg wrote:
> > > > On Mon, 2008-10-27 at 19:31 +0100, Johannes Berg wrote:
> > > > > On Mon, 2008-10-27 at 19:07 +0100, Soeren Sonnenburg wrote:
> > > > >
> > > > > > > Johannes, can you pls have a look?
> > > > >
> > > > > I did, and I have no idea. Makes no sense at all.
> > > >
> > > > The only thing I can remotely think of is that iwlwifi doesn't like
> > > > being called back from within the call that it did to mac80211, which
> > > > obviously happens here. But I have no idea, the code as it stands is
> > > > correct, just the interaction with iwlwifi's resume seems to be broken.
> > > >
> > > > Try this patch instead:
> > >
> > > Yep, with this patch it also works!
> >
> > Confirmed here as well, my x60 is happy again.
>
> Thanks. Another alternative I could think of is deferring the
> notifications to a work struct, but I'd rather see that in the driver I
> think, not sure though, could be argued either way.

If you want something else tested, just let me know. I don't care a
whole lot about how it gets fixed, as long as it does :-). I use STR
heavily, so it's quite a burden to have it broken.


--
Jens Axboe
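
The alternative mentioned in the quoted text, deferring the notification to a work struct, can be illustrated with a toy queue in plain C. This is a sketch under stated assumptions, not the kernel workqueue API: toy_workqueue, toy_queue_work, toy_flush_work and toy_notify are hypothetical names, and the flush is called by hand where the kernel would run the work from a worker thread.

```c
#include <stddef.h>

#define TOY_MAX_WORK 8

typedef void (*toy_work_fn)(void *ctx);

struct toy_work {
    toy_work_fn fn;
    void *ctx;
};

/* Fixed-size FIFO of pending work items. */
struct toy_workqueue {
    struct toy_work items[TOY_MAX_WORK];
    size_t count;
};

/* Queue fn(ctx) to run later. Returns 0 on success, -1 if the queue is full. */
int toy_queue_work(struct toy_workqueue *wq, toy_work_fn fn, void *ctx)
{
    if (wq->count == TOY_MAX_WORK)
        return -1;
    wq->items[wq->count].fn = fn;
    wq->items[wq->count].ctx = ctx;
    wq->count++;
    return 0;
}

/* Run everything queued so far in FIFO order and return how many ran.
 * Simplification: work functions must not queue new work while the
 * flush is in progress. */
size_t toy_flush_work(struct toy_workqueue *wq)
{
    size_t n = wq->count;

    for (size_t i = 0; i < n; i++)
        wq->items[i].fn(wq->items[i].ctx);
    wq->count = 0;
    return n;
}

/* Example deferred notification: bumps the counter that ctx points to. */
void toy_notify(void *ctx)
{
    (*(int *)ctx)++;
}
```

The caller would queue toy_notify while it still holds its lock and flush only after releasing it, so the notification never runs inside the locked section.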


2008-10-27 22:45:54

by Rafael J. Wysocki

Subject: Re: Suspend to RAM regression in 2.6.28-rc2 (bisected)

On Monday, 27 of October 2008, Tomas Winkler wrote:
> On Tue, Oct 28, 2008 at 12:28 AM, Carlos R. Mafra <[email protected]> wrote:
> > On Mon 27.Oct'08 at 23:07:00 +0200, Tomas Winkler wrote:
> >>[...]
> >>
> >> Can someone try this one (it might be space broken I've just pasted
> >> that in) It's on top of 1d63e726408dfdb3e10ed8f00c383b30ebb333d3
> >> (latest linux-2.6.git)
> >
> >
> > Yes, this one also works for me (I applied it manually).
> Good, this was actually part of some older and bigger patch I wasn't
> aware it has this affect. It has some millage in iwl5000 branch.
> RFKIll went through some changes in the mainline so it wasn't merged yet.

Are you going to push this patch upstream now? It's quite important to get
this resume regression fixed ASAP.

Thanks,
Rafael