2009-01-11 19:53:15

by Zdenek Kabelac

Subject: 2.6.29-rc1 does not resume on Lenovo T61

Hi

I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)

I've observed that suspend is unusable: the machine goes to sleep and the
sleep LED comes on. After a few seconds the system turns itself back on and
stays in a frozen state, with the sleep LED still on. Usually I get a black
screen, but once I noticed a screen with this text:

x86 PAT enabled cpu0,.....
back to C
Extended CMOS year 2000

This is most probably not very helpful, but it's rather hard to bisect the
kernel, because the patches in the 2.6.28 -> 2.6.29-rc1 transition have
caused a lot of unstable behavior on my T61.

So are there any suspect patches I could try to revert directly?
(I would guess some new ACPI control code for the ThinkPad, though
I don't understand why it gets broken over and over again.)

Zdenek


2009-01-11 20:15:51

by Maciej Rutecki

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/11 Zdenek Kabelac <[email protected]>:
> Hi
>
> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
>
> I've observed that suspend is unusable - it goes to sleep - the sleep
> LED is on. After few secs system turns on back itself - and stays in
> some frozen state

I have a similar situation, with one difference: I get a blank screen during
resume from suspend to RAM. Also, like you, the system sometimes turns itself
back on.

[...]

> but most probably this is not very helpful - however it's kind of hard
> to bisect kernel, because patches in the transition from
> 2.6.28->2.6.29-rc1 usually caused a lot of unstable behavior on my
> T61.

Same here. Suspend to disk dies at this message:
http://www.unixy.pl/maciek/download/kernel/2.6.29-rc1/pc/img_0002.jpg
(this happens while going into suspend to disk, NOT during resume)

I tried to bisect, but I ran into many other problems, like this one:
http://www.unixy.pl/maciek/download/kernel/2.6.29-rc1/pc/img_0003.jpg

So I cannot finish the bisection.

>

Config, dmesg:
http://www.unixy.pl/maciek/download/kernel/2.6.29-rc1/pc/


--
Maciej Rutecki
http://www.maciek.unixy.pl

2009-01-11 22:59:57

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/11 Maciej Rutecki <[email protected]>:
> 2009/1/11 Zdenek Kabelac <[email protected]>:
>> Hi
>>
>> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
>>
>> I've observed that suspend is unusable - it goes to sleep - the sleep
>> LED is on. After few secs system turns on back itself - and stays in
>> some frozen state
>
> A have similar situation, one difference: I get blank screen during
> resume from suspend to ram. Also sometimes, like You, system turns on
> back itself.
>

So it looks like reverting this commit:

http://marc.info/?l=linux-kernel&m=123140019117968&w=4
(6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
(which is already a tracked regression)
fixes the auto-resume problem.

But the deadlock in the resume phase is still there.
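A minimal sketch of carrying such a revert while testing, assuming a git checkout of the kernel tree. The function name revert_suspects and the SUSPECT_COMMITS list are hypothetical; only the commit ID comes from this thread:

```shell
#!/bin/sh
# Sketch: revert suspect commits on top of the tree under test.
# Only the commit ID comes from this thread; the names are hypothetical.
SUSPECT_COMMITS="6fd9086a518d4f14213a32fe6c9ac17fabebbc1e"

revert_suspects() {
    for c in $SUSPECT_COMMITS; do
        # --no-edit keeps git's default "Revert ..." commit message.
        git revert --no-edit "$c" || return 1
    done
}
```

Keeping the list in one variable makes it easy to add further suspects (such as the commit discussed later in the thread) without changing the loop.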

Zdenek

2009-01-12 00:48:27

by Heiko Carstens

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

On Sun, Jan 11, 2009 at 09:15:38PM +0100, Maciej Rutecki wrote:
> 2009/1/11 Zdenek Kabelac <[email protected]>:
> > Hi
> >
> > I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
> >
> > I've observed that suspend is unusable - it goes to sleep - the sleep
> > LED is on. After few secs system turns on back itself - and stays in
> > some frozen state
>
> A have similar situation, one difference: I get blank screen during
> resume from suspend to ram. Also sometimes, like You, system turns on
> back itself.
>
> [...]
>
> > but most probably this is not very helpful - however it's kind of hard
> > to bisect kernel, because patches in the transition from
> > 2.6.28->2.6.29-rc1 usually caused a lot of unstable behavior on my
> > T61.
>
> The same. Suspend to disk die on this message:
> http://www.unixy.pl/maciek/download/kernel/2.6.29-rc1/pc/img_0002.jpg
> (this is during go to suspend to disk, NOT resume)
>
> I try bisect, but I got many other problems, like this:
> http://www.unixy.pl/maciek/download/kernel/2.6.29-rc1/pc/img_0003.jpg
>
> So, I cannot finish bisect.

The bug seen in img_0003.jpg is fixed with

a0e280e0f33f6c859a235fb69a875ed8f3420388

So you could continue bisecting: apply the patch, compile the kernel and
test it, then revert the patch again before marking the kernel good/bad.
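The procedure above could be sketched as a few shell functions, assuming the bisection runs in a git checkout of the kernel. apply_fix, drop_fix and mark are hypothetical names; the fix commit is the one named above:

```shell
#!/bin/sh
# Sketch of the procedure; run inside the kernel git tree during a bisection.
# The fix ID comes from the thread; apply_fix/drop_fix/mark are hypothetical names.
FIX_COMMIT=a0e280e0f33f6c859a235fb69a875ed8f3420388

# Apply the fix to the working tree of the current bisect point, without committing.
apply_fix() {
    git cherry-pick --no-commit "$FIX_COMMIT"
}

# Discard the fix so the tree matches the commit that git bisect checked out.
drop_fix() {
    git reset --hard
}

# Drop the fix, then record the verdict; $1 must be "good" or "bad".
mark() {
    case "$1" in
        good|bad) drop_fix && git bisect "$1" ;;
        *) echo "usage: mark good|bad" >&2; return 2 ;;
    esac
}
```

One step would then be: apply_fix, build and install the kernel, reboot and test suspend/resume, source the file again and run "mark good" or "mark bad". Note that git reset --hard also discards any other uncommitted changes to tracked files; untracked files such as .config survive.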

2009-01-12 08:03:54

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

On Sunday 11 January 2009, Zdenek Kabelac wrote:
> 2009/1/11 Maciej Rutecki <[email protected]>:
> > 2009/1/11 Zdenek Kabelac <[email protected]>:
> >> Hi
> >>
> >> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
> >>
> >> I've observed that suspend is unusable - it goes to sleep - the sleep
> >> LED is on. After few secs system turns on back itself - and stays in
> >> some frozen state
> >
> > A have similar situation, one difference: I get blank screen during
> > resume from suspend to ram. Also sometimes, like You, system turns on
> > back itself.
> >
>
> So it looks like reverting this commit:
>
> http://marc.info/?l=linux-kernel&m=123140019117968&w=4
> (6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
> (which is already a tracked regression)
> fixes the problem with auto-resume
>
> But the problem with deadlock in the resume phase is still there.

Please check if unloading all of the USB controller modules before suspend
helps.
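A rough sketch of that test, assuming the 2.6.29-era module names mentioned in this thread (ehci_hcd, ohci_hcd, uhci_hcd) and root privileges. The function and variable names are hypothetical:

```shell
#!/bin/sh
# Sketch: unload the USB host controller drivers, suspend to RAM, reload them.
# Module names are the 2.6.29-era ones from this thread (assumption); run as root.
USB_HCD_MODULES="ehci_hcd ohci_hcd uhci_hcd"
PROC_MODULES=/proc/modules
PM_STATE_FILE=/sys/power/state

suspend_without_usb_hcds() {
    removed=""
    for m in $USB_HCD_MODULES; do
        # /proc/modules lines start with "<name> <size> ..."; skip modules not loaded.
        grep -q "^$m " "$PROC_MODULES" || continue
        modprobe -r "$m" && removed="$removed $m"
    done
    # Writing "mem" suspends to RAM; the write returns once the system has resumed.
    echo mem > "$PM_STATE_FILE"
    # Reload only what was actually unloaded, in the same order.
    for m in $removed; do
        modprobe "$m"
    done
}
```

Only modules that were really loaded and successfully removed are reloaded afterwards, so the sketch does not load drivers the machine was not using before.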

Thanks,
Rafael

2009-01-12 09:16:08

by Maciej Rutecki

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/12 Rafael J. Wysocki <[email protected]>:

>
> Please check if unloading all of the USB controller modules before suspend
> helps.
>
> Thanks,
> Rafael
>

I tried removing usbhid, hid, ehci_hcd, uhci_hcd and psmouse (s2ram and
s2disk still don't work), but I had a problem unloading usbcore:
ERROR: Module usbcore is in use

lsmod output after trying to unload the modules:
Module                  Size  Used by
i915                  139656  1
drm                   149888  2 i915
tun                    10820  0
acpi_cpufreq            7532  0
xt_tcpudp               2848  20
xt_limit                1956  1
xt_state                1888  3
iptable_nat             4832  0
nf_nat                 18324  1 iptable_nat
nf_conntrack_ipv4      13324  6 iptable_nat,nf_nat
nf_conntrack           66248  4 xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1760  1 nf_conntrack_ipv4
iptable_filter          2496  1
ip_tables              11408  2 iptable_nat,iptable_filter
x_tables               15076  5 xt_tcpudp,xt_limit,xt_state,iptable_nat,ip_tables
ppdev                   7300  0
lp                      8708  0
aes_i586                7584  2
aes_generic            28160  1 aes_i586
cbc                     3264  2
dm_crypt               12356  1
dm_mod                 51716  5 dm_crypt
nvram                   7116  0
fuse                   54268  1
coretemp                5728  0
it87                   20080  0
hwmon_vid               2976  1 it87
loop                   14316  0
snd_hda_codec_realtek   188164  1
snd_hda_intel          23752  1
snd_hda_codec          59712  2 snd_hda_codec_realtek,snd_hda_intel
snd_pcm_oss            38560  0
snd_mixer_oss          14752  1 snd_pcm_oss
snd_pcm                74404  3 snd_hda_intel,snd_hda_codec,snd_pcm_oss
snd_seq_dummy           2596  0
snd_seq_oss            30208  0
snd_seq_midi            6144  0
snd_rawmidi            21088  1 snd_seq_midi
snd_seq_midi_event      6880  2 snd_seq_oss,snd_seq_midi
snd_seq                49808  6 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_seq_midi_event
snd_timer              20680  2 snd_pcm,snd_seq
snd_seq_device          6860  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_rawmidi,snd_seq
snd                    55588  13 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_seq_oss,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
8139too                23488  0
soundcore               7232  1 snd
rtc_cmos               10380  0
8139cp                 18976  0
i2c_i801                8592  0
iTCO_wdt               10500  0
snd_page_alloc          8808  2 snd_hda_intel,snd_pcm
button                  5840  0
rtc_core               16668  1 rtc_cmos
rtc_lib                 2912  1 rtc_core
r8169                  31652  0
mii                     5344  3 8139too,8139cp,r8169
usbcore               146800  1
parport_pc             21636  1
parport                23424  3 ppdev,lp,parport_pc
evdev                   9472  2

How can I (safely) force-unload the usbcore module?

--
Maciej Rutecki
http://www.maciek.unixy.pl

2009-01-12 09:23:21

by Oliver Neukum

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

Am Monday 12 January 2009 10:15:48 schrieb Maciej Rutecki:
> I try remowe usbhid, hid, ehci_hcd, uhci_hcd, psmouse (s2ram and
> s2disk doesn't work), but I had problem with unloading usbcore:
> ERROR: Module usbcore is in use

That's OK. Without the controller drivers, usbcore does nothing.
Your problem is not related to USB.

(Most likely it was busy because you had usbfs mounted.)
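A quick way to check that guess, sketched under the assumption that /proc/mounts lists each mount as device, mount point and filesystem type in its first three fields. The function name is hypothetical:

```shell
#!/bin/sh
# Sketch: list usbfs mount points, which would keep usbcore "in use".
# Assumes the /proc/mounts layout: device, mount point, fs type, ...
MOUNTS_FILE=/proc/mounts

usbfs_mountpoints() {
    # Print field 2 for every usbfs entry; exit status 1 if there is none.
    awk '$3 == "usbfs" { print $2; found = 1 } END { exit !found }' "$MOUNTS_FILE"
}
```

If it prints a mount point (traditionally /proc/bus/usb), unmounting it should drop that reference to usbcore; as noted above, though, unloading usbcore itself is not needed for this test.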

Regards
Oliver

2009-01-12 12:14:59

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/12 Rafael J. Wysocki <[email protected]>:
> On Sunday 11 January 2009, Zdenek Kabelac wrote:
>> 2009/1/11 Maciej Rutecki <[email protected]>:
>> > 2009/1/11 Zdenek Kabelac <[email protected]>:
>> >> Hi
>> >>
>> >> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
>> >>
>> >> I've observed that suspend is unusable - it goes to sleep - the sleep
>> >> LED is on. After few secs system turns on back itself - and stays in
>> >> some frozen state
>> >
>> > A have similar situation, one difference: I get blank screen during
>> > resume from suspend to ram. Also sometimes, like You, system turns on
>> > back itself.
>> >
>>
>> So it looks like reverting this commit:
>>
>> http://marc.info/?l=linux-kernel&m=123140019117968&w=4
>> (6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
>> (which is already a tracked regression)
>> fixes the problem with auto-resume
>>
>> But the problem with deadlock in the resume phase is still there.
>
> Please check if unloading all of the USB controller modules before suspend
> helps.

I've booted into single-user mode without the usbcore module (so loading
any other USB module fails; I removed it from the initramdisk as well).

This time the resume stops after these 3 lines (I'm using the
no_console_suspend kernel option):

....
thinkpad_acpi thinkpad_acpi: EARLY resume
thinkpad_hwmon thinkpad_hwmon: EARLY resume
Enabling non-boot CPUs...

After this it stays deadlocked, with the sleep LED still on.

Zdenek

2009-01-12 12:41:18

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

On Monday 12 January 2009, Zdenek Kabelac wrote:
> 2009/1/12 Rafael J. Wysocki <[email protected]>:
> > On Sunday 11 January 2009, Zdenek Kabelac wrote:
> >> 2009/1/11 Maciej Rutecki <[email protected]>:
> >> > 2009/1/11 Zdenek Kabelac <[email protected]>:
> >> >> Hi
> >> >>
> >> >> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
> >> >>
> >> >> I've observed that suspend is unusable - it goes to sleep - the sleep
> >> >> LED is on. After few secs system turns on back itself - and stays in
> >> >> some frozen state
> >> >
> >> > A have similar situation, one difference: I get blank screen during
> >> > resume from suspend to ram. Also sometimes, like You, system turns on
> >> > back itself.
> >> >
> >>
> >> So it looks like reverting this commit:
> >>
> >> http://marc.info/?l=linux-kernel&m=123140019117968&w=4
> >> (6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
> >> (which is already a tracked regression)
> >> fixes the problem with auto-resume
> >>
> >> But the problem with deadlock in the resume phase is still there.
> >
> > Please check if unloading all of the USB controller modules before suspend
> > helps.
>
> I've booted to single mode without usbcore module (thus any load of
> other usb modules fails)
> (removed from initramdisk as well)
>
> This time the resume stops with these 3 lines (I'm using
> no_console_suspend kernel option):
>
> ....
> thinkpad_acpi thinkpad_acpi: EARLY resume
> thinkpad_hwmon thinkpad_hwmon: EARLY resume
> Enabling non-boot CPUs...

So it seems we have broken CPU hotplug again.

Does disabling/enabling CPU1 using
/sys/devices/system/cpu/cpu1/online work?

If it does, please boot with 'no_console_suspend' on the kernel command line
and run:

# echo core > /sys/power/pm_test
# echo 8 > /proc/sys/kernel/printk
# echo mem > /sys/power/state

and see what happens (you need to have PM_DEBUG set in the kernel .config).

Please send dmesg output generated right after the above (if it works).
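Both checks can be wrapped in a small sketch. It assumes root, CONFIG_PM_DEBUG for pm_test, and a hotpluggable CPU1; the function names and the SYSFS/PRINTK_FILE variables are hypothetical and exist only so the paths can be redirected. Resetting pm_test to "none" afterwards is an addition, so that the next real suspend is not another test run:

```shell
#!/bin/sh
# Sketch of the two checks; run as root. SYSFS and PRINTK_FILE are
# hypothetical knobs so the paths can be redirected.
SYSFS=/sys
PRINTK_FILE=/proc/sys/kernel/printk

# Check 1: take CPU $1 offline and bring it back online.
cpu_hotplug_cycle() {
    online="$SYSFS/devices/system/cpu/cpu$1/online"
    [ -w "$online" ] || { echo "cpu$1: no writable online file" >&2; return 1; }
    echo 0 > "$online" && echo 1 > "$online" || return 1
    [ "$(cat "$online")" = 1 ]
}

# Check 2: one suspend cycle with pm_test set to "core" (needs CONFIG_PM_DEBUG)
# and console loglevel 8, then reset pm_test so later suspends are real again.
pm_test_core() {
    echo core > "$SYSFS/power/pm_test" || return 1
    echo 8 > "$PRINTK_FILE"
    echo mem > "$SYSFS/power/state"
    rc=$?
    echo none > "$SYSFS/power/pm_test"
    return $rc
}
```

Running "cpu_hotplug_cycle 1 && pm_test_core" follows the order asked for above: the pm_test run only happens if CPU1 can be taken down and brought back.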

Thanks,
Rafael

2009-01-12 12:50:29

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/12 Rafael J. Wysocki <[email protected]>:
> On Monday 12 January 2009, Zdenek Kabelac wrote:
>> 2009/1/12 Rafael J. Wysocki <[email protected]>:
>> > On Sunday 11 January 2009, Zdenek Kabelac wrote:
>> >> 2009/1/11 Maciej Rutecki <[email protected]>:
>> >> > 2009/1/11 Zdenek Kabelac <[email protected]>:
>> >> >> Hi
>> >> >>
>> >> >> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
>> >> >>
>> >> >> I've observed that suspend is unusable - it goes to sleep - the sleep
>> >> >> LED is on. After few secs system turns on back itself - and stays in
>> >> >> some frozen state
>> >> >
>> >> > A have similar situation, one difference: I get blank screen during
>> >> > resume from suspend to ram. Also sometimes, like You, system turns on
>> >> > back itself.
>> >> >
>> >>
>> >> So it looks like reverting this commit:
>> >>
>> >> http://marc.info/?l=linux-kernel&m=123140019117968&w=4
>> >> (6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
>> >> (which is already a tracked regression)
>> >> fixes the problem with auto-resume
>> >>
>> >> But the problem with deadlock in the resume phase is still there.
>> >
>> > Please check if unloading all of the USB controller modules before suspend
>> > helps.
>>
>> I've booted to single mode without usbcore module (thus any load of
>> other usb modules fails)
>> (removed from initramdisk as well)
>>
>> This time the resume stops with these 3 lines (I'm using
>> no_console_suspend kernel option):
>>
>> ....
>> thinkpad_acpi thinkpad_acpi: EARLY resume
>> thinkpad_hwmon thinkpad_hwmon: EARLY resume
>> Enabling non-boot CPUs...
>
> So it seems we have broken CPU hotplug again.
>
> Does disabling/enabling CPU1 using
> /sys/devices/system/cpu/cpu1/online work?
>
> If it does, please boot with 'no_console_suspend' in the kernel command line,
> run:
>
> # echo core > /sys/power/pm_test
> # echo 8 > /proc/sys/kernel/printk
> # echo mem > /sys/power/state
>
> and see what happens (you need to have PM_DEBUG set in the kernel .config).
>
> Please send dmesg output generated right after the above (if it works).


I've taken the idea of reverting this patch from another of Ingo's threads:

7503bfbae89eba07b46441a5d1594647f6b8ab7d

With this patch reverted and the o/e/uhci_hcd and usbhid modules removed
before pm-suspend (usbcore can stay loaded; I haven't traced which of those
USB modules causes the problem), my system resumes properly again.

Zdenek

PS: I'll do the above 'echo' trace later (I'm busy right now).

2009-01-12 17:19:56

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

On Monday 12 January 2009, Zdenek Kabelac wrote:
> 2009/1/12 Rafael J. Wysocki <[email protected]>:
> > On Monday 12 January 2009, Zdenek Kabelac wrote:
> >> 2009/1/12 Rafael J. Wysocki <[email protected]>:
> >> > On Sunday 11 January 2009, Zdenek Kabelac wrote:
> >> >> 2009/1/11 Maciej Rutecki <[email protected]>:
> >> >> > 2009/1/11 Zdenek Kabelac <[email protected]>:
> >> >> >> Hi
> >> >> >>
> >> >> >> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
> >> >> >>
> >> >> >> I've observed that suspend is unusable - it goes to sleep - the sleep
> >> >> >> LED is on. After few secs system turns on back itself - and stays in
> >> >> >> some frozen state
> >> >> >
> >> >> > A have similar situation, one difference: I get blank screen during
> >> >> > resume from suspend to ram. Also sometimes, like You, system turns on
> >> >> > back itself.
> >> >> >
> >> >>
> >> >> So it looks like reverting this commit:
> >> >>
> >> >> http://marc.info/?l=linux-kernel&m=123140019117968&w=4
> >> >> (6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
> >> >> (which is already a tracked regression)
> >> >> fixes the problem with auto-resume
> >> >>
> >> >> But the problem with deadlock in the resume phase is still there.
> >> >
> >> > Please check if unloading all of the USB controller modules before suspend
> >> > helps.
> >>
> >> I've booted to single mode without usbcore module (thus any load of
> >> other usb modules fails)
> >> (removed from initramdisk as well)
> >>
> >> This time the resume stops with these 3 lines (I'm using
> >> no_console_suspend kernel option):
> >>
> >> ....
> >> thinkpad_acpi thinkpad_acpi: EARLY resume
> >> thinkpad_hwmon thinkpad_hwmon: EARLY resume
> >> Enabling non-boot CPUs...
> >
> > So it seems we have broken CPU hotplug again.
> >
> > Does disabling/enabling CPU1 using
> > /sys/devices/system/cpu/cpu1/online work?
> >
> > If it does, please boot with 'no_console_suspend' in the kernel command line,
> > run:
> >
> > # echo core > /sys/power/pm_test
> > # echo 8 > /proc/sys/kernel/printk
> > # echo mem > /sys/power/state
> >
> > and see what happens (you need to have PM_DEBUG set in the kernel .config).
> >
> > Please send dmesg output generated right after the above (if it works).
>
>
> I've taken from another Ingo's thread the idea to revert patch:
>
> 7503bfbae89eba07b46441a5d1594647f6b8ab7d
>
>
> With this patch reverted and o/e/uhci_hcd & usbhid modules removed
> before pm-suspend
> (usbcore could be loaded, I've not trace which one of those usb
> modules makes the problem)
> my system resumes properl again.

Sure, good idea. I've been running with this reverted recently.

> PS: I'll do the above 'echo' trace later (being busy right now).

That shouldn't be necessary if you can suspend-resume with
7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted and the USB controller
modules unloaded.

Instead, with 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted, please write
'disabled' to the /sys/devices/.../power/wakeup files of all USB controllers
and see if suspend-resume works in this configuration.
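A sketch of that step, assuming the USB host controllers are the PCI devices bound to the ehci_hcd, ohci_hcd and uhci_hcd drivers (the 2.6.29-era driver names) and that each exposes a power/wakeup attribute. The function name is hypothetical:

```shell
#!/bin/sh
# Sketch: set power/wakeup to "disabled" for every USB host controller.
# Assumes the controllers are the PCI devices bound to these drivers
# (2.6.29-era names; later kernels use e.g. ehci-pci). Run as root.
SYSFS=/sys
USB_HCD_DRIVERS="ehci_hcd ohci_hcd uhci_hcd"

disable_usb_hcd_wakeup() {
    for drv in $USB_HCD_DRIVERS; do
        # Entries named like 0000:00:1d.7 are links to the bound PCI devices.
        for dev in "$SYSFS/bus/pci/drivers/$drv"/*:*; do
            # Skips devices without the attribute and unmatched glob patterns.
            [ -e "$dev/power/wakeup" ] || continue
            echo disabled > "$dev/power/wakeup"
            echo "$dev: $(cat "$dev/power/wakeup")"
        done
    done
}
```

This targets the controllers themselves rather than every path containing "usb", so the root hubs and devices below them keep their own wakeup settings.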

Thanks,
Rafael

2009-01-13 14:05:05

by Michal Hocko

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

Hi,

I don't have a Lenovo T61, but I seem to be experiencing a similar resume
problem on a Fujitsu Siemens E series notebook, except for the auto-resume
issue, which I have never seen.

On Mon 12-01-09 13:50:12, Zdenek Kabelac wrote:
> 2009/1/12 Rafael J. Wysocki <[email protected]>:
> > On Monday 12 January 2009, Zdenek Kabelac wrote:
> >> 2009/1/12 Rafael J. Wysocki <[email protected]>:
> >> > On Sunday 11 January 2009, Zdenek Kabelac wrote:
> >> >> 2009/1/11 Maciej Rutecki <[email protected]>:
> >> >> > 2009/1/11 Zdenek Kabelac <[email protected]>:
> >> >> >> Hi
> >> >> >>
> >> >> >> I've booted and tested 2.6.29-rc1 (c59765042f53a79a7a65585042ff463b69cb248c)
> >> >> >>
> >> >> >> I've observed that suspend is unusable - it goes to sleep - the sleep
> >> >> >> LED is on. After few secs system turns on back itself - and stays in
> >> >> >> some frozen state
> >> >> >
> >> >> > A have similar situation, one difference: I get blank screen during
> >> >> > resume from suspend to ram. Also sometimes, like You, system turns on
> >> >> > back itself.
> >> >> >
> >> >>
> >> >> So it looks like reverting this commit:
> >> >>
> >> >> http://marc.info/?l=linux-kernel&m=123140019117968&w=4
> >> >> (6fd9086a518d4f14213a32fe6c9ac17fabebbc1e)
> >> >> (which is already a tracked regression)
> >> >> fixes the problem with auto-resume
> >> >>
> >> >> But the problem with deadlock in the resume phase is still there.
> >> >
> >> > Please check if unloading all of the USB controller modules before suspend
> >> > helps.
> >>
> >> I've booted to single mode without usbcore module (thus any load of
> >> other usb modules fails)
> >> (removed from initramdisk as well)
> >>
> >> This time the resume stops with these 3 lines (I'm using
> >> no_console_suspend kernel option):
> >>
> >> ....
> >> thinkpad_acpi thinkpad_acpi: EARLY resume
> >> thinkpad_hwmon thinkpad_hwmon: EARLY resume
> >> Enabling non-boot CPUs...
> >
> > So it seems we have broken CPU hotplug again.
> >
> > Does disabling/enabling CPU1 using
> > /sys/devices/system/cpu/cpu1/online work?
> >
> > If it does, please boot with 'no_console_suspend' in the kernel command line,
> > run:
> >
> > # echo core > /sys/power/pm_test
> > # echo 8 > /proc/sys/kernel/printk
> > # echo mem > /sys/power/state
> >
> > and see what happens (you need to have PM_DEBUG set in the kernel .config).
> >
> > Please send dmesg output generated right after the above (if it works).
>
>
> I've taken from another Ingo's thread the idea to revert patch:
>
> 7503bfbae89eba07b46441a5d1594647f6b8ab7d
>
>
> With this patch reverted and o/e/uhci_hcd & usbhid modules removed
> before pm-suspend
> (usbcore could be loaded, I've not trace which one of those usb
> modules makes the problem)
> my system resumes properl again.

Reverting this patch got my machine back up from s2r (if I suspend from X),
but the screen stays blank until I switch to a text console and back.

I have tried to run the test suggested by Rafael, but I am not able to
resume (see the attached suspend & resume log, with SysRq+T output at the end).
I can see the following BUG during resume:
[...]
platform dock.1: LATE suspend
platform dock.0: LATE suspend
suspend debug: Waiting for 5 seconds.
BUG: using smp_processor_id() in preemptible [00000000] code: sh/2497
caller is retrigger_next_event+0x12/0xa5
Pid: 2497, comm: sh Tainted: G W 2.6.29-rc1-resume-fix #8
Call Trace:
[<c0401662>] ? printk+0xf/0x11
[<c023148a>] debug_smp_processor_id+0xa2/0xb8
[<c0137f10>] retrigger_next_event+0x12/0xa5
[<c0403d79>] ? _spin_unlock+0xf/0x23
[<c0138059>] hres_timers_resume+0xa/0xc
[<c013b7ee>] timekeeping_resume+0xf9/0xff
[<c028d600>] __sysdev_resume+0x14/0x38
[<c028d645>] sysdev_resume+0x21/0x54
[<c0291e90>] device_power_up+0xb/0x15
[<c0146fd2>] suspend_devices_and_enter+0xf7/0x159
[<c0147189>] enter_state+0x130/0x190
[<c0147278>] state_store+0x8f/0xa2
[<c01471e9>] ? state_store+0x0/0xa2
[<c0229505>] kobj_attr_store+0x1a/0x22
[<c01b4e07>] sysfs_write_file+0xb4/0xdf
[<c01b4d53>] ? sysfs_write_file+0x0/0xdf
[<c017de95>] vfs_write+0x8a/0x12e
[<c017dfd2>] sys_write+0x3b/0x60
[<c0102f71>] sysenter_do_call+0x12/0x25
[...]

I am not able to resume from the text console (e.g. booting with
init=/bin/sh results in a blank screen), and when I resume from X and switch
to a text console I cannot see anything (X itself seems to work correctly).

>
> Zdenek
>
> PS: I'll do the above 'echo' trace later (being busy right now).

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic


Attachments:
(No filename) (4.71 kB)
resume-fix-debug.log (51.99 kB)
config-2.6.29-rc1-resume-fix (59.34 kB)

2009-01-13 22:37:01

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/12 Rafael J. Wysocki <[email protected]>:
> On Monday 12 January 2009, Zdenek Kabelac wrote:

> Sure, good idea. I've been running with this reverted recently.
>
>> PS: I'll do the above 'echo' trace later (being busy right now).
>
> That shouldn't be necessary if you can suspend-resume with
> 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted and the USB controller
> modules unloaded.
>
> Instead, with 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted, please write
> 'disabled' to the /sys/devices/.../power/wakeup files of all USB controllers
> and see if suspend-resume works in this configuration.
>

Hi

So I've gone through the files listed by
'find /sys/devices | grep usb | grep power/wakeup' and there was no difference.
I've updated to the latest git to stay in sync
(e0b325d310a6b11f1538413fd557d2eb98f2fae5).
I'm still keeping commit 6fd9086a518d4f14213a32fe6c9ac17fabebbc1e reverted.

And I've figured out that 'modprobe -r ehci_hcd' alone is enough to keep my
suspend/resume sequence working. (Though I have to say that it now takes a
fairly noticeable time before the keyboard and Synaptics touchpad become
usable, but that might be connected with my move to evdev and HAL... :) )
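Since the suspend here goes through pm-suspend, this workaround could be made persistent with pm-utils' SUSPEND_MODULES setting, assuming a pm-utils version that supports it. The file name under /etc/pm/config.d/ is arbitrary:

```shell
# /etc/pm/config.d/unload-ehci (sourced by pm-utils; the file name is arbitrary)
# pm-utils' modules hook unloads these before suspend and reloads them on resume.
SUSPEND_MODULES="ehci_hcd"
```

This keeps the workaround out of any hand-written suspend script, so it also applies when suspend is triggered from the desktop.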

So I'm adding David to Cc: maybe he has some suspect patches for ehci_hcd?
(Doing a bisect across such a broken merge window would probably give me a
lot of unstable kernels these days....)

Zdenek

2009-01-13 22:41:59

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenovo T61

2009/1/13 Zdenek Kabelac <[email protected]>:
> 2009/1/12 Rafael J. Wysocki <[email protected]>:
>> On Monday 12 January 2009, Zdenek Kabelac wrote:
>
>> Sure, good idea. I've been running with this reverted recently.
>>
>>> PS: I'll do the above 'echo' trace later (being busy right now).
>>
>> That shouldn't be necessary if you can suspend-resume with
>> 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted and the USB controller
>> modules unloaded.
>>
>> Instead, with 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted, please write
>> 'disabled' to the /sys/devices/.../power/wakeup files of all USB controllers
>> and see if suspend-resume works in this configuration.
>>
>
> Hi
>
> So I've check some find /sys/device | grep usb | grep power/wakeup
> and there was no difference.
> I've updated to latest git to be in sync
> (e0b325d310a6b11f1538413fd557d2eb98f2fae5)
> I'm still keeping reverted commit: 6fd9086a518d4f14213a32fe6c9ac17fabebbc1e.
>
> And I've figured out - the only 'modprobe -r ehci_hcd' is enough to
> keep my suspend/resume sequence working. (Though I would have say,
> that now it takes fairly noticable time to get keyboard and synaptics
> usable - but it might be connected with my move to evdev and hal... :)
> )
>
> So I'm adding cc: to David - maybe he has some suspected patches for
> ehci_hcd ? (as doing a bisect in such a broken merge window is going
> to give me probably a lot of unsable kernels nowdays....)
>

And I forgot to append the trace from suspend/resume with the INFO trace
(which might be part of the problem??):

acpi PNP0C14:00: legacy suspend
acpi device:2a: legacy suspend
acpi device:29: legacy suspend
acpi device:28: legacy suspend
acpi device:27: legacy suspend
acpi device:26: legacy suspend
acpi device:25: legacy suspend
acpi device:24: legacy suspend
acpi device:23: legacy suspend
acpi device:22: legacy suspend
acpi device:21: legacy suspend
acpi device:20: legacy suspend
acpi device:1f: legacy suspend
acpi device:1e: legacy suspend
acpi device:1d: legacy suspend
acpi device:1c: legacy suspend
acpi device:1b: legacy suspend
acpi device:1a: legacy suspend
acpi device:19: legacy suspend
acpi device:18: legacy suspend
acpi device:17: legacy suspend
acpi device:16: legacy suspend
acpi device:15: legacy suspend
acpi device:14: legacy suspend
acpi device:13: legacy suspend
acpi device:12: legacy suspend
acpi device:11: legacy suspend
acpi device:10: legacy suspend
acpi device:0f: legacy suspend
acpi device:0e: legacy suspend
acpi device:0d: legacy suspend
acpi device:0c: legacy suspend
acpi device:0b: legacy suspend
acpi device:0a: legacy suspend
acpi device:09: legacy suspend
acpi device:08: legacy suspend
acpi device:07: legacy suspend
acpi device:06: legacy suspend
acpi device:05: legacy suspend
acpi device:04: legacy suspend
acpi device:03: legacy suspend
acpi device:02: legacy suspend
thinkpad_hotkey IBM0068:00: legacy suspend
ac ACPI0003:00: legacy suspend
battery PNP0C0A:00: legacy suspend
power LNXPOWER:00: legacy suspend
ec PNP0C09:00: legacy suspend
acpi ATM1200:00: legacy suspend
acpi IBM0057:00: legacy suspend
acpi PNP0303:00: legacy suspend
acpi PNP0B00:00: legacy suspend
acpi PNP0C04:00: legacy suspend
acpi PNP0800:00: legacy suspend
acpi PNP0200:00: legacy suspend
acpi PNP0103:00: legacy suspend
acpi PNP0100:00: legacy suspend
acpi PNP0000:00: legacy suspend
acpi PNP0C02:00: legacy suspend
acpi device:01: legacy suspend
pci_root PNP0A08:00: legacy suspend
button PNP0C0E:00: legacy suspend
button PNP0C0D:00: legacy suspend
acpi PNP0C01:00: legacy suspend
pci_link PNP0C0F:07: legacy suspend
pci_link PNP0C0F:06: legacy suspend
pci_link PNP0C0F:05: legacy suspend
pci_link PNP0C0F:04: legacy suspend
pci_link PNP0C0F:03: legacy suspend
pci_link PNP0C0F:02: legacy suspend
pci_link PNP0C0F:01: legacy suspend
pci_link PNP0C0F:00: legacy suspend
acpi device:00: legacy suspend
processor ACPI_CPU:01: legacy suspend
processor ACPI_CPU:00: legacy suspend
button LNXPWRBN:00: legacy suspend
acpi LNXSYSTM:00: legacy suspend
ACPI: Preparing to enter system sleep state S3
Disabling non-boot CPUs ...

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-rc1 #9
-------------------------------------------------------
pm-suspend/1734 is trying to acquire lock:
(&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<ffffffff80499fab>]
lock_policy_rwsem_write+0x4b/0x90

but task is already holding lock:
(&cpu_hotplug.lock){--..}, at: [<ffffffff80246742>] cpu_hotplug_begin+0x22/0x60

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&cpu_hotplug.lock){--..}:
[<ffffffff80270bf6>] __lock_acquire+0x1416/0x1db0
[<ffffffff80271621>] lock_acquire+0x91/0xc0
[<ffffffff8053d66c>] mutex_lock_nested+0xec/0x360
[<ffffffff8024695a>] get_online_cpus+0x3a/0x50
[<ffffffff802593c7>] work_on_cpu+0x67/0xb0
[<ffffffff8021e85e>] get_measured_perf+0x1e/0xb0
[<ffffffff80499bf8>] __cpufreq_driver_getavg+0x78/0x80
[<ffffffff8049bc7c>] do_dbs_timer+0x2ac/0x390
[<ffffffff80258535>] run_workqueue+0x105/0x240
[<ffffffff8025871f>] worker_thread+0xaf/0x130
[<ffffffff8025cc59>] kthread+0x49/0x90
[<ffffffff8020d5fa>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
[<ffffffff80270dbc>] __lock_acquire+0x15dc/0x1db0
[<ffffffff80271621>] lock_acquire+0x91/0xc0
[<ffffffff8053df78>] down_write+0x48/0x80
[<ffffffff80499fab>] lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8053b57d>] cpufreq_cpu_callback+0x6d/0x90
[<ffffffff80542f85>] notifier_call_chain+0x65/0xa0
[<ffffffff80261b39>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8052a129>] _cpu_down+0xa9/0x2f0
[<ffffffff8024683d>] disable_nonboot_cpus+0xbd/0x140
[<ffffffff8027e715>] suspend_devices_and_enter+0x115/0x1c0
[<ffffffff8027e956>] enter_state+0x166/0x1e0
[<ffffffff8027ea8a>] state_store+0xba/0x100
[<ffffffff803a1cb7>] kobj_attr_store+0x17/0x20
[<ffffffff8033512a>] sysfs_write_file+0xca/0x140
[<ffffffff802db22b>] vfs_write+0xcb/0x190
[<ffffffff802db3e0>] sys_write+0x50/0x90
[<ffffffff8020c51b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

4 locks held by pm-suspend/1734:
#0: (&buffer->mutex){--..}, at: [<ffffffff803350a3>]
sysfs_write_file+0x43/0x140
#1: (pm_mutex){--..}, at: [<ffffffff8027e861>] enter_state+0x71/0x1e0
#2: (cpu_add_remove_lock){--..}, at: [<ffffffff802467d3>]
disable_nonboot_cpus+0x53/0x140
#3: (&cpu_hotplug.lock){--..}, at: [<ffffffff80246742>]
cpu_hotplug_begin+0x22/0x60

stack backtrace:
Pid: 1734, comm: pm-suspend Not tainted 2.6.29-rc1 #9
Call Trace:
[<ffffffff8026f340>] print_circular_bug_tail+0xe0/0xf0
[<ffffffff80270dbc>] __lock_acquire+0x15dc/0x1db0
[<ffffffff8053f752>] ? _spin_unlock_irq+0x32/0x50
[<ffffffff8026ecba>] ? trace_hardirqs_on_caller+0x16a/0x1d0
[<ffffffff803a0539>] ? __next_cpu+0x19/0x30
[<ffffffff80258de9>] ? __cancel_work_timer+0x129/0x230
[<ffffffff80258d32>] ? __cancel_work_timer+0x72/0x230
[<ffffffff80271621>] lock_acquire+0x91/0xc0
[<ffffffff80499fab>] ? lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8053df78>] down_write+0x48/0x80
[<ffffffff80499fab>] ? lock_policy_rwsem_write+0x4b/0x90
[<ffffffff80499fab>] lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8053b57d>] cpufreq_cpu_callback+0x6d/0x90
[<ffffffff80542f85>] notifier_call_chain+0x65/0xa0
[<ffffffff80261b39>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8052a129>] _cpu_down+0xa9/0x2f0
[<ffffffff8024683d>] disable_nonboot_cpus+0xbd/0x140
[<ffffffff8027e715>] suspend_devices_and_enter+0x115/0x1c0
[<ffffffff8027e956>] enter_state+0x166/0x1e0
[<ffffffff8027ea8a>] state_store+0xba/0x100
[<ffffffff803a1cb7>] kobj_attr_store+0x17/0x20
[<ffffffff8033512a>] sysfs_write_file+0xca/0x140
[<ffffffff802db22b>] vfs_write+0xcb/0x190
[<ffffffff802db3e0>] sys_write+0x50/0x90
[<ffffffff8020c51b>] system_call_fastpath+0x16/0x1b
CPU 1 is now offline
lockdep: fixing up alternatives.
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
CPU1 is down
thinkpad_hwmon thinkpad_hwmon: LATE suspend
thinkpad_acpi thinkpad_acpi: LATE suspend
platform regulatory.0: LATE suspend
iTCO_wdt iTCO_wdt: LATE suspend
platform hdaps: LATE suspend
i8042 i8042: LATE suspend
serial8250 serial8250: LATE suspend
platform vesafb.0: LATE suspend
platform pcspkr: LATE suspend
pci 0000:15:00.5: LATE suspend
pci 0000:15:00.4: LATE suspend
pci 0000:15:00.3: LATE suspend
sdhci-pci 0000:15:00.2: LATE suspend
pci 0000:15:00.0: LATE suspend
iwl3945 0000:03:00.0: LATE suspend
i801_smbus 0000:00:1f.3: LATE suspend
ahci 0000:00:1f.2: LATE suspend
ata_piix 0000:00:1f.1: LATE suspend
pci 0000:00:1f.0: LATE suspend
pci 0000:00:1e.0: LATE suspend
pci 0000:00:1d.7: LATE suspend
uhci_hcd 0000:00:1d.2: LATE suspend
uhci_hcd 0000:00:1d.2: --> PCI D0 legacy
uhci_hcd 0000:00:1d.1: LATE suspend
uhci_hcd 0000:00:1d.1: --> PCI D0 legacy
uhci_hcd 0000:00:1d.0: LATE suspend
uhci_hcd 0000:00:1d.0: --> PCI D0 legacy
pcieport-driver 0000:00:1c.4: LATE suspend
pcieport-driver 0000:00:1c.3: LATE suspend
pcieport-driver 0000:00:1c.2: LATE suspend
pcieport-driver 0000:00:1c.1: LATE suspend
pcieport-driver 0000:00:1c.0: LATE suspend
HDA Intel 0000:00:1b.0: LATE suspend
pci 0000:00:1a.7: LATE suspend
uhci_hcd 0000:00:1a.1: LATE suspend
uhci_hcd 0000:00:1a.1: --> PCI D0 legacy
uhci_hcd 0000:00:1a.0: LATE suspend
uhci_hcd 0000:00:1a.0: --> PCI D0 legacy
e1000e 0000:00:19.0: LATE suspend, may wakeup
pci 0000:00:02.1: LATE suspend
pci 0000:00:02.0: LATE suspend
agpgart-intel 0000:00:00.0: LATE suspend
platform dock.2: LATE suspend
platform dock.1: LATE suspend
platform dock.0: LATE suspend
Extended CMOS year: 2000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Back to C!
Extended CMOS year: 2000
platform dock.0: EARLY resume
platform dock.1: EARLY resume
platform dock.2: EARLY resume
agpgart-intel 0000:00:00.0: EARLY resume
pci 0000:00:02.0: EARLY resume
pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900007,
writing 0x900403)
pci 0000:00:02.1: EARLY resume
pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000,
writing 0x900007)
e1000e 0000:00:19.0: EARLY resume
uhci_hcd 0000:00:1a.0: EARLY resume
uhci_hcd 0000:00:1a.0: PCI legacy resume
uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was
0x2800005, writing 0x2800001)
uhci_hcd 0000:00:1a.1: EARLY resume
uhci_hcd 0000:00:1a.1: PCI legacy resume
uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was
0x2800005, writing 0x2800001)
pci 0000:00:1a.7: EARLY resume
pci 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900106,
writing 0x2900102)
HDA Intel 0000:00:1b.0: EARLY resume
HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was
0x100106, writing 0x100102)
pcieport-driver 0000:00:1c.0: EARLY resume
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7
(was 0x20002020, writing 0x2020)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1
(was 0x100107, writing 0x100507)
pcieport-driver 0000:00:1c.1: EARLY resume
pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1
(was 0x100107, writing 0x100507)
pcieport-driver 0000:00:1c.2: EARLY resume
pcieport-driver 0000:00:1c.2: restoring config space at offset 0x7
(was 0x20004040, writing 0x4040)
pcieport-driver 0000:00:1c.2: restoring config space at offset 0x1
(was 0x100107, writing 0x100507)
pcieport-driver 0000:00:1c.3: EARLY resume
pcieport-driver 0000:00:1c.3: restoring config space at offset 0x7
(was 0x20005050, writing 0x5050)
pcieport-driver 0000:00:1c.3: restoring config space at offset 0x1
(was 0x100107, writing 0x100507)
pcieport-driver 0000:00:1c.4: EARLY resume
pcieport-driver 0000:00:1c.4: restoring config space at offset 0x7
(was 0x20006060, writing 0x6060)
pcieport-driver 0000:00:1c.4: restoring config space at offset 0x1
(was 0x100107, writing 0x100507)
uhci_hcd 0000:00:1d.0: EARLY resume
uhci_hcd 0000:00:1d.0: PCI legacy resume
uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was
0x2800005, writing 0x2800001)
uhci_hcd 0000:00:1d.1: EARLY resume
uhci_hcd 0000:00:1d.1: PCI legacy resume
uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was
0x2800005, writing 0x2800001)
uhci_hcd 0000:00:1d.2: EARLY resume
uhci_hcd 0000:00:1d.2: PCI legacy resume
uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was
0x2800005, writing 0x2800001)
pci 0000:00:1d.7: EARLY resume
pci 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900106,
writing 0x2900102)
pci 0000:00:1e.0: EARLY resume
pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100005,
writing 0x100003)
pci 0000:00:1f.0: EARLY resume
ata_piix 0000:00:1f.1: EARLY resume
ahci 0000:00:1f.2: EARLY resume
i801_smbus 0000:00:1f.3: EARLY resume
iwl3945 0000:03:00.0: EARLY resume
pci 0000:15:00.0: EARLY resume
sdhci-pci 0000:15:00.2: EARLY resume
pci 0000:15:00.3: EARLY resume
pci 0000:15:00.4: EARLY resume
pci 0000:15:00.5: EARLY resume
platform pcspkr: EARLY resume
platform vesafb.0: EARLY resume
serial8250 serial8250: EARLY resume
i8042 i8042: EARLY resume
platform hdaps: EARLY resume
iTCO_wdt iTCO_wdt: EARLY resume
platform regulatory.0: EARLY resume
thinkpad_acpi thinkpad_acpi: EARLY resume
thinkpad_hwmon thinkpad_hwmon: EARLY resume
Enabling non-boot CPUs ...
lockdep: fixing up alternatives.
SMP alternatives: switching to SMP code
Booting processor 1 APIC 0x1 ip 0x6000
Initializing CPU#1



> Zdenek
>

2009-01-19 09:55:22

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/13 Zdenek Kabelac <[email protected]>:
> 2009/1/13 Zdenek Kabelac <[email protected]>:
>> 2009/1/12 Rafael J. Wysocki <[email protected]>:
>>> On Monday 12 January 2009, Zdenek Kabelac wrote:
>>
>>> Sure, good idea. I've been running with this reverted recently.
>>>
>>>> PS: I'll do the above 'echo' trace later (being busy right now).
>>>
>>> That shouldn't be necessary if you can suspend-resume with
>>> 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted and the USB controller
>>> modules unloaded.
>>>
>>> Instead, with 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted, please write
>>> 'disabled' to the /sys/devices/.../power/wakeup files of all USB controllers
>>> and see if suspend-resume works in this configuration.
>>>
>>
>> Hi
>>
>> So I've tried the files from 'find /sys/devices | grep usb | grep power/wakeup'
>> and there was no difference.
>> I've updated to the latest git to be in sync
>> (e0b325d310a6b11f1538413fd557d2eb98f2fae5).
>> I'm still keeping commit 6fd9086a518d4f14213a32fe6c9ac17fabebbc1e reverted.
>>
>> And I've figured out that just 'modprobe -r ehci_hcd' is enough to
>> keep my suspend/resume sequence working. (Though I would say that it
>> now takes a fairly noticeable time for the keyboard and Synaptics
>> touchpad to become usable, but that might be connected with my move
>> to evdev and HAL... :) )
>>
>> So I'm adding David to Cc: maybe he has some suspect patches for
>> ehci_hcd? (Bisecting in such a broken merge window is probably going
>> to give me a lot of unstable kernels nowadays...)
>>
>
> And I forgot to append the suspend/resume log with the INFO trace
> (which might be part of the problem?):
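Rafael's suggestion above (write 'disabled' to the power/wakeup file of every USB controller) can be applied in one pass. The C sketch below is hypothetical and not from the thread: instead of grepping paths for "usb", it assumes the PCI sysfs layout in which each device directory holds a "class" file such as "0x0c0320", treats base class 0x0c / subclass 0x03 as a USB host controller, and writes to the power/wakeup file next to it. The function name disable_usb_controller_wakeup() is made up for illustration; on a real system it would be called as root with "/sys/devices" as the root.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

static int usb_ctrl_count;

/* PCI base class 0x0c (serial bus), subclass 0x03 (USB host controller). */
static int is_usb_controller_class(const char *class_str)
{
	return strncmp(class_str, "0x0c03", 6) == 0;
}

static int visit(const char *path, const struct stat *sb, int type,
		 struct FTW *ftw)
{
	(void)sb;
	/* Only regular files literally named "class" are of interest. */
	if (type != FTW_F || strcmp(path + ftw->base, "class") != 0)
		return 0;

	char buf[32] = "";
	FILE *f = fopen(path, "r");
	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	if (!is_usb_controller_class(buf))
		return 0;

	/* Keep the directory part of path and append "power/wakeup". */
	char wakeup[PATH_MAX];
	snprintf(wakeup, sizeof(wakeup), "%.*spower/wakeup", ftw->base, path);
	f = fopen(wakeup, "w");
	if (!f)
		return 0;		/* no writable power/wakeup file here */
	fputs("disabled", f);
	if (fclose(f) == 0)		/* sysfs reports write errors at flush */
		usb_ctrl_count++;
	return 0;
}

/* Returns the number of controllers updated, or -1 if the walk fails. */
int disable_usb_controller_wakeup(const char *root)
{
	usb_ctrl_count = 0;
	/* FTW_PHYS: do not follow the many symlinks found under sysfs. */
	if (nftw(root, visit, 16, FTW_PHYS) != 0)
		return -1;
	return usb_ctrl_count;
}
```

Selecting controllers by PCI class matters because, on a typical layout, a grep for "usb" in the path matches the root hubs and attached USB devices (e.g. .../0000:00:1d.7/usb1/power/wakeup) but not the controller's own PCI directory (.../0000:00:1d.7/power/wakeup).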

Hi


Just an update for 2.6.29-rc2 (f3b8436ad9a8ad36b3c9fa1fe030c7f38e5d3d0b).

With this kernel I still have to keep commit
6fd9086a518d4f14213a32fe6c9ac17fabebbc1e reverted
(otherwise the machine wakes up again immediately after suspend).

I also keep the ehci_hcd module unloaded, and with that
suspend-resume seems to work.

I've tried the ideas from the thread "2.6.29-rc1: [SOLVED] thinkpad
problems during resume" (http://lkml.org/lkml/2009/1/17/181), but
they seem to produce some ugly Oopses with my configuration,
so for now I'm staying with my revert/ehci fix.

I also still get this INFO trace:
processor ACPI_CPU:01: legacy suspend
processor ACPI_CPU:00: legacy suspend
button LNXPWRBN:00: legacy suspend
acpi LNXSYSTM:00: legacy suspend
ACPI: Preparing to enter system sleep state S3
Disabling non-boot CPUs ...

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-rc2 #14
-------------------------------------------------------
pm-suspend/2873 is trying to acquire lock:
(&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<ffffffff8049a27b>]
lock_policy_rwsem_write+0x4b/0x90

but task is already holding lock:
(&cpu_hotplug.lock){--..}, at: [<ffffffff80246832>] cpu_hotplug_begin+0x22/0x60

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&cpu_hotplug.lock){--..}:
[<ffffffff80270ce6>] __lock_acquire+0x1416/0x1db0
[<ffffffff80271711>] lock_acquire+0x91/0xc0
[<ffffffff8053d99c>] mutex_lock_nested+0xec/0x360
[<ffffffff80246a4a>] get_online_cpus+0x3a/0x50
[<ffffffff802594b7>] work_on_cpu+0x67/0xb0
[<ffffffff8021e85e>] get_measured_perf+0x1e/0xb0
[<ffffffff80499ec8>] __cpufreq_driver_getavg+0x78/0x80
[<ffffffff8049bf4c>] do_dbs_timer+0x2ac/0x390
[<ffffffff80258625>] run_workqueue+0x105/0x240
[<ffffffff8025880f>] worker_thread+0xaf/0x130
[<ffffffff8025cd49>] kthread+0x49/0x90
[<ffffffff8020d5fa>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
[<ffffffff80270eac>] __lock_acquire+0x15dc/0x1db0
[<ffffffff80271711>] lock_acquire+0x91/0xc0
[<ffffffff8053e2a8>] down_write+0x48/0x80
[<ffffffff8049a27b>] lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8053b8ad>] cpufreq_cpu_callback+0x6d/0x90
[<ffffffff805432b5>] notifier_call_chain+0x65/0xa0
[<ffffffff80261c29>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8052a469>] _cpu_down+0xa9/0x2f0
[<ffffffff8024692d>] disable_nonboot_cpus+0xbd/0x140
[<ffffffff8027e805>] suspend_devices_and_enter+0x115/0x1c0
[<ffffffff8027ea46>] enter_state+0x166/0x1e0
[<ffffffff8027eb7a>] state_store+0xba/0x100
[<ffffffff803a1e07>] kobj_attr_store+0x17/0x20
[<ffffffff8033526a>] sysfs_write_file+0xca/0x140
[<ffffffff802db34b>] vfs_write+0xcb/0x190
[<ffffffff802db500>] sys_write+0x50/0x90
[<ffffffff8020c51b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

4 locks held by pm-suspend/2873:
#0: (&buffer->mutex){--..}, at: [<ffffffff803351e3>]
sysfs_write_file+0x43/0x140
#1: (pm_mutex){--..}, at: [<ffffffff8027e951>] enter_state+0x71/0x1e0
#2: (cpu_add_remove_lock){--..}, at: [<ffffffff802468c3>]
disable_nonboot_cpus+0x53/0x140
#3: (&cpu_hotplug.lock){--..}, at: [<ffffffff80246832>]
cpu_hotplug_begin+0x22/0x60

stack backtrace:
Pid: 2873, comm: pm-suspend Not tainted 2.6.29-rc2 #14
Call Trace:
[<ffffffff8026f430>] print_circular_bug_tail+0xe0/0xf0
[<ffffffff80270eac>] __lock_acquire+0x15dc/0x1db0
[<ffffffff8053fa82>] ? _spin_unlock_irq+0x32/0x50
[<ffffffff8026edaa>] ? trace_hardirqs_on_caller+0x16a/0x1d0
[<ffffffff803a0689>] ? __next_cpu+0x19/0x30
[<ffffffff80258ed9>] ? __cancel_work_timer+0x129/0x230
[<ffffffff80258e22>] ? __cancel_work_timer+0x72/0x230
[<ffffffff80271711>] lock_acquire+0x91/0xc0
[<ffffffff8049a27b>] ? lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8053e2a8>] down_write+0x48/0x80
[<ffffffff8049a27b>] ? lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8049a27b>] lock_policy_rwsem_write+0x4b/0x90
[<ffffffff8053b8ad>] cpufreq_cpu_callback+0x6d/0x90
[<ffffffff805432b5>] notifier_call_chain+0x65/0xa0
[<ffffffff80261c29>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8052a469>] _cpu_down+0xa9/0x2f0
[<ffffffff8024692d>] disable_nonboot_cpus+0xbd/0x140
[<ffffffff8027e805>] suspend_devices_and_enter+0x115/0x1c0
[<ffffffff8027ea46>] enter_state+0x166/0x1e0
[<ffffffff8027eb7a>] state_store+0xba/0x100
[<ffffffff803a1e07>] kobj_attr_store+0x17/0x20
[<ffffffff8033526a>] sysfs_write_file+0xca/0x140
[<ffffffff802db34b>] vfs_write+0xcb/0x190
[<ffffffff802db500>] sys_write+0x50/0x90
[<ffffffff8020c51b>] system_call_fastpath+0x16/0x1b
Broke affinity for irq 9
Broke affinity for irq 17
Broke affinity for irq 29
kvm: disabling virtualization on CPU1
CPU 1 is now offline
lockdep: fixing up alternatives.
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
CPU1 is down
thinkpad_hwmon thinkpad_hwmon: LATE suspend
thinkpad_acpi thinkpad_acpi: LATE suspend



Zdenek

2009-01-19 16:06:58

by Dmitry Adamushko

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/19 Zdenek Kabelac <[email protected]>:
> [...]
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&cpu_hotplug.lock){--..}:
> [<ffffffff80270ce6>] __lock_acquire+0x1416/0x1db0
> [<ffffffff80271711>] lock_acquire+0x91/0xc0
> [<ffffffff8053d99c>] mutex_lock_nested+0xec/0x360
> [<ffffffff80246a4a>] get_online_cpus+0x3a/0x50
> [<ffffffff802594b7>] work_on_cpu+0x67/0xb0
> [<ffffffff8021e85e>] get_measured_perf+0x1e/0xb0


Ingo,


It looks like e39ad415ac15116df213dfa2aa2a4f1b0857af9c should have
been reverted together with 7503bfbae89eba07b46441a5d1594647f6b8ab7d.

More generally, perhaps any "set_cpus_allowed_ptr() -> work_on_cpu()"
conversion that involves a cpu-hotplug callback path may lead to
similar reports (and possibly to lockups).
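To make the report concrete: lockdep had already seen cpu_hotplug.lock taken while cpu_policy_rwsem was held (entry #1, the ondemand timer path through get_measured_perf() -> work_on_cpu() -> get_online_cpus()), and now sees _cpu_down() take cpu_policy_rwsem while holding cpu_hotplug.lock (entry #0). The toy model below is not lockdep's actual implementation; it only records "acquired after" edges between two hypothetical lock IDs and refuses a new edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical IDs standing in for the two locks in the report. */
enum lock_id { CPU_POLICY_RWSEM, CPU_HOTPLUG_LOCK, NR_LOCKS };

/* acquired_after[a][b] is true once b was taken while a was held. */
static bool acquired_after[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (acquired_after[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'want' while holding 'held'".  Returns false when
 * 'want' already (transitively) comes before 'held', i.e. the new
 * edge would form a cycle: a possible ABBA deadlock.
 */
static bool note_acquire(int held, int want)
{
	bool seen[NR_LOCKS];

	memset(seen, 0, sizeof(seen));
	if (reachable(want, held, seen))
		return false;
	acquired_after[held][want] = true;
	return true;
}
```

Feeding it entry #1 first and entry #0 second reproduces the complaint; with the #1 edge absent, which is roughly what removing get_online_cpus() from work_on_cpu() would amount to, the hotplug path's ordering is accepted.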


--
Best regards,
Dmitry Adamushko

2009-01-19 16:13:32

by Ingo Molnar

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61


* Dmitry Adamushko <[email protected]> wrote:

> [...]
>
> Ingo,
>
>
> it looks like e39ad415ac15116df213dfa2aa2a4f1b0857af9c should have
> been reverted together with 7503bfbae89eba07b46441a5d1594647f6b8ab7d.
>
> In general, perhaps all "set_cpus_allowed_ptr() -> work_on_cpu()"
> conversions - if they involve any cpu-hotplug callback paths - may
> lead to similar reports (and possible lockups).

Guys, could you please try the patch below? It changes work_on_cpu() so
that it no longer depends on the kevent workqueue.

Ingo

---------------->
From e1d9ec6246a2668a5d037f529877efb7cf176af8 Mon Sep 17 00:00:00 2001
From: Rusty Russell <[email protected]>
Date: Fri, 16 Jan 2009 15:31:15 -0800
Subject: [PATCH] work_on_cpu: Use our own workqueue.

Impact: remove potential clashes with generic kevent workqueue

Annoyingly, some places we want to use work_on_cpu are already in
workqueues. As per Ingo's suggestion, we create a different workqueue
for work_on_cpu.

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
---
kernel/workqueue.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a35afdb..1f0c509 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -971,6 +971,8 @@ undo:
 }
 
 #ifdef CONFIG_SMP
+static struct workqueue_struct *work_on_cpu_wq __read_mostly;
+
 struct work_for_cpu {
 	struct work_struct work;
 	long (*fn)(void *);
@@ -1001,7 +1003,7 @@ long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
 	INIT_WORK(&wfc.work, do_work_for_cpu);
 	wfc.fn = fn;
 	wfc.arg = arg;
-	schedule_work_on(cpu, &wfc.work);
+	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
 	flush_work(&wfc.work);
 
 	return wfc.ret;
@@ -1019,4 +1021,8 @@ void __init init_workqueues(void)
 	hotcpu_notifier(workqueue_cpu_callback, 0);
 	keventd_wq = create_workqueue("events");
 	BUG_ON(!keventd_wq);
+#ifdef CONFIG_SMP
+	work_on_cpu_wq = create_workqueue("work_on_cpu");
+	BUG_ON(!work_on_cpu_wq);
+#endif
 }
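The commit message gives the motivation for a dedicated queue only briefly ("some places we want to use work_on_cpu are already in workqueues"). As a rough user-space analogy (pthreads; every name here is hypothetical, and real kernel workqueues are per-CPU and differ in detail), the sketch shows why a synchronous "queue it and wait" helper is safer with a queue of its own: a work item that queues more work on its own single-worker queue and then waits for it can never finish, because the only worker is the one waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct work {
	long (*fn)(void *);
	void *arg;
	long ret;
	bool done;
	struct work *next;
};

struct wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;	/* broadcast on new work, completion, stop */
	struct work *head, *tail;
	bool stop;
	pthread_t worker;	/* the queue's only worker thread */
};

static void *worker_main(void *p)
{
	struct wq *q = p;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (!q->head && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		if (!q->head)
			break;			/* stop requested, queue empty */
		struct work *w = q->head;
		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
		pthread_mutex_unlock(&q->lock);
		long ret = w->fn(w->arg);	/* run without the queue lock */
		pthread_mutex_lock(&q->lock);
		w->ret = ret;
		w->done = true;
		pthread_cond_broadcast(&q->cond);
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

static void wq_init(struct wq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
	q->head = q->tail = NULL;
	q->stop = false;
	pthread_create(&q->worker, NULL, worker_main, q);
}

/* Queue fn(arg) on q and wait for it, loosely like queue_work_on() + flush_work(). */
static long run_on(struct wq *q, long (*fn)(void *), void *arg)
{
	struct work w = { .fn = fn, .arg = arg };

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = &w;
	else
		q->head = &w;
	q->tail = &w;
	pthread_cond_broadcast(&q->cond);
	while (!w.done)
		pthread_cond_wait(&q->cond, &q->lock);
	pthread_mutex_unlock(&q->lock);
	return w.ret;
}

static void wq_destroy(struct wq *q)
{
	pthread_mutex_lock(&q->lock);
	q->stop = true;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
	pthread_join(q->worker, NULL);
	pthread_cond_destroy(&q->cond);
	pthread_mutex_destroy(&q->lock);
}

static struct wq events_wq;	/* stands in for the shared "events" queue */
static struct wq private_wq;	/* stands in for the new "work_on_cpu" queue */

static long double_it(void *arg)
{
	return *(long *)arg * 2;
}

/* A work item running on events_wq that needs a nested synchronous call. */
static long outer_work(void *arg)
{
	/* run_on(&events_wq, ...) here would never return: the only
	 * worker of events_wq is the thread executing this function. */
	return run_on(&private_wq, double_it, arg);
}
```

Whether this kind of reentrancy is what triggers Zdenek's report is not established in the thread; Dmitry's reply points at the get_online_cpus() call in work_on_cpu() instead.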

2009-01-19 16:42:16

by Dmitry Adamushko

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/19 Ingo Molnar <[email protected]>:
>
> * Dmitry Adamushko <[email protected]> wrote:
>
>> [...]
>
> Guys, could you please try the patch below? It improves work_on_cpu() to
> not be dependent on the kevent workqueue.

I guess the following patch should also be applied, since
get_online_cpus() is the culprit here:

[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu

the patch is available here:

http://lkml.indiana.edu/hypermail/linux/kernel/0901.2/00375.html

> [...]



--
Best regards,
Dmitry Adamushko

2009-01-19 16:45:17

by Ingo Molnar

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61


* Dmitry Adamushko <[email protected]> wrote:

> 2009/1/19 Ingo Molnar <[email protected]>:
> > [...]
>
> I guess, the following patch should also be applied (since
> get_online_cpus() is a culprit here):
>
> [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu
>
> the patch is available here:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0901.2/00375.html

yeah - also attached below.

Ingo
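The INFO report quoted above says that cpu_hotplug.lock is already recorded as depending on cpu_policy_rwsem (through the get_measured_perf() -> work_on_cpu() -> get_online_cpus() chain), while pm-suspend holds cpu_hotplug.lock and now wants cpu_policy_rwsem: the classic AB-BA pattern. The idea behind such a check can be sketched in a few lines of C. Every time a lock is taken while others are held, an ordering edge is recorded, and a warning is raised when a new edge would close a cycle. This is a simplified illustration under stated assumptions: one task, two lock classes named after the ones in the report, and strictly nested releases. It is not lockdep's actual implementation, and acquire()/release() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Two lock classes, named after the locks in the report (illustrative only). */
enum lock_class { POLICY_RWSEM, HOTPLUG_LOCK, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
    [POLICY_RWSEM] = "cpu_policy_rwsem",
    [HOTPLUG_LOCK] = "cpu_hotplug.lock",
};

/* dep[a][b] is true once some task has acquired b while holding a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Locks held by the single modelled task, innermost last. */
static int held[NR_CLASSES];
static int nr_held;

/* Depth-first search over the recorded dependencies. */
static bool reaches(int from, int to, bool seen[NR_CLASSES])
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_CLASSES; next++)
        if (dep[from][next] && !seen[next] && reaches(next, to, seen))
            return true;
    return false;
}

/* Returns false (and reports) if taking 'cls' now would close a cycle. */
static bool acquire(int cls)
{
    for (int i = 0; i < nr_held; i++) {
        bool seen[NR_CLASSES] = { false };
        /* The new edge would be held[i] -> cls; a cycle exists if cls
         * already reaches held[i] through recorded edges. */
        if (reaches(cls, held[i], seen)) {
            printf("possible circular locking dependency: holding %s, "
                   "acquiring %s\n", class_name[held[i]], class_name[cls]);
            return false;
        }
    }
    for (int i = 0; i < nr_held; i++)
        dep[held[i]][cls] = true;   /* remember: held[i] is taken before cls */
    held[nr_held++] = cls;
    return true;
}

/* Drop the innermost lock (releases are assumed to be strictly nested). */
static void release(void)
{
    if (nr_held > 0)
        nr_held--;
}
```

In the test sequence, the first acquisition order stands in for the cpufreq path (policy rwsem, then hotplug lock) and is accepted. The reverse order, which mirrors pm-suspend, is then the one flagged.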

From 68564a46976017496c2227660930d81240f82355 Mon Sep 17 00:00:00 2001
From: Rusty Russell <[email protected]>
Date: Fri, 16 Jan 2009 15:31:15 -0800
Subject: [PATCH] work_on_cpu: don't try to get_online_cpus() in work_on_cpu.

Impact: remove potential circular lock dependency with cpu hotplug lock

This has caused more problems than it solved, with a pile of cpu
hotplug locking issues.

Followup patches will get_online_cpus() in callers that need it, but
if they don't do it they're no worse than before when they were using
set_cpus_allowed without locking.

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
---
kernel/workqueue.c | 14 ++++----------
1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2f44583..a35afdb 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -991,8 +991,8 @@ static void do_work_for_cpu(struct work_struct *w)
* @fn: the function to run
* @arg: the function arg
*
- * This will return -EINVAL in the cpu is not online, or the return value
- * of @fn otherwise.
+ * This will return the value @fn returns.
+ * It is up to the caller to ensure that the cpu doesn't go offline.
*/
long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
{
@@ -1001,14 +1001,8 @@ long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
INIT_WORK(&wfc.work, do_work_for_cpu);
wfc.fn = fn;
wfc.arg = arg;
- get_online_cpus();
- if (unlikely(!cpu_online(cpu)))
- wfc.ret = -EINVAL;
- else {
- schedule_work_on(cpu, &wfc.work);
- flush_work(&wfc.work);
- }
- put_online_cpus();
+ schedule_work_on(cpu, &wfc.work);
+ flush_work(&wfc.work);

return wfc.ret;
}

2009-01-19 19:26:30

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

On Monday 19 January 2009, Ingo Molnar wrote:
>
> * Dmitry Adamushko <[email protected]> wrote:
>
> > [...]
> > I guess, the following patch should also be applied (since
> > get_online_cpus() is a culprit here):
> >
> > [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu
> >
> > the patch is available here:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.2/00375.html
>
> yeah - also attached below.

In fact I believe all three patches in the series at
http://lkml.org/lkml/2009/1/16/377
are necessary.

Thanks,
Rafael

2009-01-19 22:31:35

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/19 Rafael J. Wysocki <[email protected]>:
> On Monday 19 January 2009, Ingo Molnar wrote:
>>
>> * Dmitry Adamushko <[email protected]> wrote:
>>
>> > [...]
>> > I guess, the following patch should also be applied (since
>> > get_online_cpus() is a culprit here):
>> >
>> > [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu
>> >
>> > the patch is available here:
>> >
>> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.2/00375.html
>>
>> yeah - also attached below.
>
> In fact I believe all three patches in the series at
> http://lkml.org/lkml/2009/1/16/377
> are necessary.

So far I've made 2 tests - I've dropped the revert of the USB commit
(6fd9086a518d), which obviously results in an immediate wakeup
after suspend.

Ingo's first proposed patch resulted in this oops message (before suspend):

usb 1-1: uevent
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/power/state
CPU 0
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables bridge stp llc rfcomm sco l2cap
autofs4 sunrpc ipv6 binfmt_misc dm_snapshot dm_mirror dm_region_hash
dm_log dm_mod rtc_cmos rtc_core rtc_lib kvm_intel kvm i915 drm
i2c_algo_bit uinput arc4 ecb snd_hda_codec_analog cryptomgr aead
snd_hda_intel crypto_blkcipher btusb snd_hda_codec crypto_hash
crypto_algapi snd_seq_oss bluetooth iwl3945 sdhci_pci
snd_seq_midi_event sdhci mmc_core snd_seq snd_seq_device snd_pcm_oss
thinkpad_acpi snd_mixer_oss snd_pcm backlight snd_timer rfkill
led_class evdev snd i2c_i801 iTCO_wdt mac80211 button psmouse
soundcore e1000e i2c_core iTCO_vendor_support sr_mod battery serio_raw
nvram cdrom lib80211 ac intel_agp snd_page_alloc cfg80211 uhci_hcd
ohci_hcd usbcore [last unloaded: microcode]
Pid: 2244, comm: NetworkManager Not tainted 2.6.29-rc2 #15
RIP: 0010:[<ffffffff8053ccf1>] [<ffffffff8053ccf1>] wait_for_common+0x131/0x190
RSP: 0018:ffff88006ae6b730 EFLAGS: 00010296
RAX: 7fffffffffffffff RBX: ffff88006ae6b748 RCX: 0000000000000003
RDX: ffffffff80a8b1f0 RSI: ffff88006f908728 RDI: ffff88006f908000
RBP: ffff88006ae6b738 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88006f908728 R11: 0000000000000001 R12: ffff880079ca1d98
R13: 0000000000000000 R14: ffff880079d66000 R15: 0000000000000000
FS: 00007faf7fca9740(0000) GS:ffffffff80914040(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006c3444 CR3: 000000007c50d000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process NetworkManager (pid: 2244, threadinfo ffff88006ae6a000, task
ffff88006f908000)
Stack:
7d6ddc008053cde8 ffff8800ffff8800 ffffffff8025a695 ffff88006b050e90
ffffffff8025a580 0000000000000000 dead4ead00000303 ffffffffffffffff
ffffffffffffffff ffffffff80970c08 0000000000000000 ffffffff805f9e8e
Call Trace:
[<ffffffff8025a695>] synchronize_rcu+0x35/0x40
[<ffffffff8025a580>] ? wakeme_after_rcu+0x0/0x10
[<ffffffff8053cd2f>] ? wait_for_common+0x16f/0x190
[<ffffffff8024b364>] ? local_bh_enable+0xa4/0x110
[<ffffffff804c8c41>] ? dev_deactivate+0x151/0x1d0
[<ffffffff804b746d>] ? dev_close+0x6d/0xd0
[<ffffffffa0102042>] ? ieee80211_stop+0x562/0x570 [mac80211]
[<ffffffffa0101b59>] ? ieee80211_stop+0x79/0x570 [mac80211]
[<ffffffff8053fa4f>] ? _spin_unlock_bh+0x2f/0x40
[<ffffffff804c8c9a>] ? dev_deactivate+0x1aa/0x1d0
[<ffffffff804b747c>] ? dev_close+0x7c/0xd0
[<ffffffff804b703d>] ? dev_change_flags+0x9d/0x1e0
[<ffffffff804c0505>] ? do_setlink+0x2b5/0x440
[<ffffffff8053fa16>] ? _read_unlock+0x26/0x30
[<ffffffff804c0865>] ? rtnl_setlink+0x115/0x160
[<ffffffff8053db44>] ? mutex_lock_nested+0x284/0x360
[<ffffffff804c17da>] ? rtnetlink_rcv+0x1a/0x40
[<ffffffff804c198d>] ? rtnetlink_rcv_msg+0x18d/0x240
[<ffffffff804c1800>] ? rtnetlink_rcv_msg+0x0/0x240
[<ffffffff804cc7d9>] ? netlink_rcv_skb+0x89/0xb0
[<ffffffff804c17e9>] ? rtnetlink_rcv+0x29/0x40
[<ffffffff804cc1d4>] ? netlink_unicast+0x2c4/0x2e0
[<ffffffff804ae4de>] ? __alloc_skb+0x6e/0x150
[<ffffffff804cc404>] ? netlink_sendmsg+0x214/0x310
[<ffffffff804a5937>] ? sock_sendmsg+0x127/0x140
[<ffffffff8025d150>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8027218b>] ? lock_release_non_nested+0x9b/0x2e0
[<ffffffff802dbce6>] ? fget_light+0x106/0x110
[<ffffffff804a6697>] ? move_addr_to_kernel+0x57/0x60
[<ffffffff804af9af>] ? verify_iovec+0x3f/0xe0
[<ffffffff804a5ad9>] ? sys_sendmsg+0x189/0x320
[<ffffffff804a679f>] ? sys_sendto+0xff/0x120
[<ffffffff802f49da>] ? mntput_no_expire+0x2a/0x170
[<ffffffff802dbfba>] ? __fput+0x17a/0x1f0
[<ffffffff8026edba>] ? trace_hardirqs_on_caller+0x16a/0x1d0
[<ffffffff8029228e>] ? audit_syscall_entry+0x17e/0x1a0
[<ffffffff8053f5ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020c51b>] ? system_call_fastpath+0x16/0x1b
Code: 04 24 b8 01 00 00 00 48 0f 44 d8 4c 89 ef e8 87 2d 00 00 48 89
d8 4c 8b 65 e0 48 8b 5d d8 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 <c3>
66 0f 1f 44 00 00 e8 83 43 d1 ff 85 c0 75 90 0f 1f 80 00 00
RIP [<ffffffff8053ccf1>] wait_for_common+0x131/0x190
RSP <ffff88006ae6b730>
---[ end trace a2c62d53604aab23 ]---
(elapsed 0.04 seconds) done.
Freezing remaining freezable tasks ... <7>hub 1-0:1.0: debounce: port
1: total 100ms stable 100ms status 0x100
(elapsed 0.10 seconds) done.
PM: Entering mem sleep

The second one (From 68564a46976017496c2227660930d81240f82355)
creates the same fault.

So Rafael is probably right that a series of patches is
necessary, though I'd prefer to get a nice clean patch against the
current git to apply, as both of Ingo's patches
generated some rejects (solvable by hand).

Also - any idea how to fix my problem with 'ehci'? (So far I'm simply
removing this module, and I see no loss of usability.) And is
there a proposed fix for the auto-resume regression, or is reverting the patch
the right solution for the moment?

Zdenek
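Rafael's earlier suggestion, writing 'disabled' to the power/wakeup files of all USB controllers, can be automated instead of editing each file by hand. The following is a minimal C sketch under these assumptions:

- It uses the same filter as the find /sys/devices | grep usb | grep power/wakeup pipeline from the thread: a path that contains "usb" and ends in /power/wakeup.
- disable_usb_wakeup() is a hypothetical helper and must be run as root against /sys/devices.
- The controller's own PCI directory name may not contain "usb", so its power/wakeup file might not be matched. Check that file by hand.

```c
#define _XOPEN_SOURCE 700   /* for nftw() */
#include <ftw.h>
#include <stdio.h>
#include <string.h>

static int nr_disabled;   /* files successfully written during one walk */

/* Same idea as `find /sys/devices | grep usb | grep power/wakeup`,
 * except that the path must end in /power/wakeup. */
static int is_usb_wakeup_file(const char *path)
{
    static const char suffix[] = "/power/wakeup";
    size_t len = strlen(path);
    size_t slen = sizeof(suffix) - 1;

    return strstr(path, "usb") != NULL && len >= slen &&
           strcmp(path + len - slen, suffix) == 0;
}

/* nftw() callback: write "disabled" into every matching regular file. */
static int disable_one(const char *path, const struct stat *sb,
                       int type, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (type != FTW_F || !is_usb_wakeup_file(path))
        return 0;

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);   /* report and keep walking */
        return 0;
    }
    int ok = fputs("disabled\n", f) != EOF;
    /* stdio buffers the data, so a write rejected by the kernel
     * usually shows up as an fclose() failure. */
    if (fclose(f) != 0)
        ok = 0;
    if (ok)
        nr_disabled++;
    else
        perror(path);
    return 0;
}

/* Walk 'root' (normally "/sys/devices") without following symlinks.
 * Returns the number of files written, or -1 if the walk failed. */
static int disable_usb_wakeup(const char *root)
{
    nr_disabled = 0;
    if (nftw(root, disable_one, 16, FTW_PHYS) != 0)
        return -1;
    return nr_disabled;
}
```

FTW_PHYS keeps nftw() from following the many symlinks in sysfs. For example, disable_usb_wakeup("/sys/devices") returns how many wakeup files were switched to 'disabled'.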

2009-01-19 23:50:13

by Ingo Molnar

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61


* Zdenek Kabelac <[email protected]> wrote:

> 2009/1/19 Rafael J. Wysocki <[email protected]>:
> > On Monday 19 January 2009, Ingo Molnar wrote:
> >>
> >> * Dmitry Adamushko <[email protected]> wrote:
> >>
> >> > [...]
> >> > I guess, the following patch should also be applied (since
> >> > get_online_cpus() is a culprit here):
> >> >
> >> > [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu
> >> >
> >> > the patch is available here:
> >> >
> >> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.2/00375.html
> >>
> >> yeah - also attached below.
> >
> > In fact I believe all three patches in the series at
> > http://lkml.org/lkml/2009/1/16/377
> > are necessary.
>
> [...]
>
> So Rafael is probably right that a series of patches is
> necessary, though I'd prefer to get a nice clean patch against the
> current git to apply, as both of Ingo's patches
> generated some rejects (solvable by hand).

You can pull the current set of patches/fixes in this area via:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus

does that do the trick?

Ingo

2009-01-20 10:41:39

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/20 Ingo Molnar <[email protected]>:
>
> * Zdenek Kabelac <[email protected]> wrote:
>
>> 2009/1/19 Rafael J. Wysocki <[email protected]>:
>> > [...]
>>
>> The second one (From 68564a46976017496c2227660930d81240f82355)
>> creates the same fault.
>>
>> So Rafael is probably right that a series of patches is necessary,
>> though I'd prefer a single clean patch against current git to apply,
>> as both of Ingo's patches generated some rejects (solvable by hand).
>
> You can pull the current set of patches/fixes in this area via:
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus
>
> does that do the trick?
>
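Rafael's suggestion quoted above, writing 'disabled' to the power/wakeup file of every USB controller, can be sketched as a small shell function. This is an illustrative sketch, not something posted in the thread. It assumes the controllers can be located under /sys/bus/pci/devices by their PCI class code (0x0c03xx covers UHCI, OHCI and EHCI), and the function name disable_usb_wakeup is hypothetical. The directory to scan is a parameter, so the loop can be tried on a scratch tree before it is run as root against the real sysfs.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): disable wakeup on every PCI
# USB host controller. Assumes USB controllers are identified by PCI class
# code 0x0c03xx (UHCI 0x0c0300, OHCI 0x0c0310, EHCI 0x0c0320).
# Note: POSIX sh has no "local", so $root and $dev are global variables.
disable_usb_wakeup() {
    # Default to the real sysfs PCI device directory; override for testing.
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries without a readable class attribute.
        [ -r "$dev/class" ] || continue
        case "$(cat "$dev/class")" in
            0x0c03*)
                # Write only where a writable power/wakeup attribute exists.
                if [ -w "$dev/power/wakeup" ]; then
                    echo disabled > "$dev/power/wakeup"
                    echo "$dev: wakeup $(cat "$dev/power/wakeup")"
                fi
                ;;
        esac
    done
}
```

After sourcing the file as root, running `disable_usb_wakeup` with no argument prints one line per USB controller it touched, each ending in "wakeup disabled"; suspend/resume can then be retried in that configuration.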

OK, there are some changes, though I'll need to do more testing, so
I'll probably post again after checking what happens over a
couple of suspend/resume cycles.

But here is at least the first output from my log; it seems to be
related to my USB EHCI problem.
(Previously the machine usually died without logging this traceback.)
Also, please note that the message SPIN IRQ ALREADY DISABLED comes from
my own small debugging patch.
Usually, when I keep the ehci_hcd module loaded, my machine dies after
printing "Extended CMOS year: 2000", but this time it survived.


Broke affinity for irq 1
Broke affinity for irq 12
Broke affinity for irq 20
kvm: disabling virtualization on CPU1
CPU 1 is now offline
lockdep: fixing up alternatives.
SMP alternatives: switching to UP code
CPU1 is down
SPIN IRQ ALREADY DISABLED
Pid: 2982, comm: pm-suspend Not tainted 2.6.29-rc2 #17
Call Trace:
[<ffffffff8053ff37>] _spin_lock_irq+0x87/0x90
[<ffffffff80250976>] ? lock_timer_base+0x36/0x70
[<ffffffff8053bfbe>] schedule+0x13e/0x4ad
[<ffffffff80250c6e>] ? __mod_timer+0xbe/0xe0
[<ffffffff80250976>] ? lock_timer_base+0x36/0x70
[<ffffffff8026c56d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8053fc17>] ? _spin_unlock_irqrestore+0x57/0x70
[<ffffffff80250c6e>] ? __mod_timer+0xbe/0xe0
[<ffffffff8053cfa2>] schedule_timeout+0x62/0xd0
[<ffffffff80250500>] ? process_timeout+0x0/0x10
[<ffffffff8053cf9d>] ? schedule_timeout+0x5d/0xd0
[<ffffffff8053d029>] schedule_timeout_uninterruptible+0x19/0x20
[<ffffffff80250cad>] msleep+0x1d/0x30
[<ffffffff803b5058>] pci_set_power_state+0x268/0x300
[<ffffffffa00162bc>] usb_hcd_pci_suspend_late+0x6c/0x150 [usbcore]
[<ffffffff803b722f>] pci_legacy_suspend_late+0x2f/0x60
[<ffffffff803b745d>] pci_pm_suspend_noirq+0xad/0xc0
[<ffffffff80441ba2>] pm_noirq_op+0x162/0x1b0
[<ffffffff80442ae8>] device_power_down+0x48/0x180
[<ffffffff8027e8f6>] suspend_devices_and_enter+0x156/0x1c0
[<ffffffff8027eaf6>] enter_state+0x166/0x1e0
[<ffffffff8027ec2a>] state_store+0xba/0x100
[<ffffffff803a1eb7>] kobj_attr_store+0x17/0x20
[<ffffffff8033531a>] sysfs_write_file+0xca/0x140
[<ffffffff802db40b>] vfs_write+0xcb/0x190
[<ffffffff802db5c0>] sys_write+0x50/0x90
[<ffffffff8020c51b>] system_call_fastpath+0x16/0x1b
ehci_hcd 0000:00:1d.7: power state changed by ACPI to D3
ehci_hcd 0000:00:1a.7: power state changed by ACPI to D3
Extended CMOS year: 2000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Extended CMOS year: 2000

=================================
[ INFO: inconsistent lock state ]
2.6.29-rc2 #17
---------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
pm-suspend/2982 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&cpu_base->lock){++..}, at: [<ffffffff80260403>] retrigger_next_event+0x93/0xf0
{in-hardirq-W} state was registered at:
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 337381
hardirqs last enabled at (337381): [<ffffffff8053fc03>] _spin_unlock_irqrestore+0x43/0x70
hardirqs last disabled at (337380): [<ffffffff8053ff60>] _spin_lock_irqsave+0x20/0x90
softirqs last enabled at (336752): [<ffffffff8024b67a>] __do_softirq+0x12a/0x180
softirqs last disabled at (336595): [<ffffffff8020d6fc>] call_softirq+0x1c/0x50

other info that might help us debug this:
3 locks held by pm-suspend/2982:
#0: (&buffer->mutex){--..}, at: [<ffffffff80335293>] sysfs_write_file+0x43/0x140
#1: (pm_mutex){--..}, at: [<ffffffff8027ea01>] enter_state+0x71/0x1e0
#2: (dpm_list_mtx){--..}, at: [<ffffffff80441cb2>] device_pm_lock+0x12/0x20

stack backtrace:
Pid: 2982, comm: pm-suspend Not tainted 2.6.29-rc2 #17
Call Trace:
[<ffffffff8026db3e>] print_usage_bug+0x18e/0x1f0
[<ffffffff8026eb10>] mark_lock+0xc40/0xcb0
[<ffffffff802700d8>] __lock_acquire+0x758/0x1db0
[<ffffffff80265909>] ? getnstimeofday+0x59/0xe0
[<ffffffff8026ebd6>] ? mark_held_locks+0x56/0xa0
[<ffffffff8053fc03>] ? _spin_unlock_irqrestore+0x43/0x70
[<ffffffff8026ee5a>] ? trace_hardirqs_on_caller+0x16a/0x1d0
[<ffffffff802717c1>] lock_acquire+0x91/0xc0
[<ffffffff80260403>] ? retrigger_next_event+0x93/0xf0
[<ffffffff8053fe01>] _spin_lock+0x31/0x70
[<ffffffff80260403>] ? retrigger_next_event+0x93/0xf0
[<ffffffff80260403>] retrigger_next_event+0x93/0xf0
[<ffffffff8026059b>] hres_timers_resume+0xb/0x10
[<ffffffff8026511d>] timekeeping_resume+0xfd/0x140
[<ffffffff8043b1a0>] __sysdev_resume+0x20/0x60
[<ffffffff8043b269>] sysdev_resume+0x89/0x180
[<ffffffff80442c30>] device_power_up+0x10/0x20
[<ffffffff8027e914>] suspend_devices_and_enter+0x174/0x1c0
[<ffffffff8027eaf6>] enter_state+0x166/0x1e0
[<ffffffff8027ec2a>] state_store+0xba/0x100
[<ffffffff803a1eb7>] kobj_attr_store+0x17/0x20
[<ffffffff8033531a>] sysfs_write_file+0xca/0x140
[<ffffffff802db40b>] vfs_write+0xcb/0x190
[<ffffffff802db5c0>] sys_write+0x50/0x90
[<ffffffff8020c51b>] system_call_fastpath+0x16/0x1b
ehci_hcd 0000:00:1a.7: power state changed by ACPI to D0
ehci_hcd 0000:00:1d.7: power state changed by ACPI to D0
Enabling non-boot CPUs ...
lockdep: fixing up alternatives.
SMP alternatives: switching to SMP code
Booting processor 1 APIC 0x1 ip 0x6000
Initializing CPU#1

I no longer see the previous workqueue backtrace.

Zdenek

2009-01-20 11:48:20

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/20 Zdenek Kabelac <[email protected]>:
> 2009/1/20 Ingo Molnar <[email protected]>:
>>
>> * Zdenek Kabelac <[email protected]> wrote:
>>
>>> 2009/1/19 Rafael J. Wysocki <[email protected]>:
>>> > On Monday 19 January 2009, Ingo Molnar wrote:
>>> >>
>>> >> * Dmitry Adamushko <[email protected]> wrote:
>>> >>
>>> >> > 2009/1/19 Ingo Molnar <[email protected]>:
>>> >> > >
>>> >> > > * Dmitry Adamushko <[email protected]> wrote:
>>> >> > >
>>> >> > >> 2009/1/19 Zdenek Kabelac <[email protected]>:
>>> >> > >> > 2009/1/13 Zdenek Kabelac <[email protected]>:
>>> >> > >> >> 2009/1/13 Zdenek Kabelac <[email protected]>:
>>> >> > >> >>> 2009/1/12 Rafael J. Wysocki <[email protected]>:
>>> >> > >> >>>> On Monday 12 January 2009, Zdenek Kabelac wrote:
>>> >> > >> >>>
>>> >> > >> >>>> Sure, good idea. I've been running with this reverted recently.
>>> >> > >> >>>>
>>> >> > >> >>>>> PS: I'll do the above 'echo' trace later (being busy right now).
>>> >> > >> >>>>
>>> >> > >> >>>> That shouldn't be necessary if you can suspend-resume with
>>> >> > >> >>>> 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted and the USB controller
>>> >> > >> >>>> modules unloaded.
>>> >> > >> >>>>
>>> >> > >> >>>> Instead, with 7503bfbae89eba07b46441a5d1594647f6b8ab7d reverted, please write
>>> >> > >> >>>> 'disabled' to the /sys/devices/.../power/wakeup files of all USB controllers
>>> >> > >> >>>> and see if suspend-resume works in this configuration.
>>> >> > >> >>>>
>>> >> > >> >>>
>>>
>>> The second one (From 68564a46976017496c2227660930d81240f82355)
>>> creates the same fault.
>>>
>>> So Rafael is probably right that a series of patches is necessary,
>>> though I'd prefer a single clean patch against current git to apply,
>>> as both of Ingo's patches generated some rejects (solvable by hand).
>>
>> You can pull the current set of patches/fixes in this area via:
>>
>> git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus
>>
>> does that do the trick?
>>
>
> OK, there are some changes, though I'll need to do more testing, so
> I'll probably post again after checking what happens over a
> couple of suspend/resume cycles.
>
> But here is at least the first output from my log; it seems to be
> related to my USB EHCI problem.
> (Previously the machine usually died without logging this traceback.)
> Also, please note that the message SPIN IRQ ALREADY DISABLED comes from
> my own small debugging patch.
> Usually, when I keep the ehci_hcd module loaded, my machine dies after
> printing "Extended CMOS year: 2000", but this time it survived.
>
>
>
> I no longer see the previous workqueue backtrace.
>


OK, bad news: my previous suspend/resume tests were done in
runlevel 1, without networking.
With networking enabled, the error is back, so it might be a bug in
the ieee80211 stack?

NetworkManager: <info> (wlan0): now unmanaged
NetworkManager: <info> (wlan0): device state change: 3 -> 1
NetworkManager: <info> (wlan0): cleaning up...
NetworkManager: <info> (wlan0): taking down device.
PM: Syncing filesystems ... done.
Freezing user space processes ... <3>iwl3945: Error: Response NULL in 'REPLY_ADD_STA'
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/power/state
CPU 0
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables bridge stp llc sco l2cap bluetooth
autofs4 sunrpc ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_mod
rtc_cmos rtc_core rtc_lib kvm_intel kvm i915 drm i2c_algo_bit uinput
snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_seq_oss arc4 ecb
snd_seq_midi_event snd_seq cryptomgr snd_seq_device snd_pcm_oss aead
crypto_blkcipher crypto_hash snd_mixer_oss snd_pcm crypto_algapi
iwl3945 e1000e mac80211 snd_timer snd thinkpad_acpi soundcore i2c_i801
snd_page_alloc lib80211 psmouse rfkill evdev sdhci_pci sdhci mmc_core
button usbhid hid iTCO_wdt iTCO_vendor_support backlight nvram
led_class sr_mod cdrom intel_agp battery ac i2c_core cfg80211
serio_raw uhci_hcd ohci_hcd usbcore [last unloaded: microcode]
Pid: 2265, comm: NetworkManager Not tainted 2.6.29-rc2 #17
RIP: 0010:[<ffffffff8053ce01>] [<ffffffff8053ce01>] wait_for_common+0x131/0x190
RSP: 0018:ffff88006b509730 EFLAGS: 00010296
RAX: 7fffffffffffffff RBX: ffff88006b509748 RCX: 0000000000000003
RDX: ffffffff80a8d1f0 RSI: ffff88006b504b48 RDI: ffff88006b504420
RBP: ffff88006b509738 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88006b504b48 R11: 0000000000000001 R12: ffff88007ca52598
R13: 0000000000000000 R14: ffff88007a008000 R15: 0000000000000000
FS: 00007fcbe961f740(0000) GS:ffffffff80916040(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f172f1f9000 CR3: 000000006b6f3000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process NetworkManager (pid: 2265, threadinfo ffff88006b508000, task ffff88006b504420)
Stack:
6a2b5d008053cef8 ffff8800ffff8800 ffffffff8025a735 0000000000000000
ffffffff8025a620 0000000000000000 dead4ead00000303 ffffffffffffffff
ffffffffffffffff ffffffff80972c08 0000000000000000 ffffffff805faed6
Call Trace:
[<ffffffff8025a735>] synchronize_rcu+0x35/0x40
[<ffffffff8025a620>] ? wakeme_after_rcu+0x0/0x10
[<ffffffff8053ce3f>] ? wait_for_common+0x16f/0x190
[<ffffffff8024b424>] ? local_bh_enable+0xa4/0x110
[<ffffffff804c8ce1>] ? dev_deactivate+0x151/0x1d0
[<ffffffff804b750d>] ? dev_close+0x6d/0xd0
[<ffffffffa016b042>] ? ieee80211_stop+0x562/0x570 [mac80211]
[<ffffffffa016ab59>] ? ieee80211_stop+0x79/0x570 [mac80211]
[<ffffffff8053fb5f>] ? _spin_unlock_bh+0x2f/0x40
[<ffffffff804c8d3a>] ? dev_deactivate+0x1aa/0x1d0
[<ffffffff804b751c>] ? dev_close+0x7c/0xd0
[<ffffffff804b70dd>] ? dev_change_flags+0x9d/0x1e0
[<ffffffff804c05a5>] ? do_setlink+0x2b5/0x440
[<ffffffff8053fb26>] ? _read_unlock+0x26/0x30
[<ffffffff804c0905>] ? rtnl_setlink+0x115/0x160
[<ffffffff8053dc54>] ? mutex_lock_nested+0x284/0x360
[<ffffffff804c187a>] ? rtnetlink_rcv+0x1a/0x40
[<ffffffff804c1a2d>] ? rtnetlink_rcv_msg+0x18d/0x240
[<ffffffff804c18a0>] ? rtnetlink_rcv_msg+0x0/0x240
[<ffffffff804cc879>] ? netlink_rcv_skb+0x89/0xb0
[<ffffffff804c1889>] ? rtnetlink_rcv+0x29/0x40
[<ffffffff804cc274>] ? netlink_unicast+0x2c4/0x2e0
[<ffffffff804ae57e>] ? __alloc_skb+0x6e/0x150
[<ffffffff804cc4a4>] ? netlink_sendmsg+0x214/0x310
[<ffffffff804a59d7>] ? sock_sendmsg+0x127/0x140
[<ffffffff8025d1f0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8027222b>] ? lock_release_non_nested+0x9b/0x2e0
[<ffffffff802dbd96>] ? fget_light+0x106/0x110
[<ffffffff804a6737>] ? move_addr_to_kernel+0x57/0x60
[<ffffffff804afa4f>] ? verify_iovec+0x3f/0xe0
[<ffffffff804a5b79>] ? sys_sendmsg+0x189/0x320
[<ffffffff804a683f>] ? sys_sendto+0xff/0x120
[<ffffffff802f4a8a>] ? mntput_no_expire+0x2a/0x170
[<ffffffff802dc06a>] ? __fput+0x17a/0x1f0
[<ffffffff8026ee5a>] ? trace_hardirqs_on_caller+0x16a/0x1d0
[<ffffffff8029232e>] ? audit_syscall_entry+0x17e/0x1a0
[<ffffffff8053f6de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020c51b>] ? system_call_fastpath+0x16/0x1b
Code: 04 24 b8 01 00 00 00 48 0f 44 d8 4c 89 ef e8 87 2d 00 00 48 89
d8 4c 8b 65 e0 48 8b 5d d8 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 <c3>
66 0f 1f 44 00 00 e8 33 43 d1 ff 85 c0 75 90 0f 1f 80 00 00
RIP [<ffffffff8053ce01>] wait_for_common+0x131/0x190
RSP <ffff88006b509730>
---[ end trace c1a25abf9f5fc6e6 ]---
(elapsed 0.03 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.


Zdenek

2009-01-20 11:55:32

by Ingo Molnar

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61


(Cc:-ed Johannes Berg)

* Zdenek Kabelac <[email protected]> wrote:

> With networking enabled, the error is back, so it might be a bug in
> the ieee80211 stack?

yes, the crash implicates the ieee80211 stack [ieee80211_stop() in
net/mac80211/iface.c]:

> general protection fault: 0000 [#1] SMP
> RIP: 0010:[<ffffffff8053ce01>] [<ffffffff8053ce01>] wait_for_common+0x131/0x190
> Process NetworkManager (pid: 2265, threadinfo ffff88006b508000, task
> Call Trace:
> [<ffffffff8025a735>] synchronize_rcu+0x35/0x40
> [<ffffffff8025a620>] ? wakeme_after_rcu+0x0/0x10
> [<ffffffff8053ce3f>] ? wait_for_common+0x16f/0x190
> [<ffffffff8024b424>] ? local_bh_enable+0xa4/0x110
> [<ffffffff804c8ce1>] ? dev_deactivate+0x151/0x1d0
> [<ffffffff804b750d>] ? dev_close+0x6d/0xd0
> [<ffffffffa016b042>] ? ieee80211_stop+0x562/0x570 [mac80211]
> [<ffffffffa016ab59>] ? ieee80211_stop+0x79/0x570 [mac80211]
> [<ffffffff8053fb5f>] ? _spin_unlock_bh+0x2f/0x40
> [<ffffffff804c8d3a>] ? dev_deactivate+0x1aa/0x1d0
> [<ffffffff804b751c>] ? dev_close+0x7c/0xd0
> [<ffffffff804b70dd>] ? dev_change_flags+0x9d/0x1e0
> [<ffffffff804c05a5>] ? do_setlink+0x2b5/0x440
> [<ffffffff8053fb26>] ? _read_unlock+0x26/0x30
> [<ffffffff804c0905>] ? rtnl_setlink+0x115/0x160
> [<ffffffff8053dc54>] ? mutex_lock_nested+0x284/0x360
> [<ffffffff804c187a>] ? rtnetlink_rcv+0x1a/0x40
> [<ffffffff804c1a2d>] ? rtnetlink_rcv_msg+0x18d/0x240
> [<ffffffff804c18a0>] ? rtnetlink_rcv_msg+0x0/0x240
> [<ffffffff804cc879>] ? netlink_rcv_skb+0x89/0xb0
> [<ffffffff804c1889>] ? rtnetlink_rcv+0x29/0x40
> [<ffffffff804cc274>] ? netlink_unicast+0x2c4/0x2e0
> [<ffffffff804ae57e>] ? __alloc_skb+0x6e/0x150
> [<ffffffff804cc4a4>] ? netlink_sendmsg+0x214/0x310
> [<ffffffff804a59d7>] ? sock_sendmsg+0x127/0x140
> [<ffffffff8025d1f0>] ? autoremove_wake_function+0x0/0x40
> [<ffffffff8027222b>] ? lock_release_non_nested+0x9b/0x2e0
> [<ffffffff802dbd96>] ? fget_light+0x106/0x110
> [<ffffffff804a6737>] ? move_addr_to_kernel+0x57/0x60
> [<ffffffff804afa4f>] ? verify_iovec+0x3f/0xe0
> [<ffffffff804a5b79>] ? sys_sendmsg+0x189/0x320
> [<ffffffff804a683f>] ? sys_sendto+0xff/0x120
> [<ffffffff802f4a8a>] ? mntput_no_expire+0x2a/0x170
> [<ffffffff802dc06a>] ? __fput+0x17a/0x1f0
> [<ffffffff8026ee5a>] ? trace_hardirqs_on_caller+0x16a/0x1d0
> [<ffffffff8029232e>] ? audit_syscall_entry+0x17e/0x1a0
> [<ffffffff8053f6de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8020c51b>] ? system_call_fastpath+0x16/0x1b

Ingo

2009-01-22 15:14:55

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/20 Ingo Molnar <[email protected]>:
>
> (Cc:-ed Johannes Berg)
>
> * Zdenek Kabelac <[email protected]> wrote:
>
>> With networking enabled, the error is back, so it might be a bug in
>> the ieee80211 stack?
>
> yes, the crash implicates the ieee80211 stack [ieee80211_stop() in
> net/mac80211/iface.c]:
>
>> general protection fault: 0000 [#1] SMP
>> RIP: 0010:[<ffffffff8053ce01>] [<ffffffff8053ce01>] wait_for_common+0x131/0x190
>> Process NetworkManager (pid: 2265, threadinfo ffff88006b508000, task
>> Call Trace:
>> [<ffffffff8025a735>] synchronize_rcu+0x35/0x40
>> [<ffffffff8025a620>] ? wakeme_after_rcu+0x0/0x10
>> [<ffffffff8053ce3f>] ? wait_for_common+0x16f/0x190
>> [<ffffffff8024b424>] ? local_bh_enable+0xa4/0x110
>> [<ffffffff804c8ce1>] ? dev_deactivate+0x151/0x1d0
>> [<ffffffff804b750d>] ? dev_close+0x6d/0xd0
>> [<ffffffffa016b042>] ? ieee80211_stop+0x562/0x570 [mac80211]
>> [<ffffffffa016ab59>] ? ieee80211_stop+0x79/0x570 [mac80211]

Any progress on this problem?

Btw, I've tried your tip-latest branch
(52a4061e1d88ad242c8022f68df3686c3bc05159),
and this branch now resets my machine during resume (it suspends,
wakes up automatically and reboots).
So is this branch actually meant to be used/tested?

Zdenek

2009-01-22 21:17:56

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

On Thursday 22 January 2009, Zdenek Kabelac wrote:
> 2009/1/20 Ingo Molnar <[email protected]>:
> >
> > (Cc:-ed Johannes Berg)
> >
> > * Zdenek Kabelac <[email protected]> wrote:
> >
> >> With networking enabled, the error is back, so it might be a bug in
> >> the ieee80211 stack?
> >
> > yes, the crash implicates the ieee80211 stack [ieee80211_stop() in
> > net/mac80211/iface.c]:
> >
> >> general protection fault: 0000 [#1] SMP
> >> RIP: 0010:[<ffffffff8053ce01>] [<ffffffff8053ce01>] wait_for_common+0x131/0x190
> >> Process NetworkManager (pid: 2265, threadinfo ffff88006b508000, task
> >> Call Trace:
> >> [<ffffffff8025a735>] synchronize_rcu+0x35/0x40
> >> [<ffffffff8025a620>] ? wakeme_after_rcu+0x0/0x10
> >> [<ffffffff8053ce3f>] ? wait_for_common+0x16f/0x190
> >> [<ffffffff8024b424>] ? local_bh_enable+0xa4/0x110
> >> [<ffffffff804c8ce1>] ? dev_deactivate+0x151/0x1d0
> >> [<ffffffff804b750d>] ? dev_close+0x6d/0xd0
> >> [<ffffffffa016b042>] ? ieee80211_stop+0x562/0x570 [mac80211]
> >> [<ffffffffa016ab59>] ? ieee80211_stop+0x79/0x570 [mac80211]
>
> Any progress on this problem?
>
> Btw, I've tried your tip-latest branch
> (52a4061e1d88ad242c8022f68df3686c3bc05159),
> and this branch now resets my machine during resume (it suspends,
> wakes up automatically and reboots).
> So is this branch actually meant to be used/tested?

Apparently, it's missing the patch from
http://bugzilla.kernel.org/show_bug.cgi?id=12399

Thanks,
Rafael

2009-01-28 11:05:34

by Zdenek Kabelac

Subject: Re: 2.6.29-rc1 does not resume on Lenove T61

2009/1/22 Rafael J. Wysocki <[email protected]>:
> On Thursday 22 January 2009, Zdenek Kabelac wrote:
>> 2009/1/20 Ingo Molnar <[email protected]>:
>> >
>> > (Cc:-ed Johannes Berg)
>> >
>> > * Zdenek Kabelac <[email protected]> wrote:
>> >
>> >> With networking enabled, the error is back, so it might be a bug in
>> >> the ieee80211 stack?
>> >
>> > yes, the crash implicates the ieee80211 stack [ieee80211_stop() in
>> > net/mac80211/iface.c]:
>> >
>> Btw, I've tried your tip-latest branch
>> (52a4061e1d88ad242c8022f68df3686c3bc05159),
>> and this branch now resets my machine during resume (it suspends,
>> wakes up automatically and reboots).
>> So is this branch actually meant to be used/tested?
>
> Apparently, it's missing the patch from
> http://bugzilla.kernel.org/show_bug.cgi?id=12399

OK, I can confirm that kernel commit
e4a7ca29039e615ce13a61b9c6abfb2aa394e9a1 works properly across the
suspend/resume sequence: no auto-wakeup and no ugly GPF oops,
at least for now :)

Here is the only warning I now see in my dmesg:


pci 0000:15:00.0: suspend
pci 0000:15:00.0: PCI INT A disabled
iwl3945 0000:03:00.0: suspend
------------[ cut here ]------------
WARNING: at drivers/pci/pci-driver.c:368 pci_legacy_suspend+0xdd/0xf0()
Hardware name: 6464CTO
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter
ip_tables x_tables bridge stp llc sco l2cap bluetooth autofs4 sunrpc ipv6
binfmt_misc loop dm_mirror dm_region_hash dm_log dm_mod kvm_intel kvm i915
drm i2c_algo_bit uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec
arc4 ecb snd_seq_oss snd_seq_midi_event cryptomgr snd_seq aead
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm usbhid evdev
crypto_blkcipher sdhci_pci snd_timer crypto_hash rtc_cmos sdhci mmc_core
hid snd crypto_algapi iwl3945 mac80211 psmouse rtc_core button soundcore
thinkpad_acpi rfkill backlight nvram led_class sr_mod rtc_lib i2c_i801
i2c_core snd_page_alloc iTCO_wdt iTCO_vendor_support lib80211 serio_raw
e1000e cfg80211 intel_agp battery ac cdrom uhci_hcd ohci_hcd ehci_hcd
usbcore [last unloaded: microcode]
Pid: 2754, comm: pm-suspend Not tainted 2.6.29-rc2 #23
Call Trace:
[<ffffffff802472ef>] warn_slowpath+0xaf/0x110
[<ffffffff80252e60>] ? process_timeout+0x0/0x10
[<ffffffff80553227>] ? schedule_timeout+0x77/0xf0
[<ffffffff803bed65>] ? pci_bus_write_config_word+0x75/0x90
[<ffffffff805532be>] ? schedule_timeout_uninterruptible+0x1e/0x20
[<ffffffff803c2d7e>] ? pci_raw_set_power_state+0x15e/0x240
[<ffffffff803c2f07>] ? pci_set_power_state+0xa7/0x180
[<ffffffff803c5f1d>] pci_legacy_suspend+0xdd/0xf0
[<ffffffff803c60d5>] pci_pm_suspend+0xa5/0xb0
[<ffffffff804531c2>] pm_op+0x162/0x1b0
[<ffffffff80453acf>] device_suspend+0x47f/0x630
[<ffffffff802820d7>] suspend_devices_and_enter+0x47/0x1c0
[<ffffffff802823e6>] enter_state+0x166/0x1e0
[<ffffffff8028251a>] state_store+0xba/0x100
[<ffffffff803b0077>] kobj_attr_store+0x17/0x20
[<ffffffff803414cf>] sysfs_write_file+0xcf/0x140
[<ffffffff802e4e1b>] vfs_write+0xcb/0x190
[<ffffffff802e4fe5>] sys_write+0x55/0x90
[<ffffffff8020c70b>] system_call_fastpath+0x16/0x1b
---[ end trace 5626061031e81c26 ]---
i801_smbus 0000:00:1f.3: suspend
ahci 0000:00:1f.2: suspend


Zdenek