2005-02-10 12:46:53

by John M Flinchbaugh

Subject: Thinkpad R40 freezes after swsusp resume

I can suspend my R40 with swsusp, then boot it and resume fine most of
the time.

I'd say nearly 50% of the time though, the machine will freeze within 5
minutes of resuming.

SysRq doesn't work, no oops when in console mode, no network, no disk
activity, just frozen. Occasionally, I've seen a line or 2 of
pixels on my X screen get corrupted.

Here are some of the things I've tried adjusting:
pci=routeirq.
pci=noacpi (or whatever it is).
shutting down hotplug over suspend to disable USB.
disabling cpudynd and CPU frequency scaling.
...and probably a few other things i'm forgetting.
enabling lapic seemed to almost make it worse.
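
When cycling through boot options like the ones listed above, it helps to record which ones were actually in effect for each test boot. The following is only a minimal sketch: the names `TRACKED`, `parse_cmdline` and `tracked_settings` are made up for illustration, and the simple whitespace split assumes no quoted values containing spaces. On a running Linux system the input string would come from /proc/cmdline.

```python
# Parameter names mentioned in this thread; an illustrative set, extend as needed.
TRACKED = {"pci", "acpi", "lapic", "nolapic"}


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def tracked_settings(cmdline: str) -> dict:
    """Keep only the parameters relevant to the suspend/resume experiments."""
    return {k: v for k, v in parse_cmdline(cmdline).items() if k in TRACKED}


# Example with a made-up command line (device names are placeholders).
sample = "root=/dev/hda1 ro pci=routeirq lapic resume=/dev/hda2"
settings = tracked_settings(sample)
print(settings)
```

Logging this dictionary next to the outcome of each resume (froze or survived) gives a simple test matrix to look back on.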

is any common hardware or subsystem on these machines known to not
suspend and resume properly? i never see these freezes on a clean boot,
and if the machine survives 10 minutes after a resume, i'll not see it
freeze either. i've witnessed this with every kernel, since i got the
machine in september (2.6.8.1, 2.6.9, 2.6.10, 2.6.11-rc3).

and finally, i know no one's had answers for me on this yet, so what
can i do to locate the problem and debug this myself? is there a way to
get past these lockups to at least get a call trace to see where i'm
stuck?

thank you everyone.
--
John M Flinchbaugh
[email protected]



2005-02-10 18:31:35

by Pavel Machek

Subject: Re: Thinkpad R40 freezes after swsusp resume

Hi!

> I can suspend my R40 with swsusp, then boot it and resume fine most of
> the time.
>
> I'd say nearly 50% of the time though, the machine will freeze within 5
> minutes of resuming.
>
> SysRq doesn't work, no oops when in console mode, no network, no disk
> activity, just frozen. Occasionally, I've seen a line or 2 of
> pixels on my X screen get corrupted.
>
> Here are some of the things I've tried adjusting:
> pci=routeirq.
> pci=noacpi (or whatever it is).
> shutting down hotplug over suspend to disable USB.
> disabling cpudynd and CPU frequency scaling.
> ...and probably a few other things i'm forgetting.
> enabling lapic seemed to almost make it worse.

Try also acpi=off.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-02-11 17:11:00

by John M Flinchbaugh

Subject: Re: Thinkpad R40 freezes after swsusp resume

On Thu, Feb 10, 2005 at 07:31:14PM +0100, Pavel Machek wrote:
> Try also acpi=off.

i was hoping for a test that's a bit more granular. might it be
possible to disable suspect bits of the acpi code instead of all of it?
i'm open to applying and testing patches.

disabling all of acpi for a week or 2 (sometimes my notebook will go
4 days of daily suspending and resuming without trouble, now) doesn't
sound like fun. i like acpi events and information.

my latest test is to trim my suspend script. from earlier versions of
the swsusp code, i had been disabling laptop-mode, turning swappiness up
to 100%, saving the hwclock, etc. i've taken all that out, and i find
none of it is necessary anymore.

--
John M Flinchbaugh
[email protected]



2005-02-11 18:33:07

by Pavel Machek

Subject: Re: Thinkpad R40 freezes after swsusp resume

Hi!

> > Try also acpi=off.
>
> i was hoping for a test that's a bit more granular. might it be
> possible to disable suspect bits of the acpi code instead of all of it?
> i'm open to applying and testing patches.

Well, you'd have to write that code, I'd guess.

And I do not think you can really turn off thermal management once you
enter ACPI mode.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-02-15 13:59:54

by John M Flinchbaugh

Subject: Re: Thinkpad R40 freezes after swsusp resume

On Fri, Feb 11, 2005 at 07:32:41PM +0100, Pavel Machek wrote:
> > > Try also acpi=off.

> > i was hoping for a test that's a bit more granular. might it be
> > possible to disable suspect bits of the acpi code instead of all of it?
> > i'm open to applying and testing patches.

> Well, you'd have to write that code, I'd guess.
> And I do not think you can really turn off thermal management once you
> enter ACPI mode.

I've gotten 2.6.11-rc4 to freeze after a swsusp, so I'm testing acpi=off
now.

As Murphy's Law would have it, I usually get these lockups at
inopportune times when I really don't want to have to punch the power
button, like when I'm in a hurry trying to find something or during
long-running network backups. It also does it when sitting idle, so
this isn't a rule.

I've run most of a backup from an NFS mount to the local drive (for
about 10 minutes), stopped it, swsusp, ran another backup, and it's
looking fine so far.

To be sure that it's not going to freeze, I'd almost have to let it go
for a week, though, because sometimes I had just gotten lucky and not
seen the issue for up to 4 days at a time.
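
The "let it go for a week" reasoning can be put into rough numbers. This is only a sketch under a strong assumption: each resume freezes independently with a fixed probability, which the thread does not establish. The only figures taken from the thread are the "nearly 50%" estimate and the lucky runs of up to 4 days; the function names are hypothetical.

```python
import math


def clean_run_probability(p_freeze: float, resumes: int) -> float:
    """Chance of `resumes` consecutive clean resumes while the bug is still present."""
    return (1.0 - p_freeze) ** resumes


def resumes_needed(p_freeze: float, max_false_clean: float = 0.05) -> int:
    """Smallest number of clean resumes after which a still-present bug
    would have stayed hidden with probability <= max_false_clean."""
    return math.ceil(math.log(max_false_clean) / math.log(1.0 - p_freeze))


# Assumed per-resume freeze rate of 50%, one resume per day.
print(clean_run_probability(0.5, 4))
print(resumes_needed(0.5))
# A lower assumed rate needs a much longer clean run to be convincing.
print(resumes_needed(0.2))
```

Under these assumptions, four clean resumes in a row still happen by luck about 6% of the time at a 50% freeze rate, so a handful of good days with acpi=off is suggestive rather than conclusive.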

Assuming disabling ACPI causes my trouble to go away, what's my next
step in debugging this issue? I'd hate to just leave it at "Don't use
ACPI."

Thanks for your time.
--
John M Flinchbaugh
[email protected]



2005-02-16 16:33:52

by Reinhard Tartler

Subject: Re: Thinkpad R40 freezes after swsusp resume

John M Flinchbaugh wrote:
> I can suspend my R40 with swsusp, then boot it and resume fine most of
> the time.
>
> I'd say nearly 50% of the time though, the machine will freeze within 5
> minutes of resuming.
>
> SysRq doesn't work, no oops when in console mode, no network, no disk
> activity, just frozen. Occasionally, I've seen a line or 2 of
> pixels on my X screen get corrupted.

Do you happen to use the madwifi drivers? If you do, you might be
affected by a bad interaction between madwifi and the laptop mode patches.
This has been reported to Ubuntu at
https://bugzilla.ubuntu.com/show_bug.cgi?id=6108

Unfortunately, the only known fix up to now is to disable either madwifi
or laptop-mode. :(

I'm also affected by this bug, and find it very annoying. Please CC me
when replying.

Greetings,
Reinhard


2005-02-16 19:55:14

by John M Flinchbaugh

Subject: Re: Thinkpad R40 freezes after swsusp resume

On Wed, Feb 16, 2005 at 04:16:38PM +0000, Reinhard Tartler wrote:
> > I can suspend my R40 with swsusp, then boot it and resume fine most of
> > the time.
> > I'd say nearly 50% of the time though, the machine will freeze within 5
> > minutes of resuming.
> > SysRq doesn't work, no oops when in console mode, no network, no disk
> > activity, just frozen. Occasionally, I've seen a line or 2 of
> > pixels on my X screen get corrupted.
> Do you happen to use the madwifi drivers? If you are you might be
> affected by bad interaction from madwifi with laptop mode patches. This
> has been reported to ubuntu in
> https://bugzilla.ubuntu.com/show_bug.cgi?id=6108
>
> Unfortunately, the only known fix up to now is to disable either madwifi
> or laptop-mode. :(

I use the ipw2100.sf.net drivers, and I've gotten the freeze without
even having those modules loaded, so I doubt that's it.

I'm currently testing booting with acpi=off. While I've not had a
freeze yet (nearly 2 days), I may take your advice and test with
laptop-mode disabled as well, since I could definitely tolerate the loss
of laptop-mode better than the loss of my ACPI events and info.
--
John M Flinchbaugh
[email protected]



2005-02-22 14:22:32

by John M Flinchbaugh

Subject: Re: Thinkpad R40 freezes after swsusp resume

On Wed, Feb 16, 2005 at 02:59:40PM -0500, John M Flinchbaugh wrote:
> correcting the problem, so I can get swsusp and ACPI coexisting happily
> on my Thinkpad R40.
> Does anyone here on the ACPI list have some logical next steps for me to
> test?
> ----- Forwarded message from John M Flinchbaugh <[email protected]>
> As Murphy's Law would have it, I usually get these lockups at
> inopportune times when I really don't want to have to punch the power
> button, like when I'm in a hurry trying to find something or during
> long-running network backups. It also does it when sitting idle, so
> this isn't a rule.
>
> I've run most of a backup from an NFS mount to the local drive (for
> about 10 minutes), stopped it, swsusp, ran another backup, and it's
> looking fine so far.
>
> To be sure that it's not going to freeze, I'd almost have to let it go
> for a week, though, because sometimes I had just gotten lucky and not
> seen the issue for up to 4 days at a time.
> ----- End forwarded message -----

I've recompiled my 2.6.11-rc4 kernel without ACPI sleep states, and I've
enabled lots of debugging.

Upon swsusp, I see this oops:
[nosave pfn 0x38c]<7>[nosave pfn 0x38d]<3>Debug: sleeping function
called from invalid context at mm/slab.c:2082
in_atomic():0, irqs_disabled():1
[<c0102b07>] dump_stack+0x17/0x20
[<c0110e0c>] __might_sleep+0xac/0xc0
[<c01397ae>] kmem_cache_alloc+0x5e/0x60
[<c020dbf9>] acpi_pci_link_set+0x7a/0x24e
[<c020e292>] acpi_pci_link_resume+0x47/0x7d
[<c020e30d>] irqrouter_resume+0x45/0x6d
[<c0234357>] sysdev_resume+0xf7/0xfc
[<c02386e8>] device_power_up+0x8/0xe
[<c012e798>] swsusp_suspend+0x48/0x50
[<c012ebb1>] pm_suspend_disk+0x51/0xc0
[<c012d07a>] enter_state+0x8a/0x90
[<c012d1b3>] state_store+0xa3/0xaa
[<c0186ce7>] subsys_attr_store+0x37/0x40
[<c0186f4e>] flush_write_buffer+0x2e/0x40
[<c0186fcf>] sysfs_write_file+0x6f/0x90
[<c014f004>] vfs_write+0xa4/0x110
[<c014f121>] sys_write+0x41/0x70
[<c01025af>] syscall_call+0x7/0xb

Otherwise, the swsusp works and resumes.
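
For anyone comparing traces like the one above across kernels, the frames can be pulled out mechanically. This is a small sketch only: the regular expression assumes the 2.6-era "[<address>] symbol+0xoffset/0xsize" layout shown in this message, and the helper names are hypothetical. It makes it easy to see that the allocation flagged by __might_sleep was reached from acpi_pci_link_set while interrupts were disabled.

```python
import re

# Matches frames such as "[<c020dbf9>] acpi_pci_link_set+0x7a/0x24e".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace: str) -> list:
    """Return (address, symbol, offset) tuples, innermost frame first."""
    return [
        (int(addr, 16), sym, int(off, 16))
        for addr, sym, off, _size in FRAME_RE.findall(trace)
    ]


def caller_of(frames: list, symbol: str):
    """Symbol listed directly after `symbol` in the trace, or None if it is last."""
    names = [f[1] for f in frames]
    idx = names.index(symbol)
    return names[idx + 1] if idx + 1 < len(names) else None


# First five frames copied from the trace in this message.
SAMPLE = """\
 [<c0102b07>] dump_stack+0x17/0x20
 [<c0110e0c>] __might_sleep+0xac/0xc0
 [<c01397ae>] kmem_cache_alloc+0x5e/0x60
 [<c020dbf9>] acpi_pci_link_set+0x7a/0x24e
 [<c020e292>] acpi_pci_link_resume+0x47/0x7d
"""

frames = parse_frames(SAMPLE)
print(caller_of(frames, "kmem_cache_alloc"))
```

Stack dumps of this kind can contain stale entries, so the "caller" reported here is a hint for where to read the source, not proof of the call chain.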

After the resume, I see these messages:
Feb 22 09:00:54 navi kernel: osl-0958 [1385] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
Feb 22 09:04:29 navi kernel: osl-0958 [1734] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
Feb 22 09:07:53 navi kernel: osl-0958 [2076] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
Feb 22 09:12:09 navi kernel: osl-0958 [2534] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME

I haven't frozen the box yet, but I wouldn't be surprised if these
contribute to the conditions that cause the freeze.
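
To check whether the semaphore timeouts above recur on a regular schedule, the timestamps can be extracted and differenced. A minimal sketch, assuming classic syslog lines with an English month abbreviation and no year (so all entries are taken to fall within the same year); the function names are hypothetical.

```python
import re
from datetime import datetime

# Captures the syslog timestamp of kernel lines that mention os_wait_semaphore.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: .*os_wait_semaphore", re.M
)


def timeout_times(log: str) -> list:
    """Timestamps of os_wait_semaphore failure lines, in log order."""
    return [datetime.strptime(ts, "%b %d %H:%M:%S") for ts in LINE_RE.findall(log)]


def gaps_seconds(times: list) -> list:
    """Seconds elapsed between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The four messages quoted in this mail.
SAMPLE = """\
Feb 22 09:00:54 navi kernel: osl-0958 [1385] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
Feb 22 09:04:29 navi kernel: osl-0958 [1734] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
Feb 22 09:07:53 navi kernel: osl-0958 [2076] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
Feb 22 09:12:09 navi kernel: osl-0958 [2534] os_wait_semaphore : Failed to acquire semaphore[c14de5e0|1|0], AE_TIME
"""

gaps = gaps_seconds(timeout_times(SAMPLE))
print(gaps)  # [215, 204, 256]
```

For the four messages here the gaps come out at roughly three and a half to four minutes, which may point at something polling ACPI periodically; that interpretation is a guess, not something the log itself proves.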

Thanks.
--
John M Flinchbaugh
[email protected]

