LinuxLists.cc - APM lockups since 2.6.25

2008-03-19 23:08:40

Subject: APM lockups since 2.6.25

Hi

I have an old notebook with APM and I am experiencing occasional APMD
lockups after resume, with kernels 2.6.25rc1 and 2.6.25rc3. They didn't
happen with 2.6.24 or before.

The bug happens about once a week or so.

If you have any idea how to debug it, you can send me a test code.

The config is attached, the machine has only a text console, no windows.

Mar 19 04:03:45 gerlinda kernel: INFO: task apmd:2059 blocked for more than 120 seconds.
Mar 19 04:03:45 gerlinda kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 04:03:45 gerlinda kernel: apmd D c7af3720 2556 2059 1
Mar 19 04:03:45 gerlinda kernel: c7b64eb8 00000082 c78e2148 c7af3720 c0147f50 00000002 c0268415 ffffffff
Mar 19 04:03:45 gerlinda kernel: c0309bc0 c78e2000 00000002 c0267c95 c7b64ee4 c0309bc4 c0309be0 c74034f8
Mar 19 04:03:45 gerlinda kernel: c0309bc0 00000000 00000000 00000002 c0267d9d c0309be0 c0309be0 c78e2000
Mar 19 04:03:45 gerlinda kernel: Call Trace:
Mar 19 04:03:45 gerlinda kernel: [<c0147f50>] __writepage+0x0/0x30
Mar 19 04:03:45 gerlinda kernel: [<c0268415>] _spin_lock_irq+0x35/0x40
Mar 19 04:03:45 gerlinda kernel: [<c0267c95>] rwsem_down_failed_common+0x75/0x160
Mar 19 04:03:45 gerlinda kernel: [<c0267d9d>] rwsem_down_write_failed+0x1d/0x30
Mar 19 04:03:45 gerlinda kernel: [<c0267e16>] call_rwsem_down_write_failed+0x6/0x8
Mar 19 04:03:45 gerlinda kernel: [<c026753c>] down_write+0x4c/0x60
Mar 19 04:03:45 gerlinda kernel: [<c01f6e22>] device_suspend+0x22/0x280
Mar 19 04:03:45 gerlinda kernel: [<c01f6e22>] device_suspend+0x22/0x280
Mar 19 04:03:45 gerlinda kernel: [<c013cc42>] pm_send_all+0x62/0xc0
Mar 19 04:03:45 gerlinda kernel: [<c88597b7>] suspend+0x37/0x140 [apm]
Mar 19 04:03:45 gerlinda kernel: [<c885a6a4>] do_ioctl+0x144/0x170 [apm]
Mar 19 04:03:45 gerlinda kernel: [<c0268985>] lock_kernel+0x25/0x50
Mar 19 04:03:45 gerlinda kernel: [<c01708f8>] vfs_ioctl+0x78/0x90
Mar 19 04:03:45 gerlinda kernel: [<c017096c>] do_vfs_ioctl+0x5c/0x2b0
Mar 19 04:03:45 gerlinda kernel: [<c0268224>] _spin_lock+0x34/0x40
Mar 19 04:03:45 gerlinda kernel: [<c0170bfd>] sys_ioctl+0x3d/0x70
Mar 19 04:03:45 gerlinda kernel: [<c010305e>] syscall_call+0x7/0xb
Mar 19 04:03:45 gerlinda kernel: =======================
Mar 19 04:03:45 gerlinda kernel: no locks held by apmd/2059.

--- the lockup happens in down_write(&pm_sleep_rwsem); in device_suspend()
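
(Editorial sketch, not part of the original message.) The trace shows apmd in the uninterruptible "D" state. To check for tasks stuck the same way after a resume, something like the following might help. It is a minimal sketch assuming a POSIX `ps` and `awk`; the function names are hypothetical, and the timeout file only exists on kernels built with the hung-task detector that printed the message above.

```shell
#!/bin/sh
# Hypothetical helper: read "pid stat comm" lines on stdin and print
# "pid comm" for every task whose state starts with D (uninterruptible).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# List D-state tasks on the running system.
list_blocked_tasks() {
    ps -e -o pid= -o stat= -o comm= | filter_d_state
}

# Print the hung-task watchdog timeout, if this kernel exposes it.
show_hung_task_timeout() {
    f=/proc/sys/kernel/hung_task_timeout_secs
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unavailable"
    fi
}
```

Running `list_blocked_tasks` shortly after a resume would show whether apmd (or anything else) is still sitting in "D"; a task that stays listed across several calls is a candidate for the kind of hang reported here.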

loaded modules:
Module Size Used by
dm_loop 12004 1
8250 24356 2
serial_core 23096 1 8250
pcspkr 3104 0
psmouse 40256 0
apm 21720 1
parport_pc 36132 1
plip 15880 0
parport 40328 2 parport_pc,plip
ide_cd_mod 36448 0
cdrom 36416 1 ide_cd_mod
ohci_hcd 24500 0
usbcore 148624 1 ohci_hcd
nls_iso8859_2 4608 1
nls_cp852 4864 1
vfat 13344 1
fat 51956 1 vfat
nls_base 8000 4 nls_iso8859_2,nls_cp852,vfat,fat
dm_snapshot 19364 0
dm_mirror 29072 0
dm_mod 59788 7 dm_loop,dm_snapshot,dm_mirror
rtc 13468 0
unix 29692 20

Mikulas

Attachments:

gerlinda.config (42.98 kB)

2008-03-20 10:32:19

by Rafael J. Wysocki

Subject: Re: APM lockups since 2.6.25

On Wednesday, 19 of March 2008, Mikulas Patocka wrote:
> Hi

Hi,

> I have an old notebook with APM and I am experiencing occasional APMD
> lockups after resume, with kernels 2.6.25rc1 and 2.6.25rc3. They didn't
> happen with 2.6.24 or before.
>
> The bug happens about once a week or so.
>
> If you have any idea how to debug it, you can send me a test code.

It is possible that the bug was fixed in -rc4. Can you please test -rc6 and
see if it's still present?

Thanks,
Rafael

2008-03-21 00:03:32

by Mikulas Patocka

Subject: Re: APM lockups since 2.6.25

Hi

> > Hi
>
> Hi,
>
> > I have an old notebook with APM and I am experiencing occasional APMD
> > lockups after resume, with kernels 2.6.25rc1 and 2.6.25rc3. They didn't
> > happen with 2.6.24 or before.
> >
> > The bug happens about once a week or so.
> >
> > If you have any idea how to debug it, you can send me a test code.
>
> It is possible that the bug was fixed in -rc4. Can you please test -rc6 and
> see if it's still present?
>
> Thanks,
> Rafael

I'm now running on -rc6, so I'll see (it takes some days for the bug to
appear). Where is that patch that went to -rc4 and that may fix it?

Mikulas

2008-03-21 00:28:20

by Rafael J. Wysocki

Subject: Re: APM lockups since 2.6.25

On Friday, 21 of March 2008, Mikulas Patocka wrote:
> Hi

Hi,

> > > I have an old notebook with APM and I am experiencing occasional APMD
> > > lockups after resume, with kernels 2.6.25rc1 and 2.6.25rc3. They didn't
> > > happen with 2.6.24 or before.
> > >
> > > The bug happens about once a week or so.
> > >
> > > If you have any idea how to debug it, you can send me a test code.
> >
> > It is possible that the bug was fixed in -rc4. Can you please test -rc6 and
> > see if it's still present?
> >
> > Thanks,
> > Rafael
>
> I'm now running on -rc6, so I'll see (it takes some days for the bug to
> appear). Where is that patch that went to -rc4 and that may fix it?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a8d37a37380e2b1500592d40b7ec384dbebe7a0

Thanks,
Rafael

2008-03-21 20:01:23

by Mikulas Patocka

Subject: APM crashes when IO is going on

Hi

> > > > I have an old notebook with APM and I am experiencing occasional APMD
> > > > lockups after resume, with kernels 2.6.25rc1 and 2.6.25rc3. They didn't
> > > > happen with 2.6.24 or before.
> > > >
> > > > The bug happens about once a week or so.
> > > >
> > > > If you have any idea how to debug it, you can send me a test code.
> > >
> > > It is possible that the bug was fixed in -rc4. Can you please test -rc6 and
> > > see if it's still present?
> > >
> > > Thanks,
> > > Rafael
> >
> > I'm now running on -rc6, so I'll see (it takes some days for the bug to
> > appear). Where is that patch that went to -rc4 and that may fix it?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a8d37a37380e2b1500592d40b7ec384dbebe7a0
>
> Thanks,
> Rafael

I found another problem, present in 2.6.23.1, 2.6.25rc3 and 2.6.25rc6:
when I run three threads concurrently reading a raw disk partition and
then suspend, I get a 100% reproducible failure. (With one thread running
it usually succeeds and sometimes fails.)

Either the ongoing I/O jams the BIOS and I need to remove power to
continue,

or the machine suspends, wakes up and reports "hda: lost interrupt"
(2.6.23.1 was able to recover from this condition; 2.6.25rc3 and rc6 do
not recover and cannot issue any more disk I/O).

How is suspending disk I/O supposed to work? And why doesn't it?

Mikulas

2008-03-21 21:17:58

by Mikulas Patocka

Subject: Re: APM crashes when IO is going on

> > I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
> >
> > --- when I run three threads concurrently reading raw disk partition and
> > suspend, I get 100% reproducible failure. (with one thread running it
> > usually succeeds, sometimes fail)
>
> Are they userland threads or kernel threads?

Userland threads. Just dd if=/dev/hda of=/dev/null

> > Either the on-going I/O will jam BIOS and I need to remove power to
> > continue.
> >
> > Or the machine suspends, wakes up and reports "hda: lost interrupt"
> > (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> > recover and it is not able to send any more disk IOs).
> >
> > How is suspending disk IO supposed to work?
>
> That depends on the driver, if I understand your question correctly.

The driver is normal IDE.

> > And why it doesn't?
>
> Hard to tell. You didn't provide much information ...
>
> Thanks,
> Rafael

Mikulas
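
(Editorial sketch, not part of the original message.) The reproduction described above, several concurrent `dd` readers running while the machine is suspended, could be scripted roughly as follows. The function name and its parameters are hypothetical; only the `dd if=... of=/dev/null` command comes from the message, and the suspend step itself is left to whatever mechanism the machine normally uses.

```shell
#!/bin/sh
# Hypothetical helper: start N background readers of a device (or file)
# and print their PIDs, one per line. N defaults to 3, the count that
# the report says always triggers the failure.
start_readers() {
    dev=$1
    n=${2:-3}
    i=0
    while [ "$i" -lt "$n" ]; do
        # Same command as in the report; output is discarded so the
        # readers do not hold the caller's stdout open.
        dd if="$dev" of=/dev/null >/dev/null 2>&1 &
        echo "$!"
        i=$((i + 1))
    done
}
```

For example, `pids=$(start_readers /dev/hda 3)` starts the three readers; suspending and resuming while they run should then hit the failure if the report applies to the machine under test. Reading the raw device normally requires root.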

2008-03-21 21:16:47

by Rafael J. Wysocki

Subject: Re: APM crashes when IO is going on

On Friday, 21 of March 2008, Mikulas Patocka wrote:
> Hi
>
> > > > > I have an old notebook with APM and I am experiencing occasional APMD
> > > > > lockups after resume, with kernels 2.6.25rc1 and 2.6.25rc3. They didn't
> > > > > happen with 2.6.24 or before.
> > > > >
> > > > > The bug happens about once a week or so.
> > > > >
> > > > > If you have any idea how to debug it, you can send me a test code.
> > > >
> > > > It is possible that the bug was fixed in -rc4. Can you please test -rc6 and
> > > > see if it's still present?
> > > >
> > > > Thanks,
> > > > Rafael
> > >
> > > I'm now running on -rc6, so I'll see (it takes some days for the bug to
> > > appear). Where is that patch that went to -rc4 and that may fix it?
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a8d37a37380e2b1500592d40b7ec384dbebe7a0
> >
> > Thanks,
> > Rafael
>
> I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
>
> --- when I run three threads concurrently reading raw disk partition and
> suspend, I get 100% reproducible failure. (with one thread running it
> usually succeeds, sometimes fail)

Are they userland threads or kernel threads?

> Either the on-going I/O will jam BIOS and I need to remove power to
> continue.
>
> Or the machine suspends, wakes up and reports "hda: lost interrupt"
> (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> recover and it is not able to send any more disk IOs).
>
> How is suspending disk IO supposed to work?

That depends on the driver, if I understand your question correctly.

> And why it doesn't?

Hard to tell. You didn't provide much information ...

Thanks,
Rafael

2008-03-21 21:46:12

by Rafael J. Wysocki

Subject: Re: APM crashes when IO is going on

On Friday, 21 of March 2008, Mikulas Patocka wrote:
> > > I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
> > >
> > > --- when I run three threads concurrently reading raw disk partition and
> > > suspend, I get 100% reproducible failure. (with one thread running it
> > > usually succeeds, sometimes fail)
> >
> > Are they userland threads or kernel threads?
>
> Userland threads. Just dd if=/dev/hda of=/dev/null
>
> > > Either the on-going I/O will jam BIOS and I need to remove power to
> > > continue.
> > >
> > > Or the machine suspends, wakes up and reports "hda: lost interrupt"
> > > (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> > > recover and it is not able to send any more disk IOs).
> > >
> > > How is suspending disk IO supposed to work?
> >
> > That depends on the driver, if I understand your question correctly.
>
> The driver is normal IDE.

Do you mean IDE_GENERIC/BLK_DEV_GENERIC?

Rafael

2008-03-21 22:53:18

by Mikulas Patocka

Subject: Re: APM crashes when IO is going on

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> On Friday, 21 of March 2008, Mikulas Patocka wrote:
> > > > I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
> > > >
> > > > --- when I run three threads concurrently reading raw disk partition and
> > > > suspend, I get 100% reproducible failure. (with one thread running it
> > > > usually succeeds, sometimes fail)
> > >
> > > Are they userland threads or kernel threads?
> >
> > Userland threads. Just dd if=/dev/hda of=/dev/null
> >
> > > > Either the on-going I/O will jam BIOS and I need to remove power to
> > > > continue.
> > > >
> > > > Or the machine suspends, wakes up and reports "hda: lost interrupt"
> > > > (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> > > > recover and it is not able to send any more disk IOs).
> > > >
> > > > How is suspending disk IO supposed to work?
> > >
> > > That depends on the driver, if I understand your question correctly.
> >
> > The driver is normal IDE.
>
> Do you mean IDE_GENERIC/BLK_DEV_GENERIC?

Compaq Triflex IDE. The computer is Compaq Armada 7400 (Pentium 2/300MHz)

Mikulas

> Rafael
>

2008-03-21 23:08:52

by Rafael J. Wysocki

Subject: Re: APM crashes when IO is going on

On Friday, 21 of March 2008, Mikulas Patocka wrote:
> On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
>
> > On Friday, 21 of March 2008, Mikulas Patocka wrote:
> > > > > I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
> > > > >
> > > > > --- when I run three threads concurrently reading raw disk partition and
> > > > > suspend, I get 100% reproducible failure. (with one thread running it
> > > > > usually succeeds, sometimes fail)
> > > >
> > > > Are they userland threads or kernel threads?
> > >
> > > Userland threads. Just dd if=/dev/hda of=/dev/null
> > >
> > > > > Either the on-going I/O will jam BIOS and I need to remove power to
> > > > > continue.
> > > > >
> > > > > Or the machine suspends, wakes up and reports "hda: lost interrupt"
> > > > > (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> > > > > recover and it is not able to send any more disk IOs).
> > > > >
> > > > > How is suspending disk IO supposed to work?
> > > >
> > > > That depends on the driver, if I understand your question correctly.
> > >
> > > The driver is normal IDE.
> >
> > Do you mean IDE_GENERIC/BLK_DEV_GENERIC?
>
> Compaq Triflex IDE. The computer is Compaq Armada 7400 (Pentium 2/300MHz)

Can you attach a dmesg output taken after a fresh boot?

Thanks,
Rafael

2008-03-22 00:29:56

by Mikulas Patocka

Subject: Re: APM crashes when IO is going on

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> On Friday, 21 of March 2008, Mikulas Patocka wrote:
> > On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
> >
> > > On Friday, 21 of March 2008, Mikulas Patocka wrote:
> > > > > > I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
> > > > > >
> > > > > > --- when I run three threads concurrently reading raw disk partition and
> > > > > > suspend, I get 100% reproducible failure. (with one thread running it
> > > > > > usually succeeds, sometimes fail)
> > > > >
> > > > > Are they userland threads or kernel threads?
> > > >
> > > > Userland threads. Just dd if=/dev/hda of=/dev/null
> > > >
> > > > > > Either the on-going I/O will jam BIOS and I need to remove power to
> > > > > > continue.
> > > > > >
> > > > > > Or the machine suspends, wakes up and reports "hda: lost interrupt"
> > > > > > (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> > > > > > recover and it is not able to send any more disk IOs).
> > > > > >
> > > > > > How is suspending disk IO supposed to work?
> > > > >
> > > > > That depends on the driver, if I understand your question correctly.
> > > >
> > > > The driver is normal IDE.
> > >
> > > Do you mean IDE_GENERIC/BLK_DEV_GENERIC?
> >
> > Compaq Triflex IDE. The computer is Compaq Armada 7400 (Pentium 2/300MHz)
>
> Can you attach a dmesg output taken after a fresh boot?
>
> Thanks,
> Rafael

Here it is.

Mikulas

Attachments:

dm (6.05 kB)

2008-03-25 14:43:04

by Pavel Machek

Subject: Re: APM crashes when IO is going on

Hi!

> > > I'm now running on -rc6, so I'll see (it takes some days for the bug to
> > > appear). Where is that patch that went to -rc4 and that may fix it?
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a8d37a37380e2b1500592d40b7ec384dbebe7a0
>
> I found another problem --- present in 2.6.23.1, 2.6.25rc3, 2.6.25rc6
>
> --- when I run three threads concurrently reading raw disk partition and
> suspend, I get 100% reproducible failure. (with one thread running it
> usually succeeds, sometimes fail)
>
> Either the on-going I/O will jam BIOS and I need to remove power to
> continue.
>
> Or the machine suspends, wakes up and reports "hda: lost interrupt"
> (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> recover and it is not able to send any more disk IOs).

The kernel should recover from a lost interrupt. Can you track down this
regression?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-25 14:43:37

by Pavel Machek

Subject: Re: APM crashes when IO is going on

Hi!

> > > Either the on-going I/O will jam BIOS and I need to remove power to
> > > continue.
> > >
> > > Or the machine suspends, wakes up and reports "hda: lost interrupt"
> > > (2.6.23.1 was able to recover from this condition, 2.6.25rc3,6 does not
> > > recover and it is not able to send any more disk IOs).
> > >
> > > How is suspending disk IO supposed to work?

Suspending I/O is not supposed to be needed in the APM case; try ACPI.

(We could work around this. You should be able to reuse
kernel/power/main.c to stop all user processes, then stop the
drivers...)
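
(Editorial sketch, not part of the original message.) The workaround suggested above lives in the kernel. As a rough userspace approximation of "stop all user processes" for testing, the I/O-generating processes could be stopped with SIGSTOP before suspending and continued afterwards. This is only an illustration with hypothetical function names: it does not touch the drivers, and requests already queued to the disk may still be in flight when the machine suspends.

```shell
#!/bin/sh
# Hypothetical helpers: stop and continue a set of processes by PID.
pause_tasks() {
    for pid in "$@"; do
        kill -STOP "$pid"
    done
}

resume_tasks() {
    for pid in "$@"; do
        kill -CONT "$pid"
    done
}
```

With the PIDs of the `dd` readers in hand, the sequence would be `pause_tasks $pids`, suspend, resume, `resume_tasks $pids`. If the failure disappears with the readers stopped, that would point at in-progress I/O during suspend rather than at the suspend path itself.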

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html