2007-10-08 13:00:26

by Bernd Schubert

Subject: 2.6.23 regression: do_nanosleep will not return

Hi,

we have a system here where e.g. "sleep 1" never finishes. This is a
regression in 2.6.23; all older kernel versions worked fine.

Seems to hang in do_nanosleep()

[ 153.775792] sleep S 0000000000000000 0 5372 5341
[ 153.782385] ffff81007f0a9ea8 0000000000000082 0000000000000000 0000000000008efc
[ 153.790635] ffff81007f0a9e48 ffffffff802447b4 ffff81007f0c3080 0000000300000000
[ 153.798938] ffff81007f0c39c8 ffff81007f0c37c0 000000004001d908 0000000000000000
[ 153.806991] Call Trace:
[ 153.809937] [<ffffffff8048e4cd>] do_nanosleep+0x42/0x75
[ 153.815727] [<0000000000000001>]
[ 153.819383]


[ 330.669444] SysRq : Show Pending Timers
[ 330.673552] Timer List Version: v0.3
[ 330.677326] HRTIMER_MAX_CLOCK_BASES: 2
[ 330.681282] now at 255011372633 nsecs
[ 330.829981] active timers:
[ 330.832859] #0: <ffff81007f0e3de8>, hrtimer_wakeup, S:01
[ 330.838805] # expires at 260156346358 nsecs [in 5144973725 nsecs]

[ 337.046189] now at 261387685432 nsecs
[ 337.194966] active timers:
[ 337.197834] #0: <ffff81007f0e3de8>, hrtimer_wakeup, S:01
[ 337.203793] # expires at 260156346358 nsecs [in 18446744072478212542 nsecs]


Any ideas?

Thanks,
Bernd
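
For anyone wanting to check whether a machine shows the same symptom without blocking a shell, the report above can be turned into a small watchdog. This is only a sketch: the helper name `finishes_within` and the 5 second limit are made up for illustration, and it assumes the clock itself keeps advancing (the timer dumps above suggest it does). The loop busy-polls on purpose so that the check does not itself wait on a timer wakeup.

```python
import subprocess
import time


def finishes_within(cmd, limit_s: float) -> bool:
    """Run cmd and report whether it exits before limit_s seconds pass.

    The loop busy-polls instead of sleeping, so the check does not
    depend on a timer wakeup being delivered to this process.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + limit_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
    # Still running after the limit: clean up and report the hang.
    proc.kill()
    proc.wait()
    return False


# Example (hypothetical limit): finishes_within(["sleep", "1"], limit_s=5.0)
```

On a healthy system the example call returns True after about one second; on a system showing the behaviour described above it would be expected to return False.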


2007-10-08 13:21:27

by Bernd Schubert

Subject: Re: 2.6.23 regression: do_nanosleep will not return

Bernd Schubert wrote:

> Hi,
>
> we have a system here where e.g. "sleep 1" never finishes. This is a
> regression in 2.6.23; all older kernel versions worked fine.
>
> Seems to hang in do_nanosleep()
>


Update: Enabling HPET in the BIOS and setting clocksource=hpet as a kernel
command line parameter fixes it, but it's still not nice that something that
worked without a problem in 2.6.22 and below suddenly doesn't work in
2.6.23.


Bernd

2007-10-08 14:33:18

by Rik van Riel

Subject: Re: 2.6.23 regression: do_nanosleep will not return

On Mon, 08 Oct 2007 15:20:26 +0200
Bernd Schubert <[email protected]> wrote:
> Bernd Schubert wrote:

> > we have a system here where e.g. "sleep 1" never finishes. This
> > is a regression in 2.6.23; all older kernel versions worked
> > fine.
> >
> > Seems to hang in do_nanosleep()
>
>
> Update: Enabling HPET in the BIOS and setting clocksource=hpet as a
> kernel command line parameter fixes it, but it's still not nice that
> something that worked without a problem in 2.6.22 and below suddenly
> doesn't work in 2.6.23.

Which timer source is in use when the system hangs?

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2007-10-08 15:01:45

by Bernd Schubert

Subject: Re: 2.6.23 regression: do_nanosleep will not return

On Monday 08 October 2007 16:32:52 Rik van Riel wrote:
> On Mon, 08 Oct 2007 15:20:26 +0200
>
> Bernd Schubert <[email protected]> wrote:
> > Bernd Schubert wrote:
> > > we have a system here where e.g. "sleep 1" never finishes. This
> > > is a regression in 2.6.23; all older kernel versions worked
> > > fine.
> > >
> > > Seems to hang in do_nanosleep()
> >
> > Update: Enabling HPET in the BIOS and setting clocksource=hpet as a
> > kernel command line parameter fixes it, but it's still not nice that
> > something that worked without a problem in 2.6.22 and below suddenly
> > doesn't work in 2.6.23.
>
> Which timer source is in use when the system hangs?

Well, the system itself doesn't hang, only processes calling nanosleep do. But
since the system is booted diskless, one of the very first commands run
is "/etc/init.d/portmap start", whose script contains a sleep call, and so
it halts the boot process.

The problematic timer source is acpi_pm. It's also interesting that setting the
timer source at runtime
via /sys/devices/system/clocksource/clocksource0/current_clocksource doesn't
fix the problem. Only the boot option clocksource={other than acpi_pm}
helps.
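
To report which timer source is actually active, the sysfs file named above can be read directly. The following is a minimal sketch, not anything from the kernel tree: the helper name `read_clocksource` is hypothetical, and the sibling file `available_clocksource` is not mentioned in this thread, so its use here is an assumption (the code tolerates it being absent).

```python
from pathlib import Path

# Directory taken from the path quoted in the message above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> dict:
    """Return the active clocksource and the ones the kernel offers."""
    current = (base / "current_clocksource").read_text().strip()
    # Assumption: available_clocksource lives next to current_clocksource.
    available_file = base / "available_clocksource"
    available = available_file.read_text().split() if available_file.exists() else []
    return {"current": current, "available": available}


if __name__ == "__main__":
    if CLOCKSOURCE_DIR.exists():
        print(read_clocksource())
    else:
        print("no clocksource sysfs directory on this system")
```

Passing a different `base` directory makes it possible to try the helper against a copy of the files taken from the affected machine.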

Thanks,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

2007-10-08 15:16:53

by Peter Zijlstra

Subject: Re: 2.6.23 regression: do_nanosleep will not return

On Mon, 2007-10-08 at 17:01 +0200, Bernd Schubert wrote:
> On Monday 08 October 2007 16:32:52 Rik van Riel wrote:
> > On Mon, 08 Oct 2007 15:20:26 +0200
> >
> > Bernd Schubert <[email protected]> wrote:
> > > Bernd Schubert wrote:
> > > > we have a system here where e.g. "sleep 1" never finishes. This
> > > > is a regression in 2.6.23; all older kernel versions worked
> > > > fine.
> > > >
> > > > Seems to hang in do_nanosleep()
> > >
> > > Update: Enabling HPET in the BIOS and setting clocksource=hpet as a
> > > kernel command line parameter fixes it, but it's still not nice that
> > > something that worked without a problem in 2.6.22 and below suddenly
> > > doesn't work in 2.6.23.
> >
> > Which timer source is in use when the system hangs?
>
> Well, the system itself doesn't hang, only processes calling nanosleep do. But
> since the system is booted diskless, one of the very first commands run
> is "/etc/init.d/portmap start", whose script contains a sleep call, and so
> it halts the boot process.
>
> The problematic timer source is acpi_pm. It's also interesting that setting the
> timer source at runtime
> via /sys/devices/system/clocksource/clocksource0/current_clocksource doesn't
> fix the problem. Only the boot option clocksource={other than acpi_pm}
> helps.

Maybe Thomas knows..

2007-10-08 19:08:26

by Rafael J. Wysocki

Subject: Re: 2.6.23 regression: do_nanosleep will not return

On Monday, 8 October 2007 17:01, Bernd Schubert wrote:
> On Monday 08 October 2007 16:32:52 Rik van Riel wrote:
> > On Mon, 08 Oct 2007 15:20:26 +0200
> >
> > Bernd Schubert <[email protected]> wrote:
> > > Bernd Schubert wrote:
> > > > we have a system here where e.g. "sleep 1" never finishes. This
> > > > is a regression in 2.6.23; all older kernel versions worked
> > > > fine.
> > > >
> > > > Seems to hang in do_nanosleep()
> > >
> > > Update: Enabling HPET in the BIOS and setting clocksource=hpet as a
> > > kernel command line parameter fixes it, but it's still not nice that
> > > something that worked without a problem in 2.6.22 and below suddenly
> > > doesn't work in 2.6.23.
> >
> > Which timer source is in use when the system hangs?
>
> Well, the system itself doesn't hang, only processes calling nanosleep do. But
> since the system is booted diskless, one of the very first commands run
> is "/etc/init.d/portmap start", whose script contains a sleep call, and so
> it halts the boot process.
>
> The problematic timer source is acpi_pm. It's also interesting that setting the
> timer source at runtime
> via /sys/devices/system/clocksource/clocksource0/current_clocksource doesn't
> fix the problem. Only the boot option clocksource={other than acpi_pm}
> helps.

I've created a bugzilla entry for this regression at

http://bugzilla.kernel.org/show_bug.cgi?id=9134

Please add a summary of your observations to it.

Thanks,
Rafael

2007-10-08 21:48:33

by Thomas Gleixner

Subject: Re: 2.6.23 regression: do_nanosleep will not return


On Mon, 8 Oct 2007, Bernd Schubert wrote:

> Hi,
>
> we have a system here where e.g. "sleep 1" never finishes. This is a
> regression in 2.6.23; all older kernel versions worked fine.
>
> Seems to hang in do_nanosleep()
>
> [ 153.775792] sleep S 0000000000000000 0 5372 5341
> [ 153.782385] ffff81007f0a9ea8 0000000000000082 0000000000000000 0000000000008efc
> [ 153.790635] ffff81007f0a9e48 ffffffff802447b4 ffff81007f0c3080 0000000300000000
> [ 153.798938] ffff81007f0c39c8 ffff81007f0c37c0 000000004001d908 0000000000000000
> [ 153.806991] Call Trace:
> [ 153.809937] [<ffffffff8048e4cd>] do_nanosleep+0x42/0x75
> [ 153.815727] [<0000000000000001>]
> [ 153.819383]
>
>
> [ 330.669444] SysRq : Show Pending Timers
> [ 330.673552] Timer List Version: v0.3
> [ 330.677326] HRTIMER_MAX_CLOCK_BASES: 2
> [ 330.681282] now at 255011372633 nsecs
> [ 330.829981] active timers:
> [ 330.832859] #0: <ffff81007f0e3de8>, hrtimer_wakeup, S:01
> [ 330.838805] # expires at 260156346358 nsecs [in 5144973725 nsecs]
>
> [ 337.046189] now at 261387685432 nsecs
> [ 337.194966] active timers:
> [ 337.197834] #0: <ffff81007f0e3de8>, hrtimer_wakeup, S:01
> [ 337.203793] # expires at 260156346358 nsecs [in 18446744072478212542 nsecs]

The timer has already expired: "expires at 260156346358 nsecs" is about
1.2 seconds before "now at 261387685432 nsecs".

Hmm, the "in 18446744072478212542 nsecs" value is a signedness problem
(only in the display).
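
Both remarks can be checked with a little arithmetic on the numbers in the second timer dump. The Python sketch below (the helper name `as_signed_64` is made up for illustration) reinterprets the printed "in ... nsecs" value as a signed 64-bit number and compares it with expiry minus now.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


now_ns = 261387685432                  # "now at" from the second dump
expires_ns = 260156346358              # "expires at" for timer #0
printed_delta = 18446744072478212542   # "[in ... nsecs]" as displayed

signed_delta = as_signed_64(printed_delta)
print(signed_delta)                         # -1231339074
print(signed_delta == expires_ns - now_ns)  # True
print(round(-signed_delta / 1e9, 2))        # 1.23 (seconds overdue)
```

The huge printed value is therefore just -1231339074 ns shown as unsigned, which matches the roughly 1.2 seconds by which the timer is overdue.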

Can you please put a complete system description, your .config and a boot
log into bugzilla ?

Thanks,

tglx