2008-12-13 18:37:52

by Rafael J. Wysocki

Subject: 2.6.28-rc8-git1: Reported regressions from 2.6.27

This message contains a list of some regressions from 2.6.27, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.27, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-12-13      111       14          13
  2008-12-07      106       20          17
  2008-12-04      106       29          21
  2008-11-22       93       25          15
  2008-11-16       89       32          18
  2008-11-09       73       40          27
  2008-11-02       55       41          29
  2008-10-25       26       25          20


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12210
Subject : 2.6.28-rc8 big regression in VM
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-12-12 18:38 (2 days old)
References : http://marc.info/?l=linux-kernel&m=122910711005135&w=4
Handled-By : Wu Fengguang <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12209
Subject : oldish top core dumps (in its meminfo() function)
Submitter : Andreas Mohr <[email protected]>
Date : 2008-12-12 18:49 (2 days old)
References : http://marc.info/?l=linux-kernel&m=122910784006472&w=4
             http://marc.info/?l=linux-kernel&m=122907511319288&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject : uml is very slow on 2.6.28 host
Submitter : Miklos Szeredi <[email protected]>
Date : 2008-12-12 9:35 (2 days old)
References : http://marc.info/?l=linux-kernel&m=122907463518593&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12195
Subject : "dd" make kernel panic
Submitter : alexs <[email protected]>
Date : 2008-12-10 18:07 (4 days old)
Handled-By : James Bottomley <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12178
Subject : Xorg crash at first start
Submitter : Cédric Godin <[email protected]>
Date : 2008-12-04 14:26 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=52440211dcdc52c0b757f8b34d122e11b12cdd50
References : http://marc.info/?l=linux-kernel&m=122840082828098&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12160
Subject : networking oops after resume from s2ram (2.6.28-rc6)
Submitter : Marcin Slusarz <[email protected]>
Date : 2008-11-28 21:15 (16 days old)
References : http://marc.info/?l=linux-kernel&m=122790701615723&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12159
Subject : 2.6.28-rc6-git1 -- No sound produced from Intel HDA ALSA driver
Submitter : Miles Lane <[email protected]>
Date : 2008-11-27 20:33 (17 days old)
References : http://marc.info/?l=linux-kernel&m=122781805620212&w=4
Handled-By : Takashi Iwai <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12156
Subject : v2.6.28-rc2: x86_32 relocation regression?
Submitter : Vegard Nossum <[email protected]>
Date : 2008-11-24 21:19 (20 days old)
References : http://marc.info/?l=linux-kernel&m=122756158220966&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12155
Subject : Regression in 2.6.28-rc and 2.6.27-stable - hibernate related
Submitter : Fabio Comolli <[email protected]>
Date : 2008-11-23 16:17 (21 days old)
References : http://marc.info/?l=linux-kernel&m=122745709926361&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100
Subject : resume (S2R) broken by Intel microcode module, on A110L
Submitter : Andreas Mohr <[email protected]>
Date : 2008-11-25 08:48 (19 days old)
Handled-By : Dmitry Adamushko <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12061
Subject : snd_hda_intel: power_save: sound cracks on powerdown
Submitter : Jens Weibler <[email protected]>
Date : 2008-11-18 12:07 (26 days old)
Handled-By : Takashi Iwai <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12028
Subject : i915 DRM is broken in 2.6.28-rc4
Submitter : Adam Tkac <[email protected]>
Date : 2008-11-14 01:50 (30 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11849
Subject : default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)
Submitter : Kumar Gala <[email protected]>
Date : 2008-10-24 12:45 (51 days old)
References : http://marc.info/?l=linux-kernel&m=122485245924125&w=4


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12047
Subject : ACPI toshiba: only register rfkill if bt is enabled
Submitter : Andrey Borzenkov <[email protected]>
Date : 2008-10-28 19:10 (47 days old)
References : http://marc.info/?l=linux-kernel&m=122522113619025&w=2
Handled-By : Frederik Deweerdt <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=122526843117478&w=2


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.27,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=11808

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2008-12-13 18:37:32

by Rafael J. Wysocki

Subject: [Bug #11849] default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11849
Subject : default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)
Submitter : Kumar Gala <[email protected]>
Date : 2008-10-24 12:45 (51 days old)
References : http://marc.info/?l=linux-kernel&m=122485245924125&w=4

2008-12-13 18:40:53

by Rafael J. Wysocki

Subject: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100
Subject : resume (S2R) broken by Intel microcode module, on A110L
Submitter : Andreas Mohr <[email protected]>
Date : 2008-11-25 08:48 (19 days old)
Handled-By : Dmitry Adamushko <[email protected]>

2008-12-13 18:41:15

by Rafael J. Wysocki

Subject: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12155
Subject : Regression in 2.6.28-rc and 2.6.27-stable - hibernate related
Submitter : Fabio Comolli <[email protected]>
Date : 2008-11-23 16:17 (21 days old)
References : http://marc.info/?l=linux-kernel&m=122745709926361&w=4

2008-12-13 18:41:37

by Rafael J. Wysocki

Subject: [Bug #12047] ACPI toshiba: only register rfkill if bt is enabled

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12047
Subject : ACPI toshiba: only register rfkill if bt is enabled
Submitter : Andrey Borzenkov <[email protected]>
Date : 2008-10-28 19:10 (47 days old)
References : http://marc.info/?l=linux-kernel&m=122522113619025&w=2
Handled-By : Frederik Deweerdt <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=122526843117478&w=2

2008-12-13 18:41:51

by Rafael J. Wysocki

Subject: [Bug #12061] snd_hda_intel: power_save: sound cracks on powerdown

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12061
Subject : snd_hda_intel: power_save: sound cracks on powerdown
Submitter : Jens Weibler <[email protected]>
Date : 2008-11-18 12:07 (26 days old)
Handled-By : Takashi Iwai <[email protected]>

2008-12-13 18:42:14

by Rafael J. Wysocki

Subject: [Bug #12028] i915 DRM is broken in 2.6.28-rc4

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12028
Subject : i915 DRM is broken in 2.6.28-rc4
Submitter : Adam Tkac <[email protected]>
Date : 2008-11-14 01:50 (30 days old)

2008-12-13 18:42:35

by Rafael J. Wysocki

Subject: [Bug #12160] networking oops after resume from s2ram (2.6.28-rc6)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12160
Subject : networking oops after resume from s2ram (2.6.28-rc6)
Submitter : Marcin Slusarz <[email protected]>
Date : 2008-11-28 21:15 (16 days old)
References : http://marc.info/?l=linux-kernel&m=122790701615723&w=4

2008-12-13 18:42:49

by Rafael J. Wysocki

Subject: [Bug #12178] Xorg crash at first start

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12178
Subject : Xorg crash at first start
Submitter : Cédric Godin <[email protected]>
Date : 2008-12-04 14:26 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=52440211dcdc52c0b757f8b34d122e11b12cdd50
References : http://marc.info/?l=linux-kernel&m=122840082828098&w=4

2008-12-13 18:43:10

by Rafael J. Wysocki

Subject: [Bug #12156] v2.6.28-rc2: x86_32 relocation regression?

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12156
Subject : v2.6.28-rc2: x86_32 relocation regression?
Submitter : Vegard Nossum <[email protected]>
Date : 2008-11-24 21:19 (20 days old)
References : http://marc.info/?l=linux-kernel&m=122756158220966&w=4

2008-12-13 18:43:29

by Rafael J. Wysocki

Subject: [Bug #12195] "dd" make kernel panic

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12195
Subject : "dd" make kernel panic
Submitter : alexs <[email protected]>
Date : 2008-12-10 18:07 (4 days old)
Handled-By : James Bottomley <[email protected]>

2008-12-13 18:43:45

by Rafael J. Wysocki

Subject: [Bug #12208] uml is very slow on 2.6.28 host

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject : uml is very slow on 2.6.28 host
Submitter : Miklos Szeredi <[email protected]>
Date : 2008-12-12 9:35 (2 days old)
References : http://marc.info/?l=linux-kernel&m=122907463518593&w=4

2008-12-13 18:44:03

by Rafael J. Wysocki

Subject: [Bug #12210] 2.6.28-rc8 big regression in VM

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12210
Subject : 2.6.28-rc8 big regression in VM
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-12-12 18:38 (2 days old)
References : http://marc.info/?l=linux-kernel&m=122910711005135&w=4
Handled-By : Wu Fengguang <[email protected]>

2008-12-13 18:44:31

by Rafael J. Wysocki

Subject: [Bug #12209] oldish top core dumps (in its meminfo() function)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12209
Subject : oldish top core dumps (in its meminfo() function)
Submitter : Andreas Mohr <[email protected]>
Date : 2008-12-12 18:49 (2 days old)
References : http://marc.info/?l=linux-kernel&m=122910784006472&w=4
             http://marc.info/?l=linux-kernel&m=122907511319288&w=4

2008-12-13 18:44:46

by Rafael J. Wysocki

Subject: [Bug #12159] 2.6.28-rc6-git1 -- No sound produced from Intel HDA ALSA driver

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.27. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12159
Subject : 2.6.28-rc6-git1 -- No sound produced from Intel HDA ALSA driver
Submitter : Miles Lane <[email protected]>
Date : 2008-11-27 20:33 (17 days old)
References : http://marc.info/?l=linux-kernel&m=122781805620212&w=4
Handled-By : Takashi Iwai <[email protected]>

2008-12-13 18:45:31

by Fabio Comolli

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi.

On Sat, Dec 13, 2008 at 5:33 PM, Rafael J. Wysocki <[email protected]> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.27. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12155
> Subject : Regression in 2.6.28-rc and 2.6.27-stable - hibernate related
> Submitter : Fabio Comolli <[email protected]>
> Date : 2008-11-23 16:17 (21 days old)
> References : http://marc.info/?l=linux-kernel&m=122745709926361&w=4

Still present. It has been bisected to:

---------------------------------------------------------------------------------------------------------------
commit 5e55aa8db085dad1aabb4574c73c23c7ae571e7b
Author: Dave Kleikamp <[email protected]>
Date: Sun Oct 26 18:20:14 2008 -0400

sched_clock: prevent scd->clock from moving backwards

commit 5b7dba4ff834259a5623e03a565748704a8fe449 upstream

sched_clock: prevent scd->clock from moving backwards

When sched_clock_cpu() couples the clocks between two cpus, it may
increment scd->clock beyond the GTOD tick window that __update_sched_clock()
uses to clamp the clock. A later call to __update_sched_clock() may move
the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
monotonic property.

This patch ensures that scd->clock will not be set backward.

Signed-off-by: Dave Kleikamp <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Chuck Ebbert <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
------------------------------------------------------------------------------------------------------------

Both 2.6.27.8 and 2.6.28-rc8 with that commit reverted work fine
(well, at least they failed to show the bug so far).
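The commit message quoted above describes a clamp that could pull the clock backwards. A minimal userspace sketch of that idea follows. It is not the kernel's code: the names `sketch_clock`, `sketch_update_plain`, `sketch_update_monotonic` and the 1 ms `SKETCH_TICK_NSEC` are assumptions made up for illustration. It only shows the intent of keeping a clamped clock from stepping back, and says nothing about why the real patch triggered the hibernate regression reported here.

```c
#include <stdint.h>

/* Assumed tick length for the illustration: 1 ms in nanoseconds. */
#define SKETCH_TICK_NSEC 1000000ULL

/* Hypothetical, simplified stand-in for the per-CPU clock data. */
struct sketch_clock {
	uint64_t tick_gtod;	/* time base recorded at the last tick */
	uint64_t clock;		/* last value handed out */
};

static uint64_t u64_min(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t u64_max(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Plain clamp to [tick_gtod, tick_gtod + tick]; this can step backwards. */
static uint64_t sketch_update_plain(struct sketch_clock *c, uint64_t candidate)
{
	uint64_t lo = c->tick_gtod;
	uint64_t hi = c->tick_gtod + SKETCH_TICK_NSEC;

	c->clock = u64_min(u64_max(candidate, lo), hi);
	return c->clock;
}

/* Same clamp, but both bounds are raised to the previous clock value. */
static uint64_t sketch_update_monotonic(struct sketch_clock *c,
					uint64_t candidate)
{
	uint64_t lo = u64_max(c->tick_gtod, c->clock);
	uint64_t hi = u64_max(c->clock, c->tick_gtod + SKETCH_TICK_NSEC);

	c->clock = u64_min(u64_max(candidate, lo), hi);
	return c->clock;
}
```

If a clock that was already pushed to 3 ms (with `tick_gtod` at 0) is updated with a 5 ms candidate, the plain variant falls back to `tick_gtod + SKETCH_TICK_NSEC`, while the monotonic variant stays at 3 ms.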

2008-12-13 18:58:36

by Rafael J. Wysocki

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Saturday, 13 of December 2008, Fabio Comolli wrote:
> Hi.

Hi,

> On Sat, Dec 13, 2008 at 5:33 PM, Rafael J. Wysocki <[email protected]> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.27. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12155
> > Subject : Regression in 2.6.28-rc and 2.6.27-stable - hibernate related
> > Submitter : Fabio Comolli <[email protected]>
> > Date : 2008-11-23 16:17 (21 days old)
> > References : http://marc.info/?l=linux-kernel&m=122745709926361&w=4
>
> Still present. It has been bisected to:
>
> ---------------------------------------------------------------------------------------------------------------
> commit 5e55aa8db085dad1aabb4574c73c23c7ae571e7b
> Author: Dave Kleikamp <[email protected]>
> Date: Sun Oct 26 18:20:14 2008 -0400
>
> sched_clock: prevent scd->clock from moving backwards
>
> commit 5b7dba4ff834259a5623e03a565748704a8fe449 upstream
>
> sched_clock: prevent scd->clock from moving backwards
>
> When sched_clock_cpu() couples the clocks between two cpus, it may
> increment scd->clock beyond the GTOD tick window that __update_sched_clock()
> uses to clamp the clock. A later call to __update_sched_clock() may move
> the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
> monotonic property.
>
> This patch ensures that scd->clock will not be set backward.
>
> Signed-off-by: Dave Kleikamp <[email protected]>
> Acked-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Cc: Chuck Ebbert <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ------------------------------------------------------------------------------------------------------------
>
> Both 2.6.27.8 and 2.6.28-rc8 with that commit reverted work fine
> (well, at least they failed to show the bug so far).

Thanks for the update, I have put this information into the Bugzilla entry.

Would everyone involved agree with reverting the above commit for now and
revisiting the issue in the 2.6.29 time frame?

Rafael

2008-12-14 23:11:25

by Dave Kleikamp

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Sat, 2008-12-13 at 19:56 +0100, Rafael J. Wysocki wrote:

> Thanks for the update, I have put this information into the Bugzilla entry.
>
> Would everyone involved agree with reverting the above commit for now and
> revisiting the issue in the 2.6.29 time frame?

I agree. I can't explain what's wrong, but it seems obvious that my
patch either causes or uncovers a more serious problem than it fixed.

Shaggy
--
David Kleikamp
IBM Linux Technology Center

2008-12-16 00:49:58

by Miklos Szeredi

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Sat, 13 Dec 2008, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
> Subject : uml is very slow on 2.6.28 host
> Submitter : Miklos Szeredi <[email protected]>
> Date : 2008-12-12 9:35 (2 days old)
> References : http://marc.info/?l=linux-kernel&m=122907463518593&w=4

I did a bisection, and this is the commit which is responsible:

commit 464b75273f64be7c81fee975bd6ca9593df3427b
Author: Peter Zijlstra <[email protected]>
Date: Fri Oct 24 11:06:15 2008 +0200

sched: re-instate vruntime based wakeup preemption

The advantage is that vruntime based wakeup preemption has a better
conceptual model. Here wakeup_gran = 0 means: preempt when 'fair'.
Therefore wakeup_gran is the granularity of unfairness we allow in order
to make progress.

Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

Miklos
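The conceptual model in the quoted commit message (preempt once the running task is ahead by more than `wakeup_gran`) can be sketched in userspace as below. This is a simplified illustration under assumptions, not the scheduler's implementation: `sketch_entity` and `sketch_should_preempt` are hypothetical names, and the granularity is passed in directly instead of being derived from a sysctl and the task's load weight.

```c
#include <stdint.h>

/* Hypothetical, minimal stand-in for a scheduling entity. */
struct sketch_entity {
	uint64_t vruntime;	/* weighted runtime consumed so far, in ns */
};

/*
 * Decide whether a waking entity should preempt the running one.
 * Returns 1 to preempt, 0 if the running entity's lead is within the
 * allowed granularity, and -1 if it has no vruntime lead at all.
 */
static int sketch_should_preempt(const struct sketch_entity *curr,
				 const struct sketch_entity *wakee,
				 int64_t gran)
{
	/* Signed difference, so a wakee that is ahead yields a negative value. */
	int64_t vdiff = (int64_t)(curr->vruntime - wakee->vruntime);

	if (vdiff <= 0)
		return -1;
	if (vdiff > gran)
		return 1;
	return 0;
}
```

With `gran` set to 0, any positive lead of the running entity triggers preemption, which matches the "preempt when 'fair'" reading; a larger `gran` tolerates that much unfairness before preempting.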

2008-12-16 03:25:40

by Mike Galbraith

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Tue, 2008-12-16 at 01:49 +0100, Miklos Szeredi wrote:
> On Sat, 13 Dec 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
> > Subject : uml is very slow on 2.6.28 host
> > Submitter : Miklos Szeredi <[email protected]>
> > Date : 2008-12-12 9:35 (2 days old)
> > References : http://marc.info/?l=linux-kernel&m=122907463518593&w=4
>
> I did a bisection, and this is the commit which is responsible:
>
> commit 464b75273f64be7c81fee975bd6ca9593df3427b
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Oct 24 11:06:15 2008 +0200
>
> sched: re-instate vruntime based wakeup preemption
>
> The advantage is that vruntime based wakeup preemption has a better
> conceptual model. Here wakeup_gran = 0 means: preempt when 'fair'.
> Therefore wakeup_gran is the granularity of unfairness we allow in order
> to make progress.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Acked-by: Mike Galbraith <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>

If that commit is responsible, then it should also be very slow in pre
28 kernels, where the same exists. Hm, there's another possibility.
Can you try echo NO_LAST_BUDDY > /sys/kernel/debug/sched_features?

-Mike
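For anyone who would rather script the test suggested above than type the echo by hand, the same write can be sketched in C. The helper name `sketch_set_sched_feature` is made up for illustration; the path and the feature name come from the message above, and the sketch assumes debugfs is mounted at /sys/kernel/debug and that the caller has permission to write there.

```c
#include <stdio.h>

/*
 * Write one feature name (for example "NO_LAST_BUDDY") to a control file.
 * Returns 0 on success, -1 on failure.
 */
static int sketch_set_sched_feature(const char *path, const char *feature)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", feature) > 0;
	/* fclose() flushes the buffer; a rejected write may only surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A call such as `sketch_set_sched_feature("/sys/kernel/debug/sched_features", "NO_LAST_BUDDY")` would then correspond to the echo command in the message.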

2008-12-16 06:56:17

by Peter Zijlstra

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Tue, 2008-12-16 at 01:49 +0100, Miklos Szeredi wrote:
> On Sat, 13 Dec 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
> > Subject : uml is very slow on 2.6.28 host
> > Submitter : Miklos Szeredi <[email protected]>
> > Date : 2008-12-12 9:35 (2 days old)
> > References : http://marc.info/?l=linux-kernel&m=122907463518593&w=4
>
> I did a bisection, and this is the commit which is responsible:
>
> commit 464b75273f64be7c81fee975bd6ca9593df3427b
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Oct 24 11:06:15 2008 +0200
>
> sched: re-instate vruntime based wakeup preemption
>
> The advantage is that vruntime based wakeup preemption has a better
> conceptual model. Here wakeup_gran = 0 means: preempt when 'fair'.
> Therefore wakeup_gran is the granularity of unfairness we allow in order
> to make progress.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Acked-by: Mike Galbraith <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>

How's 27? That code basically makes .28 do what .27 did, we tried
something else for a little while and that made stuff suck rocks.

2008-12-16 10:27:21

by Miklos Szeredi

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Tue, 16 Dec 2008, Mike Galbraith wrote:
> On Tue, 2008-12-16 at 01:49 +0100, Miklos Szeredi wrote:
> > On Sat, 13 Dec 2008, Rafael J. Wysocki wrote:
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
> > > Subject : uml is very slow on 2.6.28 host
> > > Submitter : Miklos Szeredi <[email protected]>
> > > Date : 2008-12-12 9:35 (2 days old)
> > > References : http://marc.info/?l=linux-kernel&m=122907463518593&w=4
> >
> > I did a bisection, and this is the commit which is responsible:
> >
> > commit 464b75273f64be7c81fee975bd6ca9593df3427b
> > Author: Peter Zijlstra <[email protected]>
> > Date: Fri Oct 24 11:06:15 2008 +0200
> >
> > sched: re-instate vruntime based wakeup preemption
> >
> > The advantage is that vruntime based wakeup preemption has a better
> > conceptual model. Here wakeup_gran = 0 means: preempt when 'fair'.
> > Therefore wakeup_gran is the granularity of unfairness we allow in order
> > to make progress.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Acked-by: Mike Galbraith <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
>
> If that commit is responsible, then it should also be very slow in pre
> 28 kernels, where the same exists.

Everything prior to 2.6.28 was fine in this respect, so there must be
some subtle difference.

> Hm, there's another possibility.
> Can you try echo NO_LAST_BUDDY > /sys/kernel/debug/sched_features?

It didn't help, unfortunately.

Applying this patch on top of latest git (which essentially reverts
the above commit) fixes the slowness.

Thanks,
Miklos


---
kernel/sched_fair.c | 98 +++-------------------------------------------------
1 file changed, 6 insertions(+), 92 deletions(-)

Index: linux.git/kernel/sched_fair.c
===================================================================
--- linux.git.orig/kernel/sched_fair.c 2008-12-16 01:34:26.000000000 +0100
+++ linux.git/kernel/sched_fair.c 2008-12-16 01:34:54.000000000 +0100
@@ -143,49 +143,6 @@ static inline struct sched_entity *paren
 	return se->parent;
 }

-/* return depth at which a sched entity is present in the hierarchy */
-static inline int depth_se(struct sched_entity *se)
-{
-	int depth = 0;
-
-	for_each_sched_entity(se)
-		depth++;
-
-	return depth;
-}
-
-static void
-find_matching_se(struct sched_entity **se, struct sched_entity **pse)
-{
-	int se_depth, pse_depth;
-
-	/*
-	 * preemption test can be made between sibling entities who are in the
-	 * same cfs_rq i.e who have a common parent. Walk up the hierarchy of
-	 * both tasks until we find their ancestors who are siblings of common
-	 * parent.
-	 */
-
-	/* First walk up until both entities are at same depth */
-	se_depth = depth_se(*se);
-	pse_depth = depth_se(*pse);
-
-	while (se_depth > pse_depth) {
-		se_depth--;
-		*se = parent_entity(*se);
-	}
-
-	while (pse_depth > se_depth) {
-		pse_depth--;
-		*pse = parent_entity(*pse);
-	}
-
-	while (!is_same_group(*se, *pse)) {
-		*se = parent_entity(*se);
-		*pse = parent_entity(*pse);
-	}
-}
-
 #else /* CONFIG_FAIR_GROUP_SCHED */

 static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
@@ -236,11 +193,6 @@ static inline struct sched_entity *paren
 	return NULL;
 }

-static inline void
-find_matching_se(struct sched_entity **se, struct sched_entity **pse)
-{
-}
-
 #endif /* CONFIG_FAIR_GROUP_SCHED */


@@ -1291,8 +1243,8 @@ static unsigned long wakeup_gran(struct
 	 * More easily preempt - nice tasks, while not making it harder for
 	 * + nice tasks.
 	 */
-	if (!sched_feat(ASYM_GRAN) || se->load.weight > NICE_0_LOAD)
-		gran = calc_delta_fair(sysctl_sched_wakeup_granularity, se);
+	if (sched_feat(ASYM_GRAN))
+		gran = calc_delta_mine(gran, NICE_0_LOAD, &se->load);

 	return gran;
 }
@@ -1345,6 +1268,7 @@ static void check_preempt_wakeup(struct
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_entity *se = &curr->se, *pse = &p->se;
+	s64 delta_exec;

 	if (unlikely(rt_prio(p->prio))) {
 		struct cfs_rq *cfs_rq = task_cfs_rq(curr);
@@ -1398,19 +1322,9 @@ static void check_preempt_wakeup(struct
 		return;
 	}

-	find_matching_se(&se, &pse);
-
-	while (se) {
-		BUG_ON(!pse);
-
-		if (wakeup_preempt_entity(se, pse) == 1) {
-			resched_task(curr);
-			break;
-		}
-
-		se = parent_entity(se);
-		pse = parent_entity(pse);
-	}
+	delta_exec = se->sum_exec_runtime - se->prev_sum_exec_runtime;
+	if (delta_exec > wakeup_gran(pse))
+		resched_task(curr);
 }

 static struct task_struct *pick_next_task_fair(struct rq *rq)
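
The hierarchy walk that this patch removes (bring both entities to the same depth, then climb in lockstep until they are siblings) can be tried outside the kernel with a small sketch. The types and names below (`sketch_node`, `sketch_depth`, `sketch_find_matching`) are hypothetical, and "same group" is approximated here as "same parent pointer", which is an assumption and not the kernel's `is_same_group()` test.

```c
#include <stddef.h>

/* Hypothetical stand-in: an entity with a parent pointer (NULL at the top). */
struct sketch_node {
	struct sketch_node *parent;
};

/* Number of entities on the path from n up to the top level, inclusive. */
static int sketch_depth(const struct sketch_node *n)
{
	int depth = 0;

	for (; n; n = n->parent)
		depth++;
	return depth;
}

/*
 * Walk *a and *b upwards until they are siblings (same parent).
 * Top-level entities both have parent == NULL, so the last loop always ends.
 */
static void sketch_find_matching(struct sketch_node **a, struct sketch_node **b)
{
	int da = sketch_depth(*a);
	int db = sketch_depth(*b);

	/* First bring the deeper entity up to the depth of the other one. */
	while (da > db) {
		da--;
		*a = (*a)->parent;
	}
	while (db > da) {
		db--;
		*b = (*b)->parent;
	}

	/* Then climb in lockstep until both share a parent. */
	while ((*a)->parent != (*b)->parent) {
		*a = (*a)->parent;
		*b = (*b)->parent;
	}
}
```

For a task directly under one top-level group and a task nested two levels under another, the walk ends at the two top-level groups, which is the level where a comparison between the two is meaningful.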

2008-12-16 14:19:54

by Mike Galbraith

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Tue, 2008-12-16 at 11:26 +0100, Miklos Szeredi wrote:
> On Tue, 16 Dec 2008, Mike Galbraith wrote:

> > If that commit is responsible, then it should also be very slow in pre
> > 28 kernels, where the same exists.
>
> Everything prior to 2.6.28 was fine in this respect, so there must be
> some subtle difference.

Yeah, strange.

> > Hm, there's another possibility.
> > Can you try echo NO_LAST_BUDDY > /sys/kernel/debug/sched_features?
>
> It didn't help, unfortunately.

I'm happy to hear that actually.

> Applying this patch on top of latest git (which essentially reverts
> the above commit) fixes the slowness.

We definitely don't want to do that. Hm. There are only two commits
that spring to mind...

1af5f730fc1bf7c62ec9fb2d307206e18bf40a69, which is another hope not, and
3f3a490480d8ab96e0fe30a41f80f14e6a0c579d which doesn't seem likely.

-Mike

2008-12-16 15:27:31

by Miklos Szeredi

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Tue, 16 Dec 2008, Mike Galbraith wrote:
> We definitely don't want to do that. Hm. There are only two commits
> that spring to mind...
>
> 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69, which is another hope not, and

This didn't fix it either.

> 3f3a490480d8ab96e0fe30a41f80f14e6a0c579d which doesn't seem likely.

This can't be reverted on latest git.

Is there a way to trace what is happening in the scheduler?

Thanks,
Miklos

2008-12-16 19:07:59

by Ingo Molnar

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related


* Rafael J. Wysocki <[email protected]> wrote:

> On Saturday, 13 of December 2008, Fabio Comolli wrote:
> > Hi.
>
> Hi,
>
> > On Sat, Dec 13, 2008 at 5:33 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.27. Please verify if it still should be listed and let me know
> > > (either way).
> > >
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12155
> > > Subject : Regression in 2.6.28-rc and 2.6.27-stable - hibernate related
> > > Submitter : Fabio Comolli <[email protected]>
> > > Date : 2008-11-23 16:17 (21 days old)
> > > References : http://marc.info/?l=linux-kernel&m=122745709926361&w=4
> >
> > Still present. It has been bisected to:
> >
> > ---------------------------------------------------------------------------------------------------------------
> > commit 5e55aa8db085dad1aabb4574c73c23c7ae571e7b
> > Author: Dave Kleikamp <[email protected]>
> > Date: Sun Oct 26 18:20:14 2008 -0400
> >
> > sched_clock: prevent scd->clock from moving backwards
> >
> > commit 5b7dba4ff834259a5623e03a565748704a8fe449 upstream
> >
> > sched_clock: prevent scd->clock from moving backwards
> >
> > When sched_clock_cpu() couples the clocks between two cpus, it may
> > increment scd->clock beyond the GTOD tick window that __update_sched_clock()
> > uses to clamp the clock. A later call to __update_sched_clock() may move
> > the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
> > monotonic property.
> >
> > This patch ensures that scd->clock will not be set backward.
> >
> > Signed-off-by: Dave Kleikamp <[email protected]>
> > Acked-by: Peter Zijlstra <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
> > Cc: Chuck Ebbert <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > ------------------------------------------------------------------------------------------------------------
> >
> > Both 2.6.27.8 and 2.6.28-rc8 with that commit reverted work fine
> > (well, at least they failed to show the bug so far).
>
> Thanks for the update, I have put this information into the Bugzilla entry.
>
> Would everyone involved agree with reverting the above commit for now
> and revisiting the issue in the 2.6.29 time frame?

yeah - i think it's too late to do anything but a revert here.

Ingo
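
[Editorial illustration, not part of the original message.] The commit message quoted above describes a clock that is clamped to a per-tick window and can therefore be pulled backwards after cross-CPU coupling has pushed it past that window. The following is a minimal user-space sketch of that idea, not the kernel code: the struct, the function names and the 1 ms TICK_NSEC_SKETCH value are assumptions made for illustration, and only the field names tick_gtod and clock echo the commit text.

```c
#include <stdint.h>

#define TICK_NSEC_SKETCH 1000000ULL /* assumed 1 ms tick, illustration only */

/* Hypothetical stand-in for the per-CPU clock data named in the commit. */
struct clock_sketch {
    uint64_t tick_raw;  /* raw counter value sampled at the last tick */
    uint64_t tick_gtod; /* GTOD time sampled at the last tick */
    uint64_t clock;     /* last clock value handed out */
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Clamp to [tick_gtod, tick_gtod + TICK] only: the result can be lower
 * than a previously returned value if c->clock was pushed past the window. */
static uint64_t update_clamped_only(struct clock_sketch *c, uint64_t now)
{
    uint64_t delta = now > c->tick_raw ? now - c->tick_raw : 0;
    uint64_t clock = c->tick_gtod + delta;

    clock = max_u64(clock, c->tick_gtod);
    clock = min_u64(clock, c->tick_gtod + TICK_NSEC_SKETCH);
    c->clock = clock;
    return clock;
}

/* Same clamp, but both bounds are raised to at least the previous clock
 * value, so the result never moves backwards. */
static uint64_t update_monotonic(struct clock_sketch *c, uint64_t now)
{
    uint64_t delta = now > c->tick_raw ? now - c->tick_raw : 0;
    uint64_t clock = c->tick_gtod + delta;
    uint64_t lo = max_u64(c->tick_gtod, c->clock);
    uint64_t hi = max_u64(c->clock, c->tick_gtod + TICK_NSEC_SKETCH);

    clock = max_u64(clock, lo);
    clock = min_u64(clock, hi);
    c->clock = clock;
    return clock;
}
```

With clock already five ticks ahead of tick_gtod, update_clamped_only() returns a value below the previous one, while update_monotonic() holds the clock where it was. The sketch says nothing about why this change interacted badly with hibernation; the thread only establishes that reverting the commit made the symptom go away.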

2008-12-16 19:15:57

by Linus Torvalds

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related



On Tue, 16 Dec 2008, Ingo Molnar wrote:
>
> yeah - i think it's too late to do anything but a revert here.

Yup. Already done. Commit ca7e716c7833aeaeb8fedd6d004c5f5d5e14d325.

Did -stable revert it too?

Linus

2008-12-16 19:19:05

by Ingo Molnar

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related


* Linus Torvalds <[email protected]> wrote:

> On Tue, 16 Dec 2008, Ingo Molnar wrote:
> >
> > yeah - i think it's too late to do anything but a revert here.
>
> Yup. Already done. Commit ca7e716c7833aeaeb8fedd6d004c5f5d5e14d325.
>
> Did -stable revert it too?

not that i know of (-stable tries to follow upstream with reverts too).
We could have put a Cc: [email protected] into the reverter commit ;-)

Ingo

2008-12-16 19:52:43

by Greg KH

Subject: Re: [Bug #12155] Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Tue, Dec 16, 2008 at 08:18:27PM +0100, Ingo Molnar wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
> > On Tue, 16 Dec 2008, Ingo Molnar wrote:
> > >
> > > yeah - i think it's too late to do anything but a revert here.
> >
> > Yup. Already done. Commit ca7e716c7833aeaeb8fedd6d004c5f5d5e14d325.
> >
> > Did -stable revert it too?
>
> not that i know of (-stable tries to follow upstream with reverts too).
> We could have put a Cc: [email protected] into the reverter commit ;-)

Yes, I'll go revert it later today, give me a chance to catch up :)

thanks,

greg k-h

2008-12-25 11:20:39

by Fengguang Wu

Subject: Re: [Bug #12210] 2.6.28-rc8 big regression in VM

Hi Rafael,

According to Lukas's latest comment, let's close this bug?

Thanks,
Fengguang

On Sat, Dec 13, 2008 at 06:33:29PM +0200, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.27. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12210
> Subject : 2.6.28-rc8 big regression in VM
> Submitter : Lukas Hejtmanek <[email protected]>
> Date : 2008-12-12 18:38 (2 days old)
> References : http://marc.info/?l=linux-kernel&m=122910711005135&w=4
> Handled-By : Wu Fengguang <[email protected]>
>
>

2008-12-25 14:11:48

by Rafael J. Wysocki

Subject: Re: [Bug #12210] 2.6.28-rc8 big regression in VM

On Thursday, 25 of December 2008, Wu Fengguang wrote:
> Hi Rafael,
>
> According to Lukas's latest comment, let's close this bug?

Sure, closed.

Thanks,
Rafael


> On Sat, Dec 13, 2008 at 06:33:29PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.27. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12210
> > Subject : 2.6.28-rc8 big regression in VM
> > Submitter : Lukas Hejtmanek <[email protected]>
> > Date : 2008-12-12 18:38 (2 days old)
> > References : http://marc.info/?l=linux-kernel&m=122910711005135&w=4
> > Handled-By : Wu Fengguang <[email protected]>

2008-12-17 06:17:27

by Mike Galbraith

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host

On Tue, 2008-12-16 at 16:27 +0100, Miklos Szeredi wrote:

> Is there a way to trace what is happening in the scheduler?

Sure. Ingo has a script for gathering info (attached), if you run it,
please gzip up the output and send me a copy offline to eyeball.

There's also ftrace, but I've not tried that yet, so I can't offer any
advice. I use a primitive but effective time_after() + printk() with the
klogd wakeup disabled (it deadlocks otherwise).

-Mike


Attachments:
cfs-debug-info.sh (3.48 kB)
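
[Editorial illustration, not part of the original message.] The "time_after() + printk()" approach mentioned above can be read as: note a start time, and print a message only once a deadline has passed. Below is a minimal user-space sketch of that pattern. The helper names are hypothetical; after_sketch() is modelled on the wrap-safe comparison behind the kernel's time_after(), and fprintf() stands in for printk(). How the technique was actually applied inside the scheduler is not shown in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Wrap-safe "a is after b" for free-running tick counters, modelled on
 * the kernel's time_after(). Like the kernel macro, it relies on the
 * unsigned difference converting to a negative signed value. */
static bool after_sketch(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Report (and return true) only if "now" is past start + limit.
 * All times are in ticks of the same counter. */
static bool report_if_overdue(unsigned long start, unsigned long limit,
                              unsigned long now, const char *what)
{
    if (!after_sketch(now, start + limit))
        return false;

    /* Stand-in for printk(); elapsed ticks are wrap-safe as well. */
    fprintf(stderr, "%s overran: %lu ticks\n", what, now - start);
    return true;
}
```

Comparing through a signed difference rather than with a plain "now > deadline" keeps the check correct when the tick counter wraps around between start and now.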

2008-12-18 14:38:20

by Ingo Molnar

Subject: Re: [Bug #12208] uml is very slow on 2.6.28 host


* Mike Galbraith <[email protected]> wrote:

> On Tue, 2008-12-16 at 16:27 +0100, Miklos Szeredi wrote:
>
> > Is there a way to trace what is happening in the scheduler?
>
> Sure. Ingo has a script for gathering info (attached), if you run it,
> please gzip up the output and send me a copy offline to eyeball.
>
> There's also ftrace, but I've not tried that yet, so I can't offer any
> advice. I use a primitive but effective time_after() + printk() with the
> klogd wakeup disabled (it deadlocks otherwise).

btw., there's a recent commit:

32a7600: printk: make printk more robust by not allowing recursion

since then printk shouldn't deadlock anymore, even if called from within
the scheduler.

Btw., ftrace_printk() can be used similarly (and you can capture it
nonstop via /debug/tracing/trace_pipe), and should not deadlock either.

Ingo
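
[Editorial illustration, not part of the original message.] Capturing trace output "nonstop" from /debug/tracing/trace_pipe amounts to reading the file in a loop and copying whatever arrives. A minimal sketch of such a copy loop in C follows; the function name drain_fd is hypothetical, the path is the one given in the message and depends on where debugfs is mounted, and EINTR handling is left out for brevity.

```c
#include <unistd.h>

/* Copy everything readable from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on a read or write error.
 * Returns only when in_fd reports end-of-file or an error. */
static long drain_fd(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;

        /* write() may accept fewer bytes than requested, so loop. */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A caller would open the pipe with open("/debug/tracing/trace_pipe", O_RDONLY) (from <fcntl.h>) and pass the descriptor together with STDOUT_FILENO. Reads on the pipe block until new trace data is available, so the loop is typically run until interrupted.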