2008-01-05 19:40:49

by Rafael J. Wysocki

Subject: 2.6.24-rc6-git12: Reported regressions from 2.6.23

This message contains a list of some regressions from 2.6.23 reported since
2.6.24-rc1 was released, for which there are no fixes in the mainline I know
of.  If any of them have been fixed already, please let me know.

If you know of any other unresolved regressions from 2.6.23, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Listed regressions statistics:

Date          Total  Pending  Unresolved
----------------------------------------
Today          139     28       15
2008-01-01     139     38       23
2007-12-21     118     21       13
2007-12-18     115     29       15
2007-12-12     106     31       17
2007-12-08      98     29       19
2007-12-01      85     29       18
2007-11-24      75     25       21
2007-11-19      68     26       21
2007-11-17      65     25       20


Unresolved regressions
----------------------

Subject : EHCI causes system to resume instantly from S4
Submitter : Maxim Levitsky <[email protected]>
Date : 2007-10-28 14:56
References : http://lkml.org/lkml/2007/10/27/66
http://bugzilla.kernel.org/show_bug.cgi?id=9258
Handled-By : "Rafael J. Wysocki" <[email protected]>
David Brownell <[email protected]>
Alan Stern <[email protected]>
Workaround : http://bugzilla.kernel.org/show_bug.cgi?id=9258#c30


Subject : SError: { DevExch } occuring and causing disruption
Submitter : Avuton Olrich <[email protected]>
Date : 2007-11-15 22:39
References : http://bugzilla.kernel.org/show_bug.cgi?id=9393
Handled-By : Tejun Heo <[email protected]>
Mark Lord <[email protected]>


Subject : 20000+ wake-ups/second in 2.6.24
Submitter : Mark Lord <[email protected]>
Date : 2007-12-02 04:23
References : http://lkml.org/lkml/2007/12/1/141
http://bugzilla.kernel.org/show_bug.cgi?id=9489
Handled-By : Arjan van de Ven <[email protected]>


Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
Submitter : "Parag Warudkar" <[email protected]>
Date : 2007-12-07 18:14
References : http://lkml.org/lkml/2007/12/7/299
http://bugzilla.kernel.org/show_bug.cgi?id=9525
Handled-By : "Pallipadi, Venkatesh" <[email protected]>
Thomas Gleixner <[email protected]>
Ingo Molnar <[email protected]>


Subject : BUG: bad unlock balance detected!
Submitter : Krzysztof Oledzki <[email protected]>
Date : 2007-12-11 03:17
References : http://bugzilla.kernel.org/show_bug.cgi?id=9542
Handled-By : Andrew Morton <[email protected]>
Herbert Xu <[email protected]>


Subject : PATA_HPT37X embezzles two ports
Submitter : "Bjoern Olausson" <[email protected]>
Date : 2007-12-12 11:05
References : http://lkml.org/lkml/2007/12/12/161
http://bugzilla.kernel.org/show_bug.cgi?id=9551
Handled-By :


Subject : Could not set non-blocking flag with 2.6.24-rc5
Submitter : Tino Keitel <[email protected]>
Date : 2007-12-13 16:27
References : http://lkml.org/lkml/2007/12/13/392
http://bugzilla.kernel.org/show_bug.cgi?id=9557
Handled-By :


Subject : swapping in 2.6.24-rc5-git3
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2007-12-17 14:04
References : http://lkml.org/lkml/2007/12/17/98
http://bugzilla.kernel.org/show_bug.cgi?id=9592
Handled-By : Jan Kara <[email protected]>


Subject : Problems on booting
Submitter : "werner" <[email protected]>
Date : 2007-12-22 14:29
References : http://lkml.org/lkml/2007/12/22/110
http://bugzilla.kernel.org/show_bug.cgi?id=9621
Handled-By :


Subject : ACPI or radeon: spontaneous reboot regression
Submitter : Matt Mackall <[email protected]>
Date : 2007-12-22 16:09
References : http://lkml.org/lkml/2007/12/22/139
http://bugzilla.kernel.org/show_bug.cgi?id=9624
Handled-By :


Subject : iptables won't work
Submitter : Kristoffer Malmström <[email protected]>
Date : 2007-12-28
References : http://bugzilla.kernel.org/show_bug.cgi?id=9657
Handled-By : Patrick McHardy <[email protected]>


Subject : in 2.6.24-rc6 function keys stopped working - toshiba u300-13m, FSC V5505
Submitter : Keepa Mihail Sergeevich <[email protected]>
Date : 2007-12-29 16:07
References : http://bugzilla.kernel.org/show_bug.cgi?id=9663
Handled-By : Len Brown <[email protected]>


Subject : lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)
Submitter : Erez Zadok <[email protected]>
Date : 2007-12-24 18:02
References : http://lkml.org/lkml/2007/12/24/107
http://bugzilla.kernel.org/show_bug.cgi?id=9670
Handled-By :


Subject : IDE/ACPI related hibernation regression: Second attempt fails
Submitter : Mikko Vinni <[email protected]>
Date : 2007-12-31 13:27
References : http://lkml.org/lkml/2007/12/31/135
http://bugzilla.kernel.org/show_bug.cgi?id=9673
Handled-By : Andreas Mohr <[email protected]>


Subject : kexec buffer error
Submitter : Randy Dunlap <[email protected]>
Date : 2008-01-04 22:54
References : http://lkml.org/lkml/2008/1/4/255
http://bugzilla.kernel.org/show_bug.cgi?id=9693
Handled-By :


Regressions with patches
------------------------

Subject : [2.6.24-rc6] pdflush still stuck in D state regression
Submitter : "Tvrtko A. Ursulin" <[email protected]>
Date : 2007-11-02 09:54
References : http://bugzilla.kernel.org/show_bug.cgi?id=9291
Handled-By : Dave Kleikamp <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14219&action=view


Subject : snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
Submitter : Roland Dreier <[email protected]>
Date : 2007-11-08 14:55
References : http://lkml.org/lkml/2007/11/8/255
http://bugzilla.kernel.org/show_bug.cgi?id=9332
Handled-By : Takashi Iwai <[email protected]>
Ingo Molnar <[email protected]>
Patch : http://lkml.org/lkml/2007/11/16/66


Subject : jiffies counter leaps in 2.6.24-rc3
Submitter : Stefano Brivio <[email protected]>
Date : 2007-11-29 08:36
References : http://lkml.org/lkml/2007/11/24/53
http://bugzilla.kernel.org/show_bug.cgi?id=9475
Handled-By : Ingo Molnar <[email protected]>
Patch : http://lkml.org/lkml/2007/12/7/132


Subject : Battery shows up twice in kpowersave
Submitter : Rolf Eike Beer <[email protected]>
Date : 2007-12-03 12:06
References : http://bugzilla.kernel.org/show_bug.cgi?id=9494
Handled-By : Alexey Starikovskiy <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14137&action=view


Subject : 2.6.24-rc4 hwmon it87 probe fails
Submitter : Mike Houston <[email protected]>
Date : 2007-12-06 17:10
References : http://lkml.org/lkml/2007/12/4/466
http://bugzilla.kernel.org/show_bug.cgi?id=9514
Handled-By : Shaohua Li <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14267&action=view


Subject : RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
Submitter : Krzysztof Oledzki <[email protected]>
Date : 2007-12-11 03:20
References : http://bugzilla.kernel.org/show_bug.cgi?id=9543
Handled-By : Andrew Morton <[email protected]>
Herbert Xu <[email protected]>
Jay Vosburgh <[email protected]>
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=9543#c6


Subject : Regression: Battery method parse error
Submitter : Bruce Duncan <[email protected]>
Date : 2007-12-23 13:00
References : http://bugzilla.kernel.org/show_bug.cgi?id=9627
Handled-By : Alexey Starikovskiy <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14181&action=view


Subject : 2.6.24-rc6 2.6.24-rc3 qstor timeouts during probe
Submitter : Alan Young <[email protected]>
Date : 2007-12-24 15:17
References : http://bugzilla.kernel.org/show_bug.cgi?id=9631
Handled-By : Tejun Heo <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14259&action=view


Subject : Unaligned accesses in xfs_file_readdir
Submitter : Dustin Marquess <[email protected]>
Date : 2007-12-25 17:47
References : http://bugzilla.kernel.org/show_bug.cgi?id=9635
Handled-By : Christoph Hellwig <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14255&action=view


Subject : restore ARMv6 OProfile support
Submitter : Adrian Bunk <[email protected]>
Date : 2007-12-28 11:39
References : http://bugzilla.kernel.org/show_bug.cgi?id=9653
http://lkml.org/lkml/2007/12/28/99
Handled-By : Mathieu Desnoyers <[email protected]>
Patch : http://lkml.org/lkml/2007/12/29/111


Subject : restore blackfin HARDWARE_PM support
Submitter : Adrian Bunk <[email protected]>
Date : 2007-12-28 11:40
References : http://lkml.org/lkml/2007/12/28/100
http://bugzilla.kernel.org/show_bug.cgi?id=9654
Handled-By : Mathieu Desnoyers <[email protected]>
Patch : http://lkml.org/lkml/2007/12/29/75


Subject : Booting from nfsroot fails
Submitter : Puzin, Dimitri <[email protected]>
Date : 2007-12-29 11:54
References : http://bugzilla.kernel.org/show_bug.cgi?id=9661
Handled-By : "David S. Miller" <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=14224&action=view


Subject : max_cstate VMware regression
Submitter : Mark Lord <[email protected]>
Date : 2008-01-03 14:54
References : http://lkml.org/lkml/2008/1/2/328
http://bugzilla.kernel.org/show_bug.cgi?id=9683
Handled-By : "Pallipadi, Venkatesh" <[email protected]>
Patch : http://lkml.org/lkml/2008/1/3/399


For details, please follow the links given in references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.23,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9243

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2008-01-05 20:08:56

by Alan

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

> Subject : PATA_HPT37X embezzles two ports
> Submitter : "Bjoern Olausson" <[email protected]>
> Date : 2007-12-12 11:05
> References : http://lkml.org/lkml/2007/12/12/161
> http://bugzilla.kernel.org/show_bug.cgi?id=9551

HPT374 patch was posted.

We also have a newly reported regression in USB pl2303

See [pl2303 regression] Linux 2.6.23 breaks gpsbabel's DG-100 support on
[email protected]

No bug # yet but will try and sort a patch out next week.

2008-01-05 23:36:53

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Saturday, 5 of January 2008, Alan Cox wrote:
> > Subject : PATA_HPT37X embezzles two ports
> > Submitter : "Bjoern Olausson" <[email protected]>
> > Date : 2007-12-12 11:05
> > References : http://lkml.org/lkml/2007/12/12/161
> > http://bugzilla.kernel.org/show_bug.cgi?id=9551
>
> HPT374 patch was posted.

Updated status, thanks.

> We also have a newly reported regression in USB pl2303
>
> See [pl2303 regression] Linux 2.6.23 breaks gpsbabel's DG-100 support on
> [email protected]
>
> No bug # yet but will try and sort a patch out next week.

Well, that's not a regression from 2.6.23.

2008-01-05 23:45:34

by Mark Lord

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

Rafael J. Wysocki wrote:
> This message contains a list of some regressions from 2.6.23 reported since
> 2.6.24-rc1 was released, for which there are no fixes in the mainline I know
> of. If any of them have been fixed already, please let me know.
..
> Subject : 20000+ wake-ups/second in 2.6.24
> Submitter : Mark Lord <[email protected]>
> Date : 2007-12-02 04:23
> References : http://lkml.org/lkml/2007/12/1/141
> http://bugzilla.kernel.org/show_bug.cgi?id=9489
> Handled-By : Arjan van de Ven <[email protected]>
>
..

I have only seen that one once, and I think it was Arjan who said
that it has been observed rarely by other people as well.
The bugzilla entry is mostly just to track the darned thing,
but it seems unlikely that anyone will find/fix it for 2.6.24.
No big deal, but it would be good to have somebody knowledgeable
in clocks/interrupts try and track it down.

I wonder if it's just a babbling IRQ on resume, before the driver
has run its resume code or something?

Cheers

2008-01-06 09:39:24

by Ingo Molnar

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23


* Rafael J. Wysocki <[email protected]> wrote:

> Subject : jiffies counter leaps in 2.6.24-rc3
> Submitter : Stefano Brivio <[email protected]>
> Date : 2007-11-29 08:36
> References : http://lkml.org/lkml/2007/11/24/53
> http://bugzilla.kernel.org/show_bug.cgi?id=9475
> Handled-By : Ingo Molnar <[email protected]>
> Patch : http://lkml.org/lkml/2007/12/7/132

this holds a series of problems, we've applied everything we wanted to
2.6.24 already; we'll do the full stack of fixes for this in 2.6.25.
(changing printk was deemed inappropriate so late in the -rc cycle) So
perhaps mark this as WILL_FIX_LATER and unmark it as a regression?
Stefano, do you agree? b43 works fine for you in the latest .24-rc
kernel, correct?

Ingo

2008-01-06 09:45:42

by Ingo Molnar

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23


* Rafael J. Wysocki <[email protected]> wrote:

> Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
> Submitter : "Parag Warudkar" <[email protected]>
> Date : 2007-12-07 18:14
> References : http://lkml.org/lkml/2007/12/7/299
> http://bugzilla.kernel.org/show_bug.cgi?id=9525
> Handled-By : "Pallipadi, Venkatesh" <[email protected]>
> Thomas Gleixner <[email protected]>
> Ingo Molnar <[email protected]>

i think this only occurs with cpuidle, right? drivers/cpuidle/ and
CPU_IDLE is new in 2.6.24 so this appears to be a bug in that code (or a
bug elsewhere triggered by that code), not a regression from v2.6.23.

this commit might also have improved things:

------------>
commit 2bacec8c318ca0418c0ee9ac662ee44207765dd4
Author: Ingo Molnar <[email protected]>
Date: Tue Dec 18 15:21:13 2007 +0100

sched: touch softlockup watchdog after idling

touch softlockup watchdog after idling.

Signed-off-by: Ingo Molnar <[email protected]>
<------------

albeit Parag reported actual real, human-noticeable delays in system
behavior and those cannot be false positives.

Ingo

2008-01-06 09:55:33

by Ingo Molnar

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23


* Mark Lord <[email protected]> wrote:

> Rafael J. Wysocki wrote:
>> This message contains a list of some regressions from 2.6.23 reported since
>> 2.6.24-rc1 was released, for which there are no fixes in the mainline I know
>> of. If any of them have been fixed already, please let me know.
> ..
>> Subject : 20000+ wake-ups/second in 2.6.24
>> Submitter : Mark Lord <[email protected]>
>> Date : 2007-12-02 04:23
>> References : http://lkml.org/lkml/2007/12/1/141
>> http://bugzilla.kernel.org/show_bug.cgi?id=9489
>> Handled-By : Arjan van de Ven <[email protected]>
>>
> ..
>
> I have only seen that one once, and I think it was Arjan who said that
> it has been observed rarely by other people as well. The bugzilla
> entry is mostly just to track the darned thing, but it seems unlikely
> that anyone will find/fix it for 2.6.24. No big deal, but it would be
> good to have somebody knowledgeable in clocks/interrupts try and track
> it down.
>
> I wonder if it's just a babbling IRQ on resume, before the driver has
> run its resume code or something?

i've read the discussions, and i cannot see it analyzed anywhere _what_
causes the wakeups. And how are these wakeups counted? Is this based on
powertop output:

Wakeups-from-idle per second : 20.4 interval: 1.8s

? Somewhere i saw it mentioned that "the CPU throws out of C mode". What
does that mean - does it mean we try to idle again and again, but we
immediately return from C mode - while this all looks like "idle" time
to the scheduler (so 'top' will show lots of idle time), but the ACPI
wakeup counters are going up like mad? What is /proc/interrupts doing
when this happens - is any of the irq sources going upwards?

Ingo
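The /proc/interrupts question above can be answered by sampling the file twice and comparing the counters. The following is only a minimal sketch of that idea, not a tool used by anyone in this thread: the function names `parse_interrupts`, `interrupt_rates` and `sample` are made up for illustration, and the parser assumes the usual layout of a CPU header row followed by one `label: count count ... description` row per source.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: count summed over all CPUs}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row looks like: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        counts = []
        # Per-CPU counters come first; stop at the first non-numeric field
        # (controller and device names follow the counters).
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def interrupt_rates(before, after, interval):
    """Return {irq_label: interrupts per second} for sources whose count grew."""
    rates = {}
    for irq, count in after.items():
        delta = count - before.get(irq, 0)
        if delta > 0:
            rates[irq] = delta / interval
    return rates


def sample(interval=2.0, path="/proc/interrupts"):
    """Read the file twice, `interval` seconds apart, and return the rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_rates(before, after, interval)
```

Calling `sample()` on an affected Linux machine and sorting the result by rate would show whether any counted interrupt source is climbing at anything like the reported wake-up rate.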

2008-01-06 13:00:20

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Sunday, 6 of January 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > Subject : jiffies counter leaps in 2.6.24-rc3
> > Submitter : Stefano Brivio <[email protected]>
> > Date : 2007-11-29 08:36
> > References : http://lkml.org/lkml/2007/11/24/53
> > http://bugzilla.kernel.org/show_bug.cgi?id=9475
> > Handled-By : Ingo Molnar <[email protected]>
> > Patch : http://lkml.org/lkml/2007/12/7/132
>
> this holds a series of problems, we've applied everything we wanted to
> 2.6.24 already; we'll do the full stack of fixes for this in 2.6.25.
> (changing printk was deemed inappropriate so late in the -rc cycle) So
> perhaps mark this as WILL_FIX_LATER and unmark it as a regression?

I'll be fine with that if Stefano agrees.

Thanks,
Rafael

2008-01-06 13:00:38

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Sunday, 6 of January 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
> > Submitter : "Parag Warudkar" <[email protected]>
> > Date : 2007-12-07 18:14
> > References : http://lkml.org/lkml/2007/12/7/299
> > http://bugzilla.kernel.org/show_bug.cgi?id=9525
> > Handled-By : "Pallipadi, Venkatesh" <[email protected]>
> > Thomas Gleixner <[email protected]>
> > Ingo Molnar <[email protected]>
>
> i think this only occurs with cpuidle, right? drivers/cpuidle/ and
> CPU_IDLE is new in 2.6.24 so this appears to be a bug in that code (or a
> bug elsewhere triggered by that code), not a regression from v2.6.23.

OK, removed from the regressions list.

Thanks,
Rafael

2008-01-06 14:28:16

by Stefano Brivio

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Sun, 6 Jan 2008 10:38:53 +0100
Ingo Molnar <[email protected]> wrote:

> * Rafael J. Wysocki <[email protected]> wrote:
>
> > Subject : jiffies counter leaps in 2.6.24-rc3
> > Submitter : Stefano Brivio <[email protected]>
> > Date : 2007-11-29 08:36
> > References : http://lkml.org/lkml/2007/11/24/53
> > http://bugzilla.kernel.org/show_bug.cgi?id=9475
> > Handled-By : Ingo Molnar <[email protected]>
> > Patch : http://lkml.org/lkml/2007/12/7/132
>
> this holds a series of problems, we've applied everything we wanted to
> 2.6.24 already; we'll do the full stack of fixes for this in 2.6.25.
> (changing printk was deemed inappropriate so late in the -rc cycle) So
> perhaps mark this as WILL_FIX_LATER and unmark it as a regression?
> Stefano, do you agree?

Agreed.

> b43 works fine for you in the latest .24-rc kernel, correct?

Correct. :)


--
Ciao
Stefano

2008-01-06 16:21:51

by Parag Warudkar

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Jan 6, 2008 7:57 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Sunday, 6 of January 2008, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <[email protected]> wrote:
> >
> > > Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
> > > Submitter : "Parag Warudkar" <[email protected]>
> > > Date : 2007-12-07 18:14
> > > References : http://lkml.org/lkml/2007/12/7/299
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9525
> > > Handled-By : "Pallipadi, Venkatesh" <[email protected]>
> > > Thomas Gleixner <[email protected]>
> > > Ingo Molnar <[email protected]>
> >
> > i think this only occurs with cpuidle, right? drivers/cpuidle/ and
> > CPU_IDLE is new in 2.6.24 so this appears to be a bug in that code (or a
> > bug elsewhere triggered by that code), not a regression from v2.6.23.
>
> OK, removed from the regressions list.

Nope - actually it did happen without CPU_IDLE so it is definitely a regression.
(See http://lkml.org/lkml/2007/12/17/93 )

And it does happen with latest git which I believe has your patch Ingo.

So I would suggest to keep it on the list of regressions.

Thanks

Parag

2008-01-06 19:09:21

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Sunday, 6 of January 2008, Parag Warudkar wrote:
> On Jan 6, 2008 7:57 AM, Rafael J. Wysocki <[email protected]> wrote:
> > On Sunday, 6 of January 2008, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > > Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
> > > > Submitter : "Parag Warudkar" <[email protected]>
> > > > Date : 2007-12-07 18:14
> > > > References : http://lkml.org/lkml/2007/12/7/299
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=9525
> > > > Handled-By : "Pallipadi, Venkatesh" <[email protected]>
> > > > Thomas Gleixner <[email protected]>
> > > > Ingo Molnar <[email protected]>
> > >
> > > i think this only occurs with cpuidle, right? drivers/cpuidle/ and
> > > CPU_IDLE is new in 2.6.24 so this appears to be a bug in that code (or a
> > > bug elsewhere triggered by that code), not a regression from v2.6.23.
> >
> > OK, removed from the regressions list.
>
> Nope - actually it did happen without CPU_IDLE so it is definitely a regression.
> (See http://lkml.org/lkml/2007/12/17/93 )
>
> And it does happen with latest git which I believe has your patch Ingo.
>
> So I would suggest to keep it on the list of regressions.

OK

Thanks for the clarification.

Greetings,
Rafael

2008-01-06 22:24:24

by Arjan van de Ven

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Sun, 6 Jan 2008 10:55:01 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Mark Lord <[email protected]> wrote:
>
> > Rafael J. Wysocki wrote:
> >> This message contains a list of some regressions from 2.6.23
> >> reported since 2.6.24-rc1 was released, for which there are no
> >> fixes in the mainline I know of. If any of them have been fixed
> >> already, please let me know.
> > ..
> >> Subject : 20000+ wake-ups/second in 2.6.24
> >> Submitter : Mark Lord <[email protected]>
> >> Date : 2007-12-02 04:23
> >> References : http://lkml.org/lkml/2007/12/1/141
> >> http://bugzilla.kernel.org/show_bug.cgi?id=9489
> >> Handled-By : Arjan van de Ven <[email protected]>
> >>
> >
> > I wonder if it's just a babbling IRQ on resume, before the driver
> > > has run its resume code or something?
>
> i've read the discussions, and i cannot see it analyzed anywhere
> _what_ causes the wakeups. And how are these wakeups counted? Is this
> based on powertop output:
>
> Wakeups-from-idle per second : 20.4 interval: 1.8s
>
> ? Somewhere i saw it mentioned that "the CPU throws out of C mode".
> What does that mean - does it mean we try to idle again and again,
> but we immediately return from C mode - while this all looks like
> "idle" time to the scheduler (so 'top' will show lots of idle time),
> but the ACPI wakeup counters are going up like mad? What
> is /proc/interrupts doing when this happens - is any of the irq
> sources going upwards?
>

what seems to happen (and this is based on seeing this on my own devel laptop, as well
as several other reports; Mark is by far not the only one) is something hardware
related, it's been seen on lots of different kernel versions.

It seems to mostly (but not 100%) happen with TI cardbus bridges, where for some reason,
once the yenta driver is loaded (unloading it later makes no difference), once in a while
we get into a mode where the CPU always immediately goes out of the C-state again.
On a hardware level, there are only a few things that cause a CPU to exit a C-state,
and one of them is a pending interrupt of some kind, which is the most likely thing going on here;
some device or apic being stuck with interrupt high, but somehow the CPU isn't actually
seeing the interrupt itself (or has it blocked!)

To call this a 2.6.24 regression is a mistake (as I've said before), it's not new to .24
by any means.

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
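The symptom described above (the CPU leaving the C-state immediately while no interrupt is visibly delivered) suggests comparing how often C-states are entered with how often interrupts are actually counted. The sketch below is one hedged way to do that. It assumes a kernel that exposes the cpuidle sysfs files `name` and `usage` under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/`; the function names and the threshold in `looks_like_stuck_wakeup` are illustrative assumptions, not anything proposed in this thread.

```python
import glob
import os


def read_cstate_usage(cpu=0, root="/sys/devices/system/cpu"):
    """Return {state_name: entry_count} from the cpuidle sysfs files, if present."""
    usage = {}
    pattern = os.path.join(root, f"cpu{cpu}", "cpuidle", "state*")
    for state_dir in sorted(glob.glob(pattern)):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "usage")) as f:
            usage[name] = int(f.read())
    return usage


def cstate_entries_per_second(before, after, interval):
    """Per-state C-state entry rate between two usage snapshots."""
    return {name: (after[name] - before.get(name, 0)) / interval for name in after}


def looks_like_stuck_wakeup(entries_per_sec, irqs_per_sec, threshold=10000.0):
    """Heuristic (an assumption, not a kernel rule): a very high C-state entry
    rate that is not matched by a comparable rate of counted interrupts."""
    return entries_per_sec >= threshold and irqs_per_sec < entries_per_sec / 10
```

If the C-state entry rate is in the tens of thousands per second while the counted interrupt rate stays low, that would be consistent with a wake-up source the CPU never sees as a delivered interrupt.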

2008-01-06 22:34:01

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Sunday, 6 of January 2008, Arjan van de Ven wrote:
> On Sun, 6 Jan 2008 10:55:01 +0100
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Mark Lord <[email protected]> wrote:
> >
> > > Rafael J. Wysocki wrote:
> > >> This message contains a list of some regressions from 2.6.23
> > >> reported since 2.6.24-rc1 was released, for which there are no
> > >> fixes in the mainline I know of. If any of them have been fixed
> > >> already, please let me know.
> > > ..
> > >> Subject : 20000+ wake-ups/second in 2.6.24
> > >> Submitter : Mark Lord <[email protected]>
> > >> Date : 2007-12-02 04:23
> > >> References : http://lkml.org/lkml/2007/12/1/141
> > >> http://bugzilla.kernel.org/show_bug.cgi?id=9489
> > >> Handled-By : Arjan van de Ven <[email protected]>
> > >>
> > >
> > > I wonder if it's just a babbling IRQ on resume, before the driver
> > > > has run its resume code or something?
> >
> > i've read the discussions, and i cannot see it analyzed anywhere
> > _what_ causes the wakeups. And how are these wakeups counted? Is this
> > based on powertop output:
> >
> > Wakeups-from-idle per second : 20.4 interval: 1.8s
> >
> > ? Somewhere i saw it mentioned that "the CPU throws out of C mode".
> > What does that mean - does it mean we try to idle again and again,
> > but we immediately return from C mode - while this all looks like
> > "idle" time to the scheduler (so 'top' will show lots of idle time),
> > but the ACPI wakeup counters are going up like mad? What
> > is /proc/interrupts doing when this happens - is any of the irq
> > sources going upwards?
> >
>
> what seems to happen (and this is based on seeing this on my own devel laptop, as well
> as several other reports; Mark is by far not the only one) is something hardware
> related, it's been seen on lots of different kernel versions.
>
> It seems to mostly (but not 100%) happen with TI cardbus bridges, where for some reason,
> once the yenta driver is loaded (unloading it later makes no difference), once in a while
> we get into a mode where the CPU always immediately goes out of the C-state again.
> On a hardware level, there are only a few things that cause a CPU to exit a C-state,
> and one of them is a pending interrupt of some kind, which is the most likely thing going on here;
> some device or apic being stuck with interrupt high, but somehow the CPU isn't actually
> seeing the interrupt itself (or has it blocked!)
>
> To call this a 2.6.24 regression is a mistake (as I've said before), it's not new to .24
> by any means.

Thanks for the explanation, I'm removing this from the regressions list, then.

Greetings,
Rafael

2008-01-07 23:22:13

by Parag Warudkar

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Jan 6, 2008 2:11 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> On Sunday, 6 of January 2008, Parag Warudkar wrote:
> > On Jan 6, 2008 7:57 AM, Rafael J. Wysocki <[email protected]> wrote:
> > > On Sunday, 6 of January 2008, Ingo Molnar wrote:
> > > >
> > > > * Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > > Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
> > > > > Submitter : "Parag Warudkar" <[email protected]>
> > > > > Date : 2007-12-07 18:14
> > > > > References : http://lkml.org/lkml/2007/12/7/299
> > > > > http://bugzilla.kernel.org/show_bug.cgi?id=9525
> > > > > Handled-By : "Pallipadi, Venkatesh" <[email protected]>
> > > > > Thomas Gleixner <[email protected]>
> > > > > Ingo Molnar <[email protected]>
> > > >
> > > > i think this only occurs with cpuidle, right? drivers/cpuidle/ and
> > > > CPU_IDLE is new in 2.6.24 so this appears to be a bug in that code (or a
> > > > bug elsewhere triggered by that code), not a regression from v2.6.23.
> > >
> > > OK, removed from the regressions list.
> >
> > Nope - actually it did happen without CPU_IDLE so it is definitely a regression.
> > (See http://lkml.org/lkml/2007/12/17/93 )
> >
> > And it does happen with latest git which I believe has your patch Ingo.
> >
> > So I would suggest to keep it on the list of regressions.
>
> OK
>
> Thanks for the clarification.
>

BTW, I have so far tested 2.6.24-rc4/5/6/7 and 2.6.23.12 - all of
which have this problem.

Yesterday I went back to using 2.6.22.15 and after a day's uptime it
has not reproduced with the same config.

Time for git-bisect I suppose? (The only problem is that this takes
anywhere from 20 minutes to 8 hours to confirm reliably.)

Parag
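To put the bisection cost in perspective: a bisection over N commits needs roughly ceil(log2(N)) test rounds, so the 20-minute to 8-hour confirmation time dominates. The sketch below works that out. The commit count of 7000 is a hypothetical placeholder, not the real number of commits between 2.6.22 and 2.6.23, and build time is ignored.

```python
import math


def bisect_steps(n_commits):
    """Approximate number of kernels to test when bisecting n_commits commits."""
    if n_commits < 1:
        raise ValueError("need at least one commit to bisect")
    return math.ceil(math.log2(n_commits))


def bisect_duration_hours(n_commits, minutes_per_test):
    """Total testing time in hours, assuming every round takes minutes_per_test."""
    return bisect_steps(n_commits) * minutes_per_test / 60.0


# Hypothetical range size; substitute the real count for the range in question.
N_COMMITS = 7000
best_case = bisect_duration_hours(N_COMMITS, 20)    # every round confirms in 20 minutes
worst_case = bisect_duration_hours(N_COMMITS, 480)  # every round needs the full 8 hours
```

Under these assumptions the bisection takes 13 rounds, which is a bit over 4 hours of testing in the best case and 104 hours in the worst case.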

2008-01-07 23:53:00

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Tuesday, 8 of January 2008, Parag Warudkar wrote:
> On Jan 6, 2008 2:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Sunday, 6 of January 2008, Parag Warudkar wrote:
> > > On Jan 6, 2008 7:57 AM, Rafael J. Wysocki <[email protected]> wrote:
> > > > On Sunday, 6 of January 2008, Ingo Molnar wrote:
> > > > >
> > > > > * Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > > Subject : soft lockup - CPU#1 stuck for 15s! [swapper:0]
> > > > > > Submitter : "Parag Warudkar" <[email protected]>
> > > > > > Date : 2007-12-07 18:14
> > > > > > References : http://lkml.org/lkml/2007/12/7/299
> > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=9525
> > > > > > Handled-By : "Pallipadi, Venkatesh" <[email protected]>
> > > > > > Thomas Gleixner <[email protected]>
> > > > > > Ingo Molnar <[email protected]>
> > > > >
> > > > > i think this only occurs with cpuidle, right? drivers/cpuidle/ and
> > > > > CPU_IDLE is new in 2.6.24 so this appears to be a bug in that code (or a
> > > > > bug elsewhere triggered by that code), not a regression from v2.6.23.
> > > >
> > > > OK, removed from the regressions list.
> > >
> > > Nope - actually it did happen without CPU_IDLE so it is definitely a regression.
> > > (See http://lkml.org/lkml/2007/12/17/93 )
> > >
> > > And it does happen with latest git which I believe has your patch Ingo.
> > >
> > > So I would suggest to keep it on the list of regressions.
> >
> > OK
> >
> > Thanks for the clarification.
> >
>
> BTW, I have so far tested 2.6.24-rc4/5/6/7 and 2.6.23.12 - all of
> which have this problem.

Well, now you're saying 2.6.23.12 is also affected, so this doesn't seem to
be a recent regression in fact?

Rafael

2008-01-08 00:49:17

by Parag Warudkar

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Jan 7, 2008 6:54 PM, Rafael J. Wysocki <[email protected]> wrote:

> Well, now you're saying 2.6.23.12 is also affected, so this doesn't seem to
> be a recent regression in fact?
>

I have run 2.6.23 series before but my usage pattern seems to have not
triggered the bug before.
But yes, this is a 6 month old regression based on today's findings.
(2.6.22 which doesn't have this issue was released on July 8).

Parag

2008-01-08 08:03:20

by Thomas Gleixner

Subject: Re: 2.6.24-rc6-git12: Reported regressions from 2.6.23

On Mon, 7 Jan 2008, Parag Warudkar wrote:

> On Jan 7, 2008 6:54 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> > Well, now you're saying 2.6.23.12 is also affected, so this doesn't seem to
> > be a recent regression in fact?
> >
>
> I have run 2.6.23 series before but my usage pattern seems to have not
> triggered the bug before.
> But yes, this is a 6 month old regression based on today's findings.
> (2.6.22 which doesn't have this issue was released on July 8).

Hmm. Can you try plain 2.6.23 ? If it does not show the problem we
know that it is caused by one of the patches in the stable series.

Thanks,

tglx