2008-03-16 23:20:01

by Rafael J. Wysocki

Subject: 2.6.25-rc5-git6: Reported regressions from 2.6.24

This message contains a list of some regressions from 2.6.24 reported since
2.6.25-rc1 was released, for which there are no fixes in the mainline I know
of. If any of them have been fixed already, please let me know.

If you know of any other unresolved regressions from 2.6.24, please let me know
either and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.


Listed regressions statistics:

Date          Total  Pending  Unresolved
----------------------------------------
2008-03-17      148       38          30
2008-03-16      146       42          35
2008-03-14      145       45          39
2008-03-12      143       51          41
2008-03-11      141       58          43
2008-03-10      138       66          47
2008-03-03      115       65          49
2008-02-25       90       51          39
2008-02-17       61       45          37


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
Subject : mount: could not find filesystem
Submitter : Kamalesh Babulal <[email protected]>
Date : 2008-02-12 14:34 (34 days old)
References : http://lkml.org/lkml/2008/2/12/91
Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
             Yinghai Lu <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
Submitter : Ben Nizette <[email protected]>
Date : 2008-02-12 12:46 (34 days old)
References : http://lkml.org/lkml/2008/2/12/148
Handled-By : Haavard Skinnemoen <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9978
Subject : 2.6.25-rc1: volanoMark regression
Submitter : Zhang, Yanmin <[email protected]>
Date : 2008-02-13 10:30 (33 days old)
References : http://lkml.org/lkml/2008/2/13/128
             http://lkml.org/lkml/2008/3/12/52
Handled-By : Srivatsa Vaddagiri <[email protected]>
             Balbir Singh <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9980
Subject : 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
Submitter : Jasper Bryant-Greene <[email protected]>
Date : 2008-02-13 12:25 (33 days old)
References : http://lkml.org/lkml/2008/2/13/181
Handled-By : Yinghai Lu <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
Submitter : Linas Žvirblis <[email protected]>
Date : 2008-02-13 22:38 (33 days old)
References : http://lkml.org/lkml/2008/2/13/566


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9995
Subject : 2.6.25-rc1 regression - backlight controlls do not work - ThinkPad T61
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-02-15 04:51 (31 days old)
Handled-By : Zhang Rui <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10011
Subject : The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
Submitter : François Valenduc <[email protected]>
Date : 2008-02-17 06:28 (29 days old)
Handled-By : Thomas Gleixner <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10027
Subject : 2.6.25-rc[12] Video4Linux Bttv Regression
Submitter : Bongani Hlope <[email protected]>
Date : 2008-02-17 09:36 (29 days old)
References : http://lkml.org/lkml/2008/2/17/55
Handled-By : Mauro Carvalho Chehab <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10041
Subject : 2.6.25-rc1/2 regression: first-time login into gnome fails
Submitter : Romano Giannetti <[email protected]>
Date : 2008-02-18 11:56 (28 days old)
References : http://lkml.org/lkml/2008/2/18/145
Handled-By : Ray Lee <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
Subject : Spurious messages at boot, eventually hangs the usb subsustem
Submitter : Jean-Luc Coulon <[email protected]>
Date : 2008-02-20 09:10 (26 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10065
Subject : 2.6.25-rc2 regression - hang on suspend
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2008-02-19 12:59 (27 days old)
References : http://lkml.org/lkml/2008/2/19/165
             http://lkml.org/lkml/2008/2/17/381
Handled-By : Rafael J. Wysocki <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10067
Subject : TUNER_TDA8290=y, VIDEO_DEV=n build error
Submitter : Toralf Förster <[email protected]>
Date : 2008-02-22 10:36 (24 days old)
References : http://lkml.org/lkml/2008/2/19/262


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10082
Subject : 2.6.25-rc2-git4 - Kernel oops while running kernbench and tbench on powerpc
Submitter : Kamalesh Babulal <[email protected]>
Date : 2008-02-20 16:01 (26 days old)
References : http://lkml.org/lkml/2008/2/20/218
             http://lkml.org/lkml/2008/1/18/71


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
Subject : 2.6.25-rc2 + smartd = hang
Submitter : Anders Eriksson <[email protected]>
Date : 2008-02-22 17:51 (24 days old)
References : http://lkml.org/lkml/2008/2/22/239
Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10093
Subject : 2.6.25-current-git hangs on boot unless CONFIG_CPU_IDLE=n - Apple
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2008-02-23 18:55 (23 days old)
References : http://lkml.org/lkml/2008/2/23/263
             http://marc.info/?l=linux-acpi&m=120387537018467&w=4
Handled-By : Pallipadi, Venkatesh <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10117
Subject : 2.6.25-current-git hangs on boot (pci=nommconf helps)
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2008-02-23 18:55 (23 days old)
References : http://lkml.org/lkml/2008/2/23/263


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
Subject : INFO: possible circular locking in the resume
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-02-27 (19 days old)
References : http://lkml.org/lkml/2008/2/26/479
Handled-By : Gautham R Shenoy <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
Submitter : Marcin Slusarz <[email protected]>
Date : 2008-03-02 20:00 (15 days old)
References : http://lkml.org/lkml/2008/3/2/91
Handled-By : Peter Zijlstra <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
Submitter : Gabriel C <[email protected]>
Date : 2008-02-24 01:31 (22 days old)
References : http://lkml.org/lkml/2008/2/23/380
             http://lkml.org/lkml/2008/2/24/281
Handled-By : Thomas Gleixner <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10156
Subject : KVM & Qemu crashed with infinite recursive kernel loop in the guest
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-02-28 11:25 (18 days old)
References : http://lkml.org/lkml/2008/2/28/106


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10172
Subject : kvm: INFO: inconsistent lock state
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-03-05 03:26 (12 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10190
Subject : [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
Submitter : Tarkan Erimer <[email protected]>
Date : 2008-03-05 05:01 (12 days old)
References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10203
Subject : 2.6.25 IOMMU breaks DMA for b43 on x86_64
Submitter : Christian Casteyde <[email protected]>
Date : 2008-03-09 00:55 (8 days old)
Handled-By : Michael Buesch <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
Subject : INFO: task mount:11202 blocked for more than 120 seconds
Submitter : Christian Kujau <[email protected]>
Date : 2008-03-07 21:32 (10 days old)
References : http://lkml.org/lkml/2008/3/7/308
             http://lkml.org/lkml/2008/3/9/186


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10211
Subject : drivers/media/video/cx2341x.c: undefined references
Submitter : Toralf Förster <[email protected]>
Date : 2008-03-07 13:48 (10 days old)
References : http://lkml.org/lkml/2008/3/7/168


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10234
Subject : pciehp hang on hp ia64 rx6600
Submitter : Alex Chiang <[email protected]>
Date : 2008-03-12 00:47 (5 days old)
References : http://lkml.org/lkml/2008/3/12/31
Handled-By : Mark Lord <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10235
Subject : 2.6.25-rc5: Blank Screen with Intel 945
Submitter : Justin Madru <[email protected]>
Date : 2008-03-12 12:02 (5 days old)
References : http://lkml.org/lkml/2008/3/12/290
Handled-By : Jesse Barnes <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10238
Subject : netconsole still hangs
Submitter : Andrew Morton <[email protected]>
Date : 2008-03-12 23:14 (5 days old)
References : http://marc.info/?t=120536379200004&r=1&w=2
Handled-By : David Miller <[email protected]>
             Stephen Hemminger <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10242
Subject : rm command hangs
Submitter : Jean-Luc Coulon <[email protected]>
Date : 2008-03-14 05:47 (3 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10266
Subject : [PATCH] i810fb: Fix console switch regression
Submitter : Stefan Bauer <[email protected]>
Date : 2008-03-16 19:42 (1 days old)
References : http://lkml.org/lkml/2008/3/16/84


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9969
Subject : 2.6.24-git15 Keyboard Issue?
Submitter : Chris Holvenstot <[email protected]>
Date : 2008-02-06 14:02 (40 days old)
References : http://lkml.org/lkml/2008/2/6/100
             http://lkml.org/lkml/2008/2/13/82
Handled-By : Thomas Gleixner <[email protected]>
Patch : http://lkml.org/lkml/2008/2/15/343


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10016
Subject : cobalt_btns.c <-> struct platform_device compile error
Submitter : Adrian Bunk <[email protected]>
Date : 2008-02-17 12:12 (29 days old)
References : http://lkml.org/lkml/2008/2/17/293
Handled-By : Yoichi Yuasa <[email protected]>
Patch : http://lkml.org/lkml/2008/3/9/25


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10017
Subject : cdev removal broke cobalt_btns.c compilation
Submitter : Adrian Bunk <[email protected]>
Date : 2008-02-17 12:14 (29 days old)
References : http://lkml.org/lkml/2008/2/17/295
Handled-By : Yoichi Yuasa <[email protected]>
Patch : http://lkml.org/lkml/2008/3/9/25


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10153
Subject : (regression) kernel/timeconst.h bugs with HZ=128
Submitter : David Brownell <[email protected]>
Date : 2008-02-26 19:32 (20 days old)
References : http://lkml.org/lkml/2008/2/26/294
Handled-By : H. Peter Anvin <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15114&action=view
        http://bugzilla.kernel.org/attachment.cgi?id=15115&action=view


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10186
Subject : SCSI_AIC94XX must depend on SCSI
Submitter : Toralf Förster <[email protected]>
Date : 2008-03-06 19:09 (11 days old)
References : http://marc.info/?l=linux-kernel&m=120483073617232&w=2
Handled-By : Adrian Bunk <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=120483499725928&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10210
Subject : 2.6.25-rc4-git3: Handling of audio CDs broken on pata_ali
Submitter : Rafael J. Wysocki <[email protected]>
Date : 2008-03-08 22:46 (9 days old)
References : http://lkml.org/lkml/2008/3/8/123
Handled-By : Tejun Heo <[email protected]>
Patch : http://lkml.org/lkml/2008/3/10/69


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10232
Subject : intel mtrr fixups apparently broke display and e1000 probe
Submitter : Stephen Gran <[email protected]>
Date : 2008-03-12 08:37 (5 days old)
Handled-By : Yinghai Lu <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15271&action=view


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10259
Subject : /sys/class/hwmon/hwmon0 is missing a device link
Submitter : Jean-Luc Coulon <[email protected]>
Date : 2008-03-16 04:56 (1 days old)
Handled-By : Jean Delvare <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15301&action=view


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.24,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9832

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2008-03-16 23:34:46

by Linus Torvalds

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24



On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
> Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> Submitter : Linas Žvirblis <[email protected]>
> Date : 2008-02-13 22:38 (33 days old)
> References : http://lkml.org/lkml/2008/2/13/566

This is most likely already fixed by commit
e82cc1288fa57857c6af8c57f3d07096d4bcd9d9.

Unless Linas can reproduce it with a newer kernel (I'm cutting an -rc6
right now, but any -git snapshot in the last few days should work) this
one should be closed. We can't keep things open just because the tester
hasn't tested.

Linus

2008-03-16 23:39:20

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Monday, 17 of March 2008, Linus Torvalds wrote:
>
> On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
> > Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> > Submitter : Linas Žvirblis <[email protected]>
> > Date : 2008-02-13 22:38 (33 days old)
> > References : http://lkml.org/lkml/2008/2/13/566
>
> This is most likely already fixed by commit
> e82cc1288fa57857c6af8c57f3d07096d4bcd9d9.
>
> Unless Linas can reproduce it with a newer kernel (I'm cutting an -rc6
> right now, but any -git snapshot in the last few days should work) this
> one should be closed. We can't keep things open just because the tester
> hasn't tested.

Sure, I'll close it if there's no response in a couple of days.

Thanks,
Rafael

2008-03-17 00:21:20

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Rafael J. Wysocki wrote:

Hi,

> This message contains a list of some regressions from 2.6.24 reported since
> 2.6.25-rc1 was released, for which there are no fixes in the mainline I know
> of. If any of them have been fixed already, please let me know.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> Submitter : Gabriel C <[email protected]>
> Date : 2008-02-24 01:31 (22 days old)
> References : http://lkml.org/lkml/2008/2/23/380
> http://lkml.org/lkml/2008/2/24/281
> Handled-By : Thomas Gleixner <[email protected]>
>

Thomas, do you want me to bisect?

Or do you have any patches I could try (it really does not matter how experimental they are)?

Rafael, the bug report lists the component as x86-64, while my box is 32-bit :) Could you please correct this?

Best Regards

Gabriel


2008-03-17 06:48:30

by Jason Wu

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

2008/3/17, Rafael J. Wysocki <[email protected]>:
> [...]
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10211
> Subject : drivers/media/video/cx2341x.c: undefined references
> Submitter : Toralf Förster <[email protected]>
> Date : 2008-03-07 13:48 (10 days old)
> References : http://lkml.org/lkml/2008/3/7/168
>

I think the patch from Mauro Carvalho Chehab can fix this bug:
http://linuxtv.org/hg/v4l-dvb/rev/ba1a6a7bd53b

J

> [...]

--
BR's
wenhsuan
http://wenhsuanhack.spaces.live.com

2008-03-17 16:18:24

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Mon, 17 Mar 2008, Gabriel C wrote:
> > Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> > Submitter : Gabriel C <[email protected]>
> > Date : 2008-02-24 01:31 (22 days old)
> > References : http://lkml.org/lkml/2008/2/23/380
> > http://lkml.org/lkml/2008/2/24/281
> > Handled-By : Thomas Gleixner <[email protected]>
> >
>
> Thomas, do you want me to bisect?

That'd be great.

> Or do you have any patches I could try (it really does not matter how experimental they are)?

No, I have not the slightest clue what's going on.

Thanks,

tglx

2008-03-17 18:20:30

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Mon, 17 Mar 2008, Gabriel C wrote:
>>> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>> Submitter : Gabriel C <[email protected]>
>>> Date : 2008-02-24 01:31 (22 days old)
>>> References : http://lkml.org/lkml/2008/2/23/380
>>> http://lkml.org/lkml/2008/2/24/281
>>> Handled-By : Thomas Gleixner <[email protected]>
>>>
>> Thomas, do you want me to bisect?
>
> That'd be great.

OK, I'll start doing that later today.

>
>> Or do you have any patches I could try (it really does not matter how experimental they are)?
>
> No, I have not the slightest clue what's going on.
>
> Thanks,
>
> tglx
>

Gabriel

2008-03-17 21:37:33

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Monday, 17 of March 2008, Jason Wu wrote:
> 2008/3/17, Rafael J. Wysocki <[email protected]>:
> > This message contains a list of some regressions from 2.6.24 reported since
> > 2.6.25-rc1 was released, for which there are no fixes in the mainline I know
> > of. If any of them have been fixed already, please let me know.
> >
> > If you know of any other unresolved regressions from 2.6.24, please let me know
> > either and I'll add them to the list. Also, please let me know if any of the
> > entries below are invalid.
> >
> >
> > Listed regressions statistics:
> >
> > Date          Total  Pending  Unresolved
> > ----------------------------------------
> > 2008-03-17      148       38          30
> > 2008-03-16      146       42          35
> > 2008-03-14      145       45          39
> > 2008-03-12      143       51          41
> > 2008-03-11      141       58          43
> > 2008-03-10      138       66          47
> > 2008-03-03      115       65          49
> > 2008-02-25       90       51          39
> > 2008-02-17       61       45          37
> >
> >
> > Unresolved regressions
> > ----------------------
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject : mount: could not find filesystem
> > Submitter : Kamalesh Babulal <[email protected]>
> > Date : 2008-02-12 14:34 (34 days old)
> > References : http://lkml.org/lkml/2008/2/12/91
> > Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
> > Yinghai Lu <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter : Ben Nizette <[email protected]>
> > Date : 2008-02-12 12:46 (34 days old)
> > References : http://lkml.org/lkml/2008/2/12/148
> > Handled-By : Haavard Skinnemoen <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9978
> > Subject : 2.6.25-rc1: volanoMark regression
> > Submitter : Zhang, Yanmin <[email protected]>
> > Date : 2008-02-13 10:30 (33 days old)
> > References : http://lkml.org/lkml/2008/2/13/128
> > http://lkml.org/lkml/2008/3/12/52
> > Handled-By : Srivatsa Vaddagiri <[email protected]>
> > Balbir Singh <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9980
> > Subject : 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
> > Submitter : Jasper Bryant-Greene <[email protected]>
> > Date : 2008-02-13 12:25 (33 days old)
> > References : http://lkml.org/lkml/2008/2/13/181
> > Handled-By : Yinghai Lu <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
> > Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> > Submitter : Linas Žvirblis <[email protected]>
> > Date : 2008-02-13 22:38 (33 days old)
> > References : http://lkml.org/lkml/2008/2/13/566
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9995
> > Subject : 2.6.25-rc1 regression - backlight controlls do not work - ThinkPad T61
> > Submitter : Lukas Hejtmanek <[email protected]>
> > Date : 2008-02-15 04:51 (31 days old)
> > Handled-By : Zhang Rui <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10011
> > Subject : The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
> > Submitter : François Valenduc <[email protected]>
> > Date : 2008-02-17 06:28 (29 days old)
> > Handled-By : Thomas Gleixner <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10027
> > Subject : 2.6.25-rc[12] Video4Linux Bttv Regression
> > Submitter : Bongani Hlope <[email protected]>
> > Date : 2008-02-17 09:36 (29 days old)
> > References : http://lkml.org/lkml/2008/2/17/55
> > Handled-By : Mauro Carvalho Chehab <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10041
> > Subject : 2.6.25-rc1/2 regression: first-time login into gnome fails
> > Submitter : Romano Giannetti <[email protected]>
> > Date : 2008-02-18 11:56 (28 days old)
> > References : http://lkml.org/lkml/2008/2/18/145
> > Handled-By : Ray Lee <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
> > Subject : Spurious messages at boot, eventually hangs the usb subsustem
> > Submitter : Jean-Luc Coulon <[email protected]>
> > Date : 2008-02-20 09:10 (26 days old)
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10065
> > Subject : 2.6.25-rc2 regression - hang on suspend
> > Submitter : Soeren Sonnenburg <[email protected]>
> > Date : 2008-02-19 12:59 (27 days old)
> > References : http://lkml.org/lkml/2008/2/19/165
> > http://lkml.org/lkml/2008/2/17/381
> > Handled-By : Rafael J. Wysocki <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10067
> > Subject : TUNER_TDA8290=y, VIDEO_DEV=n build error
> > Submitter : Toralf Förster <[email protected]>
> > Date : 2008-02-22 10:36 (24 days old)
> > References : http://lkml.org/lkml/2008/2/19/262
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10082
> > Subject : 2.6.25-rc2-git4 - Kernel oops while running kernbench and tbench on powerpc
> > Submitter : Kamalesh Babulal <[email protected]>
> > Date : 2008-02-20 16:01 (26 days old)
> > References : http://lkml.org/lkml/2008/2/20/218
> > http://lkml.org/lkml/2008/1/18/71
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
> > Subject : 2.6.25-rc2 + smartd = hang
> > Submitter : Anders Eriksson <[email protected]>
> > Date : 2008-02-22 17:51 (24 days old)
> > References : http://lkml.org/lkml/2008/2/22/239
> > Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10093
> > Subject : 2.6.25-current-git hangs on boot unless CONFIG_CPU_IDLE=n - Apple
> > Submitter : Soeren Sonnenburg <[email protected]>
> > Date : 2008-02-23 18:55 (23 days old)
> > References : http://lkml.org/lkml/2008/2/23/263
> > http://marc.info/?l=linux-acpi&m=120387537018467&w=4
> > Handled-By : Pallipadi, Venkatesh <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10117
> > Subject : 2.6.25-current-git hangs on boot (pci=nommconf helps)
> > Submitter : Soeren Sonnenburg <[email protected]>
> > Date : 2008-02-23 18:55 (23 days old)
> > References : http://lkml.org/lkml/2008/2/23/263
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
> > Subject : INFO: possible circular locking in the resume
> > Submitter : Zdenek Kabelac <[email protected]>
> > Date : 2008-02-27 (19 days old)
> > References : http://lkml.org/lkml/2008/2/26/479
> > Handled-By : Gautham R Shenoy <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
> > Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
> > Submitter : Marcin Slusarz <[email protected]>
> > Date : 2008-03-02 20:00 (15 days old)
> > References : http://lkml.org/lkml/2008/3/2/91
> > Handled-By : Peter Zijlstra <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
> > Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> > Submitter : Gabriel C <[email protected]>
> > Date : 2008-02-24 01:31 (22 days old)
> > References : http://lkml.org/lkml/2008/2/23/380
> > http://lkml.org/lkml/2008/2/24/281
> > Handled-By : Thomas Gleixner <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10156
> > Subject : KVM & Qemu crashed with infinite recursive kernel loop in the guest
> > Submitter : Zdenek Kabelac <[email protected]>
> > Date : 2008-02-28 11:25 (18 days old)
> > References : http://lkml.org/lkml/2008/2/28/106
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10172
> > Subject : kvm: INFO: inconsistent lock state
> > Submitter : Zdenek Kabelac <[email protected]>
> > Date : 2008-03-05 03:26 (12 days old)
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10190
> > Subject : [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
> > Submitter : Tarkan Erimer <[email protected]>
> > Date : 2008-03-05 05:01 (12 days old)
> > References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10203
> > Subject : 2.6.25 IOMMU breaks DMA for b43 on x86_64
> > Submitter : Christian Casteyde <[email protected]>
> > Date : 2008-03-09 00:55 (8 days old)
> > Handled-By : Michael Buesch <[email protected]>
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > Subject : INFO: task mount:11202 blocked for more than 120 seconds
> > Submitter : Christian Kujau <[email protected]>
> > Date : 2008-03-07 21:32 (10 days old)
> > References : http://lkml.org/lkml/2008/3/7/308
> > http://lkml.org/lkml/2008/3/9/186
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10211
> > Subject : drivers/media/video/cx2341x.c: undefined references
> > Submitter : Toralf Förster <[email protected]>
> > Date : 2008-03-07 13:48 (10 days old)
> > References : http://lkml.org/lkml/2008/3/7/168
> >
> I think Mauro Carvalho Chehab's patch fixes this bug:
> http://linuxtv.org/hg/v4l-dvb/rev/ba1a6a7bd53b

Thanks, I updated the entry.

Rafael

2008-03-18 04:01:34

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>> Submitter : Gabriel C <[email protected]>
>>>> Date : 2008-02-24 01:31 (22 days old)
>>>> References : http://lkml.org/lkml/2008/2/23/380
>>>> http://lkml.org/lkml/2008/2/24/281
>>>> Handled-By : Thomas Gleixner <[email protected]>
>>>>
>>> Thomas do you want me to bisect ?
>> That'd be great.
>
> Ok I'll start doing that later on today.
>

I managed to bisect one of the bugs. I ran into some problems and had to use skip once because a revision didn't compile,
but bisect still seems to have found the right commit. Sadly, it looks like there are two different bugs.

Before starting the bisect I tested linux-next to make sure the bug(s) still exist. Since rc1 already has the problem,
I bisected 2.6.24 -> 2.6.25-rc1.

cat .git/refs/bisect/bad
1ada5cba6a0318f90e45b38557e7b5206a9cba38

git show 1ada5cba6a0318f90e45b38557e7b5206a9cba38
commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38
Author: Andi Kleen <[email protected]>
Date: Wed Jan 30 13:30:02 2008 +0100

    clocksource: make clocksource watchdog cycle through online CPUs

    This way it checks if the clocks are synchronized between CPUs too.
    This might be able to detect slowly drifting TSCs which only
    go wrong over longer time.

    Signed-off-by: Andi Kleen <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cabfa19..edd5ef8 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -142,8 +142,13 @@ static void clocksource_watchdog(unsigned long data)
 	}

 	if (!list_empty(&watchdog_list)) {
-		__mod_timer(&watchdog_timer,
-			    watchdog_timer.expires + WATCHDOG_INTERVAL);
+		/* Cycle through CPUs to check if the CPUs stay synchronized to
+		 * each other. */
+		int next_cpu = next_cpu(raw_smp_processor_id(), cpu_online_map);
+		if (next_cpu >= NR_CPUS)
+			next_cpu = first_cpu(cpu_online_map);
+		watchdog_timer.expires += WATCHDOG_INTERVAL;
+		add_timer_on(&watchdog_timer, next_cpu);
 	}
 	spin_unlock(&watchdog_lock);
 }
@@ -165,7 +170,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
 		if (!started && watchdog) {
 			watchdog_last = watchdog->read();
 			watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
-			add_timer(&watchdog_timer);
+			add_timer_on(&watchdog_timer, first_cpu(cpu_online_map));
 		}
 	} else {
 		if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
@@ -186,7 +191,8 @@ static void clocksource_check_watchdog(struct clocksource *cs)
 				watchdog_last = watchdog->read();
 				watchdog_timer.expires =
 					jiffies + WATCHDOG_INTERVAL;
-				add_timer(&watchdog_timer);
+				add_timer_on(&watchdog_timer,
+					     first_cpu(cpu_online_map));
 			}
 		}
 	}
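
For anyone reading along, the CPU rotation this commit introduces can be pictured with a small userspace sketch. This is not the kernel code: SKETCH_NR_CPUS, the uint32_t bitmask and all sketch_* names are assumptions made for illustration, standing in for NR_CPUS, cpu_online_map, next_cpu() and first_cpu().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NR_CPUS; 32 so that a
 * uint32_t can serve as the online-CPU mask. */
#define SKETCH_NR_CPUS 32

/* Lowest online CPU number greater than cpu, or SKETCH_NR_CPUS if
 * there is none. */
static int sketch_next_cpu(int cpu, uint32_t online_mask)
{
	for (int i = cpu + 1; i < SKETCH_NR_CPUS; i++)
		if (online_mask & (UINT32_C(1) << i))
			return i;
	return SKETCH_NR_CPUS;
}

/* Lowest online CPU number, or SKETCH_NR_CPUS if the mask is empty. */
static int sketch_first_cpu(uint32_t online_mask)
{
	return sketch_next_cpu(-1, online_mask);
}

/* Pick the CPU the watchdog timer would be re-armed on: the next
 * online CPU after the current one, wrapping around to the first
 * online CPU at the end of the mask. */
int sketch_pick_watchdog_cpu(int current_cpu, uint32_t online_mask)
{
	assert(online_mask != 0);

	int next = sketch_next_cpu(current_cpu, online_mask);
	if (next >= SKETCH_NR_CPUS)
		next = sketch_first_cpu(online_mask);
	return next;
}
```

With four CPUs online (mask 0xF) the watchdog would visit 0, 1, 2, 3, 0, ... in turn; with a single CPU online it always lands on that CPU again.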


git bisect log
git-bisect start
# bad: [19af35546de68c872dcb687613e0902a602cb20e] Linux 2.6.25-rc1
git-bisect bad 19af35546de68c872dcb687613e0902a602cb20e
# good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
# bad: [d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f] x86: add PAGE_KERNEL_EXEC_NOCACHE
git-bisect bad d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f
# good: [fb46990dba94866462e90623e183d02ec591cf8f] [NETFILTER]: nf_queue: remove unnecessary hook existance check
git-bisect good fb46990dba94866462e90623e183d02ec591cf8f
# good: [936722922f6d2366378de606a40c14f96915474d] [IPV4] fib_trie: compute size when needed
git-bisect good 936722922f6d2366378de606a40c14f96915474d
# bad: [ff14c6164bd532a6dc9025c07d3b562f839f00a9] x86: x86-64 ia32 ptrace pt_regs cleanup
git-bisect bad ff14c6164bd532a6dc9025c07d3b562f839f00a9
# good: [c087567d3ffb2c7c61e091982e6ca45478394f1a] SUNRPC: Remove the obsolete RPC_WAITQ macro
git-bisect good c087567d3ffb2c7c61e091982e6ca45478394f1a
# bad: [af7a78e9258ffcca681e080cbc857f854869144f] x86: move mce related declarations
git-bisect bad af7a78e9258ffcca681e080cbc857f854869144f
# good: [34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a] SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
git-bisect good 34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a
# bad: [83bd01024b1fdfc41d9b758e5669e80fca72df66] x86: protect against sigaltstack wraparound
git-bisect bad 83bd01024b1fdfc41d9b758e5669e80fca72df66
# good: [efd9ac8630e89b9ee7ce64008bd7783952374f37] time: fold __get_realtime_clock_ts() into getnstimeofday()
git-bisect good efd9ac8630e89b9ee7ce64008bd7783952374f37
# bad: [37a47db8d7f0f38dac5acf5a13abbc8f401707fa] x86: assign IRQs to HPET timers, fix
git-bisect bad 37a47db8d7f0f38dac5acf5a13abbc8f401707fa
# skip: [316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba] x86: restrict PIT clocksource usage
git-bisect skip 316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba
# bad: [4713e22ce81eb8b3353e16435362eb3d0ec95640] clocksource: add unregister function to disable unusable clocksources
git-bisect bad 4713e22ce81eb8b3353e16435362eb3d0ec95640
# bad: [1ada5cba6a0318f90e45b38557e7b5206a9cba38] clocksource: make clocksource watchdog cycle through online CPUs
git-bisect bad 1ada5cba6a0318f90e45b38557e7b5206a9cba38
# good: [1077f5a917b7c630231037826b344b2f7f5b903f] clocksource.c: use init_timer_deferrable for clocksource_watchdog
git-bisect good 1077f5a917b7c630231037826b344b2f7f5b903f
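
The log above is essentially a binary search with one untestable revision in the middle. A minimal sketch of that idea, assuming a linear history numbered 0..N (real git bisect works on the commit graph and picks candidates differently); the sketch_* names and the two demo predicates are made up for illustration:

```c
#include <assert.h>

enum sketch_verdict { SKETCH_GOOD, SKETCH_BAD, SKETCH_SKIP };

/* Hypothetical callback: build and boot the given commit, report the result. */
typedef enum sketch_verdict (*sketch_test_fn)(int commit);

/* Search the open range (good, bad) for the first bad commit.
 * Returns its number, or -1 if only untestable commits remain between
 * the last known good and the first known bad commit. */
int sketch_bisect(int good, int bad, sketch_test_fn test)
{
	assert(good < bad);

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;
		int probe = mid;
		enum sketch_verdict v = test(probe);

		/* Midpoint untestable: try its neighbours towards "bad" ... */
		while (v == SKETCH_SKIP && probe + 1 < bad)
			v = test(++probe);

		/* ... and, failing that, towards "good". */
		if (v == SKETCH_SKIP) {
			probe = mid;
			while (v == SKETCH_SKIP && probe - 1 > good)
				v = test(--probe);
		}

		if (v == SKETCH_SKIP)
			return -1;	/* ambiguous, like git's "could be any of" */

		if (v == SKETCH_BAD)
			bad = probe;
		else
			good = probe;
	}
	return bad;
}

/* Demo history: commit 6 does not build, the bug appears in commit 9. */
enum sketch_verdict sketch_demo_resolved(int commit)
{
	if (commit == 6)
		return SKETCH_SKIP;
	return commit >= 9 ? SKETCH_BAD : SKETCH_GOOD;
}

/* Demo history: commit 6 does not build, commits from 7 on are bad,
 * so the culprit could be either 6 or 7. */
enum sketch_verdict sketch_demo_ambiguous(int commit)
{
	if (commit == 6)
		return SKETCH_SKIP;
	return commit >= 7 ? SKETCH_BAD : SKETCH_GOOD;
}
```

A skipped revision only makes the result uncertain when it sits right next to the good/bad boundary, which is not the case in the log above: the last two steps mark 1ada5cba as bad and 1077f5a as good.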


The revision I had to skip failed to build with:

arch/x86/kernel/i8253.c: In function 'init_pit_clocksource':
arch/x86/kernel/i8253.c:207: error: implicit declaration of function 'is_hpet_enabled'
make[1]: *** [arch/x86/kernel/i8253.o] Error 1
make: *** [arch/x86/kernel] Error 2

If you tell me how to fix that, I'll restart the bisect from there, just in case.


Reverting the commit on 2.6.25-rc1 fixes the 'TSC is unstable' problem, but it does not fix the hang
when I boot with clocksource=acpi_pm, so that seems to have been introduced by a different commit.

I will try to bisect that hang too, most probably on the weekend.


I have also reverted that commit on current git head and a kernel is compiling right now; I'll let you know shortly whether that worked.

Please let me know if you need more information.


Best Regards,

Gabriel

2008-03-18 04:24:57

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>>> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>>> Submitter : Gabriel C <[email protected]>
>>>>> Date : 2008-02-24 01:31 (22 days old)
>>>>> References : http://lkml.org/lkml/2008/2/23/380
>>>>> http://lkml.org/lkml/2008/2/24/281
>>>>> Handled-By : Thomas Gleixner <[email protected]>
>>>>>
>>>> Thomas do you want me to bisect ?
>>> That'd be great.
>> Ok I'll start doing that later on today.
>>
>
> I managed to bisect 'one of the bugs' down , I got some problems and used skip once because a revision didn't compiled ,
> but it seems bisect got the right commit still. Sadly it seems there are 2 different bugs.
>
> Also before I've started the bisect I've tested linux-next to be sure the bug(s) still exists and while rc1 got that already
> I've started to bisect 2.6.24 -> 2.6.25-rc1.
>
> cat .git/refs/bisect/bad
> 1ada5cba6a0318f90e45b38557e7b5206a9cba38
>
> git show 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> Author: Andi Kleen <[email protected]>
> Date: Wed Jan 30 13:30:02 2008 +0100
>
> clocksource: make clocksource watchdog cycle through online CPUs
>
> This way it checks if the clocks are synchronized between CPUs too.
> This might be able to detect slowly drifting TSCs which only
> go wrong over longer time.
>
> Signed-off-by: Andi Kleen <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index cabfa19..edd5ef8 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -142,8 +142,13 @@ static void clocksource_watchdog(unsigned long data)
> }
>
> if (!list_empty(&watchdog_list)) {
> - __mod_timer(&watchdog_timer,
> - watchdog_timer.expires + WATCHDOG_INTERVAL);
> + /* Cycle through CPUs to check if the CPUs stay synchronized to
> + * each other. */
> + int next_cpu = next_cpu(raw_smp_processor_id(), cpu_online_map);
> + if (next_cpu >= NR_CPUS)
> + next_cpu = first_cpu(cpu_online_map);
> + watchdog_timer.expires += WATCHDOG_INTERVAL;
> + add_timer_on(&watchdog_timer, next_cpu);
> }
> spin_unlock(&watchdog_lock);
> }
> @@ -165,7 +170,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
> if (!started && watchdog) {
> watchdog_last = watchdog->read();
> watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
> - add_timer(&watchdog_timer);
> + add_timer_on(&watchdog_timer, first_cpu(cpu_online_map));
> }
> } else {
> if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
> @@ -186,7 +191,8 @@ static void clocksource_check_watchdog(struct clocksource *cs)
> watchdog_last = watchdog->read();
> watchdog_timer.expires =
> jiffies + WATCHDOG_INTERVAL;
> - add_timer(&watchdog_timer);
> + add_timer_on(&watchdog_timer,
> + first_cpu(cpu_online_map));
> }
> }
> }
>
>
> git bisect log
> git-bisect start
> # bad: [19af35546de68c872dcb687613e0902a602cb20e] Linux 2.6.25-rc1
> git-bisect bad 19af35546de68c872dcb687613e0902a602cb20e
> # good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
> git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
> # bad: [d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f] x86: add PAGE_KERNEL_EXEC_NOCACHE
> git-bisect bad d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f
> # good: [fb46990dba94866462e90623e183d02ec591cf8f] [NETFILTER]: nf_queue: remove unnecessary hook existance check
> git-bisect good fb46990dba94866462e90623e183d02ec591cf8f
> # good: [936722922f6d2366378de606a40c14f96915474d] [IPV4] fib_trie: compute size when needed
> git-bisect good 936722922f6d2366378de606a40c14f96915474d
> # bad: [ff14c6164bd532a6dc9025c07d3b562f839f00a9] x86: x86-64 ia32 ptrace pt_regs cleanup
> git-bisect bad ff14c6164bd532a6dc9025c07d3b562f839f00a9
> # good: [c087567d3ffb2c7c61e091982e6ca45478394f1a] SUNRPC: Remove the obsolete RPC_WAITQ macro
> git-bisect good c087567d3ffb2c7c61e091982e6ca45478394f1a
> # bad: [af7a78e9258ffcca681e080cbc857f854869144f] x86: move mce related declarations
> git-bisect bad af7a78e9258ffcca681e080cbc857f854869144f
> # good: [34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a] SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
> git-bisect good 34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a
> # bad: [83bd01024b1fdfc41d9b758e5669e80fca72df66] x86: protect against sigaltstack wraparound
> git-bisect bad 83bd01024b1fdfc41d9b758e5669e80fca72df66
> # good: [efd9ac8630e89b9ee7ce64008bd7783952374f37] time: fold __get_realtime_clock_ts() into getnstimeofday()
> git-bisect good efd9ac8630e89b9ee7ce64008bd7783952374f37
> # bad: [37a47db8d7f0f38dac5acf5a13abbc8f401707fa] x86: assign IRQs to HPET timers, fix
> git-bisect bad 37a47db8d7f0f38dac5acf5a13abbc8f401707fa
> # skip: [316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba] x86: restrict PIT clocksource usage
> git-bisect skip 316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba
> # bad: [4713e22ce81eb8b3353e16435362eb3d0ec95640] clocksource: add unregister function to disable unusable clocksources
> git-bisect bad 4713e22ce81eb8b3353e16435362eb3d0ec95640
> # bad: [1ada5cba6a0318f90e45b38557e7b5206a9cba38] clocksource: make clocksource watchdog cycle through online CPUs
> git-bisect bad 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> # good: [1077f5a917b7c630231037826b344b2f7f5b903f] clocksource.c: use init_timer_deferrable for clocksource_watchdog
> git-bisect good 1077f5a917b7c630231037826b344b2f7f5b903f
>
>
> Also the broken revision died with that :
>
> arch/x86/kernel/i8253.c: In function 'init_pit_clocksource':
> arch/x86/kernel/i8253.c:207: error: implicit declaration of function 'is_hpet_enabled'
> make[1]: *** [arch/x86/kernel/i8253.o] Error 1
> make: *** [arch/x86/kernel] Error 2
>
> If you tell me on how to fix that I'll restart the bisect from there , just in case ..
>
>
> Also reverting the commit from 2.6.25-rc1 fixes the 'Tsc being unstable thing' but it does not fix the hang
> when I boot with clocksource=acpi_pm so that seems to be introduced in a different commit.
>
> I will try to bisect this hang also , most probably on weekend.
>
>
> Also I reverted that commit from git head and an kernel compiles right now, I'll let you know in a bit if that worked out.

It worked :)

Current git head with 1ada5cba6a0318f90e45b38557e7b5206a9cba38 reverted works here.

dmesg|grep clocksource
[ 0.563915] Time: tsc clocksource has been installed.

uname -a
Linux lara 2.6.25-rc6-00014-gbde4f8f-dirty #2 SMP PREEMPT Tue Mar 18 04:48:53 CET 2008 i686 GNU/Linux


Gabriel

2008-03-21 15:24:38

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>>> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>>> Submitter : Gabriel C <[email protected]>
>>>>> Date : 2008-02-24 01:31 (22 days old)
>>>>> References : http://lkml.org/lkml/2008/2/23/380
>>>>> http://lkml.org/lkml/2008/2/24/281
>>>>> Handled-By : Thomas Gleixner <[email protected]>
>>>>>
>>>> Thomas do you want me to bisect ?
>>> That'd be great.
>> Ok I'll start doing that later on today.
>>
>
[ snip ]

> still hangs when I boot with clocksource=acpi_pm so that seems to be introduced in a different commit.
>
> I will try to bisect this hang also , most probably on weekend.
>

A correction on this one.

Current git head boots just fine with clocksource=acpi_pm here; I just don't know which commit fixed it.

Gabriel

2008-03-21 16:28:30

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Fri, 21 Mar 2008, Gabriel C wrote:

> > still hangs when I boot with clocksource=acpi_pm so that seems to
> > be introduced in a different commit.
> >
> > I will try to bisect this hang also , most probably on weekend.
> >
>
> Correction on this one.
>
> Current git head boots just fine with clocksource=acpi_pm here , I
> just don't know which commit fixed it.

Hmm. Very dubious. I'm a bit afraid of self-healing problems. It would
be interesting to find the commit which involuntarily fixed the acpi_pm
timer problem.

Also, can you please reapply the reverted clocksource patch? I have
the feeling that the acpi_pm one was the real problem, which was
triggered by the modified watchdog.

Thanks,

tglx

2008-03-21 16:47:00

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Gabriel C wrote:
>
>>> still hangs when I boot with clocksource=acpi_pm so that seems to
>>> be introduced in a different commit.
>>>
>>> I will try to bisect this hang also , most probably on weekend.
>>>
>> Correction on this one.
>>
>> Current git head boots just fine with clocksource=acpi_pm here , I
>> just don't know which commit fixed it.
>
> Hmm. Very dubious. I'm a bit afraid of self healing problems. It would
> be interesting to find the commit which fixed the acpi_pm timer
> problem unvoluntary.

I can try to find it.

>
> Also, can you please reapply the reverted clocksource patch ? I have
> the feeling that the acpi_pm one was the real problem which was
> triggered the modfied watchdog.

Sure, I'll do that in a few minutes and let you know.

>
> Thanks,
>
> tglx


Gabriel

2008-03-21 18:11:54

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Fri, 21 Mar 2008, Gabriel C wrote:
>>
>>>> still hangs when I boot with clocksource=acpi_pm so that seems to
>>>> be introduced in a different commit.
>>>>
>>>> I will try to bisect this hang also , most probably on weekend.
>>>>
>>> Correction on this one.
>>>
>>> Current git head boots just fine with clocksource=acpi_pm here , I
>>> just don't know which commit fixed it.
>> Hmm. Very dubious. I'm a bit afraid of self healing problems. It would
>> be interesting to find the commit which fixed the acpi_pm timer
>> problem unvoluntary.
>
> I can try to find it.
>
>> Also, can you please reapply the reverted clocksource patch ? I have
>> the feeling that the acpi_pm one was the real problem which was
>> triggered the modfied watchdog.
>
> Sure I can , will do so in some minutes and let you know.

Sorry, it took a bit longer, but I have more information now.

acpi_pm was not related to that; I still get the problem.

Of course I can still try to find the commit which magically fixed acpi_pm if you really want.

It seems to break only with HT enabled, and only on 2-socket motherboards
(at least the ones I own; I know it is old hardware, but it worked fine for me).

Results so far:

- second CPU disabled, HT enabled: works
- both CPUs enabled, HT disabled: works
- both CPUs and HT enabled, maxcpus=2: works
- both CPUs and HT enabled: breaks
- both CPUs and HT enabled, maxcpus=3: breaks

I also have another dual-socket (Socket 604) motherboard here with two 2.4 GHz Xeons.
Its storage controller is somewhat broken, but it is still good enough for a quick test :) and I see the
same thing.

Does that make any sense?

All of this was tested on 2.6.25-rc6-00224-gae51801-dirty (dirty because I reverted the revert =) )

Gabriel

2008-03-21 18:51:56

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Fri, 21 Mar 2008, Gabriel C wrote:
> >> Also, can you please reapply the reverted clocksource patch ? I have
> >> the feeling that the acpi_pm one was the real problem which was
> >> triggered the modfied watchdog.
> >
> > Sure I can , will do so in some minutes and let you know.
>
> It took a bit longer sorry but I have more infos now.
>
> The acpi_pm was not related to that I still get the problem.
>
> Of course I still can try to find the commit which magically fixed acpi_pm if you really want.

Only if you are really bored. :) I would have asked for it if it had
fixed the TSC issue.

> It seems like it breaks only when you enable HT and only on 2 socket motherboards.
> ( at least the ones I own , I know is old hardware but worked fine for me )

Hmm. I wonder why a dual socket board survives the initial sync test.

> Also disabling the second CPU and enabling HT works , enabling both
> CPUs and disabling HT works , booting with enabled HT and both CPUs
> but maxcpus=2 also works , booting with 2 CPUs and HT on breaks ,
> booting with both CPUs HT on but maxcpus=3 breaks also.
>
> Also I have another dual motherboard here 604 socket with 2 2,4 GHz
> Xeon's. The motherboard has the storage controller somewhat broken
> but for a quick test it is still good :) and I see the same thing.
>
> Does that make any sense ?

Not really. Can you please revert the reverted revert again and run

http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

on your machine with all CPUs and HT enabled ?

Thanks,
tglx

2008-03-21 19:23:35

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Gabriel C wrote:
>>>> Also, can you please reapply the reverted clocksource patch ? I have
>>>> the feeling that the acpi_pm one was the real problem which was
>>>> triggered the modfied watchdog.
>>> Sure I can , will do so in some minutes and let you know.
>> It took a bit longer sorry but I have more infos now.
>>
>> The acpi_pm was not related to that I still get the problem.
>>
>> Of course I still can try to find the commit which magically fixed acpi_pm if you really want.
>
> Just if you are really bored. :) I would have asked if it had fixed
> the TSC issue.
>
>> It seems like it breaks only when you enable HT and only on 2 socket motherboards.
>> ( at least the ones I own , I know is old hardware but worked fine for me )
>
> Hmm. I wonder why a dual socket board survives the initial sync test.
>
>> Also disabling the second CPU and enabling HT works , enabling both
>> CPUs and disabling HT works , booting with enabled HT and both CPUs
>> but maxcpus=2 also works , booting with 2 CPUs and HT on breaks ,
>> booting with both CPUs HT on but maxcpus=3 breaks also.
>>
>> Also I have another dual motherboard here 604 socket with 2 2,4 GHz
>> Xeon's. The motherboard has the storage controller somewhat broken
>> but for a quick test it is still good :) and I see the same thing.
>>
>> Does that make any sense ?
>
> Not really. Can you please revert the reverted revert again and run
>
> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>
> on your machine with all CPUs and HT enabled ?

Sure , doing so now.

>
> Thanks,
> tglx


Gabriel

2008-03-21 20:55:55

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Fri, 21 Mar 2008, Gabriel C wrote:
>>>>> Also, can you please reapply the reverted clocksource patch ? I have
>>>>> the feeling that the acpi_pm one was the real problem which was
>>>>> triggered the modfied watchdog.
>>>> Sure I can , will do so in some minutes and let you know.
>>> It took a bit longer sorry but I have more infos now.
>>>
>>> The acpi_pm was not related to that I still get the problem.
>>>
>>> Of course I still can try to find the commit which magically fixed acpi_pm if you really want.
>> Just if you are really bored. :) I would have asked if it had fixed
>> the TSC issue.
>>
>>> It seems like it breaks only when you enable HT and only on 2 socket motherboards.
>>> ( at least the ones I own , I know is old hardware but worked fine for me )
>> Hmm. I wonder why a dual socket board survives the initial sync test.
>>
>>> Also disabling the second CPU and enabling HT works , enabling both
>>> CPUs and disabling HT works , booting with enabled HT and both CPUs
>>> but maxcpus=2 also works , booting with 2 CPUs and HT on breaks ,
>>> booting with both CPUs HT on but maxcpus=3 breaks also.
>>>
>>> Also I have another dual motherboard here 604 socket with 2 2,4 GHz
>>> Xeon's. The motherboard has the storage controller somewhat broken
>>> but for a quick test it is still good :) and I see the same thing.
>>>
>>> Does that make any sense ?
>> Not really. Can you please revert the reverted revert again and run
>>
>> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>>
>> on your machine with all CPUs and HT enabled ?
>
> Sure , doing so now.
>

Here is the result on 2.6.25-rc6-00243-g028011e (it ran for 30+ minutes while I was away for food =) )

...

4 CPUs, running 4 parallel test-tasks.
checking for time-warps via:
- read time stamp counter (RDTSC) instruction (cycle resolution)
- gettimeofday (TOD) syscall (usec resolution)
- clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)

| 1.46 us, TSC-warps:0 | 16.01 us, TOD-warps:0 | 16.10 us, CLOCK-warps:0

...
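
For context, a "warp" in this output means a clock reading that is earlier than the one taken before it. A minimal single-threaded sketch of that check for the CLOCK_MONOTONIC case, assuming POSIX clock_gettime(); the real time-warp-test.c runs several tasks in parallel across CPUs and also checks RDTSC and gettimeofday, which this sketch does not attempt. All sketch_* names are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Count how often a sample is smaller than its predecessor, i.e. how
 * often time appears to run backwards in the recorded sequence. */
long sketch_count_warps(const int64_t *samples, size_t n)
{
	assert(samples != NULL || n == 0);

	long warps = 0;
	for (size_t i = 1; i < n; i++)
		if (samples[i] < samples[i - 1])
			warps++;
	return warps;
}

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
static int64_t sketch_monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Take n back-to-back readings (n <= 1024 in this sketch) on the
 * calling thread and count the warps among them. */
long sketch_count_clock_warps(size_t n)
{
	int64_t samples[1024];

	assert(n <= 1024);
	for (size_t i = 0; i < n; i++)
		samples[i] = sketch_monotonic_ns();
	return sketch_count_warps(samples, n);
}
```

A single thread will normally not see warps even on a box with unsynchronized TSCs, because it rarely migrates between CPUs during the loop; that is why the real test runs one task per CPU and compares their readings against each other.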

Gabriel

2008-03-21 21:16:29

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Fri, 21 Mar 2008, Gabriel C wrote:
> >>> Does that make any sense ?
> >> Not really. Can you please revert the reverted revert again and run
> >>
> >> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
> >>
> >> on your machine with all CPUs and HT enabled ?
> >
> > Sure , doing so now.
> >
>
> Here is the result on 2.6.25-rc6-00243-g028011e (it ran for 30+
> minutes while I was away for food =) )
> ...
>
> 4 CPUs, running 4 parallel test-tasks.
> checking for time-warps via:
> - read time stamp counter (RDTSC) instruction (cycle resolution)
> - gettimeofday (TOD) syscall (usec resolution)
> - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)
>
> | 1.46 us, TSC-warps:0 | 16.01 us, TOD-warps:0 | 16.10 us, CLOCK-warps:0

Amazing. I never found a multi socket box where the TSC's were in sync.

So the rotating watchdog triggers for a yet to figure out reason.

Oh, now that the pm timer seems to work again, can you try the following:

apply the reverted patch again and let the box boot. At some point the
TSC is marked unstable and is replaced by acpi_pm clocksource.

What result does time-warp-test show in that situation?

Thanks,
tglx

2008-03-21 21:59:25

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Gabriel C wrote:
>>>>> Does that make any sense ?
>>>> Not really. Can you please revert the reverted revert again and run
>>>>
>>>> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>>>>
>>>> on your machine with all CPUs and HT enabled ?
>>> Sure , doing so now.
>>>
>> Here is the result on 2.6.25-rc6-00243-g028011e (it ran for 30+
>> minutes while I was away for food =) )
>> ...
>>
>> 4 CPUs, running 4 parallel test-tasks.
>> checking for time-warps via:
>> - read time stamp counter (RDTSC) instruction (cycle resolution)
>> - gettimeofday (TOD) syscall (usec resolution)
>> - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)
>>
>> | 1.46 us, TSC-warps:0 | 16.01 us, TOD-warps:0 | 16.10 us, CLOCK-warps:0
>
> Amazing. I never found a multi socket box where the TSC's were in sync.
>
> So the rotating watchdog triggers for a yet to figure out reason.
>
> Oh, now that the pm timer seems to work again, can you try the following:
>
> apply the reverted patch again and let the box boot. At some point the
> TSC is marked unstable and is replaced by acpi_pm clocksource.
>
> What result does time-warp-test show in that situation?

Here it is , same kernel + Andi's patch :

./time-warp-test
4 CPUs, running 4 parallel test-tasks.
checking for time-warps via:
- read time stamp counter (RDTSC) instruction (cycle resolution)
- gettimeofday (TOD) syscall (usec resolution)
- clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)

| 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0

>
> Thanks,
> tglx


Gabriel

2008-03-21 22:10:28

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Fri, 21 Mar 2008, Gabriel C wrote:
> > So the rotating watchdog triggers for a yet to figure out reason.
> >
> > Oh, now that the pm timer seems to work again, can you try the following:
> >
> > apply the reverted patch again and let the box boot. At some point the
> > TSC is marked unstable and is replaced by acpi_pm clocksource.
> >
> > What result does time-warp-test show in that situation?
>
> Here it is , same kernel + Andi's patch :
>
> ./time-warp-test
> 4 CPUs, running 4 parallel test-tasks.
> checking for time-warps via:
> - read time stamp counter (RDTSC) instruction (cycle resolution)
> - gettimeofday (TOD) syscall (usec resolution)
> - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)
>
> | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0

Ok. So the watchdog trigger is a false positive.

Thinking more about it, it looks like Andi's change triggers some
hidden bug in the combination of NO_HZ and add_timer_on(), where the
CPU on which the timer is added is likely in a long idle sleep. I'll look
into this tomorrow.

Thanks for testing

tglx

2008-03-22 11:22:27

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Fri, 21 Mar 2008, Thomas Gleixner wrote:
> >
> > | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0
>
> Ok. So the watchdog trigger is a false positive.
>
> Thinking more about it, it looks like Andi's change triggers some
> hidden bug in the combination of NO_HZ and add_timer_on(), where the
> CPU on which the timer is added is likely in a long idle sleep. I'll look
> into this tomorrow.

Ok. Here is what's happening:

CPU0 runs the watchdog timer and schedules it on CPU1.

With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
boot process there is probably no timer pending on CPU1, which means
the idle sleep is infinite.

Now some time later CPU1 gets woken by an interrupt/IPI and runs the
timer wheel. At this point the pm_timer which is the reference clock
has already wrapped around, so the watchdog thinks that there is a
huge time difference and marks the TSC unstable.

Aside of that watchdog issue this also affects the other users of
add_timer_on(): e.g. queue_delayed_work_on().
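
To make the wrap-around effect concrete, here is a small user-space sketch. It assumes the 24-bit variant of the ACPI PM timer running at its standard 3.579545 MHz (some hardware provides a 32-bit counter instead), so the counter wraps roughly every 4.7 seconds. The function names are hypothetical and this is not the kernel's watchdog code; it only shows why a modulo subtraction under-reports a long sleep.

```c
#include <stdint.h>

#define PMTMR_HZ    3579545u     /* ACPI PM timer frequency in Hz */
#define PMTMR_MASK  0x00ffffffu  /* assumption: 24-bit counter */

/* Elapsed ticks as seen by a reader that can only subtract modulo 2^24. */
uint32_t pmtmr_delta(uint32_t last, uint32_t now)
{
    return (now - last) & PMTMR_MASK;
}

/* Counter value that would be read after `ticks` real ticks since `start`. */
uint32_t pmtmr_read_after(uint32_t start, uint64_t ticks)
{
    return (uint32_t)((start + ticks) & PMTMR_MASK);
}
```

With these numbers, 6 seconds of real time (21477270 ticks) reads back as 4700054 ticks, about 1.3 seconds. A free-running 64-bit TSC reports the full 6 seconds over the same interval, so a comparison of the two makes the TSC look wrong.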

Can you please apply the patch below and verify it with Andi's
watchdog patch applied ?

Thanks,

tglx

---
include/linux/tick.h | 4 ++++
kernel/time/tick-sched.c | 30 ++++++++++++++++++++++++++++++
kernel/timer.c | 14 +++++++++++++-
3 files changed, 47 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -111,6 +111,8 @@ extern void tick_nohz_update_jiffies(voi
extern ktime_t tick_nohz_get_sleep_length(void);
extern void tick_nohz_stop_idle(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern int tick_nohz_cpu_needs_wakeup(int cpu);
+extern void tick_nohz_rescan_timers_on(int cpu);
# else
static inline void tick_nohz_stop_sched_tick(void) { }
static inline void tick_nohz_restart_sched_tick(void) { }
@@ -123,6 +125,8 @@ static inline ktime_t tick_nohz_get_slee
}
static inline void tick_nohz_stop_idle(int cpu) { }
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
+static inline int tick_nohz_cpu_needs_wakeup(int cpu) { return 0; }
+static inline void tick_nohz_rescan_timers_on(int cpu) { }
# endif /* !NO_HZ */

#endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -183,6 +183,36 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
}

/**
+ * tick_nohz_cpu_needs_wakeup - check possible wakeup of cpu in add_timer_on()
+ *
+ * when add_timer_on() happens on a CPU which is in a long idle sleep,
+ * then we need to wake it up so the timer wheel gets reevaluated.
+ *
+ * Note: we use idle_cpu() which checks the idle state lockless, but
+ * we are ordered against the other cpu which might be on the way to
+ * idle by the timer base lock, which we hold.
+ */
+int tick_nohz_cpu_needs_wakeup(int cpu)
+{
+ return tick_nohz_enabled && idle_cpu(cpu) &&
+ (cpu != smp_processor_id());
+}
+
+/**
+ * tick_nohz_rescan_timers_on - reevaluate the idle sleep time of a CPU
+ *
+ * When a CPU is idle and a timer got added to this CPU timer wheel
+ * via add_timer_on() then we need to make sure that the CPU
+ * reevaluates the timer wheel. Otherwise the timer might be delayed
+ * for a real long time.
+ */
+void tick_nohz_rescan_timers_on(int cpu)
+{
+ if (tick_nohz_enabled && idle_cpu(cpu))
+ smp_send_reschedule(cpu);
+}
+
+/**
* tick_nohz_stop_sched_tick - stop the idle tick from the idle task
*
* When the next event is more than a tick into the future, stop the idle tick
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -445,15 +445,27 @@ void add_timer_on(struct timer_list *tim
{
struct tvec_base *base = per_cpu(tvec_bases, cpu);
unsigned long flags;
+ int wakeidle;

timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
internal_add_timer(base, timer);
+ /*
+ * Check whether the other CPU is idle and needs to be
+ * triggered to reevaluate the timer wheel when nohz is
+ * active. We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to idle can not evaluate
+ * the timer wheel.
+ */
+ wakeidle = tick_nohz_cpu_needs_wakeup(cpu);
spin_unlock_irqrestore(&base->lock, flags);
-}

+ if (wakeidle)
+ tick_nohz_rescan_timers_on(cpu);
+}

/**
* mod_timer - modify a timer's timeout

2008-03-22 13:35:08

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Thomas Gleixner wrote:
>>> | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0
>> Ok. So the watchdog trigger is a false positive.
>>
>> Thinking more about it, it looks like Andi's change triggers some
>> hidden bug in the combination of NO_HZ and add_timer_on(), where the
>> CPU on which the timer is added is likely in a long idle sleep. I'll look
>> into this tomorrow.
>
> Ok. Here is what's happening:
>
> CPU0 runs the watchdog timer and schedules it on CPU1.
>
> With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
> boot process there is probably no timer pending on CPU1, which means
> the idle sleep is infinite.
>
> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> timer wheel. At this point the pm_timer which is the reference clock
> has already wrapped around, so the watchdog thinks that there is a
> huge time difference and marks the TSC unstable.
>
> Aside of that watchdog issue this also affects the other users of
> add_timer_on(): e.g. queue_delayed_work_on().
>
> Can you please apply the patch below and verify it with Andi's
> watchdog patch applied ?


Did that , git head , Andi's + your patch but TSC is still marked unstable.

>
> Thanks,
>
> tglx
>


Gabriel

2008-03-22 14:22:21

by Andi Kleen

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

> CPU0 runs the watchdog timer and schedules it on CPU1.
>
> With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
> boot process there is probably no timer pending on CPU1, which means
> the idle sleep is infinite.
>
> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> timer wheel. At this point the pm_timer which is the reference clock
> has already wrapped around, so the watchdog thinks that there is a

In my old original own noidletick code I simply limited all sleeps
to below the wrap around of the primary timer. Wouldn't something
like that work?

In the case of the watchdog i guess it would need to be limited
to the wrap around of multiple timers, at least all that
are used by the watchdog.
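
A rough sketch of that idea, for illustration only: clamp every requested idle sleep to a fraction of the shortest wrap-around interval among the clocks involved. The struct, the function names and the choice of half the wrap interval as a safety margin are all assumptions; this is not code from any noidletick implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a wrapping hardware counter. */
struct clock_desc {
    unsigned int bits;  /* counter width, 1..32 in this sketch */
    uint64_t     hz;    /* counter frequency, must be > 0 */
};

/* Microseconds until the counter wraps (no overflow for bits <= 32). */
uint64_t wrap_us(const struct clock_desc *c)
{
    uint64_t period_ticks = (uint64_t)1 << c->bits;

    return period_ticks * 1000000u / c->hz;
}

/*
 * Limit a requested sleep to half the shortest wrap interval of all
 * listed clocks. The factor 1/2 is an arbitrary safety margin.
 */
uint64_t clamp_sleep_us(uint64_t requested_us,
                        const struct clock_desc *clocks, size_t n)
{
    uint64_t limit = requested_us;

    for (size_t i = 0; i < n; i++) {
        uint64_t half_wrap = wrap_us(&clocks[i]) / 2;

        if (half_wrap < limit)
            limit = half_wrap;
    }
    return limit;
}
```

For a 24-bit counter at 3.579545 MHz the wrap interval is a little under 4.7 seconds, so a 10 second sleep request would be cut to roughly 2.3 seconds, while a 1 ms request passes through unchanged.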

I'm not sure just doing this for add_timer_on() only is correct.
After all it could affect any other code not run by add_timer_on()
couldn't it?

-Andi

2008-03-22 14:31:30

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Sat, 22 Mar 2008, Gabriel C wrote:
> > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> > timer wheel. At this point the pm_timer which is the reference clock
> > has already wrapped around, so the watchdog thinks that there is a
> > huge time difference and marks the TSC unstable.
> >
> > Aside of that watchdog issue this also affects the other users of
> > add_timer_on(): e.g. queue_delayed_work_on().
> >
> > Can you please apply the patch below and verify it with Andi's
> > watchdog patch applied ?
>
>
> Did that , git head , Andi's + your patch but TSC is still marked unstable.

Doh, stupid me. We do not reevaluate the timer wheel, when we just
wake up via the smp_reschedule IPI when the resched flag on the other
CPU is not set. That's a separate vector which is not going through
irq_enter() / irq_exit().

Does the patch below solve the problem ?

Thanks,

tglx

---
include/linux/tick.h | 4 +++
kernel/time/tick-sched.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
kernel/timer.c | 14 ++++++++++++-
3 files changed, 67 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -111,6 +111,8 @@ extern void tick_nohz_update_jiffies(voi
extern ktime_t tick_nohz_get_sleep_length(void);
extern void tick_nohz_stop_idle(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern int tick_nohz_cpu_needs_wakeup(int cpu);
+extern void tick_nohz_rescan_timers_on(int cpu);
# else
static inline void tick_nohz_stop_sched_tick(void) { }
static inline void tick_nohz_restart_sched_tick(void) { }
@@ -123,6 +125,8 @@ static inline ktime_t tick_nohz_get_slee
}
static inline void tick_nohz_stop_idle(int cpu) { }
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
+static inline int tick_nohz_cpu_needs_wakeup(int cpu) { return 0; }
+static inline void tick_nohz_rescan_timers_on(int cpu) { }
# endif /* !NO_HZ */

#endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -183,6 +183,56 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
}

/**
+ * tick_nohz_cpu_needs_wakeup - check possible wakeup of cpu in add_timer_on()
+ *
+ * when add_timer_on() happens on a CPU which is in a long idle sleep,
+ * then we need to wake it up so the timer wheel gets reevaluated.
+ *
+ * Note: we use idle_cpu() which checks the idle state lockless, but
+ * we are ordered against the other cpu which might be on the way to
+ * idle by the timer base lock, which we hold.
+ */
+int tick_nohz_cpu_needs_wakeup(int cpu)
+{
+ return tick_nohz_enabled && idle_cpu(cpu) &&
+ (cpu != smp_processor_id());
+}
+
+/*
+ * Rescan the timer wheel, when
+ *
+ * - the CPU is idle
+ * - the CPU is not processing an interrupt
+ * - the need_resched flag is off
+ */
+static void tick_nohz_rescan_timers(void *unused)
+{
+ int cpu = smp_processor_id();
+
+ if (!idle_cpu(cpu) || in_interrupt() || need_resched())
+ return;
+
+ tick_nohz_stop_idle(cpu);
+ tick_nohz_update_jiffies();
+ tick_nohz_stop_sched_tick();
+}
+
+/**
+ * tick_nohz_rescan_timers_on - reevaluate the idle sleep time of a CPU
+ *
+ * When a CPU is idle and a timer got added to this CPU timer wheel
+ * via add_timer_on() then we need to make sure that the CPU
+ * reevaluates the timer wheel. Otherwise the timer might be delayed
+ * for a real long time.
+ */
+void tick_nohz_rescan_timers_on(int cpu)
+{
+ if (tick_nohz_enabled && idle_cpu(cpu))
+ smp_call_function_single(cpu, tick_nohz_rescan_timers, NULL,
+ 0, 0);
+}
+
+/**
* tick_nohz_stop_sched_tick - stop the idle tick from the idle task
*
* When the next event is more than a tick into the future, stop the idle tick
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -445,15 +445,27 @@ void add_timer_on(struct timer_list *tim
{
struct tvec_base *base = per_cpu(tvec_bases, cpu);
unsigned long flags;
+ int wakeidle;

timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
internal_add_timer(base, timer);
+ /*
+ * Check whether the other CPU is idle and needs to be
+ * triggered to reevaluate the timer wheel when nohz is
+ * active. We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to idle can not evaluate
+ * the timer wheel.
+ */
+ wakeidle = tick_nohz_cpu_needs_wakeup(cpu);
spin_unlock_irqrestore(&base->lock, flags);
-}

+ if (wakeidle)
+ tick_nohz_rescan_timers_on(cpu);
+}

/**
* mod_timer - modify a timer's timeout

2008-03-22 14:42:20

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Sat, 22 Mar 2008, Andi Kleen wrote:
> > CPU0 runs the watchdog timer and schedules it on CPU1.
> >
> > With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
> > boot process there is probably no timer pending on CPU1, which means
> > the idle sleep is infinite.
> >
> > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> > timer wheel. At this point the pm_timer which is the reference clock
> > has already wrapped around, so the watchdog thinks that there is a
>
> In my old original own noidletick code I simply limited all sleeps
> to below the wrap around of the primary timer. Wouldn't something
> like that work?

No, it does not solve the real problem of not reevaluating the timer
wheel on the idle CPU when a timer gets added from some other CPU. We
would paper over the watchdog issue, but postponing a timer event,
which was added cross CPU to some artificial expiry time is simply
wrong.

> I'm not sure just doing this for add_timer_on() only is correct.
> After all it could affect any other code not run by add_timer_on()
> couldn't it?

No, it's limited to add_timer_on() simply because no other code can
add a new timer (timer_list or hrtimer) which modifies the next event
on another CPU. There is also the rare case, when one CPU runs the
timer callback and the other one modifies the timer, but that's not
relevant for the NOHZ problem because the CPU which runs the callback
is not idle at this point.

All other timer operations are CPU local and reevaluated before the
CPU goes idle again.
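
The distinction can be illustrated with a toy user-space model. Everything in it is hypothetical (struct toy_cpu and the toy_* functions are not kernel code): a CPU looks at its timer wheel once when it enters idle and programs its wakeup from that snapshot, so a timer enqueued later by another CPU is not seen until something kicks the idle CPU into reevaluating.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_EVENT UINT64_MAX  /* stands for "sleep forever" */

/* Toy model of one CPU's timer state; all names are hypothetical. */
struct toy_cpu {
    uint64_t earliest_timer;     /* earliest pending timer, NO_EVENT if none */
    uint64_t programmed_wakeup;  /* snapshot taken when entering idle */
    bool     idle;
};

/* Going idle: look at the wheel once and program the wakeup from it. */
void toy_enter_idle(struct toy_cpu *c)
{
    c->programmed_wakeup = c->earliest_timer;
    c->idle = true;
}

/*
 * Cross-CPU enqueue, in the spirit of add_timer_on(). The programmed
 * wakeup of the target is NOT touched. Returns true if the target is
 * idle and therefore needs a kick to notice the new timer.
 */
bool toy_add_timer_on(struct toy_cpu *target, uint64_t expires)
{
    if (expires < target->earliest_timer)
        target->earliest_timer = expires;
    return target->idle;
}

/* The kick: the CPU leaves idle and reevaluates on its way back. */
void toy_kick(struct toy_cpu *c)
{
    c->idle = false;
    toy_enter_idle(c);
}
```

Without the kick, programmed_wakeup stays at NO_EVENT even though a timer is pending. A CPU that adds a timer to its own wheel is by definition not idle at that moment and takes the normal path through toy_enter_idle() later, which is why only the cross-CPU case is affected in this model.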

Thanks,

tglx

2008-03-22 15:13:49

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Sat, 22 Mar 2008, Gabriel C wrote:
>>> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
>>> timer wheel. At this point the pm_timer which is the reference clock
>>> has already wrapped around, so the watchdog thinks that there is a
>>> huge time difference and marks the TSC unstable.
>>>
>>> Aside of that watchdog issue this also affects the other users of
>>> add_timer_on(): e.g. queue_delayed_work_on().
>>>
>>> Can you please apply the patch below and verify it with Andi's
>>> watchdog patch applied ?
>>
>> Did that , git head , Andi's + your patch but TSC is still marked unstable.
>
> Doh, stupid me. We do not reevaluate the timer wheel, when we just
> wake up via the smp_reschedule IPI when the resched flag on the other
> CPU is not set. That's a separate vector which is not going through
> irq_enter() / irq_exit().
>
> Does the patch below solve the problem ?

With this one TSC is fine but now I get a warning on boot :

..

[ 0.041037] ------------[ cut here ]------------
[ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
[ 0.041074] Modules linked in:
[ 0.041087] Pid: 1, comm: swapper Not tainted 2.6.25-rc6-00243-g028011e-dirty #12
[ 0.041107] [<c011b51c>] warn_on_slowpath+0x40/0x65
[ 0.041128] [<c012b543>] autoremove_wake_function+0xd/0x2d
[ 0.041148] [<c033f28b>] schedule_timeout+0x13/0x99
[ 0.041167] [<c011690d>] __wake_up+0x29/0x39
[ 0.041182] [<c011690d>] __wake_up+0x29/0x39
[ 0.041197] [<c0128769>] call_usermodehelper_exec+0x97/0xa2
[ 0.041214] [<c010dff4>] native_smp_call_function_mask+0x23/0x11e
[ 0.041233] [<c01d4a66>] kobject_uevent_env+0x346/0x368
[ 0.041251] [<c010e46d>] smp_call_function_single+0x50/0x6f
[ 0.041268] [<c01336d2>] tick_nohz_rescan_timers_on+0x27/0x2b
[ 0.041287] [<c013109f>] clocksource_register+0x162/0x174
[ 0.041306] [<c0436203>] kernel_init+0x126/0x25e
[ 0.041322] [<c011943d>] schedule_tail+0x17/0x44
[ 0.041337] [<c0103c7a>] ret_from_fork+0x6/0x1c
[ 0.041353] [<c04360dd>] kernel_init+0x0/0x25e
[ 0.041367] [<c04360dd>] kernel_init+0x0/0x25e
[ 0.041381] [<c01049ab>] kernel_thread_helper+0x7/0x10
[ 0.041397] =======================
[ 0.041417] ---[ end trace ca143223eefdc828 ]---

..

Full dmesg there -> http://frugalware.org/~crazy/dmesg/dmesg_tsc


>
> Thanks,
>
> tglx
>

Gabriel

2008-03-22 16:33:56

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Sat, 22 Mar 2008, Gabriel C wrote:
> Thomas Gleixner wrote:
> > On Sat, 22 Mar 2008, Gabriel C wrote:
> >>> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> >>> timer wheel. At this point the pm_timer which is the reference clock
> >>> has already wrapped around, so the watchdog thinks that there is a
> >>> huge time difference and marks the TSC unstable.
> >>>
> >>> Aside of that watchdog issue this also affects the other users of
> >>> add_timer_on(): e.g. queue_delayed_work_on().
> >>>
> >>> Can you please apply the patch below and verify it with Andi's
> >>> watchdog patch applied ?
> >>
> >> Did that , git head , Andi's + your patch but TSC is still marked unstable.
> >
> > Doh, stupid me. We do not reevaluate the timer wheel, when we just
> > wake up via the smp_reschedule IPI when the resched flag on the other
> > CPU is not set. That's a separate vector which is not going through
> > irq_enter() / irq_exit().
> >
> > Does the patch below solve the problem ?
>
> With this one TSC is fine but now I get a warning on boot :

Good. It confirms my assumptions about the root cause.

> [ 0.041037] ------------[ cut here ]------------
> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()

Grr. I'll work out a solution for that one.

Thanks,

tglx

2008-03-22 21:57:22

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Sat, 22 Mar 2008, Thomas Gleixner wrote:
> On Sat, 22 Mar 2008, Gabriel C wrote:
> > With this one TSC is fine but now I get a warning on boot :
>
> Good. It confirms my assumptions about the root cause.
>
> > [ 0.041037] ------------[ cut here ]------------
> > [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>
> Grr. I'll work out a solution for that one.

Gabriel,

I'm happy to rack your nerves some more.

After discussing the issue with Peter and Ingo the following solution
seems to be the one which is the least intrusive.

Can you please give it a test ride ?

Thanks,

tglx
---
include/linux/sched.h | 6 ++++++
kernel/sched.c | 42 ++++++++++++++++++++++++++++++++++++++++++
kernel/timer.c | 10 +++++++++-
3 files changed, 57 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void)

extern void sched_idle_next(void);

+#ifdef CONFIG_NO_HZ
+extern void wake_up_idle_cpu(int cpu);
+#else
+static inline void wake_up_idle_cpu(int cpu) { }
+#endif
+
#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_latency;
extern unsigned int sysctl_sched_min_granularity;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -848,6 +848,48 @@ static inline void resched_task(struct t
__resched_task(p, TIF_NEED_RESCHED);
}

+#ifdef CONFIG_NO_HZ
+/*
+ * When add_timer_on() enqueues a timer into the timer wheel of an
+ * idle CPU then this timer might expire before the next timer event
+ * which is scheduled to wake up that CPU. In case of a completely
+ * idle system the next event might even be infinite time into the
+ * future. wake_up_idle_cpu() ensures that the CPU is woken up and
+ * leaves the inner idle loop so the newly added timer is taken into
+ * account when the CPU goes back to idle and evaluates the timer
+ * wheel for the next timer event.
+ */
+void wake_up_idle_cpu(int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+ if (cpu == smp_processor_id())
+ return;
+
+ /*
+ * This is safe, as this function is called with the timer
+ * wheel base lock of (cpu) held. When the CPU is on the way
+ * to idle and has not yet set rq->curr to idle then it will
+ * be serialized on the timer wheel base lock and take the new
+ * timer into account automatically.
+ */
+ if (rq->curr != rq->idle)
+ return;
+
+ /*
+ * We can set TIF_NEED_RESCHED on the idle task of the other CPU
+ * lockless. The worst case is that the other CPU runs the
+ * idle task through an additional NOOP schedule()
+ */
+ set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
+
+ /* NEED_RESCHED must be visible before we test polling */
+ smp_mb();
+ if (!tsk_is_polling(rq->idle))
+ smp_send_reschedule(cpu);
+}
+#endif
+
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
internal_add_timer(base, timer);
+ /*
+ * Check whether the other CPU is idle and needs to be
+ * triggered to reevaluate the timer wheel when nohz is
+ * active. We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to idle can not evaluate
+ * the timer wheel.
+ */
+ wake_up_idle_cpu(cpu);
spin_unlock_irqrestore(&base->lock, flags);
}

-
/**
* mod_timer - modify a timer's timeout
* @timer: the timer to be modified
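
The ordering at the end of wake_up_idle_cpu() (set the flag, issue a full barrier, then test whether the target is polling) can be modelled in user space with C11 atomics. This is only a sketch under assumptions: struct toy_idle_cpu and toy_wake_idle() are hypothetical names, and a counter stands in for the reschedule IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the idle task's flags on the target CPU. */
struct toy_idle_cpu {
    atomic_bool need_resched;  /* models TIF_NEED_RESCHED */
    atomic_bool polling;       /* models the "idle loop polls the flag" state */
    int         ipis_sent;     /* counts simulated reschedule IPIs */
};

/*
 * Set the flag, make it visible, then decide whether an IPI is needed.
 * A polling idle loop will notice the flag by itself, so the IPI can be
 * skipped in that case. Returns true if a (simulated) IPI was sent.
 */
bool toy_wake_idle(struct toy_idle_cpu *c)
{
    atomic_store_explicit(&c->need_resched, true, memory_order_relaxed);

    /* The flag store must be ordered before the polling test. */
    atomic_thread_fence(memory_order_seq_cst);

    if (!atomic_load_explicit(&c->polling, memory_order_relaxed)) {
        c->ipis_sent++;
        return true;
    }
    return false;
}
```

The point of the fence is the ordering of the two accesses. If the polling test could be satisfied before the flag store became visible, the waker might skip the IPI just as the target stops polling, and the target would sleep without ever seeing the flag.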

2008-03-22 22:41:31

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
>> On Sat, 22 Mar 2008, Gabriel C wrote:
>>> With this one TSC is fine but now I get a warning on boot :
>> Good. It confirms my assumptions about the root cause.
>>
>>> [ 0.041037] ------------[ cut here ]------------
>>> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>> Grr. I'll work out a solution for that one.
>
> Gabriel,
>
> I'm happy to rack your nerves some more.

No worries :)

>
> After discussing the issue with Peter and Ingo the following solution
> seems to be the one which is the least intrusive.
>
> Can you please give it a test ride ?

Done , git head + Andi's patch + this version of your patch does work here.

Also time-warp-test is just fine and everything else seems to work.


> ---
> include/linux/sched.h | 6 ++++++
> kernel/sched.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> kernel/timer.c | 10 +++++++++-
> 3 files changed, 57 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/linux/sched.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sched.h
> +++ linux-2.6/include/linux/sched.h
> @@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void)
>
> extern void sched_idle_next(void);
>
> +#ifdef CONFIG_NO_HZ
> +extern void wake_up_idle_cpu(int cpu);
> +#else
> +static inline void wake_up_idle_cpu(int cpu) { }
> +#endif
> +
> #ifdef CONFIG_SCHED_DEBUG
> extern unsigned int sysctl_sched_latency;
> extern unsigned int sysctl_sched_min_granularity;
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -848,6 +848,48 @@ static inline void resched_task(struct t
> __resched_task(p, TIF_NEED_RESCHED);
> }
>
> +#ifdef CONFIG_NO_HZ
> +/*
> + * When add_timer_on() enqueues a timer into the timer wheel of an
> + * idle CPU then this timer might expire before the next timer event
> + * which is scheduled to wake up that CPU. In case of a completely
> + * idle system the next event might even be infinite time into the
> + * future. wake_up_idle_cpu() ensures that the CPU is woken up and
> + * leaves the inner idle loop so the newly added timer is taken into
> + * account when the CPU goes back to idle and evaluates the timer
> + * wheel for the next timer event.
> + */
> +void wake_up_idle_cpu(int cpu)
> +{
> + struct rq *rq = cpu_rq(cpu);
> +
> + if (cpu == smp_processor_id())
> + return;
> +
> + /*
> + * This is safe, as this function is called with the timer
> + * wheel base lock of (cpu) held. When the CPU is on the way
> + * to idle and has not yet set rq->curr to idle then it will
> + * be serialized on the timer wheel base lock and take the new
> + * timer into account automatically.
> + */
> + if (rq->curr != rq->idle)
> + return;
> +
> + /*
> + * We can set TIF_NEED_RESCHED on the idle task of the other CPU
> + * lockless. The worst case is that the other CPU runs the
> + * idle task through an additional NOOP schedule()
> + */
> + set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
> +
> + /* NEED_RESCHED must be visible before we test polling */
> + smp_mb();
> + if (!tsk_is_polling(rq->idle))
> + smp_send_reschedule(cpu);
> +}
> +#endif
> +
> #ifdef CONFIG_SCHED_HRTICK
> /*
> * Use HR-timers to deliver accurate preemption points.
> Index: linux-2.6/kernel/timer.c
> ===================================================================
> --- linux-2.6.orig/kernel/timer.c
> +++ linux-2.6/kernel/timer.c
> @@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
> spin_lock_irqsave(&base->lock, flags);
> timer_set_base(timer, base);
> internal_add_timer(base, timer);
> + /*
> + * Check whether the other CPU is idle and needs to be
> + * triggered to reevaluate the timer wheel when nohz is
> + * active. We are protected against the other CPU fiddling
> + * with the timer by holding the timer base lock. This also
> + * makes sure that a CPU on the way to idle can not evaluate
> + * the timer wheel.
> + */
> + wake_up_idle_cpu(cpu);
> spin_unlock_irqrestore(&base->lock, flags);
> }
>
> -
> /**
> * mod_timer - modify a timer's timeout
> * @timer: the timer to be modified

2008-03-23 11:00:53

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
>>> On Sat, 22 Mar 2008, Gabriel C wrote:
>>>> With this one TSC is fine but now I get a warning on boot :
>>> Good. It confirms my assumptions about the root cause.
>>>
>>>> [ 0.041037] ------------[ cut here ]------------
>>>> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>>> Grr. I'll work out a solution for that one.
>> Gabriel,
>>
>> I'm happy to rack your nerves some more.
>
> No worries :)
>
>> After discussing the issue with Peter and Ingo the following solution
>> seems to be the one which is the least intrusive.
>>
>> Can you please give it a test ride ?
>
> Done , git head + Andi's patch + this version of your patch does work here.
>
> Also time-warp-test is just fine and everything else seems to work.

Also I've tested with my other motherboard and it is fine too :)

Feel free to add my Tested-by when you push this patch.

>
>
>> ---
>> include/linux/sched.h | 6 ++++++
>> kernel/sched.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>> kernel/timer.c | 10 +++++++++-
>> 3 files changed, 57 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6/include/linux/sched.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/sched.h
>> +++ linux-2.6/include/linux/sched.h
>> @@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void)
>>
>> extern void sched_idle_next(void);
>>
>> +#ifdef CONFIG_NO_HZ
>> +extern void wake_up_idle_cpu(int cpu);
>> +#else
>> +static inline void wake_up_idle_cpu(int cpu) { }
>> +#endif
>> +
>> #ifdef CONFIG_SCHED_DEBUG
>> extern unsigned int sysctl_sched_latency;
>> extern unsigned int sysctl_sched_min_granularity;
>> Index: linux-2.6/kernel/sched.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/sched.c
>> +++ linux-2.6/kernel/sched.c
>> @@ -848,6 +848,48 @@ static inline void resched_task(struct t
>> __resched_task(p, TIF_NEED_RESCHED);
>> }
>>
>> +#ifdef CONFIG_NO_HZ
>> +/*
>> + * When add_timer_on() enqueues a timer into the timer wheel of an
>> + * idle CPU then this timer might expire before the next timer event
>> + * which is scheduled to wake up that CPU. In case of a completely
>> + * idle system the next event might even be infinite time into the
>> + * future. wake_up_idle_cpu() ensures that the CPU is woken up and
>> + * leaves the inner idle loop so the newly added timer is taken into
>> + * account when the CPU goes back to idle and evaluates the timer
>> + * wheel for the next timer event.
>> + */
>> +void wake_up_idle_cpu(int cpu)
>> +{
>> + struct rq *rq = cpu_rq(cpu);
>> +
>> + if (cpu == smp_processor_id())
>> + return;
>> +
>> + /*
>> + * This is safe, as this function is called with the timer
>> + * wheel base lock of (cpu) held. When the CPU is on the way
>> + * to idle and has not yet set rq->curr to idle then it will
>> + * be serialized on the timer wheel base lock and take the new
>> + * timer into account automatically.
>> + */
>> + if (rq->curr != rq->idle)
>> + return;
>> +
>> + /*
>> + * We can set TIF_RESCHED on the idle task of the other CPU
>> + * lockless. The worst case is that the other CPU runs the
>> + * idle task through an additional NOOP schedule()
>> + */
>> + set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
>> +
>> + /* NEED_RESCHED must be visible before we test polling */
>> + smp_mb();
>> + if (!tsk_is_polling(rq->idle))
>> + smp_send_reschedule(cpu);
>> +}
>> +#endif
>> +
>> #ifdef CONFIG_SCHED_HRTICK
>> /*
>> * Use HR-timers to deliver accurate preemption points.
>> Index: linux-2.6/kernel/timer.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/timer.c
>> +++ linux-2.6/kernel/timer.c
>> @@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
>> spin_lock_irqsave(&base->lock, flags);
>> timer_set_base(timer, base);
>> internal_add_timer(base, timer);
>> + /*
>> + * Check whether the other CPU is idle and needs to be
>> + * triggered to reevaluate the timer wheel when nohz is
>> + * active. We are protected against the other CPU fiddling
>> + * with the timer by holding the timer base lock. This also
>> + * makes sure that a CPU on the way to idle can not evaluate
>> + * the timer wheel.
>> + */
>> + wake_up_idle_cpu(cpu);
>> spin_unlock_irqrestore(&base->lock, flags);
>> }
>>
>> -
>> /**
>> * mod_timer - modify a timer's timeout
>> * @timer: the timer to be modified
>
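
For readers following the patch quoted above, the decision path in wake_up_idle_cpu() can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct cpu_model`, `enum wake_action` and `model_wake_up_idle_cpu()` are hypothetical names invented for illustration, the booleans stand in for `rq->curr == rq->idle`, `tsk_is_polling()` and `TIF_NEED_RESCHED`, and the memory barrier and locking of the real code are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU state the patch inspects. */
struct cpu_model {
	bool running_idle_task;	/* models rq->curr == rq->idle */
	bool idle_is_polling;	/* models tsk_is_polling(rq->idle) */
	bool need_resched;	/* models TIF_NEED_RESCHED on the idle task */
	int ipis_received;	/* counts modelled smp_send_reschedule() calls */
};

enum wake_action { WAKE_NONE, WAKE_FLAG_ONLY, WAKE_FLAG_AND_IPI };

/*
 * Same order of checks as the patch: nothing to do for the local CPU,
 * nothing to do for a busy CPU, otherwise flag the idle task and send
 * an IPI only when the idle loop is not polling the flag itself.
 */
enum wake_action model_wake_up_idle_cpu(struct cpu_model *cpus,
					int this_cpu, int cpu)
{
	struct cpu_model *target;

	assert(cpus && cpu >= 0);
	target = &cpus[cpu];

	if (cpu == this_cpu)
		return WAKE_NONE;

	if (!target->running_idle_task)
		return WAKE_NONE;

	target->need_resched = true;

	if (target->idle_is_polling)
		return WAKE_FLAG_ONLY;

	target->ipis_received++;
	return WAKE_FLAG_AND_IPI;
}
```

In this model a polling idle loop is woken by the flag alone, which is why the patch tests tsk_is_polling() before paying for a reschedule IPI.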

2008-03-23 19:02:09

by Christian Kujau

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Hi Rafael,

On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> Subject : INFO: task mount:11202 blocked for more than 120 seconds
> Submitter : Christian Kujau <[email protected]>
> Date : 2008-03-07 21:32 (10 days old)
> References : http://lkml.org/lkml/2008/3/7/308
> http://lkml.org/lkml/2008/3/9/186
>

The other Christian reported this as fixed: http://lkml.org/lkml/2008/3/17/232
I too can confirm that the hangs are gone now: http://lkml.org/lkml/2008/3/21/532

Thanks for maintaining the regression list,
Christian.
--
BOFH excuse #91:

Mouse chewed through power cable

2008-03-23 19:07:22

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Sunday, 23 of March 2008, Christian Kujau wrote:
> Hi Rafael,
>
> On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > Subject : INFO: task mount:11202 blocked for more than 120 seconds
> > Submitter : Christian Kujau <[email protected]>
> > Date : 2008-03-07 21:32 (10 days old)
> > References : http://lkml.org/lkml/2008/3/7/308
> > http://lkml.org/lkml/2008/3/9/186
> >
>
> The other Christian reported this as fixed: http://lkml.org/lkml/2008/3/17/232
> I too can confirm that the hangs are gone now: http://lkml.org/lkml/2008/3/21/532

Is the patch present in the mainline yet?

> Thanks for maintaining the regression list,

You're welcome. :-)

Thanks,
Rafael

2008-03-23 19:40:26

by Christian Lamparter

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Sunday 23 March 2008 20:06:56 Rafael J. Wysocki wrote:
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > > Subject : INFO: task mount:11202 blocked for more than 120 seconds
> > > Submitter : Christian Kujau <[email protected]>
> > > Date : 2008-03-07 21:32 (10 days old)
> > > References : http://lkml.org/lkml/2008/3/7/308
> > > http://lkml.org/lkml/2008/3/9/186
> >
> > The other Christian reported this as fixed:
> > http://lkml.org/lkml/2008/3/17/232 I too can confirm that the hangs are
> > gone now: http://lkml.org/lkml/2008/3/21/532
>
> Is the patch present in the mainline yet?
No... it isn't in the mainline?! (or was it committed as I wrote this mail?!)
Anyway, can someone please merge the patch there?

http://lkml.org/lkml/2008/3/17/214

Regards,
Christian

2008-03-23 21:17:56

by Christian Kujau

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
> Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> Submitter : Linas Žvirblis <[email protected]>
> Date : 2008-02-13 22:38 (33 days old)
> References : http://lkml.org/lkml/2008/2/13/566

Linas did not respond any more, and you closed the bug :)

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
> Subject : Spurious messages at boot, eventually hangs the usb subsustem
> Submitter : Jean-Luc Coulon <[email protected]>
> Date : 2008-02-20 09:10 (26 days old)

Hm, Jean-Luc said:
> ------- Comment #4 From Jean-Luc Coulon 2008-03-09 22:50:19 ----
> BTW, I can normally boot my system since rc4

Close?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
> Subject : 2.6.25-rc2 + smartd = hang
> Submitter : Anders Eriksson <[email protected]>
> Date : 2008-02-22 17:51 (24 days old)
> References : http://lkml.org/lkml/2008/2/22/239
> Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>

http://bugzilla.kernel.org/show_bug.cgi?id=10086#c5 says it's fixed.


> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
> Subject : INFO: possible circular locking in the resume
> Submitter : Zdenek Kabelac <[email protected]>
> Date : 2008-02-27 (19 days old)
> References : http://lkml.org/lkml/2008/2/26/479
> Handled-By : Gautham R Shenoy <[email protected]>

Gautham said on 2008-02-28 he has a patch - but did not post it. What now?


> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
> Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
> Submitter : Marcin Slusarz <[email protected]>
> Date : 2008-03-02 20:00 (15 days old)
> References : http://lkml.org/lkml/2008/3/2/91
> Handled-By : Peter Zijlstra <[email protected]>

Seems to be fixed: http://lkml.org/lkml/2008/3/23/275

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> Submitter : Gabriel C <[email protected]>
> Date : 2008-02-24 01:31 (22 days old)
> References : http://lkml.org/lkml/2008/2/23/380
> http://lkml.org/lkml/2008/2/24/281
> Handled-By : Thomas Gleixner <[email protected]>

Seems to be fixed by: http://lkml.org/lkml/2008/3/22/66
Which introduced a WARNING, fixed by the subsequent:
http://lkml.org/lkml/2008/3/23/199

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10190
> Subject : [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
> Submitter : Tarkan Erimer <[email protected]>
> Date : 2008-03-05 05:01 (12 days old)
> References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html

Bugzilla entry is closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10211
> Subject : drivers/media/video/cx2341x.c: undefined references
> Submitter : Toralf Förster <[email protected]>
> Date : 2008-03-07 13:48 (10 days old)
> References : http://lkml.org/lkml/2008/3/7/168

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10234
> Subject : pciehp hang on hp ia64 rx6600
> Submitter : Alex Chiang <[email protected]>
> Date : 2008-03-12 00:47 (5 days old)
> References : http://lkml.org/lkml/2008/3/12/31
> Handled-By : Mark Lord <[email protected]>

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10238
> Subject : netconsole still hangs
> Submitter : Andrew Morton <[email protected]>
> Date : 2008-03-12 23:14 (5 days old)
> References : http://marc.info/?t=120536379200004&r=1&w=2
> Handled-By : David Miller <[email protected]>
> Stephen Hemminger <[email protected]>

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10242
> Subject : rm command hangs
> Submitter : Jean-Luc Coulon <[email protected]>
> Date : 2008-03-14 05:47 (3 days old)

Maybe related to http://bugzilla.kernel.org/show_bug.cgi?id=10207, which
is (about to be) closed?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10266
> Subject : [PATCH] i810fb: Fix console switch regression
> Submitter : Stefan Bauer <[email protected]>
> Date : 2008-03-16 19:42 (1 days old)
> References : http://lkml.org/lkml/2008/3/16/84

Closed.

> Regressions with patches
> ------------------------
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10016
> Subject : cobalt_btns.c <-> struct platform_device compile error
> Submitter : Adrian Bunk <[email protected]>
> Date : 2008-02-17 12:12 (29 days old)
> References : http://lkml.org/lkml/2008/2/17/293
> Handled-By : Yoichi Yuasa <[email protected]>
> Patch : http://lkml.org/lkml/2008/3/9/25

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10017
> Subject : cdev removal broke cobalt_btns.c compilation
> Submitter : Adrian Bunk <[email protected]>
> Date : 2008-02-17 12:14 (29 days old)
> References : http://lkml.org/lkml/2008/2/17/295
> Handled-By : Yoichi Yuasa <[email protected]>
> Patch : http://lkml.org/lkml/2008/3/9/25

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10186
> Subject : SCSI_AIC94XX must depend on SCSI
> Submitter : Toralf Förster <[email protected]>
> Date : 2008-03-06 19:09 (11 days old)
> References : http://marc.info/?l=linux-kernel&m=120483073617232&w=2
> Handled-By : Adrian Bunk <[email protected]>
> Patch : http://marc.info/?l=linux-kernel&m=120483499725928&w=2

Testing...

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10210
> Subject : 2.6.25-rc4-git3: Handling of audio CDs broken on pata_ali
> Submitter : Rafael J. Wysocki <[email protected]>
> Date : 2008-03-08 22:46 (9 days old)
> References : http://lkml.org/lkml/2008/3/8/123
> Handled-By : Tejun Heo <[email protected]>
> Patch : http://lkml.org/lkml/2008/3/10/69

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10232
> Subject : intel mtrr fixups apparently broke display and e1000 probe
> Submitter : Stephen Gran <[email protected]>
> Date : 2008-03-12 08:37 (5 days old)
> Handled-By : Yinghai Lu <[email protected]>
> Patch : http://bugzilla.kernel.org/attachment.cgi?id=15271&action=view

Closed.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10259
> Subject : /sys/class/hwmon/hwmon0 is missing a device link
> Submitter : Jean-Luc Coulon <[email protected]>
> Date : 2008-03-16 04:56 (1 days old)
> Handled-By : Jean Delvare <[email protected]>
> Patch : http://bugzilla.kernel.org/attachment.cgi?id=15301&action=view

Closed.


Thanks,
Christian.
--
BOFH excuse #387:

Your computer's union contract is set to expire at midnight.

2008-03-23 21:30:16

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Hi,

Please have a look at the latest report:
http://lkml.org/lkml/2008/3/21/516

On Sunday, 23 of March 2008, Christian Kujau wrote:
> On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
> > Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> > Submitter : Linas Žvirblis <[email protected]>
> > Date : 2008-02-13 22:38 (33 days old)
> > References : http://lkml.org/lkml/2008/2/13/566
>
> Linas did not respond any more, and you closed the bug :)
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
> > Subject : Spurious messages at boot, eventually hangs the usb subsustem
> > Submitter : Jean-Luc Coulon <[email protected]>
> > Date : 2008-02-20 09:10 (26 days old)
>
> Hm, Jean-Luc said:
> > ------- Comment #4 From Jean-Luc Coulon 2008-03-09 22:50:19 ----
> > BTW, I can normally boot my system since rc4
>
> Close?

Yes, if he doesn't respond for a couple of days.

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
> > Subject : 2.6.25-rc2 + smartd = hang
> > Submitter : Anders Eriksson <[email protected]>
> > Date : 2008-02-22 17:51 (24 days old)
> > References : http://lkml.org/lkml/2008/2/22/239
> > Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10086#c5 says it's fixed.

Yes, it's closed now.

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
> > Subject : INFO: possible circular locking in the resume
> > Submitter : Zdenek Kabelac <[email protected]>
> > Date : 2008-02-27 (19 days old)
> > References : http://lkml.org/lkml/2008/2/26/479
> > Handled-By : Gautham R Shenoy <[email protected]>
>
> Gautham said on 2008-02-28 he has a patch - but did not post it. What now?

The reporter is unresponsive. We're waiting for him to respond.

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
> > Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
> > Submitter : Marcin Slusarz <[email protected]>
> > Date : 2008-03-02 20:00 (15 days old)
> > References : http://lkml.org/lkml/2008/3/2/91
> > Handled-By : Peter Zijlstra <[email protected]>
>
> Seems to be fixed: http://lkml.org/lkml/2008/3/23/275

Yes, I've already updated the entry with this patch.

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
> > Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> > Submitter : Gabriel C <[email protected]>
> > Date : 2008-02-24 01:31 (22 days old)
> > References : http://lkml.org/lkml/2008/2/23/380
> > http://lkml.org/lkml/2008/2/24/281
> > Handled-By : Thomas Gleixner <[email protected]>
>
> Seems to be fixed by: http://lkml.org/lkml/2008/3/22/66
> Which introduced a WARNING, fixed by the subsequent:
> http://lkml.org/lkml/2008/3/23/199

This is not on the list any more.

BTW, the reports I send reflect the state of the Bugzilla entries. The closed
entries will not be reported next time.

Thanks,
Rafael

2008-03-23 23:32:01

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
>>>> On Sat, 22 Mar 2008, Gabriel C wrote:
>>>>> With this one TSC is fine but now I get a warning on boot :
>>>> Good. It confirms my assumptions about the root cause.
>>>>
>>>>> [ 0.041037] ------------[ cut here ]------------
>>>>> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>>>> Grr. I'll work out a solution for that one.
>>> Gabriel,
>>>
>>> I'm happy to rack your nerves some more.
>> No worries :)
>>
>>> After discussing the issue with Peter and Ingo the following solution
>>> seems to be the one which is the least intrusive.
>>>
>>> Can you please give it a test ride ?
>> Done , git head + Andi's patch + this version of your patch does work here.
>>
>> Also time-warp-test is just fine and everything else seems to work.
>
> Also I've tested with my other motherboard and it is fine too :)
>
> Feel free to add my Tested-by when you push this patch.

Heh :/

...

[ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
[ 5920.650516] Time: acpi_pm clocksource has been installed.

...

Seems like something still triggers that :/

2008-03-24 10:26:10

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Mon, 24 Mar 2008, Gabriel C wrote:
> >>> Can you please give it a test ride ?
> >> Done , git head + Andi's patch + this version of your patch does work here.
> >>
> >> Also time-warp-test is just fine and everything else seems to work.
> >
> > Also I've tested with my other motherboard and it is fine too :)
> >
> > Feel free to add my Tested-by when you push this patch.
>
> Heh :/
>
> ...
>
> [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
> [ 5920.650516] Time: acpi_pm clocksource has been installed.
>
> ...
>
> Seems like something still triggers that :/

Hmm. Can you please apply the patch below? It adds some more info and
triggers the sysrq-q timer list printout when the watchdog
triggers. That might give us some insight into this.

Thanks,
tglx

---
kernel/time/clocksource.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/time/clocksource.c
===================================================================
--- linux-2.6.orig/kernel/time/clocksource.c
+++ linux-2.6/kernel/time/clocksource.c
@@ -87,8 +87,10 @@ static void clocksource_ratewd(struct cl
 	if (delta > -WATCHDOG_THRESHOLD && delta < WATCHDOG_THRESHOLD)
 		return;
 
-	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-	       cs->name, delta);
+	printk(KERN_WARNING
+	       "Clocksource %s unstable (delta = %Ld ns) E:%lu J:%lu\n",
+	       cs->name, delta, watchdog_timer.expires, jiffies);
+	sysrq_timer_list_show();
 	cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
 	clocksource_change_rating(cs, 0);
 	list_del(&cs->wd_list);
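
A note for anyone reading the extra fields this debug patch prints: E is the watchdog timer's programmed expiry and J is the value of jiffies when the check actually ran, so J minus E says how late the timer fired. The sketch below shows one way to do that arithmetic in userspace C. The helper names `ticks_late()` and `ticks_to_ms()` are made up for illustration, and the tick rate has to be supplied by the reader because the message does not print CONFIG_HZ.

```c
#include <assert.h>

/*
 * Signed distance between two jiffies samples. Subtracting first and
 * casting afterwards keeps a small positive lateness correct even when
 * the counter wrapped between the two samples, the same idea the
 * kernel's time_after() relies on.
 */
long ticks_late(unsigned long now, unsigned long expires)
{
	return (long)(now - expires);
}

/* Convert a tick count to milliseconds; hz is the tick rate (CONFIG_HZ). */
long long ticks_to_ms(long ticks, long hz)
{
	assert(hz > 0);
	return (long long)ticks * 1000 / hz;
}
```

For example, with invented values E:400 and J:1000 on a HZ=250 kernel, `ticks_to_ms(ticks_late(1000, 400), 250)` gives 2400, i.e. the timer ran 2.4 seconds after it was due.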

2008-03-24 22:33:22

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Mon, 24 Mar 2008, Gabriel C wrote:
>>>>> Can you please give it a test ride ?
>>>> Done , git head + Andi's patch + this version of your patch does work here.
>>>>
>>>> Also time-warp-test is just fine and everything else seems to work.
>>> Also I've tested with my other motherboard and it is fine too :)
>>>
>>> Feel free to add my Tested-by when you push this patch.
>> Heh :/
>>
>> ...
>>
>> [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
>> [ 5920.650516] Time: acpi_pm clocksource has been installed.
>>
>> ...
>>
>> Seems like something still triggers that :/
>
> Hmm. Can you please apply the patch below? It adds some more info and
> triggers the sysrq-q timer list printout when the watchdog
> triggers. That might give us some insight into this.

Sorry for the lag , I was out the whole day.

Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):

...

[34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
[34528.893380] Timer List Version: v0.3
[34528.893386] HRTIMER_MAX_CLOCK_BASES: 2
[34528.893392] now at 34510722407314 nsecs
[34528.893396]
[34528.893399] cpu: 0
[34528.893402] clock 0:
[34528.893404] .index: 0
[34528.893407] .resolution: 1 nsecs
[34528.893409] .get_time: ktime_get_real
[34528.893422] .offset: 1206358214734619011 nsecs
[34528.893425] active timers:
[34528.893428] clock 1:
[34528.893430] .index: 1
[34528.893433] .resolution: 1 nsecs
[34528.893435] .get_time: ktime_get
[34528.893440] .offset: 0 nsecs
[34528.893443] active timers:
[34528.893445] #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893467] # expires at 34510723000000 nsecs [in 592686 nsecs]
[34528.893470] #1: <e26a7d68>, it_real_fn, S:01
[34528.893481] # expires at 34510724648354 nsecs [in 2241040 nsecs]
[34528.893485] #2: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893495] # expires at 34510997616597 nsecs [in 275209283 nsecs]
[34528.893498] #3: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893508] # expires at 34511115498292 nsecs [in 393090978 nsecs]
[34528.893512] #4: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893521] # expires at 34511328809630 nsecs [in 606402316 nsecs]
[34528.893525] #5: <e26a7d68>, it_real_fn, S:01
[34528.893534] # expires at 34511515619673 nsecs [in 793212359 nsecs]
[34528.893537] #6: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893547] # expires at 34512265383335 nsecs [in 1542976021 nsecs]
[34528.893551] #7: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893561] # expires at 34518835323224 nsecs [in 8112915910 nsecs]
[34528.893564] #8: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893574] # expires at 34546891223588 nsecs [in 36168816274 nsecs]
[34528.893578] #9: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893588] # expires at 36035545999324 nsecs [in 1524823592010 nsecs]
[34528.893592] #10: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893601] # expires at 36035980869577 nsecs [in 1525258462263 nsecs]
[34528.893606] .expires_next : 34510723000000 nsecs
[34528.893609] .hres_active : 1
[34528.893612] .nr_events : 3447408
[34528.893615] .nohz_mode : 2
[34528.893618] .idle_tick : 34510712000000 nsecs
[34528.893621] .tick_stopped : 0
[34528.893624] .idle_jiffies : 34210712
[34528.893627] .idle_calls : 3267634
[34528.893630] .idle_sleeps : 1588325
[34528.893633] .idle_entrytime : 34510722486118 nsecs
[34528.893636] .idle_waketime : 34510722348607 nsecs
[34528.893640] .idle_exittime : 34510722383780 nsecs
[34528.893643] .idle_sleeptime : 33379861006002 nsecs
[34528.893646] .last_jiffies : 34210723
[34528.893649] .next_jiffies : 34210725
[34528.893652] .idle_expires : 34510724000000 nsecs
[34528.893655] jiffies: 34210723
[34528.893657]
[34528.893660] cpu: 1
[34528.893662] clock 0:
[34528.893664] .index: 0
[34528.893666] .resolution: 1 nsecs
[34528.893669] .get_time: ktime_get_real
[34528.893675] .offset: 1206358214734619011 nsecs
[34528.893677] active timers:
[34528.893680] clock 1:
[34528.893682] .index: 1
[34528.893685] .resolution: 1 nsecs
[34528.893687] .get_time: ktime_get
[34528.893692] .offset: 0 nsecs
[34528.893694] active timers:
[34528.893697] #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893706] # expires at 34510996000000 nsecs [in 273592686 nsecs]
[34528.893710] .expires_next : 34510996000000 nsecs
[34528.893713] .hres_active : 1
[34528.893716] .nr_events : 3081558
[34528.893719] .nohz_mode : 2
[34528.893722] .idle_tick : 34510713125000 nsecs
[34528.893725] .tick_stopped : 1
[34528.893727] .idle_jiffies : 34210713
[34528.893730] .idle_calls : 2673472
[34528.893733] .idle_sleeps : 1233326
[34528.893736] .idle_entrytime : 34510712135468 nsecs
[34528.893740] .idle_waketime : 34507995998292 nsecs
[34528.893743] .idle_exittime : 34510711012024 nsecs
[34528.893746] .idle_sleeptime : 33654735968486 nsecs
[34528.893749] .last_jiffies : 34210713
[34528.893752] .next_jiffies : 34210997
[34528.893755] .idle_expires : 34510996000000 nsecs
[34528.893758] jiffies: 34210723
[34528.893760]
[34528.893763] cpu: 2
[34528.893765] clock 0:
[34528.893767] .index: 0
[34528.893769] .resolution: 1 nsecs
[34528.893772] .get_time: ktime_get_real
[34528.893778] .offset: 1206358214734619011 nsecs
[34528.893780] active timers:
[34528.893783] clock 1:
[34528.893785] .index: 1
[34528.893787] .resolution: 1 nsecs
[34528.893790] .get_time: ktime_get
[34528.893795] .offset: 0 nsecs
[34528.893797] active timers:
[34528.893799] #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893809] # expires at 34511541000000 nsecs [in 818592686 nsecs]
[34528.893813] .expires_next : 34511541000000 nsecs
[34528.893815] .hres_active : 1
[34528.893818] .nr_events : 2005329
[34528.893821] .nohz_mode : 2
[34528.893824] .idle_tick : 34510562250000 nsecs
[34528.893827] .tick_stopped : 1
[34528.893830] .idle_jiffies : 34210562
[34528.893833] .idle_calls : 1749202
[34528.893836] .idle_sleeps : 898585
[34528.893839] .idle_entrytime : 34510561258541 nsecs
[34528.893842] .idle_waketime : 34509285251187 nsecs
[34528.893845] .idle_exittime : 34510176022616 nsecs
[34528.893848] .idle_sleeptime : 33931425421772 nsecs
[34528.893851] .last_jiffies : 34210562
[34528.893854] .next_jiffies : 34211542
[34528.893858] .idle_expires : 34511541000000 nsecs
[34528.893860] jiffies: 34210723
[34528.893863]
[34528.893865] cpu: 3
[34528.893867] clock 0:
[34528.893869] .index: 0
[34528.893872] .resolution: 1 nsecs
[34528.893874] .get_time: ktime_get_real
[34528.893880] .offset: 1206358214734619011 nsecs
[34528.893883] active timers:
[34528.893885] clock 1:
[34528.893887] .index: 1
[34528.893890] .resolution: 1 nsecs
[34528.893892] .get_time: ktime_get
[34528.893897] .offset: 0 nsecs
[34528.893899] active timers:
[34528.893902] #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893911] # expires at 34510723375000 nsecs [in 967686 nsecs]
[34528.893915] .expires_next : 34510723375000 nsecs
[34528.893918] .hres_active : 1
[34528.893921] .nr_events : 1532911
[34528.893923] .nohz_mode : 2
[34528.893926] .idle_tick : 34510713375000 nsecs
[34528.893929] .tick_stopped : 0
[34528.893932] .idle_jiffies : 34210714
[34528.893935] .idle_calls : 1350449
[34528.893938] .idle_sleeps : 896094
[34528.893941] .idle_entrytime : 34510713334805 nsecs
[34528.893944] .idle_waketime : 34509973216268 nsecs
[34528.893947] .idle_exittime : 34510722367621 nsecs
[34528.893951] .idle_sleeptime : 34031256949569 nsecs
[34528.893954] .last_jiffies : 34210714
[34528.893957] .next_jiffies : 34240714
[34528.893960] .idle_expires : 34540713000000 nsecs
[34528.893963] jiffies: 34210723
[34528.893965]
[34528.893967]
[34528.893969] Tick Device: mode: 1
[34528.893972] Clock Event Device: pit
[34528.893976] max_delta_ns: 27461866
[34528.893979] min_delta_ns: 12571
[34528.893982] mult: 5124677
[34528.893984] shift: 32
[34528.893987] mode: 1
[34528.893990] next_event: 9223372036854775807 nsecs
[34528.893992] set_next_event: pit_next_event
[34528.894000] set_mode: init_pit_timer
[34528.894005] event_handler: tick_handle_oneshot_broadcast
[34528.894013] tick_broadcast_mask: 00000000
[34528.894016] tick_broadcast_oneshot_mask: 00000000
[34528.894019]
[34528.894021]
[34528.894023] Tick Device: mode: 1
[34528.894026] Clock Event Device: lapic
[34528.894030] max_delta_ns: 1346255303
[34528.894033] min_delta_ns: 2407
[34528.894035] mult: 26762229
[34528.894038] shift: 32
[34528.894041] mode: 3
[34528.894044] next_event: 34510724000000 nsecs
[34528.894046] set_next_event: lapic_next_event
[34528.894054] set_mode: lapic_timer_setup
[34528.894059] event_handler: hrtimer_interrupt
[34528.894064]
[34528.894066] Tick Device: mode: 1
[34528.894069] Clock Event Device: lapic
[34528.894073] max_delta_ns: 1346255303
[34528.894075] min_delta_ns: 2407
[34528.894078] mult: 26762229
[34528.894081] shift: 32
[34528.894083] mode: 3
[34528.894086] next_event: 34510996000000 nsecs
[34528.894089] set_next_event: lapic_next_event
[34528.894094] set_mode: lapic_timer_setup
[34528.894099] event_handler: hrtimer_interrupt
[34528.894104]
[34528.894107] Tick Device: mode: 1
[34528.894109] Clock Event Device: lapic
[34528.894113] max_delta_ns: 1346255303
[34528.894115] min_delta_ns: 2407
[34528.894118] mult: 26762229
[34528.894121] shift: 32
[34528.894123] mode: 3
[34528.894126] next_event: 34511541000000 nsecs
[34528.894129] set_next_event: lapic_next_event
[34528.894134] set_mode: lapic_timer_setup
[34528.894139] event_handler: hrtimer_interrupt
[34528.894144]
[34528.894146] Tick Device: mode: 1
[34528.894149] Clock Event Device: lapic
[34528.894153] max_delta_ns: 1346255303
[34528.894155] min_delta_ns: 2407
[34528.894158] mult: 26762229
[34528.894161] shift: 32
[34528.894163] mode: 3
[34528.894166] next_event: 34510723375000 nsecs
[34528.894169] set_next_event: lapic_next_event
[34528.894174] set_mode: lapic_timer_setup
[34528.894179] event_handler: hrtimer_interrupt
[34528.894184]
[34528.894350] Time: acpi_pm clocksource has been installed.

...

And that made irqbalance go mad; it then got killed by the OOM killer, very strange.


>
> Thanks,
> tglx


Gabriel

2008-03-25 08:07:41

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Mon, 24 Mar 2008, Gabriel C wrote:
> > Hmm. Can you please apply the patch below? It adds some more info and
> > triggers the sysrq-q timer list printout when the watchdog
> > triggers. That might give us some insight into this.
>
> Sorry for the lag , I was out the whole day.
>
> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
>
> ...
>
> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723

Ok. The timer got delayed. It got delayed because it is initialized as
a deferrable timer, which is obviously wrong. Sigh, I signed off on
that commit myself without thinking about the consequences.
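
For readers checking the numbers: E:34204592 J:34210723 means the watchdog ran 6131 jiffies after its programmed expiry. The timer dump suggests HZ=1000 (the jiffies values and the nanosecond timestamps advance in step at about 1 ms per jiffy), which would make that roughly 6.1 seconds, but that is an inference from the log, not something stated in it. The reported delta of 4686697613 ns is also close to the time a 24-bit counter running at 3.579545 MHz needs to wrap once, about 4.687 s. If the watchdog clocksource on this box was acpi_pm (an assumption, the log does not name it), a check that arrives several seconds late would see the watchdog short by one wrap and blame the TSC. The sketch below models only that arithmetic; the function names are hypothetical and the threshold value is quoted from memory of the 2.6.25-era source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC		1000000000LL
/* 62.5 ms; assumed to match WATCHDOG_THRESHOLD in kernel/time/clocksource.c */
#define MODEL_THRESHOLD_NS	(NSEC_PER_SEC >> 4)

/* Time after which a counter of the given width and frequency wraps, in ns. */
int64_t wrap_period_ns(unsigned int bits, int64_t freq_hz)
{
	assert(bits < 32 && freq_hz > 0);
	return ((int64_t)1 << bits) * NSEC_PER_SEC / freq_hz;
}

/* Nanoseconds a wrapping counter appears to have advanced over a real interval. */
int64_t wrapped_elapsed_ns(int64_t real_elapsed_ns, int64_t wrap_ns)
{
	assert(real_elapsed_ns >= 0 && wrap_ns > 0);
	return real_elapsed_ns % wrap_ns;
}

/* Mirrors the range test at the top of clocksource_ratewd(). */
bool delta_marks_unstable(int64_t delta_ns)
{
	return !(delta_ns > -MODEL_THRESHOLD_NS && delta_ns < MODEL_THRESHOLD_NS);
}
```

With a real interval of 0.5 s both counters agree and the delta is zero. With an interval longer than one wrap period but shorter than two, the delta comes out as exactly one wrap period, which is far outside the threshold.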

Can you please apply the patch below on top of the others?

> ...
>
> And that made irqbalance go mad; it then got killed by the OOM killer, very strange.

Ouch.

revert: 1077f5a917b7c630231037826b344b2f7f5b903f

---
kernel/time/clocksource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/time/clocksource.c
===================================================================
--- linux-2.6.orig/kernel/time/clocksource.c
+++ linux-2.6/kernel/time/clocksource.c
@@ -176,7 +176,7 @@ static void clocksource_check_watchdog(s
 			if (watchdog)
 				del_timer(&watchdog_timer);
 			watchdog = cs;
-			init_timer_deferrable(&watchdog_timer);
+			init_timer(&watchdog_timer);
 			watchdog_timer.function = clocksource_watchdog;
 
 			/* Reset watchdog cycles */
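
To illustrate why the one-line change matters: a deferrable timer does not wake an idle CPU by itself, it only runs once the CPU is awake for some other reason. The toy model below is a sketch of that selection rule, not the kernel's timer wheel code. `struct toy_timer` and `toy_next_wakeup()` are hypothetical names, expiries are plain tick counts, and counter wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_timer {
	unsigned long expires;	/* absolute expiry, in ticks */
	bool deferrable;	/* true: may not wake an idle CPU on its own */
};

/*
 * Pick the tick at which an idle CPU has to wake up: the earliest
 * expiry among the non-deferrable timers. Returns idle_limit when no
 * such timer is pending before it.
 */
unsigned long toy_next_wakeup(const struct toy_timer *timers, size_t n,
			      unsigned long idle_limit)
{
	unsigned long next = idle_limit;
	size_t i;

	assert(timers || n == 0);

	for (i = 0; i < n; i++) {
		if (timers[i].deferrable)
			continue;
		if (timers[i].expires < next)
			next = timers[i].expires;
	}
	return next;
}
```

With a deferrable watchdog due at tick 500 and an ordinary timer due at tick 6000, the modelled CPU sleeps until 6000 and the watchdog runs thousands of ticks late. Marking the watchdog as an ordinary timer brings the wakeup back to tick 500.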

2008-03-26 12:44:07

by Gabriel C

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

Thomas Gleixner wrote:
> On Mon, 24 Mar 2008, Gabriel C wrote:
>>> Hmm. Can you please apply the patch below? It adds some more info and
>>> triggers the sysrq-q timer list printout when the watchdog
>>> triggers. That might give us some insight into this.
>> Sorry for the lag , I was out the whole day.
>>
>> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
>>
>> ...
>>
>> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
>
> Ok. The timer got delayed. It got delayed because it is initialized as
> a deferrable timer, which is obviously wrong. Sigh, I signed off on
> that commit myself without thinking about the consequences.
>
> Can you please apply the patch below on top of the others?

The box has been up for almost one day with that patch on top of the other ones, and everything is fine so far.


> revert: 1077f5a917b7c630231037826b344b2f7f5b903f
>
> ---
> kernel/time/clocksource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6/kernel/time/clocksource.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/clocksource.c
> +++ linux-2.6/kernel/time/clocksource.c
> @@ -176,7 +176,7 @@ static void clocksource_check_watchdog(s
> if (watchdog)
> del_timer(&watchdog_timer);
> watchdog = cs;
> - init_timer_deferrable(&watchdog_timer);
> + init_timer(&watchdog_timer);
> watchdog_timer.function = clocksource_watchdog;
>
> /* Reset watchdog cycles */
>
>


Gabriel

2008-03-26 14:58:16

by Thomas Gleixner

Subject: Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

On Wed, 26 Mar 2008, Gabriel C wrote:
> Thomas Gleixner wrote:
> > On Mon, 24 Mar 2008, Gabriel C wrote:
> >>> Hmm. Can you please apply the patch below? It adds some more info and
> >>> triggers the sysrq-q timer list printout when the watchdog
> >>> triggers. That might give us some insight into this.
> >> Sorry for the lag , I was out the whole day.
> >>
> >> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
> >>
> >> ...
> >>
> >> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
> >
> > Ok. The timer got delayed. It got delayed because it is initialized as
> > a deferrable timer, which is obviously wrong. Sigh, I signed off on
> > that commit myself without thinking about the consequences.
> >
> > Can you please apply the patch below on top of the others?
>
> The box has been up for almost one day with that patch on top of the other ones, and everything is fine so far.

Thanks for testing. I'll push the patches Linuswards.

@Andi: The revert of the reverted clocksource watchdog is staged for .26

Thanks,

tglx