2008-03-10 23:16:11

by Rafael J. Wysocki

Subject: 2.6.25-rc5: Reported regressions from 2.6.24

[We have closed some entries since yesterday, located some patches and found
a couple of new regressions, so here's an updated report.]

This message contains a list of some regressions from 2.6.24 reported since
2.6.25-rc1 was released, for which there are no fixes in the mainline I know
of.  If any of them have been fixed already, please let me know.

If you know of any other unresolved regressions from 2.6.24, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-03-11      141       58          43
  2008-03-10      138       66          47
  2008-03-03      115       65          49
  2008-02-25       90       51          39
  2008-02-17       61       45          37


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9954
Subject : iwl3945: not only it periodically dies, it also BUG()s
Submitter : Pavel Machek <[email protected]>
Date : 2008-02-05 22:44
References : http://lkml.org/lkml/2008/2/5/453
Handled-By : Chatre, Reinette <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9958
Subject : parisc compile error
Submitter : Adrian Bunk <[email protected]>
Date : 2008-02-08 01:12
References : http://lkml.org/lkml/2008/2/7/572
Handled-By : Kyle McMartin <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
Subject : mount: could not find filesystem
Submitter : Kamalesh Babulal <[email protected]>
Date : 2008-02-12 14:34
References : http://lkml.org/lkml/2008/2/12/91
Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
Yinghai Lu <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
Submitter : Ben Nizette <[email protected]>
Date : 2008-02-12 12:46
References : http://lkml.org/lkml/2008/2/12/148
Handled-By : Haavard Skinnemoen <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9978
Subject : 2.6.25-rc1: volanoMark 45% regression
Submitter : Zhang, Yanmin <[email protected]>
Date : Fri Jan 25 21:08:00 2008 +0100
References : http://lkml.org/lkml/2008/2/13/128
Handled-By : Srivatsa Vaddagiri <[email protected]>
Balbir Singh <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9980
Subject : 2.6.25-rc1 on Sun Ultra 40
Submitter : Jasper Bryant-Greene <[email protected]>
Date : 2008-02-13 12:25
References : http://lkml.org/lkml/2008/2/13/181
Handled-By : Yinghai Lu <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
Submitter : Linas Žvirblis <[email protected]>
Date : 2008-02-13 22:38
References : http://lkml.org/lkml/2008/2/13/566


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9984
Subject : problem with starting 2.6.25-rc1 and latest git
Submitter : Mariusz Kozlowski <[email protected]>
Date : 2008-02-13 23:16
References : http://lkml.org/lkml/2008/2/13/587


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9992
Subject : 2.6.24-git: kmap_atomic() WARN_ON()
Submitter : Thomas Gleixner <[email protected]>
Date : 2008-02-07 00:58
References : http://lkml.org/lkml/2008/2/6/451
http://lkml.org/lkml/2007/1/14/38


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9995
Subject : 2.6.25-rc1 regression - backlight controlls do not work - ThinkPad T61
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-02-15 04:51
Handled-By : Zhang Rui <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10011
Subject : The computer is blocked when X is started
Submitter : François Valenduc <[email protected]>
Date : 2008-02-17 06:28
Handled-By : Thomas Gleixner <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10016
Subject : cobalt_btns.c <-> struct platform_device compile error
Submitter : Adrian Bunk <[email protected]>
Date : 2008-02-17 12:12
References : http://lkml.org/lkml/2008/2/17/293


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10017
Subject : cdev removal broke cobalt_btns.c compilation
Submitter : Adrian Bunk <[email protected]>
Date : 2008-02-17 12:14
References : http://lkml.org/lkml/2008/2/17/295


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10025
Subject : Current git very broken on the Dreamcast
Submitter : Adrian McMenamin <[email protected]>
Date : 2008-02-16 19:38
References : http://lkml.org/lkml/2008/2/16/196
Handled-By : Kristoffer Ericson <[email protected]>
Magnus Damm <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10027
Subject : 2.6.25-rc[12] Video4Linux Bttv Regression
Submitter : Bongani Hlope <[email protected]>
Date : 2008-02-17 09:36
References : http://lkml.org/lkml/2008/2/17/55
Handled-By : Mauro Carvalho Chehab <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10033
Subject : mips yosemite_defconfig compile error
Submitter : Adrian Bunk <[email protected]>
Date : 2008-02-17 16:45
References : http://lkml.org/lkml/2008/2/17/383


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10041
Subject : 2.6.25-rc1/2 regression: first-time login into gnome fails
Submitter : Romano Giannetti <[email protected]>
Date : 2008-02-18 11:56
References : http://lkml.org/lkml/2008/2/18/145
Handled-By : Ray Lee <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
Subject : Spurious messages at boot, eventually hangs the usb subsustem
Submitter : Jean-Luc Coulon <[email protected]>
Date : 2008-02-20 09:10


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10061
Subject : Hang in md5_resync
Submitter : Steinar H. Gunderson <[email protected]>
Date : 2008-02-21 13:13


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10065
Subject : 2.6.25-rc2 regression - hang on suspend
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2008-02-19 12:59
References : http://lkml.org/lkml/2008/2/19/165
http://lkml.org/lkml/2008/2/17/381
Handled-By : Rafael J. Wysocki <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10067
Subject : TUNER_TDA8290=y, VIDEO_DEV=n build error
Submitter : Toralf Förster <[email protected]>
Date : 2008-02-22 10:36
References : http://lkml.org/lkml/2008/2/19/262


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10078
Subject : USB OOPS 2.6.25-rc2-git1
Submitter : Andre Tomt <[email protected]>
Date : 2008-02-19 16:19
References : http://lkml.org/lkml/2008/2/19/253
Handled-By : David Brownell <[email protected]>
Alan Stern <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10080
Subject : 2.6.25-rc2: ohci1394 problem
Submitter : Thomas Meyer <[email protected]>
Date : 2008-02-20 08:47
References : http://lkml.org/lkml/2008/2/20/58
Handled-By : Stefan Richter <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10082
Subject : [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc
Submitter : Kamalesh Babulal <[email protected]>
Date : 2008-02-20 16:01
References : http://lkml.org/lkml/2008/2/20/218
http://lkml.org/lkml/2008/1/18/71


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10084
Subject : 2.6.25-rc2-git4 BUG: sysfs_readdir
Submitter : Randy Dunlap <[email protected]>
Date : 2008-02-21 17:25
References : http://lkml.org/lkml/2008/2/21/212
Handled-By : Greg KH <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
Subject : 2.6.25-rc2 + smartd = hang
Submitter : Anders Eriksson <[email protected]>
Date : Sat Jan 26 20:13:12 2008 +0100
References : http://lkml.org/lkml/2008/2/22/239
Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10093
Subject : 2.6.25-current-git hangs on boot
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2008-02-23 18:55
References : http://lkml.org/lkml/2008/2/23/263
http://marc.info/?l=linux-acpi&m=120387537018467&w=4
Handled-By : Pallipadi, Venkatesh <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10097
Subject : SMP BUG in __nf_conntrack_find
Submitter : Christian Casteyde <[email protected]>
Date : 2008-02-25 10:44


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10100
Subject : 208c70a45624400fafd7511b96bc426bf01f8f5e breaks EC init
Submitter : Michael S. Tsirkin <[email protected]>
Date : 2008-02-25 20:19
References : http://lkml.org/lkml/2008/2/25/282
Handled-By : Alexey Starikovskiy <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10102
Subject : 2.6.25-rc2 Regression Thinkpad acpi
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-02-25 12:47
References : http://lkml.org/lkml/2008/2/25/73
Handled-By : Henrique de Moraes Holschuh <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10117
Subject : 2.6.25-current-git hangs on boot (pci=nommconf helps)
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2008-02-23 18:55
References : http://lkml.org/lkml/2008/2/23/263


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
Subject : INFO: possible circular locking in the resume
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-02-27
References : http://lkml.org/lkml/2008/2/26/479
Handled-By : Gautham R Shenoy <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
Submitter : Marcin Slusarz <[email protected]>
Date : Fri Jan 25 21:08:29 2008 +0100
References : http://lkml.org/lkml/2008/3/2/91
Handled-By : Peter Zijlstra <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
Submitter : Gabriel C <[email protected]>
Date : 2008-02-24 01:31
References : http://lkml.org/lkml/2008/2/23/380
http://lkml.org/lkml/2008/2/24/281
Handled-By : Thomas Gleixner <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10156
Subject : KVM & Qemu crashed with infinite recursive kernel loop in the guest
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-02-28 11:25
References : http://lkml.org/lkml/2008/2/28/106


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10164
Subject : ntfs build failure on no-mmu
Submitter : Mike Frysinger <[email protected]>
Date : 2008-03-03 11:05
References : http://lkml.org/lkml/2008/3/1/179


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10172
Subject : INFO: inconsistent lock state
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-03-05 03:26


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10190
Subject : [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
Submitter : Tarkan Erimer <[email protected]>
Date : 2008-03-05 05:01
References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10191
Subject : Treason uncloaked spams syslog with latest git
Submitter : Thomas Gleixner <[email protected]>
Date : 2008-03-06 05:47
References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/2444.html


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10203
Subject : Unable to ifconfig up b43 wireless interface
Submitter : Christian Casteyde <[email protected]>
Date : 2008-03-09 00:55
Handled-By : Michael Buesch <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
Subject : INFO: task mount:11202 blocked for more than 120 seconds
Submitter : Christian Kujau <[email protected]>
Date : 2008-03-07 21:32
References : http://lkml.org/lkml/2008/3/7/308
http://lkml.org/lkml/2008/3/9/186
Handled-By : David Chinner <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10211
Subject : build #408 issue for v2.6.25-rc4-56-gd7fe321 in cx2341x_ctrl_get_menu
Submitter : Toralf Förster <[email protected]>
Date : 2008-03-07 13:48
References : http://lkml.org/lkml/2008/3/7/168


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10214
Subject : [regression] 2.6.25-rc4 snd-es18xx broken on Alpha
Submitter : Bob Tracy <[email protected]>
Date : 2008-03-08 04:58
References : http://lkml.org/lkml/2008/3/7/409
Handled-By : Ivan Kokshaysky <[email protected]>
Rene Herman <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10219
Subject : 25rc4-git3 blockdev/loopback related lockdep trace.
Submitter : Dave Jones <[email protected]>
Date : 2008-03-10 16:01
References : http://lkml.org/lkml/2008/3/10/120


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9969
Subject : 2.6.24-git15 Keyboard Issue?
Submitter : Chris Holvenstot <[email protected]>
Date : 2008-02-06 14:02
References : http://lkml.org/lkml/2008/2/6/100
http://lkml.org/lkml/2008/2/13/82
Handled-By : Thomas Gleixner <[email protected]>
Patch : http://lkml.org/lkml/2008/2/15/343


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10013
Subject : tbench regression in 2.6.25-rc1
Submitter : Zhang, Yanmin <[email protected]>
Date : 2008-02-15 02:52
References : http://lkml.org/lkml/2008/2/14/546
Handled-By : Eric Dumazet <[email protected]>
David Miller <[email protected]>
Patch : http://lkml.org/lkml/2008/2/18/66
http://lkml.org/lkml/2008/2/18/117


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10031
Subject : [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...
Submitter : Andrey Borzenkov <[email protected]>
Date : 2008-02-16 13:36
References : http://lkml.org/lkml/2008/2/17/125
Handled-By : Kok, Auke <[email protected]>
Patch : http://lkml.org/lkml/2008/2/21/259


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10104
Subject : 2.6.25-rc3: WARNING: at arch/x86/mm/ioremap.c:137
Submitter : Phil Oester <[email protected]>
Date : 2008-02-25 03:09
References : http://lkml.org/lkml/2008/2/24/265
Handled-By : Ingo Molnar <[email protected]>
Patch : http://lkml.org/lkml/2008/3/10/240


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10122
Subject : FIXED_PHY must depend on PHYLIB=y
Submitter : Olaf Hering <[email protected]>
Date : 2008-02-27 07:14
References : http://lkml.org/lkml/2008/2/27/90
Handled-By : Adrian Bunk <[email protected]>
Patch : http://lkml.org/lkml/2008/2/27/157


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3) kernels
Submitter : Guennadi Liakhovetski <[email protected]>
Date : Tue May 22 22:47:54 2007 -0400
References : http://lkml.org/lkml/2008/3/10/340
Handled-By : Gautham R Shenoy <[email protected]>
Patch : http://lkml.org/lkml/2008/3/10/91


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10132
Subject : 2.6.25 git regression, oops on boot
Submitter : Jonathan McDowell <[email protected]>
Date : 2008-02-29 11:09
References : http://marc.info/?l=linux-kernel&m=120423268404812&w=2
http://lkml.org/lkml/2008/2/28/369
Handled-By : Zhang Rui <[email protected]>
Lin Ming <[email protected]>
Patch : http://lkml.org/lkml/2008/2/29/49


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10153
Subject : (regression) kernel/timeconst.h bugs with HZ=128
Submitter : David Brownell <[email protected]>
Date : 2008-02-26 19:32
References : http://lkml.org/lkml/2008/2/26/294
Handled-By : H. Peter Anvin <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15114&action=view
http://bugzilla.kernel.org/attachment.cgi?id=15115&action=view


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10168
Subject : WARNING: at drivers/usb/host/ehci-hcd.c:287
Submitter : Christian Kujau <[email protected]>
Date : 2008-03-03 01:05
References : http://lkml.org/lkml/2008/3/2/171
Handled-By : Alan Stern <[email protected]>
David Brownell <[email protected]>
Patch : http://lkml.org/lkml/2008/3/4/420


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10186
Subject : SCSI_AIC94XX must depend on SCSI
Submitter : Toralf Förster <[email protected]>
Date : 2008-03-06 19:09
References : http://marc.info/?l=linux-kernel&m=120483073617232&w=2
Handled-By : Adrian Bunk <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=120483499725928&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10189
Subject : libata: allow LLDs w/o any reset method
Submitter : Ingo Molnar <[email protected]>
Date : 2008-03-06 10:25
References : http://marc.info/?l=linux-kernel&m=120479928020617&w=2
Handled-By : Tejun Heo <[email protected]>
Patch : http://marc.info/?l=linux-ide&m=120477660124629&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10209
Subject : 2.6.25 sysdev API problem
Submitter : Mikael Pettersson <[email protected]>
Date : 2008-03-08 16:56
References : http://lkml.org/lkml/2008/3/8/59
Handled-By : Greg KH <[email protected]>
Balaji Rao <[email protected]>
Patch : http://lkml.org/lkml/2008/3/9/10


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10210
Subject : [Regression] 2.6.25-rc4-git3: Handling of audio CDs broken on pata_ali
Submitter : Rafael J. Wysocki <[email protected]>
Date : 2008-03-08 22:46
References : http://lkml.org/lkml/2008/3/8/123
Handled-By : Tejun Heo <[email protected]>
Patch : http://lkml.org/lkml/2008/3/10/69


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10218
Subject : [patch] fix ACPI boot regression (was: Re: Linux 2.6.25-rc5)
Submitter : Ingo Molnar <[email protected]>
Date : 2008-03-10 18:04
References : http://lkml.org/lkml/2008/3/10/171
Handled-By : Ingo Molnar <[email protected]>
Patch : http://lkml.org/lkml/2008/3/10/171


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.24,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9832

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2008-03-11 00:40:26

by Jeff Garzik

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9992
> Subject : 2.6.24-git: kmap_atomic() WARN_ON()
> Submitter : Thomas Gleixner <[email protected]>
> Date : 2008-02-07 00:58
> References : http://lkml.org/lkml/2008/2/6/451
> http://lkml.org/lkml/2007/1/14/38

Solved by b445c56815d84b9fce40707f99811bdc354458e0


> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
> Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3) kernels
> Submitter : Guennadi Liakhovetski <[email protected]>
> Date : Tue May 22 22:47:54 2007 -0400
> References : http://lkml.org/lkml/2008/3/10/340
> Handled-By : Gautham R Shenoy <[email protected]>
> Patch : http://lkml.org/lkml/2008/3/10/91

FWIW, I have this same problem.

2008-03-11 01:06:52

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Tuesday, 11 of March 2008, Jeff Garzik wrote:
> Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9992
> > Subject : 2.6.24-git: kmap_atomic() WARN_ON()
> > Submitter : Thomas Gleixner <[email protected]>
> > Date : 2008-02-07 00:58
> > References : http://lkml.org/lkml/2008/2/6/451
> > http://lkml.org/lkml/2007/1/14/38
>
> Solved by b445c56815d84b9fce40707f99811bdc354458e0

Thanks, Adrian has just closed it.

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
> > Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3) kernels
> > Submitter : Guennadi Liakhovetski <[email protected]>
> > Date : Tue May 22 22:47:54 2007 -0400
> > References : http://lkml.org/lkml/2008/3/10/340
> > Handled-By : Gautham R Shenoy <[email protected]>
> > Patch : http://lkml.org/lkml/2008/3/10/91
>
> FWIW, I have this same problem.

Does the patch help?

2008-03-11 02:16:24

by Linus Torvalds

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24



On Mon, 10 Mar 2008, Jeff Garzik wrote:
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
> > Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3)
> > kernels
> > Submitter : Guennadi Liakhovetski <[email protected]>
> > Date : Tue May 22 22:47:54 2007 -0400
> > References : http://lkml.org/lkml/2008/3/10/340
> > Handled-By : Gautham R Shenoy <[email protected]>
> > Patch : http://lkml.org/lkml/2008/3/10/91
>
> FWIW, I have this same problem.

There's a newer patch in

http://lkml.org/lkml/2008/3/10/343

which I think should replace the 2008/3/10/91 one, but which needs
testing.

Jeff, does that one ("keep rd->online and cpu_online_map in sync") fix the
problem for you?

(Andrew - I saw you say that the older patch fixed things for you, does
the newer one - on its own - also do so?)

Linus

2008-03-11 03:01:19

by Andrew Morton

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Mon, 10 Mar 2008 19:15:45 -0700 (PDT) Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 10 Mar 2008, Jeff Garzik wrote:
> >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
> > > Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3)
> > > kernels
> > > Submitter : Guennadi Liakhovetski <[email protected]>
> > > Date : Tue May 22 22:47:54 2007 -0400
> > > References : http://lkml.org/lkml/2008/3/10/340
> > > Handled-By : Gautham R Shenoy <[email protected]>
> > > Patch : http://lkml.org/lkml/2008/3/10/91
> >
> > FWIW, I have this same problem.
>
> There's a newer patch in
>
> http://lkml.org/lkml/2008/3/10/343
>
> which I think should replace the 2008/3/10/91 one, but which needs
> testing.
>
> Jeff, does that one ("keep rd->online and cpu_online_map in sync") fix the
> problem for you?
>
> (Andrew - I saw you say that the older patch fixed things for you, does
> the newer one - on its own - also do so?)
>

Yes, it does.

2008-03-11 08:28:57

by Ingo Molnar

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24


* Andrew Morton <[email protected]> wrote:

> > There's a newer patch in
> >
> > http://lkml.org/lkml/2008/3/10/343
> >
> > which I think should replace the 2008/3/10/91 one, but which needs
> > testing.
> >
> > Jeff, does that one ("keep rd->online and cpu_online_map in sync") fix the
> > problem for you?
> >
> > (Andrew - I saw you say that the older patch fixed things for you, does
> > the newer one - on its own - also do so?)
> >
>
> Yes, it does.

great. I've picked it up too and will push it through the test-grind.

Ingo

2008-03-11 12:23:09

by Stefan Richter

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10080
> Subject : 2.6.25-rc2: ohci1394 problem
> Submitter : Thomas Meyer <[email protected]>
> Date : 2008-02-20 08:47
> References : http://lkml.org/lkml/2008/2/20/58
> Handled-By : Stefan Richter <[email protected]>

Thomas wrote on 2008-02-25:
''So i did a "make clean" and a "make" (not a make
-j3 as i use to do) and recompiled 2.6.25-rc3 and now it works again.
Case closed under strange error.''

I have closed the bugzilla bug now.
--
Stefan Richter
-=====-==--- --== -=-==
http://arcgraph.de/sr/

2008-03-11 13:05:33

by Adrian Bunk

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Tue, Mar 11, 2008 at 01:22:42PM +0100, Stefan Richter wrote:
> Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10080
> > Subject : 2.6.25-rc2: ohci1394 problem
> > Submitter : Thomas Meyer <[email protected]>
> > Date : 2008-02-20 08:47
> > References : http://lkml.org/lkml/2008/2/20/58
> > Handled-By : Stefan Richter <[email protected]>
>
> Thomas wrote on 2008-02-25:
> ''So i did a "make clean" and a "make" (not a make
> -j3 as i use to do) and recompiled 2.6.25-rc3 and now it works again.
> Case closed under strange error.''
>...

Although I don't think this would cause the error, it would be nice if
Thomas could verify that the -j3 did not cause the problem.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-03-11 18:58:18

by Jeff Garzik

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

Linus Torvalds wrote:
> On Mon, 10 Mar 2008, Jeff Garzik wrote:
>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
>>> Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3)

>> FWIW, I have this same problem.

> Jeff, does that one ("keep rd->online and cpu_online_map in sync") fix the
> problem for you?

Nope. I am running baadac8b10c5ac15ce3d26b68fa266c8889b163f now, and it
still hangs on reboot or power-off.

Interestingly, if I reboot -immediately- from gdm, it succeeds. However
if I login to Fedora GNOME via gdm, and load my standard apps (1001
terminals, firefox, tbird, IRC) reboot and poweroff no longer work.

My guess was always some ACPI regression.  I'll bisect today or
tomorrow.  It is a reproducible regression that appeared recently (circa
2.6.24 or 2.6.25-rc1 I think), so I should be able to find the culprit.

Jeff


2008-03-11 22:44:36

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Tuesday, 11 of March 2008, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > On Mon, 10 Mar 2008, Jeff Garzik wrote:
> >>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
> >>> Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3)
>
> >> FWIW, I have this same problem.
>
> > Jeff, does that one ("keep rd->online and cpu_online_map in sync") fix the
> > problem for you?
>
> Nope. I am running baadac8b10c5ac15ce3d26b68fa266c8889b163f now, and it
> still hangs on reboot or power-off.
>
> Interestingly, if I reboot -immediately- from gdm, it succeeds. However
> if I login to Fedora GNOME via gdm, and load my standard apps (1001
> terminals, firefox, tbird, IRC) reboot and poweroff no longer work.
>
> My guess was always some ACPI regression. I'll bisect today or
> tomorrow. It is reproducible regression that appeared recently (circa
> 2.6.24 or 2.6.25-rc1 I think), so I should be able to find the culprit.

In http://bugzilla.kernel.org/show_bug.cgi?id=10123 Guennadi says that
reverting

commit fd7d1ced29e5beb88c9068801da7a362606d8273
Author: Greg Kroah-Hartman <[email protected]>
Date: Tue May 22 22:47:54 2007 -0400

PCI: make pci_bus a struct device

fixes the problem for him (in other words, this seems to be yet another
reboot/poweroff issue).

Thanks,
Rafael

2008-03-12 20:02:55

by Linus Torvalds

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24



On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:

>
> In http://bugzilla.kernel.org/show_bug.cgi?id=10123 Guennadi says that
> reverting
>
> commit fd7d1ced29e5beb88c9068801da7a362606d8273
> Author: Greg Kroah-Hartman <[email protected]>
> Date: Tue May 22 22:47:54 2007 -0400
>
> PCI: make pci_bus a struct device
>
> fixes the problem for him (this seems to be yet another reboot/poweroff IOW).

Ahh, I thought this was done already, but nope, my PCI pull from Greg
didn't contain the revert.

Greg? I know you must be aware of the problem, because you replied to the
email at some point. Wazzup?

Linus

2008-03-12 20:32:42

by Greg KH

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Wed, Mar 12, 2008 at 01:01:15PM -0700, Linus Torvalds wrote:
>
>
> On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:
>
> >
> > In http://bugzilla.kernel.org/show_bug.cgi?id=10123 Guennadi says that
> > reverting
> >
> > commit fd7d1ced29e5beb88c9068801da7a362606d8273
> > Author: Greg Kroah-Hartman <[email protected]>
> > Date: Tue May 22 22:47:54 2007 -0400
> >
> > PCI: make pci_bus a struct device
> >
> > fixes the problem for him (this seems to be yet another reboot/poweroff IOW).
>
> Ahh, I thought this was done already, but nope, my PCI pull from Greg
> didn't contain the revert.
>
> Greg? I know you must be aware of the problem, because you replied to the
> email at some point. Wazzup?

I'm still trying to figure out why his is the only machine having
problems with this. I think it's an acpi "we walk the list of pci
devices twice" type thing, but don't know yet.

I'm still working on it...

thanks,

greg k-h

2008-03-12 21:27:33

by Greg KH

Subject: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 01:32:05PM -0700, Greg KH wrote:
> On Wed, Mar 12, 2008 at 01:01:15PM -0700, Linus Torvalds wrote:
> >
> >
> > On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:
> >
> > >
> > > In http://bugzilla.kernel.org/show_bug.cgi?id=10123 Guennadi says that
> > > reverting
> > >
> > > commit fd7d1ced29e5beb88c9068801da7a362606d8273
> > > Author: Greg Kroah-Hartman <[email protected]>
> > > Date: Tue May 22 22:47:54 2007 -0400
> > >
> > > PCI: make pci_bus a struct device
> > >
> > > fixes the problem for him (this seems to be yet another reboot/poweroff IOW).
> >
> > Ahh, I thought this was done already, but nope, my PCI pull from Greg
> > didn't contain the revert.
> >
> > Greg? I know you must be aware of the problem, because you replied to the
> > email at some point. Wazzup?
>
> I'm still trying to figure out why his is the only machine having
> problems with this. I think it's an acpi "we walk the list of pci
> devices twice" type thing, but don't know yet.

Ok, I think I got it. And it looks like an ACPI bug, but one that we
might have been ignoring for a long time...


In looking at the log files at boot, we see that we are using ACPI to
find the PCI devices:

ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]

Followed by a lot of kobjects for pci devices being added, including
this root bus:
kobject: '0000:01:00.0' (c7c978cc): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
kobject: '0000:01:00.0' (c7c978cc): kobject_uevent_env
kobject: '0000:01:00.0' (c7c978cc): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:00.0'
kobject: '0000:01' (c7c35900): kobject_add_internal: parent: 'pci_bus', set: 'devices'
kobject: '0000:01' (c7c35900): kobject_uevent_env
kobject: '0000:01' (c7c35900): fill_kobj_path: path = '/class/pci_bus/0000:01'

All is fine, until later on we decide to fall back to the "old" style of
probing:
PCI: Probing PCI hardware
kobject (c7c35900): tried to init an initialized object, something is seriously wrong.
Pid: 1, comm: swapper Not tainted 2.6.25-rc2-testpm #30
[<c01ea0e9>] kobject_init+0x89/0x90
[<c025094e>] device_initialize+0x1e/0x90
[<c025119b>] device_register+0xb/0x20
[<c01f3fd8>] pci_bus_add_devices+0x98/0x140
[<c030aff7>] ? pcibios_scan_root+0x27/0xa0
[<c03f69d0>] pci_legacy_init+0x50/0xf0
[<c03db5c2>] kernel_init+0x132/0x310
[<c010303a>] ? ret_from_fork+0x6/0x1c
[<c03db490>] ? kernel_init+0x0/0x310
[<c03db490>] ? kernel_init+0x0/0x310
[<c0103d3f>] kernel_thread_helper+0x7/0x18
=======================
kobject: '0000:01' (c7c35900): kobject_add_internal: parent: 'pci_bus', set: 'devices'

This shows that we are trying to register the exact same kobject that we
had already previously registered. Not nice...
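The warning in the trace above comes from a guard against initializing the
same object twice. A minimal userspace sketch of such a guard follows;
`struct toy_object` and `toy_object_init()` are hypothetical names made up for
illustration and are not the kernel's kobject API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted object; NOT the real struct kobject. */
struct toy_object {
	const char *name;
	bool initialized;	/* set once by toy_object_init() */
	int refcount;
};

/*
 * Initialize obj. Returns 0 on a first-time init. If the object was
 * already initialized, logs a warning, leaves it untouched and returns -1.
 */
int toy_object_init(struct toy_object *obj, const char *name)
{
	assert(obj != NULL);

	if (obj->initialized) {
		fprintf(stderr,
			"object (%p): tried to init an initialized object\n",
			(void *)obj);
		return -1;
	}

	obj->name = name;
	obj->refcount = 1;
	obj->initialized = true;
	return 0;
}
```

With a zero-initialized object, the first call succeeds and a second call on
the same object is rejected, which mirrors the situation where the same bus
object is registered by two different probing paths.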

Now we have a check in the pci bus code to not register anything that we
had already registered in the past:

	list_for_each_entry(dev, &bus->devices, bus_list) {
		/*
		 * Skip already-present devices (which are on the
		 * global device list.)
		 */
		if (!list_empty(&dev->global_list))
			continue;
		retval = pci_bus_add_device(dev);

But, in redoing the pci list logic (coming in .26 and in -mm and -next)
I realized that this wasn't a real check, as this list is just a
"shadow" list that some types of pci probing never set up.

So that explains the warning we get when trying to register a device
multiple times in the kobject core.

But why does this happen in the first place?

The code in arch/x86/pci/legacy.c::pci_legacy_init() checks the
pcibios_scanned flag to determine if we had already scanned the PCI bus.
Which we did in the ACPI code, right?

So, Len, shouldn't we be setting this flag in the ACPI core if we had
already scanned the pci bus there?

I can fix this problem by putting the check in the pci core in
pci_bus_add_devices() like we have done in -next, but I think that we
also need to do something in ACPI as well.

Guennadi, could you test the -next kernel tree to see if the logic there
solves this issue for you?

thanks,

greg k-h

2008-03-12 21:38:56

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 02:27:04PM -0700, Greg KH wrote:
> On Wed, Mar 12, 2008 at 01:32:05PM -0700, Greg KH wrote:
> > On Wed, Mar 12, 2008 at 01:01:15PM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:
> > >
> > > >
> > > > In http://bugzilla.kernel.org/show_bug.cgi?id=10123 Guennadi says that
> > > > reverting
> > > >
> > > > commit fd7d1ced29e5beb88c9068801da7a362606d8273
> > > > Author: Greg Kroah-Hartman <[email protected]>
> > > > Date: Tue May 22 22:47:54 2007 -0400
> > > >
> > > > PCI: make pci_bus a struct device
> > > >
> > > > fixes the problem for him (this seems to be yet another reboot/poweroff IOW).
> > >
> > > Ahh, I thought this was done already, but nope, my PCI pull from Greg
> > > didn't contain the revert.
> > >
> > > Greg? I know you must be aware of the problem, because you replied to the
> > > email at some point. Wazzup?
> >
> > I'm still trying to figure out why his is the only machine having
> > problems with this. I think it's an acpi "we walk the list of pci
> > devices twice" type thing, but don't know yet.
>
> Ok, I think I got it. And it looks like an ACPI bug, but one that we
> might have been ignoring for a long time...
>
>
> In looking at the log files at boot, we see that we are using ACPI to
> find the PCI devices:
>
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
>
> Followed by a lot of kobjects for pci devices being added, including
> this root bus:
> kobject: '0000:01:00.0' (c7c978cc): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
> kobject: '0000:01:00.0' (c7c978cc): kobject_uevent_env
> kobject: '0000:01:00.0' (c7c978cc): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:00.0'
> kobject: '0000:01' (c7c35900): kobject_add_internal: parent: 'pci_bus', set: 'devices'
> kobject: '0000:01' (c7c35900): kobject_uevent_env
> kobject: '0000:01' (c7c35900): fill_kobj_path: path = '/class/pci_bus/0000:01'
>
> All is fine, until later on we decide to fall back to the "old" style of
> probing:
> PCI: Probing PCI hardware
> kobject (c7c35900): tried to init an initialized object, something is seriously wrong.
> Pid: 1, comm: swapper Not tainted 2.6.25-rc2-testpm #30
> [<c01ea0e9>] kobject_init+0x89/0x90
> [<c025094e>] device_initialize+0x1e/0x90
> [<c025119b>] device_register+0xb/0x20
> [<c01f3fd8>] pci_bus_add_devices+0x98/0x140
> [<c030aff7>] ? pcibios_scan_root+0x27/0xa0
> [<c03f69d0>] pci_legacy_init+0x50/0xf0
> [<c03db5c2>] kernel_init+0x132/0x310
> [<c010303a>] ? ret_from_fork+0x6/0x1c
> [<c03db490>] ? kernel_init+0x0/0x310
> [<c03db490>] ? kernel_init+0x0/0x310
> [<c0103d3f>] kernel_thread_helper+0x7/0x18
> =======================
> kobject: '0000:01' (c7c35900): kobject_add_internal: parent: 'pci_bus', set: 'devices'
>
> This shows that we are trying to register the exact same kobject that we
> had already previously registered. Not nice...
>
> Now we have a check in the pci bus code to not register anything that we
> had already registered in the past:
>
> 	list_for_each_entry(dev, &bus->devices, bus_list) {
> 		/*
> 		 * Skip already-present devices (which are on the
> 		 * global device list.)
> 		 */
> 		if (!list_empty(&dev->global_list))
> 			continue;
> 		retval = pci_bus_add_device(dev);
>
> But, in redoing the pci list logic (coming in .26 and in -mm and -next)
> I realized that this wasn't a real check, as this list is just a
> "shadow" list that some types of pci probing never set up.
>
> So that explains the warning we get when trying to register a device
> multiple times in the kobject core.
>
> But why does this happen in the first place?
>
> The code in arch/x86/pci/legacy.c::pci_legacy_init() checks the
> pcibios_scanned flag to determine if we had already scanned the PCI bus.
> Which we did in the ACPI code, right?
>
> So, Len, shouldn't we be setting this flag in the ACPI core if we had
> already scanned the pci bus there?
>
> I can fix this problem by putting the check in the pci core in
> pci_bus_add_devices() like we have done in -next, but I think that we
> also need to do something in ACPI as well.
>
> Guennadi, could you test the -next kernel tree to see if the logic there
> solves this issue for you?

Actually, here's a simple patch from -next that should test this logic
for you. Can you let me know if this solves the start up WARNING dump
for you?

thanks,

greg k-h

------------

Date: Thu, 14 Feb 2008 14:56:56 -0800
From: Greg Kroah-Hartman <[email protected]>
Subject: PCI: add is_added flag to struct pci_dev

This lets us check if the device is really added to the driver core or
not, which is what we need when walking some of the bus lists. The flag
is there in anticipation of getting rid of the other PCI device list,
which is what we used to check in this situation.

Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/platforms/pseries/pci_dlpar.c | 7 ++-----
drivers/pci/bus.c | 11 ++++-------
drivers/pci/probe.c | 2 +-
drivers/pci/remove.c | 6 ++----
include/linux/pci.h | 1 +
5 files changed, 10 insertions(+), 17 deletions(-)

--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -88,11 +88,8 @@ pcibios_fixup_new_pci_devices(struct pci
struct pci_dev *dev;

list_for_each_entry(dev, &bus->devices, bus_list) {
- /*
- * Skip already-present devices (which are on the
- * global device list.)
- */
- if (list_empty(&dev->global_list)) {
+ /* Skip already-added devices */
+ if (!dev->is_added) {
int i;

/* Fill device archdata and setup iommu table */
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -84,6 +84,7 @@ int pci_bus_add_device(struct pci_dev *d
if (retval)
return retval;

+ dev->is_added = 1;
down_write(&pci_bus_sem);
list_add_tail(&dev->global_list, &pci_devices);
up_write(&pci_bus_sem);
@@ -112,11 +113,8 @@ void pci_bus_add_devices(struct pci_bus
int retval;

list_for_each_entry(dev, &bus->devices, bus_list) {
- /*
- * Skip already-present devices (which are on the
- * global device list.)
- */
- if (!list_empty(&dev->global_list))
+ /* Skip already-added devices */
+ if (dev->is_added)
continue;
retval = pci_bus_add_device(dev);
if (retval)
@@ -124,8 +122,7 @@ void pci_bus_add_devices(struct pci_bus
}

list_for_each_entry(dev, &bus->devices, bus_list) {
-
- BUG_ON(list_empty(&dev->global_list));
+ BUG_ON(!dev->is_added);

/*
* If there is an unattached subordinate bus, attach
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -984,7 +984,7 @@ EXPORT_SYMBOL(pci_scan_single_device);
*
* Scan a PCI slot on the specified PCI bus for devices, adding
* discovered devices to the @bus->devices list. New devices
- * will have an empty dev->global_list head.
+ * will not have is_added set.
*/
int pci_scan_slot(struct pci_bus *bus, int devfn)
{
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -18,13 +18,11 @@ static void pci_free_resources(struct pc

static void pci_stop_dev(struct pci_dev *dev)
{
- if (!dev->global_list.next)
- return;
-
- if (!list_empty(&dev->global_list)) {
+ if (dev->is_added) {
pci_proc_detach_device(dev);
pci_remove_sysfs_dev_files(dev);
device_unregister(&dev->dev);
+ dev->is_added = 0;
down_write(&pci_bus_sem);
list_del(&dev->global_list);
dev->global_list.next = dev->global_list.prev = NULL;
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -181,6 +181,7 @@ struct pci_dev {
unsigned int transparent:1; /* Transparent PCI bridge */
unsigned int multifunction:1;/* Part of multi-function device */
/* keep track of device state */
+ unsigned int is_added:1;
unsigned int is_busmaster:1; /* device is busmaster */
unsigned int no_msi:1; /* device may not use msi */
unsigned int no_d1d2:1; /* only allow d0 or d3 */

2008-03-12 21:43:38

by Len Brown

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wednesday 12 March 2008, Greg KH wrote:
> On Wed, Mar 12, 2008 at 01:32:05PM -0700, Greg KH wrote:
> > On Wed, Mar 12, 2008 at 01:01:15PM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:
> > >
> > > >
> > > > In http://bugzilla.kernel.org/show_bug.cgi?id=10123 Guennadi says that
> > > > reverting
> > > >
> > > > commit fd7d1ced29e5beb88c9068801da7a362606d8273
> > > > Author: Greg Kroah-Hartman <[email protected]>
> > > > Date: Tue May 22 22:47:54 2007 -0400
> > > >
> > > > PCI: make pci_bus a struct device
> > > >
> > > > fixes the problem for him (this seems to be yet another reboot/poweroff IOW).
> > >
> > > Ahh, I thought this was done already, but nope, my PCI pull from Greg
> > > didn't contain the revert.
> > >
> > > Greg? I know you must be aware of the problem, because you replied to the
> > > email at some point. Wazzup?
> >
> > I'm still trying to figure out why his is the only machine having
> > problems with this. I think it's an acpi "we walk the list of pci
> > devices twice" type thing, but don't know yet.
>
> Ok, I think I got it. And it looks like an ACPI bug, but one that we
> might have been ignoring for a long time...
>
>
> In looking at the log files at boot, we see that we are using ACPI to
> find the PCI devices:
>
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]

This is just ACPI telling us that it found a PCI Interrupt Routing table.
We load it and have it available for reference later when the PCI
devices request their IRQs, i.e. it responds to PCI probing,
it doesn't cause PCI probing.

> Followed by a lot of kobjects for pci devices being added, including
> this root bus:
> kobject: '0000:01:00.0' (c7c978cc): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
> kobject: '0000:01:00.0' (c7c978cc): kobject_uevent_env
> kobject: '0000:01:00.0' (c7c978cc): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:00.0'
> kobject: '0000:01' (c7c35900): kobject_add_internal: parent: 'pci_bus', set: 'devices'
> kobject: '0000:01' (c7c35900): kobject_uevent_env
> kobject: '0000:01' (c7c35900): fill_kobj_path: path = '/class/pci_bus/0000:01'

I don't think ACPI is doing this directly.
More likely that PNP is doing it via PNPACPI.
(try booting with pnpacpi=off)

> All is fine, until later on we decide to fall back to the "old" style of
> probing:
> PCI: Probing PCI hardware

Why do we fall back?
I don't see this line at all on my test box.

-Len

> kobject (c7c35900): tried to init an initialized object, something is seriously wrong.
> Pid: 1, comm: swapper Not tainted 2.6.25-rc2-testpm #30
> [<c01ea0e9>] kobject_init+0x89/0x90
> [<c025094e>] device_initialize+0x1e/0x90
> [<c025119b>] device_register+0xb/0x20
> [<c01f3fd8>] pci_bus_add_devices+0x98/0x140
> [<c030aff7>] ? pcibios_scan_root+0x27/0xa0
> [<c03f69d0>] pci_legacy_init+0x50/0xf0
> [<c03db5c2>] kernel_init+0x132/0x310
> [<c010303a>] ? ret_from_fork+0x6/0x1c
> [<c03db490>] ? kernel_init+0x0/0x310
> [<c03db490>] ? kernel_init+0x0/0x310
> [<c0103d3f>] kernel_thread_helper+0x7/0x18
> =======================
> kobject: '0000:01' (c7c35900): kobject_add_internal: parent: 'pci_bus', set: 'devices'
>
> This shows that we are trying to register the exact same kobject that we
> had already previously registered. Not nice...
>
> Now we have a check in the pci bus code to not register anything that we
> had already registered in the past:
>
> 	list_for_each_entry(dev, &bus->devices, bus_list) {
> 		/*
> 		 * Skip already-present devices (which are on the
> 		 * global device list.)
> 		 */
> 		if (!list_empty(&dev->global_list))
> 			continue;
> 		retval = pci_bus_add_device(dev);
>
> But, in redoing the pci list logic (coming in .26 and in -mm and -next)
> I realized that this wasn't a real check, as this list is just a
> "shadow" list that some types of pci probing never set up.
>
> So that explains the warning we get when trying to register a device
> multiple times in the kobject core.
>
> But why does this happen in the first place?
>
> The code in arch/x86/pci/legacy.c::pci_legacy_init() checks the
> pcibios_scanned flag to determine if we had already scanned the PCI bus.
> Which we did in the ACPI code, right?
>
> So, Len, shouldn't we be setting this flag in the ACPI core if we had
> already scanned the pci bus there?
>
> I can fix this problem by putting the check in the pci core in
> pci_bus_add_devices() like we have done in -next, but I think that we
> also need to do something in ACPI as well.
>
> Guennadi, could you test the -next kernel tree to see if the logic there
> solves this issue for you?
>
> thanks,
>
> greg k-h

2008-03-12 22:12:54

by Christian Kujau

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10191
> Subject : Treason uncloaked spams syslog with latest git
> Submitter : Thomas Gleixner <[email protected]>
> Date : 2008-03-06 05:47
> References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/2444.html

Dunno if you got this, but this seems to be fixed:
http://lkml.org/lkml/2008/3/11/435

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> Subject : INFO: task mount:11202 blocked for more than 120 seconds
> Submitter : Christian Kujau <[email protected]>
> Date : 2008-03-07 21:32
> References : http://lkml.org/lkml/2008/3/7/308
> http://lkml.org/lkml/2008/3/9/186
> Handled-By : David Chinner <[email protected]>

FWIW, it has been reported by Chr too: http://lkml.org/lkml/2008/3/12/313
And David could be taken out of the loop, as it seems dm-crypt related
([email protected] is already notified), not XFS related.


Thanks for maintaining this list,
Christian.
--
BOFH excuse #348:

We're on Token Ring, and it looks like the token got loose.

2008-03-12 22:21:40

by Linus Torvalds

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)



On Wed, 12 Mar 2008, Greg KH wrote:
>
> Ok, I think I got it. And it looks like an ACPI bug, but one that we
> might have been ignoring for a long time...

I still think that the fact that it regressed in that PCI patch means that
there is simply something wrong with the patch. At the very least that
patch changed behaviour, which was *not* what it was claiming it was
doing.

I do think it's triggered by the "acpi=noirq" setting: that means that
ACPI *won't* disable the legacy scan. Now, admittedly that's a really odd
thing to do, and I think it's really strange how pci_acpi_init() does that

pcibios_scanned++;

in a place where it is not actually scanning the bus, so I do agree that
ACPI is doing something really odd here, but the fact is, this code all
used to work.
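
As a rough illustration of the interaction described above, here is a
userspace toy model, not kernel code. The model_* names and the struct are
made up for this sketch; only pcibios_scanned, pci_acpi_init(),
pci_legacy_init() and the "acpi=noirq" option come from the discussion, and
the early return on acpi_noirq is an assumption about how the flag ends up
not being set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: names mirror the thread, the layout is invented for the sketch. */
struct pci_model {
	int pcibios_scanned;		/* nonzero once some path claims the scan */
	int root_bus_registrations;	/* how often the root bus was registered */
};

/* Stand-in for the ACPI-driven enumeration that registers the root bus. */
static void model_acpi_scan_root(struct pci_model *m)
{
	m->root_bus_registrations++;
}

/* Stand-in for pci_acpi_init(): assumed to bail out before bumping the flag. */
static void model_pci_acpi_init(struct pci_model *m, bool acpi_noirq)
{
	if (m->pcibios_scanned || acpi_noirq)
		return;
	m->pcibios_scanned++;
}

/* Stand-in for pci_legacy_init(): scans only if nobody set the flag. */
static void model_pci_legacy_init(struct pci_model *m)
{
	if (m->pcibios_scanned++)
		return;
	m->root_bus_registrations++;	/* the "PCI: Probing PCI hardware" path */
}

/* Run the modelled boot order; return how often the root bus was registered. */
int model_boot(bool acpi_noirq)
{
	struct pci_model m = { 0, 0 };

	model_acpi_scan_root(&m);
	model_pci_acpi_init(&m, acpi_noirq);
	model_pci_legacy_init(&m);
	assert(m.pcibios_scanned > 0);
	return m.root_bus_registrations;
}
```

In this model, model_boot(false) registers the root bus once, while
model_boot(true) registers it twice, which is the double registration the
kobject warning complains about.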

Can we please just fix the regression caused by that offending patch? In
other words: why did that patch change behaviour AT ALL?

Quite frankly, we're too late in the game to say "this exposed some other
long-time bug". That *particular* patch needs to be fixed, or reverted. We
can look at changing ACPI in 2.6.26, not in -rc6.

Linus

2008-03-12 22:27:15

by Linus Torvalds

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)



On Wed, 12 Mar 2008, Greg KH wrote:
>
> Actually, here's a simple patch from -next that should test this logic
> for you. Can you let me know if this solves the start up WARNING dump
> for you?

This patch looks bogus.

Why do you introduce a "dev->is_added" field that apparently has to match
the old "list_empty(&dev->global_list)" 1:1 anyway?

In other words: when is it *ever* permissible for "is_added" to have a
different value from the "list_empty(..)" logic? And if they must always
match (and it looks like they have to, since you set and clear the flag
exactly when you add/remove it from the list), then what exactly is this
supposed to fix?
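
To make the question concrete, here is a small userspace sketch (all toy_*
names are hypothetical, and the list is a minimal stand-in for the kernel's
list_head). It assumes, as the patch does, that the flag is set in the same
place where the device is put on the global list, and it adds an imaginary
second registration path that touches neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list, in the spirit of the kernel's list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *l)
{
	l->next = l->prev = l;
}

static bool toy_list_empty(const struct toy_list *l)
{
	return l->next == l;
}

static void toy_list_add_tail(struct toy_list *n, struct toy_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct toy_dev {
	struct toy_list global_list;	/* linkage on the "shadow" list */
	bool is_added;			/* the flag introduced by the patch */
	int registrations;		/* times handed to the toy driver core */
};

void toy_dev_init(struct toy_dev *d)
{
	toy_list_init(&d->global_list);
	d->is_added = false;
	d->registrations = 0;
}

/* Path that mirrors the patch: flag and list are updated together. */
void toy_add_device(struct toy_dev *d, struct toy_list *all_devices)
{
	assert(d != NULL && all_devices != NULL);
	d->registrations++;
	d->is_added = true;
	toy_list_add_tail(&d->global_list, all_devices);
}

/* Hypothetical other path: registers the device but updates neither. */
void toy_add_device_other_path(struct toy_dev *d)
{
	d->registrations++;
}

/* The two "already added?" checks being compared. */
bool toy_check_list(const struct toy_dev *d)
{
	return !toy_list_empty(&d->global_list);
}

bool toy_check_flag(const struct toy_dev *d)
{
	return d->is_added;
}
```

Under these assumptions the two checks agree for every device: both say
"added" after toy_add_device(), and both say "not added" for a device that
went through the other path, even though it has been registered once.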

Linus

2008-03-12 22:35:01

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 03:20:51PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > Ok, I think I got it. And it looks like an ACPI bug, but one that we
> > might have been ignoring for a long time...
>
> I still think that the fact that it regressed in that PCI patch means that
> there is simply something wrong with the patch. At the very least that
> patch changed behaviour, which was *not* what it was claiming it was
> doing.
>
> I do think it's triggered by the "acpi=noirq" setting: that means that
> ACPI *won't* disable the legacy scan. Now, admittedly that's a really odd
> thing to do, and I think it's really strange how pci_acpi_init() does that
>
> pcibios_scanned++;
>
> in a place where it is not actually scanning the bus, so I do agree that
> ACPI is doing something really odd here, but the fact is, this code all
> used to work.
>
> Can we please just fix the regression caused by that offending patch? In
> other words: why did that patch change behaviour AT ALL?
>
> Quite frankly, we're too late in the game to say "this exposed some other
> long-time bug". That *particular* patch needs to be fixed, or reverted. We
> can look at changing ACPI in 2.6.26, not in -rc6.

What happened in .25-rc was that we now catch these kinds of problems
(watching for duplicate kobjects to be registered and such.) So this
might have always been happening, but no warning was ever produced.
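
A minimal userspace sketch of that kind of check, assuming the object simply
remembers whether it has been initialized before (toy_kobject and its fields
are made up for illustration and are not the kernel's struct kobject):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an object that tracks its own init state. */
struct toy_kobject {
	bool state_initialized;
	int refcount;
};

/*
 * Initialize the object. A second init is reported but not refused,
 * so the caller keeps running. Returns true if this was a duplicate.
 */
bool toy_kobject_init(struct toy_kobject *k)
{
	bool duplicate;

	assert(k != NULL);
	duplicate = k->state_initialized;
	if (duplicate)
		fprintf(stderr,
			"toy kobject (%p): tried to init an initialized object\n",
			(void *)k);
	k->refcount = 1;
	k->state_initialized = true;
	return duplicate;
}
```

Without the state_initialized bookkeeping the second call would go through
silently, which is the "might have always been happening" case.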

I can revert the "catch this kind of thing" patch, but I don't think
that's the real solution here :)

The reason we aren't shutting down is also due to the way kobjects now
work. If you don't clean up properly, they linger around and something
on the shutdown path (I haven't figured that out yet) doesn't want to
stop the machine.

We have seen this in a number of places, all catching real problems in
subsystems where they were grabbing 2 references to an object, and then
only releasing one when finished (cpufreq is an example of this.) When
those are fixed, the shutdown problem goes away.

So in this case, we are registering a kobject twice, which increases the
reference count, and then we never clean it up on shutdown properly as
we only drop it once. Hence the shutdown problem.
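
The imbalance can be shown with a generic reference-count sketch. This is a
userspace illustration with invented toy_* names, not the kobject or kref
implementation; it only models "take N references, drop M".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
struct toy_obj {
	int refcount;
	bool released;
};

/* Take one reference. */
void toy_get(struct toy_obj *o)
{
	o->refcount++;
}

/* Drop one reference; release the object when the last one goes away. */
void toy_put(struct toy_obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->released = true;
}

/* Take `gets` references, drop `puts`, report whether the object was released. */
bool toy_released_after(int gets, int puts)
{
	struct toy_obj o = { 0, false };
	int i;

	for (i = 0; i < gets; i++)
		toy_get(&o);
	for (i = 0; i < puts; i++)
		toy_put(&o);
	return o.released;
}
```

With one get and one put the object is released; with two gets and a single
put it lingers, which corresponds to the "registered twice, dropped once"
case above.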

So, we need to not register the device twice, my patch should fix that
problem. Or we can add the pcibios_scanned++ call somewhere in PNP or
ACPI to prevent us from ever attempting to try to register the device
twice. Either way should fix Guennadi's issue.

thanks,

greg k-h

2008-03-12 22:54:50

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 03:25:41PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > Actually, here's a simple patch from -next that should test this logic
> > for you. Can you let me know if this solves the start up WARNING dump
> > for you?
>
> This patch looks bogus.
>
> Why do you introduce a "dev->is_added" field that apparently has to match
> the old "list_empty(&dev->global_list)" 1:1 anyway?
>
> In other words: when is it *ever* permissible for "is_added" to have a
> different value from the "list_empty(..)" logic? And if they must always
> match (and it looks like they have to, since you set and clear the flag
> exactly when you add/remove it from the list), then what exactly is this
> supposed to fix?

In the patch series in -next, it is supposed to replace the list_empty()
logic exactly, as that list goes away in the next patch in the series.

So yes, it is not a "fix" per se, but would be nice to see if it solves
this issue in some way.

All I can think is that somehow this pci device for the root hub isn't
added to that extra list (as that is only done in the pcibios logic) and
so it isn't set.

I can't get a box here to produce both of those PCI: messages myself,
and neither can Len, so something is really odd here. And that has
nothing to do with the pci_bus rework, that is just showing the problem
more accurately now. Even if it were to be reverted, the root problem
would still be present.

thanks,

greg k-h

2008-03-12 23:02:46

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Wednesday, 12 of March 2008, Christian Kujau wrote:
> On Tue, 11 Mar 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10191
> > Subject : Treason uncloaked spams syslog with latest git
> > Submitter : Thomas Gleixner <[email protected]>
> > Date : 2008-03-06 05:47
> > References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/2444.html
>
> Dunno if you got this, but this seems to be fixed:
> http://lkml.org/lkml/2008/3/11/435

Yes, I noticed the patch and updated the Bugzilla entry with a link to it,
but it will be closed when the patch appears in Linus' tree.

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > Subject : INFO: task mount:11202 blocked for more than 120 seconds
> > Submitter : Christian Kujau <[email protected]>
> > Date : 2008-03-07 21:32
> > References : http://lkml.org/lkml/2008/3/7/308
> > http://lkml.org/lkml/2008/3/9/186
> > Handled-By : David Chinner <[email protected]>
>
> FWIW, it has been reported by Chr too: http://lkml.org/lkml/2008/3/12/313

Yes.

> And David could be taken out of the loop, as it seems dm-crypt related
> ([email protected] is already notified), not XFS related.

The entry has already been updated and Herbert is on its CC list.

Thanks,
Rafael

2008-03-12 23:04:08

by Linus Torvalds

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)



On Wed, 12 Mar 2008, Greg KH wrote:
>
> What happened in .25-rc was that we now catch these kinds of problems
> (watching for duplicate kobjects to be registered and such.) So this
> might have always been happening, but no warning was ever produced.

It's not the warning that worries me. It's the apparent oops (keyboard
leds blinking?) at shutdown/poweroff!

> The reason we aren't shutting down is also due to the way kobjects now
> work. If you don't clean up properly, they linger around and something
> on the shutdown path (I haven't figured that out yet) doesn't want to
> stop the machine.

.. and that's my issue! We're too late in the game to try to figure things
out and leave things hanging. The patch broke something, it needs to be
fixed or reverted. It's been going on too long.

I think it should have been reverted probably two weeks ago already. We
can re-apply it early in the 2.6.26 series, and then try to fix it right.

Since there is at least a patch worth trying now, I'll hold off reverting
it and wait for Guennadi to test the patch, but the fact is, we shouldn't
have a known-broken kernel for several weeks, when there is a known fix
for it in reverting a single commit!

We have _way_ too many regressions as it is. Regressions are bad. Ones
that have known causes and haven't been fixed in three weeks are
unacceptable.

Linus

2008-03-12 23:12:06

by Linus Torvalds

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)



On Wed, 12 Mar 2008, Greg KH wrote:
>
> I can't get a box here to produce both of those PCI: messages myself,
> and neither can Len, so something is really odd here.

You can't?

I can trivially reproduce the warnings on my laptop by just adding
"acpi=noirq" to the command line in grub.

PCI: Probing PCI hardware
kobject (ffff81007e08d9c8): tried to init an initialized object, something is seriously wrong.
Pid: 1, comm: swapper Not tainted 2.6.25-rc3-00081-g7704a8b #29

Call Trace:
[<ffffffff8054f921>] __down_read+0x12/0x93
[<ffffffff80313d60>] kobject_init+0x39/0x82
[<ffffffff803956d6>] device_initialize+0x25/0xa4
[<ffffffff80395f83>] device_register+0x9/0x12
[<ffffffff80322cdc>] pci_bus_add_devices+0xe2/0x13e
[<ffffffff807491be>] pci_legacy_init+0x66/0xf9
[<ffffffff8039763e>] bus_register+0x15b/0x221
[<ffffffff8072a6ba>] kernel_init+0x14a/0x2b4
[<ffffffff8020be38>] child_rip+0xa/0x12
[<ffffffff8072a570>] kernel_init+0x0/0x2b4
[<ffffffff8020be2e>] child_rip+0x0/0x12

did you try just adding that simple command line thing?

Linus

2008-03-12 23:17:11

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 04:02:57PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > What happened in .25-rc was that we now catch these kinds of problems
> > (watching for duplicate kobjects to be registered and such.) So this
> > might have always been happening, but no warning was ever produced.
>
> It's not the warning that worries me. It's the apparent oops (keyboard
> leds blinking?) at shutdown/poweroff!

It oopses at shutdown? I thought this was originally reported as a
"will not power off" which for a while was attributed to the cpufreq fix
that went into -rc2 or -rc3.

I didn't realize there was an oops, sorry.

> > The reason we aren't shutting down is also due to the way kobjects now
> > work. If you don't clean up properly, they linger around and something
> > on the shutdown path (I haven't figured that out yet) doesn't want to
> > stop the machine.
>
> .. and that's my issue! We're too late in the game to try to figure things
> out and leave things hanging. The patch broke something, it needs to be
> fixed or reverted. It's been going on too long.
>
> I think it should have been reverted probably two weeks ago already. We
> can re-apply it early in the 2.6.26 series, and then try to fix it right.
>
> Since there is at least a patch worth trying now, I'll hold off reverting
> it and wait for Guennadi to test the patch, but the fact is, we shouldn't
> have a known-broken kernel for several weeks, when there is a known fix
> for it in reverting a single commit!
>
> We have _way_ too many regressions as it is. Regressions are bad. Ones
> that have known causes and haven't been fixed in three weeks are
> unacceptable.

Sorry, I thought this was just a warning at boot time.

It would be interesting to see if reverting the pci_bus patch did
anything about the fact that we register the root PCI bus through two
different methods.

thanks,

greg k-h

2008-03-12 23:17:48

by Guennadi Liakhovetski

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, 12 Mar 2008, Linus Torvalds wrote:

> On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > What happened in .25-rc was that we now catch these kinds of problems
> > (watching for duplicate kobjects to be registered and such.) So this
> > might have always been happening, but no warning was ever produced.
>
> It's not the warning that worries me. It's the apparent oops (keyboard
> leds blinking?) at shutdown/poweroff!

No, no oops, no blinking LEDs. The machine just stays there after syncing
SCSI disks. I can still call sysrqs, and I've captured them with the
serial console - see complete dumps here:
http://bugzilla.kernel.org/attachment.cgi?id=15057

> Since there is at least a patch worth trying now, I'll hold off reverting
> it and wait for Guennadi to test the patch, but the fact is, we shouldn't
> have a known-broken kernel for several weeks, when there is a known fix
> for it in reverting a single commit!

I'll test it in about 12 hours.

Thanks
Guennadi
---
Guennadi Liakhovetski

2008-03-12 23:32:57

by Guennadi Liakhovetski

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, 12 Mar 2008, Greg KH wrote:

> It oopses at shutdown? I thought this was originally reported as a
> "will not power off" which for a while was attributed to the cpufreq fix
> that went into -rc2 or -rc3.

As I already replied to Linus, no, it doesn't.

> It would be interesting to see if reverting the pci_bus patch did
> anything about the fact that we register the root PCI bus through two
> different methods.

You mean this: http://marc.info/?l=linux-kernel&m=120483340622706&w=2

Thanks
Guennadi
---
Guennadi Liakhovetski

2008-03-12 23:36:34

by Linus Torvalds

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)



On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > It's not the warning that worries me. It's the apparent oops (keyboard
> > leds blinking?) at shutdown/poweroff!
>
> It oopses at shutdown? I thought this was originally reported as a
> "will not power off" which for a while was attributed to the cpufreq fix
> that went into -rc2 or -rc3.
>
> I didn't realize there was an oops, sorry.

I'm not at all sure there is an oops - in fact, I'd have expected it to
show up on the serial console if there was one.

The bug report says that the keyboard leds blink, which is *sometimes* due
to having the led oops blinking code enabled, but hey, no actual oops was
ever shown, and sometimes a blink is just a blink.

Did you see the full dmesg from syslog? That one has not just the
warnings, but also sysrq output at the point it hangs. The suspicious
thing seems to be

halt R running 0 3291 3289
c013d75a 7488242e 00000180 75b8ffa0 c6efbddc c6efbddc c0426d00 c6efbdf0
c0125da4 c6efbe10 c0125ed1 0000000a 00000001 c0426d00 c0426d00 00000046
b7efcff4 c6efbe20 00000046 c0426d00 c0426d00 c6efbe2c c0126068 c110c060
Call Trace:
[<c013d75a>] ? tick_program_event+0x4a/0x80
[<c0125da4>] ? _local_bh_enable+0x24/0x80
[<c0125ed1>] ? __do_softirq+0xd1/0xf0
[<c0126068>] ? irq_exit+0x28/0x90
[<c0313079>] ? preempt_schedule_irq+0x49/0x70
[<c0103c28>] ? apic_timer_interrupt+0x28/0x30
[<c0251c58>] ? device_shutdown+0x48/0x70
[<c012e618>] ? kernel_shutdown_prepare+0x28/0x30
[<c012e630>] ? kernel_power_off+0x10/0x40
...

which makes me suspect we're in some endless loop in device_shutdown(),
but that's just a random guess (it seems to be running on the other CPU:
CPU0 is in idle - and when that happens the stack trace is really not
very reliable at all, so take all that with a huge pinch of salt!).

> Sorry, I thought this was just a warning at boot time.

If it had been just the warning, I would ignore it as a good thing to be
cleaned up later. But no, the original problem was the inability to halt
and reboot, and the bugzilla entry says

It also introduces these two errors:
^^^^

with underlining by me. So the warnings in themselves are just an
interesting coincidence (and probably related to the cause, of course).

Linus

2008-03-12 23:37:35

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Thu, Mar 13, 2008 at 12:32:48AM +0100, Guennadi Liakhovetski wrote:
> On Wed, 12 Mar 2008, Greg KH wrote:
>
> > It oopses at shutdown? I thought this was originally reported as a
> > "will not power off" which for a while was attributed to the cpufreq fix
> > that went into -rc2 or -rc3.
>
> As I already replied to Linus, no, it doesn't.
>
> > It would be interesting to see if reverting the pci_bus patch did
> > anything about the fact that we register the root PCI bus through two
> > different methods.
>
> You mean this: http://marc.info/?l=linux-kernel&m=120483340622706&w=2

Yes, the warnings go away as there is no more struct device to register,
but the big "PCI:" messages from the syslog at startup with the patch
reverted are what I am curious about.

I'll test more in a few hours, have to go herd the kids off to piano
lessons...

thanks,

greg k-h

2008-03-12 23:37:59

by Linus Torvalds

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)



On Thu, 13 Mar 2008, Guennadi Liakhovetski wrote:
>
> No, no oops, no blinking LEDs.

Oh, the original report says:

"Problem Description: Power off / reboot blink keyboard LEDs and leave
the system at .."

which is why I thought it might be a hidden oops.

But that must have been some unrelated and misleading red herring.

Linus

2008-03-12 23:47:20

by Guennadi Liakhovetski

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, 12 Mar 2008, Linus Torvalds wrote:

> On Thu, 13 Mar 2008, Guennadi Liakhovetski wrote:
> >
> > No, no oops, no blinking LEDs.
>
> Oh, the original report says:
>
> "Problem Description: Power off / reboot blink keyboard LEDs and leave
> the system at .."
>
> which is why I thought it might be a hidden oops.
>
> But that must have been some unrelated and misleading red herring.

Ah, ok, I see now. No, no herring here. The LEDs just blink _once_ as they
always do before shutdown (on this machine?). But the machine stays on.

Thanks
Guennadi
---
Guennadi Liakhovetski

2008-03-13 04:49:15

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 04:09:17PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > I can't get a box here to produce both of those PCI: messages myself,
> > and neither can Len, so something is really odd here.
>
> You can't?
>
> I can trivially reproduce the warnings on my laptop by just adding
> "acpi=noirq" to the command line in grub.
>
> PCI: Probing PCI hardware
> kobject (ffff81007e08d9c8): tried to init an initialized object, something is seriously wrong.
> Pid: 1, comm: swapper Not tainted 2.6.25-rc3-00081-g7704a8b #29
>
> Call Trace:
> [<ffffffff8054f921>] __down_read+0x12/0x93
> [<ffffffff80313d60>] kobject_init+0x39/0x82
> [<ffffffff803956d6>] device_initialize+0x25/0xa4
> [<ffffffff80395f83>] device_register+0x9/0x12
> [<ffffffff80322cdc>] pci_bus_add_devices+0xe2/0x13e
> [<ffffffff807491be>] pci_legacy_init+0x66/0xf9
> [<ffffffff8039763e>] bus_register+0x15b/0x221
> [<ffffffff8072a6ba>] kernel_init+0x14a/0x2b4
> [<ffffffff8020be38>] child_rip+0xa/0x12
> [<ffffffff8072a570>] kernel_init+0x0/0x2b4
> [<ffffffff8020be2e>] child_rip+0x0/0x12
>
> did you try just adding that simple command line thing?

This wasn't doing anything on my laptop, but it does cause the warning
on my mac mini, thanks for showing how to trigger it.

And that's with the patch I posted, so that's no good.

Let me see if I can figure it out now that I can reproduce it...

thanks,

greg k-h

2008-03-13 05:04:28

by David Chinner

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Tue, Mar 11, 2008 at 12:14:52AM +0100, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> Subject : INFO: task mount:11202 blocked for more than 120 seconds
> Submitter : Christian Kujau <[email protected]>
> Date : 2008-03-07 21:32
> References : http://lkml.org/lkml/2008/3/7/308
> http://lkml.org/lkml/2008/3/9/186
> Handled-By : David Chinner <[email protected]>

Rafael, this looks to be something related to dm-crypt, not XFS. Can you
reassign it appropriately?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-03-13 05:45:45

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 04:09:17PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 12 Mar 2008, Greg KH wrote:
> >
> > I can't get a box here to produce both of those PCI: messages myself,
> > and neither can Len, so something is really odd here.

Ok, stupid me, this was my fault. I was assuming that pci busses would
never be registered multiple times with the pci core. Obviously this
isn't true. The previous patch I proposed was only paying attention to
the PCI devices, and that logic is just fine (it's already protected
when it is attempted to be registered multiple times.)

So, the patch below fixes the issue for me, and reboot seems to work as
well.

Guennadi, can you test this out on your machine?

thanks for your patience,

greg k-h

From: Greg Kroah-Hartman <[email protected]>
Subject: PCI: fix issue with busses registering multiple times in sysfs

PCI busses can be registered multiple times, so we need to detect if we
have registered our bus structure in sysfs already. If so, don't do it
again.

Thanks to Guennadi Liakhovetski <[email protected]> for reporting
the problem, and to Linus for poking me to get me to believe that it was
a real problem.

Cc: Linus Torvalds <[email protected]>
Cc: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/pci/bus.c | 6 +++++-
include/linux/pci.h | 1 +
2 files changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -143,14 +143,18 @@ void pci_bus_add_devices(struct pci_bus
 			/* register the bus with sysfs as the parent is now
 			 * properly registered. */
 			child_bus = dev->subordinate;
+			if (child_bus->is_added)
+				continue;
 			child_bus->dev.parent = child_bus->bridge;
 			retval = device_register(&child_bus->dev);
 			if (retval)
 				dev_err(&dev->dev, "Error registering pci_bus,"
 					" continuing...\n");
-			else
+			else {
+				child_bus->is_added = 1;
 				retval = device_create_file(&child_bus->dev,
 							&dev_attr_cpuaffinity);
+			}
 			if (retval)
 				dev_err(&dev->dev, "Error creating cpuaffinity"
 					" file, continuing...\n");
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -278,6 +278,7 @@ struct pci_bus {
 	struct device		dev;
 	struct bin_attribute	*legacy_io; /* legacy I/O for this bus */
 	struct bin_attribute	*legacy_mem; /* legacy mem */
+	unsigned int		is_added:1;
 };

#define pci_bus_b(n) list_entry(n, struct pci_bus, node)
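
[Editor's sketch] The idea behind the patch above, reduced to a user-space toy: remember in a one-bit flag that a bus has been registered and skip it on any later pass. This is only a minimal illustration under stated assumptions. The names toy_bus, toy_register and toy_bus_add are hypothetical and do not exist in the kernel; toy_register merely stands in for device_register().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_bus; only the flag matters here. */
struct toy_bus {
    const char *name;
    unsigned int is_added:1;   /* set once the bus has been registered */
    int register_calls;        /* counts how often registration really ran */
};

/* Hypothetical stand-in for device_register(): returns 0 on success. */
static int toy_register(struct toy_bus *bus)
{
    bus->register_calls++;
    return 0;
}

/* Register the bus only if an earlier pass has not already done so. */
static int toy_bus_add(struct toy_bus *bus)
{
    assert(bus != NULL);
    if (bus->is_added)
        return 0;              /* second scan of the same bus: nothing to do */
    if (toy_register(bus) != 0) {
        fprintf(stderr, "%s: error registering bus, continuing...\n",
                bus->name);
        return -1;
    }
    bus->is_added = 1;         /* mark only after a successful registration */
    return 0;
}
```

Calling toy_bus_add() twice on the same structure runs the registration once, which is the behaviour the patch is after when the same bus is scanned from two different init paths.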

2008-03-13 06:25:13

by Yinghai Lu

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, Mar 12, 2008 at 10:44 PM, Greg KH <[email protected]> wrote:
> On Wed, Mar 12, 2008 at 04:09:17PM -0700, Linus Torvalds wrote:
> >
> >
>
> > On Wed, 12 Mar 2008, Greg KH wrote:
> > >
> > > I can't get a box here to produce both of those PCI: messages myself,
> > > and neither can Len, so something is really odd here.
>
> Ok, stupid me, this was my fault. I was assuming that pci busses would
> never be registered multiple times with the pci core. Obviously this
> isn't true. The previous patch I proposed was only paying attention to
> the PCI devices, and that logic is just fine (it's already protected
> when it is attempted to be registered multiple times.)
>
> So, the patch below fixes the issue for me, and reboot seems to work as
> well.
>
> Guennadi, can you test this out on your machine?
>
> thanks for your patience,
>
> greg k-h
>
>
> From: Greg Kroah-Hartman <[email protected]>
> Subject: PCI: fix issue with busses registering multiple times in sysfs
>
> PCI busses can be registered multiple times, so we need to detect if we
> have registered our bus structure in sysfs already. If so, don't do it
> again.
>
> Thanks to Guennadi Liakhovetski <[email protected]> for reporting
> the problem, and to Linus for poking me to get me to believe that it was
> a real problem.
>
> Cc: Linus Torvalds <[email protected]>
> Cc: Guennadi Liakhovetski <[email protected]>
>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>

wonder if

http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commitdiff;h=fff07473e243989a2739b9d802d63e051ade7188

helps.

YH

2008-03-13 10:06:17

by Guennadi Liakhovetski

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, 12 Mar 2008, Greg KH wrote:

> On Wed, Mar 12, 2008 at 04:09:17PM -0700, Linus Torvalds wrote:
> >
> >
> > On Wed, 12 Mar 2008, Greg KH wrote:
> > >
> > > I can't get a box here to produce both of those PCI: messages myself,
> > > and neither can Len, so something is really odd here.
>
> Ok, stupid me, this was my fault. I was assuming that pci busses would
> never be registered multiple times with the pci core. Obviously this
> isn't true. The previous patch I proposed was only paying attention to
> the PCI devices, and that logic is just fine (it's already protected
> when it is attempted to be registered multiple times.)
>
> So, the patch below fixes the issue for me, and reboot seems to work as
> well.
>
> Guennadi, can you test this out on your machine?

Yes, it fixes all _3_ startup warnings and lets the machine reboot and
power off again. 3 warnings are the 2 reported as a regression from
2.6.24, and one present also under 2.6.24:

PCI: Probing PCI hardware
sysfs: duplicate filename 'bridge' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
Pid: 1, comm: swapper Not tainted 2.6.24-hires-nohz #2
[<c010541a>] show_trace_log_lvl+0x1a/0x30
[<c0105ed2>] show_trace+0x12/0x20
[<c010675e>] dump_stack+0x6e/0x80
[<c01b246d>] sysfs_add_one+0x9d/0xe0
[<c01b328b>] sysfs_create_link+0x8b/0x130
[<c01f9f14>] pci_bus_add_devices+0x94/0x120
[<c03f3920>] pci_legacy_init+0x50/0xf0
[<c03d95f2>] kernel_init+0x142/0x320
[<c0104fe3>] kernel_thread_helper+0x7/0x14
=======================
pci 0000:00:01.0: Error creating sysfs bridge symlink, continuing...

So, well done! I was going to disturb you with that one after 2.6.25, now
I don't have to any more, unless we want it fixed in 2.6.24-stable.

> thanks for your patience,

always at your disposal:-)

Thanks
Guennadi
---
Guennadi Liakhovetski

2008-03-13 10:07:51

by Guennadi Liakhovetski

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Wed, 12 Mar 2008, Yinghai Lu wrote:

> wonder if
>
> http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commitdiff;h=fff07473e243989a2739b9d802d63e051ade7188
>
> helps.

Firstly, it doesn't apply to -rc4-ish, secondly, no, after applying it
manually, it didn't improve anything.

Thanks
Guennadi
---
Guennadi Liakhovetski

2008-03-13 15:32:54

by Greg KH

Subject: Re: pcibios_scanned needs to be set in ACPI? (was Re: 2.6.25-rc5: Reported regressions from 2.6.24)

On Thu, Mar 13, 2008 at 11:06:10AM +0100, Guennadi Liakhovetski wrote:
> On Wed, 12 Mar 2008, Greg KH wrote:
>
> > On Wed, Mar 12, 2008 at 04:09:17PM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Wed, 12 Mar 2008, Greg KH wrote:
> > > >
> > > > I can't get a box here to produce both of those PCI: messages myself,
> > > > and neither can Len, so something is really odd here.
> >
> > Ok, stupid me, this was my fault. I was assuming that pci busses would
> > never be registered multiple times with the pci core. Obviously this
> > isn't true. The previous patch I proposed was only paying attention to
> > the PCI devices, and that logic is just fine (it's already protected
> > when it is attempted to be registered multiple times.)
> >
> > So, the patch below fixes the issue for me, and reboot seems to work as
> > well.
> >
> > Guennadi, can you test this out on your machine?
>
> Yes, it fixes all _3_ startup warnings and lets the machine reboot and
> power off again. 3 warnings are the 2 reported as a regression from
> 2.6.24, and one present also under 2.6.24:
>
> PCI: Probing PCI hardware
> sysfs: duplicate filename 'bridge' can not be created

Yes, I noticed that if you were having this problem, .24 would also be
complaining to you about creating sysfs links. I'll make up a .24 patch
for -stable after this.

Kay just pointed out that I can use a struct device field instead of
creating my own in the bus device, so I'll simplify the patch and then
send it to Linus in a bit.

Thanks so much for testing quickly and letting me know.

greg k-h

2008-03-13 21:13:20

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

On Thursday, 13 of March 2008, David Chinner wrote:
> On Tue, Mar 11, 2008 at 12:14:52AM +0100, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > Subject : INFO: task mount:11202 blocked for more than 120 seconds
> > Submitter : Christian Kujau <[email protected]>
> > Date : 2008-03-07 21:32
> > References : http://lkml.org/lkml/2008/3/7/308
> > http://lkml.org/lkml/2008/3/9/186
> > Handled-By : David Chinner <[email protected]>
>
> Rafael, this looks to be something related to dm-crypt, not XFS. Can you
> reassign it appropriately?

Well, it's already been assigned to IO/Storage->LVM2/DM, but I've reassigned
it to Herbert Xu.

Thanks,
Rafael

2008-03-17 19:22:45

by Jeff Garzik

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

Rafael J. Wysocki wrote:
> On Tuesday, 11 of March 2008, Jeff Garzik wrote:
>> Linus Torvalds wrote:
>>> On Mon, 10 Mar 2008, Jeff Garzik wrote:
>>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10123
>>>>> Subject : No power-off / reboot with 2.6.25-rcX (up to -rc3)
>>>> FWIW, I have this same problem.
>>> Jeff, does that one ("keep rd->online and cpu_online_map in sync") fix the
>>> problem for you?
>> Nope. I am running baadac8b10c5ac15ce3d26b68fa266c8889b163f now, and it
>> still hangs on reboot or power-off.
>>
>> Interestingly, if I reboot -immediately- from gdm, it succeeds. However
>> if I login to Fedora GNOME via gdm, and load my standard apps (1001
>> terminals, firefox, tbird, IRC) reboot and poweroff no longer work.
>>
>> My guess was always some ACPI regression. I'll bisect today or
>> tomorrow. It is reproducible regression that appeared recently (circa
>> 2.6.24 or 2.6.25-rc1 I think), so I should be able to find the culprit.


Well, after going through several kernel versions (back to 2.6.19 so
far), this machine continues to have reboot problems. I'm going to
back-burner this, as it is looking more like a hardware or BIOS problem
that cropped up recently.

Jeff



2008-03-17 21:29:07

by Thomas Meyer

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

Adrian Bunk wrote:
> On Tue, Mar 11, 2008 at 01:22:42PM +0100, Stefan Richter wrote:
>
>> Rafael J. Wysocki wrote:
>>
>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10080
>>> Subject : 2.6.25-rc2: ohci1394 problem
>>> Submitter : Thomas Meyer <[email protected]>
>>> Date : 2008-02-20 08:47
>>> References : http://lkml.org/lkml/2008/2/20/58
>>> Handled-By : Stefan Richter <[email protected]>
>>>
>> Thomas wrote on 2008-02-25:
>> ''So i did a "make clean" and a "make" (not a make
>> -j3 as i use to do) and recompiled 2.6.25-rc3 and now it works again.
>> Case closed under strange error.''
>> ...
>>
>
> Although I don't think this would cause the error, it would be nice if
> Thomas could verify that the -j3 did not cause the problem.
>
I still cannot *believe* this bug, but I just checked out the latest
kernel and did a make distclean and a make (with Mr. Bunk's patch
applied) and there it is again:
$ dmesg

(cut)
[ 464.852986] ohci1394: fw-host0: physical posted write error
[ 464.852991] ohci1394: fw-host0: respTxComplete: dma prg stopped
[ 464.852997] ohci1394: fw-host0: SelfID received outside of bus reset
sequence
[ 464.853002] ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
[ 464.896722] ohci1394: fw-host0: Unrecoverable error!
[ 464.896722] ohci1394: fw-host0: Async Rsp Tx Context died:
ctrl[f0002a00] cmdptr[f0002a00]
[ 464.896722] ohci1394: fw-host0: Iso Recv 3 Context died:
ctrl[d4000d0e] cmdptr[0014c397] match[00000000]
[ 464.896722] ohci1394: fw-host0: Iso Recv 17 Context died:
ctrl[7c006e38] cmdptr[f58b18cd] match[4910c683]
[ 464.896722] ohci1394: fw-host0: Iso Recv 18 Context died:
ctrl[003cacf0] cmdptr[88f2eb10] match[46e8104e]
[ 464.896722] ohci1394: fw-host0: Iso Recv 19 Context died:
ctrl[0c047e80] cmdptr[83060246] match[83060846]
[ 464.896722] ohci1394: fw-host0: Iso Recv 26 Context died:
ctrl[00656c62] cmdptr[6e696461] match[706f2067]
[ 464.896722] ohci1394: fw-host0: Iso Recv 27 Context died:
ctrl[4d006d65] cmdptr[61726570] match[676e6974]
[ 464.896722] ohci1394: fw-host0: physical posted write error
[ 464.896722] ohci1394: fw-host0: respTxComplete: dma prg stopped
[ 464.896722] ohci1394: fw-host0: SelfID received outside of bus reset
sequence
[ 464.896722] ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
[ 464.898957] ohci1394: fw-host0: Unrecoverable error!
[ 464.898957] ohci1394: fw-host0: Async Rsp Tx Context died:
ctrl[f0002a00] cmdptr[f0002a00]
[ 464.898957] ohci1394: fw-host0: Iso Recv 3 Context died:
ctrl[d4000d0e] cmdptr[0014c397] match[00000000]
[ 464.898957] ohci1394: fw-host0: Iso Recv 17 Context died:
ctrl[7c006e38] cmdptr[f58b18cd] match[4910c683]
[ 464.898957] ohci1394: fw-host0: Iso Recv 18 Context died:
ctrl[003cacf0] cmdptr[88f2eb10] match[46e8104e]
[ 464.898957] ohci1394: fw-host0: Iso Recv 19 Context died:
ctrl[0c047e80] cmdptr[83060246] match[83060846]
[ 464.898957] ohci1394: fw-host0: Iso Recv 26 Context died:
ctrl[00656c62] cmdptr[6e696461] match[706f2067]
[ 464.898957] ohci1394: fw-host0: Iso Recv 27 Context died:
ctrl[4d006d65] cmdptr[61726570] match[676e6974]
[ 464.898957] ohci1394: fw-host0: physical posted write error
[ 464.898957] ohci1394: fw-host0: respTxComplete: dma prg stopped
[ 464.898957] ohci1394: fw-host0: SelfID received outside of bus reset
sequence
[ 464.898957] ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
and so on....

$ git describe
v2.6.25-rc6-14-gbde4f8f

As I already wrote: I tried to bisect this behavior, but with no result.

And Stefan didn't change anything in the involved drivers. I have no
idea what could cause this kind of bug!
Suggestions?

- Maybe my build chain produces corrupted code?
- Maybe a udev error?
- ...?

$ emerge --info
Portage 2.1.4.4 (default-linux/x86/2006.1, gcc-4.2.3, glibc-2.7-r1,
2.6.25-rc6 i686)
=================================================================
System uname: 2.6.25-rc6 i686 Genuine Intel(R) CPU T2400 @ 1.83GHz
Timestamp of tree: Mon, 17 Mar 2008 19:00:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632)
[disabled]
app-shells/bash: 3.2_p33
dev-java/java-config: 1.3.7, 2.1.5
dev-lang/python: 2.4.4-r4, 2.5.1-r5
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 2.0.0_rc6-r1
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2,
1.10.1
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.24
ACCEPT_KEYWORDS="x86 ~x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=prescott -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config
/usr/kde/3.5/shutdown /usr/kde/4.0/env /usr/kde/4.0/share/config
/usr/kde/4.0/shutdown /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf
/etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/
/etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/
/etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=prescott -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict
unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org
http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="de_DE"
LC_ALL="de_DE"
LINGUAS="de"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times
--compress --force --whole-file --delete --stats --timeout=180
--exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
(cut)

2008-03-19 19:27:52

by Stefan Richter

Subject: Re: 2.6.25-rc5: Reported regressions from 2.6.24

Thomas Meyer wrote:
> Adrian Bunk wrote:
>> On Tue, Mar 11, 2008 at 01:22:42PM +0100, Stefan Richter wrote:
>>
>>> Rafael J. Wysocki wrote:
>>>
>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10080

I have reopened, re-assigned, and slightly renamed that bug now.

>>>> Subject : 2.6.25-rc2: ohci1394 problem
>>>> Submitter : Thomas Meyer <[email protected]>
>>>> Date : 2008-02-20 08:47
>>>> References : http://lkml.org/lkml/2008/2/20/58
>>>> Handled-By : Stefan Richter <[email protected]>

This bug is not handled by me.

>>>>
>>> Thomas wrote on 2008-02-25:
>>> ''So i did a "make clean" and a "make" (not a make
>>> -j3 as i use to do) and recompiled 2.6.25-rc3 and now it works again.
>>> Case closed under strange error.''
>>> ...
>>>
>>
>> Although I don't think this would cause the error, it would be nice if
>> Thomas could verify that the -j3 did not cause the problem.
>>
> I still cannot *believe* this bug, but I just checked out the latest
> kernel and did a make distclean and a make (with Mr. Bunk's patch
> applied) and there it is again:
> $ dmesg
>
> (cut)
> [ 464.852986] ohci1394: fw-host0: physical posted write error
[...]
> [ 464.898957] ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
> and so on....
>
> $ git describe
> v2.6.25-rc6-14-gbde4f8f
>
> As I already wrote: I tried to bisect this behavior, but with no result.
>
> And Stefan didn't change anything in the involved drivers. I have no
> idea what could cause this kind of bug!

The messages which Thomas posted result from ohci1394 getting ~0 (i.e.
0xffffffff) from some or all MMIO reads. This is not a FireWire driver bug.

MMIO has been broken by something after 2.6.24.
--
Stefan Richter
-=====-==--- --== =--==
http://arcgraph.de/sr/