2010-01-24 22:06:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: 2.6.33-rc5: Reported regressions from 2.6.32

This message contains a list of some regressions from 2.6.32, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.32, please let me know
either and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

Date Total Pending Unresolved
----------------------------------------
2010-01-24 75 29 23
2010-01-10 55 33 21
2009-12-29 36 34 27


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15139
Subject : e1000: transmit queue 0 timed out
Submitter : Alexander Beregalov <[email protected]>
Date : 2010-01-23 15:37 (2 days old)
References : http://marc.info/?l=linux-netdev&m=126426149306083&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15138
Subject : evdev regression on macbook
Submitter : Guillaume Chazarain <[email protected]>
Date : 2010-01-23 18:53 (2 days old)
References : http://marc.info/?l=linux-kernel&m=126427286219235&w=4
Handled-By : Dmitry Torokhov <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15133
Subject : Wake on LAN doesn't work in sky2
Submitter : Tino Keitel <[email protected]>
Date : 2010-01-15 9:10 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=166a0fd4c788ec7f10ca8194ec6d526afa12db75
References : http://marc.info/?l=linux-kernel&m=126354704815848&w=4
Handled-By : Stephen Hemminger <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15132
Subject : OOPS's with large initramfs
Submitter : Nigel Kukard <[email protected]>
Date : 2010-01-16 11:12 (9 days old)
References : http://marc.info/?l=linux-kernel&m=126364100321603&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15129
Subject : [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512
Submitter : Miles Lane <[email protected]>
Date : 2010-01-14 23:18 (11 days old)
References : http://lkml.org/lkml/2010/1/14/570
Handled-By : Chris Wilson <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15126
Subject : REGRESSION for RT2561/RT61 in 2.6.33
Submitter : Alan Stern <[email protected]>
Date : 2010-01-11 14:54 (14 days old)
References : http://marc.info/?l=linux-kernel&m=126322167427159&w=4
Handled-By : Johannes Berg <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15125
Subject : hung task - jbd2/dm-1-8 (during raid rebuild)
Submitter : Michael Breuer <[email protected]>
Date : 2010-01-10 21:47 (15 days old)
References : http://marc.info/?l=linux-kernel&m=126316012025978&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15124
Subject : PCI host bridge windows ignored (works with pci=use_crs)
Submitter : Jeff Garrett <[email protected]>
Date : 2010-01-13 5:37 (12 days old)
References : http://marc.info/?l=linux-kernel&m=126336296600307&w=4
Handled-By : Yinghai Lu <[email protected]>
Bjorn Helgaas <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15076
Subject : System panic under load with clockevents_program_event
Submitter : okias <[email protected]>
Date : 2010-01-17 13:03 (8 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043
Subject : Display goes off with i915.powersave=1
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2010-01-10 20:09 (15 days old)
References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
Subject : drm/ksm: fbdev blanking regression
Submitter : Johan Hovold <[email protected]>
Date : 2010-01-06 17:00 (19 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
Handled-By : James Simmons <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15036
Subject : soft lockup in dmesg after suspend/resume
Submitter : ykzhao <[email protected]>
Date : 2010-01-04 5:36 (21 days old)
References : http://marc.info/?l=linux-kernel&m=126258356202722&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15032
Subject : Oops in uart_resume_port() on resume
Submitter : Zdenek Kabelac <[email protected]>
Date : 2010-01-04 15:47 (21 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba15ab0e8de0d4439a91342ad52d55ca9e313f3d
References : http://marc.info/?l=linux-kernel&m=126262008815689&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15025
Subject : Oops in ext4 driver
Submitter : Steinar H. Gunderson <[email protected]>
Date : 2010-01-10 13:09 (15 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15017
Subject : kexec regression, radeon/kms irq related (bisected)
Submitter : Markus Trippelsdorf <[email protected]>
Date : 2010-01-09 18:49 (16 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d8f60cfc93452d0554f6a701aa8e3236cbee4636


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15000
Subject : Thinkpad dock button no longer works
Submitter : Paul Martin <[email protected]>
Date : 2010-01-07 02:11 (18 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14999
Subject : possible circular locking dependency detected in rfkill at suspend
Submitter : Christian Casteyde <[email protected]>
Date : 2010-01-06 21:52 (19 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14950
Subject : tbench regression with 2.6.33-rc1
Submitter : Lin Ming <[email protected]>
Date : 2009-12-25 11:11 (31 days old)
References : http://marc.info/?l=linux-kernel&m=126174044213172&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14946
Subject : All kernels after 2.6.32-git10 show only 1 CPU
Submitter : Sid Boyce <[email protected]>
Date : 2009-12-23 16:55 (33 days old)
References : http://marc.info/?l=linux-kernel&m=126158734326801&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14937
Subject : WARNING: at kernel/lockdep.c:2830
Submitter : Grant Wilson <[email protected]>
Date : 2009-12-27 13:35 (29 days old)
References : http://marc.info/?l=linux-kernel&m=126192220404829&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924
Subject : Weird hard hangs when rendering 'some' web-sites in Firefox
Submitter : David <[email protected]>
Date : 2009-12-21 21:53 (35 days old)
References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
Subject : System timer firing too much without cause
Submitter : Shawn Starr <[email protected]>
Date : 2009-12-21 19:16 (35 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14792
Subject : Misdetection of the TV output
Submitter : Santi <[email protected]>
Date : 2009-12-12 13:28 (44 days old)


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15137
Subject : NULL pointer dereference in vlan_skb_recv
Submitter : Bruno PrĂ©mont <[email protected]>
Date : 2010-01-23 15:56 (2 days old)
References : http://marc.info/?l=linux-kernel&m=126426286507497&w=4
Handled-By : Eric Dumazet <[email protected]>
Patch : http://patchwork.kernel.org/patch/74999/
http://patchwork.kernel.org/patch/75002/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15131
Subject : [OOPS] radeon kms
Submitter : John Kacur <[email protected]>
Date : 2010-01-15 15:45 (10 days old)
References : http://lkml.org/lkml/2010/1/15/129
Handled-By : Jerome Glisse <[email protected]>
Patch : http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=patch;h=30d2d9a54d48e4fefede0389ded1b6fc2d44a522


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15128
Subject : Boot regression on AMD
Submitter : Gene Heskett <[email protected]>
Date : 2010-01-13 20:21 (12 days old)
References : http://marc.info/?l=linux-kernel&m=126341413213017&w=4
Handled-By : Andreas Herrmann <[email protected]>
Patch : http://patchwork.kernel.org/patch/74883/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15039
Subject : leds_alix2: can't allocate I/O for GPIO
Submitter : Arnd Hannemann <[email protected]>
Date : 2010-01-07 10:26 (18 days old)
References : http://marc.info/?l=linux-kernel&m=126286001106257&w=4
Handled-By : Daniel Mack <[email protected]>
Patch : http://patchwork.kernel.org/patch/72006/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14949
Subject : drm_vm.c:drm_mmap: possible circular locking dependency detected
Submitter : Borislav Petkov <[email protected]>
Date : 2009-12-26 9:45 (30 days old)
References : http://marc.info/?l=linux-kernel&m=126182073616279&w=4
Handled-By : Eric W. Biederman <[email protected]>
Patch : http://patchwork.kernel.org/patch/70461/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14791
Subject : Something has been broken in the network stack this week
Submitter : Delete This Account <[email protected]>
Date : 2009-12-12 13:06 (44 days old)
Handled-By : Ben Hutchings <[email protected]>
Patch : http://patchwork.kernel.org/patch/72073/


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.32,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=14885

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2010-01-24 22:06:49

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14791] Something has been broken in the network stack this week

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14791
Subject : Something has been broken in the network stack this week
Submitter : Delete This Account <[email protected]>
Date : 2009-12-12 13:06 (44 days old)
Handled-By : Ben Hutchings <[email protected]>
Patch : http://patchwork.kernel.org/patch/72073/

2010-01-24 22:16:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14946] All kernels after 2.6.32-git10 show only 1 CPU

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14946
Subject : All kernels after 2.6.32-git10 show only 1 CPU
Submitter : Sid Boyce <[email protected]>
Date : 2009-12-23 16:55 (33 days old)
References : http://marc.info/?l=linux-kernel&m=126158734326801&w=4

2010-01-24 22:16:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14792] Misdetection of the TV output

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14792
Subject : Misdetection of the TV output
Submitter : Santi <[email protected]>
Date : 2009-12-12 13:28 (44 days old)

2010-01-24 22:17:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15017] kexec regression, radeon/kms irq related (bisected)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15017
Subject : kexec regression, radeon/kms irq related (bisected)
Submitter : Markus Trippelsdorf <[email protected]>
Date : 2010-01-09 18:49 (16 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d8f60cfc93452d0554f6a701aa8e3236cbee4636

2010-01-24 22:17:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14999] possible circular locking dependency detected in rfkill at suspend

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14999
Subject : possible circular locking dependency detected in rfkill at suspend
Submitter : Christian Casteyde <[email protected]>
Date : 2010-01-06 21:52 (19 days old)

2010-01-24 22:17:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14949] drm_vm.c:drm_mmap: possible circular locking dependency detected

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14949
Subject : drm_vm.c:drm_mmap: possible circular locking dependency detected
Submitter : Borislav Petkov <[email protected]>
Date : 2009-12-26 9:45 (30 days old)
References : http://marc.info/?l=linux-kernel&m=126182073616279&w=4
Handled-By : Eric W. Biederman <[email protected]>
Patch : http://patchwork.kernel.org/patch/70461/

2010-01-24 22:18:44

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15138] evdev regression on macbook

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15138
Subject : evdev regression on macbook
Submitter : Guillaume Chazarain <[email protected]>
Date : 2010-01-23 18:53 (2 days old)
References : http://marc.info/?l=linux-kernel&m=126427286219235&w=4
Handled-By : Dmitry Torokhov <[email protected]>

2010-01-24 22:18:29

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15137] NULL pointer dereference in vlan_skb_recv

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15137
Subject : NULL pointer dereference in vlan_skb_recv
Submitter : Bruno PrĂ©mont <[email protected]>
Date : 2010-01-23 15:56 (2 days old)
References : http://marc.info/?l=linux-kernel&m=126426286507497&w=4
Handled-By : Eric Dumazet <[email protected]>
Patch : http://patchwork.kernel.org/patch/74999/
http://patchwork.kernel.org/patch/75002/

2010-01-24 22:17:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15038] drm/ksm: fbdev blanking regression

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
Subject : drm/ksm: fbdev blanking regression
Submitter : Johan Hovold <[email protected]>
Date : 2010-01-06 17:00 (19 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
Handled-By : James Simmons <[email protected]>

2010-01-24 22:18:57

by Ben Hutchings

[permalink] [raw]
Subject: Re: [Bug #14791] Something has been broken in the network stack this week

On Sun, 2010-01-24 at 22:54 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14791
> Subject : Something has been broken in the network stack this week
> Submitter : Delete This Account <[email protected]>
> Date : 2009-12-12 13:06 (44 days old)
> Handled-By : Ben Hutchings <[email protected]>
> Patch : http://patchwork.kernel.org/patch/72073/

This should still be listed; I am still waiting for someone to test the
proposed patch.

Ben.

--
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2010-01-24 22:17:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15125] hung task - jbd2/dm-1-8 (during raid rebuild)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15125
Subject : hung task - jbd2/dm-1-8 (during raid rebuild)
Submitter : Michael Breuer <[email protected]>
Date : 2010-01-10 21:47 (15 days old)
References : http://marc.info/?l=linux-kernel&m=126316012025978&w=4

2010-01-24 22:17:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15131] [OOPS] radeon kms

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15131
Subject : [OOPS] radeon kms
Submitter : John Kacur <[email protected]>
Date : 2010-01-15 15:45 (10 days old)
References : http://lkml.org/lkml/2010/1/15/129
Handled-By : Jerome Glisse <[email protected]>
Patch : http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=patch;h=30d2d9a54d48e4fefede0389ded1b6fc2d44a522

2010-01-24 22:18:22

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15076] System panic under load with clockevents_program_event

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15076
Subject : System panic under load with clockevents_program_event
Submitter : okias <[email protected]>
Date : 2010-01-17 13:03 (8 days old)

2010-01-24 22:19:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15139] e1000: transmit queue 0 timed out

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15139
Subject : e1000: transmit queue 0 timed out
Submitter : Alexander Beregalov <[email protected]>
Date : 2010-01-23 15:37 (2 days old)
References : http://marc.info/?l=linux-netdev&m=126426149306083&w=4

2010-01-24 22:19:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15129] [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15129
Subject : [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512
Submitter : Miles Lane <[email protected]>
Date : 2010-01-14 23:18 (11 days old)
References : http://lkml.org/lkml/2010/1/14/570
Handled-By : Chris Wilson <[email protected]>

2010-01-24 22:19:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15043] Display goes off with i915.powersave=1

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043
Subject : Display goes off with i915.powersave=1
Submitter : Soeren Sonnenburg <[email protected]>
Date : 2010-01-10 20:09 (15 days old)
References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4

2010-01-24 22:20:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15128] Boot regression on AMD

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15128
Subject : Boot regression on AMD
Submitter : Gene Heskett <[email protected]>
Date : 2010-01-13 20:21 (12 days old)
References : http://marc.info/?l=linux-kernel&m=126341413213017&w=4
Handled-By : Andreas Herrmann <[email protected]>
Patch : http://patchwork.kernel.org/patch/74883/

2010-01-24 22:20:07

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15132] OOPS's with large initramfs

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15132
Subject : OOPS's with large initramfs
Submitter : Nigel Kukard <[email protected]>
Date : 2010-01-16 11:12 (9 days old)
References : http://marc.info/?l=linux-kernel&m=126364100321603&w=4

2010-01-24 22:20:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15133] Wake on LAN doesn't work in sky2

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15133
Subject : Wake on LAN doesn't work in sky2
Submitter : Tino Keitel <[email protected]>
Date : 2010-01-15 9:10 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=166a0fd4c788ec7f10ca8194ec6d526afa12db75
References : http://marc.info/?l=linux-kernel&m=126354704815848&w=4
Handled-By : Stephen Hemminger <[email protected]>

2010-01-24 22:20:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15126] REGRESSION for RT2561/RT61 in 2.6.33

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15126
Subject : REGRESSION for RT2561/RT61 in 2.6.33
Submitter : Alan Stern <[email protected]>
Date : 2010-01-11 14:54 (14 days old)
References : http://marc.info/?l=linux-kernel&m=126322167427159&w=4
Handled-By : Johannes Berg <[email protected]>

2010-01-24 22:21:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15124
Subject : PCI host bridge windows ignored (works with pci=use_crs)
Submitter : Jeff Garrett <[email protected]>
Date : 2010-01-13 5:37 (12 days old)
References : http://marc.info/?l=linux-kernel&m=126336296600307&w=4
Handled-By : Yinghai Lu <[email protected]>
Bjorn Helgaas <[email protected]>

2010-01-24 22:21:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15025] Oops in ext4 driver

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15025
Subject : Oops in ext4 driver
Submitter : Steinar H. Gunderson <[email protected]>
Date : 2010-01-10 13:09 (15 days old)

2010-01-24 22:21:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14950] tbench regression with 2.6.33-rc1

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14950
Subject : tbench regression with 2.6.33-rc1
Submitter : Lin Ming <[email protected]>
Date : 2009-12-25 11:11 (31 days old)
References : http://marc.info/?l=linux-kernel&m=126174044213172&w=4

2010-01-24 22:21:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15032] Oops in uart_resume_port() on resume

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15032
Subject : Oops in uart_resume_port() on resume
Submitter : Zdenek Kabelac <[email protected]>
Date : 2010-01-04 15:47 (21 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba15ab0e8de0d4439a91342ad52d55ca9e313f3d
References : http://marc.info/?l=linux-kernel&m=126262008815689&w=4

2010-01-24 22:22:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15039] leds_alix2: can't allocate I/O for GPIO

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15039
Subject : leds_alix2: can't allocate I/O for GPIO
Submitter : Arnd Hannemann <[email protected]>
Date : 2010-01-07 10:26 (18 days old)
References : http://marc.info/?l=linux-kernel&m=126286001106257&w=4
Handled-By : Daniel Mack <[email protected]>
Patch : http://patchwork.kernel.org/patch/72006/

2010-01-24 22:17:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14937] WARNING: at kernel/lockdep.c:2830

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14937
Subject : WARNING: at kernel/lockdep.c:2830
Submitter : Grant Wilson <[email protected]>
Date : 2009-12-27 13:35 (29 days old)
References : http://marc.info/?l=linux-kernel&m=126192220404829&w=4

2010-01-24 22:23:32

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14924] Weird hard hangs when rendering 'some' web-sites in Firefox

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924
Subject : Weird hard hangs when rendering 'some' web-sites in Firefox
Submitter : David <[email protected]>
Date : 2009-12-21 21:53 (35 days old)
References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4

2010-01-24 22:22:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15036] soft lockup in dmesg after suspend/resume

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15036
Subject : soft lockup in dmesg after suspend/resume
Submitter : ykzhao <[email protected]>
Date : 2010-01-04 5:36 (21 days old)
References : http://marc.info/?l=linux-kernel&m=126258356202722&w=4

2010-01-24 22:22:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #15000] Thinkpad dock button no longer works

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15000
Subject : Thinkpad dock button no longer works
Submitter : Paul Martin <[email protected]>
Date : 2010-01-07 02:11 (18 days old)

2010-01-24 22:23:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Bug #14859] System timer firing too much without cause

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
Subject : System timer firing too much without cause
Submitter : Shawn Starr <[email protected]>
Date : 2009-12-21 19:16 (35 days old)

2010-01-24 22:31:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #14791] Something has been broken in the network stack this week

On Sunday 24 January 2010, Ben Hutchings wrote:
> On Sun, 2010-01-24 at 22:54 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14791
> > Subject : Something has been broken in the network stack this week
> > Submitter : Delete This Account <[email protected]>
> > Date : 2009-12-12 13:06 (44 days old)
> > Handled-By : Ben Hutchings <[email protected]>
> > Patch : http://patchwork.kernel.org/patch/72073/
>
> This should still be listed; I am still waiting for someone to test the
> proposed patch.

Well, I'm going to drop it. The bug is there, but the reporter is non-existent
and no one else seems to care.

I'll leave the bug entry open, but I'm not going to list this one any more.

Rafael

2010-01-24 22:32:28

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug #15126] REGRESSION for RT2561/RT61 in 2.6.33

On Sun, 24 Jan 2010, Rafael J. Wysocki wrote:

> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15126
> Subject : REGRESSION for RT2561/RT61 in 2.6.33
> Submitter : Alan Stern <[email protected]>
> Date : 2010-01-11 14:54 (14 days old)
> References : http://marc.info/?l=linux-kernel&m=126322167427159&w=4
> Handled-By : Johannes Berg <[email protected]>

This bug entry can be removed from the list. It turned out not to be
a bug at all, just a kernel config error I made when updating to
2.6.33-rc1.

Alan Stern

2010-01-24 22:33:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15126] REGRESSION for RT2561/RT61 in 2.6.33

On Sunday 24 January 2010, Alan Stern wrote:
> On Sun, 24 Jan 2010, Rafael J. Wysocki wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15126
> > Subject : REGRESSION for RT2561/RT61 in 2.6.33
> > Submitter : Alan Stern <[email protected]>
> > Date : 2010-01-11 14:54 (14 days old)
> > References : http://marc.info/?l=linux-kernel&m=126322167427159&w=4
> > Handled-By : Johannes Berg <[email protected]>
>
> This bug entry can be removed from the list. It turned out not to be
> a bug at all, just a kernel config error I made when updating to
> 2.6.33-rc1.

Thanks, closed as "invalid".

Rafael

2010-01-24 22:45:36

by Johan Hovold

[permalink] [raw]
Subject: Re: [Bug #15038] drm/ksm: fbdev blanking regression

On Sun, Jan 24, 2010 at 11:04:36PM +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
> Subject : drm/ksm: fbdev blanking regression
> Submitter : Johan Hovold <[email protected]>
> Date : 2010-01-06 17:00 (19 days old)
> First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
> References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
> Handled-By : James Simmons <[email protected]>

Issue remains in rc5.

/Johan

2010-01-24 22:46:46

by Shawn Starr

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Sunday 24 January 2010 17:04:33 Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
> Subject : System timer firing too much without cause
> Submitter : Shawn Starr <[email protected]>
> Date : 2009-12-21 19:16 (35 days old)

Continues with -rc5, I really cannot use Dynamic ticks at all, it has to be
disabled.

I should probably mention this CPU info:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
stepping : 10
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida
tpr_shadow vnmi flexpriority
bogomips : 5053.40
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

2010-01-24 22:49:28

by Michael Breuer

[permalink] [raw]
Subject: Re: [Bug #15125] hung task - jbd2/dm-1-8 (during raid rebuild)

On 1/24/2010 5:04 PM, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15125
> Subject : hung task - jbd2/dm-1-8 (during raid rebuild)
> Submitter : Michael Breuer<[email protected]>
> Date : 2010-01-10 21:47 (15 days old)
> References : http://marc.info/?l=linux-kernel&m=126316012025978&w=4
>
>
Not an easy one to recreate. Should probably remain listed for now.

2010-01-24 23:03:20

by Steinar H. Gunderson

[permalink] [raw]
Subject: Re: [Bug #15025] Oops in ext4 driver

On Sun, Jan 24, 2010 at 11:04:35PM +0100, Rafael J. Wysocki wrote:
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).

I'm not using 2.6.33 anymore since this bug is a showstopper to me (it's on a
production system), so I'm unable to check if it's fixed or not.

/* Steinar */
--
Homepage: http://www.sesse.net/

2010-01-24 23:06:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Sunday 24 January 2010, Shawn Starr wrote:
> On Sunday 24 January 2010 17:04:33 Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
> > Subject : System timer firing too much without cause
> > Submitter : Shawn Starr <[email protected]>
> > Date : 2009-12-21 19:16 (35 days old)
>
> Continues with -rc5, I really cannot use Dynamic ticks at all, it has to be
> disabled.
>
> I should probably mention this CPU info:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 23
> model name : Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
> stepping : 10
> cpu MHz : 800.000
> cache size : 6144 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 2
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida
> tpr_shadow vnmi flexpriority
> bogomips : 5053.40
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management:

Thanks for the update.

Rafael

2010-01-24 23:08:08

by Nigel Kukard

[permalink] [raw]
Subject: Re: [Bug #15132] OOPS's with large initramfs

Verified. I have tested this as far back as 2.6.30 with the same problem
with "very" large initramfs's.

> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15132
> Subject : OOPS's with large initramfs
> Submitter : Nigel Kukard <[email protected]>
> Date : 2010-01-16 11:12 (9 days old)
> References : http://marc.info/?l=linux-kernel&m=126364100321603&w=4
>
>
>


--
Regards
Nigel Kukard, PhD CompSc
Linux Based Systems Design (Pty) Ltd

Support: 086 747 7600 (premium 24/7/365)
Fax: 086 601 7884

Quote: The best language to use is the language that was designed for
what you want to use it for.

*** The attachment to my email signature.asc is a digital PGP
signature, if your mail client supports digital signatures it will
allow you to verify I am the sender of this email and that it has not
been tampered with along the way ***



Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2010-01-24 23:08:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15025] Oops in ext4 driver

On Sunday 24 January 2010, Steinar H. Gunderson wrote:
> On Sun, Jan 24, 2010 at 11:04:35PM +0100, Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
>
> I'm not using 2.6.33 anymore since this bug is a showstopper to me (it's on a
> production system), so I'm unable to check if it's fixed or not.

Well, in that case I'll have to close it as 'unreproducible', because no one
else seems to be able to reproduce it.

Rafael

2010-01-24 23:13:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15038] drm/ksm: fbdev blanking regression

On Sunday 24 January 2010, Johan Hovold wrote:
> On Sun, Jan 24, 2010 at 11:04:36PM +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
> > Subject : drm/ksm: fbdev blanking regression
> > Submitter : Johan Hovold <[email protected]>
> > Date : 2010-01-06 17:00 (19 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
> > References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
> > Handled-By : James Simmons <[email protected]>
>
> Issue remains in rc5.

Thanks for the update.

OK, we know what commit broke things, we don't seem to know how to fix it,
so perhaps it's time to revert that commit?

Rafael

2010-01-24 23:13:48

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15125] hung task - jbd2/dm-1-8 (during raid rebuild)

On Sunday 24 January 2010, Michael Breuer wrote:
> On 1/24/2010 5:04 PM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15125
> > Subject : hung task - jbd2/dm-1-8 (during raid rebuild)
> > Submitter : Michael Breuer<[email protected]>
> > Date : 2010-01-10 21:47 (15 days old)
> > References : http://marc.info/?l=linux-kernel&m=126316012025978&w=4
> >
> >
> Not an easy one to recreate. Should probably remain listed for now.

Thanks for the update.

Rafael

2010-01-24 23:14:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15132] OOPS's with large initramfs

On Sunday 24 January 2010, Nigel Kukard wrote:
> Verified. I have tested this as far back as 2.6.30 with the same problem
> with "very" large initramfs's.

So this is not a recent regression, but a bug that has been there for a long
time. Dropping from the list.

Thanks,
Rafael

2010-01-25 01:41:37

by Sid Boyce

[permalink] [raw]
Subject: Re: [Bug #14946] All kernels after 2.6.32-git10 show only 1 CPU

On 24/01/10 22:04, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14946
> Subject : All kernels after 2.6.32-git10 show only 1 CPU
> Submitter : Sid Boyce <[email protected]>
> Date : 2009-12-23 16:55 (33 days old)
> References : http://marc.info/?l=linux-kernel&m=126158734326801&w=4
>
>
>

Definitely fixed in 2.6.33-rc4, thanks.
Regards
Sid.
--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks

2010-01-25 03:26:44

by Alex Deucher

[permalink] [raw]
Subject: Re: [Bug #15017] kexec regression, radeon/kms irq related (bisected)

On Sun, Jan 24, 2010 at 5:04 PM, Rafael J. Wysocki <[email protected]> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. ?Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry ? ? ? : http://bugzilla.kernel.org/show_bug.cgi?id=15017
> Subject ? ? ? ? : kexec regression, radeon/kms irq related (bisected)
> Submitter ? ? ? : Markus Trippelsdorf <[email protected]>
> Date ? ? ? ? ? ?: 2010-01-09 18:49 (16 days old)
> First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d8f60cfc93452d0554f6a701aa8e3236cbee4636
>
>
>

That bug has patches which fix the issue attached and queued for 2.6.33.

Alex

2010-01-25 06:32:37

by David Airlie

[permalink] [raw]
Subject: Re: [Bug #15038] drm/ksm: fbdev blanking regression

On Mon, 2010-01-25 at 00:13 +0100, Rafael J. Wysocki wrote:
> On Sunday 24 January 2010, Johan Hovold wrote:
> > On Sun, Jan 24, 2010 at 11:04:36PM +0100, Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.32. Please verify if it still should be listed and let me know
> > > (either way).
> > >
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
> > > Subject : drm/ksm: fbdev blanking regression
> > > Submitter : Johan Hovold <[email protected]>
> > > Date : 2010-01-06 17:00 (19 days old)
> > > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
> > > References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
> > > Handled-By : James Simmons <[email protected]>
> >
> > Issue remains in rc5.
>
> Thanks for the update.
>
> OK, we know what commit broke things, we don't seem to know how to fix it,
> so perhaps it's time to revert that commit?

Just sent revert of the broken bit to Linus.

Dave.

2010-01-25 06:32:50

by David Airlie

[permalink] [raw]
Subject: Re: [Bug #15017] kexec regression, radeon/kms irq related (bisected)

On Sun, 2010-01-24 at 22:26 -0500, Alex Deucher wrote:
> On Sun, Jan 24, 2010 at 5:04 PM, Rafael J. Wysocki <[email protected]> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15017
> > Subject : kexec regression, radeon/kms irq related (bisected)
> > Submitter : Markus Trippelsdorf <[email protected]>
> > Date : 2010-01-09 18:49 (16 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d8f60cfc93452d0554f6a701aa8e3236cbee4636
> >
> >
> >
>
> That bug has patches which fix the issue attached and queued for 2.6.33.

Just sent to Linus.

Dave.

2010-01-25 08:28:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [Bug #14949] drm_vm.c:drm_mmap: possible circular locking dependency detected

On Sun, Jan 24, 2010 at 11:04:33PM +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).

Yep, this one is fixed by the patch below. Thanks.

>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14949
> Subject : drm_vm.c:drm_mmap: possible circular locking dependency detected
> Submitter : Borislav Petkov <[email protected]>
> Date : 2009-12-26 9:45 (30 days old)
> References : http://marc.info/?l=linux-kernel&m=126182073616279&w=4
> Handled-By : Eric W. Biederman <[email protected]>
> Patch : http://patchwork.kernel.org/patch/70461/

--
Regards/Gruss,
Boris.

2010-01-25 10:36:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Sun, 24 Jan 2010, Shawn Starr wrote:

> On Sunday 24 January 2010 17:04:33 Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).

Why is this on the regression list at all ? The report says that this
is happening with 33-rcX, but there is no comparison to the behaviour
of 32 or earlier kernels on that machine. Instead we have a comparison
of apples and oranges:

> As a comparsion my quad core box has no such issue: (Running 2.6.32-rc7)
> x86_64
> 0: 42 4 1 1 IO-APIC-edge timer
>
> my Lenovo ThinkPad W500 (latest BIOS 3.11) laptop shows the system timer
> flooding the bus (Running 2.6.33-rc1) x86_64
> 0: 66775 70429 IO-APIC-edge timer <-- keeps rising, rapidly

So we look at a quad core desktop machine which probably has no deeper
power states and therefor does not use the broadcast timer and compare
it to a laptop which has deeper power states and needs to use the
broadcast timer, which of course increases the number of IRQ0
events. What a surprise.

Can we please remove this from the regression list unless Shawn
confirms that 32 or earlier kernels do not show that behaviour on the
laptop?

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
> > Subject : System timer firing too much without cause
> > Submitter : Shawn Starr <[email protected]>
> > Date : 2009-12-21 19:16 (35 days old)
>
> Continues with -rc5, I really cannot use Dynamic ticks at all, it has to be
> disabled.

Shawn, why can't you use dynamic ticks ? In the bugzilla I just see
that you worry about the IRQ0 interrupts (which are correct and
necessary when the system is in nohz mode) and the extra rescheduling
interrupts. How is the system misbehaving ?

Thanks,

tglx

2010-01-25 13:41:47

by Cong Wang

[permalink] [raw]
Subject: Re: [Bug #15137] NULL pointer dereference in vlan_skb_recv

On Sun, Jan 24, 2010 at 11:04:41PM +0100, Rafael J. Wysocki wrote:
>This message has been generated automatically as a part of a report
>of recent regressions.
>
>The following bug entry is on the current list of known regressions
>from 2.6.32. Please verify if it still should be listed and let me know
>(either way).
>
>
>Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15137
>Subject : NULL pointer dereference in vlan_skb_recv
>Submitter : Bruno PrĂ©mont <[email protected]>
>Date : 2010-01-23 15:56 (2 days old)
>References : http://marc.info/?l=linux-kernel&m=126426286507497&w=4
>Handled-By : Eric Dumazet <[email protected]>
>Patch : http://patchwork.kernel.org/patch/74999/
> http://patchwork.kernel.org/patch/75002/
>

This one can be closed, patch from Eric is already applied by David Miller.


--
Live like a child, think like the god.

2010-01-25 13:53:29

by Cong Wang

[permalink] [raw]
Subject: Re: [Bug #14924] Weird hard hangs when rendering 'some' web-sites in Firefox

On Sun, Jan 24, 2010 at 11:04:33PM +0100, Rafael J. Wysocki wrote:
>This message has been generated automatically as a part of a report
>of recent regressions.
>
>The following bug entry is on the current list of known regressions
>from 2.6.32. Please verify if it still should be listed and let me know
>(either way).
>
>
>Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924
>Subject : Weird hard hangs when rendering 'some' web-sites in Firefox
>Submitter : David <[email protected]>
>Date : 2009-12-21 21:53 (35 days old)
>References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4

Hmm, you have CONFIG_DETECT_SOFTLOCKUP=y, I have no idea what happened,
doing a bisect would be appreciated.

Thanks.

--
Live like a child, think like the god.

2010-01-25 16:54:15

by Shawn Starr

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Monday 25 January 2010 05:35:50 Thomas Gleixner wrote:
> On Sun, 24 Jan 2010, Shawn Starr wrote:
> > On Sunday 24 January 2010 17:04:33 Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.32. Please verify if it still should be listed and let me
> > > know (either way).
>
> Why is this on the regression list at all ? The report says that this
> is happening with 33-rcX, but there is no comparison to the behaviour
> of 32 or earlier kernels on that machine. Instead we have a comparison
>
> of apples and oranges:
> > As a comparsion my quad core box has no such issue: (Running 2.6.32-rc7)
> > x86_64
> >
> > 0: 42 4 1 1 IO-APIC-edge
> > timer
> >
> > my Lenovo ThinkPad W500 (latest BIOS 3.11) laptop shows the system timer
> > flooding the bus (Running 2.6.33-rc1) x86_64
> >
> > 0: 66775 70429 IO-APIC-edge timer <-- keeps rising,
> > rapidly
>
> So we look at a quad core desktop machine which probably has no deeper
> power states and therefor does not use the broadcast timer and compare
> it to a laptop which has deeper power states and needs to use the
> broadcast timer, which of course increases the number of IRQ0
> events. What a surprise.
>
> Can we please remove this from the regression list unless Shawn
> confirms that 32 or earlier kernels do not show that behaviour on the
> laptop?
>

> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
> > > Subject : System timer firing too much without cause
> > > Submitter : Shawn Starr <[email protected]>
> > > Date : 2009-12-21 19:16 (35 days old)
> >
> > Continues with -rc5, I really cannot use Dynamic ticks at all, it has to
> > be disabled.
>
> Shawn, why can't you use dynamic ticks ? In the bugzilla I just see
> that you worry about the IRQ0 interrupts (which are correct and
> necessary when the system is in nohz mode) and the extra rescheduling
> interrupts. How is the system misbehaving ?
>

Well, this all stems from trying to use Radeon KMS with IRQs on. Doing so I
see system stalls and this is quite noticeable however, I am able to show this
same stall on the quad core with the same GPU.

Right now, it is unclear to me if there is a underlying irq issue or a bug in
the radeon driver code that is showing these stalls. Since the radeon folks -
at the moment - do not think it is a coding problem in their driver

My impression was using dynamic ticks meant ticks were on demand and not
continuous. On the quad core box, with dynamic ticks on, the broadcasts are
not increasing IRQ 0 events this only happens on the laptop.

Thanks,
Shawn.

> Thanks,
>
> tglx

2010-01-25 17:21:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Mon, 25 Jan 2010, Shawn Starr wrote:
> On Monday 25 January 2010 05:35:50 Thomas Gleixner wrote:
> > Shawn, why can't you use dynamic ticks ? In the bugzilla I just see
> > that you worry about the IRQ0 interrupts (which are correct and
> > necessary when the system is in nohz mode) and the extra rescheduling
> > interrupts. How is the system misbehaving ?
> >

> Well, this all stems from trying to use Radeon KMS with IRQs
> on. Doing so I see system stalls and this is quite noticeable
> however, I am able to show this same stall on the quad core with the
x> same GPU. Right now, it is unclear to me if there is a underlying
> irq issue or a bug in the radeon driver code that is showing these
> stalls. Since the radeon folks - at the moment - do not think it is
> a coding problem in their driver

Does the stall go away, when you disable dynticks ?

> My impression was using dynamic ticks meant ticks were on demand and

Dynamic ticks are providing a continuous tick long as the machine is
busy. When a core becomes idle, we programm the timer to go off at the
next scheduled timer event, if the event is longer away than the next
tick. When the core goes out of idle (due to the timer or some other
event) we restart the tick.

So you see less timer interrupts (IRQ0 + Local timer interrupts)

> not continuous. On the quad core box, with dynamic ticks on, the
> broadcasts are not increasing IRQ 0 events this only happens on the
> laptop.

Right, that is expected as I explained already. Your desktop does not
use deeper power states. Check /proc/acpi/processor/CPU0/power on both
machines to see the difference. You _cannot_ compare a desktop and a
laptop machine and deduce a regression.

The broadcast mechanism is necessary because the local APIC timer
stops in deeper power states. That's a hardware problem. So if the
core goes into a deeper power state then we arm the broadcast timer
which fires on IRQ0 to wake us up. It is a single timer which is used
by all cores in a system to work around this hardware stupidity. It's
named broadcast because it broadcasts the event to the other cores
when necessary. Your desktop does not use deeper power states,
therefor it does not use the broadcast timer either.

So the timer IRQ0 increasing is neither a Linux BUG nor a regression.

Thanks,

tglx

2010-01-25 17:37:09

by Shawn Starr

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Monday 25 January 2010 12:20:38 Thomas Gleixner wrote:
> On Mon, 25 Jan 2010, Shawn Starr wrote:
> > On Monday 25 January 2010 05:35:50 Thomas Gleixner wrote:
> > > Shawn, why can't you use dynamic ticks ? In the bugzilla I just see
> > > that you worry about the IRQ0 interrupts (which are correct and
> > > necessary when the system is in nohz mode) and the extra rescheduling
> > > interrupts. How is the system misbehaving ?
> >
> > Well, this all stems from trying to use Radeon KMS with IRQs
> > on. Doing so I see system stalls and this is quite noticeable
> > however, I am able to show this same stall on the quad core with the
>
> x> same GPU. Right now, it is unclear to me if there is a underlying
>
> > irq issue or a bug in the radeon driver code that is showing these
> > stalls. Since the radeon folks - at the moment - do not think it is
> > a coding problem in their driver
>
> Does the stall go away, when you disable dynticks ?
>

It does not, no.

> > My impression was using dynamic ticks meant ticks were on demand and
>
> Dynamic ticks are providing a continuous tick long as the machine is
> busy. When a core becomes idle, we programm the timer to go off at the
> next scheduled timer event, if the event is longer away than the next
> tick. When the core goes out of idle (due to the timer or some other
> event) we restart the tick.
>
> So you see less timer interrupts (IRQ0 + Local timer interrupts)

With dynamic ticks on or off, LOC increments rapidly, but I assume that is
normal behavour.

So if none of this really is a kernel issue, I defer it to the radeon folks to
comment further.

Please remove from regression list, I'll close the original bug.

>
> > not continuous. On the quad core box, with dynamic ticks on, the
> > broadcasts are not increasing IRQ 0 events this only happens on the
> > laptop.
>
> Right, that is expected as I explained already. Your desktop does not
> use deeper power states. Check /proc/acpi/processor/CPU0/power on both
> machines to see the difference. You _cannot_ compare a desktop and a
> laptop machine and deduce a regression.
>
> The broadcast mechanism is necessary because the local APIC timer
> stops in deeper power states. That's a hardware problem. So if the
> core goes into a deeper power state then we arm the broadcast timer
> which fires on IRQ0 to wake us up. It is a single timer which is used
> by all cores in a system to work around this hardware stupidity. It's
> named broadcast because it broadcasts the event to the other cores
> when necessary. Your desktop does not use deeper power states,
> therefor it does not use the broadcast timer either.
>
> So the timer IRQ0 increasing is neither a Linux BUG nor a regression.
>
> Thanks,
>
> tglx

2010-01-25 19:28:06

by David R

[permalink] [raw]
Subject: Re: [Bug #14924] Weird hard hangs when rendering 'some' web-sites in Firefox

Am?rico Wang wrote:
> On Sun, Jan 24, 2010 at 11:04:33PM +0100, Rafael J. Wysocki wrote:
>
>> This message has been generated automatically as a part of a report
>> of recent regressions.
>>
>> The following bug entry is on the current list of known regressions
>>
> >from 2.6.32. Please verify if it still should be listed and let me know
>
>> (either way).
>>
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924
>> Subject : Weird hard hangs when rendering 'some' web-sites in Firefox
>> Submitter : David <[email protected]>
>> Date : 2009-12-21 21:53 (35 days old)
>> References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4
>>
>
> Hmm, you have CONFIG_DETECT_SOFTLOCKUP=y, I have no idea what happened,
> doing a bisect would be appreciated.
>
> Thanks.
>
>

I no longer have the offending hardware, but I think that the issue was
probably corrected by:

cafe6609d6dc0a6a278f9fdbb59ce4d761a35ddd -
drm/radeon/kms: Schedule host path read cache flush through the ring V2

as the offending ATI graphics was indeed R300.

Cheers
David

2010-01-25 20:58:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #14859] System timer firing too much without cause

On Monday 25 January 2010, Shawn Starr wrote:
> On Monday 25 January 2010 12:20:38 Thomas Gleixner wrote:
> > On Mon, 25 Jan 2010, Shawn Starr wrote:
> > > On Monday 25 January 2010 05:35:50 Thomas Gleixner wrote:
> > > > Shawn, why can't you use dynamic ticks ? In the bugzilla I just see
> > > > that you worry about the IRQ0 interrupts (which are correct and
> > > > necessary when the system is in nohz mode) and the extra rescheduling
> > > > interrupts. How is the system misbehaving ?
> > >
> > > Well, this all stems from trying to use Radeon KMS with IRQs
> > > on. Doing so I see system stalls and this is quite noticeable
> > > however, I am able to show this same stall on the quad core with the
> >
> > x> same GPU. Right now, it is unclear to me if there is a underlying
> >
> > > irq issue or a bug in the radeon driver code that is showing these
> > > stalls. Since the radeon folks - at the moment - do not think it is
> > > a coding problem in their driver
> >
> > Does the stall go away, when you disable dynticks ?
> >
>
> It does not, no.
>
> > > My impression was using dynamic ticks meant ticks were on demand and
> >
> > Dynamic ticks are providing a continuous tick long as the machine is
> > busy. When a core becomes idle, we programm the timer to go off at the
> > next scheduled timer event, if the event is longer away than the next
> > tick. When the core goes out of idle (due to the timer or some other
> > event) we restart the tick.
> >
> > So you see less timer interrupts (IRQ0 + Local timer interrupts)
>
> With dynamic ticks on or off, LOC increments rapidly, but I assume that is
> normal behavour.
>
> So if none of this really is a kernel issue, I defer it to the radeon folks to
> comment further.
>
> Please remove from regression list, I'll close the original bug.

OK, closing it right now.

Rafael

2010-01-25 21:01:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #14924] Weird hard hangs when rendering 'some' web-sites in Firefox

On Monday 25 January 2010, David wrote:
> Am?rico Wang wrote:
> > On Sun, Jan 24, 2010 at 11:04:33PM +0100, Rafael J. Wysocki wrote:
> >
> >> This message has been generated automatically as a part of a report
> >> of recent regressions.
> >>
> >> The following bug entry is on the current list of known regressions
> >>
> > >from 2.6.32. Please verify if it still should be listed and let me know
> >
> >> (either way).
> >>
> >>
> >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924
> >> Subject : Weird hard hangs when rendering 'some' web-sites in Firefox
> >> Submitter : David <[email protected]>
> >> Date : 2009-12-21 21:53 (35 days old)
> >> References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4
> >>
> >
> > Hmm, you have CONFIG_DETECT_SOFTLOCKUP=y, I have no idea what happened,
> > doing a bisect would be appreciated.
> >
> > Thanks.
> >
> >
>
> I no longer have the offending hardware, but I think that the issue was
> probably corrected by:
>
> cafe6609d6dc0a6a278f9fdbb59ce4d761a35ddd -
> drm/radeon/kms: Schedule host path read cache flush through the ring V2
>
> as the offending ATI graphics was indeed R300.

Well, let's assume that's really the case. Closed.

Rafael

2010-01-25 21:03:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #14946] All kernels after 2.6.32-git10 show only 1 CPU

On Monday 25 January 2010, Sid Boyce wrote:
> On 24/01/10 22:04, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14946
> > Subject : All kernels after 2.6.32-git10 show only 1 CPU
> > Submitter : Sid Boyce <[email protected]>
> > Date : 2009-12-23 16:55 (33 days old)
> > References : http://marc.info/?l=linux-kernel&m=126158734326801&w=4
> >
> >
> >
>
> Definitely fixed in 2.6.33-rc4, thanks.

Thanks, closed.

Rafael

2010-01-25 21:06:49

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15038] drm/ksm: fbdev blanking regression

On Monday 25 January 2010, Dave Airlie wrote:
> On Mon, 2010-01-25 at 00:13 +0100, Rafael J. Wysocki wrote:
> > On Sunday 24 January 2010, Johan Hovold wrote:
> > > On Sun, Jan 24, 2010 at 11:04:36PM +0100, Rafael J. Wysocki wrote:
> > > > This message has been generated automatically as a part of a report
> > > > of recent regressions.
> > > >
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.32. Please verify if it still should be listed and let me know
> > > > (either way).
> > > >
> > > >
> > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
> > > > Subject : drm/ksm: fbdev blanking regression
> > > > Submitter : Johan Hovold <[email protected]>
> > > > Date : 2010-01-06 17:00 (19 days old)
> > > > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
> > > > References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
> > > > Handled-By : James Simmons <[email protected]>
> > >
> > > Issue remains in rc5.
> >
> > Thanks for the update.
> >
> > OK, we know what commit broke things, we don't seem to know how to fix it,
> > so perhaps it's time to revert that commit?
>
> Just sent revert of the broken bit to Linus.

Thanks!

Rafael

2010-01-25 21:07:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15137] NULL pointer dereference in vlan_skb_recv

On Monday 25 January 2010, Américo Wang wrote:
> On Sun, Jan 24, 2010 at 11:04:41PM +0100, Rafael J. Wysocki wrote:
> >This message has been generated automatically as a part of a report
> >of recent regressions.
> >
> >The following bug entry is on the current list of known regressions
> >from 2.6.32. Please verify if it still should be listed and let me know
> >(either way).
> >
> >
> >Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15137
> >Subject : NULL pointer dereference in vlan_skb_recv
> >Submitter : Bruno PrĂ©mont <[email protected]>
> >Date : 2010-01-23 15:56 (2 days old)
> >References : http://marc.info/?l=linux-kernel&m=126426286507497&w=4
> >Handled-By : Eric Dumazet <[email protected]>
> >Patch : http://patchwork.kernel.org/patch/74999/
> > http://patchwork.kernel.org/patch/75002/
> >
>
> This one can be closed, patch from Eric is already applied by David Miller.

Is it in the Linus' tree already?

Rafael

2010-01-25 21:30:15

by David Miller

[permalink] [raw]
Subject: Re: [Bug #15137] NULL pointer dereference in vlan_skb_recv

From: "Rafael J. Wysocki" <[email protected]>
Date: Mon, 25 Jan 2010 22:07:52 +0100

> On Monday 25 January 2010, Am?rico Wang wrote:
>> On Sun, Jan 24, 2010 at 11:04:41PM +0100, Rafael J. Wysocki wrote:
>> >This message has been generated automatically as a part of a report
>> >of recent regressions.
>> >
>> >The following bug entry is on the current list of known regressions
>> >from 2.6.32. Please verify if it still should be listed and let me know
>> >(either way).
>> >
>> >
>> >Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15137
>> >Subject : NULL pointer dereference in vlan_skb_recv
>> >Submitter : Bruno Pr?mont <[email protected]>
>> >Date : 2010-01-23 15:56 (2 days old)
>> >References : http://marc.info/?l=linux-kernel&m=126426286507497&w=4
>> >Handled-By : Eric Dumazet <[email protected]>
>> >Patch : http://patchwork.kernel.org/patch/74999/
>> > http://patchwork.kernel.org/patch/75002/
>> >
>>
>> This one can be closed, patch from Eric is already applied by David Miller.
>
> Is it in the Linus' tree already?

No, but it will be there soon, I'll push it to him today.
:-)

2010-01-25 21:57:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15137] NULL pointer dereference in vlan_skb_recv

On Monday 25 January 2010, David Miller wrote:
> From: "Rafael J. Wysocki" <[email protected]>
> Date: Mon, 25 Jan 2010 22:07:52 +0100
>
> > On Monday 25 January 2010, Am?rico Wang wrote:
> >> On Sun, Jan 24, 2010 at 11:04:41PM +0100, Rafael J. Wysocki wrote:
> >> >This message has been generated automatically as a part of a report
> >> >of recent regressions.
> >> >
> >> >The following bug entry is on the current list of known regressions
> >> >from 2.6.32. Please verify if it still should be listed and let me know
> >> >(either way).
> >> >
> >> >
> >> >Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15137
> >> >Subject : NULL pointer dereference in vlan_skb_recv
> >> >Submitter : Bruno Pr?mont <[email protected]>
> >> >Date : 2010-01-23 15:56 (2 days old)
> >> >References : http://marc.info/?l=linux-kernel&m=126426286507497&w=4
> >> >Handled-By : Eric Dumazet <[email protected]>
> >> >Patch : http://patchwork.kernel.org/patch/74999/
> >> > http://patchwork.kernel.org/patch/75002/
> >> >
> >>
> >> This one can be closed, patch from Eric is already applied by David Miller.
> >
> > Is it in the Linus' tree already?
>
> No, but it will be there soon, I'll push it to him today.
> :-)

Thanks!

2010-01-26 03:06:56

by Cong Wang

[permalink] [raw]
Subject: Re: [Bug #14924] Weird hard hangs when rendering 'some' web-sites in Firefox

On Tue, Jan 26, 2010 at 3:26 AM, David <[email protected]> wrote:
> Américo Wang wrote:
>> On Sun, Jan 24, 2010 at 11:04:33PM +0100, Rafael J. Wysocki wrote:
>>
>>> This message has been generated automatically as a part of a report
>>> of recent regressions.
>>>
>>> The following bug entry is on the current list of known regressions
>>>
>> >from 2.6.32.  Please verify if it still should be listed and let me know
>>
>>> (either way).
>>>
>>>
>>> Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=14924
>>> Subject              : Weird hard hangs when rendering 'some' web-sites in Firefox
>>> Submitter    : David <[email protected]>
>>> Date         : 2009-12-21 21:53 (35 days old)
>>> References   : http://marc.info/?l=linux-kernel&m=126143375823340&w=4
>>>
>>
>> Hmm, you have CONFIG_DETECT_SOFTLOCKUP=y, I have no idea what happened,
>> doing a bisect would be appreciated.
>>
>> Thanks.
>>
>>
>
> I no longer have the offending hardware, but I think that the issue was
> probably corrected by:
>
>    cafe6609d6dc0a6a278f9fdbb59ce4d761a35ddd -
> drm/radeon/kms: Schedule host path read cache flush through the ring V2
>
> as the offending ATI graphics was indeed R300.
>

Ok, thanks!

2010-01-26 07:20:19

by Jeff Garrett

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Sun, Jan 24, 2010 at 11:04:38PM +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15124
> Subject : PCI host bridge windows ignored (works with pci=use_crs)
> Submitter : Jeff Garrett <[email protected]>
> Date : 2010-01-13 5:37 (12 days old)
> References : http://marc.info/?l=linux-kernel&m=126336296600307&w=4
> Handled-By : Yinghai Lu <[email protected]>
> Bjorn Helgaas <[email protected]>

This regression should still be listed. No patch to test yet.

-Jeff Garrett

2010-01-26 12:48:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tuesday 26 January 2010, Jeff Garrett wrote:
> On Sun, Jan 24, 2010 at 11:04:38PM +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15124
> > Subject : PCI host bridge windows ignored (works with pci=use_crs)
> > Submitter : Jeff Garrett <[email protected]>
> > Date : 2010-01-13 5:37 (12 days old)
> > References : http://marc.info/?l=linux-kernel&m=126336296600307&w=4
> > Handled-By : Yinghai Lu <[email protected]>
> > Bjorn Helgaas <[email protected]>
>
> This regression should still be listed. No patch to test yet.

Thanks for the update.

IIRC, we already know how to fix this ...

Rafael

2010-01-26 17:32:44

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tuesday 26 January 2010 05:48:59 am Rafael J. Wysocki wrote:
> On Tuesday 26 January 2010, Jeff Garrett wrote:
> > On Sun, Jan 24, 2010 at 11:04:38PM +0100, Rafael J. Wysocki wrote:
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.32. Please verify if it still should be listed and let me know
> > > (either way).
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15124
> > > Subject : PCI host bridge windows ignored (works with pci=use_crs)
> > > Submitter : Jeff Garrett <[email protected]>
> > > Date : 2010-01-13 5:37 (12 days old)
> > > References : http://marc.info/?l=linux-kernel&m=126336296600307&w=4
> > > Handled-By : Yinghai Lu <[email protected]>
> > > Bjorn Helgaas <[email protected]>
> >
> > This regression should still be listed. No patch to test yet.
> ...
> IIRC, we already know how to fix this ...

As far as I know, we do NOT know how to fix this.

This regression occurred when we added intel_bus.c because it's not
yet smart enough to determine the correct host bridge apertures.
Here's what it thinks the bridge aperture is and the Radeon BAR:

IOH bus: 00 index 1 mmio: [e0000000, fdffffff]
pci 0000:04:00.0: reg 10: [mem 0xd0000000-0xdfffffff 64bit pref]

The IOH aperture is obviously not big enough to cover the Radeon BAR.
But the host bridge _CRS tells us this:

pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff]
pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfed8ffff]

which IS big enough, and we know the bridge is in fact forwarding the
[mem 0xd0000000-0xdfffffff 64bit pref] region, because the Radeon works
when Jeff boots with "pci=use_crs".

I'm quite concerned about this for .33 because I don't think Jeff's
configuration (Dell desktop with Intel x58 and large graphics device)
is unusual.

The benefit of intel_bus.c is on machines with multiple IOHs, where we
need to figure out which address ranges go to which IOHs so we can
program downstream devices correctly. But even there, _CRS should give
us the information we need, so "pci=use_crs" should make these machines
work.

I think we should remove intel_bus.c before .33. It's breaking boxes
and we don't know how to fix it. Even if we do find out how to fix it,
I think we should move toward using _CRS instead, because that's what
Windows uses and it's an easy way for the firmware to tell us about
platform quirks.

Bjorn

2010-01-26 18:01:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tuesday 26 January 2010, Bjorn Helgaas wrote:
> On Tuesday 26 January 2010 05:48:59 am Rafael J. Wysocki wrote:
> > On Tuesday 26 January 2010, Jeff Garrett wrote:
> > > On Sun, Jan 24, 2010 at 11:04:38PM +0100, Rafael J. Wysocki wrote:
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.32. Please verify if it still should be listed and let me know
> > > > (either way).
> > > >
> > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15124
> > > > Subject : PCI host bridge windows ignored (works with pci=use_crs)
> > > > Submitter : Jeff Garrett <[email protected]>
> > > > Date : 2010-01-13 5:37 (12 days old)
> > > > References : http://marc.info/?l=linux-kernel&m=126336296600307&w=4
> > > > Handled-By : Yinghai Lu <[email protected]>
> > > > Bjorn Helgaas <[email protected]>
> > >
> > > This regression should still be listed. No patch to test yet.
> > ...
> > IIRC, we already know how to fix this ...
>
> As far as I know, we do NOT know how to fix this.
>
> This regression occurred when we added intel_bus.c because it's not
> yet smart enough to determine the correct host bridge apertures.
> Here's what it thinks the bridge aperture is and the Radeon BAR:
>
> IOH bus: 00 index 1 mmio: [e0000000, fdffffff]
> pci 0000:04:00.0: reg 10: [mem 0xd0000000-0xdfffffff 64bit pref]
>
> The IOH aperture is obviously not big enough to cover the Radeon BAR.
> But the host bridge _CRS tells us this:
>
> pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff]
> pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfed8ffff]
>
> which IS big enough, and we know the bridge is in fact forwarding the
> [mem 0xd0000000-0xdfffffff 64bit pref] region, because the Radeon works
> when Jeff boots with "pci=use_crs".
>
> I'm quite concerned about this for .33 because I don't think Jeff's
> configuration (Dell desktop with Intel x58 and large graphics device)
> is unusual.
>
> The benefit of intel_bus.c is on machines with multiple IOHs, where we
> need to figure out which address ranges go to which IOHs so we can
> program downstream devices correctly. But even there, _CRS should give
> us the information we need, so "pci=use_crs" should make these machines
> work.
>
> I think we should remove intel_bus.c before .33. It's breaking boxes
> and we don't know how to fix it. Even if we do find out how to fix it,
> I think we should move toward using _CRS instead, because that's what
> Windows uses and it's an easy way for the firmware to tell us about
> platform quirks.

Perhaps it would be sufficient to make pci=use_crs the default and leave the
option to use intel_bus.c for whoever needs that?

Rafael

2010-01-26 18:17:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)



On Tue, 26 Jan 2010, Bjorn Helgaas wrote:
>
> which IS big enough, and we know the bridge is in fact forwarding the
> [mem 0xd0000000-0xdfffffff 64bit pref] region, because the Radeon works
> when Jeff boots with "pci=use_crs".

I bet it's a subtractive decode thing. Sure, it could be just another
undocumented range register (does anybody have the datasheet for that
thing?) but Intel tends to often have subtractive decode.

That system in question has three PCI express root ports, but two of them
have IO and memory disabled according to the lspci info. So maybe it's as
simple as that "I/O Hub PCI Express Root Port 7" just catching anything
that nobody else does, and the single IOH host chip doing the same?

> I think we should remove intel_bus.c before .33. It's breaking boxes
> and we don't know how to fix it. Even if we do find out how to fix it,
> I think we should move toward using _CRS instead, because that's what
> Windows uses and it's an easy way for the firmware to tell us about
> platform quirks.

I suspect that for 33 it is indeed best to just revert. But somebody is
bound to have information on how the actual hardware works. Yinghai?

Linus

2010-01-26 18:18:11

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tue, 26 Jan 2010 19:02:13 +0100
"Rafael J. Wysocki" <[email protected]> wrote:
> > I'm quite concerned about this for .33 because I don't think Jeff's
> > configuration (Dell desktop with Intel x58 and large graphics device)
> > is unusual.
> >
> > The benefit of intel_bus.c is on machines with multiple IOHs, where we
> > need to figure out which address ranges go to which IOHs so we can
> > program downstream devices correctly. But even there, _CRS should give
> > us the information we need, so "pci=use_crs" should make these machines
> > work.
> >
> > I think we should remove intel_bus.c before .33. It's breaking boxes
> > and we don't know how to fix it. Even if we do find out how to fix it,
> > I think we should move toward using _CRS instead, because that's what
> > Windows uses and it's an easy way for the firmware to tell us about
> > platform quirks.
>
> Perhaps it would be sufficient to make pci=use_crs the default and leave the
> option to use intel_bus.c for whoever needs that?

We can't make use_crs the default w/o some more _CRS handling fixes
(some firmwares have large lists we need to handle).

We can disable intel_bus.c though. Yinghai, I'm inclined against the
intel_bus.c approach at this point. It seems unlikely we'll ever keep
it up to date with new bridges, since its approach differs so much from
how things are done in the Windows world, where the firmware provides
a list of resources. We'll always be playing catch up, and will
probably be behind the firmware most of the time since the docs with
the necessary info likely won't be public most of the time.

For 2.6.33 I'd like a minimal fix though, can you disable it for all
but the multi-IOH case perhaps?

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-26 18:23:14

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/26/2010 10:16 AM, Linus Torvalds wrote:
>
>
> On Tue, 26 Jan 2010, Bjorn Helgaas wrote:
>>
>> which IS big enough, and we know the bridge is in fact forwarding the
>> [mem 0xd0000000-0xdfffffff 64bit pref] region, because the Radeon works
>> when Jeff boots with "pci=use_crs".
>
> I bet it's a subtractive decode thing. Sure, it could be just another
> undocumented range register (does anybody have the datasheet for that
> thing?) but Intel tends to often have subtractive decode.
>
> That system in question has three PCI express root ports, but two of them
> have IO and memory disabled according to the lspci info. So maybe it's as
> simple as that "I/O Hub PCI Express Root Port 7" just catching anything
> that nobody else does, and the single IOH host chip doing the same?
>
>> I think we should remove intel_bus.c before .33. It's breaking boxes
>> and we don't know how to fix it. Even if we do find out how to fix it,
>> I think we should move toward using _CRS instead, because that's what
>> Windows uses and it's an easy way for the firmware to tell us about
>> platform quirks.
>
> I suspect that for 33 it is indeed best to just revert. But somebody is
> bound to have information on how the actual hardware works. Yinghai?

I have asked intel if there is any bit that could be enabled the routing.
there is no info about for their documentations.

Yinghai

2010-01-26 18:23:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)



On Tue, 26 Jan 2010, Rafael J. Wysocki wrote:
>
> Perhaps it would be sufficient to make pci=use_crs the default and leave the
> option to use intel_bus.c for whoever needs that?

Well, 'use_crs' broke other machines. See:

http://lkml.org/lkml/2009/6/23/715

but maybe that is all fixed..

Linus

2010-01-26 18:23:28

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/26/2010 10:17 AM, Jesse Barnes wrote:
> On Tue, 26 Jan 2010 19:02:13 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>>> I'm quite concerned about this for .33 because I don't think Jeff's
>>> configuration (Dell desktop with Intel x58 and large graphics device)
>>> is unusual.
>>>
>>> The benefit of intel_bus.c is on machines with multiple IOHs, where we
>>> need to figure out which address ranges go to which IOHs so we can
>>> program downstream devices correctly. But even there, _CRS should give
>>> us the information we need, so "pci=use_crs" should make these machines
>>> work.
>>>
>>> I think we should remove intel_bus.c before .33. It's breaking boxes
>>> and we don't know how to fix it. Even if we do find out how to fix it,
>>> I think we should move toward using _CRS instead, because that's what
>>> Windows uses and it's an easy way for the firmware to tell us about
>>> platform quirks.
>>
>> Perhaps it would be sufficient to make pci=use_crs the default and leave the
>> option to use intel_bus.c for whoever needs that?
>
> We can't make use_crs the default w/o some more _CRS handling fixes
> (some firmwares have large lists we need to handle).
>
> We can disable intel_bus.c though. Yinghai, I'm inclined against the
> intel_bus.c approach at this point. It seems unlikely we'll ever keep
> it up to date with new bridges, since its approach differs so much from
> how things are done in the Windows world, where the firmware provides
> a list of resources. We'll always be playing catch up, and will
> probably be behind the firmware most of the time since the docs with
> the necessary info likely won't be public most of the time.
>
> For 2.6.33 I'd like a minimal fix though, can you disable it for all
> but the multi-IOH case perhaps?

ok, we have one patch to enable that only with multi-IOH case.

YH

2010-01-26 18:35:27

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tue, 26 Jan 2010 10:21:29 -0800
Yinghai Lu <[email protected]> wrote:

> On 01/26/2010 10:16 AM, Linus Torvalds wrote:
> >
> >
> > On Tue, 26 Jan 2010, Bjorn Helgaas wrote:
> >>
> >> which IS big enough, and we know the bridge is in fact forwarding the
> >> [mem 0xd0000000-0xdfffffff 64bit pref] region, because the Radeon works
> >> when Jeff boots with "pci=use_crs".
> >
> > I bet it's a subtractive decode thing. Sure, it could be just another
> > undocumented range register (does anybody have the datasheet for that
> > thing?) but Intel tends to often have subtractive decode.
> >
> > That system in question has three PCI express root ports, but two of them
> > have IO and memory disabled according to the lspci info. So maybe it's as
> > simple as that "I/O Hub PCI Express Root Port 7" just catching anything
> > that nobody else does, and the single IOH host chip doing the same?
> >
> >> I think we should remove intel_bus.c before .33. It's breaking boxes
> >> and we don't know how to fix it. Even if we do find out how to fix it,
> >> I think we should move toward using _CRS instead, because that's what
> >> Windows uses and it's an easy way for the firmware to tell us about
> >> platform quirks.
> >
> > I suspect that for 33 it is indeed best to just revert. But somebody is
> > bound to have information on how the actual hardware works. Yinghai?
>
> I have asked intel if there is any bit that could be enabled the routing.
> there is no info about for their documentations.

I could probably dig something up in our confidential database, but this
is the main problem with intel_bus.c. It'll always be behind with _CRS
provides. Sure _CRS may be wrong sometimes, but it'll always work well
enough to bring Windows up, so we ought not to ignore it.

The underlying problems with our _CRS support still aren't fixed
though, so switching that on for 2.6.33 isn't an option.

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-26 22:58:43

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/26/2010 10:17 AM, Jesse Barnes wrote:

>
> For 2.6.33 I'd like a minimal fix though, can you disable it for all
> but the multi-IOH case perhaps?
>
please check,

[PATCH] x86/pci: don't use ioh resource if only have one ioh

some system could use reosurce out of IOH resources when only one ioh is there.

could be BIOS have wrong IOH resources and not enable them.

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/pci/intel_bus.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)

Index: linux-2.6/arch/x86/pci/intel_bus.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/intel_bus.c
+++ linux-2.6/arch/x86/pci/intel_bus.c
@@ -7,9 +7,11 @@
#include <linux/pci.h>
#include <linux/init.h>
#include <asm/pci_x86.h>
+#include <asm/pci-direct.h>

#include "bus_numa.h"

+static int nr_ioh;
static inline void print_ioh_resources(struct pci_root_info *info)
{
int res_num;
@@ -49,6 +51,9 @@ static void __devinit pci_root_bus_res(s
u64 mmioh_base, mmioh_end;
int bus_base, bus_end;

+ if (nr_ioh < 2)
+ return;
+
/* some sys doesn't get mmconf enabled */
if (dev->cfg_size < 0x120)
return;
@@ -92,3 +97,84 @@ static void __devinit pci_root_bus_res(s

/* intel IOH */
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, pci_root_bus_res);
+
+static void __init count_ioh(int num, int slot, int func)
+{
+ nr_ioh++;
+}
+
+struct pci_check_probe {
+ u32 vendor;
+ u32 device;
+ void (*f)(int num, int slot, int func);
+};
+
+static struct pci_check_probe early_qrk[] __initdata = {
+ { PCI_VENDOR_ID_INTEL, 0x342e, count_ioh },
+ {}
+};
+
+static void __init early_check_pci_dev(int num, int slot, int func)
+{
+ u16 vendor;
+ u16 device;
+ int i;
+
+ vendor = read_pci_config_16(num, slot, func, PCI_VENDOR_ID);
+ device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
+
+ for (i = 0; early_qrk[i].f != NULL; i++) {
+ if (((early_qrk[i].vendor == PCI_ANY_ID) ||
+ (early_qrk[i].vendor == vendor)) &&
+ ((early_qrk[i].device == PCI_ANY_ID) ||
+ (early_qrk[i].device == device)))
+ early_qrk[i].f(num, slot, func);
+ }
+}
+
+static void __init early_check_pci_devs(void)
+{
+ unsigned bus, slot, func;
+
+ if (!early_pci_allowed())
+ return;
+
+ for (bus = 0; bus < 256; bus++) {
+ for (slot = 0; slot < 32; slot++) {
+ for (func = 0; func < 8; func++) {
+ u32 class;
+ u8 type;
+
+ class = read_pci_config(bus, slot, func,
+ PCI_CLASS_REVISION);
+ if (class == 0xffffffff)
+ continue;
+
+ early_check_pci_dev(bus, slot, func);
+
+ if (func == 0) {
+ type = read_pci_config_byte(bus, slot,
+ func,
+ PCI_HEADER_TYPE);
+ if (!(type & 0x80))
+ break;
+ }
+ }
+ }
+ }
+}
+
+static int __init intel_postcore_init(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return 0;
+
+ early_check_pci_devs();
+
+ if (nr_ioh)
+ printk(KERN_DEBUG "pci: found %d IOH\n", nr_ioh);
+
+ return 0;
+}
+postcore_initcall(intel_postcore_init);
+

2010-01-27 16:45:24

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tuesday 26 January 2010 03:57:31 pm Yinghai Lu wrote:
> [PATCH] x86/pci: don't use ioh resource if only have one ioh
>
> some system could use reosurce out of IOH resources when only one ioh is there.
>
> could be BIOS have wrong IOH resources and not enable them.

The subtractive decode theory makes sense and would explain what's
happening, but I don't like this patch.

If we assume that this really is a subtractive decode issue, this
patch approaches it the wrong way. We need to know whether a
particular host bridge is configured for subtractive decode. This
patch tests whether we have more than one host bridge, which is quite
a different question.

Imagine these system configurations:

1) a single host bridge with subtractive decode
2) a single host bridge with only positive decode
3) multiple host bridges with subtractive decode enabled on one
4) multiple host bridges with only positive decode

This patch will break if we encounter configs 2 or 3. In config 2,
this patch assumes the bridge performs subtractive decode, so we
think the bridge forwards more address space than it actually does.
If we try to use that address space, the device will never see the
accesses. In config 3, this patch assumes there's no subtractive
decode, so we would see Jeff's problem all over again.

For configs 3 and 4, there might be a single host bridge in domain 0,
with the others in different domains. This patch would find only one
host bridge (the one in domain 0), so we would wrongly assume that ALL
the host bridges use subtractive decode, which is obviously a disaster.

Bjorn

> ---
> arch/x86/pci/intel_bus.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 86 insertions(+)
>
> Index: linux-2.6/arch/x86/pci/intel_bus.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/intel_bus.c
> +++ linux-2.6/arch/x86/pci/intel_bus.c
> @@ -7,9 +7,11 @@
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <asm/pci_x86.h>
> +#include <asm/pci-direct.h>
>
> #include "bus_numa.h"
>
> +static int nr_ioh;
> static inline void print_ioh_resources(struct pci_root_info *info)
> {
> int res_num;
> @@ -49,6 +51,9 @@ static void __devinit pci_root_bus_res(s
> u64 mmioh_base, mmioh_end;
> int bus_base, bus_end;
>
> + if (nr_ioh < 2)
> + return;
> +
> /* some sys doesn't get mmconf enabled */
> if (dev->cfg_size < 0x120)
> return;
> @@ -92,3 +97,84 @@ static void __devinit pci_root_bus_res(s
>
> /* intel IOH */
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, pci_root_bus_res);
> +
> +static void __init count_ioh(int num, int slot, int func)
> +{
> + nr_ioh++;
> +}
> +
> +struct pci_check_probe {
> + u32 vendor;
> + u32 device;
> + void (*f)(int num, int slot, int func);
> +};
> +
> +static struct pci_check_probe early_qrk[] __initdata = {
> + { PCI_VENDOR_ID_INTEL, 0x342e, count_ioh },
> + {}
> +};
> +
> +static void __init early_check_pci_dev(int num, int slot, int func)
> +{
> + u16 vendor;
> + u16 device;
> + int i;
> +
> + vendor = read_pci_config_16(num, slot, func, PCI_VENDOR_ID);
> + device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
> +
> + for (i = 0; early_qrk[i].f != NULL; i++) {
> + if (((early_qrk[i].vendor == PCI_ANY_ID) ||
> + (early_qrk[i].vendor == vendor)) &&
> + ((early_qrk[i].device == PCI_ANY_ID) ||
> + (early_qrk[i].device == device)))
> + early_qrk[i].f(num, slot, func);
> + }
> +}
> +
> +static void __init early_check_pci_devs(void)
> +{
> + unsigned bus, slot, func;
> +
> + if (!early_pci_allowed())
> + return;
> +
> + for (bus = 0; bus < 256; bus++) {
> + for (slot = 0; slot < 32; slot++) {
> + for (func = 0; func < 8; func++) {
> + u32 class;
> + u8 type;
> +
> + class = read_pci_config(bus, slot, func,
> + PCI_CLASS_REVISION);
> + if (class == 0xffffffff)
> + continue;
> +
> + early_check_pci_dev(bus, slot, func);
> +
> + if (func == 0) {
> + type = read_pci_config_byte(bus, slot,
> + func,
> + PCI_HEADER_TYPE);
> + if (!(type & 0x80))
> + break;
> + }
> + }
> + }
> + }
> +}
> +
> +static int __init intel_postcore_init(void)
> +{
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> + return 0;
> +
> + early_check_pci_devs();
> +
> + if (nr_ioh)
> + printk(KERN_DEBUG "pci: found %d IOH\n", nr_ioh);
> +
> + return 0;
> +}
> +postcore_initcall(intel_postcore_init);
> +
>

2010-01-27 16:53:58

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, 27 Jan 2010 09:45:15 -0700
Bjorn Helgaas <[email protected]> wrote:

> On Tuesday 26 January 2010 03:57:31 pm Yinghai Lu wrote:
> > [PATCH] x86/pci: don't use ioh resource if only have one ioh
> >
> > some system could use reosurce out of IOH resources when only one ioh is there.
> >
> > could be BIOS have wrong IOH resources and not enable them.
>
> The subtractive decode theory makes sense and would explain what's
> happening, but I don't like this patch.
>
> If we assume that this really is a subtractive decode issue, this
> patch approaches it the wrong way. We need to know whether a
> particular host bridge is configured for subtractive decode. This
> patch tests whether we have more than one host bridge, which is quite
> a different question.
>
> Imagine these system configurations:
>
> 1) a single host bridge with subtractive decode
> 2) a single host bridge with only positive decode
> 3) multiple host bridges with subtractive decode enabled on one
> 4) multiple host bridges with only positive decode
>
> This patch will break if we encounter configs 2 or 3. In config 2,
> this patch assumes the bridge performs subtractive decode, so we
> think the bridge forwards more address space than it actually does.
> If we try to use that address space, the device will never see the
> accesses. In config 3, this patch assumes there's no subtractive
> decode, so we would see Jeff's problem all over again.

Right, but OTOH:
- multiple IOH has already been tested with the intel_bus.c code
- we want to move to using _CRS data in these cases instead

So do you have any objection to applying this patch for 2.6.33 and then
moving away from intel_bus.c in .34 (assuming we can get _CRS working
well on the same machines where intel_bus.c was needed)?

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-27 17:57:52

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15043] Display goes off with i915.powersave=1

On Sun, 24 Jan 2010 23:04:37 +0100 (CET)
"Rafael J. Wysocki" <[email protected]> wrote:

> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043
> Subject : Display goes off with i915.powersave=1
> Submitter : Soeren Sonnenburg <[email protected]>
> Date : 2010-01-10 20:09 (15 days old)
> References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4

If this isn't fixed yet, I hope David's patch fixes it. See "[Bug
#14897] i915: Commit 0e442c60 causes flickering".

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-27 17:58:50

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15129] [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512

On Sun, 24 Jan 2010 23:04:39 +0100 (CET)
"Rafael J. Wysocki" <[email protected]> wrote:

> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15129
> Subject : [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512
> Submitter : Miles Lane <[email protected]>
> Date : 2010-01-14 23:18 (11 days old)
> References : http://lkml.org/lkml/2010/1/14/570
> Handled-By : Chris Wilson <[email protected]>
>

I think this message has been rightly killed. Getting -ERESTARTSYS from
this function is perfectly normal, so we shouldn't bother printing a
message about it.

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-27 20:46:07

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wednesday 27 January 2010 09:53:37 am Jesse Barnes wrote:
> On Wed, 27 Jan 2010 09:45:15 -0700
> Bjorn Helgaas <[email protected]> wrote:
>
> > On Tuesday 26 January 2010 03:57:31 pm Yinghai Lu wrote:
> > > [PATCH] x86/pci: don't use ioh resource if only have one ioh
> > >
> > > some system could use reosurce out of IOH resources when only one ioh is there.
> > >
> > > could be BIOS have wrong IOH resources and not enable them.
> >
> > The subtractive decode theory makes sense and would explain what's
> > happening, but I don't like this patch.
> >
> > If we assume that this really is a subtractive decode issue, this
> > patch approaches it the wrong way. We need to know whether a
> > particular host bridge is configured for subtractive decode. This
> > patch tests whether we have more than one host bridge, which is quite
> > a different question.
> >
> > Imagine these system configurations:
> >
> > 1) a single host bridge with subtractive decode
> > 2) a single host bridge with only positive decode
> > 3) multiple host bridges with subtractive decode enabled on one
> > 4) multiple host bridges with only positive decode
> >
> > This patch will break if we encounter configs 2 or 3. In config 2,
> > this patch assumes the bridge performs subtractive decode, so we
> > think the bridge forwards more address space than it actually does.
> > If we try to use that address space, the device will never see the
> > accesses. In config 3, this patch assumes there's no subtractive
> > decode, so we would see Jeff's problem all over again.
>
> Right, but OTOH:
> - multiple IOH has already been tested with the intel_bus.c code
> - we want to move to using _CRS data in these cases instead

> So do you have any objection to applying this patch for 2.6.33 and then
> moving away from intel_bus.c in .34 (assuming we can get _CRS working
> well on the same machines where intel_bus.c was needed)?

Without intel_bus.c, we essentially assume config 1 all the time.
If we keep intel_bus.c and this patch for .33, things should work
for configs 1 and 4. Adding support for config 4 is good.

The bad part is that for config 4, intel_bus.c covers up any defects
in the _CRS or the Linux code that interprets it. The reason Yinghai
added intel_bus.c in the first place was to work around a defect in
this area[1]. Keeping it will make it harder to fix the underlying
issue that keeps us from turning on _CRS for that box.

Bjorn

[1] http://lkml.org/lkml/2009/10/6/371

2010-01-27 20:51:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)



On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
>
> Without intel_bus.c, we essentially assume config 1 all the time.
> If we keep intel_bus.c and this patch for .33, things should work
> for configs 1 and 4. Adding support for config 4 is good.

Quite frankly, is there any major downside to just disabling/removing
intel_bus.c for 2.6.33? If we're not planning on having it in the long run
anyway - or even if we are, but we can't be really happy about the state
of it as it would be in 2.6.33, not using it at all seems to be the
smaller headache.

The machines that it helps are also the machines where you can fix things
up with 'use_csr', no? And they are pretty rare, and they didn't use to
work without that use_csr in 2.6.32 either, so it's not even a regression.

Am I missing something?

Linus

2010-01-27 20:59:27

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, 27 Jan 2010 12:50:12 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
> >
> > Without intel_bus.c, we essentially assume config 1 all the time.
> > If we keep intel_bus.c and this patch for .33, things should work
> > for configs 1 and 4. Adding support for config 4 is good.
>
> Quite frankly, is there any major downside to just disabling/removing
> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
> anyway - or even if we are, but we can't be really happy about the state
> of it as it would be in 2.6.33, not using it at all seems to be the
> smaller headache.
>
> The machines that it helps are also the machines where you can fix things
> up with 'use_csr', no? And they are pretty rare, and they didn't use to
> work without that use_csr in 2.6.32 either, so it's not even a regression.
>
> Am I missing something?

No that's the plan. intel_bus.c was a good effort, but it's just too
different from what Windows does, and it'll always be behind. We'll
disable it for 2.6.33 and try again to move to _CRS in 2.6.34 (but
fixing the problem with large numbers of _CRS resources this time).

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-27 21:02:34

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, 27 Jan 2010 12:59:05 -0800
Jesse Barnes <[email protected]> wrote:

> On Wed, 27 Jan 2010 12:50:12 -0800 (PST)
> Linus Torvalds <[email protected]> wrote:
>
> >
> >
> > On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
> > >
> > > Without intel_bus.c, we essentially assume config 1 all the time.
> > > If we keep intel_bus.c and this patch for .33, things should work
> > > for configs 1 and 4. Adding support for config 4 is good.
> >
> > Quite frankly, is there any major downside to just disabling/removing
> > intel_bus.c for 2.6.33? If we're not planning on having it in the long run
> > anyway - or even if we are, but we can't be really happy about the state
> > of it as it would be in 2.6.33, not using it at all seems to be the
> > smaller headache.
> >
> > The machines that it helps are also the machines where you can fix things
> > up with 'use_csr', no? And they are pretty rare, and they didn't use to
> > work without that use_csr in 2.6.32 either, so it's not even a regression.
> >
> > Am I missing something?
>
> No that's the plan. intel_bus.c was a good effort, but it's just too
> different from what Windows does, and it'll always be behind. We'll
> disable it for 2.6.33 and try again to move to _CRS in 2.6.34 (but
> fixing the problem with large numbers of _CRS resources this time).

Should say "disable it for 2.6.33 for all but multi-IOH configs", which
seem to be fairly rare anyway, and were what intel_bus.c was designed
to accommodate. On the one machine that motivated it, use_crs was
broken (though it likely isn't now), so it seems the safest route.

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-27 21:03:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15129] [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512

On Wednesday 27 January 2010, Jesse Barnes wrote:
> On Sun, 24 Jan 2010 23:04:39 +0100 (CET)
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15129
> > Subject : [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -512
> > Submitter : Miles Lane <[email protected]>
> > Date : 2010-01-14 23:18 (11 days old)
> > References : http://lkml.org/lkml/2010/1/14/570
> > Handled-By : Chris Wilson <[email protected]>
> >
>
> I think this message has been rightly killed. Getting -ERESTARTSYS from
> this function is perfectly normal, so we shouldn't bother printing a
> message about it.

OK, closing this one.

Rafael

2010-01-27 21:03:49

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wednesday 27 January 2010 01:50:12 pm Linus Torvalds wrote:
>
> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
> >
> > Without intel_bus.c, we essentially assume config 1 all the time.
> > If we keep intel_bus.c and this patch for .33, things should work
> > for configs 1 and 4. Adding support for config 4 is good.
>
> Quite frankly, is there any major downside to just disabling/removing
> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
> anyway - or even if we are, but we can't be really happy about the state
> of it as it would be in 2.6.33, not using it at all seems to be the
> smaller headache.
>
> The machines that it helps are also the machines where you can fix things
> up with 'use_csr', no? And they are pretty rare, and they didn't use to
> work without that use_csr in 2.6.32 either, so it's not even a regression.
>
> Am I missing something?

Only that when we added intel_bus.c, Yinghai reported that the reason
was because a machine had a broken _CRS, so "pci=use_crs" wouldn't help.

At the time, Windows hadn't been brought up on that box. My
speculation is that by now, they've done that bringup and probably
fixed the _CRS issue, so it might work now.

If that's the case, we could drop intel_bus.c from .33 and just use
"pci=use_crs" on those boxes until we can figure out how to turn it
on automatically.

Bjorn

2010-01-27 23:36:19

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/27/2010 01:03 PM, Bjorn Helgaas wrote:
> On Wednesday 27 January 2010 01:50:12 pm Linus Torvalds wrote:
>>
>> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
>>>
>>> Without intel_bus.c, we essentially assume config 1 all the time.
>>> If we keep intel_bus.c and this patch for .33, things should work
>>> for configs 1 and 4. Adding support for config 4 is good.
>>
>> Quite frankly, is there any major downside to just disabling/removing
>> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
>> anyway - or even if we are, but we can't be really happy about the state
>> of it as it would be in 2.6.33, not using it at all seems to be the
>> smaller headache.
>>
>> The machines that it helps are also the machines where you can fix things
>> up with 'use_csr', no? And they are pretty rare, and they didn't use to
>> work without that use_csr in 2.6.32 either, so it's not even a regression.
>>
>> Am I missing something?
>
> Only that when we added intel_bus.c, Yinghai reported that the reason
> was because a machine had a broken _CRS, so "pci=use_crs" wouldn't help.
>
> At the time, Windows hadn't been brought up on that box. My
> speculation is that by now, they've done that bringup and probably
> fixed the _CRS issue, so it might work now.
>
> If that's the case, we could drop intel_bus.c from .33 and just use
> "pci=use_crs" on those boxes until we can figure out how to turn it
> on automatically.

BIOS fixed that problem already. but
1. how to turn that pci=use_crs for that box automatically ?
how about our other kind of boxes?
2. how about when apci is disabled?

let's apply that patch at first, and wait for intel give us info about which bit is used to enable routing set up.

YH

2010-01-28 01:34:43

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/27/2010 01:02 PM, Jesse Barnes wrote:
> On Wed, 27 Jan 2010 12:59:05 -0800
> Jesse Barnes <[email protected]> wrote:
>
>> On Wed, 27 Jan 2010 12:50:12 -0800 (PST)
>> Linus Torvalds <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
>>>>
>>>> Without intel_bus.c, we essentially assume config 1 all the time.
>>>> If we keep intel_bus.c and this patch for .33, things should work
>>>> for configs 1 and 4. Adding support for config 4 is good.
>>>
>>> Quite frankly, is there any major downside to just disabling/removing
>>> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
>>> anyway - or even if we are, but we can't be really happy about the state
>>> of it as it would be in 2.6.33, not using it at all seems to be the
>>> smaller headache.
>>>
>>> The machines that it helps are also the machines where you can fix things
>>> up with 'use_csr', no? And they are pretty rare, and they didn't use to
>>> work without that use_csr in 2.6.32 either, so it's not even a regression.
>>>
>>> Am I missing something?
>>
>> No that's the plan. intel_bus.c was a good effort, but it's just too
>> different from what Windows does, and it'll always be behind. We'll
>> disable it for 2.6.33 and try again to move to _CRS in 2.6.34 (but
>> fixing the problem with large numbers of _CRS resources this time).
>
> Should say "disable it for 2.6.33 for all but multi-IOH configs", which
> seem to be fairly rare anyway, and were what intel_bus.c was designed
> to accommodate. On the one machine that motivated it, use_crs was
> broken (though it likely isn't now), so it seems the safest route.

will try to produce one patch to handle subtract decoding for legacy IOH aka the one with ESI.

the structure could be something like amd_bus.c, need to do it early, but it need after pci_arch_init to get mmconf.

YH

2010-01-28 01:36:14

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Tue, 26 Jan 2010 14:57:31 -0800
Yinghai Lu <[email protected]> wrote:

> On 01/26/2010 10:17 AM, Jesse Barnes wrote:
>
> >
> > For 2.6.33 I'd like a minimal fix though, can you disable it for all
> > but the multi-IOH case perhaps?
> >
> please check,
>
> [PATCH] x86/pci: don't use ioh resource if only have one ioh
>
> some system could use reosurce out of IOH resources when only one ioh is there.
>
> could be BIOS have wrong IOH resources and not enable them.
>
> Signed-off-by: Yinghai Lu <[email protected]>

I applied this one to my for-linus branch. Jeff can you confirm it
works for you? I'd like to push it to Linus tomorrow.

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2010-01-28 01:51:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)



On Tue, 26 Jan 2010, Yinghai Lu wrote:
>
> [PATCH] x86/pci: don't use ioh resource if only have one ioh

Please, no.

This patch is too ugly to live.

And it's totally unacceptable to probe every single possible PCI device
for something like this.

If we don't know enough about the hardware workings of those Intel bridges
to know when they are active and how they decode things, then please let's
just disable intel_bus.c entirely.

There's no excuse for hacky tests like this.

Linus

2010-01-28 03:24:32

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, 27 Jan 2010 17:50:17 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 26 Jan 2010, Yinghai Lu wrote:
> >
> > [PATCH] x86/pci: don't use ioh resource if only have one ioh
>
> Please, no.
>
> This patch is too ugly to live.
>
> And it's totally unacceptable to probe every single possible PCI device
> for something like this.
>
> If we don't know enough about the hardware workings of those Intel bridges
> to know when they are active and how they decode things, then please let's
> just disable intel_bus.c entirely.
>
> There's no excuse for hacky tests like this.

Ok, we'll just kill it entirely then. I'll send a patch tomorrow
unless Yinghai beats me to it.

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-28 04:03:17

by Jeff Garrett

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, Jan 27, 2010 at 05:35:50PM -0800, Jesse Barnes wrote:
> On Tue, 26 Jan 2010 14:57:31 -0800
> Yinghai Lu <[email protected]> wrote:
>
> > On 01/26/2010 10:17 AM, Jesse Barnes wrote:
> >
> > >
> > > For 2.6.33 I'd like a minimal fix though, can you disable it for all
> > > but the multi-IOH case perhaps?
> > >
> > please check,
> >
> > [PATCH] x86/pci: don't use ioh resource if only have one ioh
> >
> > some system could use reosurce out of IOH resources when only one ioh is there.
> >
> > could be BIOS have wrong IOH resources and not enable them.
> >
> > Signed-off-by: Yinghai Lu <[email protected]>
>
> I applied this one to my for-linus branch. Jeff can you confirm it
> works for you? I'd like to push it to Linus tomorrow.
>
> Thanks,

FWIW, works...

2010-01-28 04:03:32

by Jeff Garrett

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, Jan 27, 2010 at 07:24:09PM -0800, Jesse Barnes wrote:
> On Wed, 27 Jan 2010 17:50:17 -0800 (PST)
> Linus Torvalds <[email protected]> wrote:
> > On Tue, 26 Jan 2010, Yinghai Lu wrote:
> > >
> > > [PATCH] x86/pci: don't use ioh resource if only have one ioh
> >
> > Please, no.
> >
> > This patch is too ugly to live.
> >
> > And it's totally unacceptable to probe every single possible PCI device
> > for something like this.
> >
> > If we don't know enough about the hardware workings of those Intel bridges
> > to know when they are active and how they decode things, then please let's
> > just disable intel_bus.c entirely.
> >
> > There's no excuse for hacky tests like this.
>
> Ok, we'll just kill it entirely then. I'll send a patch tomorrow
> unless Yinghai beats me to it.

What about something like this (works for me, without pci=use_crs)?

---
Remove intel_bus.c Intel-specific PCI/IOH logic

Signed-off-by: Jeff Garrett <[email protected]>
---
arch/x86/pci/Makefile | 2 +-
arch/x86/pci/intel_bus.c | 94 ----------------------------------------------
2 files changed, 1 insertions(+), 95 deletions(-)

diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
index 564b008..39fba37 100644
--- a/arch/x86/pci/Makefile
+++ b/arch/x86/pci/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_X86_NUMAQ) += numaq_32.o

obj-y += common.o early.o
obj-y += amd_bus.o
-obj-$(CONFIG_X86_64) += bus_numa.o intel_bus.o
+obj-$(CONFIG_X86_64) += bus_numa.o

ifeq ($(CONFIG_PCI_DEBUG),y)
EXTRA_CFLAGS += -DDEBUG
diff --git a/arch/x86/pci/intel_bus.c b/arch/x86/pci/intel_bus.c
deleted file mode 100644
index f81a2fa..0000000
--- a/arch/x86/pci/intel_bus.c
+++ /dev/null
@@ -1,94 +0,0 @@
-/*
- * to read io range from IOH pci conf, need to do it after mmconfig is there
- */
-
-#include <linux/delay.h>
-#include <linux/dmi.h>
-#include <linux/pci.h>
-#include <linux/init.h>
-#include <asm/pci_x86.h>
-
-#include "bus_numa.h"
-
-static inline void print_ioh_resources(struct pci_root_info *info)
-{
- int res_num;
- int busnum;
- int i;
-
- printk(KERN_DEBUG "IOH bus: [%02x, %02x]\n",
- info->bus_min, info->bus_max);
- res_num = info->res_num;
- busnum = info->bus_min;
- for (i = 0; i < res_num; i++) {
- struct resource *res;
-
- res = &info->res[i];
- printk(KERN_DEBUG "IOH bus: %02x index %x %s: [%llx, %llx]\n",
- busnum, i,
- (res->flags & IORESOURCE_IO) ? "io port" :
- "mmio",
- res->start, res->end);
- }
-}
-
-#define IOH_LIO 0x108
-#define IOH_LMMIOL 0x10c
-#define IOH_LMMIOH 0x110
-#define IOH_LMMIOH_BASEU 0x114
-#define IOH_LMMIOH_LIMITU 0x118
-#define IOH_LCFGBUS 0x11c
-
-static void __devinit pci_root_bus_res(struct pci_dev *dev)
-{
- u16 word;
- u32 dword;
- struct pci_root_info *info;
- u16 io_base, io_end;
- u32 mmiol_base, mmiol_end;
- u64 mmioh_base, mmioh_end;
- int bus_base, bus_end;
-
- /* some sys doesn't get mmconf enabled */
- if (dev->cfg_size < 0x120)
- return;
-
- if (pci_root_num >= PCI_ROOT_NR) {
- printk(KERN_DEBUG "intel_bus.c: PCI_ROOT_NR is too small\n");
- return;
- }
-
- info = &pci_root_info[pci_root_num];
- pci_root_num++;
-
- pci_read_config_word(dev, IOH_LCFGBUS, &word);
- bus_base = (word & 0xff);
- bus_end = (word & 0xff00) >> 8;
- sprintf(info->name, "PCI Bus #%02x", bus_base);
- info->bus_min = bus_base;
- info->bus_max = bus_end;
-
- pci_read_config_word(dev, IOH_LIO, &word);
- io_base = (word & 0xf0) << (12 - 4);
- io_end = (word & 0xf000) | 0xfff;
- update_res(info, io_base, io_end, IORESOURCE_IO, 0);
-
- pci_read_config_dword(dev, IOH_LMMIOL, &dword);
- mmiol_base = (dword & 0xff00) << (24 - 8);
- mmiol_end = (dword & 0xff000000) | 0xffffff;
- update_res(info, mmiol_base, mmiol_end, IORESOURCE_MEM, 0);
-
- pci_read_config_dword(dev, IOH_LMMIOH, &dword);
- mmioh_base = ((u64)(dword & 0xfc00)) << (26 - 10);
- mmioh_end = ((u64)(dword & 0xfc000000) | 0x3ffffff);
- pci_read_config_dword(dev, IOH_LMMIOH_BASEU, &dword);
- mmioh_base |= ((u64)(dword & 0x7ffff)) << 32;
- pci_read_config_dword(dev, IOH_LMMIOH_LIMITU, &dword);
- mmioh_end |= ((u64)(dword & 0x7ffff)) << 32;
- update_res(info, mmioh_base, mmioh_end, IORESOURCE_MEM, 0);
-
- print_ioh_resources(info);
-}
-
-/* intel IOH */
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, pci_root_bus_res);

2010-01-28 04:32:00

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
> On 01/27/2010 01:03 PM, Bjorn Helgaas wrote:
> > On Wednesday 27 January 2010 01:50:12 pm Linus Torvalds wrote:
> >>
> >> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
> >>>
> >>> Without intel_bus.c, we essentially assume config 1 all the time.
> >>> If we keep intel_bus.c and this patch for .33, things should work
> >>> for configs 1 and 4. Adding support for config 4 is good.
> >>
> >> Quite frankly, is there any major downside to just disabling/removing
> >> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
> >> anyway - or even if we are, but we can't be really happy about the state
> >> of it as it would be in 2.6.33, not using it at all seems to be the
> >> smaller headache.
> >>
> >> The machines that it helps are also the machines where you can fix things
> >> up with 'use_csr', no? And they are pretty rare, and they didn't use to
> >> work without that use_csr in 2.6.32 either, so it's not even a regression.
> >>
> >> Am I missing something?
> >
> > Only that when we added intel_bus.c, Yinghai reported that the reason
> > was because a machine had a broken _CRS, so "pci=use_crs" wouldn't help.
> >
> > At the time, Windows hadn't been brought up on that box. My
> > speculation is that by now, they've done that bringup and probably
> > fixed the _CRS issue, so it might work now.
> >
> > If that's the case, we could drop intel_bus.c from .33 and just use
> > "pci=use_crs" on those boxes until we can figure out how to turn it
> > on automatically.
>
> BIOS fixed that problem already. but
> 1. how to turn that pci=use_crs for that box automatically ?
> how about our other kind of boxes?

Yes, we need a way to turn on "pci=use_crs" automatically. My first
thought is to turn it on for all BIOSes with dates of 2010 or later, and
in addition, have a whitelist of the pre-2010 machines that require it.

> 2. how about when apci is disabled?

When ACPI is disabled, I think we just have to accept that we lose some
functionality. I don't see the need for alternate ways to accomplish
everything that ACPI does. It's becoming less and less useful to
disable ACPI; I think it's only interesting as a debugging tool, and
even then it's a sledgehammer.

Bjorn

2010-01-28 05:05:07

by Soeren Sonnenburg

[permalink] [raw]
Subject: Re: [Bug #15043] Display goes off with i915.powersave=1

On Wed, 2010-01-27 at 09:57 -0800, Jesse Barnes wrote:
> On Sun, 24 Jan 2010 23:04:37 +0100 (CET)
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043
> > Subject : Display goes off with i915.powersave=1
> > Submitter : Soeren Sonnenburg <[email protected]>
> > Date : 2010-01-10 20:09 (15 days old)
> > References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4
>
> If this isn't fixed yet, I hope David's patch fixes it. See "[Bug
> #14897] i915: Commit 0e442c60 causes flickering".

that sounds indeed like it, because sometimes I observe a flickering
before darkness :) I will give the new kernel a try though I don't have
time to do it in the next few days :/

Soeren
--
For the one fact about the future of which we can be certain is that it
will be utterly fantastic. -- Arthur C. Clarke, 1962


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2010-01-28 05:55:24

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
>> On 01/27/2010 01:03 PM, Bjorn Helgaas wrote:
>>> On Wednesday 27 January 2010 01:50:12 pm Linus Torvalds wrote:
>>>>
>>>> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
>>>>>
>>>>> Without intel_bus.c, we essentially assume config 1 all the time.
>>>>> If we keep intel_bus.c and this patch for .33, things should work
>>>>> for configs 1 and 4. Adding support for config 4 is good.
>>>>
>>>> Quite frankly, is there any major downside to just disabling/removing
>>>> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
>>>> anyway - or even if we are, but we can't be really happy about the state
>>>> of it as it would be in 2.6.33, not using it at all seems to be the
>>>> smaller headache.
>>>>
>>>> The machines that it helps are also the machines where you can fix things
>>>> up with 'use_csr', no? And they are pretty rare, and they didn't use to
>>>> work without that use_csr in 2.6.32 either, so it's not even a regression.
>>>>
>>>> Am I missing something?
>>>
>>> Only that when we added intel_bus.c, Yinghai reported that the reason
>>> was because a machine had a broken _CRS, so "pci=use_crs" wouldn't help.
>>>
>>> At the time, Windows hadn't been brought up on that box. My
>>> speculation is that by now, they've done that bringup and probably
>>> fixed the _CRS issue, so it might work now.
>>>
>>> If that's the case, we could drop intel_bus.c from .33 and just use
>>> "pci=use_crs" on those boxes until we can figure out how to turn it
>>> on automatically.
>>
>> BIOS fixed that problem already. but
>> 1. how to turn that pci=use_crs for that box automatically ?
>> how about our other kind of boxes?
>
> Yes, we need a way to turn on "pci=use_crs" automatically. My first
> thought is to turn it on for all BIOSes with dates of 2010 or later, and
> in addition, have a whitelist of the pre-2010 machines that require it.
>
>> 2. how about when apci is disabled?
>
> When ACPI is disabled, I think we just have to accept that we lose some
> functionality. I don't see the need for alternate ways to accomplish
> everything that ACPI does. It's becoming less and less useful to
> disable ACPI; I think it's only interesting as a debugging tool, and
> even then it's a sledgehammer.

some systems when acpi is enabled could have interrupt storm.
and have to disable acpi.

YH

2010-01-28 10:44:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs )

On Thursday 28 January 2010, Yinghai Lu wrote:
> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> > On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
> >> On 01/27/2010 01:03 PM, Bjorn Helgaas wrote:
> >>> On Wednesday 27 January 2010 01:50:12 pm Linus Torvalds wrote:
> >>>>
> >>>> On Wed, 27 Jan 2010, Bjorn Helgaas wrote:
> >>>>>
> >>>>> Without intel_bus.c, we essentially assume config 1 all the time.
> >>>>> If we keep intel_bus.c and this patch for .33, things should work
> >>>>> for configs 1 and 4. Adding support for config 4 is good.
> >>>>
> >>>> Quite frankly, is there any major downside to just disabling/removing
> >>>> intel_bus.c for 2.6.33? If we're not planning on having it in the long run
> >>>> anyway - or even if we are, but we can't be really happy about the state
> >>>> of it as it would be in 2.6.33, not using it at all seems to be the
> >>>> smaller headache.
> >>>>
> >>>> The machines that it helps are also the machines where you can fix things
> >>>> up with 'use_csr', no? And they are pretty rare, and they didn't use to
> >>>> work without that use_csr in 2.6.32 either, so it's not even a regression.
> >>>>
> >>>> Am I missing something?
> >>>
> >>> Only that when we added intel_bus.c, Yinghai reported that the reason
> >>> was because a machine had a broken _CRS, so "pci=use_crs" wouldn't help.
> >>>
> >>> At the time, Windows hadn't been brought up on that box. My
> >>> speculation is that by now, they've done that bringup and probably
> >>> fixed the _CRS issue, so it might work now.
> >>>
> >>> If that's the case, we could drop intel_bus.c from .33 and just use
> >>> "pci=use_crs" on those boxes until we can figure out how to turn it
> >>> on automatically.
> >>
> >> BIOS fixed that problem already. but
> >> 1. how to turn that pci=use_crs for that box automatically ?
> >> how about our other kind of boxes?
> >
> > Yes, we need a way to turn on "pci=use_crs" automatically. My first
> > thought is to turn it on for all BIOSes with dates of 2010 or later, and
> > in addition, have a whitelist of the pre-2010 machines that require it.
> >
> >> 2. how about when apci is disabled?
> >
> > When ACPI is disabled, I think we just have to accept that we lose some
> > functionality. I don't see the need for alternate ways to accomplish
> > everything that ACPI does. It's becoming less and less useful to
> > disable ACPI; I think it's only interesting as a debugging tool, and
> > even then it's a sledgehammer.
>
> some systems when acpi is enabled could have interrupt storm.
> and have to disable acpi.

Blacklist them?

Rafael

2010-01-28 16:10:44

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs )

On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> > On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:

> >> 2. how about when apci is disabled?
> >
> > When ACPI is disabled, I think we just have to accept that we lose some
> > functionality. I don't see the need for alternate ways to accomplish
> > everything that ACPI does. It's becoming less and less useful to
> > disable ACPI; I think it's only interesting as a debugging tool, and
> > even then it's a sledgehammer.
>
> some systems when acpi is enabled could have interrupt storm.
> and have to disable acpi.

We should fix that problem rather than just covering it up by
disabling ACPI. Can you provide any details?

I think it's crazy to add code to work around Problem B that only
occurs because we disabled ACPI to work around Problem A. We should
just fix Problem A instead.

Bjorn

2010-01-28 16:24:58

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, 27 Jan 2010 22:02:26 -0600
[email protected] (Jeff Garrett) wrote:

> On Wed, Jan 27, 2010 at 07:24:09PM -0800, Jesse Barnes wrote:
> > On Wed, 27 Jan 2010 17:50:17 -0800 (PST)
> > Linus Torvalds <[email protected]> wrote:
> > > On Tue, 26 Jan 2010, Yinghai Lu wrote:
> > > >
> > > > [PATCH] x86/pci: don't use ioh resource if only have one ioh
> > >
> > > Please, no.
> > >
> > > This patch is too ugly to live.
> > >
> > > And it's totally unacceptable to probe every single possible PCI device
> > > for something like this.
> > >
> > > If we don't know enough about the hardware workings of those Intel bridges
> > > to know when they are active and how they decode things, then please let's
> > > just disable intel_bus.c entirely.
> > >
> > > There's no excuse for hacky tests like this.
> >
> > Ok, we'll just kill it entirely then. I'll send a patch tomorrow
> > unless Yinghai beats me to it.
>
> What about something like this (works for me, without pci=use_crs)?
>
> ---
> Remove intel_bus.c Intel-specific PCI/IOH logic
>
> Signed-off-by: Jeff Garrett <[email protected]>

Yeah, looks good. I'll push to Linus today.

Thanks,
Jesse

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-28 18:16:28

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/28/2010 08:24 AM, Jesse Barnes wrote:
> On Wed, 27 Jan 2010 22:02:26 -0600
> [email protected] (Jeff Garrett) wrote:
>
>> On Wed, Jan 27, 2010 at 07:24:09PM -0800, Jesse Barnes wrote:
>>> On Wed, 27 Jan 2010 17:50:17 -0800 (PST)
>>> Linus Torvalds <[email protected]> wrote:
>>>> On Tue, 26 Jan 2010, Yinghai Lu wrote:
>>>>>
>>>>> [PATCH] x86/pci: don't use ioh resource if only have one ioh
>>>>
>>>> Please, no.
>>>>
>>>> This patch is too ugly to live.
>>>>
>>>> And it's totally unacceptable to probe every single possible PCI device
>>>> for something like this.
>>>>
>>>> If we don't know enough about the hardware workings of those Intel bridges
>>>> to know when they are active and how they decode things, then please let's
>>>> just disable intel_bus.c entirely.
>>>>
>>>> There's no excuse for hacky tests like this.
>>>
>>> Ok, we'll just kill it entirely then. I'll send a patch tomorrow
>>> unless Yinghai beats me to it.
>>
>> What about something like this (works for me, without pci=use_crs)?
>>
>> ---
>> Remove intel_bus.c Intel-specific PCI/IOH logic
>>
>> Signed-off-by: Jeff Garrett <[email protected]>
>
> Yeah, looks good. I'll push to Linus today.
>

please don't. will send you another patch, to keep the print out so we can cross check the _CRS.

YH

2010-01-28 18:23:13

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
> On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
>> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
>>> On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
>
>>>> 2. how about when apci is disabled?
>>>
>>> When ACPI is disabled, I think we just have to accept that we lose some
>>> functionality. I don't see the need for alternate ways to accomplish
>>> everything that ACPI does. It's becoming less and less useful to
>>> disable ACPI; I think it's only interesting as a debugging tool, and
>>> even then it's a sledgehammer.
>>
>> some systems when acpi is enabled could have interrupt storm.
>> and have to disable acpi.
>
> We should fix that problem rather than just covering it up by
> disabling ACPI. Can you provide any details?
that is not covering problem. acpi just cause too many problems.

systems using acpi hotplug support, and use acpi aml code to monitor the hotplug status instead of HW
and after one or two days will have interrupt storm with sci/acpi interrupt aka 9.
>
> I think it's crazy to add code to work around Problem B that only
> occurs because we disabled ACPI to work around Problem A. We should
> just fix Problem A instead.

that is not point. fix BIOS or HW or OS?

check many systems have broken acpi?
some system acpi code even clear pci bar when just enable acpi at the first point.

YH

2010-01-28 18:34:09

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86/pci: print ioh resources only


don't use them for peer pci root bus resource yet.
so could cross check _CRS results

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/pci/intel_bus.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)

Index: linux-2.6/arch/x86/pci/intel_bus.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/intel_bus.c
+++ linux-2.6/arch/x86/pci/intel_bus.c
@@ -43,7 +43,7 @@ static void __devinit pci_root_bus_res(s
{
u16 word;
u32 dword;
- struct pci_root_info *info;
+ struct pci_root_info info;
u16 io_base, io_end;
u32 mmiol_base, mmiol_end;
u64 mmioh_base, mmioh_end;
@@ -53,30 +53,22 @@ static void __devinit pci_root_bus_res(s
if (dev->cfg_size < 0x120)
return;

- if (pci_root_num >= PCI_ROOT_NR) {
- printk(KERN_DEBUG "intel_bus.c: PCI_ROOT_NR is too small\n");
- return;
- }
-
- info = &pci_root_info[pci_root_num];
- pci_root_num++;
-
pci_read_config_word(dev, IOH_LCFGBUS, &word);
bus_base = (word & 0xff);
bus_end = (word & 0xff00) >> 8;
- sprintf(info->name, "PCI Bus #%02x", bus_base);
- info->bus_min = bus_base;
- info->bus_max = bus_end;
+ sprintf(info.name, "PCI Bus #%02x", bus_base);
+ info.bus_min = bus_base;
+ info.bus_max = bus_end;

pci_read_config_word(dev, IOH_LIO, &word);
io_base = (word & 0xf0) << (12 - 4);
io_end = (word & 0xf000) | 0xfff;
- update_res(info, io_base, io_end, IORESOURCE_IO, 0);
+ update_res(&info, io_base, io_end, IORESOURCE_IO, 0);

pci_read_config_dword(dev, IOH_LMMIOL, &dword);
mmiol_base = (dword & 0xff00) << (24 - 8);
mmiol_end = (dword & 0xff000000) | 0xffffff;
- update_res(info, mmiol_base, mmiol_end, IORESOURCE_MEM, 0);
+ update_res(&info, mmiol_base, mmiol_end, IORESOURCE_MEM, 0);

pci_read_config_dword(dev, IOH_LMMIOH, &dword);
mmioh_base = ((u64)(dword & 0xfc00)) << (26 - 10);
@@ -85,9 +77,9 @@ static void __devinit pci_root_bus_res(s
mmioh_base |= ((u64)(dword & 0x7ffff)) << 32;
pci_read_config_dword(dev, IOH_LMMIOH_LIMITU, &dword);
mmioh_end |= ((u64)(dword & 0x7ffff)) << 32;
- update_res(info, mmioh_base, mmioh_end, IORESOURCE_MEM, 0);
+ update_res(&info, mmioh_base, mmioh_end, IORESOURCE_MEM, 0);

- print_ioh_resources(info);
+ print_ioh_resources(&info);
}

/* intel IOH */

2010-01-28 18:56:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/pci: print ioh resources only



On Thu, 28 Jan 2010, Yinghai Lu wrote:
> @@ -43,7 +43,7 @@ static void __devinit pci_root_bus_res(s
> {
> u16 word;
> u32 dword;
> - struct pci_root_info *info;
> + struct pci_root_info info;

That structure is something like a kilobyte in size, please don't put
those things on the stack (sixteen "struct resource" entries).

Linus

2010-01-28 19:03:39

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Thu, 28 Jan 2010 10:20:04 -0800
Yinghai Lu <[email protected]> wrote:

> On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
> > On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
> >> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> >>> On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
> >
> >>>> 2. how about when apci is disabled?
> >>>
> >>> When ACPI is disabled, I think we just have to accept that we
> >>> lose some functionality. I don't see the need for alternate ways
> >>> to accomplish everything that ACPI does. It's becoming less and
> >>> less useful to disable ACPI; I think it's only interesting as a
> >>> debugging tool, and even then it's a sledgehammer.
> >>
> >> some systems when acpi is enabled could have interrupt storm.
> >> and have to disable acpi.
> >
> > We should fix that problem rather than just covering it up by
> > disabling ACPI. Can you provide any details?
> that is not covering problem. acpi just cause too many problems.
>
> systems using acpi hotplug support, and use acpi aml code to monitor
> the hotplug status instead of HW and after one or two days will have
> interrupt storm with sci/acpi interrupt aka 9.


But disabling it gets us into trouble too. When platforms are designed
for Linux, they may be designed to have ACPI disabled (though this is
probably rare for general purpose PCs and servers). However when
they're designed for Windows, they're generally designed to use ACPI,
so if we disable it we run the risk of hitting all sorts of bugs since
we're running in an untested configuration.

So fixing the issues with ACPI enabled seems like a better idea; after
all, presumably Windows works on this platform with ACPI enabled, why
shouldn't we?

But I'm speaking in general here; we'd have to dig into the details of
the particular problem you mention to figure out the best course of
action (but I'm still pretty sure it's not "disable ACPI").

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-28 19:12:40

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH -v2] x86/pci: print ioh resources only


don't use them for peer pci root bus resource yet.
so could cross check _CRS results

-v2: dont put info struct in stack according to Linus.
because that is kbytes big

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/pci/intel_bus.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/pci/intel_bus.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/intel_bus.c
+++ linux-2.6/arch/x86/pci/intel_bus.c
@@ -53,13 +53,9 @@ static void __devinit pci_root_bus_res(s
if (dev->cfg_size < 0x120)
return;

- if (pci_root_num >= PCI_ROOT_NR) {
- printk(KERN_DEBUG "intel_bus.c: PCI_ROOT_NR is too small\n");
+ info = kmalloc(sizeof(struct pci_root_info), GFP_KERNEL);
+ if (!info)
return;
- }
-
- info = &pci_root_info[pci_root_num];
- pci_root_num++;

pci_read_config_word(dev, IOH_LCFGBUS, &word);
bus_base = (word & 0xff);

2010-01-28 19:39:38

by Olivier Galibert

[permalink] [raw]
Subject: Re: [PATCH -v2] x86/pci: print ioh resources only

On Thu, Jan 28, 2010 at 11:10:14AM -0800, Yinghai Lu wrote:
>
> don't use them for peer pci root bus resource yet.
> so could cross check _CRS results
>
> -v2: dont put info struct in stack according to Linus.
> because that is kbytes big

No kfree?

OG.

2010-01-28 20:12:23

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Thu, 28 Jan 2010 10:13:26 -0800
Yinghai Lu <[email protected]> wrote:

> On 01/28/2010 08:24 AM, Jesse Barnes wrote:
> > On Wed, 27 Jan 2010 22:02:26 -0600
> > [email protected] (Jeff Garrett) wrote:
> >
> >> On Wed, Jan 27, 2010 at 07:24:09PM -0800, Jesse Barnes wrote:
> >>> On Wed, 27 Jan 2010 17:50:17 -0800 (PST)
> >>> Linus Torvalds <[email protected]> wrote:
> >>>> On Tue, 26 Jan 2010, Yinghai Lu wrote:
> >>>>>
> >>>>> [PATCH] x86/pci: don't use ioh resource if only have one ioh
> >>>>
> >>>> Please, no.
> >>>>
> >>>> This patch is too ugly to live.
> >>>>
> >>>> And it's totally unacceptable to probe every single possible PCI
> >>>> device for something like this.
> >>>>
> >>>> If we don't know enough about the hardware workings of those
> >>>> Intel bridges to know when they are active and how they decode
> >>>> things, then please let's just disable intel_bus.c entirely.
> >>>>
> >>>> There's no excuse for hacky tests like this.
> >>>
> >>> Ok, we'll just kill it entirely then. I'll send a patch tomorrow
> >>> unless Yinghai beats me to it.
> >>
> >> What about something like this (works for me, without pci=use_crs)?
> >>
> >> ---
> >> Remove intel_bus.c Intel-specific PCI/IOH logic
> >>
> >> Signed-off-by: Jeff Garrett <[email protected]>
> >
> > Yeah, looks good. I'll push to Linus today.
> >
>
> please don't. will send you another patch, to keep the print out so
> we can cross check the _CRS.

I don't think there's much point due to the points we discussed
earlier, I'd rather just get rid of it.

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-28 20:20:53

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Thursday 28 January 2010 11:20:04 am Yinghai Lu wrote:
> On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
> > On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
> >> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> >>> On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
> >
> >>>> 2. how about when apci is disabled?
> >>>
> >>> When ACPI is disabled, I think we just have to accept that we lose some
> >>> functionality. I don't see the need for alternate ways to accomplish
> >>> everything that ACPI does. It's becoming less and less useful to
> >>> disable ACPI; I think it's only interesting as a debugging tool, and
> >>> even then it's a sledgehammer.
> >>
> >> some systems when acpi is enabled could have interrupt storm.
> >> and have to disable acpi.
> >
> > We should fix that problem rather than just covering it up by
> > disabling ACPI. Can you provide any details?
> that is not covering problem. acpi just cause too many problems.
>
> systems using acpi hotplug support, and use acpi aml code to monitor the hotplug status instead of HW
> and after one or two days will have interrupt storm with sci/acpi interrupt aka 9.

If you just want to whine about problems without helping us figure
them out and fix them, I think there's another mailing list for that.

I really don't have time to deal with unsubstantiated rumor-mongering
like this.

Bjorn

2010-01-28 20:28:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Thursday 28 January 2010, Jesse Barnes wrote:
> On Thu, 28 Jan 2010 10:20:04 -0800
> Yinghai Lu <[email protected]> wrote:
>
> > On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
> > > On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
> > >> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> > >>> On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
> > >
> > >>>> 2. how about when apci is disabled?
> > >>>
> > >>> When ACPI is disabled, I think we just have to accept that we
> > >>> lose some functionality. I don't see the need for alternate ways
> > >>> to accomplish everything that ACPI does. It's becoming less and
> > >>> less useful to disable ACPI; I think it's only interesting as a
> > >>> debugging tool, and even then it's a sledgehammer.
> > >>
> > >> some systems when acpi is enabled could have interrupt storm.
> > >> and have to disable acpi.
> > >
> > > We should fix that problem rather than just covering it up by
> > > disabling ACPI. Can you provide any details?
> > that is not covering problem. acpi just cause too many problems.
> >
> > systems using acpi hotplug support, and use acpi aml code to monitor
> > the hotplug status instead of HW and after one or two days will have
> > interrupt storm with sci/acpi interrupt aka 9.
>
>
> But disabling it gets us into trouble too. When platforms are designed
> for Linux, they may be designed to have ACPI disabled (though this is
> probably rare for general purpose PCs and servers).

Well, not quite. On recent SMP systems it's next to impossible to get all of
the necessary system configuration information without ACPI, since it only is
provided by the ACPI tables (the configuration of APICs, interrupt routing,
CPU C states, other stuff).

[BTW, I think it's better to CC linux-acpi and Len at this point.]

> However when they're designed for Windows, they're generally designed to use
> ACPI, so if we disable it we run the risk of hitting all sorts of bugs since
> we're running in an untested configuration.

I guess without ACPI we're guaranteed to run into troubles on many modern
hardware configurations.

> So fixing the issues with ACPI enabled seems like a better idea; after
> all, presumably Windows works on this platform with ACPI enabled, why
> shouldn't we?
>
> But I'm speaking in general here; we'd have to dig into the details of
> the particular problem you mention to figure out the best course of
> action (but I'm still pretty sure it's not "disable ACPI").

Agreed.

Rafael

2010-01-28 20:31:13

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Thursday 28 January 2010 11:20:04 am Yinghai Lu wrote:
> On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
> > On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
> > We should fix that problem rather than just covering it up by
> > disabling ACPI. Can you provide any details?
> that is not covering problem. acpi just cause too many problems.
>
> systems using acpi hotplug support, and use acpi aml code to monitor the hotplug status instead of HW
> and after one or two days will have interrupt storm with sci/acpi interrupt aka 9.
> ...
> check many systems have broken acpi?
> some system acpi code even clear pci bar when just enable acpi at the first point.

Sorry, let me try to be more constructive. You mention some things
above that might be issues with Linux. I am eager to help fix them.

However, to make progress, I need information, not just rumors. Can
you point me to bug reports? Bugzillas? Ways to reproduce the
problems?

Bjorn

2010-01-28 20:35:17

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Thu, 28 Jan 2010 21:28:33 +0100
"Rafael J. Wysocki" <[email protected]> wrote:
> > But disabling it gets us into trouble too. When platforms are
> > designed for Linux, they may be designed to have ACPI disabled
> > (though this is probably rare for general purpose PCs and servers).
>
> Well, not quite. On recent SMP systems it's next to impossible to
> get all of the necessary system configuration information without
> ACPI, since it only is provided by the ACPI tables (the configuration
> of APICs, interrupt routing, CPU C states, other stuff).
>
> [BTW, I think it's better to CC linux-acpi and Len at this point.]

I was thinking more of custom designed low power servers or something,
possibly running LinuxBIOS or some other custom BIOS. For a general
purpose machine though I'm 100% agreed. ACPI is required these days
for PCs.

I was trying to make a point that we shouldn't disable ACPI on
platforms that support it. Rather, we should fix any bugs we discover
in handling ACPI correctly, rather than working around it by turning it
off.

--
Jesse Barnes, Intel Open Source Technology Center

2010-01-28 21:08:18

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH -v2] x86/pci: print ioh resources only

On 01/28/2010 11:30 AM, Olivier Galibert wrote:
> On Thu, Jan 28, 2010 at 11:10:14AM -0800, Yinghai Lu wrote:
>>
>> don't use them for peer pci root bus resource yet.
>> so could cross check _CRS results
>>
>> -v2: dont put info struct in stack according to Linus.
>> because that is kbytes big
>
> No kfree?

thanks.

anyway Jesse don't need print out.

YH

2010-01-28 21:17:01

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On 01/28/2010 12:31 PM, Bjorn Helgaas wrote:
> On Thursday 28 January 2010 11:20:04 am Yinghai Lu wrote:
>> On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
>>> On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
>>> We should fix that problem rather than just covering it up by
>>> disabling ACPI. Can you provide any details?
>> that is not covering problem. acpi just cause too many problems.
>>
>> systems using acpi hotplug support, and use acpi aml code to monitor the hotplug status instead of HW
>> and after one or two days will have interrupt storm with sci/acpi interrupt aka 9.
>> ...
>> check many systems have broken acpi?
>> some system acpi code even clear pci bar when just enable acpi at the first point.
>
> Sorry, let me try to be more constructive. You mention some things
> above that might be issues with Linux. I am eager to help fix them.

should be have problem on several parties.
good hw design should let fgpa to monitor that change, instead of have bunch ACPI AML code
on host cpu to do that.

>
> However, to make progress, I need information, not just rumors. Can
> you point me to bug reports? Bugzillas? Ways to reproduce the
> problems?
I asked our HW engineer to update fpga to handle the problem already.

YH

2010-01-29 02:45:53

by Zhang, Rui

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Fri, 2010-01-29 at 04:28 +0800, Rafael J. Wysocki wrote:
> On Thursday 28 January 2010, Jesse Barnes wrote:
> > On Thu, 28 Jan 2010 10:20:04 -0800
> > Yinghai Lu <[email protected]> wrote:
> >
> > > On 01/28/2010 08:09 AM, Bjorn Helgaas wrote:
> > > > On Wednesday 27 January 2010 10:53:51 pm Yinghai Lu wrote:
> > > >> On 01/27/2010 08:26 PM, Bjorn Helgaas wrote:
> > > >>> On Wed, 2010-01-27 at 15:34 -0800, Yinghai Lu wrote:
> > > >
> > > >>>> 2. how about when apci is disabled?
> > > >>>
> > > >>> When ACPI is disabled, I think we just have to accept that we
> > > >>> lose some functionality. I don't see the need for alternate ways
> > > >>> to accomplish everything that ACPI does. It's becoming less and
> > > >>> less useful to disable ACPI; I think it's only interesting as a
> > > >>> debugging tool, and even then it's a sledgehammer.
> > > >>
> > > >> some systems when acpi is enabled could have interrupt storm.
> > > >> and have to disable acpi.
> > > >
> > > > We should fix that problem rather than just covering it up by
> > > > disabling ACPI. Can you provide any details?
> > > that is not covering problem. acpi just cause too many problems.
> > >
> > > systems using acpi hotplug support, and use acpi aml code to monitor
> > > the hotplug status instead of HW and after one or two days will have
> > > interrupt storm with sci/acpi interrupt aka 9.
> >
> >
> > But disabling it gets us into trouble too. When platforms are designed
> > for Linux, they may be designed to have ACPI disabled (though this is
> > probably rare for general purpose PCs and servers).
>
> Well, not quite. On recent SMP systems it's next to impossible to get all of
> the necessary system configuration information without ACPI, since it only is
> provided by the ACPI tables (the configuration of APICs, interrupt routing,
> CPU C states, other stuff).
>
> [BTW, I think it's better to CC linux-acpi and Len at this point.]
>
IMO, Disabling ACPI is wrong.
"acpi=off" should just be used for debug purpose. For example, it is a
good excuse to dedicate an uncleared bug to ACPI if the problem doesn't
exist with acpi=off, although they may be not ACPI related sometime.

So if there are some platforms that
1. work in Windows.
2. don't work in Linux when ACPI is on.
3. work in Linux when ACPI is off.
please file a bug at
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI

thanks,
rui

> > However when they're designed for Windows, they're generally designed to use
> > ACPI, so if we disable it we run the risk of hitting all sorts of bugs since
> > we're running in an untested configuration.
>
> I guess without ACPI we're guaranteed to run into troubles on many modern
> hardware configurations.
>
> > So fixing the issues with ACPI enabled seems like a better idea; after
> > all, presumably Windows works on this platform with ACPI enabled, why
> > shouldn't we?
> >
> > But I'm speaking in general here; we'd have to dig into the details of
> > the particular problem you mention to figure out the best course of
> > action (but I'm still pretty sure it's not "disable ACPI").
>
> Agreed.
>
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-01-30 19:21:25

by Michael Breuer

[permalink] [raw]
Subject: Re: [Bug #15125] hung task - jbd2/dm-1-8 (during raid rebuild)

On 1/24/2010 6:14 PM, Rafael J. Wysocki wrote:
> On Sunday 24 January 2010, Michael Breuer wrote:
>
>> On 1/24/2010 5:04 PM, Rafael J. Wysocki wrote:
>>
>>> This message has been generated automatically as a part of a report
>>> of recent regressions.
>>>
>>> The following bug entry is on the current list of known regressions
>>> from 2.6.32. Please verify if it still should be listed and let me know
>>> (either way).
>>>
>>>
>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15125
>>> Subject : hung task - jbd2/dm-1-8 (during raid rebuild)
>>> Submitter : Michael Breuer<[email protected]>
>>> Date : 2010-01-10 21:47 (15 days old)
>>> References : http://marc.info/?l=linux-kernel&m=126316012025978&w=4
>>>
>>>
>>>
>> Not an easy one to recreate. Should probably remain listed for now.
>>
> Thanks for the update.
>
> Rafael
>
Still happening on 2.6.33 rc5.

2010-01-31 02:45:55

by Matthew Garrett

[permalink] [raw]
Subject: Re: [Bug #15124] PCI host bridge windows ignored (works with pci=use_crs)

On Wed, Jan 27, 2010 at 09:26:02PM -0700, Bjorn Helgaas wrote:

> When ACPI is disabled, I think we just have to accept that we lose some
> functionality. I don't see the need for alternate ways to accomplish
> everything that ACPI does. It's becoming less and less useful to
> disable ACPI; I think it's only interesting as a debugging tool, and
> even then it's a sledgehammer.

I'd agree with this. The days where it was plausibly practical to boot
non-ACPI operating systems on hardware are clearly gone, and people who
are actually disabling ACPI in the field seem to be doing so in order to
avoid other bugs - and we're failing to fix those bugs as a result.

--
Matthew Garrett | [email protected]

2010-02-04 20:09:59

by Soeren Sonnenburg

[permalink] [raw]
Subject: Re: [Bug #15043] Display goes off with i915.powersave=1

On Wed, 2010-01-27 at 09:57 -0800, Jesse Barnes wrote:
> On Sun, 24 Jan 2010 23:04:37 +0100 (CET)
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043
> > Subject : Display goes off with i915.powersave=1
> > Submitter : Soeren Sonnenburg <[email protected]>
> > Date : 2010-01-10 20:09 (15 days old)
> > References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4
>
> If this isn't fixed yet, I hope David's patch fixes it. See "[Bug
> #14897] i915: Commit 0e442c60 causes flickering".

ok, so it is not fixed yet - I will try to unapply 0e442c60

Soeren
--
For the one fact about the future of which we can be certain is that it
will be utterly fantastic. -- Arthur C. Clarke, 1962


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part