2009-09-06 20:17:52

by Rafael J. Wysocki

Subject: 2.6.31-rc9: Reported regressions from 2.6.30

This message contains a list of some regressions from 2.6.30 for which there
are no fixes in the mainline that I know of. If any of them have been fixed
already, please let me know.

If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.

Each entry on the list will also be sent as an automatic reply to this
message, with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

Date          Total  Pending  Unresolved
----------------------------------------
2009-09-06      123       34          27
2009-08-26      108       33          26
2009-08-20      102       32          29
2009-08-10       89       27          24
2009-08-02       76       36          28
2009-07-27       70       51          43
2009-07-07       35       25          21
2009-06-29       22       22          15
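
The columns above are not defined in this message, so the following is only
a sketch of how they might be read. It assumes that "Pending" counts the
unresolved regressions plus those with patches (this matches the newest row:
27 unresolved entries and 7 entries with patches are listed below, 34 in
total), and that "Total" minus "Pending" gives the regressions already closed.
The names STATS and derive are made up for illustration.

```python
# Rows copied from the statistics table: (date, total, pending, unresolved).
STATS = [
    ("2009-09-06", 123, 34, 27),
    ("2009-08-26", 108, 33, 26),
    ("2009-08-20", 102, 32, 29),
    ("2009-08-10", 89, 27, 24),
    ("2009-08-02", 76, 36, 28),
    ("2009-07-27", 70, 51, 43),
    ("2009-07-07", 35, 25, 21),
    ("2009-06-29", 22, 22, 15),
]


def derive(row):
    """Return the counts implied by one row under the assumptions above."""
    date, total, pending, unresolved = row
    return {
        "date": date,
        # Assumption: pending = unresolved + regressions with patches.
        "with_patches": pending - unresolved,
        # Assumption: everything not pending has been closed.
        "closed": total - pending,
    }


if __name__ == "__main__":
    for row in STATS:
        d = derive(row)
        print(d["date"], "with patches:", d["with_patches"], "closed:", d["closed"])
```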


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
Subject : order 2 page allocation failures
Submitter : Frans Pop <[email protected]>
Date : 2009-09-06 7:40 (1 day old)
References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
Handled-By : Pekka Enberg <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14139
Subject : Output to external monitor is broken
Submitter : Carlos R. Mafra <[email protected]>
Date : 2009-09-06 14:22 (1 day old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f8aed700c6ec46ddade6570004ce25332283b306
References : http://marc.info/?l=linux-kernel&m=125224701520738&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14135
Subject : NULL pointer dereference in ima_counts_put
Submitter : Ciprian Docan <[email protected]>
Date : 2009-09-02 13:49 (5 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94e5d714f604d4cb4cb13163f01ede278e69258b
References : http://marc.info/?l=linux-kernel&m=125190146028116&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133
Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule
Submitter : Jens Axboe <[email protected]>
Date : 2009-08-31 20:43 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14114
Subject : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7
Submitter : Tsvety Petrov <[email protected]>
Date : 2009-09-03 21:06 (4 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14103
Subject : cdc_acm gives I/O error
Submitter : Paul Martin <[email protected]>
Date : 2009-09-01 13:30 (6 days old)
Handled-By : Oliver Neukum <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14095
Subject : Asus EeePC 1005HA-M: Suspend hangs and disables the wireless
Submitter : Karsten Jaeger <[email protected]>
Date : 2009-08-31 10:14 (7 days old)
References : http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-August/002513.html


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14070
Subject : lockdep warning triggered by dup_fd
Submitter : Bart Van Assche <[email protected]>
Date : 2009-08-23 09:36 (15 days old)
References : http://lkml.org/lkml/2009/8/23/8


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject : Oops in fsnotify
Submitter : Grant Wilson <[email protected]>
Date : 2009-08-20 15:48 (18 days old)
References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14043
Subject : System sometimes hangs during boot
Submitter : Bart Van Assche <[email protected]>
Date : 2009-08-23 18:04 (15 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject : kernel freezes, inotify problem
Submitter : Christoph Thielecke <[email protected]>
Date : 2009-08-19 12:48 (19 days old)
References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By : Eric Paris <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <[email protected]>
Date : 2009-08-14 8:26 (24 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <[email protected]>
Date : 2009-08-15 07:55 (23 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <[email protected]>
Date : 2009-08-08 17:47 (30 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Handled-By : Alan Stern <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <[email protected]>
Date : 2009-08-06 20:15 (32 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <[email protected]>
Date : 2009-08-04 10:12 (34 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <[email protected]>
Date : 2009-08-03 12:58 (35 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <[email protected]>
Date : 2009-08-07 22:33 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <[email protected]>
Date : 2009-08-08 22:08 (30 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <[email protected]>
Date : 2009-08-04 09:02 (34 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <[email protected]>
Date : 2009-07-29 16:44 (40 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <[email protected]>
Date : 2009-07-17 21:24 (52 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <[email protected]>
Date : 2009-07-23 17:57 (46 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <[email protected]>
Date : 2009-07-22 13:35 (47 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <[email protected]>
Date : 2009-07-07 15:19 (62 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <[email protected]>
Date : 2009-07-06 18:32 (63 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <[email protected]>
Date : 2009-06-17 17:56 (82 days old)
References : http://lkml.org/lkml/2009/6/17/194


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14140
Subject : 2.6.31-rc9 breaks gianfar
Submitter : Michael Guntsche <[email protected]>
Date : 2009-09-06 7:27 (1 day old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=38bddf04bcfe661fbdab94888c3b72c32f6873b3
References : http://marc.info/?l=linux-kernel&m=125222206218784&w=4
Handled-By : David Miller <[email protected]>
Patch : http://patchwork.kernel.org/patch/45965/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14138
Subject : Regression in suspend to ram
Submitter : Zdenek Kabelac <[email protected]>
Date : 2009-08-31 11:51 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125171952817851&w=4
Handled-By : OGAWA Hirofumi <[email protected]>
Patch : http://patchwork.kernel.org/patch/45945/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject : usb console regressions
Submitter : Jason Wessel <[email protected]>
Date : 2009-09-05 21:08 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By : Jason Wessel <[email protected]>
Patch : http://patchwork.kernel.org/patch/45953/
        http://patchwork.kernel.org/patch/45952/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14136
Subject : readcd Oops
Submitter : Bob Tracy <[email protected]>
Date : 2009-09-03 3:39 (4 days old)
References : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
Handled-By : Michal Schmidt <[email protected]>
Patch : http://patchwork.kernel.org/patch/45347/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <[email protected]>
Date : 2009-08-13 6:45 (25 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <[email protected]>
Date : 2009-08-07 21:51 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <[email protected]>
Patch : http://patchwork.kernel.org/patch/38550/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <[email protected]>
Date : 2009-08-07 19:11 (31 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <[email protected]>
             Dan Williams <[email protected]>
Patch : http://patchwork.kernel.org/patch/43114/


For details, please visit the bug entries and follow the links given in
references.
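
For anyone who wants to process the entries above with a script, here is a
minimal sketch of a parser for the "Key : value" layout used in this report.
It is not the tool that generated the list; the function name parse_entries
and the rule that a non-matching line directly after Handled-By or Patch
continues that field are assumptions based on how the entries look here.

```python
import re

# Field names as they appear in the entries of this report.
FIELD_RE = re.compile(
    r"^(Bug-Entry|Subject|Submitter|Date|First-Bad-Commit|"
    r"References|Handled-By|Patch)\s*:\s*(.*)$"
)

# Fields that are seen to span several lines in this report (assumption).
MULTILINE = ("Handled-By", "Patch")


def parse_entries(text):
    """Return one dict per bug entry, mapping field name to a list of values."""
    entries = []
    current = None   # entry being filled, or None when outside an entry
    last_key = None  # most recent field, used for continuation lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current entry.
            current, last_key = None, None
            continue
        match = FIELD_RE.match(line)
        if match:
            key, value = match.groups()
            if key == "Bug-Entry":
                current = {}
                entries.append(current)
            if current is None:
                # e.g. an e-mail "Subject:" header outside any entry
                continue
            current.setdefault(key, []).append(value)
            last_key = key
        elif current is not None and last_key in MULTILINE:
            # Second patch URL or second handler on its own line.
            current[last_key].append(line)
    return entries
```

Each value is kept as a list so that entries with two patches or two handlers
do not need special treatment by the caller.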

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=13615

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2009-09-06 20:17:58

by Rafael J. Wysocki

Subject: [Bug #13645] NULL pointer dereference at (null) (level2_spare_pgt)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <[email protected]>
Date : 2009-06-17 17:56 (82 days old)
References : http://lkml.org/lkml/2009/6/17/194

2009-09-06 20:27:35

by Rafael J. Wysocki

Subject: [Bug #13733] 2.6.31-rc2: irq 16: nobody cared

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <[email protected]>
Date : 2009-07-06 18:32 (63 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4

2009-09-06 20:27:45

by Rafael J. Wysocki

Subject: [Bug #13819] system freeze when switching to console

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <[email protected]>
Date : 2009-07-23 17:57 (46 days old)

2009-09-06 20:27:44

by Rafael J. Wysocki

Subject: [Bug #13740] X server crashes with 2.6.31-rc2 when options are changed

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <[email protected]>
Date : 2009-07-07 15:19 (62 days old)

2009-09-06 20:27:42

by Rafael J. Wysocki

Subject: [Bug #13836] suspend script fails, related to stdout?

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <[email protected]>
Date : 2009-07-17 21:24 (52 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4

2009-09-06 20:32:51

by Rafael J. Wysocki

Subject: [Bug #13869] Radeon framebuffer (w/o KMS) corruption at boot.

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <[email protected]>
Date : 2009-07-29 16:44 (40 days old)

2009-09-06 20:34:07

by Rafael J. Wysocki

Subject: [Bug #13809] oprofile: possible circular locking dependency detected

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <[email protected]>
Date : 2009-07-22 13:35 (47 days old)

2009-09-06 20:28:16

by Rafael J. Wysocki

Subject: [Bug #13906] Huawei E169 GPRS connection causes Ooops

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <[email protected]>
Date : 2009-08-04 09:02 (34 days old)

2009-09-06 20:28:20

by Rafael J. Wysocki

Subject: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <[email protected]>
Date : 2009-08-07 22:33 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4

2009-09-06 20:33:39

by Rafael J. Wysocki

Subject: [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <[email protected]>
Date : 2009-08-08 22:08 (30 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343

2009-09-06 20:28:17

by Rafael J. Wysocki

Subject: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <[email protected]>
Date : 2009-08-06 20:15 (32 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4

2009-09-06 20:33:51

by Rafael J. Wysocki

Subject: [Bug #13942] Troubles with AoE and uninitialized object

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <[email protected]>
Date : 2009-08-04 10:12 (34 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4

2009-09-06 20:33:03

by Rafael J. Wysocki

Subject: [Bug #13947] Libertas: Association request to the driver failed

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <[email protected]>
Date : 2009-08-07 19:11 (31 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <[email protected]>
             Dan Williams <[email protected]>
Patch : http://patchwork.kernel.org/patch/43114/

2009-09-06 20:33:09

by Rafael J. Wysocki

Subject: [Bug #13941] x86 Geode issue

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <[email protected]>
Date : 2009-08-03 12:58 (35 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4

2009-09-06 20:28:52

by Rafael J. Wysocki

Subject: [Bug #13987] Received NMI interrupt at resume

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <[email protected]>
Date : 2009-08-15 07:55 (23 days old)

2009-09-06 20:28:38

by Rafael J. Wysocki

Subject: [Bug #13950] Oops when USB Serial disconnected while in use

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <[email protected]>
Date : 2009-08-08 17:47 (30 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Handled-By : Alan Stern <[email protected]>

2009-09-06 20:33:07

by Rafael J. Wysocki

Subject: [Bug #13948] ath5k broken after suspend-to-ram

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <[email protected]>
Date : 2009-08-07 21:51 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <[email protected]>
Patch : http://patchwork.kernel.org/patch/38550/

2009-09-06 20:29:10

by Rafael J. Wysocki

Subject: [Bug #14018] kernel freezes, inotify problem

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject : kernel freezes, inotify problem
Submitter : Christoph Thielecke <[email protected]>
Date : 2009-08-19 12:48 (19 days old)
References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By : Eric Paris <[email protected]>

2009-09-06 20:32:26

by Rafael J. Wysocki

Subject: [Bug #14017] _end symbol missing from Symbol.map

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <[email protected]>
Date : 2009-08-13 6:45 (25 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4

2009-09-06 20:32:40

by Rafael J. Wysocki

Subject: [Bug #14013] hd don't show up

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <[email protected]>
Date : 2009-08-14 8:26 (24 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <[email protected]>

2009-09-06 20:29:00

by Rafael J. Wysocki

Subject: [Bug #14043] System sometimes hangs during boot

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14043
Subject : System sometimes hangs during boot
Submitter : Bart Van Assche <[email protected]>
Date : 2009-08-23 18:04 (15 days old)

2009-09-06 20:29:08

by Rafael J. Wysocki

Subject: [Bug #14070] lockdep warning triggered by dup_fd

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14070
Subject : lockdep warning triggered by dup_fd
Submitter : Bart Van Assche <[email protected]>
Date : 2009-08-23 09:36 (15 days old)
References : http://lkml.org/lkml/2009/8/23/8

2009-09-06 20:29:17

by Rafael J. Wysocki

Subject: [Bug #14058] Oops in fsnotify

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject : Oops in fsnotify
Submitter : Grant Wilson <[email protected]>
Date : 2009-08-20 15:48 (18 days old)
References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4

2009-09-06 20:29:19

by Rafael J. Wysocki

Subject: [Bug #14114] Tuning a saa7134 based card is broken in kernel 2.6.31-rc7

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14114
Subject : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7
Submitter : Tsvety Petrov <[email protected]>
Date : 2009-09-03 21:06 (4 days old)

2009-09-06 20:29:33

by Rafael J. Wysocki

Subject: [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133
Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule
Submitter : Jens Axboe <[email protected]>
Date : 2009-08-31 20:43 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4

2009-09-06 20:32:03

by Rafael J. Wysocki

Subject: [Bug #14095] Asus EeePC 1005HA-M: Suspend hangs and disables the wireless

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14095
Subject : Asus EeePC 1005HA-M: Suspend hangs and disables the wireless
Submitter : Karsten Jaeger <[email protected]>
Date : 2009-08-31 10:14 (7 days old)
References : http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-August/002513.html

2009-09-06 20:31:49

by Rafael J. Wysocki

Subject: [Bug #14103] cdc_acm gives I/O error

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14103
Subject : cdc_acm gives I/O error
Submitter : Paul Martin <[email protected]>
Date : 2009-09-01 13:30 (6 days old)
Handled-By : Oliver Neukum <[email protected]>

2009-09-06 20:29:47

by Rafael J. Wysocki

Subject: [Bug #14138] Regression in suspend to ram

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14138
Subject : Regression in suspend to ram
Submitter : Zdenek Kabelac <[email protected]>
Date : 2009-08-31 11:51 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125171952817851&w=4
Handled-By : OGAWA Hirofumi <[email protected]>
Patch : http://patchwork.kernel.org/patch/45945/

2009-09-06 20:29:29

by Rafael J. Wysocki

Subject: [Bug #14135] NULL pointer dereference in ima_counts_put

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14135
Subject : NULL pointer dereference in ima_counts_put
Submitter : Ciprian Docan <[email protected]>
Date : 2009-09-02 13:49 (5 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94e5d714f604d4cb4cb13163f01ede278e69258b
References : http://marc.info/?l=linux-kernel&m=125190146028116&w=4

2009-09-06 20:30:40

by Rafael J. Wysocki

Subject: [Bug #14137] usb console regressions

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject : usb console regressions
Submitter : Jason Wessel <[email protected]>
Date : 2009-09-05 21:08 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By : Jason Wessel <[email protected]>
Patch : http://patchwork.kernel.org/patch/45953/
        http://patchwork.kernel.org/patch/45952/

2009-09-06 20:31:08

by Rafael J. Wysocki

Subject: [Bug #14136] readcd Oops

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14136
Subject : readcd Oops
Submitter : Bob Tracy <[email protected]>
Date : 2009-09-03 3:39 (4 days old)
References : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
Handled-By : Michal Schmidt <[email protected]>
Patch : http://patchwork.kernel.org/patch/45347/

2009-09-06 20:29:45

by Rafael J. Wysocki

Subject: [Bug #14140] 2.6.31-rc9 breaks gianfar

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14140
Subject : 2.6.31-rc9 breaks gianfar
Submitter : Michael Guntsche <[email protected]>
Date : 2009-09-06 7:27 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=38bddf04bcfe661fbdab94888c3b72c32f6873b3
References : http://marc.info/?l=linux-kernel&m=125222206218784&w=4
Handled-By : David Miller <[email protected]>
Patch : http://patchwork.kernel.org/patch/45965/

2009-09-06 20:29:45

by Rafael J. Wysocki

Subject: [Bug #14141] order 2 page allocation failures

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
Subject : order 2 page allocation failures
Submitter : Frans Pop <[email protected]>
Date : 2009-09-06 7:40 (1 days old)
References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
Handled-By : Pekka Enberg <[email protected]>

2009-09-06 20:29:46

by Rafael J. Wysocki

Subject: [Bug #14139] Output to external monitor is broken

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14139
Subject : Output to external monitor is broken
Submitter : Carlos R. Mafra <[email protected]>
Date : 2009-09-06 14:22 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f8aed700c6ec46ddade6570004ce25332283b306
References : http://marc.info/?l=linux-kernel&m=125224701520738&w=4

2009-09-06 20:30:55

by Martin-Éric Racine

Subject: Re: [Bug #13941] x86 Geode issue

Yes, it should still be listed, for as long as it hasn't been resolved.

On Sun, Sep 6, 2009 at 8:24 PM, Rafael J. Wysocki<[email protected]> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30.  Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> Subject         : x86 Geode issue
> Submitter       : Martin-Éric Racine <[email protected]>
> Date            : 2009-08-03 12:58 (35 days old)
> References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
>
>
>

Subject: Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related

On Sunday 06 September 2009, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).

Yes, the regression is still present.

2009-09-06 21:10:34

by Rafael J. Wysocki

Subject: Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related

On Sunday 06 September 2009, Ricardo Jorge da Fonseca Marques Ferreira wrote:
> On Sunday 06 September 2009, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
>
> Yes, the regression is still present.

Thanks for the update.

Rafael

2009-09-06 21:11:19

by Rafael J. Wysocki

Subject: Re: [Bug #13941] x86 Geode issue

On Sunday 06 September 2009, Martin-Éric Racine wrote:
> Yes, it should still be listed, for as long as it hasn't been resolved.

Thanks for the update.

Rafael

2009-09-06 21:38:22

by Eric Paris

Subject: Re: [Bug #14018] kernel freezes, inotify problem

On Sun, 2009-09-06 at 19:24 +0200, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
> Subject : kernel freezes, inotify problem
> Submitter : Christoph Thielecke <[email protected]>
> Date : 2009-08-19 12:48 (19 days old)
> References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
> Handled-By : Eric Paris <[email protected]>

This should be marked as fixed.

2009-09-06 21:50:27

by Rafael J. Wysocki

Subject: Re: [Bug #14018] kernel freezes, inotify problem

On Sunday 06 September 2009, Eric Paris wrote:
> On Sun, 2009-09-06 at 19:24 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
> > Subject : kernel freezes, inotify problem
> > Submitter : Christoph Thielecke <[email protected]>
> > Date : 2009-08-19 12:48 (19 days old)
> > References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
> > Handled-By : Eric Paris <[email protected]>
>
> This should be marked as fixed.

Yeah, I forgot to close it. Thanks for the heads up.

Rafael

2009-09-07 03:25:20

by Tomas M

Subject: Re: [Bug #13836] suspend script fails, related to stdout?

Tested rc9 and it's still there.

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
> Subject : suspend script fails, related to stdout?
> Submitter : Tomas M. <[email protected]>
> Date : 2009-07-17 21:24 (52 days old)
> References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
>
>
>

2009-09-07 05:38:33

by Bob Tracy

Subject: Re: [Bug #14136] readcd Oops

On Sun, Sep 06, 2009 at 07:24:57PM +0200, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14136
> Subject : readcd Oops
> Submitter : Bob Tracy <[email protected]>
> Date : 2009-09-03 3:39 (4 days old)
> References : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
> Handled-By : Michal Schmidt <[email protected]>
> Patch : http://patchwork.kernel.org/patch/45347/

I ack'd the fix a few days ago: please remove this bug entry from the
list of 2.6.30 regressions.

--Bob

2009-09-08 16:29:35

by Reinette Chatre

Subject: Re: [Bug #13819] system freeze when switching to console

On Sun, 2009-09-06 at 10:24 -0700, Rafael J. Wysocki wrote:
> Please verify if it still should be listed and let me know
> (either way).

Issue is still present in 2.6.31-rc8.

Reinette

2009-09-08 17:00:55

by Linus Torvalds

Subject: Re: [Bug #13819] system freeze when switching to console



On Tue, 8 Sep 2009, reinette chatre wrote:

> On Sun, 2009-09-06 at 10:24 -0700, Rafael J. Wysocki wrote:
> > Please verify if it still should be listed and let me know
> > (either way).
>
> Issue is still present in 2.6.31-rc8.

Is there any chance that you could connect a serial line to the machine?
Your report about blinking keyboard LEDs means that there's an oops, but
since the display isn't in textmode (and the oops obviously happens when
trying to enter it), we don't know what it is.

A serial line (along with a kernel compiled with serial console support,
of course, and a kernel command line option like "console=ttyS0,115200
console=tty0") would get that. You'd just need another machine with a
terminal program like minicom.

The network console could also work out, but serial lines tend to be more
reliable if you have them. But in the absence of serial lines, see the
Documentation/networking/netconsole.txt file for some details. The setup
is more complicated, but on the other hand it's a lot more dynamic, and in
your case - since the box works until you try to switch to text-mode, I
suspect the network console dynamic run-time setup would be easy for you
to use.

(For other examples of using netconsole with that dynamic mode, just
google for "sys/kernel/config/netconsole" and you'll find a number of docs
that explain how to find the MAC address for setup etc).

Linus

2009-09-08 17:24:53

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Sun, 6 Sep 2009 19:24:50 +0200 (CEST)
"Rafael J. Wysocki" <[email protected]> wrote:

> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me
> know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
> Subject : system freeze when switching to console
> Submitter : Reinette Chatre <[email protected]>
> Date : 2009-07-23 17:57 (46 days old)

So simply switching VTs causes this problem too? Based on your initial
description it sounds like a panic (keyboard LEDs were flashing). If
it happens at VT switch time you should be able to capture the panic
output with netconsole like Linus mentioned.

--
Jesse Barnes, Intel Open Source Technology Center

2009-09-08 17:36:15

by Reinette Chatre

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 2009-09-08 at 10:00 -0700, Linus Torvalds wrote:
>
> On Tue, 8 Sep 2009, reinette chatre wrote:
>
> > On Sun, 2009-09-06 at 10:24 -0700, Rafael J. Wysocki wrote:
> > > Please verify if it still should be listed and let me know
> > > (either way).
> >
> > Issue is still present in 2.6.31-rc8.
>
> Is there any chance that you could connect a serial line to the machine?

The system does not have a serial console, but I was able to set up
netconsole. For what it is worth, I did not do this until now because
(1) I was able to bisect the problem, and (2) I asked driver developers
directly how I can help to debug this and I received no response.

As you can see from the kernel version it is not a build of a vanilla
kernel. It only contains changes related to the wireless networking work
I am doing.

Here is the output:

[ 352.803652] render error detected, EIR: 0x00000010
[ 352.803684] IPEIR: 0x00000000
[ 352.803709] IPEHR: 0x01000000
[ 352.803732] INSTDONE: 0xfffffffe
[ 352.803754] INSTPS: 0x0001e000
[ 352.803776] INSTDONE1: 0xffffffff
[ 352.803801] ACTHD: 0x0480a3c8
[ 352.803823] page table error
[ 352.803846] PGTBL_ER: 0x00100000
[ 352.803870] [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking
[ 352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
[ 352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] PGD b5d00067 PUD b9753067 PMD 0
[ 352.804006] Oops: 0000 [#1] SMP
[ 352.804006] last sysfs file: /sys/class/power_supply/BAT0/energy_full
[ 352.804006] CPU 0
[ 352.804006] Modules linked in: i915 drm i2c_algo_bit i2c_core ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative cpufreq_stats freq_table container sbs sbshc arc4 ecb joydev af_packet pcmcia psmouse sony_laptop serio_raw yenta_socket rsrc_nonstatic pcmcia_core pcspkr iTCO_wdt iTCO_vendor_support rfkill intel_agp button battery tpm_infineon tpm tpm_bios processor video output ac evdev ext3 jbd mbcache sr_mod sg cdrom sd_mod ahci libata scsi_mod ehci_hcd uhci_hcd usbcore thermal fan thermal_sys [last unloaded: cfg80211]
[ 352.804006] Pid: 4424, comm: Xorg Not tainted 2.6.31-rc8-wl-50925-gdcecd82-dirty #57 VGN-Z540N
[ 352.804006] RIP: 0010:[<ffffffffa03ecaab>] [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] RSP: 0018:ffff880001e9de58 EFLAGS: 00010082
[ 352.804006] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 352.804006] RDX: ffffc9000007d898 RSI: 0000000000000001 RDI: ffffffff8132f0f8
[ 352.804006] RBP: ffff880001e9dee8 R08: 0000000000000002 R09: ffff880037373c38
[ 352.804006] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800b57fe000
[ 352.804006] R13: 000000000000001f R14: ffff8800b57fe000 R15: ffff8800b9746000
[ 352.804006] FS: 00007fcc05d20700(0000) GS:ffff880001e9a000(0000) knlGS:0000000000000000
[ 352.804006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 352.804006] CR2: 0000000000000084 CR3: 00000000b50c3000 CR4: 00000000000006f0
[ 352.804006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 352.804006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
[ 352.804006] Stack:
[ 352.804006] ffffffff8106db7d 0000000000000086 ffff88009a5ce040 ffff8800b57fe158
[ 352.804006] <0> ffff8800b57fe1a8 ffff8800b57fe110 0004000000008000 0000000400440202
[ 352.804006] <0> 0000000000000086 0044020200000000 0000001000040000 0000000000000040
[ 352.804006] Call Trace:
[ 352.804006] <IRQ>
[ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
[ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
[ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
[ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
[ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
[ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
[ 352.804006] <EOI>
[ 352.804006] [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
[ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
[ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
[ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
[ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
[ 352.804006] [<ffffffff810d0ad5>] ? do_wp_page+0x185/0x7a0
[ 352.804006] [<ffffffff811a9a33>] ? __up_read+0x23/0xb0
[ 352.804006] [<ffffffff810ff17d>] ? vfs_ioctl+0x7d/0xa0
[ 352.804006] [<ffffffff810ff2ba>] ? do_vfs_ioctl+0x8a/0x5c0
[ 352.804006] [<ffffffff8105fec6>] ? up_read+0x26/0x30
[ 352.804006] [<ffffffff8100c829>] ? retint_swapgs+0xe/0x13
[ 352.804006] [<ffffffff810ff889>] ? sys_ioctl+0x99/0xa0
[ 352.804006] [<ffffffff8100bd6b>] ? system_call_fastpath+0x16/0x1b
[ 352.804006] Code: 00 8b 18 49 8b 87 b0 05 00 00 48 8b 80 20 02 00 00 48 85 c0 74 21 48 8b 80 00 01 00 00 48 8b 50 08 48 85 d2 74 11 49 8b 44 24 78 <8b> 80 84 00 00 00 89 82 08 08 00 00 f6 45 a0 02 0f 85 47 03 00
[ 352.804006] RIP [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] RSP <ffff880001e9de58>
[ 352.804006] CR2: 0000000000000084
[ 352.804006] ---[ end trace 756dbe26c2f29fdd ]---
[ 352.804006] Kernel panic - not syncing: Fatal exception in interrupt
[ 352.804006] Pid: 4424, comm: Xorg Tainted: G D 2.6.31-rc8-wl-50925-gdcecd82-dirty #57
[ 352.804006] Call Trace:
[ 352.804006] <IRQ> [<ffffffff8132ba7f>] panic+0xa0/0x170
[ 352.804006] [<ffffffff8132f0f8>] ? _spin_unlock_irqrestore+0x58/0x60
[ 352.804006] [<ffffffff81041b35>] ? release_console_sem+0x1f5/0x240
[ 352.804006] [<ffffffff81041e05>] ? console_unblank+0x75/0x90
[ 352.804006] [<ffffffff813306c4>] oops_end+0xd4/0xe0
[ 352.804006] [<ffffffff810279d8>] no_context+0xe8/0x260
[ 352.804006] [<ffffffff81027ca5>] __bad_area_nosemaphore+0x155/0x1f0
[ 352.804006] [<ffffffff8106ca5d>] ? trace_hardirqs_off+0xd/0x10
[ 352.804006] [<ffffffff8132f0f8>] ? _spin_unlock_irqrestore+0x58/0x60
[ 352.804006] [<ffffffff8103bb58>] ? try_to_wake_up+0xe8/0x210
[ 352.804006] [<ffffffff81027d4e>] bad_area_nosemaphore+0xe/0x10
[ 352.804006] [<ffffffff8133204e>] do_page_fault+0x29e/0x350
[ 352.804006] [<ffffffff8132f8af>] page_fault+0x1f/0x30
[ 352.804006] [<ffffffff8132f0f8>] ? _spin_unlock_irqrestore+0x58/0x60
[ 352.804006] [<ffffffffa03ecaab>] ? i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] [<ffffffffa03ec9cb>] ? i915_driver_irq_handler+0x18b/0xd20 [i915]
[ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
[ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
[ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
[ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
[ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
[ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
[ 352.804006] <EOI> [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
[ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
[ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
[ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
[ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
[ 352.804006] [<ffffffff810d0ad5>] ? do_wp_page+0x185/0x7a0
[ 352.804006] [<ffffffff811a9a33>] ? __up_read+0x23/0xb0
[ 352.804006] [<ffffffff810ff17d>] ? vfs_ioctl+0x7d/0xa0
[ 352.804006] [<ffffffff810ff2ba>] ? do_vfs_ioctl+0x8a/0x5c0
[ 352.804006] [<ffffffff8105fec6>] ? up_read+0x26/0x30
[ 352.804006] [<ffffffff8100c829>] ? retint_swapgs+0xe/0x13
[ 352.804006] [<ffffffff810ff889>] ? sys_ioctl+0x99/0xa0
[ 352.804006] [<ffffffff8100bd6b>] ? system_call_fastpath+0x16/0x1b

2009-09-08 18:06:44

by Linus Torvalds

Subject: Re: [Bug #13819] system freeze when switching to console



On Tue, 8 Sep 2009, reinette chatre wrote:
>
> As you can see from the kernel version it is not a build of a vanilla
> kernel. It only contains changes related to the wireless networking work
> I am doing.
>
> Here is the output:

Thanks, this is great. It pinpoints the problem very effectively.

> [ 352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
> [ 352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]

The code here is

16: 48 8b 80 00 01 00 00 mov 0x100(%rax),%rax
1d: 48 8b 50 08 mov 0x8(%rax),%rdx
21: 48 85 d2 test %rdx,%rdx
24: 74 11 je 0x37
26: 49 8b 44 24 78 mov 0x78(%r12),%rax
2b:* 8b 80 84 00 00 00 mov 0x84(%rax),%eax <-- trapping instruction
31: 89 82 08 08 00 00 mov %eax,0x808(%rdx)
37: f6 45 a0 02 testb $0x2,-0x60(%rbp)

and that "testb $0x2, -0x60(%rbp)" seems to be the

if (iir & I915_USER_INTERRUPT) {

test if I'm reading things right. Although it could also be the

if (eir & I915_ERROR_MEMORY_REFRESH) {

thing. The disassembly is totally impossible to read, because the stupid
i915 driver is chock-full of crap like

if (IS_G4X(dev)) {
..

which expands to insane amounts of code that check the PCI ID's one by
one.

Intel guys: could you _please_ stop doing that. Create a capability mask
in the device or something, so that you can test for "is this a G4x" with
a single bit test, rather than have code like this:

mov 0x31c(%rsi),%eax
cmp $0x2982,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2972,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2992,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x29a2,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2a02,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2a12,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2a42,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e02,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e12,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e22,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e32,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x42,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>

for that IS_G4X() thing (I'm not kidding - that's exactly a hundred bytes
of code for that _stupid_ test, and it's inlined!)
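
A minimal sketch of the capability-mask idea, not the i915 driver's actual code: the ID table is walked once at probe time and the result is cached as a bit, so the hot-path test becomes a single bit test. The names `gpu_device`, `CAP_G4X`, `gpu_init_caps()` and `gpu_is_g4x()` are hypothetical, and the device IDs are simply the ones visible in the disassembly above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit; one bit per capability. */
#define CAP_G4X (1u << 0)

/* Hypothetical stand-in for the driver's per-device structure. */
struct gpu_device {
	uint16_t pci_device;	/* PCI device ID */
	uint32_t caps;		/* filled in once by gpu_init_caps() */
};

/* Device IDs copied from the cmp instructions in the disassembly above. */
static const uint16_t g4x_ids[] = {
	0x2982, 0x2972, 0x2992, 0x29a2, 0x2a02, 0x2a12,
	0x2a42, 0x2e02, 0x2e12, 0x2e22, 0x2e32, 0x0042,
};

/* Slow path: walk the ID table once, at probe time. */
void gpu_init_caps(struct gpu_device *dev)
{
	size_t i;

	dev->caps = 0;
	for (i = 0; i < sizeof(g4x_ids) / sizeof(g4x_ids[0]); i++) {
		if (dev->pci_device == g4x_ids[i]) {
			dev->caps |= CAP_G4X;
			break;
		}
	}
}

/* Fast path: a single bit test, cheap enough to inline everywhere. */
static inline int gpu_is_g4x(const struct gpu_device *dev)
{
	return (dev->caps & CAP_G4X) != 0;
}
```

With this shape, every inlined `gpu_is_g4x()` call site costs one load and one test instead of a chain of compares, at the price of one extra field that must be initialized before first use.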

Anyway, we're getting that DRM irq, and it has a normal IRQ stack trace:

> [ 352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
> [ 352.804006] Call Trace:
> [ 352.804006] <IRQ>
> [ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
> [ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
> [ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
> [ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
> [ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
> [ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf

.. but it happened just as we're tearing down the DRM irq handling:

> [ 352.804006] <EOI>
> [ 352.804006] [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
> [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
> [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
> [ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
> [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
> [ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
> [ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
> [ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]

so what is going on is that the i915 driver has obviously torn down some
state before it uninstalls the irq, so the irq happens when the state has
already been torn down, and the irq handler is not ready for that.

This patch *may* fix it - simply by getting rid of the irq early. However,
I did not check whether maybe something in i915_gem_idle() actually needs
the interrupt to be able to happen, so this is TOTALLY UNTESTED!

Linus
---
drivers/gpu/drm/i915/i915_gem.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7edb5b9..80e5ba4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4232,15 +4232,11 @@ int
i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
- int ret;
-
if (drm_core_check_feature(dev, DRIVER_MODESET))
return 0;

- ret = i915_gem_idle(dev);
drm_irq_uninstall(dev);
-
- return ret;
+ return i915_gem_idle(dev);
}

void

2009-09-08 18:20:20

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 8 Sep 2009 11:06:21 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 8 Sep 2009, reinette chatre wrote:
> >
> > As you can see from the kernel version it is not a build of a
> > vanilla kernel. It only contains changes related to the wireless
> > networking work I am doing.
> >
> > Here is the output:
>
> Thanks, this is great. It pinpoints the problem very effectively.
>
> > [ 352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
> > [ 352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
>
> The code here is
>
> 16: 48 8b 80 00 01 00 00 mov 0x100(%rax),%rax
> 1d: 48 8b 50 08 mov 0x8(%rax),%rdx
> 21: 48 85 d2 test %rdx,%rdx
> 24: 74 11 je 0x37
> 26: 49 8b 44 24 78 mov 0x78(%r12),%rax
> 2b:* 8b 80 84 00 00 00 mov 0x84(%rax),%eax <-- trapping instruction
> 31: 89 82 08 08 00 00 mov %eax,0x808(%rdx)
> 37: f6 45 a0 02 testb $0x2,-0x60(%rbp)
>
> and that "testb $0x2, -0x60(%rbp)" seems to be the
>
> if (iir & I915_USER_INTERRUPT) {
>
> test if I'm reading things right. Although it could also be the
>
> if (eir & I915_ERROR_MEMORY_REFRESH) {
>
> thing. The disassembly is totally impossible to read, because the
> stupid i915 driver is chock-full of crap like
>
> if (IS_G4X(dev)) {
> ..
>
> which expands to insane amounts of code that check the PCI ID's one
> by one.
>
> Intel guys: could you _please_ stop doing that. Create a capability
> mask in the device or something, so that you can test for "is this a
> G4x" with a single bit test, rather than have code like this:
>
> mov 0x31c(%rsi),%eax
> cmp $0x2982,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2972,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2992,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x29a2,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2a02,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2a12,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2a42,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e02,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e12,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e22,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e32,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x42,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
>
> for that IS_G4X() thing (I'm not kidding - that's exactly a hundred
> bytes of code for that _stupid_ test, and it's inlined!)

Yeah things are getting a bit out of hand there... We've moved to
feature tests for some things, but they're still PCI ID based; however
they should be easy to convert.

>
> Anyway, we're getting that DRM irq, and it has a normal IRQ stack
> trace:
>
> > [ 352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
> > [ 352.804006] Call Trace:
> > [ 352.804006] <IRQ>
> > [ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
> > [ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
> > [ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
> > [ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
> > [ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
> > [ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
>
> .. but it happened just as we're tearing down the DRM irq handling:
>
> > [ 352.804006] <EOI>
> > [ 352.804006] [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
> > [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> > [ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
> > [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> > [ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
> > [ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
> > [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> > [ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
> > [ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
> > [ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
> > [ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
>
> so what is going on is that the i915 driver has obviously torn down
> some state before it uninstalls the irq, so the irq happens when the
> state has already been torn down, and the irq handler is not ready
> for that.
>
> This patch *may* fix it - simply by getting rid of the irq early.
> However, I did not check whether maybe something in i915_gem_idle()
> actually needs the interrupt to be able to happen, so this is TOTALLY
> UNTESTED!
>
> Linus
> ---
> drivers/gpu/drm/i915/i915_gem.c | 6 +-----
> 1 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7edb5b9..80e5ba4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4232,15 +4232,11 @@ int
> i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv)
> {
> - int ret;
> -
> if (drm_core_check_feature(dev, DRIVER_MODESET))
> return 0;
>
> - ret = i915_gem_idle(dev);
> drm_irq_uninstall(dev);
> -
> - return ret;
> + return i915_gem_idle(dev);
> }

Theoretically i915_gem_idle should prevent any user interrupts from
coming in. If we uninstall the IRQ first, i915_gem_idle probably
won't work anymore, since it queues an interrupt and waits for it.

Eric, any thoughts on this? We shouldn't be racing to queue new work
after the idle call since we suspend GEM at that point, so we must be
failing to manage our active lists properly somehow?

--
Jesse Barnes, Intel Open Source Technology Center

2009-09-08 19:19:26

by Linus Torvalds

Subject: Re: [Bug #13819] system freeze when switching to console



On Tue, 8 Sep 2009, Linus Torvalds wrote:
>
> The code here is
>
> 16: 48 8b 80 00 01 00 00 mov 0x100(%rax),%rax
> 1d: 48 8b 50 08 mov 0x8(%rax),%rdx
> 21: 48 85 d2 test %rdx,%rdx
> 24: 74 11 je 0x37
> 26: 49 8b 44 24 78 mov 0x78(%r12),%rax
> 2b:* 8b 80 84 00 00 00 mov 0x84(%rax),%eax <-- trapping instruction
> 31: 89 82 08 08 00 00 mov %eax,0x808(%rdx)
> 37: f6 45 a0 02 testb $0x2,-0x60(%rbp)
>
> and that "testb $0x2, -0x60(%rbp)" seems to be the
>
> if (iir & I915_USER_INTERRUPT) {

Yeah, that seems to be the right thing.

So the actual faulting instruction is from this:

	if (dev->primary->master) {
		master_priv = dev->primary->master->driver_priv;
		if (master_priv->sarea_priv)
			master_priv->sarea_priv->last_dispatch =
				READ_BREADCRUMB(dev_priv);

and it looks like %rax starts out being 'dev', then the

mov 0x100(%rax),%rax

means that %rax is now 'dev->primary', and then

mov 0x8(%rax),%rdx

moves 'dev->primary->master' into %rdx. It's not zero, so we then do that
READ_BREADCRUMB(dev_priv), which expands to

READ_HWSP(dev_priv, I915_BREADCRUMB_INDEX)

which in turn is

(((volatile u32*)(dev_priv->hw_status_page))[reg])

and it looks like dev_priv->hw_status_page is NULL.

You can verify this by looking at the exception address:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000084

and that '84' is I915_BREADCRUMB_INDEX*4 (0x21*4).
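
The arithmetic can be checked with a small sketch. It is plain integer math in userspace, not driver code: `hwsp_read_address()` is a hypothetical helper, and `BREADCRUMB_INDEX` takes the value 0x21 quoted above. Nothing is dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Value quoted in the message above: I915_BREADCRUMB_INDEX is 0x21. */
#define BREADCRUMB_INDEX 0x21

/*
 * Address touched when element `index` of a u32 array starting at `base`
 * is read. Pure integer arithmetic: no memory is accessed.
 */
uintptr_t hwsp_read_address(uintptr_t base, size_t index)
{
	return base + index * sizeof(uint32_t);
}
```

Calling `hwsp_read_address(0, BREADCRUMB_INDEX)` models the read through a NULL `hw_status_page` and gives 0x21 * 4 = 0x84, matching both the reported fault address and the `0x84(%rax)` displacement in the trapping instruction.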

And the problem seems to be that we've cleared the hw_status_page pointer
in i915_gem_cleanup_hws():

dev_priv->hw_status_page = NULL;

and we did that in

i915_gem_idle() ->
i915_gem_cleanup_ringbuffer() ->
i915_gem_cleanup_hws()

so now since interrupts are still enabled, you'll get a NULL pointer
dereference.

I think my patch is correct.

Linus

2009-09-08 19:27:18

by Linus Torvalds

Subject: Re: [Bug #13819] system freeze when switching to console



On Tue, 8 Sep 2009, Jesse Barnes wrote:
>
> Theoretically i915_gem_idle should prevent any user interrupts from
> coming in.

That is _entirely_ immaterial.

The thing is, interrupts can be shared. So it does not matter ONE WHIT
that you are trying to idle the hardware - there may be _other_ hardware
in the machine that is not idle, and that raises the same shared
interrupt. End result: the irq handler will be called, whether your
particular hardware is idle or not.

So if you tear down data structures that the interrupt handler needs, you
_ABSOLUTELY_ must first unregister the whole interrupt.

Also, even if there are no shared interrupts or any other devices, there
can easily be old pending interrupts still queued up on IO-APIC's etc. So
even though you quiesce the hardware, there is no guarantee that there
aren't some pending interrupts that happened just before you turned off
the interrupt from the hardware side, and are still "en route" to the CPU.

Which gets us exactly the same rule as if there were shared interrupts: if
your interrupt handler depends on some data structure, you must tear down
the interrupt handler _before_ you tear down the data structures it
depends on (and in the reverse order when setting things up, of course).
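
The ordering rule can be modelled in plain userspace C. This is a minimal single-threaded sketch under stated assumptions, not i915 code: struct fake_dev, fake_setup(), fake_teardown() and deliver_irq() are all made-up names, and deliver_irq() stands in for an interrupt arriving from a shared line or as a late delivery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fake_dev {
    uint32_t *status_page;              /* data the handler reads */
    void (*handler)(struct fake_dev *); /* NULL once unregistered */
    uint32_t last_seen;                 /* last value the handler read */
};

static void fake_irq_handler(struct fake_dev *dev)
{
    /* Would be a NULL dereference if status_page were torn down first. */
    assert(dev->status_page != NULL);
    dev->last_seen = dev->status_page[0];
}

/* Deliver one interrupt; returns 1 if a handler ran, 0 otherwise. */
static int deliver_irq(struct fake_dev *dev)
{
    if (dev->handler == NULL)
        return 0;   /* nothing registered, nothing to run */
    dev->handler(dev);
    return 1;
}

static void fake_setup(struct fake_dev *dev, uint32_t *page)
{
    dev->status_page = page;         /* data first... */
    dev->handler = fake_irq_handler; /* ...then the handler */
}

static void fake_teardown(struct fake_dev *dev)
{
    dev->handler = NULL;     /* unregister the handler first... */
    dev->status_page = NULL; /* ...then tear down what it used */
}
```

Swapping the two assignments in fake_teardown() opens a window in which deliver_irq() runs the handler against a NULL status_page, which is the failure described in this thread. In the kernel, free_irq() also waits for handlers that are already running before it returns; the sketch does not model that concurrency.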

> If we uninstall the IRQ first we i915_gem_idle probably
> won't work anymore, since it queues an interrupt and waits for it.

So then you'd better fix that. Because the code as is is very
fundamentally buggy.

> Eric, any thoughts on this? We shouldn't be racing to queue new work
> after the idle call since we suspend GEM at that point, so we must be
> failing to manage our active lists properly somehow?

See my previous email. The bug is that you do

i915_gem_cleanup_ringbuffer ->
i915_gem_cleanup_hws ->
dev_priv->hw_status_page = NULL;

while interrupts are still enabled and coming in. And the interrupt path
wants to access that hw_status_page. Which you just destroyed.

Linus

2009-09-08 19:38:49

by Fabio Comolli

Subject: Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k

Still present in -rc9

On Sun, Sep 6, 2009 at 7:24 PM, Rafael J. Wysocki<[email protected]> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30.  Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13943
> Subject         : WARNING: at net/mac80211/mlme.c:2292 with ath5k
> Submitter       : Fabio Comolli <[email protected]>
> Date            : 2009-08-06 20:15 (32 days old)
> References      : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
>
>
>

2009-09-08 19:31:54

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 8 Sep 2009 12:26:45 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 8 Sep 2009, Jesse Barnes wrote:
> >
> > Theoretically i915_gem_idle should prevent any user interrupts from
> > coming in.
>
> That is _entirely_ immaterial.
>
> The thing is, interrupts can be shared. So it does not matter ONE
> WHIT that you are trying to idle the hardware - there may be _other_
> hardware in the machine that is not idle, and that raises the same
> shared interrupt. End result: the irq handler will be called, whether
> your particular hardware is idle or not.

Which is fine. We can handle interrupts in the shared case. It's
specific IRQ statuses we can't handle. E.g. if we've explicitly turned
off vblank events we definitely won't expect to see them in the handler
(assuming we've taken care to barrier things like you mention below).

> So if you tear down data structures that the interrupt handler needs,
> you _ABSOLUTELY_ must first unregister the whole interrupt.
>
> Also, even if there are no shared interrupts or any other devices,
> there can easily be old pending interrupts still queued up on
> IO-APIC's etc. So even though you quiesce the hardware, there is no
> guarantee that there aren't some pending interrupts that happened
> just before you turned off the interrupt from the hardware side, and
> are still "en route" to the CPU.

The way we barrier things should handle that case.

> Which gets us exactly the same rule as if there were shared
> interrupts: if your interrupt handler depends on some data structure,
> you must tear down the interrupt handler _before_ you tear down the
> data structures it depends on (and in the reverse order when setting
> things up, of course).
>
> > If we uninstall the IRQ first we i915_gem_idle probably
> > won't work anymore, since it queues an interrupt and waits for it.
>
> So then you'd better fix that. Because the code as is is very
> fundamentally buggy.
>
> > Eric, any thoughts on this? We shouldn't be racing to queue new
> > work after the idle call since we suspend GEM at that point, so we
> > must be failing to manage our active lists properly somehow?
>
> See my previous email. The bug is that you do
>
> i915_gem_cleanup_ringbuffer ->
> i915_gem_cleanup_hws ->
> dev_priv->hw_status_page = NULL;
>
> while interrupts are still enabled and coming in. And the interrupt
> path wants to access that hw_status_page. Which you just destroyed.

Yeah, saw that. I don't think that's the root cause though. If we see
a user interrupt after gem_idle is called we may have serious issues in
our command handling code.

--
Jesse Barnes, Intel Open Source Technology Center

2009-09-08 19:58:27

by Niel Lambrechts

Subject: Re: [Bug #13733] 2.6.31-rc2: irq 16: nobody cared

On 09/06/2009 10:30 PM, Rafael J. Wysocki wrote:
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
> Subject : 2.6.31-rc2: irq 16: nobody cared
> Submitter : Niel Lambrechts<[email protected]>
> Date : 2009-07-06 18:32 (63 days old)
> References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4

Hi Rafael,

This is still present in 2.6.31-rc9-pae.

To test, I removed ehci_hcd, hibernated/resumed and the problem did not appear. But when I first plugged in a USB mouse after the system was up and before ehci_hcd was loaded the error occurred again.

Regards,
Niel

2009-09-08 22:06:25

by Linus Torvalds

Subject: Re: [Bug #13819] system freeze when switching to console



On Tue, 8 Sep 2009, Jesse Barnes wrote:
>
> Yeah, saw that. I don't think that's the root cause though. If we see
> a user interrupt after gem_idle is called we may have serious issues in
> our command handling code.

Quite frankly, I do not understand why you seem to be making excuses for
code that causes a very nasty and undebuggable oops, causing the machine
to die.

This regression is almost two months old, and apparently the Intel
graphics people DID ABSOLUTELY NOTHING about it during those two months,
because they couldn't be bothered to look at it.

And now, when I pinpointed exactly where the oops happens, and what the
cause is, you seem to be trying to hold things up. I wanted to do the
final 2.6.31 release yesterday, quite frankly I'm not in the _least_
interested in excuses, I'm interested in something that at least gets us
back to the 2.6.30 state that doesn't oops!

Get me a patch, please. If disabling the interrupts early won't work, get
me something else. Stop delaying it - it's been pending for 48 days
already.

Linus

2009-09-08 22:11:38

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 8 Sep 2009 15:06:21 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 8 Sep 2009, Jesse Barnes wrote:
> >
> > Yeah, saw that. I don't think that's the root cause though. If we
> > see a user interrupt after gem_idle is called we may have serious
> > issues in our command handling code.
>
> Quite frankly, I do not understand why you seem to be making excuses
> for code that causes a very nasty and undebuggable oops, causing the
> machine to die.

No excuses. This is a serious bug; I just don't want to paper over it.

> This regression is almost two months old, and apparently the Intel
> graphics people DID ABSOLUTELY NOTHING about it during those two
> months, because they couldn't be bothered to look at it.

Yeah sorry, this is the first I've seen of it... I usually troll the
regressions lists but I must have missed this one.

> And now, when I pinpointed exactly where the oops happens, and what
> the cause is, you seem to be trying to hold things up. I wanted to do
> the final 2.6.31 release yesterday, quite frankly I'm not in the
> _least_ interested in excuses, I'm interested in something that at
> least gets us back to the 2.6.30 state that doesn't oops!
>
> Get me a patch, please. If disabling the interrupts early won't work,
> get me something else. Stop delaying it - it's been pending for 48
> days already.

Sure, looking at it now.

--
Jesse Barnes, Intel Open Source Technology Center

2009-09-08 22:37:53

by Reinette Chatre

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 2009-09-08 at 11:06 -0700, Linus Torvalds wrote:
> so this is TOTALLY UNTESTED!

I understand that the discussion is still going on whether this is the
right thing to do. Even so, I thought you may like to know that with
this patch I can again switch to console, back again, hibernate, and
shut down .. all without crashing my system.

Tested-by: Reinette Chatre <[email protected]>

Thank you very much!

Reinette

2009-09-08 23:05:29

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 8 Sep 2009 15:06:21 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:
> And now, when I pinpointed exactly where the oops happens, and what
> the cause is, you seem to be trying to hold things up. I wanted to do
> the final 2.6.31 release yesterday, quite frankly I'm not in the
> _least_ interested in excuses, I'm interested in something that at
> least gets us back to the 2.6.30 state that doesn't oops!

Based on the earlier mail I thought this might have been a bigger
problem with the way we handle command submission and completion; but
on looking at things again (both Linus's debugging and your
configuration), I think this is actually a DRI1 & userspace related
issue. Back in the DRI1 days, the X server told the driver when to
register and unregister its irq handler, and had some responsibility
for making sure it didn't hose things (very easy to do with the old
architecture). Stuff like this was one of the main reasons we moved
most of the handling of this into the kernel...

We obviously need a kernel fix though; panics like this aren't
acceptable.

This fix is along the lines of Linus's initial suggestion; we
definitely are tearing down some state that the interrupt handler
needs. And the 2D driver isn't saving us from ourselves like it used
to (previously it would uninstall the IRQ handler before tearing down
the mappings; but with the kernel in charge of those now, we have to
handle it).

This one should disable i915 interrupts (we'll still handle shared ones
just fine as no-ops) at the point where we no longer need them, then
let the DRM core code take care of finally unregistering it.

Ugly, but I'd like to know if it works for you. Any chance you could
give it a try Reinette?

--
Jesse Barnes, Intel Open Source Technology Center

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0767521..487d902 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3990,6 +3990,7 @@ i915_gem_idle(struct drm_device *dev)
 		return ret;
 	}
 
+	i915_driver_irq_uninstall(dev);
 	i915_gem_cleanup_ringbuffer(dev);
 	mutex_unlock(&dev->struct_mutex);
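
The idea of keeping the handler registered while making it a no-op for this device can be sketched in userspace C. Everything here is hypothetical: struct sketch_dev, its pending and enabled fields, and the SKETCH_IRQ_* values (stand-ins for the kernel's IRQ_NONE and IRQ_HANDLED) are illustration only and do not reflect the real i915 registers or handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

struct sketch_dev {
    uint32_t pending;  /* hypothetical "which events fired" register */
    uint32_t enabled;  /* hypothetical interrupt-enable mask */
    unsigned handled;  /* number of calls that did real work */
};

/* Mask every source once the driver no longer needs interrupts. */
static void sketch_irq_disable(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->enabled = 0;
}

static enum sketch_irqreturn sketch_irq_handler(struct sketch_dev *dev)
{
    uint32_t active;

    assert(dev != NULL);
    active = dev->pending & dev->enabled;

    if (active == 0)
        return SKETCH_IRQ_NONE; /* not ours: shared line or stale delivery */

    dev->pending &= ~active;    /* acknowledge what we handle */
    dev->handled++;
    return SKETCH_IRQ_HANDLED;
}
```

After sketch_irq_disable() the handler returns SKETCH_IRQ_NONE before touching any other driver state, so an interrupt raised by another device on the same line is harmless even though the handler is still installed.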

2009-09-08 23:16:58

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 08 Sep 2009 15:37:41 -0700
reinette chatre <[email protected]> wrote:

> On Tue, 2009-09-08 at 11:06 -0700, Linus Torvalds wrote:
> > so this is TOTALLY UNTESTED!
>
> I understand that the discussion is still going on whether this is the
> right thing to do. Even so, I thought you may like to know that with
> this patch I can again switch to console, back again, hibernate, and
> shut down .. all without crashing my system.
>
> Tested-by: Reinette Chatre <[email protected]>
>
> Thank you very much!

Do you see "hardware wedged" messages in your log after using Linus's
patch? That's what I'd expect... ah no I see we don't call the
routine that requires interrupts in that path like I thought.

So Linus's patch is fine with me.

Acked-by: Jesse Barnes <[email protected]>

Sorry Linus, you were right; I was making this more complicated than it
had to be.

--
Jesse Barnes, Intel Open Source Technology Center

2009-09-08 23:27:01

by Reinette Chatre

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 2009-09-08 at 16:16 -0700, Jesse Barnes wrote:

> Do you see "hardware wedged" messages in your log after using Linus's
> patch? That's what I'd expect... ah no I see we don't call the
> routine that requires interrupts in that path like I thought.

I can confirm that. While using this patch, when I am in X and then
switch to console and back to X there are no new messages (checked with
dmesg).

Reinette

2009-09-08 23:36:11

by Linus Torvalds

Subject: Re: [Bug #13819] system freeze when switching to console



On Tue, 8 Sep 2009, Jesse Barnes wrote:
> > This regression is almost two months old, and apparently the Intel
> > graphics people DID ABSOLUTELY NOTHING about it during those two
> > months, because they couldn't be bothered to look at it.
>
> Yeah sorry, this is the first I've seen of it... I usually troll the
> regressions lists but I must have missed this one.

Hmm. We must have screwed up something, because this was bisected to the
intel DRI commits back in July. See

http://bugzilla.kernel.org/show_bug.cgi?id=13819#c4

and while there was some confusion about exactly which commit caused
it - probably because the irq thing obviously depends on timing -
Reinette had a list of three commits that he used to be able to revert to
get things going:

drm/i915: Don't update display FIFO watermark on IGDNG
drm/i915: add FIFO watermark support
drm/i915: enable error detection & state collection

So Andrew assigned it to DRI, and Rafael has had both Eric and Ma Ling on
the cc for his regression reports because of the bisection. And that has
been going on for a long time, I just checked:

Date: Sun, 26 Jul 2009 22:28:26 +0200 (CEST)
From: Rafael J. Wysocki <[email protected]>
To: Linux Kernel Mailing List <[email protected]>
Cc: Kernel Testers List <[email protected]>, Eric Anholt <[email protected]>, "[email protected]" <[email protected]>,
Linus Torvalds <[email protected]>, Ma Ling <[email protected]>, Reinette Chatre <[email protected]>
Subject: [Bug #13819] system freeze when switching to console

If you didn't see it, then that means that we have screw-ups with the
bugzilla thing. You're actually listed as a "Reviewed-by" on the commit
that the fixed-up bisection blamed - And I get the feeling that Rafael's
bugzilla "bugme" scripts may only pick up "Signed-off-by:" lines.

The point is: this bug has been bisected in bugzilla for a month and a
half, and had at least two Intel DRI people cc'd on the weekly reminder
reports, along with being

Assigned To: [email protected]

We have other bugs on the regression list that are even older (no, I'm not
proud of them):

http://bugzilla.kernel.org/show_bug.cgi?id=13809
http://bugzilla.kernel.org/show_bug.cgi?id=13740
http://bugzilla.kernel.org/show_bug.cgi?id=13733
http://bugzilla.kernel.org/show_bug.cgi?id=13645

but they aren't bisected and it's not nearly as clear what is going on
there. The last one in particular I don't know if it even happens any
more and the first one seems to be fixed in -rc5, or at least the
reporter couldn't reproduce it any more..

Linus

2009-09-08 23:45:12

by Jesse Barnes

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 8 Sep 2009 16:36:06 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 8 Sep 2009, Jesse Barnes wrote:
> > > This regression is almost two months old, and apparently the
> > > Intel graphics people DID ABSOLUTELY NOTHING about it during
> > > those two months, because they couldn't be bothered to look at it.
> >
> > Yeah sorry, this is the first I've seen of it... I usually troll
> > the regressions lists but I must have missed this one.
>
> Hmm. We must have screwed up something, because this was bisected to
> the intel DRI commits back in July. See
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13819#c4
>
> and while there was some confusion about exactly which commit caused
> it - probably because the irq thing obviously depends on timing -
> Reinette had a list of three commits that he used to be able to
> revert to get things going:
>
> drm/i915: Don't update display FIFO watermark on IGDNG
> drm/i915: add FIFO watermark support
> drm/i915: enable error detection & state collection
>
> So Andrew assigned it to DRI, and Rafael has had both Eric and Ma
> Ling on the cc for his regression reports because of the bisection.
> And that has been going on for a long time, I just checked:
>
> Date: Sun, 26 Jul 2009 22:28:26 +0200 (CEST)
> From: Rafael J. Wysocki <[email protected]>
> To: Linux Kernel Mailing List <[email protected]>
> Cc: Kernel Testers List <[email protected]>, Eric
> Anholt <[email protected]>, "[email protected]" <[email protected]>,
> Linus Torvalds <[email protected]>, Ma Ling
> <[email protected]>, Reinette Chatre <[email protected]>
> Subject: [Bug #13819] system freeze when switching to console
>
> If you didn't see it, then that means that we have screw-ups with the
> bugzilla thing. You're actually listed as a "Reviewed-by" on the
> commit that the fixed-up bisection blamed - And I get the feeling
> that Rafael's bugzilla "bugme" scripts may only pick up
> "Signed-off-by:" lines.

Reinette actually mailed me offlist about this; we corresponded
privately about this issue a month ago; I lost track of it while on
vacation (yeah I'm not on the cc lists for the bz or regression
updates). Totally my fault.

Anyway the bisects look like they might just be lucky; it sounds like
this wasn't a KMS related issue at all...

> We have other bugs on the regression list that are even older (no,
> I'm not proud of them):
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13740

This one looks gfx related, upstream bug is
https://bugs.freedesktop.org/show_bug.cgi?id=23096.

The graphics group tracks freedesktop.org bugs on a weekly basis since
that's where a vast majority of our bugs are filed (often from OSVs);
I'll get the kernel bugzilla stuff included in our future scrubs so we
don't miss stuff like this.

--
Jesse Barnes, Intel Open Source Technology Center

2009-09-08 23:56:23

by Reinette Chatre

Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 2009-09-08 at 16:05 -0700, Jesse Barnes wrote:
> Any chance you could
> give it a try Reinette?

This patch also solves the issue for me.

Tested-by: Reinette Chatre <[email protected]>

Thank you very much

Reinette

2009-09-09 05:58:40

by Christoph Thielecke

Subject: Re: [Bug #14018] kernel freezes, inotify problem

Hello,
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.30. Please verify if it still should be listed and let me
> > > know (either way).
> > >
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
> > > Subject : kernel freezes, inotify problem
> > > Submitter : Christoph Thielecke <[email protected]>
> > > Date : 2009-08-19 12:48 (19 days old)
> > > References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
> > > Handled-By : Eric Paris <[email protected]>
> >
> > This should be marked as fixed.
>
> Yeah, I forgot to close it. Thanks for the heads up.
Right, it seems to be working fine (sorry for the delay, I tested it for a while).


With best regards

Christoph
--
Linux User Group Wernigerode
http://www.lug-wr.de/



2009-09-09 15:22:03

by Mel Gorman

Subject: Re: [Bug #14141] order 2 page allocation failures

On Sun, Sep 06, 2009 at 07:24:58PM +0200, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
> Subject : order 2 page allocation failures
> Submitter : Frans Pop <[email protected]>
> Date : 2009-09-06 7:40 (1 days old)
> References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
> Handled-By : Pekka Enberg <[email protected]>
>

AFAIK, this is still a problem. Have requested more information in
relation to the bug.

As an aside, the subject seems to have lost the iwlagn aspect of this
bug.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

2009-09-09 17:19:51

by Stefan Schmidt

Subject: Re: [Bug #14103] cdc_acm gives I/O error

Hello.

On Sun, 2009-09-06 at 19:24, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14103
> Subject : cdc_acm gives I/O error
> Submitter : Paul Martin <[email protected]>
> Date : 2009-09-01 13:30 (6 days old)
> Handled-By : Oliver Neukum <[email protected]>

I had the same problem here at least since -rc4. Oliver already did a two-liner
patch to fix the issue. Can be found here:

http://bugzilla.kernel.org/attachment.cgi?id=23046

It fixes the issue for me on two different cdc_acm devices. Don't know if it is
already in the USB tree, but I had to apply it to Linus' HEAD from today. Would
be good to get this in before the -final as it would cause a real regression.

regards
Stefan Schmidt

2009-09-10 21:02:14

by Rafael J. Wysocki

Subject: Re: [Bug #13733] 2.6.31-rc2: irq 16: nobody cared

On Tuesday 08 September 2009, Niel Lambrechts wrote:
> On 09/06/2009 10:30 PM, Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
> > Subject : 2.6.31-rc2: irq 16: nobody cared
> > Submitter : Niel Lambrechts<[email protected]>
> > Date : 2009-07-06 18:32 (63 days old)
> > References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
>
> Hi Rafael,
>
> This is still present in 2.6.31-rc9-pae.
>
> To test, I removed ehci_hcd, hibernated/resumed and the problem did not appear.
> But when I first plugged in a USB mouse after the system was up and before
> ehci_hcd was loaded the error occurred again.

Thanks for the update (extending the CC list).

Rafael

2009-09-10 21:04:44

by Rafael J. Wysocki

Subject: Re: [Bug #13836] suspend script fails, related to stdout?

On Monday 07 September 2009, Tomas M. wrote:
> tested rc9 and its still there.

Thanks for the update.

> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
> > Subject : suspend script fails, related to stdout?
> > Submitter : Tomas M. <[email protected]>
> > Date : 2009-07-17 21:24 (52 days old)
> > References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4

Rafael

2009-09-10 21:08:40

by Rafael J. Wysocki

Subject: Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k

On Tuesday 08 September 2009, Fabio Comolli wrote:
> Still present in -rc9

Thanks for the update.

> On Sun, Sep 6, 2009 at 7:24 PM, Rafael J. Wysocki<[email protected]> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
> > Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
> > Submitter : Fabio Comolli <[email protected]>
> > Date : 2009-08-06 20:15 (32 days old)
> > References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4

Rafael

2009-09-10 21:10:21

by Rafael J. Wysocki

Subject: Re: [Bug #14136] readcd Oops

On Monday 07 September 2009, Bob Tracy wrote:
> On Sun, Sep 06, 2009 at 07:24:57PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14136
> > Subject : readcd Oops
> > Submitter : Bob Tracy <[email protected]>
> > Date : 2009-09-03 3:39 (4 days old)
> > References : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
> > Handled-By : Michal Schmidt <[email protected]>
> > Patch : http://patchwork.kernel.org/patch/45347/
>
> I ack'd the fix a few days ago: please remove this bug entry from the
> list of 2.6.30 regressions.

Hmm. Has the fix been merged already?

Rafael

2009-09-10 21:14:21

by Rafael J. Wysocki

Subject: Re: [Bug #14141] order 2 page allocation failures

On Wednesday 09 September 2009, Mel Gorman wrote:
> On Sun, Sep 06, 2009 at 07:24:58PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
> > Subject : order 2 page allocation failures
> > Submitter : Frans Pop <[email protected]>
> > Date : 2009-09-06 7:40 (1 days old)
> > References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
> > Handled-By : Pekka Enberg <[email protected]>
> >
>
> AFAIK, this is still a problem. Have requested more information in
> relation to the bug.
>
> As an aside, the subject seems to have lost the iwlagn aspect of this
> bug.

Fixed. I also moved the bug to "drivers/network-wireless" as it seems to
be specific to iwlagn.

Thanks,
Rafael

2009-09-11 05:02:53

by Bob Tracy

Subject: Re: [Bug #14136] readcd Oops

On Thu, Sep 10, 2009 at 11:11:26PM +0200, Rafael J. Wysocki wrote:
> On Monday 07 September 2009, Bob Tracy wrote:
> > On Sun, Sep 06, 2009 at 07:24:57PM +0200, Rafael J. Wysocki wrote:
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14136
> > > Subject : readcd Oops
> > > Submitter : Bob Tracy <[email protected]>
> > > Date : 2009-09-03 3:39 (4 days old)
> > > References : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
> > > Handled-By : Michal Schmidt <[email protected]>
> > > Patch : http://patchwork.kernel.org/patch/45347/
> >
> > I ack'd the fix a few days ago: please remove this bug entry from the
> > list of 2.6.30 regressions.
>
> Hmm. Has the fix been merged already?

Good question: I don't think so. It's not in mainline as of 2.6.31.
Looks like we need to keep this bug open a while longer... Sorry 'bout
that.

--Bob

2009-09-11 12:38:07

by Martin-Éric Racine

Subject: Re: [Bug #13941] x86 Geode issue

2009/8/18 Willy Tarreau <[email protected]>:
> On Tue, Aug 18, 2009 at 12:02:34AM +0300, Martin-Éric Racine wrote:
> (...)
>> > it's still a JPG - posting the transcribed oops in email text would
>> > certainly help more folks looking over it.
>> >
>> > (painful i know ...)
>>
>> I welcome suggestions for proper OCR software that can extract the
>> text displayed therein. Manually transcribing it is too error-prone to
>> even try.
>
> Well, there are less risks of errors retyping by hand than passing via
> an OCR. At least *you* know that everything you see are hex numbers, the
> OCR does not. Even though it's quite annoying to do that by hand, it
> generally takes less than 5 minutes to retype an oops, which is not that
> much. Of course, the serial cable to another machine to get a panic dump
> is the easiest solution ;-)

That would be assuming that a serial console is available. This is
not the case here. No legacy port whatsoever.

Martin-Éric

2009-09-11 12:36:25

by Martin-Éric Racine

Subject: Re: [Bug #13941] x86 Geode issue

2009/8/17 Ingo Molnar <[email protected]>:
>
> * Ingo Molnar <[email protected]> wrote:
>
>>
>> * Martin-Éric Racine <[email protected]> wrote:
>>
>> > On Thu, Aug 13, 2009 at 9:34 PM, Rafael J. Wysocki<[email protected]> wrote:
>> > > On Thursday 13 August 2009, Martin-Éric Racine wrote:
>> > >> On Thu, Aug 13, 2009 at 5:54 PM, Rafael J. Wysocki<[email protected]> wrote:
>> > >> > On Thursday 13 August 2009, Martin-Éric Racine wrote:
>> > >> >> 2009/8/13 Martin-Éric Racine <[email protected]>:
>> > >> >> > On Thu, Aug 13, 2009 at 12:07 PM, Ingo Molnar<[email protected]> wrote:
>> > >> >> >> * Martin-Éric Racine <[email protected]> wrote:
>> > >> >> >>> Yes, this bug is still valid.
>> > >> >> >>>
>> > >> >> >>> Ubuntu kernel team member Leann Ogasawara and I are slowly
>> > >> >> >>> bisecting our way through the changes that took place since 2.6.30
>> > >> >> >>> to find the commit that introduced this regression. Please stay
>> > >> >> >>> tuned.
>> > >> >> >>
>> > >> >> >> hm, the only outright Geode related commit was:
>> > >> >> >>
>> > >> >> >>  d6c585a: x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure
>> > >> >> >>
>> > >> >> >> the jpg at:
>> > >> >> >>
>> > >> >> >>  http://launchpadlibrarian.net/28892781/00002.jpg
>> > >> >> >>
>> > >> >> >> is very out of focus - but what i could decipher suggests a
>> > >> >> >> pagefault crash in the VFS code, in generic_delete_inode().
>> > >> >>
>> > >> >> This one might be a bit better:
>> > >> >>
>> > >> >> http://launchpadlibrarian.net/30267494/2.6.31-5.24.jpg
>> > >
>> > > Hmm.  This looks like a sysfs oops to my untrained eye.
>> >
>> > The bisect I did with Leann Ogasawara has narrowed the kernel panic
>> > down to the following:
>> >
>> > commit f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0
>> > Author: Al Viro <[email protected]>
>> > Date: Mon Jun 8 19:50:45 2009 -0400
>> >
>> >     add caching of ACLs in struct inode
>> >
>> >     No helpers, no conversions yet.
>> >
>> >     Signed-off-by: Al Viro <[email protected]>
>>
>> Weird. If the functions do what their name suggests, i.e. if
>> inode_init_always() is an always called constructor and if
>> destroy_inode() is an unconditional destructor then this patch
>> should have no functional effect on the VFS side.
>>
>> It increases the size of struct inode, so if you have some old
>> module (built to an older version of fs.h) still around it might
>> corrupt your inode data structure.
>>
>> Or the size change might trigger some dormant bug. It might move a
>> critical inode right into the path of a pre-existing (but not
>> visibly crash-triggering) data corruption.
>>
>> The possibilities on the 'weird bug' front are endless - the
>> crash/oops itself should be turned into text, posted here and
>> analyzed.
>
> Btw., before you invest any time into the 'weird crash' theory, i'd
> suggest to double check the bisection result:
>
>  f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0    crashes
>  f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0~1  boots fine
>
> You can save yourself from a lot of head scratching that way - the
> bisection result looks weird. (albeit plausible - a VFS crash points
> to a VFS commit.)
>
> _Maybe_ the bisection is just off a little bit (there was a
> bisection mistake in the last few steps), and the real buggy commit
> is one of the nearby ones:

We double checked again last week with fresh builds and validated that
the above result is correct.

What puzzles us is the start of the crash:


BUG: unable to handle kernel paging request at ffffb4ff
IP: [<c01f716b>] __destroy_inode+0x4b/0x80
*pde = 00810067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/resume


Any ideas?
Martin-Éric

2009-09-30 13:22:14

by Jan Scholz

Subject: Re: [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)

Hi,

now that the patch "HID: completely remove apple mightymouse from
blacklist" is merged upstream as
"42960a13001aa6df52ca9952ce996f94a744ea65" I think it should be merged
in the v2.6.31-stable series as well.

Best regards,
Jan

Jan Scholz <[email protected]> writes:

> I can confirm the reported bug, but for me reverting fa047e4f6fa63a6 is
> not sufficient. I have to remove the device id of the mighty mouse from
> the hid_blacklist list in drivers/hid/hid-core.c as well, see the patch below.
>
> "Rafael J. Wysocki" <[email protected]> writes:
>
>> This message has been generated automatically as a part of a report
>> of recent regressions.
>>
>> The following bug entry is on the current list of known regressions
>> from 2.6.30. Please verify if it still should be listed and let me know
>> (either way).
>>
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
>> Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
>> Submitter : Adrian Ulrich <[email protected]>
>> Date : 2009-08-08 22:08 (2 days old)
>> First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
>>
>
> From b7393ed6dfe00c9e126a2dd34659156548df15cc Mon Sep 17 00:00:00 2001
> From: Jan Scholz <[email protected]>
> Date: Tue, 11 Aug 2009 14:33:27 +0200
> Subject: [PATCH] HID: commit fa047e4f is incomplete
>
> Commit fa047e4f6fa63a6e9d0ae4d7749538830d14a343 "HID: fix inverted
> wheel for bluetooth version of apple mighty mouse" is incomplete. If
> we remove Apple MightyMouse (bluetooth version) from the list of
> apple_devices in drivers/hid/hid-apple.c we have to remove it from
> hid_blacklist in drivers/hid/hid-core.c as well.
>
> Signed-off-by: Jan Scholz <[email protected]>
> ---
> drivers/hid/hid-core.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
> index 5eb10c2..047844d 100644
> --- a/drivers/hid/hid-core.c
> +++ b/drivers/hid/hid-core.c
> @@ -1319,7 +1319,6 @@ static const struct hid_device_id hid_blacklist[] = {
> { HID_USB_DEVICE(USB_VENDOR_ID_ZEROPLUS, 0x0005) },
> { HID_USB_DEVICE(USB_VENDOR_ID_ZEROPLUS, 0x0030) },
>
> - { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, 0x030c) },
> { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_BT) },
> { }
> };
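
The patch description above says the device has to leave hid_blacklist once it has left apple_devices. A minimal userspace sketch of why a stale entry matters, assuming a simplified model of a sentinel-terminated ID table: `struct dev_id`, `is_blacklisted()`, `generic_driver_claims()` and all numeric bus and vendor constants here are hypothetical stand-ins for illustration, not the kernel's actual definitions. In this model the generic driver refuses every listed device, so a device that is still listed but no longer claimed by a specialized driver ends up with no driver at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a HID device ID entry. */
struct dev_id {
    unsigned bus;
    unsigned vendor;
    unsigned product;
};

/* Illustrative constants only; these are not the kernel's values. */
enum { BUS_USB_X = 1, BUS_BT_X = 2 };
enum { VENDOR_A = 0x1111, VENDOR_B = 0x2222 };

/* The table ends with an all-zero sentinel, like the "{ }" entry in the patch. */
static const struct dev_id blacklist[] = {
    { BUS_USB_X, VENDOR_B, 0x0005 },
    { BUS_BT_X,  VENDOR_A, 0x030c },  /* the kind of entry the patch removes */
    { 0 }
};

/* Walk the table until the sentinel (bus == 0) and report an exact match. */
static bool is_blacklisted(const struct dev_id *table, const struct dev_id *dev)
{
    for (; table->bus != 0; table++) {
        if (table->bus == dev->bus &&
            table->vendor == dev->vendor &&
            table->product == dev->product)
            return true;
    }
    return false;
}

/* Assumed policy for this sketch: the generic driver takes only unlisted devices. */
static bool generic_driver_claims(const struct dev_id *dev)
{
    return !is_blacklisted(blacklist, dev);
}

/* Small self-check exercising both outcomes. */
static void self_check(void)
{
    struct dev_id listed   = { BUS_BT_X, VENDOR_A, 0x030c };
    struct dev_id unlisted = { BUS_BT_X, VENDOR_A, 0x9999 };

    assert(!generic_driver_claims(&listed));   /* left without a driver */
    assert(generic_driver_claims(&unlisted));  /* handled generically */
}
```

Calling `self_check()` from a `main()` exercises both cases. Deleting the second table entry would make `generic_driver_claims()` return true for the listed device, which is the effect the one-line removal in the patch aims for.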

2009-09-30 15:25:15

by Jiri Kosina

Subject: Re: [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)

On Wed, 30 Sep 2009, Jan Scholz wrote:

> now that the patch "HID: completely remove apple mightymouse from
> blacklist" is merged upstream as
> "42960a13001aa6df52ca9952ce996f94a744ea65" I think it should be merged
> in the v2.6.31-stable series as well.

Agreed. As it didn't have "Cc: [email protected]" in the changelog, it
will not be picked up automagically.

Will do.

--
Jiri Kosina
SUSE Labs, Novell Inc.