2009-06-29 00:31:43

by Rafael J. Wysocki

Subject: 2.6.31-rc1-git3: Reported regressions 2.6.29 -> 2.6.30

[NOTES:
* I hope you notice the jump in the number of reported regressions after 2.6.30
was released.
* Please let me know which of these bugs have been fixed already (ideally
please also provide the name of the fix commit).
* The post-2.6.30 reports were flooded by the merge window noise that made
them very difficult to track.]

This message contains a list of some regressions introduced between 2.6.29 and
2.6.30, for which there are no fixes in the mainline that I know of. If any of them
have been fixed already, please let me know.

If you know of any other unresolved regressions introduced between 2.6.29
and 2.6.30, please let me know as well and I'll add them to the list.
Also, please let me know if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
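
For anyone who wants to process the entries below with a script, here is a minimal sketch of a parser for the "Field : value" layout used in this report. The function name parse_entry and the FIELDS tuple are assumptions made for illustration, not part of any existing tool. The field names are taken from the entries in this message, and a line that does not start with a known field name is treated as a continuation of the previous field (for example a second References URL).

```python
import re

# Field names as they appear in the entries of this report.
FIELDS = (
    "Bug-Entry", "Subject", "Submitter", "Date",
    "First-Bad-Commit", "References", "Handled-By", "Patch",
)

# Match only known field names, so that a continuation line such as
# "http://..." is not mistaken for a field called "http".
FIELD_RE = re.compile(r"^(" + "|".join(map(re.escape, FIELDS)) + r")\s*:\s*(.*)$")


def parse_entry(text):
    """Parse one entry into a dict mapping field name -> list of values."""
    entry = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            entry.setdefault(current, []).append(match.group(2))
        elif current is not None:
            # Continuation line: belongs to the most recent field.
            entry[current].append(line)
    return entry
```

Each value is kept as a list because References, Handled-By and Patch may span more than one line.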


Listed regressions statistics:

Date          Total  Pending  Unresolved
----------------------------------------
2009-06-29      133       46          43
2009-06-07      110       35          31
2009-05-31      100       32          27
2009-05-24       92       34          27
2009-05-16       81       36          33
2009-04-25       55       36          26
2009-04-17       37       35          28
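
To put a number on the jump mentioned in the notes, the following sketch computes how much the Total column grew between consecutive reports. The numbers are copied from the table above. The names STATS and total_deltas are only illustrative assumptions.

```python
# (date, total, pending, unresolved), newest first, copied from the table.
STATS = [
    ("2009-06-29", 133, 46, 43),
    ("2009-06-07", 110, 35, 31),
    ("2009-05-31", 100, 32, 27),
    ("2009-05-24", 92, 34, 27),
    ("2009-05-16", 81, 36, 33),
    ("2009-04-25", 55, 36, 26),
    ("2009-04-17", 37, 35, 28),
]


def total_deltas(stats):
    """Return (date, increase in Total since the previous, older report)."""
    return [
        (newer[0], newer[1] - older[1])
        for newer, older in zip(stats, stats[1:])
    ]


for date, delta in total_deltas(STATS):
    print(f"{date}: +{delta}")
```

Note that the reports are not evenly spaced in time, so the per-report increases are not directly comparable as rates.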


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13669
Subject : Kernel bug with dock driver
Submitter : Joerg Platte <[email protected]>
Date : 2009-06-14 21:00 (15 days old)
References : http://lkml.org/lkml/2009/6/14/216
Handled-By : Henrique de Moraes Holschuh <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13668
Subject : Can't boot 2.6.30 powerpc kernel under qemu.
Submitter : Rob Landley <[email protected]>
Date : 2009-06-27 18:08 (2 days old)
References : http://lkml.org/lkml/2009/6/27/159


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13660
Subject : Crashes during boot on 2.6.30 / 2.6.31-rc, random programs
Submitter : Joao Correia <[email protected]>
Date : 2009-06-27 16:07 (2 days old)
References : http://lkml.org/lkml/2009/6/27/95


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13651
Subject : Anyone know what happened with PC speaker in 2.6.30?
Submitter : Michael Tokarev <[email protected]>
Date : 2009-06-15 14:41 (14 days old)
References : http://marc.info/?l=linux-kernel&m=124507695427817&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13649
Subject : Bad page state in process with various applications
Submitter : Maxim Levitsky <[email protected]>
Date : 2009-06-20 15:27 (9 days old)
References : http://marc.info/?l=linux-mm&m=124551168828090&w=4
Handled-By : Mel Gorman <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13648
Subject : nfsd: page allocation failure
Submitter : Justin Piszcz <[email protected]>
Date : 2009-06-22 12:08 (7 days old)
References : http://lkml.org/lkml/2009/6/22/309


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13647
Subject : fb/mmap lockdep report.
Submitter : Dave Jones <[email protected]>
Date : 2009-06-21 13:33 (8 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=513adb58685615b0b1d47a3f0d40f5352beff189
References : http://lkml.org/lkml/2009/6/21/90
             http://lkml.org/lkml/2009/6/21/122


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13646
Subject : warn_on tty_io.c, broken bluetooth
Submitter : Pavel Machek <[email protected]>
Date : 2009-06-19 17:05 (10 days old)
References : http://lkml.org/lkml/2009/6/19/187


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13644
Subject : hibernation/swsusp lockup due to acpi-cpufreq
Submitter : Johannes Stezenbach <[email protected]>
Date : 2009-06-16 01:27 (13 days old)
References : http://lkml.org/lkml/2009/6/15/630
Handled-By : Rafael J. Wysocki <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13634
Subject : [drm:drm_wait_vblank] failed to acquire vblank counter
Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
Date : 2009-06-27 07:02 (2 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13624
Subject : usb: wrong autosuspend initialization
Submitter : <[email protected]>
Date : 2009-06-25 18:18 (4 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13621
Subject : xfs hangs with assertion failed
Submitter : Johannes Engel <[email protected]>
Date : 2009-06-25 10:07 (4 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13620
Subject : acpi_enforce_resources broken - conflicting i2c module loaded on some EeePCs
Submitter : Alan Jenkins <[email protected]>
Date : 2009-06-25 08:31 (4 days old)
References : <http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-June/002316.html>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13613
Subject : lockups with JFS (inconsistent lock state)
Submitter : Jan "Yenya" Kasprzak <[email protected]>
Date : 2009-06-24 09:35 (5 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13581
Subject : ath9k doesn't work with newer kernels
Submitter : Matteo <[email protected]>
Date : 2009-06-19 12:04 (10 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13558
Subject : Tracelog during resume
Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
Date : 2009-06-17 11:32 (12 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13554
Subject : linux-image-2.6.30-1-686, KMS enabled: black screen, no X window
Submitter : Jos van Wolput <[email protected]>
Date : 2009-06-17 06:28 (12 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13528
Subject : au0828: major drop in reception quality between 2.6.29.4 and 2.6.30 on HVR-950q
Submitter : Jim Faulkner <[email protected]>
Date : 2009-06-13 19:34 (16 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13518
Subject : slab grows with NFS write activity.
Submitter : Andrew Randrianasulu <[email protected]>
Date : 2009-06-12 09:51 (17 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13514
Subject : acer_wmi causes stack corruption
Submitter : Rus <[email protected]>
Date : 2009-06-12 08:13 (17 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13512
Subject : D43 on 2.6.30 doesn't suspend anymore
Submitter : Daniel Smolik <[email protected]>
Date : 2009-06-11 20:12 (18 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13502
Subject : GPE storm causes polling mode, which causes /proc/acpi/battery read to take 4 seconds - MacBookPro4,1
Submitter : <[email protected]>
Date : 2009-06-10 20:04 (19 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13472
Subject : Oops with minicom and USB serial
Submitter : Peter Chubb <[email protected]>
Date : 2009-06-05 1:37 (24 days old)
References : http://marc.info/?l=linux-kernel&m=124416901026700&w=4
Handled-By : Alan Stern <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13471
Subject : Loading parport_pc kills the keyboard if ACPI is enabled
Submitter : Ozan Çağlayan <[email protected]>
Date : 2009-06-04 9:12 (25 days old)
References : http://marc.info/?l=linux-kernel&m=124410667532558&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13424
Subject : possible deadlock when doing governor switching
Submitter : Shaohua Li <[email protected]>
Date : 2009-05-31 16:36 (29 days old)
References : http://www.spinics.net/lists/cpufreq/msg00711.html
Handled-By : Mathieu Desnoyers <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13408
Subject : Performance regression in 2.6.30-rc7
Submitter : Diego Calleja <[email protected]>
Date : 2009-05-30 18:51 (30 days old)
References : http://lkml.org/lkml/2009/5/30/146


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13407
Subject : adb trackpad disappears after suspend to ram
Submitter : Jan Scholz <[email protected]>
Date : 2009-05-28 7:59 (32 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2ed8d2b3a81bdbb0418301628ccdb008ac9f40b7
References : http://marc.info/?l=linux-kernel&m=124349762314976&w=4
Handled-By : Rafael J. Wysocki <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13401
Subject : pktcdvd writing is really slow with CFQ scheduler (bisected)
Submitter : Laurent Riffard <[email protected]>
Date : 2009-05-28 18:43 (32 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13374
Subject : reiserfs blocked for more than 120secs
Submitter : Harald Dunkel <[email protected]>
Date : 2009-05-23 8:52 (37 days old)
References : http://marc.info/?l=linux-kernel&m=124306880410811&w=4
             http://lkml.org/lkml/2009/5/29/389


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13373
Subject : fbcon, intelfb, i915: INFO: possible circular locking dependency detected
Submitter : Miles Lane <[email protected]>
Date : 2009-05-23 5:08 (37 days old)
References : http://marc.info/?l=linux-kernel&m=124305538130702&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13362
Subject : rt2x00: slow wifi with correct basic rate bitmap
Submitter : Alejandro Riveira <[email protected]>
Date : 2009-05-22 13:32 (38 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13351
Subject : 2.6.30 corrupts my system after suspend resume with readonly mounted hard disk
Submitter : <[email protected]>
Date : 2009-05-20 14:09 (40 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78a8b35bc7abf8b8333d6f625e08c0f7cc1c3742
Handled-By : Yinghai Lu <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13341
Subject : Random Oops at boot at loading ip6tables rules
Submitter : <[email protected]>
Date : 2009-05-19 09:08 (41 days old)
Handled-By : Rusty Russell <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13337
Subject : [post 2.6.29 regression] hang during suspend of b44/b43 modules
Submitter : Tomas Janousek <[email protected]>
Date : 2009-05-18 10:59 (42 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13328
Subject : b44: eth0: BUG! Timeout waiting for bit 00000002 of register 42c to clear.
Submitter : Francis Moreau <[email protected]>
Date : 2009-05-03 16:22 (57 days old)
References : http://marc.info/?l=linux-kernel&m=124136778012280&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
Subject : Page allocation failures with b43 and p54usb
Submitter : Larry Finger <[email protected]>
Date : 2009-04-29 21:01 (61 days old)
References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
             http://lkml.org/lkml/2009/6/7/136
Handled-By : Johannes Berg <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13318
Subject : AGP doesn't work anymore on nforce2
Submitter : Karsten Mehrhoff <[email protected]>
Date : 2009-04-30 8:51 (60 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=59de2bebabc5027f93df999d59cc65df591c3e6e
References : http://marc.info/?l=linux-kernel&m=124108156417560&w=4
Handled-By : Shaohua Li <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13306
Subject : hibernate slow on _second_ run
Submitter : Johannes Berg <[email protected]>
Date : 2009-05-14 09:34 (46 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13277
Subject : 2.6.30 regression - hang on 2nd resume - bisected - Thinkpad X40
Submitter : Daniel Vetter <[email protected]>
Date : 2009-05-11 10:08 (49 days old)
Handled-By : Len Brown <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13219
Subject : Intel 440GX: Since kernel 2.6.30-rc1, computers hangs randomly but not with kernel <= 2.6.29.4
Submitter : David Hill <[email protected]>
Date : 2009-05-01 16:57 (59 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13179
Subject : CD-R: wodim intermittent failures
Submitter : Andy Isaacson <[email protected]>
Date : 2009-04-21 1:52 (69 days old)
References : http://marc.info/?l=linux-kernel&m=124027879214231&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13119
Subject : Trouble with make-install from a NFS mount
Submitter : Gregory Haskins <[email protected]>
Date : 2009-04-14 21:32 (76 days old)
References : http://marc.info/?l=linux-kernel&m=123974482327044&w=4
Handled-By : H. Peter Anvin <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13109
Subject : High latency on /sys/class/thermal
Submitter : Tiago Simões Batista <[email protected]>
Date : 2009-04-11 14:56 (79 days old)
References : http://marc.info/?l=linux-kernel&m=123946182301248&w=4
Handled-By : Zhang Rui <[email protected]>
             Alexey Starikovskiy <[email protected]>


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13663
Subject : suspend to ram regression (IDE related)
Submitter : Etienne Basset <[email protected]>
Date : 2009-06-26 17:40 (3 days old)
References : http://lkml.org/lkml/2009/6/26/242
Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
Patch : http://patchwork.kernel.org/patch/32719/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13475
Subject : suspend/hibernate lockdep warning
Submitter : Dave Young <[email protected]>
Date : 2009-06-02 10:00 (27 days old)
References : http://marc.info/?l=linux-kernel&m=124393723321241&w=4
Handled-By : Mathieu Desnoyers <[email protected]>
Patch : http://patchwork.kernel.org/patch/28660/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13389
Subject : Warning 'Invalid throttling state, reset' gets displayed when it should not be
Submitter : Frans Pop <[email protected]>
Date : 2009-05-26 15:24 (34 days old)
Handled-By : Frans Pop <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=21671
        http://bugzilla.kernel.org/attachment.cgi?id=21672


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions introduced
between 2.6.29 and 2.6.30, unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=13070

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2009-06-29 00:31:25

by Rafael J. Wysocki

Subject: [Bug #13109] High latency on /sys/class/thermal

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13109
Subject : High latency on /sys/class/thermal
Submitter : Tiago Simões Batista <[email protected]>
Date : 2009-04-11 14:56 (79 days old)
References : http://marc.info/?l=linux-kernel&m=123946182301248&w=4
Handled-By : Zhang Rui <[email protected]>
             Alexey Starikovskiy <[email protected]>

2009-06-29 00:35:22

by Rafael J. Wysocki

Subject: [Bug #13219] Intel 440GX: Since kernel 2.6.30-rc1, computers hangs randomly but not with kernel <= 2.6.29.4

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13219
Subject : Intel 440GX: Since kernel 2.6.30-rc1, computers hangs randomly but not with kernel <= 2.6.29.4
Submitter : David Hill <[email protected]>
Date : 2009-05-01 16:57 (59 days old)

2009-06-29 00:35:38

by Rafael J. Wysocki

Subject: [Bug #13306] hibernate slow on _second_ run

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13306
Subject : hibernate slow on _second_ run
Submitter : Johannes Berg <[email protected]>
Date : 2009-05-14 09:34 (46 days old)

2009-06-29 00:35:58

by Rafael J. Wysocki

Subject: [Bug #13179] CD-R: wodim intermittent failures

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13179
Subject : CD-R: wodim intermittent failures
Submitter : Andy Isaacson <[email protected]>
Date : 2009-04-21 1:52 (69 days old)
References : http://marc.info/?l=linux-kernel&m=124027879214231&w=4

2009-06-29 00:36:18

by Rafael J. Wysocki

Subject: [Bug #13119] Trouble with make-install from a NFS mount

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13119
Subject : Trouble with make-install from a NFS mount
Submitter : Gregory Haskins <[email protected]>
Date : 2009-04-14 21:32 (76 days old)
References : http://marc.info/?l=linux-kernel&m=123974482327044&w=4
Handled-By : H. Peter Anvin <[email protected]>

2009-06-29 00:36:36

by Rafael J. Wysocki

Subject: [Bug #13277] 2.6.30 regression - hang on 2nd resume - bisected - Thinkpad X40

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13277
Subject : 2.6.30 regression - hang on 2nd resume - bisected - Thinkpad X40
Submitter : Daniel Vetter <[email protected]>
Date : 2009-05-11 10:08 (49 days old)
Handled-By : Len Brown <[email protected]>

2009-06-29 00:37:01

by Rafael J. Wysocki

Subject: [Bug #13319] Page allocation failures with b43 and p54usb

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
Subject : Page allocation failures with b43 and p54usb
Submitter : Larry Finger <[email protected]>
Date : 2009-04-29 21:01 (61 days old)
References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
             http://lkml.org/lkml/2009/6/7/136
Handled-By : Johannes Berg <[email protected]>

2009-06-29 00:37:25

by Rafael J. Wysocki

Subject: [Bug #13318] AGP doesn't work anymore on nforce2

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13318
Subject : AGP doesn't work anymore on nforce2
Submitter : Karsten Mehrhoff <[email protected]>
Date : 2009-04-30 8:51 (60 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=59de2bebabc5027f93df999d59cc65df591c3e6e
References : http://marc.info/?l=linux-kernel&m=124108156417560&w=4
Handled-By : Shaohua Li <[email protected]>

2009-06-29 00:37:48

by Rafael J. Wysocki

Subject: [Bug #13389] Warning 'Invalid throttling state, reset' gets displayed when it should not be

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13389
Subject : Warning 'Invalid throttling state, reset' gets displayed when it should not be
Submitter : Frans Pop <[email protected]>
Date : 2009-05-26 15:24 (34 days old)
Handled-By : Frans Pop <[email protected]>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=21671
        http://bugzilla.kernel.org/attachment.cgi?id=21672

2009-06-29 00:39:21

by Rafael J. Wysocki

Subject: [Bug #13351] 2.6.30 corrupts my system after suspend resume with readonly mounted hard disk

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13351
Subject : 2.6.30 corrupts my system after suspend resume with readonly mounted hard disk
Submitter : <[email protected]>
Date : 2009-05-20 14:09 (40 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78a8b35bc7abf8b8333d6f625e08c0f7cc1c3742
Handled-By : Yinghai Lu <[email protected]>

2009-06-29 00:39:02

by Rafael J. Wysocki

Subject: [Bug #13373] fbcon, intelfb, i915: INFO: possible circular locking dependency detected

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13373
Subject : fbcon, intelfb, i915: INFO: possible circular locking dependency detected
Submitter : Miles Lane <[email protected]>
Date : 2009-05-23 5:08 (37 days old)
References : http://marc.info/?l=linux-kernel&m=124305538130702&w=4

2009-06-29 00:38:36

by Rafael J. Wysocki

Subject: [Bug #13337] [post 2.6.29 regression] hang during suspend of b44/b43 modules

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13337
Subject : [post 2.6.29 regression] hang during suspend of b44/b43 modules
Submitter : Tomas Janousek <[email protected]>
Date : 2009-05-18 10:59 (42 days old)

2009-06-29 00:38:01

by Rafael J. Wysocki

Subject: [Bug #13362] rt2x00: slow wifi with correct basic rate bitmap

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13362
Subject : rt2x00: slow wifi with correct basic rate bitmap
Submitter : Alejandro Riveira <[email protected]>
Date : 2009-05-22 13:32 (38 days old)

2009-06-29 00:39:34

by Rafael J. Wysocki

Subject: [Bug #13328] b44: eth0: BUG! Timeout waiting for bit 00000002 of register 42c to clear.

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13328
Subject : b44: eth0: BUG! Timeout waiting for bit 00000002 of register 42c to clear.
Submitter : Francis Moreau <[email protected]>
Date : 2009-05-03 16:22 (57 days old)
References : http://marc.info/?l=linux-kernel&m=124136778012280&w=4

2009-06-29 00:38:23

by Rafael J. Wysocki

Subject: [Bug #13341] Random Oops at boot at loading ip6tables rules

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13341
Subject : Random Oops at boot at loading ip6tables rules
Submitter : <[email protected]>
Date : 2009-05-19 09:08 (41 days old)
Handled-By : Rusty Russell <[email protected]>

2009-06-29 00:38:49

by Rafael J. Wysocki

Subject: [Bug #13374] reiserfs blocked for more than 120secs

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13374
Subject : reiserfs blocked for more than 120secs
Submitter : Harald Dunkel <[email protected]>
Date : 2009-05-23 8:52 (37 days old)
References : http://marc.info/?l=linux-kernel&m=124306880410811&w=4
             http://lkml.org/lkml/2009/5/29/389

2009-06-29 00:39:47

by Rafael J. Wysocki

Subject: [Bug #13401] pktcdvd writing is really slow with CFQ scheduler (bisected)

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13401
Subject : pktcdvd writing is really slow with CFQ scheduler (bisected)
Submitter : Laurent Riffard <[email protected]>
Date : 2009-05-28 18:43 (32 days old)

2009-06-29 00:41:06

by Rafael J. Wysocki

Subject: [Bug #13471] Loading parport_pc kills the keyboard if ACPI is enabled

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13471
Subject : Loading parport_pc kills the keyboard if ACPI is enabled
Submitter : Ozan Çağlayan <[email protected]>
Date : 2009-06-04 9:12 (25 days old)
References : http://marc.info/?l=linux-kernel&m=124410667532558&w=4

2009-06-29 00:40:39

by Rafael J. Wysocki

Subject: [Bug #13424] possible deadlock when doing governor switching

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13424
Subject : possible deadlock when doing governor switching
Submitter : Shaohua Li <[email protected]>
Date : 2009-05-31 16:36 (29 days old)
References : http://www.spinics.net/lists/cpufreq/msg00711.html
Handled-By : Mathieu Desnoyers <[email protected]>

2009-06-29 00:39:59

by Rafael J. Wysocki

Subject: [Bug #13407] adb trackpad disappears after suspend to ram

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13407
Subject : adb trackpad disappears after suspend to ram
Submitter : Jan Scholz <[email protected]>
Date : 2009-05-28 7:59 (32 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2ed8d2b3a81bdbb0418301628ccdb008ac9f40b7
References : http://marc.info/?l=linux-kernel&m=124349762314976&w=4
Handled-By : Rafael J. Wysocki <[email protected]>

2009-06-29 00:41:24

by Rafael J. Wysocki

Subject: [Bug #13502] GPE storm causes polling mode, which causes /proc/acpi/battery read to take 4 seconds - MacBookPro4,1

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13502
Subject : GPE storm causes polling mode, which causes /proc/acpi/battery read to take 4 seconds - MacBookPro4,1
Submitter : <[email protected]>
Date : 2009-06-10 20:04 (19 days old)

2009-06-29 00:40:19

by Rafael J. Wysocki

Subject: [Bug #13408] Performance regression in 2.6.30-rc7

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13408
Subject : Performance regression in 2.6.30-rc7
Submitter : Diego Calleja <[email protected]>
Date : 2009-05-30 18:51 (30 days old)
References : http://lkml.org/lkml/2009/5/30/146

2009-06-29 00:40:51

by Rafael J. Wysocki

Subject: [Bug #13512] D43 on 2.6.30 doesn't suspend anymore

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13512
Subject : D43 on 2.6.30 doesn't suspend anymore
Submitter : Daniel Smolik <[email protected]>
Date : 2009-06-11 20:12 (18 days old)

2009-06-29 00:41:36

by Rafael J. Wysocki

Subject: [Bug #13472] Oops with minicom and USB serial

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13472
Subject : Oops with minicom and USB serial
Submitter : Peter Chubb <[email protected]>
Date : 2009-06-05 1:37 (24 days old)
References : http://marc.info/?l=linux-kernel&m=124416901026700&w=4
Handled-By : Alan Stern <[email protected]>

2009-06-29 00:42:44

by Rafael J. Wysocki

Subject: [Bug #13554] linux-image-2.6.30-1-686, KMS enabled: black screen, no X window

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13554
Subject : linux-image-2.6.30-1-686, KMS enabled: black screen, no X window
Submitter : Jos van Wolput <[email protected]>
Date : 2009-06-17 06:28 (12 days old)

2009-06-29 00:42:21

by Rafael J. Wysocki

Subject: [Bug #13518] slab grows with NFS write activity.

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13518
Subject : slab grows with NFS write activity.
Submitter : Andrew Randrianasulu <[email protected]>
Date : 2009-06-12 09:51 (17 days old)

2009-06-29 00:43:16

by Rafael J. Wysocki

Subject: [Bug #13621] xfs hangs with assertion failed

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13621
Subject : xfs hangs with assertion failed
Submitter : Johannes Engel <[email protected]>
Date : 2009-06-25 10:07 (4 days old)

2009-06-29 00:42:32

by Rafael J. Wysocki

Subject: [Bug #13528] au0828: major drop in reception quality between 2.6.29.4 and 2.6.30 on HVR-950q

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13528
Subject : au0828: major drop in reception quality between 2.6.29.4 and 2.6.30 on HVR-950q
Submitter : Jim Faulkner <[email protected]>
Date : 2009-06-13 19:34 (16 days old)

2009-06-29 00:42:00

by Rafael J. Wysocki

Subject: [Bug #13475] suspend/hibernate lockdep warning

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13475
Subject : suspend/hibernate lockdep warning
Submitter : Dave Young <[email protected]>
Date : 2009-06-02 10:00 (27 days old)
References : http://marc.info/?l=linux-kernel&m=124393723321241&w=4
Handled-By : Mathieu Desnoyers <[email protected]>
Patch : http://patchwork.kernel.org/patch/28660/

2009-06-29 00:42:56

by Rafael J. Wysocki

Subject: [Bug #13558] Tracelog during resume

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13558
Subject : Tracelog during resume
Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
Date : 2009-06-17 11:32 (12 days old)

2009-06-29 00:43:31

by Rafael J. Wysocki

Subject: [Bug #13620] acpi_enforce_resources broken - conflicting i2c module loaded on some EeePCs

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13620
Subject : acpi_enforce_resources broken - conflicting i2c module loaded on some EeePCs
Submitter : Alan Jenkins <[email protected]>
Date : 2009-06-25 08:31 (4 days old)
References : <http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-June/002316.html>

2009-06-29 00:41:47

by Rafael J. Wysocki

Subject: [Bug #13514] acer_wmi causes stack corruption

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13514
Subject : acer_wmi causes stack corruption
Submitter : Rus <[email protected]>
Date : 2009-06-12 08:13 (17 days old)

2009-06-29 00:43:43

by Rafael J. Wysocki

Subject: [Bug #13624] usb: wrong autosuspend initialization

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13624
Subject : usb: wrong autosuspend initialization
Submitter : <[email protected]>
Date : 2009-06-25 18:18 (4 days old)

2009-06-29 00:44:31

by Rafael J. Wysocki

Subject: [Bug #13613] lockups with JFS (inconsistent lock state)

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13613
Subject : lockups with JFS (inconsistent lock state)
Submitter : Jan "Yenya" Kasprzak <[email protected]>
Date : 2009-06-24 09:35 (5 days old)

2009-06-29 00:43:55

by Rafael J. Wysocki

Subject: [Bug #13634] [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13634
Subject : [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22
Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
Date : 2009-06-27 07:02 (2 days old)

2009-06-29 00:44:57

by Rafael J. Wysocki

Subject: [Bug #13646] warn_on tty_io.c, broken bluetooth

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13646
Subject : warn_on tty_io.c, broken bluetooth
Submitter : Pavel Machek <[email protected]>
Date : 2009-06-19 17:05 (10 days old)
References : http://lkml.org/lkml/2009/6/19/187

2009-06-29 00:44:15

by Rafael J. Wysocki

Subject: [Bug #13581] ath9k doesn't work with newer kernels

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13581
Subject : ath9k doesn't work with newer kernels
Submitter : Matteo <[email protected]>
Date : 2009-06-19 12:04 (10 days old)

2009-06-29 00:45:26

by Rafael J. Wysocki

Subject: [Bug #13647] fb/mmap lockdep report.

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13647
Subject : fb/mmap lockdep report.
Submitter : Dave Jones <[email protected]>
Date : 2009-06-21 13:33 (8 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=513adb58685615b0b1d47a3f0d40f5352beff189
References : http://lkml.org/lkml/2009/6/21/90
             http://lkml.org/lkml/2009/6/21/122

2009-06-29 00:44:44

by Rafael J. Wysocki

Subject: [Bug #13644] hibernation/swsusp lockup due to acpi-cpufreq

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13644
Subject : hibernation/swsusp lockup due to acpi-cpufreq
Submitter : Johannes Stezenbach <[email protected]>
Date : 2009-06-16 01:27 (13 days old)
References : http://lkml.org/lkml/2009/6/15/630
Handled-By : Rafael J. Wysocki <[email protected]>

2009-06-29 00:46:01

by Rafael J. Wysocki

Subject: [Bug #13669] Kernel bug with dock driver

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13669
Subject : Kernel bug with dock driver
Submitter : Joerg Platte <[email protected]>
Date : 2009-06-14 21:00 (15 days old)
References : http://lkml.org/lkml/2009/6/14/216
Handled-By : Henrique de Moraes Holschuh <[email protected]>

2009-06-29 00:45:49

by Rafael J. Wysocki

Subject: [Bug #13648] nfsd: page allocation failure

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13648
Subject : nfsd: page allocation failure
Submitter : Justin Piszcz <[email protected]>
Date : 2009-06-22 12:08 (7 days old)
References : http://lkml.org/lkml/2009/6/22/309

2009-06-29 00:46:24

by Rafael J. Wysocki

Subject: [Bug #13651] Anyone know what happened with PC speaker in 2.6.30?

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13651
Subject : Anyone know what happened with PC speaker in 2.6.30?
Submitter : Michael Tokarev <[email protected]>
Date : 2009-06-15 14:41 (14 days old)
References : http://marc.info/?l=linux-kernel&m=124507695427817&w=4

2009-06-29 00:46:36

by Rafael J. Wysocki

Subject: [Bug #13668] Can't boot 2.6.30 powerpc kernel under qemu.

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13668
Subject : Can't boot 2.6.30 powerpc kernel under qemu.
Submitter : Rob Landley <[email protected]>
Date : 2009-06-27 18:08 (2 days old)
References : http://lkml.org/lkml/2009/6/27/159

2009-06-29 00:47:01

by Rafael J. Wysocki

Subject: [Bug #13660] Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13660
Subject : Crashes during boot on 2.6.30 / 2.6.31-rc, random programs
Submitter : Joao Correia <[email protected]>
Date : 2009-06-27 16:07 (2 days old)
References : http://lkml.org/lkml/2009/6/27/95

2009-06-29 00:46:49

by Rafael J. Wysocki

Subject: [Bug #13663] suspend to ram regression (IDE related)

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13663
Subject : suspend to ram regression (IDE related)
Submitter : Etienne Basset <[email protected]>
Date : 2009-06-26 17:40 (3 days old)
References : http://lkml.org/lkml/2009/6/26/242
Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
Patch : http://patchwork.kernel.org/patch/32719/

2009-06-29 00:47:21

by Rafael J. Wysocki

Subject: [Bug #13649] Bad page state in process with various applications

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.29 and 2.6.30.

The following bug entry is on the current list of known regressions
introduced between 2.6.29 and 2.6.30. Please verify if it still should
be listed and let me know (either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13649
Subject : Bad page state in process with various applications
Submitter : Maxim Levitsky <[email protected]>
Date : 2009-06-20 15:27 (9 days old)
References : http://marc.info/?l=linux-mm&m=124551168828090&w=4
Handled-By : Mel Gorman <[email protected]>

2009-06-29 01:28:00

by Mathieu Desnoyers

Subject: Re: [Bug #13424] possible deadlock when doing governor switching

* Rafael J. Wysocki ([email protected]) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>

Yep, it still exists. Venkatesh Pallipadi from Intel is working on it.
We need to figure out a proper way to fix the policy rwlock vs dbs_mutex vs
timer mutex dependency.

Mathieu

>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13424
> Subject : possible deadlock when doing governor switching
> Submitter : Shaohua Li <[email protected]>
> Date : 2009-05-31 16:36 (29 days old)
> References : http://www.spinics.net/lists/cpufreq/msg00711.html
> Handled-By : Mathieu Desnoyers <[email protected]>
>
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-06-29 03:55:27

by Jos van Wolput

Subject: Re: [Bug #13554] linux-image-2.6.30-1-686, KMS enabled: black screen, no X window



Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13554
> Subject : linux-image-2.6.30-1-686, KMS enabled: black screen, no X window
> Submitter : Jos van Wolput <[email protected]>
> Date : 2009-06-17 06:28 (12 days old)
>
>
>
>
Yes, it should still be listed: KMS doesn't work, at least on my system.

2009-06-29 06:31:18

by Daniel Smolik

Subject: Re: [Bug #13512] D43 on 2.6.30 doesn't suspend anymore

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13512
> Subject : D43 on 2.6.30 doesn't suspend anymore
> Submitter : Daniel Smolik <[email protected]>
> Date : 2009-06-11 20:12 (18 days old)
>
>
>
Yes, the problem still exists. I am now bisecting and am close to finding
the offending patch.

Regards
Dan

2009-06-29 10:29:19

by Etienne Basset

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13663
> Subject : suspend to ram regression (IDE related)
> Submitter : Etienne Basset <[email protected]>
> Date : 2009-06-26 17:40 (3 days old)
> References : http://lkml.org/lkml/2009/6/26/242
> Handled-By : Bartlomiej Zolnierkiewicz <[email protected]>
> Patch : http://patchwork.kernel.org/patch/32719/
>
>
>
Yes, the patch is not yet upstream.
2.6.31-rc1 + Bart's patch: resume from STR works.
Current git + Bart's patch: resume from STR fails, so STR seems to have been broken again.
(I was confident that the post-rc1 MCE fixes would correct the fact that the computer hangs
a few minutes after resume, but now the computer doesn't resume at all.)


Etienne

2009-06-29 10:37:41

by David Miller

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

From: Etienne Basset <[email protected]>
Date: Mon, 29 Jun 2009 12:29:09 +0200

> yes, patch is not yet upstream;

I'll take care of pushing this around today.

2009-06-29 15:51:46

by Etienne Basset

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

David Miller wrote:
> From: Etienne Basset <[email protected]>
> Date: Mon, 29 Jun 2009 12:29:09 +0200
>
>> yes, patch is not yet upstream;
>
> I'll take care of pushing this around today.
>
Hi,

Thank you.
I ran a new bisection to identify the commit that causes pain after -rc1:

etienne@etienne-desktop:~/linux-2.6$ git bisect good
a1317f714af7aed60ddc182d0122477cbe36ee9b is first bad commit
commit a1317f714af7aed60ddc182d0122477cbe36ee9b
Author: Bartlomiej Zolnierkiewicz <[email protected]>
Date: Tue Jun 23 23:52:17 2009 -0700

    ide: improve handling of Power Management requests

    Make hwif->rq point to PM request during PM sequence and do not allow
    any other types of requests to slip in (the old comment was never correct
    as there should be no such requests generated during PM sequence).

    Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

To have STR/resume work with current git, I have to :
1) apply Bart's patch
2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
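
The bisection reported above is, in essence, a binary search over an ordered commit history. A minimal sketch in C follows, assuming a linear history in which every commit after the first bad one also fails. The `is_bad` array and the function name `first_bad_commit` are hypothetical stand-ins for building and testing each kernel; they are not part of git.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a bisection: is_bad[i] is nonzero when commit i
 * fails the test (here: resume from STR fails). Assumes that
 * is_bad[known_good] == 0, is_bad[known_bad] != 0, and that the results
 * are monotonic (good, ..., good, bad, ..., bad).
 * Returns the index of the first bad commit.
 */
size_t first_bad_commit(const int *is_bad, size_t known_good, size_t known_bad)
{
    assert(known_good < known_bad);
    assert(!is_bad[known_good] && is_bad[known_bad]);

    /* Invariant: known_good tests good, known_bad tests bad. */
    while (known_bad - known_good > 1) {
        size_t mid = known_good + (known_bad - known_good) / 2;

        if (is_bad[mid])
            known_bad = mid;   /* corresponds to answering "bad" */
        else
            known_good = mid;  /* corresponds to answering "good" */
    }
    return known_bad;
}
```

Each iteration halves the remaining range, so roughly log2(N) build-and-test cycles are needed for N candidate commits.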

thanks
Etienne

2009-06-29 16:21:28

by Jeff Chua

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Mon, Jun 29, 2009 at 11:51 PM, Etienne Basset <[email protected]> wrote:
> I ran a new bisection to identify the commit that causes pain after -rc1
> commit a1317f714af7aed60ddc182d0122477cbe36ee9b
> To have STR/resume work with current git, I have to :
> 1) apply Bart's patch
> 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b

I just tried, and it "seems" to work. Will try a few more cycles.

Thanks,
Jeff.

2009-06-29 16:51:36

by Larry Finger

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
> Subject : Page allocation failures with b43 and p54usb
> Submitter : Larry Finger <[email protected]>
> Date : 2009-04-29 21:01 (61 days old)
> References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
> http://lkml.org/lkml/2009/6/7/136
> Handled-By : Johannes Berg <[email protected]>

The cause of these failures has been determined. The wireless
subsystem frequently requests buffers of size 4096, but when SLUB
debugging is enabled and the debug info is added, each request grows
to order 1 and memory becomes fragmented.

A controversial "fix" in which SLUB debugging was disabled for
allocations where adding such debugging info would increase the order
was discussed and tried. With a quick look at the commit list for
Linus's tree, I don't see that such a patch is available, but I will
be corrected if I missed it.
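
The arithmetic behind the order increase can be sketched in a few lines of C. This is only an illustration, assuming 4096-byte pages. The 64 bytes of per-object debug metadata used in the usage note is a made-up figure, and `order_for_size` is a hypothetical helper that computes the smallest power-of-two number of pages covering a request; it is not the SLUB code itself.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_BYTES;

    assert(size > 0);

    /* Double the span until the request fits. */
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With these assumptions, `order_for_size(4096)` is 0 (a single page), while `order_for_size(4096 + 64)` is 1, meaning two physically contiguous pages, which are harder to find once memory is fragmented.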

Larry

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Monday 29 June 2009 17:51:35 Etienne Basset wrote:
> David Miller wrote:
> > From: Etienne Basset <[email protected]>
> > Date: Mon, 29 Jun 2009 12:29:09 +0200
> >
> >> yes, patch is not yet upstream;
> >
> > I'll take care of pushing this around today.
> >
> Hi,
>
> Thank you.
> I ran a new bisection to identify the commit that causes pain after -rc1:
>
> etienne@etienne-desktop:~/linux-2.6$ git bisect good
> a1317f714af7aed60ddc182d0122477cbe36ee9b is first bad commit

Thanks for finding it.

Dave, please just revert this patch (it wasn't meant for Linus' tree anyway).

> commit a1317f714af7aed60ddc182d0122477cbe36ee9b
> Author: Bartlomiej Zolnierkiewicz <[email protected]>
> Date: Tue Jun 23 23:52:17 2009 -0700
>
> ide: improve handling of Power Management requests
>
> Make hwif->rq point to PM request during PM sequence and do not allow
> any other types of requests to slip in (the old comment was never correct
> as there should be no such requests generated during PM sequence).
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
>
> To have STR/resume work with current git, I have to :
> 1) apply Bart's patch
> 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
>
> thanks
> Etienne
>
>

2009-06-29 18:39:39

by Pallipadi, Venkatesh

Subject: Re: [Bug #13424] possible deadlock when doing governor switching

On Sun, 2009-06-28 at 18:25 -0700, Mathieu Desnoyers wrote:
> * Rafael J. Wysocki ([email protected]) wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
> >
>
> Yep, it still exists. Venkatesh Pallipadi from Intel is working on it.
> We need to figure out a proper way to fix policy rwlock vs dbs_mutex vs
> timer mutex dependency.
>

Yes, I am still working on it. I thought I had a fix for this, but a test
run over the weekend resulted in a WARN_ON in sysfs_remove_group, as shown
below. It looks like I need a day or two more to work through the web of
locks here.

Thanks,
Venki

[10412.466195] ------------[ cut here ]------------
[10412.466201] WARNING: at /home/venkip/src/linus/linux-2.6/fs/sysfs/group.c:138 sysfs_remove_group+0x3e/0xa3()
[10412.466204] Hardware name: Santa Rosa platform
[10412.466206] sysfs group c16df3b0 not found for kobject 'cpufreq'
[10412.466207] Modules linked in:
[10412.466210] Pid: 20609, comm: write_syscpufre Not tainted 2.6.31-rc1 #195
[10412.466212] Call Trace:
[10412.466217] [<c102a0a4>] warn_slowpath_common+0x60/0x90
[10412.466220] [<c102a108>] warn_slowpath_fmt+0x24/0x27
[10412.466223] [<c10e0422>] sysfs_remove_group+0x3e/0xa3
[10412.466227] [<c131b7fc>] cpufreq_governor_dbs+0x1f7/0x25b
[10412.466231] [<c1319469>] __cpufreq_governor+0x7c/0xb3
[10412.466234] [<c1319608>] __cpufreq_set_policy+0x13f/0x1c3
[10412.466238] [<c1319e74>] store_scaling_governor+0x18a/0x1b2
[10412.466241] [<c131aa50>] ? handle_update+0x0/0x28
[10412.466244] [<c131a2a5>] ? lock_policy_rwsem_write+0x33/0x5b
[10412.466247] [<c1319cea>] ? store_scaling_governor+0x0/0x1b2
[10412.466250] [<c131a942>] store+0x48/0x61
[10412.466254] [<c10de532>] sysfs_write_file+0xb4/0xdf
[10412.466265] [<c10de47e>] ? sysfs_write_file+0x0/0xdf
[10412.466269] [<c10a0172>] vfs_write+0x84/0xdf
[10412.466272] [<c10a0266>] sys_write+0x3b/0x60
[10412.466276] [<c1002a04>] sysenter_do_call+0x12/0x22
[10412.466278] ---[ end trace 31a730d96cbc1841 ]---

2009-06-29 19:06:45

by Mathieu Desnoyers

Subject: Re: [Bug #13424] possible deadlock when doing governor switching

* Pallipadi, Venkatesh ([email protected]) wrote:
> On Sun, 2009-06-28 at 18:25 -0700, Mathieu Desnoyers wrote:
> > * Rafael J. Wysocki ([email protected]) wrote:
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.29 and 2.6.30.
> > >
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > > be listed and let me know (either way).
> > >
> >
> > Yep, it still exists. Venkatesh Pallipadi from Intel is working on it.
> > We need to figure out a proper way to fix policy rwlock vs dbs_mutex vs
> > timer mutex dependency.
> >
>
> Yes. Still working on it. I thought I had a fix for this. But, over the
> weekend test run resulted in a WARN_ON with sysfs_remove_group as below.
> Looks like I need a day or two more to work through the web of locks
> here..
>

A quick fix I thought about is to add a mutex to cpufreq.c.

This mutex would be taken outside of the rwlock write lock each time
this lock is taken in cpufreq.c.

This mutex would also be taken from the ondemand and conservative module
sysfs operations.

We would then remove the dbs_mutexes, since this new cpufreq.c mutex would
replace them.

Note that the GOV_STOP call should be done while this new mutex is held,
but the rwlock is _not_ held.

I did not implement it because cpufreq.c:cpufreq_add_dev() first needs a
big cleanup for the error handling paths. They are currently completely
bogus and I don't want to add a lock into code that is not currently
correct.

If you find time to do this cleanup and lock implementation, I'll be
glad to review it and provide advice.

Thanks,

Mathieu


> Thanks,
> Venki
>
> [10412.466195] ------------[ cut here ]------------
> [10412.466201] WARNING: at /home/venkip/src/linus/linux-2.6/fs/sysfs/group.c:138 sysfs_remove_group+0x3e/0xa3()
> [10412.466204] Hardware name: Santa Rosa platform
> [10412.466206] sysfs group c16df3b0 not found for kobject 'cpufreq'
> [10412.466207] Modules linked in:
> [10412.466210] Pid: 20609, comm: write_syscpufre Not tainted 2.6.31-rc1 #195
> [10412.466212] Call Trace:
> [10412.466217] [<c102a0a4>] warn_slowpath_common+0x60/0x90
> [10412.466220] [<c102a108>] warn_slowpath_fmt+0x24/0x27
> [10412.466223] [<c10e0422>] sysfs_remove_group+0x3e/0xa3
> [10412.466227] [<c131b7fc>] cpufreq_governor_dbs+0x1f7/0x25b
> [10412.466231] [<c1319469>] __cpufreq_governor+0x7c/0xb3
> [10412.466234] [<c1319608>] __cpufreq_set_policy+0x13f/0x1c3
> [10412.466238] [<c1319e74>] store_scaling_governor+0x18a/0x1b2
> [10412.466241] [<c131aa50>] ? handle_update+0x0/0x28
> [10412.466244] [<c131a2a5>] ? lock_policy_rwsem_write+0x33/0x5b
> [10412.466247] [<c1319cea>] ? store_scaling_governor+0x0/0x1b2
> [10412.466250] [<c131a942>] store+0x48/0x61
> [10412.466254] [<c10de532>] sysfs_write_file+0xb4/0xdf
> [10412.466265] [<c10de47e>] ? sysfs_write_file+0x0/0xdf
> [10412.466269] [<c10a0172>] vfs_write+0x84/0xdf
> [10412.466272] [<c10a0266>] sys_write+0x3b/0x60
> [10412.466276] [<c1002a04>] sysenter_do_call+0x12/0x22
> [10412.466278] ---[ end trace 31a730d96cbc1841 ]---
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-06-29 23:15:23

by Rafael J. Wysocki

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Monday 29 June 2009, Larry Finger wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
> > Subject : Page allocation failures with b43 and p54usb
> > Submitter : Larry Finger <[email protected]>
> > Date : 2009-04-29 21:01 (61 days old)
> > References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
> > http://lkml.org/lkml/2009/6/7/136
> > Handled-By : Johannes Berg <[email protected]>
>
> The cause of these failures has been determined. The wireless
> subsystem frequently requests buffers of size 4096, but when SLUB
> debugging is enabled and the debug info is added, the request becomes
> of order 1 and memory becomes fragmented.
>
> A controversial "fix" in which SLUB debugging was disabled for
> allocations where adding such debugging info would increase the order
> was discussed and tried. With a quick look at the commit list for
> Linus's tree, I don't see that such a patch is available, but I will
> be corrected if I missed it.

Thanks for the update.

Hmm, isn't it suboptimal to use a slab allocator for allocations taking up an
entire page? That's the case on some architectures and seems to be the root
cause of the issue at hand.

Best,
Rafael

2009-06-29 23:20:06

by Rafael J. Wysocki

Subject: Re: [Bug #13512] D43 on 2.6.30 doesn't suspend anymore

On Monday 29 June 2009, Daniel Smolik wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13512
> > Subject : D43 on 2.6.30 doesn't suspend anymore
> > Submitter : Daniel Smolik <[email protected]>
> > Date : 2009-06-11 20:12 (18 days old)
> >
> >
> >
> Yes, the problem still exists. I am now bisecting and am close to finding
> the offending patch.

Thanks for the update.

Best,
Rafael

2009-06-29 23:23:32

by Rafael J. Wysocki

Subject: Re: [Bug #13554] linux-image-2.6.30-1-686, KMS enabled: black screen, no X window

On Monday 29 June 2009, Jos van Wolput wrote:
>
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13554
> > Subject : linux-image-2.6.30-1-686, KMS enabled: black screen, no X window
> > Submitter : Jos van Wolput <[email protected]>
> > Date : 2009-06-17 06:28 (12 days old)
> >
> >
> >
> >
> Yes, it still should be listed, KMS doesn't work, at least on my system.

Thanks for the update, but I'm afraid we won't have enough information to
debug this issue.

Best,
Rafael

2009-06-29 23:48:16

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Mon, 29 Jun 2009, Larry Finger wrote:

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
> > Subject : Page allocation failures with b43 and p54usb
> > Submitter : Larry Finger <[email protected]>
> > Date : 2009-04-29 21:01 (61 days old)
> > References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
> > http://lkml.org/lkml/2009/6/7/136
> > Handled-By : Johannes Berg <[email protected]>
>
> The cause of these failures has been determined. The wireless
> subsystem frequently requests buffers of size 4096, but when SLUB
> debugging is enabled and the debug info is added, the request becomes
> of order 1 and memory becomes fragmented.
>
> A controversial "fix" in which SLUB debugging was disabled for
> allocations where adding such debugging info would increase the order
> was discussed and tried. With a quick look at the commit list for
> Linus's tree, I don't see that such a patch is available, but I will
> be corrected if I missed it.
>

I'd disagree with disabling slub debugging by default for caches where
oo_order(s->min) increases as the result of using it. This particular
page allocation failure is happening for, presumably, kmalloc-4096, and
the system has 4K pages. Disabling debugging for that cache (and any of
its aliases) implicitly will lead to errors going undiagnosed as a result.
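
To make the order arithmetic concrete, here is a minimal userspace sketch,
assuming 4K pages as in the report. page_order() is only a stand-in for the
kernel's get_order(), and the 72-byte overhead used in the usage note is an
illustrative assumption, not a measured value.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4K pages, matching the system described in the report. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Userspace stand-in for get_order(): the smallest order n such that
 * (SKETCH_PAGE_SIZE << n) is at least `size` bytes.
 */
int page_order(size_t size)
{
	int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With these assumptions, page_order(4096) is 0, while page_order(4096 + 72)
is 1: any metadata appended to a page-sized object forces a two-page
contiguous allocation.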

2009-06-30 00:03:23

by David Rientjes

Subject: Re: [Bug #13648] nfsd: page allocation failure

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13648
> Subject : nfsd: page allocation failure
> Submitter : Justin Piszcz <[email protected]>
> Date : 2009-06-22 12:08 (7 days old)
> References : http://lkml.org/lkml/2009/6/22/309
>

I'd be interested to hear from Justin if reducing
/proc/sys/vm/dirty_background_ratio as I earlier suggested helps.

ZONE_NORMAL isn't much larger than ZONE_DMA32 on this machine and both
lowmem zones have an abundance of free memory which suggests pdflush's
ratio isn't being met to commence background writeout while at the same
time ZONE_NORMAL is being depleted as the result of constant nfs
GFP_ATOMIC allocations that cannot try direct reclaim.

2009-06-30 02:07:08

by Larry Finger

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

David Rientjes wrote:
> On Mon, 29 Jun 2009, Larry Finger wrote:
>
>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
>>> Subject : Page allocation failures with b43 and p54usb
>>> Submitter : Larry Finger <[email protected]>
>>> Date : 2009-04-29 21:01 (61 days old)
>>> References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
>>> http://lkml.org/lkml/2009/6/7/136
>>> Handled-By : Johannes Berg <[email protected]>
>> The cause of these failures has been determined. The wireless
>> subsystem frequently requests buffers of size 4096, but when SLUB
>> debugging is enabled and the debug info is added, the request becomes
>> of order 1 and memory becomes fragmented.
>>
>> A controversial "fix" in which SLUB debugging was disabled for
>> allocations where adding such debugging info would increase the order
>> was discussed and tried. With a quick look at the commit list for
>> Linus's tree, I don't see that such a patch is available, but I will
>> be corrected if I missed it.
>>
>
> I'd disagree with disabling slub debugging by default for caches where
> oo_order(s->min) increases as the result of using it. This particular
> page allocation failure is happening for, presumably, kmalloc-4096, and
> the system has 4K pages. Disabling debugging for that cache (and any of
> its aliases) implicitly will lead to errors going undiagnosed as a result.

If the current behavior is not changed, I will be forced to disable
SLUB debugging, which will explicitly lead to errors that are
undiagnosed. It seems better to me to debug when you can, but turn off
debugging in cases like this.

Larry

2009-06-30 00:40:48

by Johannes Stezenbach

Subject: Re: [Bug #13644] hibernation/swsusp lockup due to acpi-cpufreq

On Mon, Jun 29, 2009 at 02:31:01AM +0200, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13644
> Subject : hibernation/swsusp lockup due to acpi-cpufreq
> Submitter : Johannes Stezenbach <[email protected]>
> Date : 2009-06-16 01:27 (13 days old)
> References : http://lkml.org/lkml/2009/6/15/630
> Handled-By : Rafael J. Wysocki <[email protected]>

I tested v2.6.31-rc1-228-g2bfdd79 and the bug is still there.
It actually got worse, the local_irq_save/restore workaround
in kernel/up.c (http://lkml.org/lkml/2009/6/16/333) doesn't fix it
anymore, it hangs at suspend before writing out the image.

With the up.c workaround (including a
WARN_ON_ONCE(irqs_disabled() && !oops_in_progress);)
applied and no_console_suspend I captured the attached
output using a crappy webcam. (Without the workaround
there is a huge spew of warnings about irqs enabled
unexpectedly.) I guess the interesting part is

pm_op(): pci_pm_thaw returns -16
PM: Device 0000:00:00.0 failed to thaw: error -16

(PCI info is in http://lkml.org/lkml/2009/6/15/630)


Johannes


Attachments:
(No filename) (1.06 kB)
suspend-crash.jpg (30.35 kB)

2009-06-30 05:47:20

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Mon, 29 Jun 2009, Larry Finger wrote:

> > I'd disagree with disabling slub debugging by default for caches where
> > oo_order(s->min) increases as the result of using it. This particular
> > page allocation failure is happening for, presumably, kmalloc-4096, and
> > the system has 4K pages. Disabling debugging for that cache (and any of
> > its aliases) implicitly will lead to errors going undiagnosed as a result.
>
> If the current behavior is not changed, I will be forced to disable
> SLUB debugging, which will explicitly lead to errors that are
> undiagnosed.

You're buying debugging support at the cost of increased memory
consumption when you enable CONFIG_SLUB_DEBUG_ON and that's causing the
page allocation failures because of fragmentation. To reduce the minimum
order required for caches such as kmalloc-4096, you'd have to disable
debugging for that particular cache. It's my opinion that such a
configuration should not be the default, however.

You could argue adding `slub_debug=-,kmalloc-4096' support from the
command line, but CONFIG_SLUB_DEBUG_ON should not change its well-defined
purpose of enabling debugging on all slab caches. Otherwise the rest of
us would be forced to add `slub_debug=,kmalloc-4096' for consistent
behavior with older kernels.
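
For readers unfamiliar with the `flags,cache' form used above, the following
is a rough userspace sketch of how such a value splits into its two parts.
The struct and function names are hypothetical and this is not the kernel's
parser; it only illustrates the syntax under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one parsed "flags,cache" value. */
struct slub_debug_arg {
	char flags[16];	/* e.g. "FZ", "-", or "" for the default set */
	char cache[64];	/* e.g. "kmalloc-4096", or "" for all caches */
};

/*
 * Split the text following "slub_debug=" at the first comma.
 * Returns 0 on success, -1 if either part does not fit its buffer.
 */
int parse_slub_debug(const char *value, struct slub_debug_arg *out)
{
	const char *comma;
	size_t flags_len, cache_len;

	assert(value != NULL && out != NULL);

	comma = strchr(value, ',');
	flags_len = comma ? (size_t)(comma - value) : strlen(value);
	cache_len = comma ? strlen(comma + 1) : 0;

	if (flags_len >= sizeof(out->flags) || cache_len >= sizeof(out->cache))
		return -1;

	memcpy(out->flags, value, flags_len);
	out->flags[flags_len] = '\0';
	memcpy(out->cache, comma ? comma + 1 : "", cache_len);
	out->cache[cache_len] = '\0';
	return 0;
}
```

In this sketch "-,kmalloc-4096" yields flags "-" and cache "kmalloc-4096",
while ",kmalloc-4096" yields empty flags for the same cache.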

2009-06-30 06:55:30

by Pekka Enberg

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

Hi David,

On Tue, Jun 30, 2009 at 2:47 AM, David Rientjes<[email protected]> wrote:
> On Mon, 29 Jun 2009, Larry Finger wrote:
>
>> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13319
>> > Subject : Page allocation failures with b43 and p54usb
>> > Submitter : Larry Finger <[email protected]>
>> > Date : 2009-04-29 21:01 (61 days old)
>> > References : http://marc.info/?l=linux-kernel&m=124103897101088&w=4
>> > http://lkml.org/lkml/2009/6/7/136
>> > Handled-By : Johannes Berg <[email protected]>
>>
>> The cause of these failures has been determined. The wireless
>> subsystem frequently requests buffers of size 4096, but when SLUB
>> debugging is enabled and the debug info is added, the request becomes
>> of order 1 and memory becomes fragmented.
>>
>> A controversial "fix" in which SLUB debugging was disabled for
>> allocations where adding such debugging info would increase the order
>> was discussed and tried. With a quick look at the commit list for
>> Linus's tree, I don't see that such a patch is available, but I will
>> be corrected if I missed it.
>>
>
> I'd disagree with disabling slub debugging by default for caches where
> oo_order(s->min) increases as the result of using it. This particular
> page allocation failure is happening for, presumably, kmalloc-4096, and
> the system has 4K pages. Disabling debugging for that cache (and any of
> its aliases) implicitly will lead to errors going undiagnosed as a result.

Well, I obviously don't agree here because kmalloc-4096 debugging
causes problems in the real world. Furthermore, SLUB never supported
debugging for objects that big historically because of page allocator
passthrough. And with Mel Gorman's page allocator optimizations, we
might be going back to that.

So we should fix SLUB debugging as outlined by Mel Gorman and
Christoph Lameter. I simply haven't had the time to do it. Patches are
welcome!

Pekka

2009-06-30 07:47:41

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Pekka Enberg wrote:

> > I'd disagree with disabling slub debugging by default for caches where
> > oo_order(s->min) increases as the result of using it. This particular
> > page allocation failure is happening for, presumably, kmalloc-4096, and
> > the system has 4K pages. Disabling debugging for that cache (and any of
> > its aliases) implicitly will lead to errors going undiagnosed as a result.
>
> Well, I obviously don't agree here because kmalloc-4096 debugging
> causes problems in the real world.

I don't think CONFIG_SLUB_DEBUG_ON is generally the configuration used in
the real world.

The option has a clear and well-defined purpose and that is to enable
debugging on all slab caches. If you modify its definition, users will
generally ignore the warning about debugging being disabled when "the
minimum possible order at which slab may be allocated is higher than
without." And unless they check the kernel log for such a warning to boot
with `slub_debug=,kmalloc-4096', we lose testing coverage because we
cannot enable redzoning or tracing after boot.

> Furthermore, SLUB never supported
> debugging for objects that big historically because of page allocator
> passthrough. And with Mel Gorman's page allocator optimizations, we
> might be going back to that.
>

Even when page allocation is fast enough, it would still be helpful to
configure slub to not do passthrough purely for the lightweight debugging
opportunities.

> So we should fix SLUB debugging as outlined by Mel Gorman and
> Christoph Lameter. I simply haven't had the time to do it. Patches are
> welcome!
>

You're referring to `slub_debug=A'? I think CONFIG_SLUB_DEBUG_ON should
continue to enable debugging on all slab caches and in instances where it
causes page allocation failures, such as in Larry's case because
oo_order(s->min) with debugging on is greater than oo_order(s->min) with
debugging off, you can emit a friendly warning in your recently added
slab_out_of_memory() about using `slub_debug=-,<cache>'.

We have a disagreement about which is the default behavior, but I would
opt on the side of adding exemptions to a debug configuration option as
opposed to requiring additional command line parameters to be fully
enabled.

2009-06-30 08:05:18

by Justin Piszcz

Subject: Re: [Bug #13648] nfsd: page allocation failure



On Mon, 29 Jun 2009, David Rientjes wrote:

> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
>
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.29 and 2.6.30.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.29 and 2.6.30. Please verify if it still should
>> be listed and let me know (either way).
>>
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13648
>> Subject : nfsd: page allocation failure
>> Submitter : Justin Piszcz <[email protected]>
>> Date : 2009-06-22 12:08 (7 days old)
>> References : http://lkml.org/lkml/2009/6/22/309
>>
>
> I'd be interested to hear from Justin if reducing
> /proc/sys/vm/dirty_background_ratio as I earlier suggested helps.
>
> ZONE_NORMAL isn't much larger than ZONE_DMA32 on this machine and both
> lowmem zones have an abundance of free memory which suggests pdflush's
> ratio isn't being met to commence background writeout while at the same
> time ZONE_NORMAL is being depleted as the result of constant nfs
> GFP_ATOMIC allocations that cannot try direct reclaim.
>

Hello,

http://patchwork.kernel.org/patch/30960/

"It's funny, though, that the problem that originally started this thread
was quickly diagnosed because of these messages. As far as I know, my
suggestion to increase /proc/sys/vm/dirty_background_ratio to kick pdflush
earlier has prevented the slab allocation failures and not required
delayed acks for nfsd."

--

The current value is 10, what value do you suggest I try?

$ cat /proc/sys/vm/dirty_background_ratio
10

Justin.

2009-06-30 08:25:11

by Pekka Enberg

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

Hi David,

On Tue, 30 Jun 2009, Pekka Enberg wrote:
>> > I'd disagree with disabling slub debugging by default for caches where
>> > oo_order(s->min) increases as the result of using it. This particular
>> > page allocation failure is happening for, presumably, kmalloc-4096, and
>> > the system has 4K pages. Disabling debugging for that cache (and any of
>> > its aliases) implicitly will lead to errors going undiagnosed as a result.
>>
>> Well, I obviously don't agree here because kmalloc-4096 debugging
>> causes problems in the real world.

On Tue, Jun 30, 2009 at 10:47 AM, David Rientjes<[email protected]> wrote:
> I don't think CONFIG_SLUB_DEBUG_ON is generally the configuration used in
> the real world.

It is, hence the epic bug report that's eaten too many man hours
already! Look, we encourage _testers_ to turn on as many debugging
options as possible so we catch bugs early. That's why the only sane
defaults are the ones that don't cause other problems!

I don't know why you want to argue this. It's simply not an option to
say "stupid user, fix your config" in core code like the slab
allocator. Enabling CONFIG_SLUB_DEBUG_ON is a very reasonable thing to
do when you are a tester looking for bugs.

On Tue, 30 Jun 2009, Pekka Enberg wrote:
>> So we should fix SLUB debugging as outlined by Mel Gorman and
>> Christoph Lameter. I simply haven't had the time to do it. Patches are
>> welcome!

On Tue, Jun 30, 2009 at 10:47 AM, David Rientjes<[email protected]> wrote:
> You're referring to `slub_debug=A'? I think CONFIG_SLUB_DEBUG_ON should
> continue to enable debugging on all slab caches and in instances where it
> causes page allocation failures, such as in Larry's case because
> oo_order(s->min) with debugging on is greater than oo_order(s->min) with
> debugging off, you can emit a friendly warning in your recently added
> slab_out_of_memory() about using `slub_debug=-,<cache>'.
>
> We have a disagreement about which is the default behavior, but I would
> opt on the side of adding exemptions to a debug configuration option as
> opposed to requiring additional command line parameters to be fully
> enabled.

Yup, I was referring to slub_debug=A and no, I don't agree with you
that it should be on by default. Only people who know what they're
doing should enable the option and a random tester by definition
doesn't (no offence to Mr. Random Tester).

Pekka

2009-06-30 08:48:24

by David Rientjes

Subject: Re: [Bug #13648] nfsd: page allocation failure

On Tue, 30 Jun 2009, Justin Piszcz wrote:

> The current value is 10, what value do you suggest I try?
>
> $ cat /proc/sys/vm/dirty_background_ratio
> 10
>

Looking at your initial bug report, it doesn't look like a background
writeout issue:

[415964.022375] Active_anon:154810 active_file:131162 inactive_anon:33447
[415964.022375] inactive_file:690987 unevictable:0 dirty:112116 writeback:0 unstable:0
[415964.022375] free:8662 slab:965366 mapped:9316 pagetables:4618 bounce:0
[415964.022375] DMA free:9692kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB present:8668kB pages_scanned:0 all_unreclaimable? yes
[415964.022375] lowmem_reserve[]: 0 3246 7980 7980
[415964.022375] DMA32 free:21312kB min:6656kB low:8320kB high:9984kB active_anon:118464kB inactive_anon:23908kB active_file:174708kB inactive_file:1206812kB unevictable:0kB present:3324312kB pages_scanned:0 all_unreclaimable? no
[415964.022375] lowmem_reserve[]: 0 0 4734 4734
[415964.022375] Normal free:3644kB min:9708kB low:12132kB high:14560kB active_anon:500776kB inactive_anon:109880kB active_file:349940kB inactive_file:1557136kB unevictable:0kB present:4848000kB pages_scanned:0 all_unreclaimable? no
[415964.022375] lowmem_reserve[]: 0 0 0 0
...
[415964.022375] 2277376 pages RAM

Ignore the all_unreclaimable information, this is a GFP_ATOMIC allocation
so we can't reclaim.

You have an 8G machine and only 437M is dirty (which is why pdflush hasn't
kicked in yet). You do have over 3.5G of slab allocated, however.

This appears related to http://bugzilla.kernel.org/show_bug.cgi?id=13518,
but that could be confirmed with slabtop.
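
The figures above follow from the page counts in the dump. A small sketch of
the conversion, assuming 4K pages (the helper name is made up for
illustration):

```c
#include <assert.h>

/* Assumption: each page is 4 KiB, as on the machine in the report. */
#define SKETCH_KIB_PER_PAGE 4UL

/* Convert a page count from the failure dump to whole MiB (rounded down). */
unsigned long pages_to_mib(unsigned long pages)
{
	assert(pages <= (~0UL) / SKETCH_KIB_PER_PAGE);
	return pages * SKETCH_KIB_PER_PAGE / 1024UL;
}
```

Applied to the dump, dirty:112116 comes to 437 MiB, slab:965366 to 3770 MiB
(about 3.7G), and "2277376 pages RAM" to 8896 MiB (about 8.7G).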

2009-06-30 12:48:53

by Rafael J. Wysocki

Subject: Re: [Bug #13644] hibernation/swsusp lockup due to acpi-cpufreq

On Tuesday 30 June 2009, Johannes Stezenbach wrote:
> On Mon, Jun 29, 2009 at 02:31:01AM +0200, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13644
> > Subject : hibernation/swsusp lockup due to acpi-cpufreq
> > Submitter : Johannes Stezenbach <[email protected]>
> > Date : 2009-06-16 01:27 (13 days old)
> > References : http://lkml.org/lkml/2009/6/15/630
> > Handled-By : Rafael J. Wysocki <[email protected]>
>
> I tested v2.6.31-rc1-228-g2bfdd79 and the bug is still there.
> It actually got worse, the local_irq_save/restore workaround
> in kernel/up.c (http://lkml.org/lkml/2009/6/16/333) doesn't fix it
> anymore, it hangs at suspend before writing out the image.
>
> With the up.c workaround (including a
> WARN_ON_ONCE(irqs_disabled() && !oops_in_progress);)
> applied and no_console_suspend I captured the attached
> output using a crappy webcam. (Without the workaround
> there is a huge spew of warnings about irqs enabled
> unexpectedly.) I guess the interesting part is
>
> pm_op(): pci_pm_thaw returns -16
> PM: Device 0000:00:00.0 failed to thaw: error -16

Hmm, it looks like we fail to thaw the host bridge.

> (PCI info is in http://lkml.org/lkml/2009/6/15/630)

Well, thanks for the update. I'll do my best to fix the cpufreq suspend
before 2.6.31 final.

Best,
Rafael

2009-06-30 14:34:26

by Christoph Lameter

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Pekka Enberg wrote:

> Well, I obviously don't agree here because kmalloc-4096 debugging causes
> problems in the real world. Furthermore, SLUB never supported debugging
> for objects that big historically because of page allocator passthrough.
> And with Mel Gorman's page allocator optimizations, we might be going
> back to that.

SLUB for some period of time had passthrough. It did not start out like
that though.

kmalloc-4096 causes problems in the long run and so do other caches that
are of similar size. But it allows debugging to occur. Silently switching
it off is something I am not comfortable with.

2009-06-30 14:38:34

by Larry Finger

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

Pekka Enberg wrote:
>
> Yup, I was referring to slub_debug=A and no, I don't agree with you
> that it should be on by default. Only people who know what they're
> doing should enable the option and a random tester by definition
> doesn't (no offence to Mr. Random Tester).

None taken.

For me, the next step is clear. As I'm much more interested in finding
bugs in the wireless system than in the mechanics of SLUB allocation,
I need to disable CONFIG_SLUB_DEBUG_ON. BTW, I use SLAB on Linus's
mainline tree and SLUB on the wireless testing tree. I build and boot
the mainline kernels mostly to look for quick failures/regressions,
but run the w-t kernels looking for longer-term effects such as memory
fragmentation or slow memory leaks.

For Rafael's benefit, we do need to decide if this is a bug or merely
an unintended side effect. My sense is the latter and Bug #13319
should have a summary of this discussion added to the record, and then
the bug should be closed.

Larry

2009-06-30 15:01:42

by Pekka Enberg

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 2009-06-30 at 10:32 -0400, Christoph Lameter wrote:
> kmalloc-4096 causes problems in the long run and so do other caches that
> are of similar size. But it allows debugging to occur. Silently switching
> it off is something I am not comfortable with.

I suggested adding a

printk(KERN_INFO ": debugging disabled for %s. Use slub_debug=a to "
"enable it blah blah blah\n");

Does that work for you?

Pekka

2009-06-30 15:16:26

by Christoph Lameter

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Pekka Enberg wrote:

> printk(KERN_INFO ": debugging disabled for %s. Use slub_debug=a to "
> "enable it blah blah blah\n");
>
> Does that work for you?

It's definitely better.

Subject: Re: [Bug #13362] rt2x00: slow wifi with correct basic rate bitmap

On Mon, 29 Jun 2009 02:30:55 +0200 (CEST)
"Rafael J. Wysocki" <[email protected]> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).

There is no 2.6.30.1 to see if it has been fixed, and I have not tested
2.6.31-rc1 (too early for me), so I think it should still be listed.

>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13362
> Subject : rt2x00: slow wifi with correct basic rate bitmap
> Submitter : Alejandro Riveira <[email protected]>
> Date : 2009-05-22 13:32 (38 days old)
>
>

2009-06-30 20:04:49

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Christoph Lameter wrote:

> > printk(KERN_INFO ": debugging disabled for %s. Use slub_debug=a to "
> > "enable it blah blah blah\n");
> >
> > Does that work for you?
>
> It's definitely better.
>

I don't see how that's different from enabling debugging on all caches
like CONFIG_SLUB_DEBUG_ON currently does and then warning at the time of
slab allocation failure that it may be the result of the debugging
metadata so the user can subsequently prevent it. In other words, if we
use MAX_DEBUG_SIZE as Pekka originally implemented as
(3 * sizeof(void *) + 2 * sizeof(struct track)), do this:

diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -142,6 +142,11 @@
 				SLAB_POISON | SLAB_STORE_USER)

 /*
+ * The maximum amount of metadata added to a slab when debugging is enabled.
+ */
+#define MAX_DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
+
+/*
  * Set of flags that will prevent slab merging
  */
 #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
@@ -1561,6 +1566,21 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
 		"default order: %d, min order: %d\n", s->name, s->objsize,
 		s->size, oo_order(s->oo), oo_order(s->min));

+	if (s->flags & (SLAB_POISON | SLAB_RED_ZONE | SLAB_STORE_USER)) {
+		int min_order;
+
+		/*
+		 * Debugging is enabled, which may increase oo_order(s->min), so
+		 * warn the user that allocation failures may be avoided if
+		 * debugging is disabled for this cache.
+		 */
+		min_order = get_order(s->size - MAX_DEBUG_SIZE);
+		if (min_order < oo_order(s->min))
+			printk(KERN_WARNING " %s debugging increased min order "
+				"from %d to %d, use slub_debug=-,%s to disable.\n",
+				s->name, min_order, oo_order(s->min), s->name);
+	}
+
 	for_each_online_node(node) {
 		struct kmem_cache_node *n = get_node(s, node);
 		unsigned long nr_slabs;

2009-06-30 20:26:18

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Pekka Enberg wrote:

> On Tue, Jun 30, 2009 at 10:47 AM, David Rientjes<[email protected]> wrote:
> > I don't think CONFIG_SLUB_DEBUG_ON is generally the configuration used in
> > the real world.
>
> It is, hence the epic bug report that's eaten too many man hours
> already! Look, we encourage _testers_ to turn on as many debugging
> options as possible so we catch bugs early. That's why the only sane
> defaults are the ones that don't cause other problems!
>

I feel that asking a user to add a command line parameter such as
`slub_debug=A' in addition to CONFIG_SLUB_DEBUG_ON will likely lead to
less testing coverage and bugs going unreported. CONFIG_SLUB_DEBUG_ON is
not something that a distro is going to enable or would be used in a
production environment, it's something that's used to debug slub and/or
slab allocations either during the development of new kernel code or when
an underlying problem is realized.

> I don't know why you want to argue this. It's simply not an option to
> say "stupid user, fix your config" in core code like the slab
> allocator. Enabling CONFIG_SLUB_DEBUG_ON is a very reasonable thing to
> do when you are a tester looking for bugs.
>

Quite the contrary, I agree completely with the above, and that's why I'm
arguing for full debugging to be enabled when a well-defined configuration
option is enabled. I simply don't believe that such debugging should be
coupled with a command line option to be fully activated for all caches.

2009-06-30 21:06:29

by Christoph Lameter

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, David Rientjes wrote:

> I don't see how that's different from enabling debugging on all caches
> like CONFIG_SLUB_DEBUG_ON currently does and then warning at the time of
> slab allocation failure that it may be the result of the debugging
> metadata so the user can subsequently prevent it. In other words, if we
> use MAX_DEBUG_SIZE as Pekka originally implemented as
> (3 * sizeof(void *) + 2 * sizeof(struct track)), do this:

I like it.

> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -142,6 +142,11 @@
>  				SLAB_POISON | SLAB_STORE_USER)
>
>  /*
> + * The maximum amount of metadata added to a slab when debugging is enabled.
> + */
> +#define MAX_DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
> +
> +/*
>   * Set of flags that will prevent slab merging
>   */
>  #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> @@ -1561,6 +1566,21 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
>  		"default order: %d, min order: %d\n", s->name, s->objsize,
>  		s->size, oo_order(s->oo), oo_order(s->min));
>
> +	if (s->flags & (SLAB_POISON | SLAB_RED_ZONE | SLAB_STORE_USER)) {
> +		int min_order;
> +
> +		/*
> +		 * Debugging is enabled, which may increase oo_order(s->min), so
> +		 * warn the user that allocation failures may be avoided if
> +		 * debugging is disabled for this cache.
> +		 */
> +		min_order = get_order(s->size - MAX_DEBUG_SIZE);
> +		if (min_order < oo_order(s->min))
> +			printk(KERN_WARNING " %s debugging increased min order "
> +				"from %d to %d, use slub_debug=-,%s to disable.\n",
> +				s->name, min_order, oo_order(s->min), s->name);

It may be easier to check the order of the initial size vs. the order of
the size with all metadata

if (get_order(s->size) > get_order(s->objsize))
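
As a rough userspace illustration of that simpler check, the sketch below
compares the order needed for the bare object with the order needed once
metadata is included. The struct is a hypothetical stand-in for the two
struct kmem_cache fields involved, and the sizes in the usage note are
assumed values, not ones taken from a running kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct kmem_cache fields used here. */
struct cache_sizes {
	size_t objsize;	/* payload size requested by the cache's user */
	size_t size;	/* per-object size including debug metadata */
};

/* Smallest order n with (page_size << n) >= size, in the spirit of get_order(). */
int order_for(size_t size, size_t page_size)
{
	int order = 0;
	size_t span = page_size;

	assert(size > 0 && page_size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Nonzero when the metadata alone pushed the object to a higher order. */
int debug_raised_order(const struct cache_sizes *s, size_t page_size)
{
	return order_for(s->size, page_size) > order_for(s->objsize, page_size);
}
```

With 4K pages, a cache with objsize 4096 and an assumed size of 4168 trips
the check, whereas one with objsize 64 and size 136 does not.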

2009-06-30 21:15:41

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Christoph Lameter wrote:

> > diff --git a/mm/slub.c b/mm/slub.c
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -142,6 +142,11 @@
> >  				SLAB_POISON | SLAB_STORE_USER)
> >
> >  /*
> > + * The maximum amount of metadata added to a slab when debugging is enabled.
> > + */
> > +#define MAX_DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
> > +
> > +/*
> >   * Set of flags that will prevent slab merging
> >   */
> >  #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> > @@ -1561,6 +1566,21 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
> >  		"default order: %d, min order: %d\n", s->name, s->objsize,
> >  		s->size, oo_order(s->oo), oo_order(s->min));
> >
> > +	if (s->flags & (SLAB_POISON | SLAB_RED_ZONE | SLAB_STORE_USER)) {
> > +		int min_order;
> > +
> > +		/*
> > +		 * Debugging is enabled, which may increase oo_order(s->min), so
> > +		 * warn the user that allocation failures may be avoided if
> > +		 * debugging is disabled for this cache.
> > +		 */
> > +		min_order = get_order(s->size - MAX_DEBUG_SIZE);
> > +		if (min_order < oo_order(s->min))
> > +			printk(KERN_WARNING " %s debugging increased min order "
> > +				"from %d to %d, use slub_debug=-,%s to disable.\n",
> > +				s->name, min_order, oo_order(s->min), s->name);
>
> It may be easier to check the order of the initial size vs. the order of
> the size with all metadata
>
> if (get_order(s->size) > get_order(s->objsize))
>

Ah, right. Then we could simply eliminate the check on s->flags to begin
with.

This patch is supposing that `slub_debug=-,<cache>' actually disables all
debugging for <cache> which would need to be implemented first, but I
think this is a better alternative than requiring slub_debug=A for full
debugging after enabling CONFIG_SLUB_DEBUG_ON.

2009-06-30 21:24:21

by Christoph Lameter

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, David Rientjes wrote:

> This patch is supposing that `slub_debug=-,<cache>' actually disables all
> debugging for <cache> which would need to be implemented first, but I
> think this is a better alternative than requiring slub_debug=A for full
> debugging after enabling CONFIG_SLUB_DEBUG_ON.

We could add an option that disables debugging for troublesome page
size slabs


slub_debug=p

or so

2009-06-30 21:52:41

by David Rientjes

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, Christoph Lameter wrote:

> We could add an option that disables debugging for troublesome page
> size slabs
>
>
> slab_debug=p
>
> or so
>

I definitely like that more than slub_debug=A, where we're requiring an
added parameter for full debugging to be activated.

I'm curious whether there would ever be any use for disabling debugging on
specific caches for reasons other than higher minimum orders for metadata,
though, given that we already support things like slub_debug=FZ,cache,
which should only enable free debugging and redzoning even with
CONFIG_SLUB_DEBUG_ON enabled for cache.
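
As an illustration of the "<flags>,<cache>" option form mentioned above, here is a rough user-space sketch of how such a string could be split into a flag mask and a cache name. It is not the kernel's parser: `parse_debug_option()` and the `DBG_*` bit values are hypothetical, only upper-case letters are accepted, and the "-,<cache>" combination is the form the thread says would still need to be implemented.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative bit values only; they are not the kernel's SLAB_* constants. */
enum {
	DBG_FREE    = 1 << 0,	/* 'F': free/sanity checks */
	DBG_REDZONE = 1 << 1,	/* 'Z': red zoning */
	DBG_POISON  = 1 << 2,	/* 'P': object poisoning */
	DBG_USER    = 1 << 3,	/* 'U': owner tracking */
	DBG_TRACE   = 1 << 4	/* 'T': tracing */
};

/*
 * Parse "<flags>[,<cache>]". A leading '-' means "no debugging".
 * Returns 0 on success, -1 on an unknown flag letter or trailing junk.
 * cache is set to "" when no name is given; cache_len must be > 0.
 */
static int parse_debug_option(const char *opt, unsigned int *flags,
			      char *cache, size_t cache_len)
{
	const char *p = opt;

	*flags = 0;
	cache[0] = '\0';

	if (*p == '-') {
		p++;		/* leave *flags at 0 */
	} else {
		for (; *p != '\0' && *p != ','; p++) {
			switch (*p) {
			case 'F': *flags |= DBG_FREE; break;
			case 'Z': *flags |= DBG_REDZONE; break;
			case 'P': *flags |= DBG_POISON; break;
			case 'U': *flags |= DBG_USER; break;
			case 'T': *flags |= DBG_TRACE; break;
			default: return -1;
			}
		}
	}

	if (*p == ',')
		snprintf(cache, cache_len, "%s", p + 1);
	else if (*p != '\0')
		return -1;	/* junk after '-' */
	return 0;
}
```

With this sketch, "FZ,cache" yields the free-check and red-zone bits for a cache named "cache", and "-,dentry" yields an empty mask for "dentry".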

I think the solution to this is really based on good software engineering
and test practices, though, so hopefully there'll be a consensus on which
direction to take before any time is spent in implementing and pushing it.

2009-06-30 22:18:40

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Tue, 30 Jun 2009, David Rientjes wrote:

> I'm curious whether there would ever be any use for disabling debugging on
> specific caches for reasons other than higher minimum orders for metadata,
> though, given that we already support things like slub_debug=FZ,cache,
> which should only enable free debugging and redzoning even with
> CONFIG_SLUB_DEBUG_ON enabled for cache.

One of the reasons for disabling debugging is to speed up the kernel. Race
conditions may vanish due to the additional latency added by the debugging
code. Ideally you know which slab cache has the race and you only would
enable it on that one.

2009-07-01 05:59:19

by Pekka Enberg

[permalink] [raw]
Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

Hi David,

On Wed, Jul 1, 2009 at 12:52 AM, David Rientjes<[email protected]> wrote:
> I think the solution to this is really based on good software engineering
> and test practices, though, so hopefully there'll be a consensus on which
> direction to take before any time is spent in implementing and pushing it.

Let's go with the slab_out_of_memory() patch you outlined in a previous
post and implement the slub_debug=p thing Christoph suggested. I think
it's the best compromise at this point. When you guys finally see the
light, we can always change it to a reasonable default. ;)

So can you send a patch, please?

Pekka

2009-07-01 14:31:31

by Jeff Chua

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:

> I just tried, and it "seems" to work. Will try a few more cycles.

STD/STR survived quite a few cycles now. Patch seems to be doing the
right thing.

On Mon, Jun 29, 2009 at 11:51 PM, Etienne
Basset<[email protected]> wrote:

> To have STR/resume work with current git, I have to :

> 1) apply Bart's patch

This is not yet in Linus's tree, and it is much needed to really fix the problem.

> 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b

This is already in Linus's tree.


Thanks,
Jeff.

2009-07-01 14:48:09

by wu zhangjin

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
>
> > I just tried, and it "seems" to work. Will try a few more cycles.
>
> STD/STR survived quite a few cycles now. Patch seems to be doing the
> right thing.
>
> On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> Basset<[email protected]> wrote:
>
> > To have STR/resume work with current git, I have to :
>
> > 1) apply Bart's patch
>
> This is not yet in Linus's tree. And much needed to really fix the problem.
>
> > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
>

Yes, this commit must be reverted; otherwise STD/hibernation will not
work either. I have tested it on two different Loongson-based machines:
a fuloong2e box and a yeeloong2f netbook (Loongson is MIPS-compatible).

Here is what I have traced:

hibernate(kernel/power/hibernate.c)
--> hibernation_snapshot
--> dpm_resume_end
--> dpm_resume
--> device_resume
--> dev->bus->resume(generic_ide_resume), dev_name(dev) = 0.0
--> blk_execute_rq
{
DECLARE_COMPLETION_ONSTACK(wait);
...
wait_for_completion(&wait); // stop here
...
}

and I have tried to revert this part of the above patch:

-
- WARN_ON_ONCE(hwif->rq);
repeat:
prev_port = hwif->host->cur_port;
+
+ if (drive->dev_flags & IDE_DFLAG_BLOCKED)
+ rq = hwif->rq;
+ else
+ WARN_ON_ONCE(hwif->rq);
+

It works, but I need more time to test it.

thanks!
Wu Zhangjin

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> >
> > > I just tried, and it "seems" to work. Will try a few more cycles.
> >
> > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > right thing.
> >
> > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > Basset<[email protected]> wrote:
> >
> > > To have STR/resume work with current git, I have to :
> >
> > > 1) apply Bart's patch
> >
> > This is not yet in Linus's tree. And much needed to really fix the problem.
> >
> > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> >
>
> Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> work either. I have tested it on two different loongson-based machines:
> fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)

Since it seems like Dave is taking his sweet time with doing the revert
I stared at the code a bit more and I think that I finally found the bug
(thanks to your debugging work for giving me the right hint!).

The patch needs to take into account the new code introduced by the recent
block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):

@@ -555,8 +560,11 @@ repeat:
startstop = start_request(drive, rq);
spin_lock_irq(&hwif->lock);

- if (startstop == ide_stopped)
+ if (startstop == ide_stopped) {
+ rq = hwif->rq;
+ hwif->rq = NULL;
goto repeat;
+ }
} else
goto plug_device;
out:

and not zero hwif->rq if the device is blocked.

Could you try the attached patch and see if it fixes the issue?

[ Dave: while I appreciate fast handling of my patches I had strongly
suggested giving this particular one some extra testing (because there
were a lot of changes in between the time that it has been tested
against other kernel subsystems). Yet, it seems that its linux-next
exposure was minimal at best.. :( ]

---
drivers/ide/ide-io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: b/drivers/ide/ide-io.c
===================================================================
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -532,7 +532,8 @@ repeat:

if (startstop == ide_stopped) {
rq = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
+ hwif->rq = NULL;
goto repeat;
}
} else

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Wednesday 01 July 2009 18:21:25 Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> > On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> > >
> > > > I just tried, and it "seems" to work. Will try a few more cycles.
> > >
> > > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > > right thing.
> > >
> > > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > > Basset<[email protected]> wrote:
> > >
> > > > To have STR/resume work with current git, I have to :
> > >
> > > > 1) apply Bart's patch
> > >
> > > This is not yet in Linus's tree. And much needed to really fix the problem.
> > >
> > > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> > >
> >
> > Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> > work either. I have tested it on two different loongson-based machines:
> > fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)
>
> Since it seems like Dave is taking his sweet time with doing the revert
> I stared at the code a bit more and I think that I finally found the bug
> (thanks to your debugging work for giving me the right hint!).
>
> The patch needs to take into the account a new code introduced by the recent
> block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):
>
> @@ -555,8 +560,11 @@ repeat:
> startstop = start_request(drive, rq);
> spin_lock_irq(&hwif->lock);
>
> - if (startstop == ide_stopped)
> + if (startstop == ide_stopped) {
> + rq = hwif->rq;
> + hwif->rq = NULL;
> goto repeat;
> + }
> } else
> goto plug_device;
> out:
>
> and not zero hwif->rq if the device is blocked.
>
> Could you try the attached patch and see if it fixes the issue?

Here is a more complete version, which also takes into account the changes
in ide_intr() and ide_timer_expiry():

---
drivers/ide/ide-io.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

Index: b/drivers/ide/ide-io.c
===================================================================
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -532,7 +532,8 @@ repeat:

if (startstop == ide_stopped) {
rq = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
+ hwif->rq = NULL;
goto repeat;
}
} else
@@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
spin_lock_irq(&hwif->lock);
enable_irq(hwif->irq);
if (startstop == ide_stopped && hwif->polling == 0) {
- rq_in_flight = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
+ rq_in_flight = hwif->rq;
+ hwif->rq = NULL;
+ }
ide_unlock_port(hwif);
plug_device = 1;
}
@@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
*/
if (startstop == ide_stopped && hwif->polling == 0) {
BUG_ON(hwif->handler);
- rq_in_flight = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
+ rq_in_flight = hwif->rq;
+ hwif->rq = NULL;
+ }
ide_unlock_port(hwif);
plug_device = 1;
}
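
As a rough user-space model of the guard added in the ide_timer_expiry() and ide_intr() hunks above, the sketch below hands back the in-flight request and clears the port's slot only when the drive is not blocked. All `sketch_` names and `take_stopped_request()` are hypothetical stand-ins, not the driver's real types or functions.

```c
#include <stddef.h>

#define SKETCH_DFLAG_BLOCKED 0x1u	/* stand-in for IDE_DFLAG_BLOCKED */

struct sketch_request { int id; };
struct sketch_drive { unsigned int dev_flags; };
struct sketch_port { struct sketch_request *rq; };	/* models hwif->rq */

/*
 * Models the patched "request stopped" path: if the drive is not blocked,
 * return the in-flight request and clear the port's slot; if it is blocked,
 * leave the slot untouched and return NULL.
 */
static struct sketch_request *
take_stopped_request(struct sketch_port *port, const struct sketch_drive *drive)
{
	struct sketch_request *in_flight = NULL;

	if ((drive->dev_flags & SKETCH_DFLAG_BLOCKED) == 0) {
		in_flight = port->rq;
		port->rq = NULL;
	}
	return in_flight;
}
```

Before the patch the slot was cleared unconditionally; in this model that corresponds to dropping the flag test.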

2009-07-01 17:28:31

by Jeff Chua

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Thu, Jul 2, 2009 at 12:29 AM, Bartlomiej
Zolnierkiewicz<[email protected]> wrote:
> Here is the more complete version, also taking into the account changes
> in ide_intr() and ide_timer_expiry():

This works great for me. Survived STR, STD. I just applied it on top of vanilla
latest Linus's git pull. Nothing else to revert.

Thanks,
Jeff.

> ---
> drivers/ide/ide-io.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> Index: b/drivers/ide/ide-io.c
> ===================================================================
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -532,7 +532,8 @@ repeat:
>
> if (startstop == ide_stopped) {
> rq = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
> + hwif->rq = NULL;
> goto repeat;
> }
> } else
> @@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
> spin_lock_irq(&hwif->lock);
> enable_irq(hwif->irq);
> if (startstop == ide_stopped && hwif->polling == 0) {
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }
> @@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
> */
> if (startstop == ide_stopped && hwif->polling == 0) {
> BUG_ON(hwif->handler);
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }

2009-07-01 20:36:49

by Joao Correia

[permalink] [raw]
Subject: Re: [Bug #13660] Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

No formal patch has been sent yet, as far as I am aware. Following
Americo Wang's advice, I have made some changes to the
following:

(patch by Ingo)

diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index 699a2ac..031f4c6 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -65,7 +65,7 @@ enum {
* Stack-trace: tightly packed array of stack backtrace
* addresses. Protected by the hash_lock.
*/
-#define MAX_STACK_TRACE_ENTRIES 262144UL
+#define MAX_STACK_TRACE_ENTRIES 1048576UL

extern struct list_head all_lock_classes;
extern struct lock_chain lock_chains[];

and afterwards, a new bug popped up, solved by changing

include/linux/sched.h

# define MAX_LOCK_DEPTH 48UL

to

# define MAX_LOCK_DEPTH 96UL


I have now found a third limit bug, related to MAX_LOCKDEP_CHAINS,
which had been hidden so far and which I'm trying to raise and replicate.
This is being discussed in detail in another message exchange on the LKML
between me and Americo.
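
For a sense of what raising MAX_STACK_TRACE_ENTRIES costs, here is a small sketch of the arithmetic. It assumes each entry is a single unsigned long, as the "tightly packed array of stack backtrace addresses" comment in the diff suggests; `stack_trace_bytes()` and the OLD_/NEW_ macro names are made up for this example.

```c
#include <stddef.h>

/* Values taken from the diff above. */
#define OLD_MAX_STACK_TRACE_ENTRIES 262144UL
#define NEW_MAX_STACK_TRACE_ENTRIES 1048576UL

/* Bytes used by an array of `entries` unsigned longs (assumed entry type). */
static size_t stack_trace_bytes(unsigned long entries)
{
	return (size_t)entries * sizeof(unsigned long);
}
```

Where unsigned long is 8 bytes, this works out to 2 MiB for the old limit and 8 MiB for the new one, so the change quadruples the static footprint of that array.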

Thank you very much for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal



On Mon, Jun 29, 2009 at 1:31 AM, Rafael J. Wysocki<[email protected]> wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.29 and 2.6.30.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.29 and 2.6.30. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13660
> Subject : Crashes during boot on 2.6.30 / 2.6.31-rc, random programs
> Submitter : Joao Correia <[email protected]>
> Date : 2009-06-27 16:07 (2 days old)
> References : http://lkml.org/lkml/2009/6/27/95
>
>
>

2009-07-01 21:30:40

by Etienne Basset

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

Jeff Chua wrote:
> On Thu, Jul 2, 2009 at 12:29 AM, Bartlomiej
> Zolnierkiewicz<[email protected]> wrote:
>> Here is the more complete version, also taking into the account changes
>> in ide_intr() and ide_timer_expiry():
>
> This works great for. Survived STR, STD. I just applied on top vanilla
> latest Linus's git pull. Nothing else to revert.
>
> Thanks,
> Jeff.
>
>
I confirm, this works for me too :)
thanks,
Etienne


>> ---
>> drivers/ide/ide-io.c | 15 ++++++++++-----
>> 1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> Index: b/drivers/ide/ide-io.c
>> ===================================================================
>> --- a/drivers/ide/ide-io.c
>> +++ b/drivers/ide/ide-io.c
>> @@ -532,7 +532,8 @@ repeat:
>>
>> if (startstop == ide_stopped) {
>> rq = hwif->rq;
>> - hwif->rq = NULL;
>> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
>> + hwif->rq = NULL;
>> goto repeat;
>> }
>> } else
>> @@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
>> spin_lock_irq(&hwif->lock);
>> enable_irq(hwif->irq);
>> if (startstop == ide_stopped && hwif->polling == 0) {
>> - rq_in_flight = hwif->rq;
>> - hwif->rq = NULL;
>> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
>> + rq_in_flight = hwif->rq;
>> + hwif->rq = NULL;
>> + }
>> ide_unlock_port(hwif);
>> plug_device = 1;
>> }
>> @@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
>> */
>> if (startstop == ide_stopped && hwif->polling == 0) {
>> BUG_ON(hwif->handler);
>> - rq_in_flight = hwif->rq;
>> - hwif->rq = NULL;
>> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
>> + rq_in_flight = hwif->rq;
>> + hwif->rq = NULL;
>> + }
>> ide_unlock_port(hwif);
>> plug_device = 1;
>> }
>>

2009-07-02 01:47:07

by wu zhangjin

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Wed, 2009-07-01 at 18:29 +0200, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 01 July 2009 18:21:25 Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> > > On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > > > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> > > >
> > > > > I just tried, and it "seems" to work. Will try a few more cycles.
> > > >
> > > > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > > > right thing.
> > > >
> > > > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > > > Basset<[email protected]> wrote:
> > > >
> > > > > To have STR/resume work with current git, I have to :
> > > >
> > > > > 1) apply Bart's patch
> > > >
> > > > This is not yet in Linus's tree. And much needed to really fix the problem.
> > > >
> > > > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> > > >
> > >
> > > Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> > > work either. I have tested it on two different loongson-based machines:
> > > fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)
> >
> > Since it seems like Dave is taking his sweet time with doing the revert
> > I stared at the code a bit more and I think that I finally found the bug
> > (thanks to your debugging work for giving me the right hint!).
> >
> > The patch needs to take into the account a new code introduced by the recent
> > block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):
> >
> > @@ -555,8 +560,11 @@ repeat:
> > startstop = start_request(drive, rq);
> > spin_lock_irq(&hwif->lock);
> >
> > - if (startstop == ide_stopped)
> > + if (startstop == ide_stopped) {
> > + rq = hwif->rq;
> > + hwif->rq = NULL;
> > goto repeat;
> > + }
> > } else
> > goto plug_device;
> > out:
> >
> > and not zero hwif->rq if the device is blocked.
> >
> > Could you try the attached patch and see if it fixes the issue?
>
> Here is the more complete version, also taking into the account changes
> in ide_intr() and ide_timer_expiry():
>

Sorry, I cannot apply this patch directly. Which base version did
you use? I used the one in the master branch of the linux-mips development
git repository.

commit 5a4f13fad1ab5bd08dea78fc55321e429d83cddf
Merge: ec9c45d e18ed14
Author: Linus Torvalds <[email protected]>
Date: Mon Jun 29 20:07:43 2009 -0700

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
ide: memory overrun in ide_get_identity_ioctl() on big endian
machines using ioctl HDIO_OBSOLETE_IDENTITY
ide: fix resume for CONFIG_BLK_DEV_IDEACPI=y
ide-cd: handle fragmented packet commands gracefully
ide: always kill the whole request on error
ide: fix ide_kill_rq() for special ide-{floppy,tape} driver
requests

Is this too old? Should I merge another git repository?

I have tried to apply it manually, but unfortunately it still does not
work. Is any other patch needed?

Thanks!
Wu Zhangjin
> ---
> drivers/ide/ide-io.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> Index: b/drivers/ide/ide-io.c
> ===================================================================
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -532,7 +532,8 @@ repeat:
>
> if (startstop == ide_stopped) {
> rq = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
> + hwif->rq = NULL;
> goto repeat;
> }
> } else
> @@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
> spin_lock_irq(&hwif->lock);
> enable_irq(hwif->irq);
> if (startstop == ide_stopped && hwif->polling == 0) {
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }
> @@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
> */
> if (startstop == ide_stopped && hwif->polling == 0) {
> BUG_ON(hwif->handler);
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }

2009-07-02 02:10:03

by Jeff Chua

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Thu, Jul 2, 2009 at 9:46 AM, Wu Zhangjin<[email protected]> wrote:
> it this too old? should i merge another git repository?
> I have tried to apply it manually, but unfortunately, also not work. any
> other patch needed?

You need to undo those two patches below ...

> On Mon, Jun 29, 2009 at 11:51 PM, Etienne Basset<[email protected]>
> To have STR/resume work with current git, I have to :
> 1) apply Bart's patch
> 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b

or try to pull from Linus's tree and try again. Latest is now ...

commit d960eea974f5e500c0dcb95a934239cc1f481cfd
Author: Randy Dunlap <[email protected]>
Date: Mon Jun 29 14:54:11 2009 -0700

kernel-doc: move ignoring kmemcheck



Jeff.

2009-07-02 10:47:29

by Ralf Baechle

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Thu, Jul 02, 2009 at 09:46:43AM +0800, Wu Zhangjin wrote:

> Sorry, I can not apply this patch directly, which original version did
> you use? I used the one in the master branch of linux-mips development
> git repository.

The master branch of linux-mips.org has no IDE changes over Linus' tree.

Ralf

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Thursday 02 July 2009 03:46:43 Wu Zhangjin wrote:
> On Wed, 2009-07-01 at 18:29 +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 01 July 2009 18:21:25 Bartlomiej Zolnierkiewicz wrote:
> > > On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> > > > On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > > > > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> > > > >
> > > > > > I just tried, and it "seems" to work. Will try a few more cycles.
> > > > >
> > > > > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > > > > right thing.
> > > > >
> > > > > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > > > > Basset<[email protected]> wrote:
> > > > >
> > > > > > To have STR/resume work with current git, I have to :
> > > > >
> > > > > > 1) apply Bart's patch
> > > > >
> > > > > This is not yet in Linus's tree. And much needed to really fix the problem.
> > > > >
> > > > > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> > > > >
> > > >
> > > > Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> > > > work either. I have tested it on two different loongson-based machines:
> > > > fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)
> > >
> > > Since it seems like Dave is taking his sweet time with doing the revert
> > > I stared at the code a bit more and I think that I finally found the bug
> > > (thanks to your debugging work for giving me the right hint!).
> > >
> > > The patch needs to take into the account a new code introduced by the recent
> > > block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):
> > >
> > > @@ -555,8 +560,11 @@ repeat:
> > > startstop = start_request(drive, rq);
> > > spin_lock_irq(&hwif->lock);
> > >
> > > - if (startstop == ide_stopped)
> > > + if (startstop == ide_stopped) {
> > > + rq = hwif->rq;
> > > + hwif->rq = NULL;
> > > goto repeat;
> > > + }
> > > } else
> > > goto plug_device;
> > > out:
> > >
> > > and not zero hwif->rq if the device is blocked.
> > >
> > > Could you try the attached patch and see if it fixes the issue?
> >
> > Here is the more complete version, also taking into the account changes
> > in ide_intr() and ide_timer_expiry():
> >
>
> Sorry, I can not apply this patch directly, which original version did
> you use? I used the one in the master branch of linux-mips development
> git repository.
>
> commit 5a4f13fad1ab5bd08dea78fc55321e429d83cddf
> Merge: ec9c45d e18ed14
> Author: Linus Torvalds <[email protected]>
> Date: Mon Jun 29 20:07:43 2009 -0700
>
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
>
> * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
> ide: memory overrun in ide_get_identity_ioctl() on big endian
> machines using ioctl HDIO_OBSOLETE_IDENTITY
> ide: fix resume for CONFIG_BLK_DEV_IDEACPI=y
> ide-cd: handle fragmented packet commands gracefully
> ide: always kill the whole request on error
> ide: fix ide_kill_rq() for special ide-{floppy,tape} driver
> requests
>
> it this too old? should i merge another git repository?

Weird, I used linux-next but Linus' tree should also be fine
(as it matches linux-next w.r.t. ide currently).

Anyway, since the patch was confirmed to fix the problem by
Jeff and Etienne, here is the final version for Dave.

From: Bartlomiej Zolnierkiewicz <[email protected]>
Subject: [PATCH] ide: make resume work again

It turns out that commit a1317f714af7aed60ddc182d0122477cbe36ee9b
("ide: improve handling of Power Management requests") needs to take
into account the new code added by the recent block layer changes
in commit 8f6205cd572fece673da0255d74843680f67f879 ("ide: dequeue
in-flight request") and prevent clearing of hwif->rq if the device
is blocked.

Thanks to Etienne, Wu and Jeff for help in fixing the issue.

Reported-and-tested-by: Jeff Chua <[email protected]>
Reported-and-tested-by: Etienne Basset <[email protected]>
Reported-by: Wu Zhangjin <[email protected]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
---
Added patch description, no other changes.

drivers/ide/ide-io.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

Index: b/drivers/ide/ide-io.c
===================================================================
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -532,7 +532,8 @@ repeat:

if (startstop == ide_stopped) {
rq = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
+ hwif->rq = NULL;
goto repeat;
}
} else
@@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
spin_lock_irq(&hwif->lock);
enable_irq(hwif->irq);
if (startstop == ide_stopped && hwif->polling == 0) {
- rq_in_flight = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
+ rq_in_flight = hwif->rq;
+ hwif->rq = NULL;
+ }
ide_unlock_port(hwif);
plug_device = 1;
}
@@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
*/
if (startstop == ide_stopped && hwif->polling == 0) {
BUG_ON(hwif->handler);
- rq_in_flight = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
+ rq_in_flight = hwif->rq;
+ hwif->rq = NULL;
+ }
ide_unlock_port(hwif);
plug_device = 1;
}

2009-07-02 17:18:52

by David Rientjes

[permalink] [raw]
Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

On Wed, 1 Jul 2009, Pekka Enberg wrote:

> Lets go with the slab_out_of_memory() patch you outlined in a previous
> post and implement the slub_debug=p thing Christoph suggested. I think
> it's the best compromise at this point. When you guys finally see the
> light, we can always change it to a reasonable default. ;)
>
> So can you send a patch, please?
>

Sure, let me know if you think this is -rc material; otherwise, the bug
will have to be deferred until 2.6.32 with the temporary workaround of
disabling CONFIG_SLUB_DEBUG_ON.

2009-07-03 03:58:47

by wu zhangjin

[permalink] [raw]
Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Thu, 2009-07-02 at 18:13 +0200, Bartlomiej Zolnierkiewicz wrote:
> On Thursday 02 July 2009 03:46:43 Wu Zhangjin wrote:
> > On Wed, 2009-07-01 at 18:29 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > On Wednesday 01 July 2009 18:21:25 Bartlomiej Zolnierkiewicz wrote:
> > > > On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> > > > > On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > > > > > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> > > > > >
> > > > > > > I just tried, and it "seems" to work. Will try a few more cycles.
> > > > > >
> > > > > > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > > > > > right thing.
> > > > > >
> > > > > > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > > > > > Basset<[email protected]> wrote:
> > > > > >
> > > > > > > To have STR/resume work with current git, I have to :
> > > > > >
> > > > > > > 1) apply Bart's patch
> > > > > >
> > > > > > This is not yet in Linus's tree. And much needed to really fix the problem.
> > > > > >
> > > > > > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> > > > > >
> > > > >
> > > > > Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> > > > > work either. I have tested it on two different loongson-based machines:
> > > > > fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)
> > > >
> > > > Since it seems like Dave is taking his sweet time with doing the revert
> > > > I stared at the code a bit more and I think that I finally found the bug
> > > > (thanks to your debugging work for giving me the right hint!).
> > > >
> > > > The patch needs to take into the account a new code introduced by the recent
> > > > block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):
> > > >
> > > > @@ -555,8 +560,11 @@ repeat:
> > > > startstop = start_request(drive, rq);
> > > > spin_lock_irq(&hwif->lock);
> > > >
> > > > - if (startstop == ide_stopped)
> > > > + if (startstop == ide_stopped) {
> > > > + rq = hwif->rq;
> > > > + hwif->rq = NULL;
> > > > goto repeat;
> > > > + }
> > > > } else
> > > > goto plug_device;
> > > > out:
> > > >
> > > > and not zero hwif->rq if the device is blocked.
> > > >
> > > > Could you try the attached patch and see if it fixes the issue?
> > >
> > > Here is the more complete version, also taking into the account changes
> > > in ide_intr() and ide_timer_expiry():
> > >
> >
> > Sorry, I can not apply this patch directly, which original version did
> > you use? I used the one in the master branch of linux-mips development
> > git repository.
> >
> > commit 5a4f13fad1ab5bd08dea78fc55321e429d83cddf
> > Merge: ec9c45d e18ed14
> > Author: Linus Torvalds <[email protected]>
> > Date: Mon Jun 29 20:07:43 2009 -0700
> >
> > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
> >
> > * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
> > ide: memory overrun in ide_get_identity_ioctl() on big endian
> > machines using ioctl HDIO_OBSOLETE_IDENTITY
> > ide: fix resume for CONFIG_BLK_DEV_IDEACPI=y
> > ide-cd: handle fragmented packet commands gracefully
> > ide: always kill the whole request on error
> > ide: fix ide_kill_rq() for special ide-{floppy,tape} driver
> > requests
> >
> > it this too old? should i merge another git repository?
>
> Weird, I used linux-next but Linus' tree should also be fine
> (as it matches linux-next w.r.t. ide currently).

I just cloned the linux-next git repo and tested your patch with
STD/hibernation; unfortunately, it does not work either :-(

Here is the call trace:

blk_delete_timer+0x0/0x20
blk_requeue_request+0x24/0xd0
ide_requeue_and_plug+0x38/0xb0
ide_intr+0x120/0x300 ---> ide_intr....
handle_IRQ_event+0x94/0x230
handle_level_irq+0x7c/0x120
mach_irq_dispatch+0xc8/0x158
ret_from_irq+0x0/0x4
cpu_idle+0x30/0x60
start_kernel+0x330/0x34c

If I do _NOT_ apply your patch and comment out this part, it works:

diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index d5f3c77..a45de2b 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -468,12 +468,12 @@ void do_ide_request(struct request_queue *q)
ide_hwif_t *prev_port;
repeat:
prev_port = hwif->host->cur_port;
-
+/*
if (drive->dev_flags & IDE_DFLAG_BLOCKED)
rq = hwif->rq;
else
WARN_ON_ONCE(hwif->rq);
-
+*/
if (drive->dev_flags & IDE_DFLAG_SLEEPING &&
time_after(drive->sleep, jiffies)) {
ide_unlock_port(hwif);


Regards,
Wu Zhangjin
>
> Anyway since the patch was confirmed to fix the problem by
> Jeff and Etienne here is the final version for Dave.
>
> From: Bartlomiej Zolnierkiewicz <[email protected]>
> Subject: [PATCH] ide: make resume work again
>
> It turns out that commit a1317f714af7aed60ddc182d0122477cbe36ee9b
> ("ide: improve handling of Power Management requests") needs to take
> into the account a new code added by the recent block layer changes
> in commit 8f6205cd572fece673da0255d74843680f67f879 ("ide: dequeue
> in-flight request") and prevent clearing of hwif->rq if the device
> is blocked.
>
> Thanks to Etienne, Wu and Jeff for help in fixing the issue.
>
> Reported-and-tested-by: Jeff Chua <[email protected]>
> Reported-and-tested-by: Etienne Basset <[email protected]>
> Reported-by: Wu Zhangjin <[email protected]>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
> ---
> Added patch description, no other changes.
>
> drivers/ide/ide-io.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> Index: b/drivers/ide/ide-io.c
> ===================================================================
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -532,7 +532,8 @@ repeat:
>
> if (startstop == ide_stopped) {
> rq = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
> + hwif->rq = NULL;
> goto repeat;
> }
> } else
> @@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
> spin_lock_irq(&hwif->lock);
> enable_irq(hwif->irq);
> if (startstop == ide_stopped && hwif->polling == 0) {
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }
> @@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
> */
> if (startstop == ide_stopped && hwif->polling == 0) {
> BUG_ON(hwif->handler);
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-07-03 04:07:25

by wu zhangjin

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Fri, 2009-07-03 at 11:58 +0800, Wu Zhangjin wrote:
> On Thu, 2009-07-02 at 18:13 +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Thursday 02 July 2009 03:46:43 Wu Zhangjin wrote:
> > > On Wed, 2009-07-01 at 18:29 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > On Wednesday 01 July 2009 18:21:25 Bartlomiej Zolnierkiewicz wrote:
> > > > > On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> > > > > > On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > > > > > > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> > > > > > >
> > > > > > > > I just tried, and it "seems" to work. Will try a few more cycles.
> > > > > > >
> > > > > > > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > > > > > > right thing.
> > > > > > >
> > > > > > > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > > > > > > Basset<[email protected]> wrote:
> > > > > > >
> > > > > > > > To have STR/resume work with current git, I have to :
> > > > > > >
> > > > > > > > 1) apply Bart's patch
> > > > > > >
> > > > > > > This is not yet in Linus's tree. And much needed to really fix the problem.
> > > > > > >
> > > > > > > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> > > > > > >
> > > > > >
> > > > > > Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> > > > > > work either. I have tested it on two different loongson-based machines:
> > > > > > fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)
> > > > >
> > > > > Since it seems like Dave is taking his sweet time with doing the revert
> > > > > I stared at the code a bit more and I think that I finally found the bug
> > > > > (thanks to your debugging work for giving me the right hint!).
> > > > >
> > > > > The patch needs to take into the account a new code introduced by the recent
> > > > > block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):
> > > > >
> > > > > @@ -555,8 +560,11 @@ repeat:
> > > > > startstop = start_request(drive, rq);
> > > > > spin_lock_irq(&hwif->lock);
> > > > >
> > > > > - if (startstop == ide_stopped)
> > > > > + if (startstop == ide_stopped) {
> > > > > + rq = hwif->rq;
> > > > > + hwif->rq = NULL;
> > > > > goto repeat;
> > > > > + }
> > > > > } else
> > > > > goto plug_device;
> > > > > out:
> > > > >
> > > > > and not zero hwif->rq if the device is blocked.
> > > > >
> > > > > Could you try the attached patch and see if it fixes the issue?
> > > >
> > > > Here is the more complete version, also taking into the account changes
> > > > in ide_intr() and ide_timer_expiry():
> > > >
> > >
> > > Sorry, I can not apply this patch directly, which original version did
> > > you use? I used the one in the master branch of linux-mips development
> > > git repository.
> > >
> > > commit 5a4f13fad1ab5bd08dea78fc55321e429d83cddf
> > > Merge: ec9c45d e18ed14
> > > Author: Linus Torvalds <[email protected]>
> > > Date: Mon Jun 29 20:07:43 2009 -0700
> > >
> > > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
> > >
> > > * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
> > > ide: memory overrun in ide_get_identity_ioctl() on big endian
> > > machines using ioctl HDIO_OBSOLETE_IDENTITY
> > > ide: fix resume for CONFIG_BLK_DEV_IDEACPI=y
> > > ide-cd: handle fragmented packet commands gracefully
> > > ide: always kill the whole request on error
> > > ide: fix ide_kill_rq() for special ide-{floppy,tape} driver
> > > requests
> > >
> > > it this too old? should i merge another git repository?
> >
> > Weird, I used linux-next but Linus' tree should also be fine
> > (as it matches linux-next w.r.t. ide currently).
>
> I just cloned the linux-next git repo, and tested your patch with
> STD/Hibernation, unfortunately, it also not work :-(
>
> here is the Call Trace:
>
> blk_delete_timer+0x0/0x20
> blk_requeue_request+0x24/0xd0
> ide_requeue_and_plug+0x38/0xb0
> ide_intr+0x120/0x300 ---> ide_intr....
> handle_IRQ_event+0x94/0x230
> handle_level_irq+0x7c/0x120
> mach_irq_dispatch+0xc8/0x158
> ret_from_irq+0x0/0x4
> cpu_idle+0x30/0x60
> start_kernel+0x330/0x34c
>
There are two more lines after the Call Trace:

Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Fatal exception in interrupt.

> If _NOT_ apply your patch and comment this part, it works:
>
> diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> index d5f3c77..a45de2b 100644
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -468,12 +468,12 @@ void do_ide_request(struct request_queue *q)
> ide_hwif_t *prev_port;
> repeat:
> prev_port = hwif->host->cur_port;
> -
> +/*
> if (drive->dev_flags & IDE_DFLAG_BLOCKED)
> rq = hwif->rq;
> else
> WARN_ON_ONCE(hwif->rq);
> -
> +*/
> if (drive->dev_flags & IDE_DFLAG_SLEEPING &&
> time_after(drive->sleep, jiffies)) {
> ide_unlock_port(hwif);
>
>
> Regards,
> Wu Zhangjin
> >
> > Anyway since the patch was confirmed to fix the problem by
> > Jeff and Etienne here is the final version for Dave.
> >
> > From: Bartlomiej Zolnierkiewicz <[email protected]>
> > Subject: [PATCH] ide: make resume work again
> >
> > It turns out that commit a1317f714af7aed60ddc182d0122477cbe36ee9b
> > ("ide: improve handling of Power Management requests") needs to take
> > into the account a new code added by the recent block layer changes
> > in commit 8f6205cd572fece673da0255d74843680f67f879 ("ide: dequeue
> > in-flight request") and prevent clearing of hwif->rq if the device
> > is blocked.
> >
> > Thanks to Etienne, Wu and Jeff for help in fixing the issue.
> >
> > Reported-and-tested-by: Jeff Chua <[email protected]>
> > Reported-and-tested-by: Etienne Basset <[email protected]>
> > Reported-by: Wu Zhangjin <[email protected]>
> > Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
> > ---
> > Added patch description, no other changes.
> >
> > drivers/ide/ide-io.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > Index: b/drivers/ide/ide-io.c
> > ===================================================================
> > --- a/drivers/ide/ide-io.c
> > +++ b/drivers/ide/ide-io.c
> > @@ -532,7 +532,8 @@ repeat:
> >
> > if (startstop == ide_stopped) {
> > rq = hwif->rq;
> > - hwif->rq = NULL;
> > + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
> > + hwif->rq = NULL;
> > goto repeat;
> > }
> > } else
> > @@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
> > spin_lock_irq(&hwif->lock);
> > enable_irq(hwif->irq);
> > if (startstop == ide_stopped && hwif->polling == 0) {
> > - rq_in_flight = hwif->rq;
> > - hwif->rq = NULL;
> > + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> > + rq_in_flight = hwif->rq;
> > + hwif->rq = NULL;
> > + }
> > ide_unlock_port(hwif);
> > plug_device = 1;
> > }
> > @@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
> > */
> > if (startstop == ide_stopped && hwif->polling == 0) {
> > BUG_ON(hwif->handler);
> > - rq_in_flight = hwif->rq;
> > - hwif->rq = NULL;
> > + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> > + rq_in_flight = hwif->rq;
> > + hwif->rq = NULL;
> > + }
> > ide_unlock_port(hwif);
> > plug_device = 1;
> > }
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-07-03 07:23:58

by Pekka Enberg

Subject: Re: [Bug #13319] Page allocation failures with b43 and p54usb

Hi David,

On Wed, 1 Jul 2009, Pekka Enberg wrote:
>> Lets go with the slab_out_of_memory() patch you outlined in a previous
>> post and implement the slub_debug=p thing Christoph suggested. I think
>> it's the best compromise at this point. When you guys finally see the
>> light, we can always change it to a reasonable default. ;)
>>
>> So can you send a patch, please?

On Thu, Jul 2, 2009 at 8:18 PM, David Rientjes<[email protected]> wrote:
> Sure, let me know if you think this is -rc material; otherwise, the bug
> will have to be deferred until 2.6.32 with the temporary workaround of
> disabling CONFIG_SLUB_DEBUG_ON.

We're at -rc2 so yes, I do think we should fix 2.6.31.

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Friday 03 July 2009 05:58:25 Wu Zhangjin wrote:
> On Thu, 2009-07-02 at 18:13 +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Thursday 02 July 2009 03:46:43 Wu Zhangjin wrote:
> > > On Wed, 2009-07-01 at 18:29 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > On Wednesday 01 July 2009 18:21:25 Bartlomiej Zolnierkiewicz wrote:
> > > > > On Wednesday 01 July 2009 16:47:41 Wu Zhangjin wrote:
> > > > > > On Wed, 2009-07-01 at 22:31 +0800, Jeff Chua wrote:
> > > > > > > On Tue, Jun 30, 2009 at 12:21 AM, Jeff Chua<[email protected]> wrote:
> > > > > > >
> > > > > > > > I just tried, and it "seems" to work. Will try a few more cycles.
> > > > > > >
> > > > > > > STD/STR survived quite a few cycles now. Patch seems to be doing the
> > > > > > > right thing.
> > > > > > >
> > > > > > > On Mon, Jun 29, 2009 at 11:51 PM, Etienne
> > > > > > > Basset<[email protected]> wrote:
> > > > > > >
> > > > > > > > To have STR/resume work with current git, I have to :
> > > > > > >
> > > > > > > > 1) apply Bart's patch
> > > > > > >
> > > > > > > This is not yet in Linus's tree. And much needed to really fix the problem.
> > > > > > >
> > > > > > > > 2) revert this commit : a1317f714af7aed60ddc182d0122477cbe36ee9b
> > > > > > >
> > > > > >
> > > > > > Yes, This commit must be reverted, otherwise, STD/Hibernation will not
> > > > > > work either. I have tested it on two different loongson-based machines:
> > > > > > fuloong2e box and yeeloong2f netbook.(loongson is mips compatiable)
> > > > >
> > > > > Since it seems like Dave is taking his sweet time with doing the revert
> > > > > I stared at the code a bit more and I think that I finally found the bug
> > > > > (thanks to your debugging work for giving me the right hint!).
> > > > >
> > > > > The patch needs to take into the account a new code introduced by the recent
> > > > > block layer changes (commit 8f6205cd572fece673da0255d74843680f67f879):
> > > > >
> > > > > @@ -555,8 +560,11 @@ repeat:
> > > > > startstop = start_request(drive, rq);
> > > > > spin_lock_irq(&hwif->lock);
> > > > >
> > > > > - if (startstop == ide_stopped)
> > > > > + if (startstop == ide_stopped) {
> > > > > + rq = hwif->rq;
> > > > > + hwif->rq = NULL;
> > > > > goto repeat;
> > > > > + }
> > > > > } else
> > > > > goto plug_device;
> > > > > out:
> > > > >
> > > > > and not zero hwif->rq if the device is blocked.
> > > > >
> > > > > Could you try the attached patch and see if it fixes the issue?
> > > >
> > > > Here is the more complete version, also taking into the account changes
> > > > in ide_intr() and ide_timer_expiry():
> > > >
> > >
> > > Sorry, I can not apply this patch directly, which original version did
> > > you use? I used the one in the master branch of linux-mips development
> > > git repository.
> > >
> > > commit 5a4f13fad1ab5bd08dea78fc55321e429d83cddf
> > > Merge: ec9c45d e18ed14
> > > Author: Linus Torvalds <[email protected]>
> > > Date: Mon Jun 29 20:07:43 2009 -0700
> > >
> > > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
> > >
> > > * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
> > > ide: memory overrun in ide_get_identity_ioctl() on big endian
> > > machines using ioctl HDIO_OBSOLETE_IDENTITY
> > > ide: fix resume for CONFIG_BLK_DEV_IDEACPI=y
> > > ide-cd: handle fragmented packet commands gracefully
> > > ide: always kill the whole request on error
> > > ide: fix ide_kill_rq() for special ide-{floppy,tape} driver
> > > requests
> > >
> > > it this too old? should i merge another git repository?
> >
> > Weird, I used linux-next but Linus' tree should also be fine
> > (as it matches linux-next w.r.t. ide currently).
>
> I just cloned the linux-next git repo, and tested your patch with
> STD/Hibernation, unfortunately, it also not work :-(
>
> here is the Call Trace:
>
> blk_delete_timer+0x0/0x20
> blk_requeue_request+0x24/0xd0
> ide_requeue_and_plug+0x38/0xb0
> ide_intr+0x120/0x300 ---> ide_intr....
> handle_IRQ_event+0x94/0x230
> handle_level_irq+0x7c/0x120
> mach_irq_dispatch+0xc8/0x158
> ret_from_irq+0x0/0x4
> cpu_idle+0x30/0x60
> start_kernel+0x330/0x34c
>
> If _NOT_ apply your patch and comment this part, it works:

OK, I see another gotcha added by recent changes: we need to explicitly
initialize the rq_in_flight variables now. Revised patch below.

From: Bartlomiej Zolnierkiewicz <[email protected]>
Subject: [PATCH] ide: make resume work again (for real)

It turns out that commit a1317f714af7aed60ddc182d0122477cbe36ee9b
("ide: improve handling of Power Management requests") needs to take
into the account a new code added by the recent block layer changes
in commit 8f6205cd572fece673da0255d74843680f67f879 ("ide: dequeue
in-flight request") and prevent clearing of hwif->rq if the device
is blocked.

Thanks to Etienne, Wu and Jeff for help in fixing the issue.

Reported-and-tested-by: Jeff Chua <[email protected]>
Reported-and-tested-by: Etienne Basset <[email protected]>
Reported-by: Wu Zhangjin <[email protected]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
---
Added patch description, no other changes.

drivers/ide/ide-io.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)

Index: b/drivers/ide/ide-io.c
===================================================================
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -532,7 +532,8 @@ repeat:

if (startstop == ide_stopped) {
rq = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
+ hwif->rq = NULL;
goto repeat;
}
} else
@@ -616,7 +617,7 @@ void ide_timer_expiry (unsigned long dat
unsigned long flags;
int wait = -1;
int plug_device = 0;
- struct request *uninitialized_var(rq_in_flight);
+ struct request *rq_in_flight = NULL;

spin_lock_irqsave(&hwif->lock, flags);

@@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
spin_lock_irq(&hwif->lock);
enable_irq(hwif->irq);
if (startstop == ide_stopped && hwif->polling == 0) {
- rq_in_flight = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
+ rq_in_flight = hwif->rq;
+ hwif->rq = NULL;
+ }
ide_unlock_port(hwif);
plug_device = 1;
}
@@ -775,7 +778,7 @@ irqreturn_t ide_intr (int irq, void *dev
ide_startstop_t startstop;
irqreturn_t irq_ret = IRQ_NONE;
int plug_device = 0;
- struct request *uninitialized_var(rq_in_flight);
+ struct request *rq_in_flight = NULL;

if (host->host_flags & IDE_HFLAG_SERIALIZE) {
if (hwif != host->cur_port)
@@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
*/
if (startstop == ide_stopped && hwif->polling == 0) {
BUG_ON(hwif->handler);
- rq_in_flight = hwif->rq;
- hwif->rq = NULL;
+ if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
+ rq_in_flight = hwif->rq;
+ hwif->rq = NULL;
+ }
ide_unlock_port(hwif);
plug_device = 1;
}
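
A minimal userspace sketch of why the revised patch replaces
uninitialized_var(rq_in_flight) with an explicit NULL: once the assignment
is guarded by the IDE_DFLAG_BLOCKED test, the blocked path no longer writes
the variable, so it needs a defined value before it is handed on for
requeueing. The struct and macro names below (struct port, struct drive,
DFLAG_BLOCKED, take_in_flight) are hypothetical stand-ins, not the kernel's
definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct request { int id; };

#define DFLAG_BLOCKED 0x1u                 /* plays the role of IDE_DFLAG_BLOCKED */

struct port  { struct request *rq; };      /* plays the role of ide_hwif_t */
struct drive { unsigned int dev_flags; };  /* plays the role of ide_drive_t */

/*
 * Mirrors the shape of the patched completion path: the in-flight request
 * is detached from the port only when the device is not blocked.
 * Returns the request to requeue, or NULL if there is none.
 */
static struct request *take_in_flight(struct port *hwif,
				      const struct drive *drive)
{
	/* Must start as NULL: the assignment below is now conditional. */
	struct request *rq_in_flight = NULL;

	if ((drive->dev_flags & DFLAG_BLOCKED) == 0) {
		rq_in_flight = hwif->rq;
		hwif->rq = NULL;
	}
	return rq_in_flight;
}
```

With a blocked drive the function returns NULL and leaves the port's request
in place; with an unblocked drive it returns the request and clears the
port's pointer. Without the NULL initializer the blocked case would return
whatever happened to be on the stack.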

2009-07-03 15:32:03

by wu zhangjin

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

Hi,

> OK, I see another gotcha added by recent changes, we need to explicitly
> initialize rq_in_flight variables now. Revised patch below..
>

Sorry, STD still does not work. With this patch applied the behavior is the
same as without it; it stops at:

...
PM: Crete hibernation image:
PM: Need to copy ... pages
PM: Hibernation image created ...

I think it's better to revert this commit:
a1317f714af7aed60ddc182d0122477cbe36ee9b ("ide: improve handling of
Power Management requests")

Regards,
Wu Zhangjin

> From: Bartlomiej Zolnierkiewicz <[email protected]>
> Subject: [PATCH] ide: make resume work again (for real)
>
> It turns out that commit a1317f714af7aed60ddc182d0122477cbe36ee9b
> ("ide: improve handling of Power Management requests") needs to take
> into the account a new code added by the recent block layer changes
> in commit 8f6205cd572fece673da0255d74843680f67f879 ("ide: dequeue
> in-flight request") and prevent clearing of hwif->rq if the device
> is blocked.
>
> Thanks to Etienne, Wu and Jeff for help in fixing the issue.
>
> Reported-and-tested-by: Jeff Chua <[email protected]>
> Reported-and-tested-by: Etienne Basset <[email protected]>
> Reported-by: Wu Zhangjin <[email protected]>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
> ---
> Added patch description, no other changes.
>
> drivers/ide/ide-io.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> Index: b/drivers/ide/ide-io.c
> ===================================================================
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -532,7 +532,8 @@ repeat:
>
> if (startstop == ide_stopped) {
> rq = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0)
> + hwif->rq = NULL;
> goto repeat;
> }
> } else
> @@ -616,7 +617,7 @@ void ide_timer_expiry (unsigned long dat
> unsigned long flags;
> int wait = -1;
> int plug_device = 0;
> - struct request *uninitialized_var(rq_in_flight);
> + struct request *rq_in_flight = NULL;
>
> spin_lock_irqsave(&hwif->lock, flags);
>
> @@ -679,8 +680,10 @@ void ide_timer_expiry (unsigned long dat
> spin_lock_irq(&hwif->lock);
> enable_irq(hwif->irq);
> if (startstop == ide_stopped && hwif->polling == 0) {
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }
> @@ -775,7 +778,7 @@ irqreturn_t ide_intr (int irq, void *dev
> ide_startstop_t startstop;
> irqreturn_t irq_ret = IRQ_NONE;
> int plug_device = 0;
> - struct request *uninitialized_var(rq_in_flight);
> + struct request *rq_in_flight = NULL;
>
> if (host->host_flags & IDE_HFLAG_SERIALIZE) {
> if (hwif != host->cur_port)
> @@ -856,8 +859,10 @@ irqreturn_t ide_intr (int irq, void *dev
> */
> if (startstop == ide_stopped && hwif->polling == 0) {
> BUG_ON(hwif->handler);
> - rq_in_flight = hwif->rq;
> - hwif->rq = NULL;
> + if ((drive->dev_flags & IDE_DFLAG_BLOCKED) == 0) {
> + rq_in_flight = hwif->rq;
> + hwif->rq = NULL;
> + }
> ide_unlock_port(hwif);
> plug_device = 1;
> }

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

On Friday 03 July 2009 17:31:36 Wu Zhangjin wrote:
> Hi,
>
> > OK, I see another gotcha added by recent changes, we need to explicitly
> > initialize rq_in_flight variables now. Revised patch below..
> >
>
> Sorry, STD also not work. if apply this patch, the same problem as not
> apply it, it stopped at:
>
> ...
> PM: Crete hibernation image:
> PM: Need to copy ... pages
> PM: Hibernation image created ...
>
> I think it's better to revert this commit:
> a1317f714af7aed60ddc182d0122477cbe36ee9b ("ide: improve handling of
> Power Management requests")

I completely agree, and I already requested this a week ago
(this commit was not meant to go straight to the -rc tree anyway).

2009-07-06 19:22:41

by David Miller

Subject: Re: [Bug #13663] suspend to ram regression (IDE related)

From: Bartlomiej Zolnierkiewicz <[email protected]>
Date: Mon, 6 Jul 2009 16:57:59 +0200

>> I think it's better to revert this commit:
>> a1317f714af7aed60ddc182d0122477cbe36ee9b ("ide: improve handling of
>> Power Management requests")
>
> I completely agree and I've already requested this a week ago
> (this commit was not meant for going straight to -rc tree anyway).

I'll revert this today and push that to Linus.

2009-07-07 06:02:37

by David Rientjes

Subject: [patch] slub: add option to disable higher order debugging slabs

When debugging is enabled, slub requires that additional metadata be
stored in slabs for certain options: SLAB_RED_ZONE, SLAB_POISON, and
SLAB_STORE_USER.

Consequently, it may require that the minimum possible slab order needed
to allocate a single object be greater when using these options. The
most notable example is for objects that are PAGE_SIZE bytes in size.

Higher minimum slab orders may cause page allocation failures when oom or
under heavy fragmentation.

This patch adds a new slub_debug option, which disables debugging by
default for caches that would have resulted in higher minimum orders:

slub_debug=O

When this option is used on systems with 4K pages, kmalloc-4096, for
example, will not have debugging enabled by default even if
CONFIG_SLUB_DEBUG_ON is defined because it would have resulted in an
order-1 minimum slab order.

Cc: Christoph Lameter <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
Documentation/vm/slub.txt | 10 ++++++++++
mm/slub.c | 42 +++++++++++++++++++++++++++++++++++++++---
2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -41,6 +41,8 @@ Possible debug options are
P Poisoning (object and padding)
U User tracking (free and alloc)
T Trace (please only use on single slabs)
+ O Switch debugging off for caches that would have
+ caused higher minimum slab orders
- Switch all debugging off (useful if the kernel is
configured with CONFIG_SLUB_DEBUG_ON)

@@ -59,6 +61,14 @@ to the dentry cache with

slub_debug=F,dentry

+Debugging options may require the minimum possible slab order to increase as
+a result of storing the metadata (for example, caches with PAGE_SIZE object
+sizes). This has a higher likelihood of resulting in slab allocation errors
+in low memory situations or if there's high fragmentation of memory. To
+switch off debugging for such caches by default, use
+
+ slub_debug=O
+
In case you forgot to enable debugging on the kernel command line: It is
possible to enable debugging manually when the kernel is up. Look at the
contents of:
diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -142,6 +142,13 @@
SLAB_POISON | SLAB_STORE_USER)

/*
+ * Debugging flags that require metadata to be stored in the slab, up to
+ * DEBUG_SIZE in size.
+ */
+#define DEBUG_SIZE_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
+#define DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
+
+/*
* Set of flags that will prevent slab merging
*/
#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
@@ -326,6 +333,7 @@ static int slub_debug;
#endif

static char *slub_debug_slabs;
+static int disable_higher_order_debug;

/*
* Object debugging
@@ -977,6 +985,15 @@ static int __init setup_slub_debug(char *str)
*/
goto check_slabs;

+ if (tolower(*str) == 'o') {
+ /*
+ * Avoid enabling debugging on caches if its minimum order
+ * would increase as a result.
+ */
+ disable_higher_order_debug = 1;
+ goto out;
+ }
+
slub_debug = 0;
if (*str == '-')
/*
@@ -1023,13 +1040,28 @@ static unsigned long kmem_cache_flags(unsigned long objsize,
unsigned long flags, const char *name,
void (*ctor)(void *))
{
+ int debug_flags = slub_debug;
+
/*
* Enable debugging if selected on the kernel commandline.
*/
- if (slub_debug && (!slub_debug_slabs ||
- strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)) == 0))
- flags |= slub_debug;
+ if (debug_flags) {
+ if (slub_debug_slabs &&
+ strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)))
+ goto out;
+
+ /*
+ * Disable debugging that increases slab size if the minimum
+ * slab order would have increased as a result.
+ */
+ if (disable_higher_order_debug &&
+ get_order(objsize + DEBUG_SIZE) > get_order(objsize))
+ debug_flags &= ~DEBUG_SIZE_FLAGS;
+ goto out;

+ flags |= debug_flags;
+ }
+out:
return flags;
}
#else
@@ -1561,6 +1593,10 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
"default order: %d, min order: %d\n", s->name, s->objsize,
s->size, oo_order(s->oo), oo_order(s->min));

+ if (oo_order(s->min) > get_order(s->objsize))
+ printk(KERN_WARNING " %s debugging increased min order, use "
+ "slub_debug=O to disable.\n", s->name);
+
for_each_online_node(node) {
struct kmem_cache_node *n = get_node(s, node);
unsigned long nr_slabs;
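
The kmem_cache_flags() hunk in this first version places a `goto out` after
an unbraced `if`, so the jump is taken on every call and the following
`flags |= debug_flags` is never reached; the v2 posting describes this as a
spurious `goto out` and removes it. The sketch below is a simplified
userspace model of the two control flows, re-indented to show how the
compiler reads them. SIZE_FLAGS, apply_v1 and apply_v2 are hypothetical
names, and the flag values are arbitrary.

```c
/* Hypothetical flag mask standing in for DEBUG_SIZE_FLAGS. */
#define SIZE_FLAGS 0x7UL

/* Control flow of the first version: the goto is not governed by the if. */
static unsigned long apply_v1(unsigned long flags, unsigned long debug_flags,
			      int raises_order)
{
	if (debug_flags) {
		if (raises_order)
			debug_flags &= ~SIZE_FLAGS;
		goto out;               /* runs unconditionally */

		flags |= debug_flags;   /* never reached */
	}
out:
	return flags;
}

/* Control flow once the goto is removed, as in v2. */
static unsigned long apply_v2(unsigned long flags, unsigned long debug_flags,
			      int raises_order)
{
	if (debug_flags) {
		if (raises_order)
			debug_flags &= ~SIZE_FLAGS;
		flags |= debug_flags;
	}
	return flags;
}
```

In this model apply_v1() always returns its input flags unchanged, which
would correspond to debugging never being enabled from the command line,
while apply_v2() ORs in the (possibly reduced) debug flags.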

2009-07-07 07:14:27

by David Rientjes

Subject: [patch v2] slub: add option to disable higher order debugging slabs

When debugging is enabled, slub requires that additional metadata be
stored in slabs for certain options: SLAB_RED_ZONE, SLAB_POISON, and
SLAB_STORE_USER.

Consequently, it may require that the minimum possible slab order needed
to allocate a single object be greater when using these options. The
most notable example is for objects that are PAGE_SIZE bytes in size.

Higher minimum slab orders may cause page allocation failures when oom or
under heavy fragmentation.

This patch adds a new slub_debug option, which disables debugging by
default for caches that would have resulted in higher minimum orders:

slub_debug=O

When this option is used on systems with 4K pages, kmalloc-4096, for
example, will not have debugging enabled by default even if
CONFIG_SLUB_DEBUG_ON is defined because it would have resulted in an
order-1 minimum slab order.

Cc: Christoph Lameter <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---

V1 -> V2: Removed spurious `goto out'.

Documentation/vm/slub.txt | 10 ++++++++++
mm/slub.c | 41 ++++++++++++++++++++++++++++++++++++++---
2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -41,6 +41,8 @@ Possible debug options are
P Poisoning (object and padding)
U User tracking (free and alloc)
T Trace (please only use on single slabs)
+ O Switch debugging off for caches that would have
+ caused higher minimum slab orders
- Switch all debugging off (useful if the kernel is
configured with CONFIG_SLUB_DEBUG_ON)

@@ -59,6 +61,14 @@ to the dentry cache with

slub_debug=F,dentry

+Debugging options may require the minimum possible slab order to increase as
+a result of storing the metadata (for example, caches with PAGE_SIZE object
+sizes). This has a higher likelihood of resulting in slab allocation errors
+in low memory situations or if there's high fragmentation of memory. To
+switch off debugging for such caches by default, use
+
+ slub_debug=O
+
In case you forgot to enable debugging on the kernel command line: It is
possible to enable debugging manually when the kernel is up. Look at the
contents of:
diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -142,6 +142,13 @@
SLAB_POISON | SLAB_STORE_USER)

/*
+ * Debugging flags that require metadata to be stored in the slab, up to
+ * DEBUG_SIZE in size.
+ */
+#define DEBUG_SIZE_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
+#define DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
+
+/*
* Set of flags that will prevent slab merging
*/
#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
@@ -326,6 +333,7 @@ static int slub_debug;
#endif

static char *slub_debug_slabs;
+static int disable_higher_order_debug;

/*
* Object debugging
@@ -977,6 +985,15 @@ static int __init setup_slub_debug(char *str)
*/
goto check_slabs;

+ if (tolower(*str) == 'o') {
+ /*
+ * Avoid enabling debugging on caches if its minimum order
+ * would increase as a result.
+ */
+ disable_higher_order_debug = 1;
+ goto out;
+ }
+
slub_debug = 0;
if (*str == '-')
/*
@@ -1023,13 +1040,27 @@ static unsigned long kmem_cache_flags(unsigned long objsize,
unsigned long flags, const char *name,
void (*ctor)(void *))
{
+ int debug_flags = slub_debug;
+
/*
* Enable debugging if selected on the kernel commandline.
*/
- if (slub_debug && (!slub_debug_slabs ||
- strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)) == 0))
- flags |= slub_debug;
+ if (debug_flags) {
+ if (slub_debug_slabs &&
+ strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)))
+ goto out;
+
+ /*
+ * Disable debugging that increases slab size if the minimum
+ * slab order would have increased as a result.
+ */
+ if (disable_higher_order_debug &&
+ get_order(objsize + DEBUG_SIZE) > get_order(objsize))
+ debug_flags &= ~DEBUG_SIZE_FLAGS;

+ flags |= debug_flags;
+ }
+out:
return flags;
}
#else
@@ -1561,6 +1592,10 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
"default order: %d, min order: %d\n", s->name, s->objsize,
s->size, oo_order(s->oo), oo_order(s->min));

+ if (oo_order(s->min) > get_order(s->objsize))
+ printk(KERN_WARNING " %s debugging increased min order, use "
+ "slub_debug=O to disable.\n", s->name);
+
for_each_online_node(node) {
struct kmem_cache_node *n = get_node(s, node);
unsigned long nr_slabs;
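
A rough userspace illustration of the order test the patch adds,
`get_order(objsize + DEBUG_SIZE) > get_order(objsize)`. It assumes 4K pages
and uses order_for() as a hypothetical stand-in for the kernel's
get_order(); the metadata size is passed in as a parameter because the real
DEBUG_SIZE depends on sizeof(struct track), which is not reproduced here.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                          /* assumption: 4K pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/*
 * Userspace stand-in for get_order(): the smallest order such that
 * (PAGE_SIZE << order) is at least size bytes.
 */
static int order_for(size_t size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Nonzero if appending debug_size bytes of metadata pushes a single
 * object of objsize bytes into a higher page order.
 */
static int debug_raises_order(size_t objsize, size_t debug_size)
{
	return order_for(objsize + debug_size) > order_for(objsize);
}
```

For a 4096-byte object any nonzero amount of metadata moves the allocation
from order 0 to order 1, which is the kmalloc-4096 case described in the
patch; a 192-byte object with a few dozen bytes of metadata stays at
order 0.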

2009-07-07 14:06:05

by Cong Wang

Subject: Re: [Bug #13660] Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

On Thu, Jul 2, 2009 at 4:36 AM, Joao Correia<[email protected]> wrote:
> No formal patch has been sent yet, that i am aware of. I have made
> some changes following suggestion by Americo Wang advise, to the
> following:
>
> (patch by Ingo)
>
> diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
> index 699a2ac..031f4c6 100644
> --- a/kernel/lockdep_internals.h
> +++ b/kernel/lockdep_internals.h
> @@ -65,7 +65,7 @@ enum {
>  * Stack-trace: tightly packed array of stack backtrace
>  * addresses. Protected by the hash_lock.
>  */
> -#define MAX_STACK_TRACE_ENTRIES        262144UL
> +#define MAX_STACK_TRACE_ENTRIES        1048576UL
>
>  extern struct list_head all_lock_classes;
>  extern struct lock_chain lock_chains[];
>
> and afterwards, a new bug popped up, solved by changing
>
> include/linux/sched.h
>
> # define MAX_LOCK_DEPTH 48UL
>
> to
>
> # define MAX_LOCK_DEPTH 96UL
>
>
> I have now found a third limit bug, related to MAX_LOCKDEP_CHAINS,
> which was hidden so far; I'm trying to raise the limit and replicate
> the problem. This is being discussed in detail in another message
> exchange on the lkml, between me and Americo.

How about changing MAX_LOCKDEP_CHAINS_BITS to 16?

kernel/lockdep_internals.h:59:#define MAX_LOCKDEP_CHAINS_BITS 15

And can you make a complete patch and send it to lkml with Peter and me
Cc'ed?

Thank you!
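
For scale, here is a minimal sketch of what the suggested bump means. It assumes the chain table is sized as a power of two from the bit count, i.e. MAX_LOCKDEP_CHAINS is 1UL << MAX_LOCKDEP_CHAINS_BITS; the helper name `lockdep_chains_for_bits` is made up for illustration and is not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: number of lock chains available
 * when the table is sized as a power of two from a bit count.
 * Assumption: MAX_LOCKDEP_CHAINS == 1UL << MAX_LOCKDEP_CHAINS_BITS.
 */
unsigned long lockdep_chains_for_bits(unsigned int bits)
{
	/* Shifting by the full width of the type would be undefined. */
	assert(bits < 8 * sizeof(unsigned long));
	return 1UL << bits;
}
```

With 15 bits this gives 32768 chains and with 16 bits 65536, so the change doubles the limit. The earlier MAX_STACK_TRACE_ENTRIES change from 262144 to 1048576 is likewise a step from 1 << 18 to 1 << 20.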

2009-07-07 14:23:29

by Joao Correia

Subject: Re: [Bug #13660] Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

Already testing the changes, just to see if something else breaks.

Any special notes on the patch? (Basic guidelines on submitting patches
would be great, just so I don't mess it up.) I have never submitted one
before.

Joao Correia

On Tue, Jul 7, 2009 at 3:05 PM, Américo Wang<[email protected]> wrote:
> On Thu, Jul 2, 2009 at 4:36 AM, Joao Correia<[email protected]> wrote:
>> No formal patch has been sent yet that I am aware of. Following Americo
>> Wang's advice, I have made the following changes:
>>
>> (patch by Ingo)
>>
>> diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
>> index 699a2ac..031f4c6 100644
>> --- a/kernel/lockdep_internals.h
>> +++ b/kernel/lockdep_internals.h
>> @@ -65,7 +65,7 @@ enum {
>>  * Stack-trace: tightly packed array of stack backtrace
>>  * addresses. Protected by the hash_lock.
>>  */
>> -#define MAX_STACK_TRACE_ENTRIES        262144UL
>> +#define MAX_STACK_TRACE_ENTRIES        1048576UL
>>
>>  extern struct list_head all_lock_classes;
>>  extern struct lock_chain lock_chains[];
>>
>> and afterwards, a new bug popped up, solved by changing
>>
>> include/linux/sched.h
>>
>> # define MAX_LOCK_DEPTH 48UL
>>
>> to
>>
>> # define MAX_LOCK_DEPTH 96UL
>>
>>
>> I have now found a third limit bug, related to MAX_LOCKDEP_CHAINS,
>> which was hidden so far; I'm trying to raise the limit and replicate
>> the problem. This is being discussed in detail in another message
>> exchange on the lkml, between me and Americo.
>
> How about changing MAX_LOCKDEP_CHAINS_BITS to 16?
>
> kernel/lockdep_internals.h:59:#define MAX_LOCKDEP_CHAINS_BITS 15
>
> And can you make a complete patch and send it to lkml with Peter and me
> Cc'ed?
>
> Thank you!
>

2009-07-07 14:44:54

by Cong Wang

Subject: Re: [Bug #13660] Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

On Tue, Jul 7, 2009 at 10:22 PM, Joao Correia<[email protected]> wrote:
> Already testing the changes, just to see if something else breaks.
>
> Any special notes on the patch? (Basic guidelines on submitting patches
> would be great, just so I don't mess it up.) I have never submitted one
> before.

Yes, check Documentation/SubmittingPatches and Documentation/email-clients.txt.

I am not sure if Peter likes them, but it is a good idea to split them
and send one by one.

Good luck!

2009-07-07 15:57:46

by Christoph Lameter

Subject: Re: [patch v2] slub: add option to disable higher order debugging slabs

On Tue, 7 Jul 2009, David Rientjes wrote:

> + * Debugging flags that require metadata to be stored in the slab, up to
> + * DEBUG_SIZE in size.
> + */
> +#define DEBUG_SIZE_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> +#define DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))

There is no need for DEBUG_SIZE since slub keeps both the size of the
object kmem_cache->objsize and the size with the metadata kmem_cache->size

If the order of both is different then the order would increase.
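
To make the comparison concrete, here is a small userspace sketch. `order_for_size()` is a stand-in written for this example (it mimics what the kernel's get_order() computes, for an assumed 4 KiB page size), `metadata_raises_order()` is a hypothetical helper, and the 72 bytes of metadata used below is an arbitrary illustrative figure, not a value taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Stand-in for the kernel's get_order(): the smallest order such that
 * (DEMO_PAGE_SIZE << order) is at least 'size' bytes.
 */
int order_for_size(size_t size)
{
	int order = 0;

	assert(size > 0);
	while ((DEMO_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Returns 1 if adding 'metadata' bytes pushes the object to a higher order. */
int metadata_raises_order(size_t objsize, size_t metadata)
{
	return order_for_size(objsize + metadata) > order_for_size(objsize);
}
```

A 4096-byte object fits in order 0, but 4096 + 72 bytes needs order 1, so `metadata_raises_order(4096, 72)` returns 1. For a 64-byte object the same metadata leaves the order at 0.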

2009-07-09 23:27:11

by David Rientjes

Subject: Re: [patch v2] slub: add option to disable higher order debugging slabs

On Tue, 7 Jul 2009, Christoph Lameter wrote:

> > + * Debugging flags that require metadata to be stored in the slab, up to
> > + * DEBUG_SIZE in size.
> > + */
> > +#define DEBUG_SIZE_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> > +#define DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
>
> There is no need for DEBUG_SIZE since slub keeps both the size of the
> object kmem_cache->objsize and the size with the metadata kmem_cache->size
>
> If the order of both is different then the order would increase.
>

Without DEBUG_SIZE_FLAGS, the only way to determine what flags have
increased the size is in calculate_sizes() and then disable them by
default if slub_debug=O is specified. calculate_sizes() is used by
the `store', `poison', and `red_zone' callbacks, so the admin still has
the ability to enable these options even though slub_debug=O was used.

So we can either mask off the size-increasing debug bits when the cache is
created in kmem_cache_flags() like I did, or we can move the logic to
calculate_sizes() with an added formal to determine whether this is from
kmem_cache_open() or one of the attribute callbacks.

I think my solution is the cleanest and provides a single entity,
DEBUG_SIZE_FLAGS, which specifies the flags that slub_debug=O clears if
the minimum order increases.
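
The masking approach in kmem_cache_flags() can be sketched in userspace roughly as follows. This only illustrates the masking step: the flag values, the `struct track` layout, the 4 KiB page size and the helper names `sketch_get_order()` and `mask_debug_flags()` are assumptions made for the example, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; only their distinctness matters here. */
#define SLAB_DEBUG_FREE  0x00000100UL
#define SLAB_RED_ZONE    0x00000400UL
#define SLAB_POISON      0x00000800UL
#define SLAB_STORE_USER  0x00010000UL

/* Stand-in layout for the example; the real struct track may differ. */
struct track {
	unsigned long addr;
	int cpu;
	int pid;
	unsigned long when;
};

#define DEBUG_SIZE_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
#define DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Stand-in for get_order(): smallest order whose slab holds 'size' bytes. */
int sketch_get_order(size_t size)
{
	int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Clear the size-increasing debug flags only when the option is set and
 * the extra metadata would push the cache to a higher minimum order.
 */
unsigned long mask_debug_flags(size_t objsize, unsigned long debug_flags,
			       int disable_higher_order_debug)
{
	if (disable_higher_order_debug &&
	    sketch_get_order(objsize + DEBUG_SIZE) > sketch_get_order(objsize))
		debug_flags &= ~DEBUG_SIZE_FLAGS;
	return debug_flags;
}
```

With the option enabled, a 4096-byte cache keeps only the flags outside DEBUG_SIZE_FLAGS (here SLAB_DEBUG_FREE), while a 64-byte cache keeps everything because the extra metadata still fits in an order-0 slab.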

2009-07-10 06:54:50

by Pekka Enberg

Subject: Re: [patch v2] slub: add option to disable higher order debugging slabs

On Tue, 7 Jul 2009, Christoph Lameter wrote:
> > > + * Debugging flags that require metadata to be stored in the slab, up to
> > > + * DEBUG_SIZE in size.
> > > + */
> > > +#define DEBUG_SIZE_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> > > +#define DEBUG_SIZE (3 * sizeof(void *) + 2 * sizeof(struct track))
> >
> > There is no need for DEBUG_SIZE since slub keeps both the size of the
> > object kmem_cache->objsize and the size with the metadata kmem_cache->size
> >
> > If the order of both is different then the order would increase.

On Thu, 2009-07-09 at 16:26 -0700, David Rientjes wrote:
> Without DEBUG_SIZE_FLAGS, the only way to determine what flags have
> increased the size is in calculate_sizes() and then disable them by
> default if slub_debug=O is specified. calculate_sizes() is used by
> the `store', `poison', and `red_zone' callbacks, so the admin still has
> the ability to enable these options even though slub_debug=O was used.
>
> So we can either mask off the size-increasing debug bits when the cache is
> created in kmem_cache_flags() like I did, or we can move the logic to
> calculate_sizes() with an added formal to determine whether this is from
> kmem_cache_open() or one of the attribute callbacks.
>
> I think my solution is the cleanest and provides a single entity,
> DEBUG_SIZE_FLAGS, which specifies the flags that slub_debug=O clears if
> the minimum order increases.

Yup, agreed. I applied the patch, thanks everyone!

Pekka

2009-07-10 18:47:35

by Christoph Lameter

Subject: Re: [patch v2] slub: add option to disable higher order debugging slabs

On Fri, 10 Jul 2009, Pekka Enberg wrote:

> On Thu, 2009-07-09 at 16:26 -0700, David Rientjes wrote:
> > Without DEBUG_SIZE_FLAGS, the only way to determine what flags have
> > increased the size is in calculate_sizes() and then disable them by
> > default if slub_debug=O is specified. calculate_sizes() is used by
> > the `store', `poison', and `red_zone' callbacks, so the admin still has
> > the ability to enable these options even though slub_debug=O was used.
> >
> > So we can either mask off the size-increasing debug bits when the cache is
> > created in kmem_cache_flags() like I did, or we can move the logic to
> > calculate_sizes() with an added formal to determine whether this is from
> > kmem_cache_open() or one of the attribute callbacks.
> >
> > I think my solution is the cleanest and provides a single entity,
> > DEBUG_SIZE_FLAGS, which specifies the flags that slub_debug=O clears if
> > the minimum order increases.
>
> Yup, agreed. I applied the patch, thanks everyone!

There is a simpler solution. Call calculate_sizes() again if the resulting
sizes increased the order. Something like this:

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2009-07-10 13:45:02.000000000 -0500
+++ linux-2.6/mm/slub.c 2009-07-10 13:46:07.000000000 -0500
@@ -2454,6 +2454,10 @@ static int kmem_cache_open(struct kmem_c
if (!calculate_sizes(s, -1))
goto error;

+ if (get_order(s->size) != get_order(s->objsize) && disable_higher_order_debug) {
+ s->flags &= ~DEBUG_SIZE_FLAGS; /* switch off the size-increasing debug flags */
+ calculate_sizes(s, -1);
+ }
/*
* The larger the object size is, the more pages we want on the partial
* list to avoid pounding the page allocator excessively.