LinuxLists.cc - 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31

2009-10-01 23:04:29

Subject: 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31

[Notes:

* Quite a number of new regressions from 2.6.30 have been reported during
the last three weeks.

* The number of unresolved regressions 2.6.30 -> 2.6.31 is now the second
highest ever.]

This message contains a list of some regressions introduced between 2.6.30 and
2.6.31, for which there are no fixes in the mainline I know of. If any of them
have been fixed already, please let me know.

If you know of any other unresolved regressions introduced between 2.6.30
and 2.6.31, please let me know and I'll add them to the list.
Also, please let me know if any of the entries below are invalid.

Each entry from the list will also be sent in an automatic reply to
this message, with CCs to the people involved in reporting and handling the
issue.

Listed regressions statistics:

Date          Total  Pending  Unresolved
----------------------------------------
2009-10-02      151       49          42
2009-09-06      123       34          27
2009-08-26      108       33          26
2009-08-20      102       32          29
2009-08-10       89       27          24
2009-08-02       76       36          28
2009-07-27       70       51          43
2009-07-07       35       25          21
2009-06-29       22       22          15

Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14301
Subject : WARNING: at net/ipv4/af_inet.c:154
Submitter : Ralf Hildebrandt <[email protected]>
Date : 2009-09-30 12:24 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125431350218137&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14294
Subject : kernel BUG at drivers/ide/ide-disk.c:187
Submitter : Santiago Garcia Mantinan <[email protected]>
Date : 2009-09-30 11:05 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125430926311466&w=4
Handled-By : David Miller <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14270
Subject : Cannot boot on a PIII Celeron
Submitter : Michael Tokarev <[email protected]>
Date : 2009-09-28 15:26 (4 days old)
References : http://marc.info/?l=linux-kernel&m=125415160524110&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14267
Subject : Disassociating atheros wlan
Submitter : Kristoffer Ericson <[email protected]>
Date : 2009-09-24 10:16 (8 days old)
References : http://marc.info/?l=linux-kernel&m=125378723723384&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14266
Subject : regression in page writeback
Submitter : Shaohua Li <[email protected]>
Date : 2009-09-22 5:49 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d7831a0bdf06b9f722b947bb0c205ff7d77cebd8
References : http://marc.info/?l=linux-kernel&m=125359858117176&w=4
Handled-By : Wu Fengguang <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14265
Subject : ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
Submitter : Karol Lewandowski <[email protected]>
Date : 2009-09-15 12:05 (17 days old)
References : http://marc.info/?l=linux-kernel&m=125301636509517&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14264
Subject : ehci problem - mouse dead on scroll
Submitter : Volker Armin Hemmann <[email protected]>
Date : 2009-09-12 7:46 (20 days old)
References : http://marc.info/?l=linux-kernel&m=125274202707893&w=4
Handled-By : Alan Stern <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14257
Subject : Not able to boot on 32 bit System
Submitter : Rishikesh <[email protected]>
Date : 2009-09-21 15:25 (11 days old)
References : http://marc.info/?l=linux-kernel&m=125354604314412&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256
Subject : kernel BUG at fs/ext3/super.c:435
Submitter : Mikael Pettersson <[email protected]>
Date : 2009-09-21 7:29 (11 days old)
References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14255
Subject : WARNING: at drivers/char/tty_io.c:1267
Submitter : Heinz Diehl <[email protected]>
Date : 2009-09-20 11:37 (12 days old)
References : http://marc.info/?l=linux-kernel&m=125344629506309&w=4
             http://lkml.org/lkml/2009/9/8/393
Handled-By : Linus Torvalds <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14254
Subject : Hibernation broken by clocksource: Save mult_orig in clocksource_disable()
Submitter : Ondrej Zary <[email protected]>
Date : 2009-09-19 19:55 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc
References : http://marc.info/?l=linux-kernel&m=125339012527719&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14252
Subject : WARNING: at include/linux/skbuff.h:1382 w/ e1000
Submitter : Stephan von Krawczynski <[email protected]>
Date : 2009-09-20 11:26 (12 days old)
References : http://marc.info/?l=linux-kernel&m=125344599006033&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14251
Subject : 2.6.31: no login prompt
Submitter : Frédéric L. W. Meunier <[email protected]>
Date : 2009-09-19 22:43 (13 days old)
References : http://marc.info/?l=linux-kernel&m=125340020804711&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14249
Subject : BUG: oops in gss_validate on 2.6.31
Submitter : Bastian Blank <[email protected]>
Date : 2009-09-16 10:29 (16 days old)
References : http://marc.info/?l=linux-kernel&m=125309700417283&w=4
Handled-By : Trond Myklebust <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14248
Subject : 2.6.31 wireless: WARNING: at net/wireless/ibss.c:34
Submitter : Jurriaan <[email protected]>
Date : 2009-09-13 7:32 (19 days old)
References : http://marc.info/?l=linux-kernel&m=125282721113553&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14222
Subject : Hibernation oopses for the 2nd time with 2.6.31 (won't fit the screen)
Submitter : Ondrej Zary <[email protected]>
Date : 2009-09-24 14:07 (8 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14205
Subject : Intel DX58SO mainboard - powering off takes really long
Submitter : Tomasz Chmielewski <[email protected]>
Date : 2009-09-22 10:14 (10 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14204
Subject : MCE prevent booting on my computer(pentium iii @500Mhz)
Submitter : GNUtoo <[email protected]>
Date : 2009-09-21 20:36 (11 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14185
Subject : Oops in drivers/base/firmware_class
Submitter : <[email protected]>
Date : 2009-09-17 05:09 (15 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6e03a201bbe8137487f340d26aa662110e324b20

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14181
Subject : b43 causes panic at system shutdown
Submitter : Jeremy Huddleston <[email protected]>
Date : 2009-09-15 18:34 (17 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14157
Subject : end_request: I/O error, dev cciss/cXdX, sector 0
Submitter : <[email protected]>
Date : 2009-09-11 07:42 (21 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14143
Subject : OOPS when setting nr_requests for md devices
Submitter : aCaB <[email protected]>
Date : 2009-09-08 08:48 (24 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
Subject : order 2 page allocation failures in iwlagn
Submitter : Frans Pop <[email protected]>
Date : 2009-09-06 7:40 (26 days old)
References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
Handled-By : Pekka Enberg <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133
Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule
Submitter : Jens Axboe <[email protected]>
Date : 2009-08-31 20:43 (32 days old)
References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14114
Subject : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7
Submitter : Tsvety Petrov <[email protected]>
Date : 2009-09-03 21:06 (29 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14090
Subject : WARNING: at fs/notify/inotify/inotify_user.c:394
Submitter : Joerg Platte <[email protected]>
Date : 2009-08-30 15:21 (33 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14070
Subject : lockdep warning triggered by dup_fd
Submitter : Bart Van Assche <[email protected]>
Date : 2009-08-23 09:36 (40 days old)
References : http://lkml.org/lkml/2009/8/23/8

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject : Oops in fsnotify
Submitter : Grant Wilson <[email protected]>
Date : 2009-08-20 15:48 (43 days old)
References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <[email protected]>
Date : 2009-08-14 8:26 (49 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <[email protected]>
Date : 2009-08-15 07:55 (48 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <[email protected]>
Date : 2009-08-08 17:47 (55 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Handled-By : Alan Stern <[email protected]>

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <[email protected]>
Date : 2009-08-06 20:15 (57 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <[email protected]>
Date : 2009-08-04 10:12 (59 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <[email protected]>
Date : 2009-08-03 12:58 (60 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <[email protected]>
Date : 2009-08-07 22:33 (56 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <[email protected]>
Date : 2009-08-08 22:08 (55 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <[email protected]>
Date : 2009-08-04 09:02 (59 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <[email protected]>
Date : 2009-07-29 16:44 (65 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <[email protected]>
Date : 2009-07-17 21:24 (77 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <[email protected]>
Date : 2009-07-22 13:35 (72 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <[email protected]>
Date : 2009-07-06 18:32 (88 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <[email protected]>
Date : 2009-06-17 17:56 (107 days old)
References : http://lkml.org/lkml/2009/6/17/194

Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14275
Subject : kernel>=2.6.31: ahci.c: do not force unconditionally sb600 to 32bit dma any more?
Submitter : gabriele balducci <[email protected]>
Date : 2009-09-30 15:02 (2 days old)
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=14275#c0

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14261
Subject : e1000e jumbo frames no longer work: 'Unsupported MTU setting'
Submitter : Nix <[email protected]>
Date : 2009-09-26 11:16 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125396433321342&w=4
Handled-By : Alexander Duyck <[email protected]>
Patch : http://patchwork.kernel.org/patch/50277/

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14258
Subject : Memory leak in SCSI initialization
Submitter : Tetsuo Handa <[email protected]>
Date : 2009-09-22 4:18 (10 days old)
References : http://marc.info/?l=linux-kernel&m=125359311312243&w=4
Handled-By : Michael Ellerman <[email protected]>
Patch : http://patchwork.kernel.org/patch/49258/

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14253
Subject : Oops in drivers/base/firmware_class
Submitter : Lars Ericsson <[email protected]>
Date : 2009-09-16 20:44 (16 days old)
References : http://lkml.org/lkml/2009/9/16/461
Handled-By : Frederik Deweerdt <[email protected]>
Patch : http://patchwork.kernel.org/patch/49914/

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject : usb console regressions
Submitter : Jason Wessel <[email protected]>
Date : 2009-09-05 21:08 (27 days old)
References : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By : Jason Wessel <[email protected]>
Patch : http://patchwork.kernel.org/patch/45953/
             http://patchwork.kernel.org/patch/45952/

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <[email protected]>
Date : 2009-08-13 6:45 (50 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <[email protected]>
Date : 2009-08-07 21:51 (56 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <[email protected]>
Patch : http://patchwork.kernel.org/patch/38550/

For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions introduced
between 2.6.30 and 2.6.31, unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=13615

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael

2009-10-01 23:08:47

by Rafael J. Wysocki

Subject: [Bug #14013] hd don't show up

This message has been generated automatically as part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31. Please verify whether it should still
be listed and let me know (either way).

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <[email protected]>
Date : 2009-08-14 8:26 (49 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <[email protected]>

2009-10-01 23:19:26

by Rafael J. Wysocki

2009-10-01 23:09:09

by Rafael J. Wysocki

Subject: [Bug #14090] WARNING: at fs/notify/inotify/inotify_user.c:394

2009-10-01 23:08:51

by Rafael J. Wysocki

Subject: [Bug #14070] lockdep warning triggered by dup_fd

2009-10-01 23:08:59

by Rafael J. Wysocki

Subject: [Bug #14137] usb console regressions

This message has been generated automatically as part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31. Please verify whether it should still
be listed and let me know (either way).

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject : usb console regressions
Submitter : Jason Wessel <[email protected]>
Date : 2009-09-05 21:08 (27 days old)
References : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By : Jason Wessel <[email protected]>
Patch : http://patchwork.kernel.org/patch/45953/
             http://patchwork.kernel.org/patch/45952/

2009-10-01 23:08:58

by Rafael J. Wysocki

2009-10-01 23:09:34

by Rafael J. Wysocki

Subject: [Bug #14181] b43 causes panic at system shutdown

2009-10-01 23:18:33

by Rafael J. Wysocki

Subject: [Bug #14157] end_request: I/O error, dev cciss/cXdX, sector 0

2009-10-01 23:09:38

by Rafael J. Wysocki

Subject: [Bug #14205] Intel DX58SO mainboard - powering off takes really long

2009-10-01 23:17:58

by Rafael J. Wysocki

2009-10-01 23:15:46

by Rafael J. Wysocki

Subject: [Bug #14254] Hibernation broken by clocksource: Save mult_orig in clocksource_disable()

This message has been generated automatically as part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31. Please verify whether it should still
be listed and let me know (either way).

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14254
Subject : Hibernation broken by clocksource: Save mult_orig in clocksource_disable()
Submitter : Ondrej Zary <[email protected]>
Date : 2009-09-19 19:55 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc
References : http://marc.info/?l=linux-kernel&m=125339012527719&w=4

2009-10-01 23:10:08

by Rafael J. Wysocki

2009-10-01 23:14:11

by Rafael J. Wysocki

On Thursday 15 October 2009, reinette chatre wrote:
> > The log file timestamps don't tell much as the logging gets delayed,
> > so they all end up at the same time. Maybe I should enable the kernel
> > timestamps so we can see how far apart these failures are.
>
> If you can get accurate timing it will be very useful. I am interested
> to see how quickly it goes from "48 free buffers" to "0 free buffers".

Attached is the dmesg for three consecutive test runs (i.e. without
rebooting). Note that the 2nd one includes only "0 free buffers" messages,
even though the behavior (the point where the desktop freezes and music
stops) looked similar.

Not sure if you can tell all that much from the data.

N.B. You may want to clean this up in iwlwifi code:
iwl-dev.h:#include "iwl-fh.h"
iwl-dev.h:#define RX_LOW_WATERMARK 8
iwl-fh.h:#define RX_LOW_WATERMARK 8

I.e: RX_LOW_WATERMARK is defined in iwl-dev.h even though that includes
iwl-fh.h where it's also defined. The same may be true for other defines.

I think this gave me an incorrect result the first time I increased the
limit as I only changed one of the two files (iwl-dev.h IIRC).
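To make the role of the two constants concrete, here is a toy model of an RX buffer pool in C. It is a sketch under assumptions, not iwlwifi code: only the names RX_FREE_BUFFERS and RX_LOW_WATERMARK and their values (64 and 8) come from this thread, while toy_rx_pool, toy_restock(), toy_receive() and the allocator hooks are hypothetical, and the real driver's conditions for refilling and for printing the warning may differ. Defining each constant in exactly one place means a change like the one described above cannot be half-applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Values as quoted in this thread; after the cleanup they should live in
 * exactly one header. Everything else below is a hypothetical toy model. */
#define RX_FREE_BUFFERS  64
#define RX_LOW_WATERMARK 8

/* Hypothetical RX pool: only the number of free buffers is modelled. */
struct toy_rx_pool {
    unsigned int free_count;
};

/* Hypothetical hook standing in for a non-sleeping (GFP_ATOMIC-style)
 * SKB allocation; returns false when the allocation fails. */
typedef bool (*toy_alloc_fn)(void);

static bool toy_alloc_ok(void)   { return true; }
static bool toy_alloc_fail(void) { return false; }

/* True once the pool has dropped to the low watermark. */
static bool toy_needs_replenish(const struct toy_rx_pool *pool)
{
    return pool->free_count <= RX_LOW_WATERMARK;
}

/* Each received frame consumes one free buffer. */
static void toy_receive(struct toy_rx_pool *pool)
{
    if (pool->free_count > 0)
        pool->free_count--;
}

/* Refill up to RX_FREE_BUFFERS without sleeping, stopping at the first
 * failed allocation. Returns the number of buffers added. The message
 * text copies the dmesg lines in this thread; when it fires is assumed. */
static unsigned int toy_restock(struct toy_rx_pool *pool, toy_alloc_fn alloc)
{
    unsigned int added = 0;

    assert(pool->free_count <= RX_FREE_BUFFERS);
    while (pool->free_count < RX_FREE_BUFFERS) {
        if (!alloc()) {
            if (toy_needs_replenish(pool))
                printf("Failed to allocate SKB buffer with GFP_ATOMIC. "
                       "Only %u free buffers remaining.\n", pool->free_count);
            break;
        }
        pool->free_count++;
        added++;
    }
    return added;
}
```

In this model a burst of received frames drains the pool faster than failed atomic allocations can refill it, which is the "48 free buffers" to "0 free buffers" progression Reinette asked about.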

Cheers,
FJP

Attachments:

(No filename) (1.12 kB)
dmesg.tgz (43.93 kB)

2009-10-15 20:15:52

by Frans Pop

Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Thursday 15 October 2009, Mel Gorman wrote:
> After a lot more eyeballing, the best next candidate within mm is the
> following patch. Should be tested on its own and in combination with
> the wakeup-kswapd patch sent before.
>
> From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001
> From: Mel Gorman <[email protected]>
> Date: Thu, 15 Oct 2009 00:17:05 +0100
> Subject: [PATCH] page allocator: Direct reclaim should always obey
> watermarks
>
> ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
> free-lists after a direct reclaim.

I've tested the two patches together and this seems like a definite
improvement. I still get SKB errors on the first test, but the desktop
freezes are a lot shorter and the total time needed to load the 3rd gitk
goes down from ~2:15 to ~1:15. The counter in gitk of the number of
loaded commits goes up visibly faster and with fewer halts.

This is on top of current mainline with the RX_LOW_WATERMARK in iwlagn
at its current value (8).

Here are the allocation failures for 2 consecutive tests. Note that the
first test still shows quite a lot of failures, but the second test hardly
had any at all (I still had music skips though).

[ 232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 232.873009] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 232.884545] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 240.121577] __ratelimit: 26 callbacks suppressed
[ 240.121634] __ratelimit: 6 callbacks suppressed
[ 240.124006] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 6 free buffers remaining.
[ 304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 304.374729] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[ 309.446481] __ratelimit: 5 callbacks suppressed
[ 309.450197] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.

[ 525.912934] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 5 free buffers remaining.
[ 525.953939] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 7 free buffers remaining.
[ 536.058171] __ratelimit: 1 callbacks suppressed
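With printk timestamps enabled, the question of how quickly the pool drains to 0 free buffers can be answered mechanically from logs like the ones above. The following C sketch is a hypothetical helper, not part of any kernel tool; it assumes the exact message format shown in this log and skips the __ratelimit lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed allocation-failure line. */
struct skb_failure {
    double timestamp;        /* seconds since boot, from the "[ ... ]" prefix */
    unsigned int free_left;  /* N in "Only N free buffers remaining" */
};

/* Parse a dmesg line in the format shown above. Returns false for any
 * other line, e.g. "__ratelimit: 5 callbacks suppressed". */
static bool parse_skb_failure(const char *line, struct skb_failure *out)
{
    const char *only;

    assert(line != NULL && out != NULL);
    if (strstr(line, "Failed to allocate SKB buffer") == NULL)
        return false;
    if (sscanf(line, " [%lf]", &out->timestamp) != 1)
        return false;
    only = strstr(line, "Only ");
    if (only == NULL || sscanf(only, "Only %u", &out->free_left) != 1)
        return false;
    return true;
}

/* Seconds from the first logged failure to the first one reporting
 * 0 free buffers, or -1.0 if no line in the run reports 0. */
static double seconds_to_exhaustion(const char *const lines[], size_t n)
{
    bool have_first = false;
    double first = 0.0;
    struct skb_failure f;

    for (size_t i = 0; i < n; i++) {
        if (!parse_skb_failure(lines[i], &f))
            continue;
        if (!have_first) {
            first = f.timestamp;
            have_first = true;
        }
        if (f.free_left == 0)
            return f.timestamp - first;
    }
    return -1.0;
}
```

Because messages are rate-limited, the first logged failure is only an upper bound on when the pool started failing; the result should be read as approximate.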

Thanks,
FJP

2009-10-16 09:40:16

by Mel Gorman

Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Thu, Oct 15, 2009 at 10:15:09PM +0200, Frans Pop wrote:
> On Thursday 15 October 2009, Mel Gorman wrote:
> > After a lot more eyeballing, the best next candidate within mm is the
> > following patch. Should be tested on its own and in combination with
> > the wakeup-kswapd patch sent before.
> >
> > From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001
> > From: Mel Gorman <[email protected]>
> > Date: Thu, 15 Oct 2009 00:17:05 +0100
> > Subject: [PATCH] page allocator: Direct reclaim should always obey
> > watermarks
> >
> > ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
> > free-lists after a direct reclaim.
>
> I've tested the two patches together and this seems like a definite
> improvement.

You probably don't need the mental image, but this made me do a happy
dance. Certainly helps my cold!

> I still get SKB errors on the first test, but the desktop
> freezes are a lot shorter and the total time needed to load the 3rd gitk
> goes down from ~2:15 to ~1:15. The counter in gitk of the number of
> loaded commits goes up visibly faster and with fewer halts.
>

This brings us close to the state of affairs before the akpm merge.
There might still be something missing in either the mm area or the wireless
driver but any improvement is better than none.

> This is on top of current mainline with the RX_LOW_WATERMARK in iwlagn
> at its current value (8).
>
> Here are the allocation failures for 2 consecutive tests. Note that the
> first test still shows quite a lot of failures, but the second test hardly
> had any at all (I still had music skips though).
>

So, we are still dealing with three problems.

1. GFP_ATOMIC allocations were introduced to the wireless driver in the
2.6.30..2.6.31 timeframe. The cause has been more or less identified as
the tasklet off-loading, with the pools being depleted too easily. This
still needs to be fixed.

2. There is also a firmware reloading problem of unknown origin.

3. There was an mm regression that made GFP_ATOMIC failures much worse.
This is at least partially due to tasks exiting being able to go below the
watermarks and kswapd not being woken up when it should be. This could
be the source of the allocation failures on resume that have nothing to
do with the iwlagn wireless driver.
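The idea behind "Direct reclaim should always obey watermarks" can be illustrated with a toy zone in C. This is a simplified sketch, not mm/page_alloc.c: the toy_ names and the flag value are hypothetical, and the model ignores fragmentation and per-order free lists, which the real watermark check takes into account. An order-n request needs 2^n contiguous pages (order 2 is 4 pages, i.e. 16 KiB with 4 KiB pages).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag; the name echoes ALLOC_NO_WATERMARKS from the patch
 * discussed above, but the value and the surrounding model are made up. */
#define TOY_ALLOC_NO_WATERMARKS 0x04

struct toy_zone {
    long free_pages;  /* pages currently free in the zone */
    long wmark_min;   /* reserve that normal allocations must not dip into */
};

/* Normal requests must leave at least wmark_min pages free afterwards;
 * requests carrying the no-watermarks flag may consume the reserve. */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order,
                             int alloc_flags)
{
    long request;

    assert(order < 16);  /* keep the shift well-defined in this toy */
    request = 1L << order;
    if (alloc_flags & TOY_ALLOC_NO_WATERMARKS)
        return z->free_pages >= request;
    return z->free_pages - request >= z->wmark_min;
}

/* Retry after direct reclaim: following the patch description, the
 * flag is cleared so that the retry obeys the watermarks. */
static bool toy_retry_after_reclaim(const struct toy_zone *z,
                                    unsigned int order, int alloc_flags)
{
    return toy_watermark_ok(z, order,
                            alloc_flags & ~TOY_ALLOC_NO_WATERMARKS);
}
```

In this model, clearing the flag on the retry keeps the pages below wmark_min available for callers that cannot reclaim for themselves, such as GFP_ATOMIC allocations in the iwlagn RX path.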

I am going to put together the pair of patches against mainline with a
recommendation that they also be applied to 2.6.31.5. I'll keep looking to
see if I can spot another possible candidate for GFP_ATOMIC being worse
than it was.

> [ 232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 232.873009] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 232.884545] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 240.121577] __ratelimit: 26 callbacks suppressed
> [ 240.121634] __ratelimit: 6 callbacks suppressed
> [ 240.124006] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 6 free buffers remaining.
> [ 304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 304.374729] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [ 309.446481] __ratelimit: 5 callbacks suppressed
> [ 309.450197] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
>
> [ 525.912934] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 5 free buffers remaining.
> [ 525.953939] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 7 free buffers remaining.
> [ 536.058171] __ratelimit: 1 callbacks suppressed
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-10-16 17:22:29

by Reinette Chatre

Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Thu, 2009-10-15 at 12:41 -0700, Frans Pop wrote:
> On Thursday 15 October 2009, reinette chatre wrote:
> > > The log file timestamps don't tell much as the logging gets delayed,
> > > so they all end up at the same time. Maybe I should enable the kernel
> > > timestamps so we can see how far apart these failures are.
> >
> > If you can get accurate timing it will be very useful. I am interested
> > to see how quickly it goes from "48 free buffers" to "0 free buffers".
>
> Attached is the dmesg for three consecutive test runs (i.e. without
> rebooting). Note that the 2nd one includes only "0 free buffers" messages,
> even though the behavior (the point where the desktop freezes and music
> stops) looked similar.

Thank you very much. I am studying it.

> Not sure if you can tell all that much from the data.
>
> N.B. You may want to clean this up in iwlwifi code:
> iwl-dev.h:#include "iwl-fh.h"
> iwl-dev.h:#define RX_LOW_WATERMARK 8
> iwl-fh.h:#define RX_LOW_WATERMARK 8
>
> I.e: RX_LOW_WATERMARK is defined in iwl-dev.h even though that includes
> iwl-fh.h where it's also defined. The same may be true for other defines.

Sorry about that. The patch below will fix that. I will send it
separately to wireless list.

From 7cc8e6482b359eef5ce099457037a237d355b5b1 Mon Sep 17 00:00:00 2001
From: Reinette Chatre <[email protected]>
Date: Fri, 16 Oct 2009 10:11:10 -0700
Subject: [PATCH] iwlwifi: remove duplicate defines

RX_FREE_BUFFERS and RX_LOW_WATERMARK are currently defined in four places.
Based on how files are included we only need the definition in iwl-fh.h

Signed-off-by: Reinette Chatre <[email protected]>
Reported-by: Frans Pop <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-3945-hw.h | 6 ------
drivers/net/wireless/iwlwifi/iwl-3945.h | 6 ------
drivers/net/wireless/iwlwifi/iwl-dev.h | 6 ------
3 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945-hw.h b/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
index ccdac69..6fd10d4 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
+++ b/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
@@ -248,12 +248,6 @@ struct iwl3945_eeprom {
#define TFD_CTL_PAD_SET(n) (n << 28)
#define TFD_CTL_PAD_GET(ctl) (ctl >> 28)

-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
/* Sizes and addresses for instruction and data memory (SRAM) in
* 3945's embedded processor. Driver access is via HBUS_TARG_MEM_* regs. */
#define IWL39_RTC_INST_LOWER_BOUND (0x000000)
diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.h b/drivers/net/wireless/iwlwifi/iwl-3945.h
index f3907c1..84fa0d7 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.h
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.h
@@ -130,12 +130,6 @@ struct iwl3945_frame {
#define SN_TO_SEQ(ssn) (((ssn) << 4) & IEEE80211_SCTL_SEQ)
#define MAX_SN ((IEEE80211_SCTL_SEQ) >> 4)

-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
#define SUP_RATE_11A_MAX_NUM_CHANNELS 8
#define SUP_RATE_11B_MAX_NUM_CHANNELS 4
#define SUP_RATE_11G_MAX_NUM_CHANNELS 12
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index 1378654..0fa0cf5 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -406,12 +406,6 @@ struct iwl_host_cmd {
u8 id;
};

-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
#define SUP_RATE_11A_MAX_NUM_CHANNELS 8
#define SUP_RATE_11B_MAX_NUM_CHANNELS 4
#define SUP_RATE_11G_MAX_NUM_CHANNELS 12
--
1.5.6.3

2009-10-17 05:42:25

On Fri, Nov 06, 2009 at 10:51:37AM +0100, Frans Pop wrote:
> On Thursday 05 November 2009, Frans Pop wrote:
> > On Monday 26 October 2009, Frans Pop wrote:
> > > On Tuesday 20 October 2009, Mel Gorman wrote:
> > > > I've attached a patch below that should allow us to cheat. When it's
> > > > applied, it outputs who called congestion_wait(), how long the
> > > > timeout was and how long it waited for. By comparing before and
> > > > after sleep times, we should be able to see which of the callers has
> > > > significantly changed and if it's something easily addressable.
> > >
> > > The results from this look fairly interesting (although I may be a bad
> > > judge as I don't really know what I'm looking at ;-).
> > >
> > > I've tested with two kernels:
> > > 1) 2.6.31.1: 1 test run
> > > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
> >
> > I've taken another look at the data from this debug patch, resulting in
> > these graphs: http://people.debian.org/~fjp/tmp/kernel/congestion.pdf
> >
> > I think the graph may show the reason for the congestion_wait()
> > regression. Horizontal axis shows time, vertical axis shows number of
> > logged congestion_wait calls per type.
>
> I'm sorry. My initial version had a skewed time axis (showed occurrences
> instead of actual time). I've now uploaded a corrected version:
> http://people.debian.org/~fjp/tmp/kernel/congestion.pdf
>
> I've also uploaded a second version that shows cumulative delay per type,
> which probably gives a better insight:
> http://people.debian.org/~fjp/tmp/kernel/congestion2.pdf
>
> For both the top chart is without the revert, the bottom one after the
> revert.
>

I'm looking into this at the moment. There are some definite
differences not only in the length congestion_wait() is waiting but in
what the callers are doing. I've more or less reproduced your results
locally and am slowly plodding through each caller to see what has
changed of significance. No patches yet though.
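The congestion_wait() debug patch itself is not reproduced in this thread. As a userspace illustration of the same idea (record who called, the requested timeout and how long the call actually waited, then sum the delay per caller as in the cumulative chart), here is a C sketch. toy_congestion_wait(), CONGESTION_WAIT() and the statistics table are hypothetical; the kernel's congestion_wait() works differently and can return before its timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <time.h>

#define TOY_MAX_CALLERS 16

/* Cumulative statistics per calling function. */
struct wait_stats {
    const char *caller;
    unsigned long calls;
    long total_timeout_ms;
    long total_waited_ms;
};

static struct wait_stats stats[TOY_MAX_CALLERS];
static size_t n_stats;

static long ms_between(const struct timespec *a, const struct timespec *b)
{
    long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
                 + (b->tv_nsec - a->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Hypothetical stand-in: always sleeps for the full timeout,
 * resuming after signal interruptions. */
static void toy_congestion_wait(long timeout_ms)
{
    struct timespec ts = { .tv_sec = timeout_ms / 1000,
                           .tv_nsec = (timeout_ms % 1000) * 1000000L };

    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Find or create the statistics slot for a caller (__func__ strings
 * have static storage, so keeping the pointer is safe). */
static struct wait_stats *stats_for(const char *caller)
{
    for (size_t i = 0; i < n_stats; i++)
        if (strcmp(stats[i].caller, caller) == 0)
            return &stats[i];
    assert(n_stats < TOY_MAX_CALLERS);
    stats[n_stats].caller = caller;
    return &stats[n_stats++];
}

/* Wrap the wait and record requested versus actual sleep time. */
static void traced_congestion_wait(const char *caller, long timeout_ms)
{
    struct timespec start, end;
    struct wait_stats *s = stats_for(caller);

    clock_gettime(CLOCK_MONOTONIC, &start);
    toy_congestion_wait(timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &end);

    s->calls++;
    s->total_timeout_ms += timeout_ms;
    s->total_waited_ms += ms_between(&start, &end);
}

/* Call sites use this macro so the caller's name is recorded. */
#define CONGESTION_WAIT(timeout_ms) traced_congestion_wait(__func__, (timeout_ms))
```

Comparing total_waited_ms per caller between two kernels is the kind of before/after comparison described above; a caller whose waits grow markedly is a candidate for closer inspection.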

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab