2008-08-23 19:38:52

by Rafael J. Wysocki

Subject: 2.6.27-rc4-git1: Reported regressions from 2.6.26

This message contains a list of some regressions from 2.6.26, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.26, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-08-23      122       48          40
  2008-08-16      103       47          37
  2008-08-10       80       52          31
  2008-08-02       47       31          20


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11414
Subject : Random crashes with 2.6.27-rc3 on PPC
Submitter : Michael Buesch <[email protected]>
Date : 2008-08-23 14:10 (1 days old)
References : http://marc.info/?l=linux-kernel&m=121950076812616&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11410
Subject : SLUB list_lock vs obj_hash.lock...
Submitter : Daniel J Blueman <[email protected]>
Date : 2008-08-22 21:48 (2 days old)
References : http://marc.info/?l=linux-kernel&m=121944176609042&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11407
Subject : suspend: unable to handle kernel paging request
Submitter : Vegard Nossum <[email protected]>
Date : 2008-08-21 17:28 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121933974928881&w=4
Handled-By : Rafael J. Wysocki <[email protected]>
             Pekka Enberg <[email protected]>
             Pavel Machek <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11406
Subject : patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug
Submitter : Jan Beulich <[email protected]>
Date : 2008-08-21 12:59 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121932366326572&w=4
Handled-By : Ingo Molnar <[email protected]>
             Robert Richter <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11405
Subject : 2.6.27-rc3 segfault on cold boot; not on warm boot.
Submitter : David Greaves <[email protected]>
Date : 2008-08-21 9:45 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121931198904777&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11404
Subject : BUG: in 2.6.23-rc3-git7 in do_cciss_intr
Submitter : rdunlap <[email protected]>
Date : 2008-08-21 5:52 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121929819616273&w=4
             http://marc.info/?l=linux-kernel&m=121932889105368&w=4
Handled-By : Miller, Mike (OS Dev) <[email protected]>
             James Bottomley <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11403
Subject : 2.6.27-rc2 USB suspend regression
Submitter : Jeremy Fitzhardinge <[email protected]>
Date : 2008-08-20 20:48 (4 days old)
References : http://marc.info/?l=linux-kernel&m=121926536103630&w=4
Handled-By : Alan Stern <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11402
Subject : skbuff bug?
Submitter : Yinghai Lu <[email protected]>
Date : 2008-08-21 3:56 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121929102707658&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11401
Subject : pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
Submitter : Laurent Riffard <[email protected]>
Date : 2008-08-22 08:16 (2 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11398
Subject : hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Submitter : Frans Pop <[email protected]>
Date : 2008-08-21 17:17 (3 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11388
Subject : 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable
Submitter : Joshua Hoblitt <[email protected]>
Date : 2008-08-20 17:38 (4 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11382
Subject : e1000e: 2.6.27-rc1 corrupts EEPROM/NVM
Submitter : David Vrabel <[email protected]>
Date : 2008-08-08 10:47 (16 days old)
References : http://marc.info/?l=linux-kernel&m=121819267211679&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11380
Subject : lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16
Submitter : Ingo Molnar <[email protected]>
Date : 2008-08-20 6:44 (4 days old)
References : http://marc.info/?l=linux-kernel&m=121921480931970&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11379
Subject : char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
Submitter : Frans Pop <[email protected]>
Date : 2008-08-18 13:40 (6 days old)
References : http://marc.info/?l=linux-kernel&m=121906698213329&w=4
Handled-By : Bjorn Helgaas <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11357
Subject : Can not boot up with zd1211rw USB-Wlan Stick
Submitter : uwe <[email protected]>
Date : 2008-08-16 14:17 (8 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11356
Subject : Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
Submitter : Frans Pop <[email protected]>
Date : 2008-08-16 19:11 (8 days old)
References : http://marc.info/?l=linux-kernel&m=121891396320127&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11355
Subject : Regression in 2.6.27-rc2 when cross-building the kernel
Submitter : Larry Finger <[email protected]>
Date : 2008-08-16 2:38 (8 days old)
References : http://marc.info/?l=linux-kernel&m=121885432118368&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11354
Subject : AMD Elan regression with 2.6.27-rc3
Submitter : Sean Young <[email protected]>
Date : 2008-08-15 18:37 (9 days old)
References : http://marc.info/?l=linux-kernel&m=121882578430056&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11343
Subject : SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
Submitter : Manny Maxwell <[email protected]>
Date : 2008-08-14 4:16 (10 days old)
References : http://marc.info/?l=linux-kernel&m=121868782917600&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11342
Subject : Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
Submitter : Alan D. Brunelle <[email protected]>
Date : 2008-08-13 23:03 (11 days old)
References : http://marc.info/?l=linux-kernel&m=121866876027629&w=4
Handled-By : Andrew Morton <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11340
Subject : LTP overnight run resulted in unusable box
Submitter : Alexey Dobriyan <[email protected]>
Date : 2008-08-13 9:24 (11 days old)
References : http://marc.info/?l=linux-kernel&m=121861951902949&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11336
Subject : 2.6.27-rc2:stall while mounting root fs
Submitter : Torsten Kaiser <[email protected]>
Date : 2008-08-12 12:37 (12 days old)
References : http://marc.info/?l=linux-kernel&m=121854484015909&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11335
Subject : 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
Submitter : Randy Dunlap <[email protected]>
Date : 2008-08-12 4:18 (12 days old)
References : http://marc.info/?l=linux-kernel&m=121851477201960&w=4
             http://lkml.org/lkml/2008/8/16/274
Handled-By : Hugh Dickins <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11334
Subject : myri10ge: use ioremap_wc: compilation failure on ARM
Submitter : Martin Michlmayr <[email protected]>
Date : 2008-08-10 11:25 (14 days old)
References : http://marc.info/?l=linux-netdev&m=121836771727632&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject : tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter : Christoph Lameter <[email protected]>
Date : 2008-08-11 18:36 (13 days old)
References : http://marc.info/?l=linux-kernel&m=121847986119495&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11282
Subject : Please fix x86 defconfig regression
Submitter : Andi Kleen <[email protected]>
Date : 2008-08-07 20:46 (17 days old)
References : http://marc.info/?l=linux-kernel&m=121814188805666&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11279
Subject : 2.6.27-rc0 Power Bugs with HP/Compaq Laptops
Submitter : Matt Parnell <[email protected]>
Date : 2008-08-07 14:57 (17 days old)
References : http://marc.info/?l=linux-kernel&m=121812108031685&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11272
Subject : BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835
Submitter : Jaswinder Singh <[email protected]>
Date : 2008-08-05 15:12 (19 days old)
References : http://marc.info/?l=linux-kernel&m=121794900319776&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11271
Subject : BUG: fealnx in 2.6.27-rc1
Submitter : Jaswinder Singh <[email protected]>
Date : 2008-08-05 14:58 (19 days old)
References : http://marc.info/?l=linux-netdev&m=121794762016830&w=4
             http://lkml.org/lkml/2008/8/10/98
Handled-By : Francois Romieu <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11264
Subject : Invalid op opcode in kernel/workqueue
Submitter : Jean-Luc Coulon <[email protected]>
Date : 2008-08-07 04:18 (17 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11237
Subject : corrupt PMD after resume
Submitter : Alan Jenkins <[email protected]>
Date : 2008-08-02 9:51 (22 days old)
References : http://marc.info/?l=linux-kernel&m=121767073424952&w=4
Handled-By : Hugh Dickins <[email protected]>
             Jeremy Fitzhardinge <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11230
Subject : Kconfig no longer outputs a .config with freshly updated defconfigs
Submitter : Josh Boyer <[email protected]>
Date : 2008-08-02 16:03 (22 days old)
References : http://marc.info/?l=linux-kernel&m=121769306319391&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11224
Subject : Only three cores found on quad-core machine.
Submitter : Dave Jones <[email protected]>
Date : 2008-08-01 18:15 (23 days old)
References : http://marc.info/?l=linux-kernel&m=121761475224719&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11220
Subject : Screen stays black after resume
Submitter : Nico Schottelius <[email protected]>
Date : 2008-07-31 21:05 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121753882422899&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11219
Subject : KVM modules break emergency reboot
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-08-01 20:25 (23 days old)
References : http://marc.info/?l=linux-kernel&m=121762241105336&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11215
Subject : INFO: possible recursive locking detected ps2_command
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-07-31 9:41 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121749737011637&w=4
Handled-By : Peter Zijlstra <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11210
Subject : libata badness
Submitter : Kumar Gala <[email protected]>
Date : 2008-07-31 18:53 (24 days old)
References : http://marc.info/?l=linux-ide&m=121753059307310&w=4
Handled-By : Ben Dooks <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11209
Subject : 2.6.27-rc1 process time accounting
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-07-31 10:43 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121750102917490&w=4
Handled-By : Peter Zijlstra <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11191
Subject : 2.6.26-git8: spinlock lockup in c1e_idle()
Submitter : Mikhail Kshevetskiy <[email protected]>
Date : 2008-07-24 03:22 (31 days old)
References : http://lkml.org/lkml/2008/7/23/317
Handled-By : Thomas Gleixner <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11141
Subject : no battery or DC status - Dell i1501
Submitter : Gu Rui <[email protected]>
Date : 2008-07-21 19:43 (34 days old)
Handled-By : Zhao Yakui <[email protected]>


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11413
Subject : get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()
Submitter : Mikael Pettersson <[email protected]>
Date : 2008-08-23 9:48 (1 days old)
References : http://marc.info/?l=linux-kernel&m=121948503224161&w=4
Handled-By : Ingo Molnar <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121950734922457&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11409
Subject : build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init'
Submitter : Toralf Förster <[email protected]>
Date : 2008-08-22 8:33 (2 days old)
References : http://marc.info/?l=linux-kernel&m=121939410214677&w=4
Handled-By : Alan Cox <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121943097320451&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11361
Subject : my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
Submitter : Yinghai Lu <[email protected]>
Date : 2008-08-17 6:25 (7 days old)
References : http://marc.info/?l=linux-kernel&m=121895439927053&w=4
Handled-By : Rafael J. Wysocki <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121917167232014&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11360
Subject : mpc8xxx_wdt.c doesn't build modular
Submitter : Dave Jones <[email protected]>
Date : 2008-08-17 08:07 (7 days old)
References : http://lkml.org/lkml/2008/8/12/465
Handled-By : Anton Vorontsov <[email protected]>
Patch : http://lkml.org/lkml/2008/8/13/344


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11358
Subject : net: forcedeth call restore mac addr in nv_shutdown path
Submitter : Yinghai Lu <[email protected]>
Date : 2008-08-17 3:30 (7 days old)
References : http://marc.info/?l=linux-kernel&m=121894389018584&w=4
Handled-By : Yinghai Lu <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121894389018584&w=4


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11276
Subject : build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things
Submitter : Randy Dunlap <[email protected]>
Date : 2008-08-06 17:18 (18 days old)
References : http://marc.info/?l=linux-kernel&m=121804329014332&w=4
             http://lkml.org/lkml/2008/7/22/353
Handled-By : Bjorn Helgaas <[email protected]>
Patch : http://lkml.org/lkml/2008/7/22/364


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11254
Subject : KVM: fix userspace ABI breakage
Submitter : Adrian Bunk <[email protected]>
Date : 21 Jul 2008 17:58:26 (0 days old)
References : http://lkml.org/lkml/2008/7/21/197
Handled-By : Adrian Bunk <[email protected]>
Patch : http://lkml.org/lkml/2008/7/21/197


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11207
Subject : VolanoMark regression with 2.6.27-rc1
Submitter : Zhang, Yanmin <[email protected]>
Date : 2008-07-31 3:20 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121747464114335&w=4
Handled-By : Zhang, Yanmin <[email protected]>
             Peter Zijlstra <[email protected]>
             Dhaval Giani <[email protected]>
             Miao Xie <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121922991027344&w=4


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.26,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=11167

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


2008-08-25 00:17:52

by H. Peter Anvin

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

Linus Torvalds wrote:
>>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 46af716..9bed5ca 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -325,6 +325,10 @@ static struct notifier_block time_cpufreq_notifier_block = {
>
>  static int __init cpufreq_tsc(void)
>  {
> +	if (!cpu_has_tsc)
> +		return 0;
> +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +		return 0;
>  	cpufreq_register_notifier(&time_cpufreq_notifier_block,
>  				CPUFREQ_TRANSITION_NOTIFIER);
>  	return 0;

I added this patch to x86/urgent.

-hpa

2008-08-25 00:49:19

by Benjamin Herrenschmidt

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26


> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11414
> Subject : Random crashes with 2.6.27-rc3 on PPC
> Submitter : Michael Buesch <[email protected]>
> Date : 2008-08-23 14:10 (1 days old)
> References : http://marc.info/?l=linux-kernel&m=121950076812616&w=4

This appears to be a gcc bug when -fno-omit-frame-pointer is used (which
we mostly don't need on ppc anyway except that another gcc stupidity makes
it mandatory for -pg which ftrace needs).

We're working on a two-fold workaround: removing -fno-omit-frame-pointer
in all the cases where we don't really need it, and for when we do (ie,
CONFIG_FTRACE because of -pg), using -mno-sched-epilog which seems to
work around it.

The root cause in gcc hasn't been fully identified yet.

Ben.

2008-08-25 00:52:40

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sun, 24 Aug 2008, David Greaves wrote:
>
> Given that I'll manage at best 1 bisect/day with a reasonable chance of data
> corruption and hardware intermittency screwing it all up I thought it best to
> ask first in case there was another debug approach that could work.

Well, regardless, I think it would be good to fill in the hardware info,
especially wrt CPU data and the exact SATA controller you have.

There's another regression for SATA cold/hot boot issues, and while that
one looks very different, and is probably not really related, it's still a
good idea to try to see if we can match them up. See

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11343
Subject : SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
Submitter : Manny Maxwell <[email protected]>
Date : 2008-08-14 4:16 (10 days old)
References : http://marc.info/?l=linux-kernel&m=121868782917600&w=4

which actually has a patch, and which seems to work fine in 2.6.26 (so not
only is the failure pattern different, the point where it starts is different).

But regardless of the big differences, it does seem to point to some
weakness in SATA initialization. But is it limited to _that_ particular
SATA controller, or just a few ones? Or a generic issue? Without more
reports to really find a pattern, I don't think we have a clue, and the
two may be _totally_ unrelated in all ways, but it would be good to at
least report and log the information you have..

Oh, I just noticed that your dmesg _does_ mention sata_sil and sata_via,
so we know which of two drivers it would be, at least. Not the nVidia one.

However, there's been tons of changes in some core functions: both the
reset handling and the wait-for-ready has changed and caused lots of churn
across most drivers in between 2.6.25 and 2.6.26.

> PS if anyone really is interested then I am happy to try the bisection once I've
> moved her to a new box; otherwise I'm happy to close this.

I think it would be good to try to bisect. It could be something that is
really just limited to that particular machine (maybe it really is some
flaky hardware that just triggers some timing changes), but more likely it
isn't. So the more information, the better. So keep the thing open as long
as somebody is willing to try to gather more info, by all means.

Linus

2008-08-25 11:36:59

by Rafael J. Wysocki

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Monday, 25 of August 2008, Benjamin Herrenschmidt wrote:
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11414
> > Subject : Random crashes with 2.6.27-rc3 on PPC
> > Submitter : Michael Buesch <[email protected]>
> > Date : 2008-08-23 14:10 (1 days old)
> > References : http://marc.info/?l=linux-kernel&m=121950076812616&w=4
>
> This appears to be a gcc bug when -fno-omit-frame-pointer is used (which
> we mostly don't need on ppc anyway except that another gcc stupidity makes
> it mandatory for -pg which ftrace needs).
>
> We're working on a two-fold workaround: removing -fno-omit-frame-pointer
> in all the cases where we don't really need it, and for when we do (ie,
> CONFIG_FTRACE because of -pg), using -mno-sched-epilog which seems to
> work around it.
>
> The root cause in gcc hasn't been fully identified yet.

Thanks Ben.

I've already dropped it from the list of recent regressions.

Rafael

2008-08-24 21:09:18

by Rafael J. Wysocki

Subject: Re: [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop

On Sunday, 24 of August 2008, Frans Pop wrote:
> On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11379
> > Subject : char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
> > Submitter : Frans Pop <[email protected]>
> > Date : 2008-08-18 13:40 (6 days old)
> > References : http://marc.info/?l=linux-kernel&m=121906698213329&w=4
> > Handled-By : Bjorn Helgaas <[email protected]>
>
> Fixed with:
> commit 5e4c6564c95ce127beeefe75e15cd11c93487436
> Author: Kay Sievers <[email protected]>
> Date: Thu Aug 21 15:28:56 2008 +0200
>
> pnp: fix "add acpi:* modalias entries"

Thanks, closed.

Rafael

2008-08-24 18:37:06

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11401
> Subject : pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
> Submitter : Laurent Riffard <[email protected]>
> Date : 2008-08-22 08:16 (2 days old)

This one looks irritating.

It's bisected to 5b6155ee70e9c4d2ad7e6f514c8eee06e2711c3a ("pktcdvd: push
BKL down into driver"), but the problem goes deeper than that.

The "unlocked" ioctl's do not get a "struct inode *" pointer, they _only_
get the "struct file *". And this is very much historical usage, where
some internal functions only passed in the inode (good or not, whatever).

And ioctl_by_bdev() doesn't have a "struct file *" and has depended on
passing in a NULL "struct file *" and its own "struct inode *", and
expects the ioctl's to just use that instead. But the unlocked ioctl just
drops it on the floor, and uses just the (unusable) file pointer.

Grr.

And there are some other cases (like pkt_ioctl() itself) that simply pass in
a _different_ inode than the file itself is attached to. It does

	blkdev_ioctl(pd->bdev->bd_inode, file, cmd, arg);

where "file" points to the pkt_ioctl thing, but "inode" points to the
inode "behind" the pkt interface.

Double grr.

I really think the only sane model is to literally make "unlocked_ioctl()"
have the same calling convention as the old "ioctl()" thing had, and pass
in both file * and inode *. It was a stupid "cleanup" to try to have a
simpler interface for the unlocked version. Having two different models,
where we have actually _depended_ on the old model and then are trying to
convert to a (weaker) new model, is not a good idea.

The alternative is to do this _only_ for the blkdev_ioctl's, and have
those only take the "inode *", and then create a new fake "struct file *"
to go with it, regardless of what "struct file" was passed in (exactly
because the blockdev ones really think that the inode is the important
part).

Hmm?

We need to fix this.

Linus
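
To illustrate the calling-convention problem discussed in this message, here
is a minimal user-space sketch. It is not kernel code: the two structures are
cut-down stand-ins, and the names old_style_ioctl and unlocked_style_ioctl are
hypothetical. The only assumption carried over from the message is that the
old-style handler receives both pointers, while the unlocked one receives the
file alone and has to derive the inode from it.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins; only the fields this sketch needs. */
struct inode { int dev_id; };
struct file  { struct inode *f_inode; };

/* Old-style convention: the caller hands in both inode and file. */
static int old_style_ioctl(struct inode *inode, struct file *file)
{
	(void)file;		/* may legitimately be NULL for this caller */
	assert(inode != NULL);
	return inode->dev_id;
}

/* "Unlocked" convention: only the file; the inode must come from it. */
static int unlocked_style_ioctl(struct file *file)
{
	if (file == NULL || file->f_inode == NULL)
		return -1;	/* nothing to work with */
	return file->f_inode->dev_id;
}
```

With a NULL file and an explicit inode (the ioctl_by_bdev() pattern), the
old-style handler still reaches the right device, whereas the unlocked one has
nothing to go on. With a file for one device and an inode for another (the
pkt_ioctl() pattern), the two conventions silently pick different devices.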

2008-08-24 22:51:17

by Sean Young

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Sun, Aug 24, 2008 at 11:52:06AM -0700, Linus Torvalds wrote:
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11354
> > Subject : AMD Elan regression with 2.6.27-rc3
> > Submitter : Sean Young <[email protected]>
> > Date : 2008-08-15 18:37 (9 days old)
> > References : http://marc.info/?l=linux-kernel&m=121882578430056&w=4
>
> Peter? Ingo? Alok?
>
> This _looks_ like it might be due to "x86: merge the TSC cpu-freq code"
> thing by Alok, where we do this:
>
> +static struct notifier_block time_cpufreq_notifier_block = {
> +	.notifier_call = time_cpufreq_notifier
> +};
> +
> +static int __init cpufreq_tsc(void)
> +{
> +	cpufreq_register_notifier(&time_cpufreq_notifier_block,
> +				CPUFREQ_TRANSITION_NOTIFIER);
> +	return 0;
> +}
>
> but that's just _insane_ if the CPU doesn't even support TSC to begin
> with. Also, in the actual time_cpufreq_notifier(), we do:
>
> 	if (cpu_has(&cpu_data(freq->cpu), X86_FEATURE_CONSTANT_TSC))
> 		return 0;
>
> and this is stupid because:
>
>  (a) if the CPU has no TSC at all, then it sure as hell won't have a
>      _constant_ one, so we'll actually continue into the function.
>
>  (b) and why the hell is this done at run-time in the notifier, and not in
>      the "cpufreq_tsc" init function? If anybody mixes totally different
>      kinds of CPU's in SMP, they deserve whatever they want.
>
> so why is the patch not something like the appended?
>
> Sean, does this make any difference for you?

Yes, this patch fixes it.

Thanks
Sean

2008-08-24 18:43:56

by Vegard Nossum

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Sun, Aug 24, 2008 at 8:03 PM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11410
>> Subject : SLUB list_lock vs obj_hash.lock...
>> Submitter : Daniel J Blueman <[email protected]>
>> Date : 2008-08-22 21:48 (2 days old)
>> References : http://marc.info/?l=linux-kernel&m=121944176609042&w=4
>
> This one now has a suggested patch for Daniel to try from Vegard, but no
> reply yet:
>
> http://marc.info/?l=linux-kernel&m=121946972307110&w=4
>

Hi!

> Vegard, I think your patch is a bit odd, though. The result of your patch
> is
>
> - first loop:
>
> 	hlist_for_each_entry_safe(obj, node, tmp, &db->list, node) {
> 		hlist_del(&obj->node);
> 		hlist_add_head(&obj->node, &freelist);
> 	}
>
> and quite frankly, I don't see what the difference between that and
> something like a simple
>
> 	struct hlist_node *first = db->list.first;
> 	if (first) {
> 		db->list.first = NULL;
> 		first->pprev = &first;
> 	}
> }
>
> really is?
>
> I dunno. We don't have list splicing ops for the hlist things.

Hm.

I haven't really used the hlists before, so my first instinct was to
do what is obvious. That's also why I put the XXX comment. Other than
that, I guess open-coding list ops is also not very good programming
practice? :-)

But... feel free to submit your own patch. Oh, what am I saying.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
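
The two approaches compared in this exchange (moving nodes one at a time
versus taking the whole chain at once) can be tried out in user space with the
sketch below. It assumes the usual hlist layout of a "next" pointer plus a
"pprev" back-pointer; hlist_del() here clears the pointers instead of
poisoning them, and move_one_by_one and move_whole_list are hypothetical names
invented for this illustration. Unlike the quoted snippet, the splice variant
anchors pprev in the destination head rather than in a local variable.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;		/* unlink from predecessor (or head) */
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Per-node move, as in the quoted loop; note that it reverses the order. */
static void move_one_by_one(struct hlist_head *from, struct hlist_head *to)
{
	while (from->first) {
		struct hlist_node *n = from->first;

		hlist_del(n);
		hlist_add_head(n, to);
	}
}

/* Hypothetical splice: take the whole chain in O(1), order preserved. */
static void move_whole_list(struct hlist_head *from, struct hlist_head *to)
{
	assert(to->first == NULL);	/* destination must start out empty */
	to->first = from->first;
	if (to->first)
		to->first->pprev = &to->first;
	from->first = NULL;
}
```

Either variant leaves the source head empty, so a lock protecting it can be
dropped before the nodes are freed. The splice does constant work under the
lock, while the loop touches every node.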

2008-08-24 19:03:59

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11356
> Subject : Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
> Submitter : Frans Pop <[email protected]>
> Date : 2008-08-16 19:11 (8 days old)
> References : http://marc.info/?l=linux-kernel&m=121891396320127&w=4

Hmm. Wasn't this already confirmed to be fixed by commit
df60a8441866153d691ae69b77934904c2de5e0d?

At least Adrian sent out an email saying "Confirmed, bug closed.", but
bugzilla seems to disagree and still show it as open.

Linus

2008-08-24 18:53:11

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11354
> Subject : AMD Elan regression with 2.6.27-rc3
> Submitter : Sean Young <[email protected]>
> Date : 2008-08-15 18:37 (9 days old)
> References : http://marc.info/?l=linux-kernel&m=121882578430056&w=4

Peter? Ingo? Alok?

This _looks_ like it might be due to "x86: merge the TSC cpu-freq code"
thing by Alok, where we do this:

+static struct notifier_block time_cpufreq_notifier_block = {
+	.notifier_call = time_cpufreq_notifier
+};
+
+static int __init cpufreq_tsc(void)
+{
+	cpufreq_register_notifier(&time_cpufreq_notifier_block,
+				CPUFREQ_TRANSITION_NOTIFIER);
+	return 0;
+}

but that's just _insane_ if the CPU doesn't even support TSC to begin
with. Also, in the actual time_cpufreq_notifier(), we do:

	if (cpu_has(&cpu_data(freq->cpu), X86_FEATURE_CONSTANT_TSC))
		return 0;

and this is stupid because:

 (a) if the CPU has no TSC at all, then it sure as hell won't have a
     _constant_ one, so we'll actually continue into the function.

 (b) and why the hell is this done at run-time in the notifier, and not in
     the "cpufreq_tsc" init function? If anybody mixes totally different
     kinds of CPU's in SMP, they deserve whatever they want.

so why is the patch not something like the appended?

Sean, does this make any difference for you?

Linus

---
 arch/x86/kernel/tsc.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46af716..9bed5ca 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -325,6 +325,10 @@ static struct notifier_block time_cpufreq_notifier_block = {
 
 static int __init cpufreq_tsc(void)
 {
+	if (!cpu_has_tsc)
+		return 0;
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+		return 0;
 	cpufreq_register_notifier(&time_cpufreq_notifier_block,
 				CPUFREQ_TRANSITION_NOTIFIER);
 	return 0;
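
The logic of points (a) and (b) can be modelled in user space as a rough
sketch. The struct and both function names below are hypothetical and exist
only for this illustration; the real code tests CPU feature bits. The model
assumes nothing beyond what the message states: the notifier-time check
returns early only for a constant TSC, while the patched init function skips
registration both when there is no TSC and when the TSC is constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature summary for one CPU. */
struct cpu_features {
	bool has_tsc;
	bool constant_tsc;
};

/* Check as it sat in the notifier: only a constant TSC bails out. */
static bool notifier_only_check_runs_handler(const struct cpu_features *c)
{
	return !c->constant_tsc;
}

/* Check moved to init time, as in the patch: register only when needed. */
static bool init_check_registers_notifier(const struct cpu_features *c)
{
	if (!c->has_tsc)
		return false;	/* no TSC: nothing to rescale */
	if (c->constant_tsc)
		return false;	/* constant TSC: frequency changes don't matter */
	return true;
}
```

For a CPU without any TSC (both flags false), the notifier-only check lets the
handler run, which is case (a); the init-time check never registers the
notifier at all.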

2008-08-24 21:37:29

by Rafael J. Wysocki

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Saturday, 23 of August 2008, Rafael J. Wysocki wrote:
[--snip--]

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11361
> Subject : my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
> Submitter : Yinghai Lu <[email protected]>
> Date : 2008-08-17 6:25 (7 days old)
> References : http://marc.info/?l=linux-kernel&m=121895439927053&w=4
> Handled-By : Rafael J. Wysocki <[email protected]>
> Patch : http://marc.info/?l=linux-kernel&m=121917167232014&w=4

[--snip--]

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11358
> Subject : net: forcedeth call restore mac addr in nv_shutdown path
> Submitter : Yinghai Lu <[email protected]>
> Date : 2008-08-17 3:30 (7 days old)
> References : http://marc.info/?l=linux-kernel&m=121894389018584&w=4
> Handled-By : Yinghai Lu <[email protected]>
> Patch : http://marc.info/?l=linux-kernel&m=121894389018584&w=4

Jeff, do you have the patches for these two in your queue?

Rafael

2008-08-25 12:04:49

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Sat, 23 Aug 2008, Linus Torvalds wrote:
>> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but
>> then the call chain shows that there is no interrupt going on.
>
> Ahh, later in that thread there's another totally unrelated oops in
> debug_mutex_add_waiter().
>
> I'd guess that it is really wild pointer corrupting memory, quite possibly
> due to a double free or something like that. Alan - it would be good to
> run with DEBUG_PAGE_ALLOC and SLUB debugging etc if you don't already do
> that?
>
> Linus
>

I'll add those in - as to the repeatability: The "bad" kernels seem to
repeat quite reliably - not only in terms of counts (5 or 6 times in a
row before trying something else), but also in terms of the "what" -
either the original issue () or the other kernel with the later issue
(debug_mutex_add_waiter). That's /goodness/ in that it should help
narrow it down.

I'll make sure the kernel is still failing this morning, and then add in
DEBUG_PAGE_ALLOC and if that doesn't help, SLUB debugging...

Alan

2008-08-24 17:49:21

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11405
> Subject : 2.6.27-rc3 segfault on cold boot; not on warm boot.
> Submitter : David Greaves <[email protected]>
> Date : 2008-08-21 9:45 (3 days old)
> References : http://marc.info/?l=linux-kernel&m=121931198904777&w=4

It would be good to have some kind of bisection of this one, because it
looks pretty odd. Also, google doesn't find anybody else seeing that
"segfault at ffffffbf", even though it seems to be very consistent for
David. So I don't think we'll be able to even _guess_ where it is without
some more information about exactly when it started happening.

Since it's present in 2.6.26 too, it's clearly not a regression from that
one, but perhaps more importantly, since it's apparently an old one I'd
have expected more reports like this if it was some common problem. And
the warm-vs-cold-boot thing makes me think it's some hardware setup issue.

Possibly the disk controller, possibly the CPU (eg some MTRR/PAT
setup issue or TLB thing). But the dmesg's are all from late enough at
boot that I can't even tell what disk controller it is (except that it is
SATA), nor can I tell what CPU it is.

But again, if it was some MTRR/PAT issue, I'd expect a _lot_ more reports
of this.

MD/XFS sounds unlikely, since they should have absolutely nothing that
could possibly matter for cold/hot boot.

Linus

2008-08-24 19:23:40

by David Greaves

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

Linus Torvalds wrote:
>
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11405
>> Subject : 2.6.27-rc3 segfault on cold boot; not on warm boot.
>> Submitter : David Greaves <[email protected]>
>> Date : 2008-08-21 9:45 (3 days old)
>> References : http://marc.info/?l=linux-kernel&m=121931198904777&w=4
>
> It would be good to have some kind of bisection of this one, because it
> looks pretty odd. Also, google doesn't find anybody else seeing that
> "segfault at ffffffbf", even though it seems to be very consistent for
> David. So I don't think we'll be able to even _guess_ where it is without
> some more information about exactly when it started happening.
>
> Since it's present in 2.6.26 too, it's clearly not a regression from that
> one, but perhaps more importantly, since it's apparently an old one I'd
> have expected more reports like this if it was some common problem. And
> the warm-vs-cold-boot thing makes me think it's some hardware setup issue.
>
> Possibly the disk controller, possibly the CPU (eg some MTRR/PAT
> setup issue or TLB thing). But the dmesg's are all from late enough at
> boot that I can't even tell what disk controller it is (except that it is
> SATA), nor can I tell what CPU it is.
>
> But again, if it was some MTRR/PAT issue, I'd expect a _lot_ more reports
> of this.

OK, that all makes sense.

Given that I'll manage at best 1 bisect/day with a reasonable chance of data
corruption and hardware intermittency screwing it all up I thought it best to
ask first in case there was another debug approach that could work. However
since it does indeed sound somewhat hardware related and it's an isolated
problem for my wife (as opposed to a problem that others are having too) then I
think she deserves a new machine...

Thanks for the impetus to cheer her up ;)

David
PS if anyone really is interested then I am happy to try the bisection once I've
moved her to a new box; otherwise I'm happy to close this.

2008-08-25 12:07:44

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Arjan van de Ven wrote:

>
> Wonder what gcc is in use?
> (newer ones tend to be a ton better... but maybe Alan is using a really
> old one)

I'm running Ubuntu 8.04 w/ gcc:

gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

Alan

2008-08-23 19:45:27

by Rafael J. Wysocki

Subject: [Bug #11264] Invalid op opcode in kernel/workqueue

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11264
Subject : Invalid op opcode in kernel/workqueue
Submitter : Jean-Luc Coulon <[email protected]>
Date : 2008-08-07 04:18 (17 days old)

2008-08-23 19:49:28

by Rafael J. Wysocki

Subject: [Bug #11340] LTP overnight run resulted in unusable box

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11340
Subject : LTP overnight run resulted in unusable box
Submitter : Alexey Dobriyan <[email protected]>
Date : 2008-08-13 9:24 (11 days old)
References : http://marc.info/?l=linux-kernel&m=121861951902949&w=4

2008-08-23 19:44:41

by Rafael J. Wysocki

Subject: [Bug #11254] KVM: fix userspace ABI breakage

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11254
Subject : KVM: fix userspace ABI breakage
Submitter : Adrian Bunk <[email protected]>
Date : 21 Jul 2008 17:58:26 (0 days old)
References : http://lkml.org/lkml/2008/7/21/197
Handled-By : Adrian Bunk <[email protected]>
Patch : http://lkml.org/lkml/2008/7/21/197

2008-08-23 19:53:31

by Rafael J. Wysocki

Subject: [Bug #11380] lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11380
Subject : lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16
Submitter : Ingo Molnar <[email protected]>
Date : 2008-08-20 6:44 (4 days old)
References : http://marc.info/?l=linux-kernel&m=121921480931970&w=4

2008-08-23 19:46:14

by Rafael J. Wysocki

Subject: [Bug #11272] BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11272
Subject : BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835
Submitter : Jaswinder Singh <[email protected]>
Date : 2008-08-05 15:12 (19 days old)
References : http://marc.info/?l=linux-kernel&m=121794900319776&w=4

2008-08-23 19:50:41

by Rafael J. Wysocki

Subject: [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11355
Subject : Regression in 2.6.27-rc2 when cross-building the kernel
Submitter : Larry Finger <[email protected]>
Date : 2008-08-16 2:38 (8 days old)
References : http://marc.info/?l=linux-kernel&m=121885432118368&w=4

2008-08-23 19:53:51

by Rafael J. Wysocki

Subject: [Bug #11388] 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11388
Subject : 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable
Submitter : Joshua Hoblitt <[email protected]>
Date : 2008-08-20 17:38 (4 days old)

2008-08-23 19:47:08

by Rafael J. Wysocki

Subject: [Bug #11279] 2.6.27-rc0 Power Bugs with HP/Compaq Laptops

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11279
Subject : 2.6.27-rc0 Power Bugs with HP/Compaq Laptops
Submitter : Matt Parnell <[email protected]>
Date : 2008-08-07 14:57 (17 days old)
References : http://marc.info/?l=linux-kernel&m=121812108031685&w=4

2008-08-23 19:52:41

by Rafael J. Wysocki

Subject: [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11379
Subject : char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
Submitter : Frans Pop <[email protected]>
Date : 2008-08-18 13:40 (6 days old)
References : http://marc.info/?l=linux-kernel&m=121906698213329&w=4
Handled-By : Bjorn Helgaas <[email protected]>

2008-08-23 19:48:32

by Rafael J. Wysocki

Subject: [Bug #11336] 2.6.27-rc2:stall while mounting root fs

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11336
Subject : 2.6.27-rc2:stall while mounting root fs
Submitter : Torsten Kaiser <[email protected]>
Date : 2008-08-12 12:37 (12 days old)
References : http://marc.info/?l=linux-kernel&m=121854484015909&w=4

2008-08-23 19:44:55

by Rafael J. Wysocki

Subject: [Bug #11224] Only three cores found on quad-core machine.

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11224
Subject : Only three cores found on quad-core machine.
Submitter : Dave Jones <[email protected]>
Date : 2008-08-01 18:15 (23 days old)
References : http://marc.info/?l=linux-kernel&m=121761475224719&w=4

2008-08-23 19:50:14

by Rafael J. Wysocki

Subject: [Bug #11354] AMD Elan regression with 2.6.27-rc3

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11354
Subject : AMD Elan regression with 2.6.27-rc3
Submitter : Sean Young <[email protected]>
Date : 2008-08-15 18:37 (9 days old)
References : http://marc.info/?l=linux-kernel&m=121882578430056&w=4

2008-08-23 19:44:05

by Rafael J. Wysocki

Subject: [Bug #11220] Screen stays black after resume

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11220
Subject : Screen stays black after resume
Submitter : Nico Schottelius <[email protected]>
Date : 2008-07-31 21:05 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121753882422899&w=4

2008-08-23 20:18:26

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Sat, 23 Aug 2008, Linus Torvalds wrote:
>
> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but
> then the call chain shows that there is no interrupt going on.

Ahh, later in that thread there's another totally unrelated oops in
debug_mutex_add_waiter().

I'd guess that it is really a wild pointer corrupting memory, quite possibly
due to a double free or something like that. Alan - it would be good to
run with DEBUG_PAGE_ALLOC and SLUB debugging etc if you don't already do
that?

Linus
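
As a rough illustration of the debugging setup suggested above, the fragment below shows the kernel configuration options that usually correspond to "DEBUG_PAGE_ALLOC and SLUB debugging". The option names are an assumption based on mainline kernels of this period and should be checked against the tree actually being built (for example with 'make menuconfig'); nothing here is taken from the thread itself.

```
# .config fragment (assumed option names; verify in your tree)
CONFIG_DEBUG_PAGEALLOC=y   # unmap freed pages so stale accesses fault immediately
CONFIG_SLUB_DEBUG=y        # build SLUB with its debugging support
CONFIG_SLUB_DEBUG_ON=y     # enable SLUB debugging by default at boot

# Alternative to CONFIG_SLUB_DEBUG_ON: enable it per boot on the kernel
# command line, e.g. sanity checks, red zoning, poisoning, user tracking:
#   slub_debug=FZPU
```

Both options slow the kernel down noticeably, so they are normally enabled only while chasing a suspected memory corruption such as a double free.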

2008-08-23 19:50:57

by Rafael J. Wysocki

Subject: [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11356
Subject : Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
Submitter : Frans Pop <[email protected]>
Date : 2008-08-16 19:11 (8 days old)
References : http://marc.info/?l=linux-kernel&m=121891396320127&w=4

2008-08-24 06:14:37

by Frans Pop

Subject: Re: [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'

On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11356
> Subject : Linux 2.6.27-rc3 - build failure: undefined reference to
> `.lockdep_count_forward_deps'
> Submitter : Frans Pop <[email protected]>
> Date : 2008-08-16 19:11 (8 days old)
> References : http://marc.info/?l=linux-kernel&m=121891396320127&w=4

Fixed as per: http://marc.info/?l=linux-kernel&m=121898767530602&w=4
Adrian mentioned that he'd closed the bug, but apparently not.

2008-08-23 19:51:45

by Rafael J. Wysocki

Subject: [Bug #11358] net: forcedeth call restore mac addr in nv_shutdown path

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11358
Subject : net: forcedeth call restore mac addr in nv_shutdown path
Submitter : Yinghai Lu <[email protected]>
Date : 2008-08-17 3:30 (7 days old)
References : http://marc.info/?l=linux-kernel&m=121894389018584&w=4
Handled-By : Yinghai Lu <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121894389018584&w=4

2008-08-24 18:04:26

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11410
> Subject : SLUB list_lock vs obj_hash.lock...
> Submitter : Daniel J Blueman <[email protected]>
> Date : 2008-08-22 21:48 (2 days old)
> References : http://marc.info/?l=linux-kernel&m=121944176609042&w=4

This one now has a suggested patch for Daniel to try from Vegard, but no
reply yet:

http://marc.info/?l=linux-kernel&m=121946972307110&w=4

Vegard, I think your patch is a bit odd, though. The result of your patch
is

- first loop:

	hlist_for_each_entry_safe(obj, node, tmp, &db->list, node) {
		hlist_del(&obj->node);
		hlist_add_head(&obj->node, &freelist);
	}

and quite frankly, I don't see what the difference between that and
something like a simple

	struct hlist_node *first = db->list.first;
	if (first) {
		db->list.first = NULL;
		first->pprev = &first;
	}

really is?

I dunno. We don't have list splicing ops for the hlist things.

Linus

2008-08-23 19:58:14

by Rafael J. Wysocki

Subject: [Bug #11413] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11413
Subject : get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()
Submitter : Mikael Pettersson <[email protected]>
Date : 2008-08-23 9:48 (1 days old)
References : http://marc.info/?l=linux-kernel&m=121948503224161&w=4
Handled-By : Ingo Molnar <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121950734922457&w=4

2008-08-23 19:48:47

by Rafael J. Wysocki

Subject: [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11334
Subject : myri10ge: use ioremap_wc: compilation failure on ARM
Submitter : Martin Michlmayr <[email protected]>
Date : 2008-08-10 11:25 (14 days old)
References : http://marc.info/?l=linux-netdev&m=121836771727632&w=2

2008-08-23 19:56:28

by Rafael J. Wysocki

Subject: [Bug #11405] 2.6.27-rc3 segfault on cold boot; not on warm boot.

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11405
Subject : 2.6.27-rc3 segfault on cold boot; not on warm boot.
Submitter : David Greaves <[email protected]>
Date : 2008-08-21 9:45 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121931198904777&w=4

2008-08-23 19:58:38

by Rafael J. Wysocki

Subject: [Bug #11414] Random crashes with 2.6.27-rc3 on PPC

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11414
Subject : Random crashes with 2.6.27-rc3 on PPC
Submitter : Michael Buesch <[email protected]>
Date : 2008-08-23 14:10 (1 days old)
References : http://marc.info/?l=linux-kernel&m=121950076812616&w=4

2008-08-23 19:49:47

by Rafael J. Wysocki

Subject: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11342
Subject : Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
Submitter : Alan D. Brunelle <[email protected]>
Date : 2008-08-13 23:03 (11 days old)
References : http://marc.info/?l=linux-kernel&m=121866876027629&w=4
Handled-By : Andrew Morton <[email protected]>

2008-08-23 19:49:12

by Rafael J. Wysocki

Subject: [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11343
Subject : SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
Submitter : Manny Maxwell <[email protected]>
Date : 2008-08-14 4:16 (10 days old)
References : http://marc.info/?l=linux-kernel&m=121868782917600&w=4

2008-08-23 19:42:12

by Rafael J. Wysocki

Subject: [Bug #11209] 2.6.27-rc1 process time accounting

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11209
Subject : 2.6.27-rc1 process time accounting
Submitter : Lukas Hejtmanek <[email protected]>
Date : 2008-07-31 10:43 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121750102917490&w=4
Handled-By : Peter Zijlstra <[email protected]>

2008-08-23 19:39:34

by Rafael J. Wysocki

Subject: [Bug #11141] no battery or DC status - Dell i1501

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11141
Subject : no battery or DC status - Dell i1501
Submitter : Gu Rui <[email protected]>
Date : 2008-07-21 19:43 (34 days old)
Handled-By : Zhao Yakui <[email protected]>

2008-08-23 20:15:50

by Arjan van de Ven

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>> The following bug entry is on the current list of known regressions
>> from 2.6.26. Please verify if it still should be listed and let me know
>> (either way).
>>
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11342
>> Subject : Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
>> Submitter : Alan D. Brunelle <[email protected]>
>> Date : 2008-08-13 23:03 (11 days old)
>> References : http://marc.info/?l=linux-kernel&m=121866876027629&w=4
>> Handled-By : Andrew Morton <[email protected]>
>
> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but
> then the call chain shows that there is no interrupt going on.
>
> Also, the bisection is senseless - there's a trivial change wrt
> "do_one_initcall()" that got merged, but everything else is trivial about
> lguest and has nothing to do with the whole CPU-init thing. But if it was
> that initcall one, then "git bisect" would have pointed to it, not the
> merge. And the merge itself had no conflicts or anything else going on..
>
> The fact that it came and went later also implies that it's probably just
> some timing-dependent thing or some subtle memory corruption, making the
> bisection result even less likely to be exact.
>
> But I'm adding Arjan and Rusty to the Cc, because that merge was taking
> Rusty's branch, and the "do_one_initcall()" is Arjan's commit. Since
> undoing that merge apparently does fix it, I'm wondering if something
> there just does end up triggering the problem.
>
> The do_one_initcall() thing _is_ in the path of sys_init_module(), so it
> _is_ at least somewhat relevant from an oops standpoint.
>
> One thing the "do_one_initcall()" thing does is to put more pressure on the
> stack due to that whole buffer for the printk's going on.

but it's 64 bit.. with 8KB stack and separate irq stacks. I'd be surprised if we blow that this easily.
the trace is a tad long with a long ACPI call chain.

Wonder what gcc is in use?
(newer ones tend to be a ton better... but maybe Alan is using a really old one)

2008-08-23 19:57:49

by Rafael J. Wysocki

Subject: [Bug #11409] build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init'

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11409
Subject : build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init'
Submitter : Toralf Förster <[email protected]>
Date : 2008-08-22 8:33 (2 days old)
References : http://marc.info/?l=linux-kernel&m=121939410214677&w=4
Handled-By : Alan Cox <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121943097320451&w=4

2008-08-23 19:45:51

by Rafael J. Wysocki

Subject: [Bug #11271] BUG: fealnx in 2.6.27-rc1

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11271
Subject : BUG: fealnx in 2.6.27-rc1
Submitter : Jaswinder Singh <[email protected]>
Date : 2008-08-05 14:58 (19 days old)
References : http://marc.info/?l=linux-netdev&m=121794762016830&w=4
http://lkml.org/lkml/2008/8/10/98
Handled-By : Francois Romieu <[email protected]>

2008-08-23 19:54:54

by Rafael J. Wysocki

Subject: [Bug #11404] BUG: in 2.6.23-rc3-git7 in do_cciss_intr

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11404
Subject : BUG: in 2.6.23-rc3-git7 in do_cciss_intr
Submitter : rdunlap <[email protected]>
Date : 2008-08-21 5:52 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121929819616273&w=4
http://marc.info/?l=linux-kernel&m=121932889105368&w=4
Handled-By : Miller, Mike (OS Dev) <[email protected]>
James Bottomley <[email protected]>

2008-08-23 19:51:27

by Rafael J. Wysocki

Subject: [Bug #11357] Can not boot up with zd1211rw USB-Wlan Stick

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11357
Subject : Can not boot up with zd1211rw USB-Wlan Stick
Submitter : uwe <[email protected]>
Date : 2008-08-16 14:17 (8 days old)

2008-08-23 19:47:47

by Rafael J. Wysocki

Subject: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject : tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter : Christoph Lameter <[email protected]>
Date : 2008-08-11 18:36 (13 days old)
References : http://marc.info/?l=linux-kernel&m=121847986119495&w=4

2008-08-23 19:52:56

by Rafael J. Wysocki

Subject: [Bug #11382] e1000e: 2.6.27-rc1 corrupts EEPROM/NVM

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11382
Subject : e1000e: 2.6.27-rc1 corrupts EEPROM/NVM
Submitter : David Vrabel <[email protected]>
Date : 2008-08-08 10:47 (16 days old)
References : http://marc.info/?l=linux-kernel&m=121819267211679&w=4

2008-08-23 19:47:28

by Rafael J. Wysocki

Subject: [Bug #11282] Please fix x86 defconfig regression

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11282
Subject : Please fix x86 defconfig regression
Submitter : Andi Kleen <[email protected]>
Date : 2008-08-07 20:46 (17 days old)
References : http://marc.info/?l=linux-kernel&m=121814188805666&w=4

2008-08-23 19:57:07

by Rafael J. Wysocki

Subject: [Bug #11410] SLUB list_lock vs obj_hash.lock...

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11410
Subject : SLUB list_lock vs obj_hash.lock...
Submitter : Daniel J Blueman <[email protected]>
Date : 2008-08-22 21:48 (2 days old)
References : http://marc.info/?l=linux-kernel&m=121944176609042&w=4

2008-08-23 20:11:23

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>
> The following bug entry is on the current list of known regressions
> from 2.6.26. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11342
> Subject : Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
> Submitter : Alan D. Brunelle <[email protected]>
> Date : 2008-08-13 23:03 (11 days old)
> References : http://marc.info/?l=linux-kernel&m=121866876027629&w=4
> Handled-By : Andrew Morton <[email protected]>

This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but
then the call chain shows that there is no interrupt going on.

Also, the bisection is senseless - there's a trivial change wrt
"do_one_initcall()" that got merged, but everything else is trivial about
lguest and has nothing to do with the whole CPU-init thing. But if it was
that initcall one, then "git bisect" would have pointed to it, not the
merge. And the merge itself had no conflicts or anything else going on..

The fact that it came and went later also implies that it's probably just
some timing-dependent thing or some subtle memory corruption, making the
bisection result even less likely to be exact.

But I'm adding Arjan and Rusty to the Cc, because that merge was taking
Rusty's branch, and the "do_one_initcall()" is Arjan's commit. Since
undoing that merge apparently does fix it, I'm wondering if something
there just does end up triggering the problem.

The do_one_initcall() thing _is_ in the path of sys_init_module(), so it
_is_ at least somewhat relevant from an oops standpoint.

One thing the "do_one_initcall()" thing does is to put more pressure on the
stack due to that whole buffer for the printk's going on.

Alan, can you try
- seeing how consistent it is with one kernel (ie boot a known-bad kernel
a few times just to see if it really is 100% consistent)
- try enabling 'initcall_debug' on the kernel command line, to (a) see
the new code actually do something and (b) see what it is actually
calling just before.

Hmm..

Linus

2008-08-23 19:43:18

by Rafael J. Wysocki

Subject: [Bug #11215] INFO: possible recursive locking detected ps2_command

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11215
Subject : INFO: possible recursive locking detected ps2_command
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-07-31 9:41 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121749737011637&w=4
Handled-By : Peter Zijlstra <[email protected]>

2008-08-23 19:43:40

by Rafael J. Wysocki

Subject: [Bug #11219] KVM modules break emergency reboot

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11219
Subject : KVM modules break emergency reboot
Submitter : Zdenek Kabelac <[email protected]>
Date : 2008-08-01 20:25 (23 days old)
References : http://marc.info/?l=linux-kernel&m=121762241105336&w=4

2008-08-23 19:56:45

by Rafael J. Wysocki

Subject: [Bug #11406] patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11406
Subject : patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug
Submitter : Jan Beulich <[email protected]>
Date : 2008-08-21 12:59 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121932366326572&w=4
Handled-By : Ingo Molnar <[email protected]>
             Robert Richter <[email protected]>

2008-08-23 19:41:26

by Rafael J. Wysocki

Subject: [Bug #11207] VolanoMark regression with 2.6.27-rc1

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11207
Subject : VolanoMark regression with 2.6.27-rc1
Submitter : Zhang, Yanmin <[email protected]>
Date : 2008-07-31 3:20 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121747464114335&w=4
Handled-By : Zhang, Yanmin <[email protected]>
             Peter Zijlstra <[email protected]>
             Dhaval Giani <[email protected]>
             Miao Xie <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121922991027344&w=4

2008-08-23 19:52:04

by Rafael J. Wysocki

Subject: [Bug #11360] mpc8xxx_wdt.c doesn't build modular

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11360
Subject : mpc8xxx_wdt.c doesn't build modular
Submitter : Dave Jones <[email protected]>
Date : 2008-08-17 08:07 (7 days old)
References : http://lkml.org/lkml/2008/8/12/465
Handled-By : Anton Vorontsov <[email protected]>
Patch : http://lkml.org/lkml/2008/8/13/344

2008-08-23 19:55:57

by Rafael J. Wysocki

Subject: [Bug #11403] 2.6.27-rc2 USB suspend regression

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11403
Subject : 2.6.27-rc2 USB suspend regression
Submitter : Jeremy Fitzhardinge <[email protected]>
Date : 2008-08-20 20:48 (4 days old)
References : http://marc.info/?l=linux-kernel&m=121926536103630&w=4
Handled-By : Alan Stern <[email protected]>

2008-08-23 19:48:07

by Rafael J. Wysocki

Subject: [Bug #11335] 2.6.27-rc2-git5 BUG: unable to handle kernel paging request

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11335
Subject : 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
Submitter : Randy Dunlap <[email protected]>
Date : 2008-08-12 4:18 (12 days old)
References : http://marc.info/?l=linux-kernel&m=121851477201960&w=4
             http://lkml.org/lkml/2008/8/16/274
Handled-By : Hugh Dickins <[email protected]>

2008-08-24 06:18:49

by Frans Pop

Subject: Re: [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop

On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11379
> Subject : char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
> Submitter : Frans Pop <[email protected]>
> Date : 2008-08-18 13:40 (6 days old)
> References : http://marc.info/?l=linux-kernel&m=121906698213329&w=4
> Handled-By : Bjorn Helgaas <[email protected]>

Fixed with:
commit 5e4c6564c95ce127beeefe75e15cd11c93487436
Author: Kay Sievers <[email protected]>
Date: Thu Aug 21 15:28:56 2008 +0200

pnp: fix "add acpi:* modalias entries"

2008-08-25 12:23:20

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Loading, please [ 6.482953] busybox used greatest stack depth: 4840 bytes left
wait...
[ 6.521876] all_generic_ide used greatest stack depth: 4784 bytes left
Begin: Loading essential drivers... ...
[ 6.625509] fuse init (API version 7.9)
[ 6.625509] modprobe used greatest stack depth: 1720 bytes left
[ 6.644854] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM CPU_TM2 1 MSFT 100000E)
[ 6.651489] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858
[ 6.655631] IP: [<ffffffff8025e302>] debug_mutex_add_waiter+0x32/0x80
[ 6.655631] PGD 21a0a4067 PUD 21a4bd067 PMD 0
[ 6.655631] Oops: 0002 [1] SMP
[ 6.655631] CPU 1
[ 6.655631] Modules linked in: processor(+) fan thermal_sys fuse
[ 6.655631] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #29
[ 6.655631] RIP: 0010:[<ffffffff8025e302>] [<ffffffff8025e302>] debug_mutex_add_waiter+0x32/0x80
[ 6.655631] RSP: 0018:ffff88021a4e7998 EFLAGS: 00010002
[ 6.655631] RAX: 0000000000000000 RBX: ffff88021a4e79d8 RCX: 0000000000000000
[ 6.655631] RDX: 0000000000000001 RSI: ffff88021a4e79d8 RDI: ffffffffa0091a60
[ 6.655631] RBP: ffff88021a4e79b8 R08: ffffffff811deff0 R09: ffff8800a6fdb000
[ 6.655631] R10: ffffffffa008f524 R11: 0000000000000000 R12: ffffffffa0091a60
[ 6.655631] R13: ffff88021a4e6000 R14: ffff88021a9c40a0 R15: ffffffffa0091a98
[ 6.655631] FS: 00007f233f11d6e0(0000) GS:ffff88022fc02a00(0000) knlGS:0000000000000000
[ 6.655631] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6.655631] CR2: 0000000000000858 CR3: 000000021a07e000 CR4: 00000000000006e0
[ 6.655631] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6.655631] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6.655631] Process modprobe (pid: 1259, threadinfo ffff88021a4e6000, task ffff88021a9c40a0)
[ 6.655631] Stack: 0000000000000000 ffffffffa0091a60 0000000000000246 ffffffffa008f524
[ 6.655631] ffff88021a4e7a38 ffffffff8049f596 ffffffffa008f524 ffffffffa0091a18
[ 6.655631] ffff88021a4e79d8 ffff88021a4e79d8 1111111111111111 1111111111111111
[ 6.655631] Call Trace:
[ 6.655631] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 6.655631] [<ffffffff8049f596>] mutex_lock_nested+0xa6/0x250
[ 6.655631] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 6.655631] [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[ 6.655631] [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
[ 6.655631] [<ffffffffa008fe43>] thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 6.655631] [<ffffffffa019b2a3>] acpi_processor_start+0x64b/0x774 [processor]
[ 6.655631] [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[ 6.655631] [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[ 6.655631] [<ffffffff803a7f5e>] acpi_start_single_object+0x2d/0x52
[ 6.655631] [<ffffffff803a9556>] acpi_device_probe+0x7e/0x92
[ 6.655631] [<ffffffff803dd3eb>] driver_probe_device+0x9b/0x1a0
[ 6.655631] [<ffffffff803dd576>] __driver_attach+0x86/0x90
[ 6.655631] [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[ 6.655631] [<ffffffff803dc93d>] bus_for_each_dev+0x5d/0x90
[ 6.655631] [<ffffffff803dd22c>] driver_attach+0x1c/0x20
[ 6.655631] [<ffffffff803dcf79>] bus_add_driver+0x1e9/0x260
[ 6.655631] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 6.655631] [<ffffffff803dd74f>] driver_register+0x5f/0x140
[ 6.655631] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 6.655631] [<ffffffff803a9866>] acpi_bus_register_driver+0x3e/0x40
[ 6.655631] [<ffffffffa0222094>] acpi_processor_init+0x94/0x107 [processor]
[ 6.655631] [<ffffffff80209040>] _stext+0x40/0x180
[ 6.655631] [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[ 6.655631] [<ffffffff802676c2>] sys_init_module+0x142/0x1dc0
[ 6.655631] [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[ 6.655631] [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[ 6.655631] [<ffffffff8020c34b>] system_call_fastpath+0x16/0x1b
[ 6.655631]
[ 6.655631]
[ 6.655631] Code: 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 8b 47 08 49 89 d5 49 89 fc 89 c2 25 ff ff 00 00 c1 ea 10 39 c2 74 1d 49 8b 4
[ 6.655631] RIP [<ffffffff8025e302>] debug_mutex_add_waiter+0x32/0x80
[ 6.655631] RSP <ffff88021a4e7998>
[ 6.655631] CR2: 0000000000000858
[ 6.655631] ---[ end trace 8bbd31df1403e48e ]---
[ 7.024992] modprobe used greatest stack depth: 408 bytes left
[ 7.030988] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7.031053] IP: [<ffffffff8023f39c>] do_exit+0x28c/0xa10
[ 7.031053] PGD 0
[ 7.031053] Oops: 0000 [2] SMP
[ 7.031053] CPU 1
[ 7.031053] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.031053] Pid: 1259, comm: modprobe Tainted: G D 2.6.27-rc3 #29
[ 7.031053] RIP: 0010:[<ffffffff8023f39c>] [<ffffffff8023f39c>] do_exit+0x28c/0xa10
[ 7.031053] RSP: 0018:ffff88021a4e77e8 EFLAGS: 00010246
[ 7.031053] RAX: 0000000000000000 RBX: 0000000000000198 RCX: 0000000000000000
[ 7.031053] RDX: 0000000000000000 RSI: ffffffff802740d0 RDI: 0000000000000000
[ 7.031053] RBP: ffff88021a4e7848 R08: 0000000000000001 R09: 0000000000000000
[ 7.031053] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88021a9c40a0
[ 7.031053] R13: 0000000000000009 R14: ffff88021a4e78e8 R15: ffff88021a18b8a0
[ 7.031053] FS: 0000000000000000(0000) GS:ffff88022fc02a00(0000) knlGS:0000000000000000
[ 7.031053] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7.031053] CR2: 0000000000000048 CR3: 0000000000201000 CR4: 00000000000006e0
[ 7.031053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.031053] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7.031053] Process modprobe (pid: 1259, threadinfo ffff88021a4e6000, task ffff88021a9c40a0)
[ 7.031053] Stack: 0000000000000001 0000000000000001 0000000000000040 0000000000000001
[ 7.031053] ffff88021a4e7828 ffffffff803c7d59 0000000000000092 0000000000000092
[ 7.031053] ffff88021a4e78e8 0000000000000009 ffff88021a4e78e8 ffff88021a18b8a0
[ 7.031053] Call Trace:
[ 7.031053] [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[ 7.031053] [<ffffffff804a1a57>] oops_end+0x87/0x90
[ 7.031053] [<ffffffff804a3d13>] do_page_fault+0x663/0x800
[ 7.031053] [<ffffffff804a162d>] error_exit+0x0/0x9a
[ 7.031053] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.031053] [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.031053] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.031053] [<ffffffff8049f596>] mutex_lock_nested+0xa6/0x250
[ 7.031053] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.031053] [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[ 7.031053] [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
[ 7.031053] [<ffffffffa008fe43>] thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.031053] [<ffffffffa019b2a3>] acpi_processor_start+0x64b/0x774 [processor]
[ 7.031053] [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.031053] [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[ 7.031053] [<ffffffff803a7f5e>] acpi_start_single_object+0x2d/0x52
[ 7.031053] [<ffffffff803a9556>] acpi_device_probe+0x7e/0x92
[ 7.031053] [<ffffffff803dd3eb>] driver_probe_device+0x9b/0x1a0
[ 7.031053] [<ffffffff803dd576>] __driver_attach+0x86/0x90
[ 7.031053] [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[ 7.031053] [<ffffffff803dc93d>] bus_for_each_dev+0x5d/0x90
[ 7.031053] [<ffffffff803dd22c>] driver_attach+0x1c/0x20
[ 7.031053] [<ffffffff803dcf79>] bus_add_driver+0x1e9/0x260
[ 7.031053] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.031053] [<ffffffff803dd74f>] driver_register+0x5f/0x140
[ 7.031053] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.031053] [<ffffffff803a9866>] acpi_bus_register_driver+0x3e/0x40
[ 7.031053] [<ffffffffa0222094>] acpi_processor_init+0x94/0x107 [processor]
[ 7.031053] [<ffffffff80209040>] _stext+0x40/0x180
[ 7.031053] [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[ 7.031053] [<ffffffff802676c2>] sys_init_module+0x142/0x1dc0
[ 7.031053] [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[ 7.031053] [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[ 7.031053] [<ffffffff8020c34b>] system_call_fastpath+0x16/0x1b
[ 7.031053]
[ 7.031053]
[ 7.031053] Code: e8 8a e3 0e 00 8b 45 b8 85 c0 74 16 49 8b 84 24 40 07 00 00 8b 80 4c 01 00 00 85 c0 0f 85 77 07 00 00 49 8b 44 24 08 48 8b 4
[ 7.031053] RIP [<ffffffff8023f39c>] do_exit+0x28c/0xa10
[ 7.031053] RSP <ffff88021a4e77e8>
[ 7.031053] CR2: 0000000000000048
[ 7.421063] ------------[ cut here ]------------
[ 7.424883] WARNING: at kernel/sched_fair.c:884 hrtick_start_fair+0x187/0x190()
[ 7.424883] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.424883] Pid: 1259, comm: modprobe Tainted: G D 2.6.27-rc3 #29
[ 7.424883]
[ 7.424883] Call Trace:
[ 7.424883] <IRQ> [<ffffffff8023baef>] warn_on_slowpath+0x5f/0x80
[ 7.424883] [<ffffffff8022d927>] hrtick_start_fair+0x187/0x190
[ 7.424883] [<ffffffff8022ec79>] enqueue_task_fair+0x49/0x250
[ 7.424883] [<ffffffff8022c290>] enqueue_task+0x50/0x60
[ 7.424883] [<ffffffff8022c2c3>] activate_task+0x23/0x40
[ 7.424883] [<ffffffff80231653>] try_to_wake_up+0x253/0x280
[ 7.424883] [<ffffffff8023168d>] default_wake_function+0xd/0x10
[ 7.424883] [<ffffffff802521d1>] autoremove_wake_function+0x11/0x40
[ 7.424883] [<ffffffff8022bd6a>] __wake_up_common+0x5a/0x90
[ 7.424883] [<ffffffff8022d373>] __wake_up+0x43/0x70
[ 7.424883] [<ffffffff8024e910>] ? delayed_work_timer_fn+0x0/0x40
[ 7.424883] [<ffffffff8024df68>] insert_work+0x48/0x50
[ 7.424883] [<ffffffff8024e8f1>] __queue_work+0x31/0x50
[ 7.424883] [<ffffffff8024e942>] delayed_work_timer_fn+0x32/0x40
[ 7.424883] [<ffffffff80245e5b>] run_timer_softirq+0x1bb/0x230
[ 7.424883] [<ffffffff80255afa>] ? ktime_get_ts+0x4a/0x60
[ 7.424883] [<ffffffff8024157a>] __do_softirq+0x7a/0xf0
[ 7.424883] [<ffffffff8025cd8e>] ? tick_program_event+0x3e/0x70
[ 7.424883] [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[ 7.424883] [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[ 7.424883] [<ffffffff802414f5>] irq_exit+0x85/0x90
[ 7.424883] [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[ 7.424883] [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[ 7.424883] <EOI> [<ffffffff804a1a1a>] ? oops_end+0x4a/0x90
[ 7.424883] [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[ 7.424883] [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[ 7.424883] [<ffffffff802740d0>] ? release_css_set_taskexit+0x0/0x10
[ 7.424883] [<ffffffff8023f39c>] ? do_exit+0x28c/0xa10
[ 7.424883] [<ffffffff8023f376>] ? do_exit+0x266/0xa10
[ 7.424883] [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[ 7.424883] [<ffffffff804a1a57>] ? oops_end+0x87/0x90
[ 7.424883] [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[ 7.424883] [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff8049f596>] ? mutex_lock_nested+0xa6/0x250
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.424883] [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[ 7.424883] [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.424883] [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[ 7.424883] [<ffffffff803a7f5e>] ? acpi_start_single_object+0x2d/0x52
[ 7.424883] [<ffffffff803a9556>] ? acpi_device_probe+0x7e/0x92
[ 7.424883] [<ffffffff803dd3eb>] ? driver_probe_device+0x9b/0x1a0
[ 7.424883] [<ffffffff803dd576>] ? __driver_attach+0x86/0x90
[ 7.424883] [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[ 7.424883] [<ffffffff803dc93d>] ? bus_for_each_dev+0x5d/0x90
[ 7.424883] [<ffffffff803dd22c>] ? driver_attach+0x1c/0x20
[ 7.424883] [<ffffffff803dcf79>] ? bus_add_driver+0x1e9/0x260
[ 7.424883] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.424883] [<ffffffff803dd74f>] ? driver_register+0x5f/0x140
[ 7.424883] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.424883] [<ffffffff803a9866>] ? acpi_bus_register_driver+0x3e/0x40
[ 7.424883] [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[ 7.424883] [<ffffffff80209040>] ? _stext+0x40/0x180
[ 7.424883] [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[ 7.424883] [<ffffffff802676c2>] ? sys_init_module+0x142/0x1dc0
[ 7.424883] [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[ 7.424883] [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[ 7.424883] [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[ 7.424883]
[ 7.424883] ---[ end trace 8bbd31df1403e48e ]---
[ 7.424883] ------------[ cut here ]------------
[ 7.424883] kernel BUG at kernel/sched.c:1155!
[ 7.424883] invalid opcode: 0000 [3] SMP
[ 7.424883] CPU 1
[ 7.424883] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.424883] Pid: 1259, comm: modprobe Tainted: G D W 2.6.27-rc3 #29
[ 7.424883] RIP: 0010:[<ffffffff8022cc2b>] [<ffffffff8022cc2b>] resched_task+0x6b/0x70
[ 7.424883] RSP: 0018:ffff88022f0abce0 EFLAGS: 00010046
[ 7.424883] RAX: 0000000000000709 RBX: 0000000004bfe971 RCX: ffff88021a4e6000
[ 7.424883] RDX: 0000000000000709 RSI: 0000000000000000 RDI: ffff88021a9c40a0
[ 7.424883] RBP: ffff88022f0abce0 R08: ffff88022f180038 R09: ffff88021a9c40d8
[ 7.424883] R10: ffffffff810c9e00 R11: 0000000000000000 R12: ffff8800a6fc9000
[ 7.424883] R13: ffffffff810c9e00 R14: ffff88021a9c40a0 R15: 0000000000000001
[ 7.424883] FS: 0000000000000000(0000) GS:ffff88022fc02a00(0000) knlGS:0000000000000000
[ 7.424883] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7.424883] CR2: 0000000000000048 CR3: 0000000000201000 CR4: 00000000000006e0
[ 7.424883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.424883] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7.424883] Process modprobe (pid: 1259, threadinfo ffff88021a4e6000, task ffff88021a9c40a0)
[ 7.424883] Stack: ffff88022f0abd20 ffffffff802387b3 0000000000000400 0000000000400000
[ 7.424883] ffff88022f180000 ffff8800280a4e00 0000000000000001 0000000000000003
[ 7.424883] ffff88022f0abd70 ffffffff802314bf 0000000100000001 0000000000000000
[ 7.424883] Call Trace:
[ 7.424883] <IRQ> [<ffffffff802387b3>] check_preempt_wakeup+0x133/0x1c0
[ 7.424883] [<ffffffff802314bf>] try_to_wake_up+0xbf/0x280
[ 7.424883] [<ffffffff8023168d>] default_wake_function+0xd/0x10
[ 7.424883] [<ffffffff802521d1>] autoremove_wake_function+0x11/0x40
[ 7.424883] [<ffffffff8022bd6a>] __wake_up_common+0x5a/0x90
[ 7.424883] [<ffffffff8022d373>] __wake_up+0x43/0x70
[ 7.424883] [<ffffffff8024e910>] ? delayed_work_timer_fn+0x0/0x40
[ 7.424883] [<ffffffff8024df68>] insert_work+0x48/0x50
[ 7.424883] [<ffffffff8024e8f1>] __queue_work+0x31/0x50
[ 7.424883] [<ffffffff8024e942>] delayed_work_timer_fn+0x32/0x40
[ 7.424883] [<ffffffff80245e5b>] run_timer_softirq+0x1bb/0x230
[ 7.424883] [<ffffffff80255afa>] ? ktime_get_ts+0x4a/0x60
[ 7.424883] [<ffffffff8024157a>] __do_softirq+0x7a/0xf0
[ 7.424883] [<ffffffff8025cd8e>] ? tick_program_event+0x3e/0x70
[ 7.424883] [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[ 7.424883] [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[ 7.424883] [<ffffffff802414f5>] irq_exit+0x85/0x90
[ 7.424883] [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[ 7.424883] [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[ 7.424883] <EOI> [<ffffffff804a1a1a>] ? oops_end+0x4a/0x90
[ 7.424883] [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[ 7.424883] [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[ 7.424883] [<ffffffff802740d0>] ? release_css_set_taskexit+0x0/0x10
[ 7.424883] [<ffffffff8023f39c>] ? do_exit+0x28c/0xa10
[ 7.424883] [<ffffffff8023f376>] ? do_exit+0x266/0xa10
[ 7.424883] [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[ 7.424883] [<ffffffff804a1a57>] ? oops_end+0x87/0x90
[ 7.424883] [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[ 7.424883] [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff8049f596>] ? mutex_lock_nested+0xa6/0x250
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.424883] [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[ 7.424883] [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.424883] [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[ 7.424883] [<ffffffff803a7f5e>] ? acpi_start_single_object+0x2d/0x52
[ 7.424883] [<ffffffff803a9556>] ? acpi_device_probe+0x7e/0x92
[ 7.424883] [<ffffffff803dd3eb>] ? driver_probe_device+0x9b/0x1a0
[ 7.424883] [<ffffffff803dd576>] ? __driver_attach+0x86/0x90
[ 7.424883] [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[ 7.424883] [<ffffffff803dc93d>] ? bus_for_each_dev+0x5d/0x90
[ 7.424883] [<ffffffff803dd22c>] ? driver_attach+0x1c/0x20
[ 7.424883] [<ffffffff803dcf79>] ? bus_add_driver+0x1e9/0x260
[ 7.424883] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.424883] [<ffffffff803dd74f>] ? driver_register+0x5f/0x140
[ 7.424883] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.424883] [<ffffffff803a9866>] ? acpi_bus_register_driver+0x3e/0x40
[ 7.424883] [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[ 7.424883] [<ffffffff80209040>] ? _stext+0x40/0x180
[ 7.424883] [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[ 7.424883] [<ffffffff802676c2>] ? sys_init_module+0x142/0x1dc0
[ 7.424883] [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[ 7.424883] [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[ 7.424883] [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[ 7.424883]
[ 7.424883]
[ 7.424883] Code: 8b 47 08 8b 50 1c 65 8b 04 25 24 00 00 00 39 c2 74 0d 0f ae f0 48 8b 47 08 f6 40 18 04 74 02 c9 c3 89 d7 ff 15 1f 7b 3c 00 c
[ 7.424883] RIP [<ffffffff8022cc2b>] resched_task+0x6b/0x70
[ 7.424883] RSP <ffff88022f0abce0>
[ 7.424883] ---[ end trace 8bbd31df1403e48e ]---
[ 7.424883] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 7.424883] ------------[ cut here ]------------
[ 7.424883] WARNING: at kernel/smp.c:328 smp_call_function_mask+0x25a/0x260()
[ 7.424883] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.424883] Pid: 1259, comm: modprobe Tainted: G D W 2.6.27-rc3 #29
[ 7.424883]
[ 7.424883] Call Trace:
[ 7.424883] <IRQ> [<ffffffff8023baef>] warn_on_slowpath+0x5f/0x80
[ 7.424883] [<ffffffff8026514a>] smp_call_function_mask+0x25a/0x260
[ 7.424883] [<ffffffff803695bd>] ? string+0x3d/0xd0
[ 7.424883] [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[ 7.424883] [<ffffffff803695bd>] ? string+0x3d/0xd0
[ 7.424883] [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[ 7.424883] [<ffffffff803695bd>] ? string+0x3d/0xd0
[ 7.424883] [<ffffffff803695bd>] ? string+0x3d/0xd0
[ 7.424883] [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[ 7.424883] [<ffffffff80368d5e>] ? number+0x2ae/0x2d0
[ 7.424883] [<ffffffff80368d5e>] ? number+0x2ae/0x2d0
[ 7.424883] [<ffffffff80269ecd>] ? kallsyms_lookup+0x5d/0xa0
[ 7.424883] [<ffffffff80368d5e>] ? number+0x2ae/0x2d0
[ 7.424883] [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[ 7.424883] [<ffffffff80369dd8>] ? sprintf+0x68/0x70
[ 7.424883] [<ffffffff803695bd>] ? string+0x3d/0xd0
[ 7.424883] [<ffffffff804a3fa3>] ? __atomic_notifier_call_chain+0x83/0xa0
[ 7.424883] [<ffffffff804a3f20>] ? __atomic_notifier_call_chain+0x0/0xa0
[ 7.424883] [<ffffffff804a0ef6>] ? _spin_unlock+0x26/0x30
[ 7.424883] [<ffffffff8021c470>] ? stop_this_cpu+0x0/0x30
[ 7.424883] [<ffffffff80265190>] smp_call_function+0x40/0x50
[ 7.424883] [<ffffffff8021c4f3>] native_smp_send_stop+0x23/0x40
[ 7.424883] [<ffffffff8023be3f>] panic+0xaf/0x190
[ 7.424883] [<ffffffff8023cc97>] ? printk+0x67/0x70
[ 7.424883] [<ffffffff8049f4e9>] ? mutex_unlock+0x9/0x10
[ 7.424883] [<ffffffff80256d11>] ? blocking_notifier_call_chain+0x11/0x20
[ 7.424883] [<ffffffff8023f979>] do_exit+0x869/0xa10
[ 7.424883] [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[ 7.424883] [<ffffffff804a1a57>] oops_end+0x87/0x90
[ 7.424883] [<ffffffff8020e08e>] die+0x5e/0x90
[ 7.424883] [<ffffffff804a1f60>] do_trap+0x130/0x150
[ 7.424883] [<ffffffff8020e662>] do_invalid_op+0x92/0xb0
[ 7.424883] [<ffffffff8022cc2b>] ? resched_task+0x6b/0x70
[ 7.424883] [<ffffffff804a162d>] error_exit+0x0/0x9a
[ 7.424883] [<ffffffff8022cc2b>] ? resched_task+0x6b/0x70
[ 7.424883] [<ffffffff802387b3>] check_preempt_wakeup+0x133/0x1c0
[ 7.424883] [<ffffffff802314bf>] try_to_wake_up+0xbf/0x280
[ 7.424883] [<ffffffff8023168d>] default_wake_function+0xd/0x10
[ 7.424883] [<ffffffff802521d1>] autoremove_wake_function+0x11/0x40
[ 7.424883] [<ffffffff8022bd6a>] __wake_up_common+0x5a/0x90
[ 7.424883] [<ffffffff8022d373>] __wake_up+0x43/0x70
[ 7.424883] [<ffffffff8024e910>] ? delayed_work_timer_fn+0x0/0x40
[ 7.424883] [<ffffffff8024df68>] insert_work+0x48/0x50
[ 7.424883] [<ffffffff8024e8f1>] __queue_work+0x31/0x50
[ 7.424883] [<ffffffff8024e942>] delayed_work_timer_fn+0x32/0x40
[ 7.424883] [<ffffffff80245e5b>] run_timer_softirq+0x1bb/0x230
[ 7.424883] [<ffffffff80255afa>] ? ktime_get_ts+0x4a/0x60
[ 7.424883] [<ffffffff8024157a>] __do_softirq+0x7a/0xf0
[ 7.424883] [<ffffffff8025cd8e>] ? tick_program_event+0x3e/0x70
[ 7.424883] [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[ 7.424883] [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[ 7.424883] [<ffffffff802414f5>] irq_exit+0x85/0x90
[ 7.424883] [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[ 7.424883] [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[ 7.424883] <EOI> [<ffffffff804a1a1a>] ? oops_end+0x4a/0x90
[ 7.424883] [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[ 7.424883] [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[ 7.424883] [<ffffffff802740d0>] ? release_css_set_taskexit+0x0/0x10
[ 7.424883] [<ffffffff8023f39c>] ? do_exit+0x28c/0xa10
[ 7.424883] [<ffffffff8023f376>] ? do_exit+0x266/0xa10
[ 7.424883] [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[ 7.424883] [<ffffffff804a1a57>] ? oops_end+0x87/0x90
[ 7.424883] [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[ 7.424883] [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff8049f596>] ? mutex_lock_nested+0xa6/0x250
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[ 7.424883] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.424883] [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.424883] [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[ 7.424883] [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.424883] [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[ 7.424883] [<ffffffff803a7f5e>] ? acpi_start_single_object+0x2d/0x52
[ 7.424883] [<ffffffff803a9556>] ? acpi_device_probe+0x7e/0x92
[ 7.424883] [<ffffffff803dd3eb>] ? driver_probe_device+0x9b/0x1a0
[ 7.424883] [<ffffffff803dd576>] ? __driver_attach+0x86/0x90
[ 7.424883] [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[ 7.424883] [<ffffffff803dc93d>] ? bus_for_each_dev+0x5d/0x90
[ 7.424883] [<ffffffff803dd22c>] ? driver_attach+0x1c/0x20
[ 7.424883] [<ffffffff803dcf79>] ? bus_add_driver+0x1e9/0x260
[ 7.424883] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.424883] [<ffffffff803dd74f>] ? driver_register+0x5f/0x140
[ 7.424883] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.424883] [<ffffffff803a9866>] ? acpi_bus_register_driver+0x3e/0x40
[ 7.424883] [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[ 7.424883] [<ffffffff80209040>] ? _stext+0x40/0x180
[ 7.424883] [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[ 7.424883] [<ffffffff802676c2>] ? sys_init_module+0x142/0x1dc0
[ 7.424883] [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[ 7.424883] [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[ 7.424883] [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[ 7.424883]
[ 7.424883] ---[ end trace 8bbd31df1403e48e ]---


Attachments:
prob3.txt (25.29 kB)
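
The "used greatest stack depth" lines in the log above are relevant to the stack-pressure question raised earlier in the thread: the reported free stack shrinks from 4840 bytes to 408 bytes around the first oops. A minimal sketch for pulling those lines out of a saved console log follows. It assumes the log is available as plain text; the function name `stack_depth_reports` and the `SAMPLE` excerpt are illustrative only and are not part of any kernel tooling.

```python
import re

# Matches console lines such as:
#   [ 6.625509] modprobe used greatest stack depth: 1720 bytes left
# Other console output may be fused in front of the timestamp, so the
# pattern is searched anywhere in the text rather than anchored to line start.
STACK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+) used greatest stack depth: "
    r"(?P<left>\d+) bytes left"
)


def stack_depth_reports(log_text):
    """Return (timestamp, command, bytes_left) tuples in log order."""
    return [
        (float(m.group("ts")), m.group("comm"), int(m.group("left")))
        for m in STACK_RE.finditer(log_text)
    ]


# Excerpt copied from the log above (hypothetical variable name).
SAMPLE = """\
Loading, please [ 6.482953] busybox used greatest stack depth: 4840 bytes left
wait...
[ 6.521876] all_generic_ide used greatest stack depth: 4784 bytes left
[ 6.625509] modprobe used greatest stack depth: 1720 bytes left
[ 7.024992] modprobe used greatest stack depth: 408 bytes left
"""

if __name__ == "__main__":
    for ts, comm, left in stack_depth_reports(SAMPLE):
        print(f"{ts:10.6f}  {comm:<16} {left:>5} bytes left")
```

Running the same function over logs from a known-good and a known-bad kernel would give a rough comparison of how close modprobe gets to the end of its stack in each case. It does not by itself show that a stack overflow is the cause.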

2008-08-24 19:23:57

by Adrian Bunk

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Sun, Aug 24, 2008 at 12:03:37PM -0700, Linus Torvalds wrote:
>
>
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11356
> > Subject : Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
> > Submitter : Frans Pop <[email protected]>
> > Date : 2008-08-16 19:11 (8 days old)
> > References : http://marc.info/?l=linux-kernel&m=121891396320127&w=4
>
> Hmm. Wasn't this already confirmed to be fixed by commit
> df60a8441866153d691ae69b77934904c2de5e0d?
>
> At least Adrian sent out an email saying "Confirmed, bug closed.", but
> bugzilla seems to disagree and still show it as open.

There were two different reports; Rafael opened a bug for each, and I
missed that there were two open bugs for the same issue.

The one I closed was #11344.

I've now closed #11356 as a duplicate of #11344.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-23 19:54:38

by Rafael J. Wysocki

Subject: [Bug #11401] pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11401
Subject : pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
Submitter : Laurent Riffard <[email protected]>
Date : 2008-08-22 08:16 (2 days old)

2008-08-23 19:52:23

by Rafael J. Wysocki

Subject: [Bug #11361] my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11361
Subject : my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
Submitter : Yinghai Lu <[email protected]>
Date : 2008-08-17 6:25 (7 days old)
References : http://marc.info/?l=linux-kernel&m=121895439927053&w=4
Handled-By : Rafael J. Wysocki <[email protected]>
Patch : http://marc.info/?l=linux-kernel&m=121917167232014&w=4

2008-08-23 19:57:30

by Rafael J. Wysocki

Subject: [Bug #11407] suspend: unable to handle kernel paging request

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11407
Subject : suspend: unable to handle kernel paging request
Submitter : Vegard Nossum <[email protected]>
Date : 2008-08-21 17:28 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121933974928881&w=4
Handled-By : Rafael J. Wysocki <[email protected]>
             Pekka Enberg <[email protected]>
             Pavel Machek <[email protected]>

2008-08-23 19:46:30

by Rafael J. Wysocki

Subject: [Bug #11237] corrupt PMD after resume

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11237
Subject : corrupt PMD after resume
Submitter : Alan Jenkins <[email protected]>
Date : 2008-08-02 9:51 (22 days old)
References : http://marc.info/?l=linux-kernel&m=121767073424952&w=4
Handled-By : Hugh Dickins <[email protected]>
             Jeremy Fitzhardinge <[email protected]>

2008-08-24 18:58:51

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Sun, 24 Aug 2008, Vegard Nossum wrote:
>
> I haven't really used the hlists before, so my first instinct was to
> do what is obvious.

I do agree that the hlist versions aren't very nice in this regard. The
regular lists are much better at moving lists around.

> Other than that, I guess open-coding list ops is also not very good
> programming practice? :-)

Agreed. It would be better if the people who use hlists most (I think that
would be networking) would think about this.

> But... feel free to submit your own patch. Oh, what am I saying.

Silly boy. Next you'll ask me to _test_ any patches I send out.

Anyway, I think your patch is likely fine, I just thought it looked a bit
odd to have a loop to move a list from one head pointer to another.

But regardless, it would need some testing. Daniel?

Linus
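
The remark above about a loop being an odd way to move a list can be illustrated with a minimal userspace sketch. It assumes the usual hlist layout (a head holding only a `first` pointer, and each node holding `next` plus a `pprev` back-pointer) and redefines those types locally so that it compiles on its own. The `sketch_`-prefixed helpers are hypothetical names chosen for this illustration; they are not taken from the patch under discussion or from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the hlist types, redefined so the sketch is
 * self-contained. Illustration only. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;	/* address of the pointer that points at this node */
};

struct hlist_head {
	struct hlist_node *first;
};

/* Hypothetical helper: push a node onto the front of a list. */
static void sketch_hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical helper: move the whole chain from 'from' to 'to' without
 * walking it. Only the first node refers back to the head (through
 * pprev), so one back-pointer fix-up is enough. Whatever 'to' held
 * before is overwritten, so 'to' is assumed to be empty.
 */
static void sketch_hlist_move_all(struct hlist_head *from, struct hlist_head *to)
{
	assert(from != to);

	to->first = from->first;
	if (to->first)
		to->first->pprev = &to->first;
	from->first = NULL;
}

/* Hypothetical helper: count the nodes on a list. */
static size_t sketch_hlist_count(const struct hlist_head *h)
{
	size_t n = 0;

	for (const struct hlist_node *p = h->first; p; p = p->next)
		n++;
	return n;
}
```

Compared with unlinking and re-adding each node in a loop, the move is constant-time and keeps the node order unchanged, which matters when it is done with a lock held.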

2008-08-23 19:41:43

by Rafael J. Wysocki

Subject: [Bug #11191] 2.6.26-git8: spinlock lockup in c1e_idle()

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11191
Subject : 2.6.26-git8: spinlock lockup in c1e_idle()
Submitter : Mikhail Kshevetskiy <[email protected]>
Date : 2008-07-24 03:22 (31 days old)
References : http://lkml.org/lkml/2008/7/23/317
Handled-By : Thomas Gleixner <[email protected]>

2008-08-23 19:46:46

by Rafael J. Wysocki

Subject: [Bug #11276] build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11276
Subject : build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things
Submitter : Randy Dunlap <[email protected]>
Date : 2008-08-06 17:18 (18 days old)
References : http://marc.info/?l=linux-kernel&m=121804329014332&w=4
             http://lkml.org/lkml/2008/7/22/353
Handled-By : Bjorn Helgaas <[email protected]>
Patch : http://lkml.org/lkml/2008/7/22/364

2008-08-23 19:54:13

by Rafael J. Wysocki

Subject: [Bug #11398] hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11398
Subject : hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Submitter : Frans Pop <[email protected]>
Date : 2008-08-21 17:17 (3 days old)

2008-08-23 19:44:25

by Rafael J. Wysocki

Subject: [Bug #11230] Kconfig no longer outputs a .config with freshly updated defconfigs

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11230
Subject : Kconfig no longer outputs a .config with freshly updated defconfigs
Submitter : Josh Boyer <[email protected]>
Date : 2008-08-02 16:03 (22 days old)
References : http://marc.info/?l=linux-kernel&m=121769306319391&w=4

2008-08-23 19:55:39

by Rafael J. Wysocki

Subject: [Bug #11402] skbuff bug?

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11402
Subject : skbuff bug?
Submitter : Yinghai Lu <[email protected]>
Date : 2008-08-21 3:56 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121929102707658&w=4

2008-08-23 22:26:42

by Jeff Garzik

Subject: Re: [Bug #11271] BUG: fealnx in 2.6.27-rc1

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.26. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11271
> Subject : BUG: fealnx in 2.6.27-rc1
> Submitter : Jaswinder Singh <[email protected]>
> Date : 2008-08-05 14:58 (19 days old)
> References : http://marc.info/?l=linux-netdev&m=121794762016830&w=4
>              http://lkml.org/lkml/2008/8/10/98
> Handled-By : Francois Romieu <[email protected]>


Jaswinder, does reverting 28cd4289abc2c8db90344ee4ff064a9bdf086fdf help?

That's the only material change to fealnx itself in years.

If not, any chance you could bisect this problem, and add more info to
the bug?

2008-08-23 19:42:48

by Rafael J. Wysocki

Subject: [Bug #11210] libata badness

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26. Please verify if it still should be listed and let me know
(either way).


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11210
Subject : libata badness
Submitter : Kumar Gala <[email protected]>
Date : 2008-07-31 18:53 (24 days old)
References : http://marc.info/?l=linux-ide&m=121753059307310&w=4
Handled-By : Ben Dooks <[email protected]>

2008-08-25 12:44:49

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

[ 6.551876] all_generic_ide used greatest stack depth: 4784 bytes left
Begin: Loading essential drivers... ...
[ 6.658003] fuse init (API version 7.9)
[ 6.661876] modprobe used greatest stack depth: 1720 bytes left
[ 6.683510] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM CPU_TM2 1 MSFT 100000E)
[ 6.690632] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858
[ 6.690632] IP: [<ffffffff8025e512>] debug_mutex_add_waiter+0x32/0x80
[ 6.690632] PGD 21a145067 PUD 22f13a067 PMD 0
[ 6.690632] Oops: 0002 [1] SMP DEBUG_PAGEALLOC
[ 6.690632] CPU 1
[ 6.690632] Modules linked in: processor(+) fan thermal_sys fuse
[ 6.690632] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #30
[ 6.690632] RIP: 0010:[<ffffffff8025e512>] [<ffffffff8025e512>] debug_mutex_add_waiter+0x32/0x80
[ 6.690632] RSP: 0018:ffff88021a959998 EFLAGS: 00010002
[ 6.690632] RAX: 0000000000000000 RBX: ffff88021a9599d8 RCX: 0000000000000000
[ 6.690632] RDX: 0000000000000001 RSI: ffff88021a9599d8 RDI: ffffffffa0091a60
[ 6.690632] RBP: ffff88021a9599b8 R08: ffffffff811deff0 R09: ffff8800a6fdb000
[ 6.690632] R10: ffffffffa008f524 R11: 0000000000000000 R12: ffffffffa0091a60
[ 6.690632] R13: ffff88021a958000 R14: ffff88021a1c2050 R15: ffffffffa0091a98
[ 6.690632] FS: 00007f28063c16e0(0000) GS:ffff88022fc81a00(0000) knlGS:0000000000000000
[ 6.690632] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6.690632] CR2: 0000000000000858 CR3: 0000000219c64000 CR4: 00000000000006e0
[ 6.690632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6.690632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6.690632] Process modprobe (pid: 1259, threadinfo ffff88021a958000, task ffff88021a1c2050)
[ 6.690632] Stack: 0000000000000000 ffffffffa0091a60 0000000000000246 ffffffffa008f524
[ 6.690632] ffff88021a959a38 ffffffff8049f856 ffffffffa008f524 ffffffffa0091a18
[ 6.690632] ffff88021a9599d8 ffff88021a9599d8 1111111111111111 1111111111111111
[ 6.690632] Call Trace:
[ 6.690632] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 6.690632] [<ffffffff8049f856>] mutex_lock_nested+0xa6/0x250
[ 6.690632] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 6.690632] [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[ 6.690632] [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
[ 6.690632] [<ffffffffa008fe43>] thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 6.690632] [<ffffffffa019b2a3>] acpi_processor_start+0x64b/0x774 [processor]
[ 6.690632] [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[ 6.690632] [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[ 6.690632] [<ffffffff803a821e>] acpi_start_single_object+0x2d/0x52
[ 6.690632] [<ffffffff803a9816>] acpi_device_probe+0x7e/0x92
[ 6.690632] [<ffffffff803dd6ab>] driver_probe_device+0x9b/0x1a0
[ 6.690632] [<ffffffff803dd836>] __driver_attach+0x86/0x90
[ 6.690632] [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[ 6.690632] [<ffffffff803dcbfd>] bus_for_each_dev+0x5d/0x90
[ 6.690632] [<ffffffff803dd4ec>] driver_attach+0x1c/0x20
[ 6.690632] [<ffffffff803dd239>] bus_add_driver+0x1e9/0x260
[ 6.690632] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 6.690632] [<ffffffff803dda0f>] driver_register+0x5f/0x140
[ 6.690632] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 6.690632] [<ffffffff803a9b26>] acpi_bus_register_driver+0x3e/0x40
[ 6.690632] [<ffffffffa0222094>] acpi_processor_init+0x94/0x107 [processor]
[ 6.690632] [<ffffffff80209040>] _stext+0x40/0x180
[ 6.690632] [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[ 6.690632] [<ffffffff802678d2>] sys_init_module+0x142/0x1dc0
[ 6.690632] [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[ 6.690632] [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[ 6.690632] [<ffffffff8020c34b>] system_call_fastpath+0x16/0x1b
[ 6.690632]
[ 6.690632]
[ 6.690632] Code: 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 8b 47 08 49 89 d5 49 89 fc 89 c2 25 ff ff 00 00 c1 ea 10 39 c2 74 1d 49 8b 4
[ 6.690632] RIP [<ffffffff8025e512>] debug_mutex_add_waiter+0x32/0x80
[ 6.690632] RSP <ffff88021a959998>
[ 6.690632] CR2: 0000000000000858
[ 6.690632] ---[ end trace 62c38812ae35bad0 ]---
[ 7.060556] ------------[ cut here ]------------
[ 7.060741] WARNING: at kernel/sched_fair.c:884 hrtick_start_fair+0x187/0x190()
[ 7.060741] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.060741] Pid: 1259, comm: modprobe Tainted: G D 2.6.27-rc3 #30
[ 7.060741]
[ 7.060741] Call Trace:
[ 7.060741] <IRQ> [<ffffffff8023bcff>] warn_on_slowpath+0x5f/0x80
[ 7.060741] [<ffffffff8022db37>] hrtick_start_fair+0x187/0x190
[ 7.060741] [<ffffffff8022ee89>] enqueue_task_fair+0x49/0x250
[ 7.060741] [<ffffffff8022c4a0>] enqueue_task+0x50/0x60
[ 7.060741] [<ffffffff8022c4d3>] activate_task+0x23/0x40
[ 7.060741] [<ffffffff80231863>] try_to_wake_up+0x253/0x280
[ 7.060741] [<ffffffff8023189d>] default_wake_function+0xd/0x10
[ 7.060741] [<ffffffff802523e1>] autoremove_wake_function+0x11/0x40
[ 7.060741] [<ffffffff8022bf7a>] __wake_up_common+0x5a/0x90
[ 7.060741] [<ffffffff8022d583>] __wake_up+0x43/0x70
[ 7.060741] [<ffffffff8024eb20>] ? delayed_work_timer_fn+0x0/0x40
[ 7.060741] [<ffffffff8024e178>] insert_work+0x48/0x50
[ 7.060741] [<ffffffff8024eb01>] __queue_work+0x31/0x50
[ 7.060741] [<ffffffff8024eb52>] delayed_work_timer_fn+0x32/0x40
[ 7.060741] [<ffffffff8024606b>] run_timer_softirq+0x1bb/0x230
[ 7.060741] [<ffffffff80255d0a>] ? ktime_get_ts+0x4a/0x60
[ 7.060741] [<ffffffff8024178a>] __do_softirq+0x7a/0xf0
[ 7.060741] [<ffffffff8025cf9e>] ? tick_program_event+0x3e/0x70
[ 7.060741] [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[ 7.060741] [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[ 7.060741] [<ffffffff80241705>] irq_exit+0x85/0x90
[ 7.060741] [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[ 7.060741] [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[ 7.060741] <EOI> [<ffffffff804a15eb>] ? _spin_unlock_irq+0x2b/0x30
[ 7.060741] [<ffffffff804a0e75>] ? __down_read+0xa5/0xb7
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060741] [<ffffffff8049fe57>] ? down_read+0x37/0x40
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060741] [<ffffffff8023f4ad>] ? do_exit+0x18d/0xa10
[ 7.060741] [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[ 7.060741] [<ffffffff804a1d17>] ? oops_end+0x87/0x90
[ 7.060741] [<ffffffff804a3fe3>] ? do_page_fault+0x663/0x800
[ 7.060741] [<ffffffff804a18ed>] ? error_exit+0x0/0x9a
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffff8025e512>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffff8049f856>] ? mutex_lock_nested+0xa6/0x250
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.060741] [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[ 7.060741] [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.060741] [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[ 7.060741] [<ffffffff803a821e>] ? acpi_start_single_object+0x2d/0x52
[ 7.060741] [<ffffffff803a9816>] ? acpi_device_probe+0x7e/0x92
[ 7.060741] [<ffffffff803dd6ab>] ? driver_probe_device+0x9b/0x1a0
[ 7.060741] [<ffffffff803dd836>] ? __driver_attach+0x86/0x90
[ 7.060741] [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[ 7.060741] [<ffffffff803dcbfd>] ? bus_for_each_dev+0x5d/0x90
[ 7.060741] [<ffffffff803dd4ec>] ? driver_attach+0x1c/0x20
[ 7.060741] [<ffffffff803dd239>] ? bus_add_driver+0x1e9/0x260
[ 7.060741] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.060741] [<ffffffff803dda0f>] ? driver_register+0x5f/0x140
[ 7.060741] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.060741] [<ffffffff803a9b26>] ? acpi_bus_register_driver+0x3e/0x40
[ 7.060741] [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[ 7.060741] [<ffffffff80209040>] ? _stext+0x40/0x180
[ 7.060741] [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[ 7.060741] [<ffffffff802678d2>] ? sys_init_module+0x142/0x1dc0
[ 7.060741] [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[ 7.060741] [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[ 7.060741] [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[ 7.060741]
[ 7.060741] ---[ end trace 62c38812ae35bad0 ]---
[ 7.060741] ------------[ cut here ]------------
[ 7.060741] kernel BUG at kernel/sched.c:1155!
[ 7.060741] invalid opcode: 0000 [2] SMP DEBUG_PAGEALLOC
[ 7.060741] CPU 1
[ 7.060741] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.060741] Pid: 1259, comm: modprobe Tainted: G D W 2.6.27-rc3 #30
[ 7.060741] RIP: 0010:[<ffffffff8022ce3b>] [<ffffffff8022ce3b>] resched_task+0x6b/0x70
[ 7.060741] RSP: 0018:ffff88022f12bce0 EFLAGS: 00010046
[ 7.060741] RAX: 00000000000006e5 RBX: 00000000012c627a RCX: ffff88021a958000
[ 7.060741] RDX: 00000000000006e5 RSI: 0000000000000000 RDI: ffff88021a1c2050
[ 7.060741] RBP: ffff88022f12bce0 R08: ffff88022f1d8038 R09: ffff88021a1c2088
[ 7.060741] R10: ffffffff810c9e00 R11: 0000000000000000 R12: ffff8800a6fc9000
[ 7.060741] R13: ffffffff810c9e00 R14: ffff88021a1c2050 R15: 0000000000000001
[ 7.060741] FS: 00007f28063c16e0(0000) GS:ffff88022fc81a00(0000) knlGS:0000000000000000
[ 7.060741] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7.060741] CR2: 0000000000000858 CR3: 0000000219c64000 CR4: 00000000000006e0
[ 7.060741] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.060741] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7.060741] Process modprobe (pid: 1259, threadinfo ffff88021a958000, task ffff88021a1c2050)
[ 7.060741] Stack: ffff88022f12bd20 ffffffff802389c3 0000000000000400 0000000000400000
[ 7.060741] ffff88022f1d8000 ffff8800280a4e00 0000000000000001 0000000000000003
[ 7.060741] ffff88022f12bd70 ffffffff802316cf 0000000100000001 0000000000000000
[ 7.060741] Call Trace:
[ 7.060741] <IRQ> [<ffffffff802389c3>] check_preempt_wakeup+0x133/0x1c0
[ 7.060741] [<ffffffff802316cf>] try_to_wake_up+0xbf/0x280
[ 7.060741] [<ffffffff8023189d>] default_wake_function+0xd/0x10
[ 7.060741] [<ffffffff802523e1>] autoremove_wake_function+0x11/0x40
[ 7.060741] [<ffffffff8022bf7a>] __wake_up_common+0x5a/0x90
[ 7.060741] [<ffffffff8022d583>] __wake_up+0x43/0x70
[ 7.060741] [<ffffffff8024eb20>] ? delayed_work_timer_fn+0x0/0x40
[ 7.060741] [<ffffffff8024e178>] insert_work+0x48/0x50
[ 7.060741] [<ffffffff8024eb01>] __queue_work+0x31/0x50
[ 7.060741] [<ffffffff8024eb52>] delayed_work_timer_fn+0x32/0x40
[ 7.060741] [<ffffffff8024606b>] run_timer_softirq+0x1bb/0x230
[ 7.060741] [<ffffffff80255d0a>] ? ktime_get_ts+0x4a/0x60
[ 7.060741] [<ffffffff8024178a>] __do_softirq+0x7a/0xf0
[ 7.060741] [<ffffffff8025cf9e>] ? tick_program_event+0x3e/0x70
[ 7.060741] [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[ 7.060741] [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[ 7.060741] [<ffffffff80241705>] irq_exit+0x85/0x90
[ 7.060741] [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[ 7.060741] [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[ 7.060741] <EOI> [<ffffffff804a15eb>] ? _spin_unlock_irq+0x2b/0x30
[ 7.060741] [<ffffffff804a0e75>] ? __down_read+0xa5/0xb7
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060741] [<ffffffff8049fe57>] ? down_read+0x37/0x40
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060741] [<ffffffff8023f4ad>] ? do_exit+0x18d/0xa10
[ 7.060741] [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[ 7.060741] [<ffffffff804a1d17>] ? oops_end+0x87/0x90
[ 7.060741] [<ffffffff804a3fe3>] ? do_page_fault+0x663/0x800
[ 7.060741] [<ffffffff804a18ed>] ? error_exit+0x0/0x9a
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffff8025e512>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffff8049f856>] ? mutex_lock_nested+0xa6/0x250
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[ 7.060741] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060741] [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.060741] [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[ 7.060741] [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.060741] [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[ 7.060741] [<ffffffff803a821e>] ? acpi_start_single_object+0x2d/0x52
[ 7.060741] [<ffffffff803a9816>] ? acpi_device_probe+0x7e/0x92
[ 7.060741] [<ffffffff803dd6ab>] ? driver_probe_device+0x9b/0x1a0
[ 7.060741] [<ffffffff803dd836>] ? __driver_attach+0x86/0x90
[ 7.060741] [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[ 7.060741] [<ffffffff803dcbfd>] ? bus_for_each_dev+0x5d/0x90
[ 7.060741] [<ffffffff803dd4ec>] ? driver_attach+0x1c/0x20
[ 7.060741] [<ffffffff803dd239>] ? bus_add_driver+0x1e9/0x260
[ 7.060741] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.060741] [<ffffffff803dda0f>] ? driver_register+0x5f/0x140
[ 7.060741] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.060741] [<ffffffff803a9b26>] ? acpi_bus_register_driver+0x3e/0x40
[ 7.060741] [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[ 7.060741] [<ffffffff80209040>] ? _stext+0x40/0x180
[ 7.060741] [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[ 7.060741] [<ffffffff802678d2>] ? sys_init_module+0x142/0x1dc0
[ 7.060741] [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[ 7.060741] [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[ 7.060741] [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[ 7.060741]
[ 7.060741]
[ 7.060741] Code: 8b 47 08 8b 50 1c 65 8b 04 25 24 00 00 00 39 c2 74 0d 0f ae f0 48 8b 47 08 f6 40 18 04 74 02 c9 c3 89 d7 ff 15 0f 79 3c 00 c
[ 7.060741] RIP [<ffffffff8022ce3b>] resched_task+0x6b/0x70
[ 7.060741] RSP <ffff88022f12bce0>
[ 7.060741] ---[ end trace 62c38812ae35bad0 ]---
[ 7.060741] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 7.060741] ------------[ cut here ]------------
[ 7.060741] WARNING: at kernel/smp.c:328 smp_call_function_mask+0x25a/0x260()
[ 7.060741] Modules linked in: processor(+) fan thermal_sys fuse
[ 7.060741] Pid: 1259, comm: modprobe Tainted: G D W 2.6.27-rc3 #30
[ 7.060741]
[ 7.060741] Call Trace:
[ 7.060741] <IRQ> [<ffffffff8023bcff>] warn_on_slowpath+0x5f/0x80
[ 7.060741] [<ffffffff8026535a>] smp_call_function_mask+0x25a/0x260
[ 7.060741] [<ffffffff8036987d>] ? string+0x3d/0xd0
[ 7.060741] [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[ 7.060741] [<ffffffff8036987d>] ? string+0x3d/0xd0
[ 7.060741] [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[ 7.060741] [<ffffffff8036987d>] ? string+0x3d/0xd0
[ 7.060741] [<ffffffff8036987d>] ? string+0x3d/0xd0
[ 7.060741] [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[ 7.060741] [<ffffffff8036901e>] ? number+0x2ae/0x2d0
[ 7.060741] [<ffffffff8036901e>] ? number+0x2ae/0x2d0
[ 7.060741] [<ffffffff8026a0dd>] ? kallsyms_lookup+0x5d/0xa0
[ 7.060741] [<ffffffff8036901e>] ? number+0x2ae/0x2d0
[ 7.060741] [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[ 7.060741] [<ffffffff8036a098>] ? sprintf+0x68/0x70
[ 7.060741] [<ffffffff8036987d>] ? string+0x3d/0xd0
[ 7.060741] [<ffffffff804a4273>] ? __atomic_notifier_call_chain+0x83/0xa0
[ 7.060741] [<ffffffff804a41f0>] ? __atomic_notifier_call_chain+0x0/0xa0
[ 7.060741] [<ffffffff804a11b6>] ? _spin_unlock+0x26/0x30
[ 7.060741] [<ffffffff8021c470>] ? stop_this_cpu+0x0/0x30
[ 7.060741] [<ffffffff802653a0>] smp_call_function+0x40/0x50
[ 7.060741] [<ffffffff8021c4f3>] native_smp_send_stop+0x23/0x40
[ 7.060741] [<ffffffff8023c04f>] panic+0xaf/0x190
[ 7.060741] [<ffffffff8023cea7>] ? printk+0x67/0x70
[ 7.060741] [<ffffffff8049f7a9>] ? mutex_unlock+0x9/0x10
[ 7.060741] [<ffffffff80256f21>] ? blocking_notifier_call_chain+0x11/0x20
[ 7.060741] [<ffffffff8023fb89>] do_exit+0x869/0xa10
[ 7.060741] [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[ 7.060741] [<ffffffff804a1d17>] oops_end+0x87/0x90
[ 7.060741] [<ffffffff8020e08e>] die+0x5e/0x90
[ 7.060741] [<ffffffff804a2230>] do_trap+0x130/0x150
[ 7.060741] [<ffffffff8020e662>] do_invalid_op+0x92/0xb0
[ 7.060741] [<ffffffff8022ce3b>] ? resched_task+0x6b/0x70
[ 7.060741] [<ffffffff804a18ed>] error_exit+0x0/0x9a
[ 7.060741] [<ffffffff8022ce3b>] ? resched_task+0x6b/0x70
[ 7.060741] [<ffffffff802389c3>] check_preempt_wakeup+0x133/0x1c0
[ 7.060741] [<ffffffff802316cf>] try_to_wake_up+0xbf/0x280
[ 7.060741] [<ffffffff8023189d>] default_wake_function+0xd/0x10
[ 7.060741] [<ffffffff802523e1>] autoremove_wake_function+0x11/0x40
[ 7.060741] [<ffffffff8022bf7a>] __wake_up_common+0x5a/0x90
[ 7.060741] [<ffffffff8022d583>] __wake_up+0x43/0x70
[ 7.060741] [<ffffffff8024eb20>] ? delayed_work_timer_fn+0x0/0x40
[ 7.060741] [<ffffffff8024e178>] insert_work+0x48/0x50
[ 7.060741] [<ffffffff8024eb01>] __queue_work+0x31/0x50
[ 7.060741] [<ffffffff8024eb52>] delayed_work_timer_fn+0x32/0x40
[ 7.060741] [<ffffffff8024606b>] run_timer_softirq+0x1bb/0x230
[ 7.060741] [<ffffffff80255d0a>] ? ktime_get_ts+0x4a/0x60
[ 7.060741] [<ffffffff8024178a>] __do_softirq+0x7a/0xf0
[ 7.060741] [<ffffffff8025cf9e>] ? tick_program_event+0x3e/0x70
[ 7.060741] [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[ 7.060741] [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[ 7.060741] [<ffffffff80241705>] irq_exit+0x85/0x90
[ 7.060741] [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[ 7.060741] [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[ 7.060741] <EOI> [<ffffffff804a15eb>] ? _spin_unlock_irq+0x2b/0x30
[ 7.060741] [<ffffffff804a0e75>] ? __down_read+0xa5/0xb7
[ 7.060741] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060742] [<ffffffff8049fe57>] ? down_read+0x37/0x40
[ 7.060742] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060742] [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[ 7.060742] [<ffffffff8023f4ad>] ? do_exit+0x18d/0xa10
[ 7.060742] [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[ 7.060742] [<ffffffff804a1d17>] ? oops_end+0x87/0x90
[ 7.060742] [<ffffffff804a3fe3>] ? do_page_fault+0x663/0x800
[ 7.060742] [<ffffffff804a18ed>] ? error_exit+0x0/0x9a
[ 7.060742] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060742] [<ffffffff8025e512>] ? debug_mutex_add_waiter+0x32/0x80
[ 7.060742] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060742] [<ffffffff8049f856>] ? mutex_lock_nested+0xa6/0x250
[ 7.060742] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060742] [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[ 7.060742] [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[ 7.060742] [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[ 7.060742] [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[ 7.060742] [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[ 7.060742] [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[ 7.060742] [<ffffffff803a821e>] ? acpi_start_single_object+0x2d/0x52
[ 7.060742] [<ffffffff803a9816>] ? acpi_device_probe+0x7e/0x92
[ 7.060742] [<ffffffff803dd6ab>] ? driver_probe_device+0x9b/0x1a0
[ 7.060742] [<ffffffff803dd836>] ? __driver_attach+0x86/0x90
[ 7.060742] [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[ 7.060742] [<ffffffff803dcbfd>] ? bus_for_each_dev+0x5d/0x90
[ 7.060742] [<ffffffff803dd4ec>] ? driver_attach+0x1c/0x20
[ 7.060742] [<ffffffff803dd239>] ? bus_add_driver+0x1e9/0x260
[ 7.060742] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.060742] [<ffffffff803dda0f>] ? driver_register+0x5f/0x140
[ 7.060742] [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[ 7.060742] [<ffffffff803a9b26>] ? acpi_bus_register_driver+0x3e/0x40
[ 7.060742] [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[ 7.060742] [<ffffffff80209040>] ? _stext+0x40/0x180
[ 7.060742] [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[ 7.060742] [<ffffffff802678d2>] ? sys_init_module+0x142/0x1dc0
[ 7.060742] [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[ 7.060742] [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[ 7.060742] [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[ 7.060742]
[ 7.060742] ---[ end trace 62c38812ae35bad0 ]---


Attachments:
prob4.txt (4.07 kB)
prob4a.txt (21.13 kB)

2008-08-25 13:03:27

by Daniel J Blueman

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

Hi Linus, Vegard,

On Sun, Aug 24, 2008 at 7:58 PM, Linus Torvalds
<[email protected]> wrote:
> On Sun, 24 Aug 2008, Vegard Nossum wrote:
[snip]
> Anyway, I think your patch is likely fine, I just thought it looked a bit
> odd to have a loop to move a list from one head pointer to another.
>
> But regardless, it would need some testing. Daniel?

This opens another lockdep report at boot-time [1] - promoting
pool_lock may not be the best fix?

We then see a new deadlock condition (on the pool_lock spinlock) [2],
which seemingly was avoided by taking the debug-bucket lock first.

We reproduce this by booting with debug_objects=1 and causing a lot of activity.

Daniel
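
The condition flagged in report [1] below can be stated as a small rule, sketched here in plain C. This is a deliberately simplified model, not lockdep's actual data structures or algorithm: `struct lock_usage`, its fields, and `irq_inversion_possible()` are hypothetical names invented for this illustration. The two flags correspond to the first character of the usage strings in the report, assuming `+` means the lock was acquired in hard-IRQ context and `-` means it was acquired with hard IRQs enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record of how a lock class has been used. */
struct lock_usage {
	const char *name;
	bool taken_in_hardirq;		/* ever acquired from hard-IRQ context */
	bool taken_with_irqs_on;	/* ever acquired with hard IRQs enabled */
};

/*
 * 'inner' is assumed to have been acquired at some point while 'outer'
 * was held. If 'outer' is also taken from hard-IRQ context and 'inner'
 * is sometimes taken with IRQs enabled, an interrupt can arrive while
 * 'inner' is held, take 'outer', and then spin forever on 'inner'.
 */
static bool irq_inversion_possible(const struct lock_usage *outer,
				   const struct lock_usage *inner)
{
	assert(outer != NULL && inner != NULL);

	return outer->taken_in_hardirq && inner->taken_with_irqs_on;
}
```

With `xtime_lock` (shown as `{++..}`) as the outer lock and `pool_lock` (shown as `{-...}`) as the inner lock, the predicate is true, which matches the warning. It becomes false once the inner lock is only ever taken with interrupts disabled.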

--- [1]

[ INFO: possible irq lock inversion dependency detected ]
2.6.27-rc4-225c-debug #3
---------------------------------------------------------
rcu_sched_grace/9 just changed the state of lock:
(pool_lock){-...}, at: [<ffffffff80466c2c>] free_object+0x7c/0xc0
but this lock was taken by another, hard-irq-safe lock in the past:
(xtime_lock){++..}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
no locks held by rcu_sched_grace/9.

the first lock's dependencies:
-> (pool_lock){-...} ops: 59 {
initial-use at:
[<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
[<ffffffff802707e1>]
lock_acquire+0x91/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff804675fa>]
__debug_object_init+0x14a/0x3e0
[<ffffffff804678df>]
debug_object_init+0x1f/0x30
[<ffffffff80260eee>]
hrtimer_init+0x2e/0x50
[<ffffffff80237571>]
init_rt_bandwidth+0x41/0x60
[<ffffffff812f3810>]
sched_init+0x72/0x63d
[<ffffffff812e17e2>]
start_kernel+0x19c/0x456
[<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-W at:
[<ffffffff8026fae9>]
__lock_acquire+0x4f9/0x1160
[<ffffffff802707e1>]
lock_acquire+0x91/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff80466c2c>]
free_object+0x7c/0xc0
[<ffffffff8046707e>]
debug_object_free+0xbe/0x130
[<ffffffff806a070e>]
schedule_timeout+0x7e/0xe0
[<ffffffff806a07ce>]
schedule_timeout_interruptible+0x1e/0x20
[<ffffffff8029cfb2>]
rcu_sched_grace_period+0xa2/0x3a0
[<ffffffff8025d50e>] kthread+0x4e/0x90
[<ffffffff8020d919>] child_rip+0xa/0x11
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff8088f5d8>] pool_lock+0x18/0x40

the second lock's dependencies:
-> (xtime_lock){++..} ops: 211 {
initial-use at:
[<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
[<ffffffff802707e1>]
lock_acquire+0x91/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff812f5628>]
timekeeping_init+0x2f/0x144
[<ffffffff812e189d>]
start_kernel+0x257/0x456
[<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff
in-hardirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff80907220>] xtime_lock+0x20/0x40
-> (&obj_hash[i].lock){.+..} ops: 1003901 {
initial-use at:
[<ffffffff8026f795>] __lock_acquire+0x1a5/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
[<ffffffff80467567>] __debug_object_init+0xb7/0x3e0
[<ffffffff804678df>] debug_object_init+0x1f/0x30
[<ffffffff80260eee>] hrtimer_init+0x2e/0x50
[<ffffffff80237571>] init_rt_bandwidth+0x41/0x60
[<ffffffff812f3810>] sched_init+0x72/0x63d
[<ffffffff812e17e2>] start_kernel+0x19c/0x456
[<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff81cd2eb0>] __key.16550+0x0/0x8
-> (pool_lock){-...} ops: 59 {
initial-use at:
[<ffffffff8026f795>] __lock_acquire+0x1a5/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff804675fa>] __debug_object_init+0x14a/0x3e0
[<ffffffff804678df>] debug_object_init+0x1f/0x30
[<ffffffff80260eee>] hrtimer_init+0x2e/0x50
[<ffffffff80237571>] init_rt_bandwidth+0x41/0x60
[<ffffffff812f3810>] sched_init+0x72/0x63d
[<ffffffff812e17e2>] start_kernel+0x19c/0x456
[<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-W at:
[<ffffffff8026fae9>] __lock_acquire+0x4f9/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff80466c2c>] free_object+0x7c/0xc0
[<ffffffff8046707e>] debug_object_free+0xbe/0x130
[<ffffffff806a070e>] schedule_timeout+0x7e/0xe0
[<ffffffff806a07ce>] schedule_timeout_interruptible+0x1e/0x20
[<ffffffff8029cfb2>] rcu_sched_grace_period+0xa2/0x3a0
[<ffffffff8025d50e>] kthread+0x4e/0x90
[<ffffffff8020d919>] child_rip+0xa/0x11
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff8088f5d8>] pool_lock+0x18/0x40
... acquired at:
[<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff804675fa>] __debug_object_init+0x14a/0x3e0
[<ffffffff804678df>] debug_object_init+0x1f/0x30
[<ffffffff80260eee>] hrtimer_init+0x2e/0x50
[<ffffffff80237571>] init_rt_bandwidth+0x41/0x60
[<ffffffff812f3810>] sched_init+0x72/0x63d
[<ffffffff812e17e2>] start_kernel+0x19c/0x456
[<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff

... acquired at:
[<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
[<ffffffff80467567>] __debug_object_init+0xb7/0x3e0
[<ffffffff804678df>] debug_object_init+0x1f/0x30
[<ffffffff80260eee>] hrtimer_init+0x2e/0x50
[<ffffffff812f577b>] ntp_init+0x1e/0x2b
[<ffffffff812f5633>] timekeeping_init+0x3a/0x144
[<ffffffff812e189d>] start_kernel+0x257/0x456
[<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff

-> (clocksource_lock){++..} ops: 214 {
initial-use at:
[<ffffffff8026f795>] __lock_acquire+0x1a5/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
[<ffffffff80266245>] clocksource_get_next+0x15/0x60
[<ffffffff812f5638>] timekeeping_init+0x3f/0x144
[<ffffffff812e189d>] start_kernel+0x257/0x456
[<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff
in-hardirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff80877bb8>] clocksource_lock+0x18/0x40
... acquired at:
[<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
[<ffffffff80266245>] clocksource_get_next+0x15/0x60
[<ffffffff812f5638>] timekeeping_init+0x3f/0x144
[<ffffffff812e189d>] start_kernel+0x257/0x456
[<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
[<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
[<ffffffffffffffff>] 0xffffffffffffffff

-> (old_style_seqlock_init){++..} ops: 210 {
initial-use at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-hardirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff812d61a0>] nl80211_policy+0xda0/0x2c00
... acquired at:
[<ffffffffffffffff>] 0xffffffffffffffff

-> (ftrace_shutdown_lock){++..} ops: 480 {
initial-use at:
[<ffffffff8026f795>] __lock_acquire+0x1a5/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
[<ffffffff802a4216>] ftrace_record_ip+0x196/0x2f0
[<ffffffff8020c6b4>] mcount_call+0x5/0x31
[<ffffffff812e1c9b>] kernel_init+0x14d/0x1b2
[<ffffffff8020d919>] child_rip+0xa/0x11
[<ffffffffffffffff>] 0xffffffffffffffff
in-hardirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff8087c1b8>] ftrace_shutdown_lock+0x18/0x40
... acquired at:
[<ffffffffffffffff>] 0xffffffffffffffff


stack backtrace:
Pid: 9, comm: rcu_sched_grace Not tainted 2.6.27-rc4-225c-debug #3

Call Trace:
[<ffffffff8026d8b2>] print_irq_inversion_bug+0x142/0x160
[<ffffffff8026dd27>] check_usage_backwards+0x67/0xb0
[<ffffffff8026ebd3>] mark_lock+0x363/0x7f0
[<ffffffff8026fae9>] __lock_acquire+0x4f9/0x1160
[<ffffffff802707e1>] lock_acquire+0x91/0xc0
[<ffffffff80466c2c>] ? free_object+0x7c/0xc0
[<ffffffff806a2bb1>] _spin_lock+0x41/0x80
[<ffffffff80466c2c>] ? free_object+0x7c/0xc0
[<ffffffff806a6bc9>] ? sub_preempt_count+0x69/0xd0
[<ffffffff80466c2c>] free_object+0x7c/0xc0
[<ffffffff8046707e>] debug_object_free+0xbe/0x130
[<ffffffff806a070e>] schedule_timeout+0x7e/0xe0
[<ffffffff802503a0>] ? process_timeout+0x0/0x10
[<ffffffff806a06f2>] ? schedule_timeout+0x62/0xe0
[<ffffffff8029cf10>] ? rcu_sched_grace_period+0x0/0x3a0
[<ffffffff806a07ce>] schedule_timeout_interruptible+0x1e/0x20
[<ffffffff8029cfb2>] rcu_sched_grace_period+0xa2/0x3a0
[<ffffffff8029cf10>] ? rcu_sched_grace_period+0x0/0x3a0
[<ffffffff8025d50e>] kthread+0x4e/0x90
[<ffffffff8020d919>] child_rip+0xa/0x11
[<ffffffff80239b8f>] ? finish_task_switch+0x5f/0x120
[<ffffffff806a356b>] ? _spin_unlock_irq+0x3b/0x70
[<ffffffff8020cf23>] ? restore_args+0x0/0x30
[<ffffffff8025d4c0>] ? kthread+0x0/0x90
[<ffffffff8020d90f>] ? child_rip+0x0/0x11

--- [2]

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
58 0 63972 17316 16 145172 128 21176 128 21176 6054 8662 79 21 0 0
47 2 81020 14380 16 139472 1120 17052 1240 17052 6106 9532 77 23 0 0
52 1 86276 33656 16 137140 604 5304 796 5304 5349 7954 81 19 0 0
94 0 86276 32192 16 137484 480 0 772 0 5418 7618 84 16 0 0
88 0 86264 22416 16 137800 96 0 396 0 4746 5937 87 13 0 0
63 1 86200 54636 16 137932 1380 0 1408 0 5472 8007 82 18 0 0
47 0 86020 22848 16 138132 256 0 312 0 6126 12227 72 28 0 0
75 2 103828 20252 16 135500 528 17836 592 17836 6655 12862 69 31 0 0
21 0 128568 17536 16 128888 2336 24732 2336 24732 6762 12891 66 34 0 0
159 0 154996 16888 16 124808 480 26236 504 26236 5930 7689 80 20 0 0
45 0 165616 40108 16 120544 192 10696 248 10696 6136 9163 77 23 0 0
95 0 165616 27296 16 120632 924 0 940 0 5293 7468 82 18 0 0
BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff80214407, registers:
CPU 0
Modules linked in: rfcomm l2cap bluetooth kvm_intel kvm microcode
dvb_usb_dtt200u dvb_usb uvcvideo dvb_core compat_ioctl32 i2c_core
videodev v4l1_compat shpchp pcig
Pid: 6948, comm: spiral Not tainted 2.6.27-rc4-225c-debug #3
RIP: 0010:[<ffffffff80214407>] [<ffffffff80214407>] native_read_tsc+0x7/0x30
RSP: 0018:ffffffff8153ab70 EFLAGS: 00000002
RAX: 00000467c85cb001 RBX: ffffffff8088f5c0 RCX: 00000000c85cb001
RDX: 00000000c85cb001 RSI: 0000000000000103 RDI: 0000000000000001
RBP: ffffffff8153ab70 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffff8800cf978000 R12: 00000000c85cb001
R13: 0000000000000001 R14: 0000000000000000 R15: ffff88012a61c8c0
FS: 00007fb48cb796e0(0000) GS:ffffffff80e82dc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000075212e6 CR3: 00000000cc1e6000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process spiral (pid: 6948, threadinfo ffff8800cf950000, task ffff8800cf978000)
Stack: ffffffff8153aba0 ffffffff804560c7 ffffffff8088f5c0 0000000005b61b94
00000000d6915628 0000000000000001 ffffffff8153abb0 ffffffff80455fff
ffffffff8153abe0 ffffffff80466302 ffffffff8088f5d8 ffffffff8088f5c0
Call Trace:
<IRQ> [<ffffffff804560c7>] delay_tsc+0x67/0xd0
[<ffffffff80455fff>] __delay+0xf/0x20
[<ffffffff80466302>] _raw_spin_lock+0x122/0x170
[<ffffffff806a2bd1>] _spin_lock+0x61/0x80
[<ffffffff80466c2c>] ? free_object+0x7c/0xc0
[<ffffffff80466c2c>] free_object+0x7c/0xc0
[<ffffffff80466e0f>] __debug_check_no_obj_freed+0x19f/0x1e0
[<ffffffff80466e65>] debug_check_no_obj_freed+0x15/0x20
[<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
[<ffffffff80517fa1>] ? scsi_pool_free_command+0x51/0x60
[<ffffffff80517fa1>] scsi_pool_free_command+0x51/0x60
[<ffffffff805184bf>] __scsi_put_command+0x5f/0xa0
[<ffffffff80518561>] scsi_put_command+0x61/0x70
[<ffffffff8051e47a>] scsi_next_command+0x3a/0x60
[<ffffffff8051e544>] scsi_end_request+0xa4/0xc0
[<ffffffff8051e68f>] scsi_io_completion+0x12f/0x440
[<ffffffff80517bf5>] scsi_finish_command+0x95/0xd0
[<ffffffff8051eb36>] scsi_softirq_done+0x86/0x110
[<ffffffff8043bbed>] blk_done_softirq+0x8d/0xa0
[<ffffffff8024ba04>] __do_softirq+0x74/0xf0
[<ffffffff8020dc7c>] call_softirq+0x1c/0x30
[<ffffffff8020f485>] do_softirq+0x75/0xb0
[<ffffffff8024b155>] irq_exit+0xa5/0xb0
[<ffffffff8020f7d3>] do_IRQ+0xe3/0x1d0
[<ffffffff80466c66>] ? free_object+0xb6/0xc0
[<ffffffff8020ce76>] ret_from_intr+0x0/0xf
<EOI> [<ffffffff80270ba0>] ? lock_release+0xe0/0x210
[<ffffffff806a3253>] ? _spin_unlock+0x23/0x60
[<ffffffff80466c66>] ? free_object+0xb6/0xc0
[<ffffffff8046707e>] ? debug_object_free+0xbe/0x130
[<ffffffff806a070e>] ? schedule_timeout+0x7e/0xe0
[<ffffffff802503a0>] ? process_timeout+0x0/0x10
[<ffffffff806a06f2>] ? schedule_timeout+0x62/0xe0
[<ffffffff802f8b3e>] ? do_select+0x4be/0x610
[<ffffffff802f8c90>] ? __pollwait+0x0/0x120
[<ffffffff8026f1d9>] ? trace_hardirqs_on_caller+0x29/0x1b0
[<ffffffff8026f1d9>] ? trace_hardirqs_on_caller+0x29/0x1b0
[<ffffffff8026f36d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff806a0b9e>] ? mutex_unlock+0xe/0x10
[<ffffffff8065919e>] ? unix_stream_recvmsg+0x32e/0x6d0
[<ffffffff8026f1d9>] ? trace_hardirqs_on_caller+0x29/0x1b0
[<ffffffff8026f36d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff802f8f4b>] ? core_sys_select+0x19b/0x2e0
[<ffffffff802e89e9>] ? do_sync_read+0xf9/0x140
[<ffffffff8025d8e0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff806a6bc9>] ? sub_preempt_count+0x69/0xd0
[<ffffffff802f94b0>] ? sys_select+0xd0/0x1c0
[<ffffffff8020c86b>] ? system_call_fastpath+0x16/0x1b


Code: 90 90 90 90 55 89 f8 48 89 e5 e6 70 e4 71 c9 c3 0f 1f 40 00 55
89 f0 48 89 e5 e6 70 89 f8 e6 71 c9 c3 66 90 55 48 89 e5 0f 1f 00 <0f>
ae e8 0f 31 89 c1 0f 1f
BUG: NMI Watchdog detected LOCKUP<4>---[ end trace fd851c3db62e5044 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!
on CPU1, ip ffffffff80214407, registers:
CPU 1
Modules linked in: rfcomm l2cap bluetooth kvm_intel kvm microcode
dvb_usb_dtt200u dvb_usb uvcvideo dvb_core compat_ioctl32 i2c_core
videodev v4l1_compat shpchp pcig
Pid: 10150, comm: gcc Not tainted 2.6.27-rc4-225c-debug #3
RIP: 0010:[<ffffffff80214407>] [<ffffffff80214407>] native_read_tsc+0x7/0x30
RSP: 0018:ffff8800b6269c28 EFLAGS: 00000092
RAX: 0000000000000001 RBX: ffffffff8088f5c0 RCX: 00000000c85cafc2
RDX: 000000008b468b00 RSI: 0000000000000002 RDI: 0000000000000001
RBP: ffff8800b6269c28 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffff8800a65047e0 R12: 0000000005d22695
R13: 0000000000000001 R14: 0000000000000001 R15: ffff88009e9be940
FS: 00002b124e4fc6e0(0000) GS:ffff88012fa644b0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b8a265f5960 CR3: 00000000b61e9000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gcc (pid: 10150, threadinfo ffff8800b6268000, task ffff8800a65047e0)
Stack: ffff8800b6269c58 ffffffff8045608a ffffffff8088f5c0 0000000005d22695
00000000d6915628 0000000000000001 ffff8800b6269c68 ffffffff80455fff
ffff8800b6269c98 ffffffff80466302 ffffffff8088f5d8 ffffffff8088f5c0
Call Trace:
[<ffffffff8045608a>] delay_tsc+0x2a/0xd0
[<ffffffff80455fff>] __delay+0xf/0x20
[<ffffffff80466302>] _raw_spin_lock+0x122/0x170
[<ffffffff806a2bd1>] _spin_lock+0x61/0x80
[<ffffffff80466c2c>] ? free_object+0x7c/0xc0
[<ffffffff80466c2c>] free_object+0x7c/0xc0
[<ffffffff80466e0f>] __debug_check_no_obj_freed+0x19f/0x1e0
[<ffffffff80466e65>] debug_check_no_obj_freed+0x15/0x20
[<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
[<ffffffff8024264b>] ? __cleanup_signal+0x1b/0x20
[<ffffffff8024264b>] __cleanup_signal+0x1b/0x20
[<ffffffff80248273>] release_task+0x233/0x3d0
[<ffffffff80248960>] wait_consider_task+0x550/0x8b0
[<ffffffff80248e16>] do_wait+0x156/0x350
[<ffffffff8023b6f0>] ? default_wake_function+0x0/0x10
[<ffffffff802490a6>] sys_wait4+0x96/0xf0
[<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b


Code: 90 90 90 90 55 89 f8 48 89 e5 e6 70 e4 71 c9 c3 0f 1f 40 00 55
89 f0 48 89 e5 e6 70 89 f8 e6 71 c9 c3 66 90 55 48 89 e5 0f 1f 00 <0f>
ae e8 0f 31 89 c1 0f 1f
---[ end trace fd851c3db62e5044 ]---
--
Daniel J Blueman

2008-08-25 13:14:20

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Adding in SLUB debugging doesn't show anything new (I think). Example
boot log (w/ initcall_debug enabled) is at:

http://free.linux.hp.com/~adb/bug.11342/prob5.txt

This has happened 3 times in a row as well. Whilst this is being looked
at, I'm going to fast-forward ahead to the latest in Linus' tree, and
see if the problem is still occurring (I think Linus' point earlier
about some sort of rogue timing and/or corruption bug is spot on, but
it's probably better to see how close to "today's tree" I can reproduce
this). I'll also try kernels w/ the problematic merge patch backed out
to see if that still "fixes" (or more likely(?) just patches over the
real problem).

Alan

2008-08-25 14:04:33

by Adrian Bunk

Subject: Re: [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'

On Sun, Aug 24, 2008 at 08:13:55AM +0200, Frans Pop wrote:
> On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11356
> > Subject : Linux 2.6.27-rc3 - build failure: undefined reference to
> > `.lockdep_count_forward_deps'
> > Submitter : Frans Pop <[email protected]>
> > Date : 2008-08-16 19:11 (8 days old)
> > References : http://marc.info/?l=linux-kernel&m=121891396320127&w=4
>
> Fixed as per: http://marc.info/?l=linux-kernel&m=121898767530602&w=4
> Adrian mentioned that he'd closed the bug, but apparently not.

Sorry, I missed that Rafael had opened two bugs for two people reporting
the same issue, and only closed the other one.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-25 14:05:40

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

I built a kernel @

commit 83097aca8567a0bd593534853b71fe0fa9a75d69
Author: Arjan van de Ven <[email protected]>
Date: Sat Aug 23 21:45:21 2008 -0700

And it fails like the others do

o http://free.linux.hp.com/~adb/bug.11342/prob6.txt

SMP_DEBUG_PAGEALLOC

o http://free.linux.hp.com/~adb/bug.11342/prob6a.txt

[ 7.591198] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858



I then backed out /just/ the merge for

commit 1c89ac55017f982355c7761e1c912c88c941483d
Merge: 88fa08f... b1b135c...
Author: Linus Torvalds <[email protected]>
Date: Tue Aug 12 08:40:19 2008 -0700

And the machine has booted fine 5 times in a row.



I've put the latest .config up at

http://free.linux.hp.com/~adb/bug.11342/config.txt


Is there /some/ way to break down the patches within the merged patch,
and I could by-hand bisect through those?

Here's what I did to take the latest tree, and back out that merge (to
get booting kernels):

git-diff 88fa08f67bee1a0c765237bdac106a32872f57d2..1c89ac55017f982355c7761e1c912c88c941483d | patch -p1 -R
patching file Documentation/lguest/lguest.c
patching file arch/powerpc/Kconfig
patching file arch/x86/Kconfig
patching file arch/x86/mm/Makefile
patching file drivers/char/hvc_console.c
patching file drivers/lguest/page_tables.c
patching file include/linux/Kbuild
Hunk #1 succeeded at 358 (offset 2 lines).
patching file include/linux/init.h
patching file include/linux/mm.h
patching file init/main.c
patching file kernel/module.c
patching file kernel/stop_machine.c
patching file mm/Kconfig
patching file mm/util.c

Alan

2008-08-25 18:01:17

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
>
> Before adding any more debugging, this is the status of my kernel boots:
> 3 times in a row w/ this same error. (Primary problem is the same,
> secondary stacks differ of course.)

Ok, so I took a closer look, and the oops really is suggestive..

> [ 6.482953] busybox used greatest stack depth: 4840 bytes left

Ok, 4840 bytes left out of 8kB.

> [ 6.521876] all_generic_ide used greatest stack depth: 4784 bytes left

.. and this one is 4784 bytes left..

> Begin: Loading essential drivers... ...
> [ 6.625509] fuse init (API version 7.9)
> [ 6.625509] modprobe used greatest stack depth: 1720 bytes left

Uhhuh! The previous "modprobe" uses stack like mad. It could be
"fuse_init()" that has done it, but looking at fuse, I seriously doubt it.
It doesn't seem to do anything particularly bad.

So something has used over 6kB of stack, and it may well be the module
loading code itself.

The next stage is the actual oops itself:

> [ 6.644854] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM CPU_TM2 1 MSFT 100000E)
> [ 6.651489] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858

This really looks like

ti->task->blocked_on = waiter;

where "ti->task" is NULL. You probably have almost everything enabled in
order to turn "struct task_struct" that big, but judging by your register
state it's really an offset off a NULL pointer, not some small integer.

Now, there is no way "ti->task" can _possibly_ be NULL. No way.

Well, except that "ti" is just below the stack, and if you had a stack
overflow that overwrote it.
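
To illustrate why a fault address like 0x858 points at a field access through a NULL pointer rather than at a stray small integer, here is a minimal userland sketch. The layout is hypothetical: "struct fake_task" and its padding are assumptions chosen so that "blocked_on" lands at 0x858, the address from the oops. The real offset inside "struct task_struct" depends on the kernel configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct task_struct. The padding is picked so
 * that blocked_on sits at offset 0x858, matching the oops address. */
struct fake_task {
    unsigned char other_fields[0x858];
    void *blocked_on;
};

/* Address the CPU would touch for "task->blocked_on = ..." given the
 * numeric value of the task pointer. */
static uintptr_t fault_address(uintptr_t task_ptr)
{
    return task_ptr + offsetof(struct fake_task, blocked_on);
}
```

With a task pointer of 0, fault_address() returns 0x858: the faulting address is just the field offset, which is what the register state in the oops suggests.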

So I seriously do believe that you have run out of stack. If that is true,
then it's quite likely that with DEBUG_PAGE_ALLOC you'll actually get a
double fault, which in turn is fairly hard to debug (you look at it wrong
and it turns into a triple fault which is going to just reboot your
machine immediately).

Now, the stack overflow probably happened a few calls earlier (and just
left your thread_info corrupted), but there is more reason to believe you
have stack overflow and thread_info corruption later in your output:

> [ 7.024992] modprobe used greatest stack depth: 408 bytes left
> [ 7.030988] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [ 7.031053] IP: [<ffffffff8023f39c>] do_exit+0x28c/0xa10

Here there is only 408 bytes left, which is _way_ too little, but it's
also an optimistic measure. What the stack usage code does is to just
see how many zeroes it can find on the stack. If you have a big stack
frame somewhere, it's quite possible that it actually used all your stack
and then some, but left a bunch of zeroes around.
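
Why the zero-counting measure is optimistic can be shown with a small simulation. This is a sketch of the general idea only, not the kernel's actual code: the buffer, the 0xAA fill pattern and the frame offsets are all made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 8192  /* 8 kB, as in the thread */

/* Count still-zero bytes from the low end of a downward-growing stack
 * (index 0 is the deepest byte the stack could ever reach). */
static size_t stack_bytes_left(const unsigned char *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == 0)
        n++;
    return n;
}

/* Mark bytes [lo, hi) as written, as stores into a stack frame would. */
static void touch(unsigned char *stack, size_t lo, size_t hi)
{
    memset(stack + lo, 0xAA, hi - lo);
}

/* The callers fully write [1192, 8192). The deepest frame spans
 * [192, 1192) but only writes its top 200 bytes, e.g. because most of it
 * is a large buffer that is never filled. */
static size_t demo_reported_left(void)
{
    static unsigned char stack[STACK_SIZE];

    memset(stack, 0, sizeof stack);
    touch(stack, 1192, STACK_SIZE);
    touch(stack, 992, 1192);
    return stack_bytes_left(stack, sizeof stack);
}
```

In this made-up layout the scan reports 992 bytes left, although the deepest frame actually reached down to offset 192. The untouched zeroes inside that frame are counted as free space.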

And the do_exit() oops is simply because once the thread_info is
corrupted, all the basic thread data structures are crap, and yes, you're
almost guaranteed to oops at that point.

Could you make your kernel image available somewhere, and we can take a
look at it? Some versions of gcc are total pigs when it comes to stack
usage, and your exact configuration matters too. But yes, module loading
is a bad case, for me "sys_init_module()" contains

subq $392, %rsp #,

which is probably mostly because of the insane inlining gcc does (ie it
will likely have inlined every single function in that file that is only
called once, and then it will make all local variables of all those
functions alive over the whole function and allocate stack-space for them
ALL AT THE SAME TIME).

Gcc sometimes drives me mad. Its inlining decisions are almost always
pure and utter sh*t. But clearly something changed for you to start
triggering this, and I think that also explains why you bisected things to
the merge commit rather than to any individual change - because it was
probably not any individual change that pushed it over the limit, but two
different changes that made for bigger stack pressure, and _together_ they
pushed you over the limit.

So it also explains why the merge you found had no possible merge errors
on a source level - there were no actual clashes anywhere. Just a slow
growth of stack that combined to something that overflowed.

And yes, I bet the change by Arjan to use do_one_initcall() was _part_ of
it. It adds roughly 112 bytes of stack pressure to that module loading
path, because of the 64-byte array and the extra function call (8 bytes
for return address) with at least 5 quad-words saved (40 bytes) for
register spills.

But there were probably other things happening too that made things worse.

So if there is some place where you can upload your 'vmlinux' binary, it
would be good.

Linus

2008-08-25 18:03:25

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
>
> With /just/ DEBUG_PAGE_ALLOC defined, I have seen two general panic types:
>
> o A new double fault w/ SMP_DEBUG_PAGEALLOC problem (prob4.txt)

Yeah, that's a stack overflow.

Confirmed.

Linus

2008-08-25 18:10:10

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Mon, 25 Aug 2008, Linus Torvalds wrote:
>
> Could you make your kernel image available somewhere, and we can take a
> look at it? Some versions of gcc are total pigs when it comes to stack
> usage, and your exact configuration matters too. But yes, module loading
> is a bad case, for me "sys_init_module()" contains
>
> subq $392, %rsp #,
>
> which is probably mostly because of the insane inlining gcc does (ie it
> will likely have inlined every single function in that file that is only
> called once, and then it will make all local variables of all those
> functions alive over the whole function and allocate stack-space for them
> ALL AT THE SAME TIME).

I bet this one-liner will probably make your kernel work. It's not a full
solution, but it will make the module-loading path lose _all_ of the above
stack slots by just not inlining "load_module()" - the stack slots will
still be used when the module is _loaded_, but by the time we actually
call the ->init function they will have been released since it's not all
in the same crazy function any more.
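
A userland analogue of the same idea, as a rough sketch: a loader stage that needs a large scratch area is kept out of line, so its frame is gone before the init callback runs. The names load_stage(), load_and_init() and sample_init() are hypothetical. The kernel's "noinline" is a macro around the GCC attribute spelled out here, and the 448-byte size is borrowed from the sys_init_module disassembly reported later in this thread.

```c
#include <stddef.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE  /* assumption: other compilers get no hint */
#endif

typedef int (*init_fn)(void);

/* Needs a big scratch buffer, but only while "loading". Kept out of
 * line so the buffer is not merged into the caller's frame. */
static NOINLINE int load_stage(const char *image, size_t len)
{
    volatile unsigned char scratch[448];
    size_t i;
    int sum = 0;

    for (i = 0; i < sizeof scratch; i++)
        scratch[i] = (i < len) ? (unsigned char)image[i] : 0;
    for (i = 0; i < sizeof scratch; i++)
        sum += scratch[i];
    return sum;  /* the scratch frame is released on return */
}

/* Trivial init callback used for demonstration. */
static int sample_init(void)
{
    return 0;
}

/* Load first, then run init. By the time init() is called, the
 * load_stage() frame is no longer on the stack. */
static int load_and_init(const char *image, size_t len, init_fn init)
{
    int sum = load_stage(image, len);

    return init() == 0 ? sum : -1;
}
```

Whether a compiler would have inlined load_stage() without the attribute depends on its version and options. The attribute simply removes that decision from the compiler.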

I _seriously_ believe that we were better off back when gcc only inlined
what we told it to inline, and never inlined on its own. The gcc inlining
logic is pure and utter sh*t in an environment like the kernel where stack
space is a valuable resource.

Anyway, Alan, even if this solves your particular problem, I'd still like
to see your kernel image, so that I can hunt for other problems like
this..

Linus

---
kernel/module.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 08864d2..9db1191 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1799,7 +1799,7 @@ static void *module_alloc_update_bounds(unsigned long size)

/* Allocate and load the module: note that size of section 0 is always
zero, and we rely on this for optional sections. */
-static struct module *load_module(void __user *umod,
+static noinline struct module *load_module(void __user *umod,
unsigned long len,
const char __user *uargs)
{

2008-08-25 20:20:16

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> Could you make your kernel image available somewhere, and we can take a
>> look at it? Some versions of gcc are total pigs when it comes to stack
>> usage, and your exact configuration matters too. But yes, module loading
>> is a bad case, for me "sys_init_module()" contains
>>
>> subq $392, %rsp #,
>>
>> which is probably mostly because of the insane inlining gcc does (ie it
>> will likely have inlined every single function in that file that is only
>> called once, and then it will make all local variables of all those
>> functions alive over the whole function and allocate stack-space for them
>> ALL AT THE SAME TIME).

Mine has:

Dump of assembler code for function sys_init_module:
0xffffffff802688c0 <sys_init_module+0>: push %rbp
0xffffffff802688c1 <sys_init_module+1>: mov %rsp,%rbp
0xffffffff802688c4 <sys_init_module+4>: sub $0x1c0,%rsp
0xffffffff802688cb <sys_init_module+11>: mov %r12,-0x20(%rbp)
0xffffffff802688cf <sys_init_module+15>: mov %rdi,%r12

so 448 bytes.

The kernel is up at: http://free.linux.hp.com/~adb/bug.11342/vmlinux (if
you would let me know when you are through with it so I can free up some
space there I'd appreciate it...)

By doing the patch you provided, sys_init_module now looks like:

Dump of assembler code for function sys_init_module:
0xffffffff8026aa20 <sys_init_module+0>: push %rbp
0xffffffff8026aa21 <sys_init_module+1>: mov %rsp,%rbp
0xffffffff8026aa24 <sys_init_module+4>: sub $0x20,%rsp
0xffffffff8026aa28 <sys_init_module+8>: mov %r14,0x18(%rsp)
0xffffffff8026aa2d <sys_init_module+13>: mov %rdi,%r14


So only 32 bytes. (But of course, load_module() exists, and now has
0x1d0 (464) bytes...)

With the patch you provide, I /was/ able to repeatedly boot OK (latest
tree, and I also ran the patch against the 2.6.27-rc3-based kernel I was
having problems with initially, and that booted OK as well).

Alan

>
> I bet this one-liner will probably make your kernel work. It's not a full
> solution, but it will make the module-loading path lose _all_ of the above
> stack slots by just not inlining "load_module()" - the stack slots will
> still be used when the module is _loaded_, but by the time we actually
> call the ->init function they will have been released since it's not all
> in the same crazy function any more.
>
> I _seriously_ believe that we were better off back when gcc only inlined
> what we told it to inline, and never inlined on its own. The gcc inlining
> logic is pure and utter sh*t in an environment like the kernel where stack
> space is a valuable resource.
>
> Anyway, Alan, even if this solves your particular problem, I'd still like
> to see your kernel image, so that I can hunt for other problems like
> this..
>
> Linus
>
> ---
> kernel/module.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index 08864d2..9db1191 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1799,7 +1799,7 @@ static void *module_alloc_update_bounds(unsigned long size)
>
> /* Allocate and load the module: note that size of section 0 is always
> zero, and we rely on this for optional sections. */
> -static struct module *load_module(void __user *umod,
> +static noinline struct module *load_module(void __user *umod,
> unsigned long len,
> const char __user *uargs)
> {

2008-08-25 20:43:26

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
>
> Mine has:
>
> Dump of assembler code for function sys_init_module:
> 0xffffffff802688c4 <sys_init_module+4>: sub $0x1c0,%rsp
>
> so 448 bytes.

Yeah, your build seems to have consistently bigger stack usage, and that
may be due to some config option, but most likely it's a compiler version
issue.

But I think part of the reason is that you have frame pointers enabled:
that makes the stack frames bigger not only because of the frame pointer
save/restore, but also because you have more register pressure and thus
spills.

> The kernel is up at: http://free.linux.hp.com/~adb/bug.11342/vmlinux (if
> you would let me know when you are through with it so I can free up some
> space there I'd appreciate it...)

I'm downloading it now, I'll probably be done by the time you get this
email.

[ Update. Done. You can remove it ]

> By doing the patch you provided, sys_init_module now looks like:
>
> Dump of assembler code for function sys_init_module:
> 0xffffffff8026aa24 <sys_init_module+4>: sub $0x20,%rsp
>
> So only 32 bytes. (But of course, load_module() exists, and now has
> 0x1d0 (464) bytes...)

Right - the stack usage didn't go away, but the _lifetimes_ changed.

So now load_module() will still use almost 500 bytes of stack, and it will
call other routines that use stack too, but the lifetime of that stack
usage is no longer over the whole module loading and initialization part,
it's purely over just the loading thing.

And since the deep callchain came much later (in the actual ->init
routines), by the time we do that, we no longer have the load_module
stack usage active any more.
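
The lifetime argument can be put into numbers with a tiny model. The frame sizes 448, 32 and 464 come from Alan's disassembly above. The 5000-byte depth of the ->init call chain is a purely hypothetical figure, and the model ignores whatever load_module() itself calls.

```c
#include <stddef.h>

struct frames {
    size_t syscall_frame;  /* sys_init_module */
    size_t loader_frame;   /* load_module; 0 when inlined into the above */
    size_t init_chain;     /* everything below ->init(), hypothetical */
};

/* Before the patch: load_module is inlined, one 448-byte frame. */
static const struct frames inlined = { 448, 0, 5000 };

/* After the patch: 32-byte syscall frame, separate 464-byte loader frame. */
static const struct frames split = { 32, 464, 5000 };

/* Stack in use while ->init() runs. The loader frame has already been
 * popped, so only the syscall frame stays live under the init chain. */
static size_t depth_at_init(const struct frames *f)
{
    return f->syscall_frame + f->init_chain;
}

/* Stack in use while the module is being loaded. */
static size_t depth_at_load(const struct frames *f)
{
    return f->syscall_frame + f->loader_frame;
}
```

Under these assumptions the depth during ->init drops from 5448 to 5032 bytes, while the depth during loading grows slightly, from 448 to 496 bytes. The static total went up, but the deep path got shorter.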

> With the patch you provide, I /was/ able to repeatedly boot OK (latest
> tree, and I also ran the patch against the 2.6.27-rc3-based kernel I was
> having problems with initially, and that booted OK as well).

I had actually already committed it, because it was correct regardless
(and gcc really is a total ass for doing that inlining to begin with), but
it's good to have verification that the behaviour you saw was literally
about this thing.

I'll look at your vmlinux binary to see what else sucks from a stack depth
standpoint, but one of the problems in this whole thing is that the
stack usage is obviously both a static thing (with some functions using
_way_ too much stack!) _and_ a dynamic thing (with the total stack use
being not about any individual function, but the whole chain).

My patch obviously doesn't change the static stack usage, it just moves it
around a bit so that it's no longer on that same deep path, so the dynamic
stack usage is much less.

But I'll look at your vmlinux, see what stands out.

Linus

2008-08-25 20:45:44

by Arjan van de Ven

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
> On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
>> Mine has:
>>
>> Dump of assembler code for function sys_init_module:
>> 0xffffffff802688c4 <sys_init_module+4>: sub $0x1c0,%rsp
>>
>> so 448 bytes.
>
> Yeah, your build seems to have consistently bigger stack usage, and that
> may be due to some config option, but most likely it's a compiler version
> issue.
>

I wonder if we ought to have a light version of "make checkstack" always run,
but in such a way that we make a file with "limits" on the stack usage for key
functions (and we can grow this list over time when we learn about critical ones)..
and either warn very loudly or even fail the build if we're way over what could work.
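
A rough sketch of what such a limits check could look like follows. Everything in it is an assumption made for illustration: the budget values, the "more than twice the budget fails" rule, and the names stack_limit and check_stack() are not part of "make checkstack" or of any existing kernel tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function stack budgets, in bytes (illustrative values). */
struct stack_limit {
    const char *func;
    unsigned int max_bytes;
};

static const struct stack_limit limits[] = {
    { "sys_init_module",        256 },
    { "smp_call_function_mask", 512 },
};

enum verdict { STACK_OK, STACK_WARN, STACK_FAIL };

/*
 * Compare one measured frame size (as a checkstack-style tool would report
 * it) against its budget: over budget warns, over twice the budget fails.
 * Functions without an entry in the table are not checked.
 */
enum verdict check_stack(const char *func, unsigned int bytes)
{
    size_t i;

    assert(func != NULL);
    for (i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
        if (strcmp(limits[i].func, func) != 0)
            continue;
        if (bytes > 2 * limits[i].max_bytes)
            return STACK_FAIL;
        if (bytes > limits[i].max_bytes)
            return STACK_WARN;
        return STACK_OK;
    }
    return STACK_OK;
}
```

With these made-up budgets, the 448-byte sys_init_module frame quoted above would produce a warning, while a frame several times over its budget would fail the build.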

2008-08-25 20:53:17

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Mon, 25 Aug 2008, Linus Torvalds wrote:
>
> But I'll look at your vmlinux, see what stands out.

Oops. I already see the problem.

Your .config has some _huge_ CPU count, doesn't it?

checkstack.pl shows these things as the top problems:

0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
0xffffffff8021e884 setup_IO_APIC_irq [vmlinux]: 1616
0xffffffff8021ee24 arch_setup_ht_irq [vmlinux]: 1600
0xffffffff8021f144 arch_setup_msi_irq [vmlinux]: 1600
0xffffffff8021e3b0 __assign_irq_vector [vmlinux]: 1592
0xffffffff8021e626 __assign_irq_vector [vmlinux]: 1592
0xffffffff8023257e move_task_off_dead_cpu [vmlinux]: 1592
0xffffffff802326e8 move_task_off_dead_cpu [vmlinux]: 1592
0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]:1544
0xffffffff8025dcb4 tick_handle_oneshot_broadcast [vmlinux]:1544
0xffffffff803f3dc4 store_scaling_governor [vmlinux]: 1376
0xffffffff80279ef4 cpuset_write_resmask [vmlinux]: 1360
0xffffffff803f465d cpufreq_add_dev [vmlinux]: 1352
0xffffffff803f495b cpufreq_add_dev [vmlinux]: 1352
0xffffffff803f3fc4 store_scaling_max_freq [vmlinux]: 1328
0xffffffff803f4064 store_scaling_min_freq [vmlinux]: 1328
0xffffffff803f44c4 cpufreq_update_policy [vmlinux]: 1328
..

and sys_init_module is actually way way down the list. I bet the only
reason it showed up at all was because dynamically it was such a deep
callchain, and part of that callchain probably called some of those really
nasty things.

Anyway, the reason smp_call_function_mask and friends have such _huge_
stack usages for you is that they contain a 'cpumask_t' on the stack.

For example, for me, using a sane NR_CPUS, the size of the stack frame for
smp_call_function_mask is under 200 bytes. For you, it's 2736 bytes.

How about you make CONFIG_NR_CPUS something _sane_? Like 16? Or do you
really have four thousand CPU's in that system?

Oh, I guess you have the MAXSMP config enabled? I really think that was a
bit too aggressive.

Linus

2008-08-25 21:16:18

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Mon, 25 Aug 2008, Linus Torvalds wrote:
>
> checkstack.pl shows these things as the top problems:
>
> 0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
> 0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
> 0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
>
> Anyway, the reason smp_call_function_mask and friends have such _huge_
> stack usages for you is that they contain a 'cpumask_t' on the stack.

In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in
size. And they tend to call each other.

Quite frankly, I don't think we were really ready for 4k CPU's. I'm going
to commit this patch to make sure others don't do that many CPU's by
mistake. It marks MAXSMP as being 'broken' so you cannot select it, and
also limits the number of CPU's that you _can_ select to "just" 512.

Right now, 4k cpu's is known broken because of the stack usage. I'm not
willing to debug more of these kinds of stack smashers, they're really
nasty to work with. I wonder how many other random failures these have
been involved with?

This patch also makes the ifdef mess in Kconfig much cleaner and avoids
duplicate definitions by just conditionally suppressing the question and
giving higher defaults.

We can enable MAXSMP and raise the CPU limits some time in the future. But
that future is not going to be before 2.6.27 - the code simply isn't ready
for it.

The reason I picked 512 CPU's as the limit is that we _used_ to limit
things to 255. So it's higher than it used to be, but low enough to still
feel safe. Considering that a 4k-bit CPU mask (512 bytes) _almost_ worked,
the 512-bit (64 bytes) masks are almost certainly fine.
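
The sizes quoted here follow directly from rounding the CPU count up to whole machine words. A minimal user-space sketch of that arithmetic, where cpumask_longs() and cpumask_bytes() are hypothetical helper names and not kernel functions:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nr_cpus bits, rounded up. */
size_t cpumask_longs(size_t nr_cpus)
{
    assert(nr_cpus > 0);
    return (nr_cpus + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
}

/* Bytes one such mask occupies when it is a plain on-stack bitmap. */
size_t cpumask_bytes(size_t nr_cpus)
{
    return cpumask_longs(nr_cpus) * sizeof(unsigned long);
}
```

cpumask_bytes(4096) gives 512 and cpumask_bytes(512) gives 64, matching the figures above. For NR_CPUS=16 the mask fits in a single unsigned long, which is why small configurations never notice the cost.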

Still, sane people should limit their NR_CPUS to 8 or 16 or something like
that. Very very few people really need the pain of big NR_CPUS. Not even
"just" 512 CPU's.

Travis, Ingo and Thomas cc'd, since they were involved in the original
commit (1184dc2ffe2c8fb9afb766d870850f2c3165ef25) that raised the limit.

Linus

---
arch/x86/Kconfig | 30 ++++++++----------------------
1 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 68d91c8..ed92864 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -577,35 +577,29 @@ config SWIOTLB

config IOMMU_HELPER
def_bool (CALGARY_IOMMU || GART_IOMMU || SWIOTLB || AMD_IOMMU)
+
config MAXSMP
bool "Configure Maximum number of SMP Processors and NUMA Nodes"
- depends on X86_64 && SMP
+ depends on X86_64 && SMP && BROKEN
default n
help
Configure maximum number of CPUS and NUMA Nodes for this architecture.
If unsure, say N.

-if MAXSMP
-config NR_CPUS
- int
- default "4096"
-endif
-
-if !MAXSMP
config NR_CPUS
- int "Maximum number of CPUs (2-4096)"
- range 2 4096
+ int "Maximum number of CPUs (2-512)" if !MAXSMP
+ range 2 512
depends on SMP
+ default "4096" if MAXSMP
default "32" if X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000
default "8"
help
This allows you to specify the maximum number of CPUs which this
- kernel will support. The maximum supported value is 4096 and the
+ kernel will support. The maximum supported value is 512 and the
minimum value which makes sense is 2.

This is purely to save memory - each supported CPU adds
approximately eight kilobytes to the kernel image.
-endif

config SCHED_SMT
bool "SMT (Hyperthreading) scheduler support"
@@ -996,17 +990,10 @@ config NUMA_EMU
into virtual nodes when booted with "numa=fake=N", where N is the
number of nodes. This is only useful for debugging.

-if MAXSMP
-
config NODES_SHIFT
- int
- default "9"
-endif
-
-if !MAXSMP
-config NODES_SHIFT
- int "Maximum NUMA Nodes (as a power of 2)"
+ int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
range 1 9 if X86_64
+ default "9" if MAXSMP
default "6" if X86_64
default "4" if X86_NUMAQ
default "3"
@@ -1014,7 +1001,6 @@ config NODES_SHIFT
help
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accomodate various tables.
-endif

config HAVE_ARCH_BOOTMEM_NODE
def_bool y

2008-08-25 21:30:20

by Alan D. Brunelle

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> But I'll look at your vmlinux, see what stands out.
>
> Oops. I already see the problem.
>
> Your .config has some _huge_ CPU count, doesn't it?
>
> checkstack.pl shows these things as the top problems:
>
> 0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
> 0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
> 0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
> 0xffffffff8021e884 setup_IO_APIC_irq [vmlinux]: 1616
> 0xffffffff8021ee24 arch_setup_ht_irq [vmlinux]: 1600
> 0xffffffff8021f144 arch_setup_msi_irq [vmlinux]: 1600
> 0xffffffff8021e3b0 __assign_irq_vector [vmlinux]: 1592
> 0xffffffff8021e626 __assign_irq_vector [vmlinux]: 1592
> 0xffffffff8023257e move_task_off_dead_cpu [vmlinux]: 1592
> 0xffffffff802326e8 move_task_off_dead_cpu [vmlinux]: 1592
> 0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]:1544
> 0xffffffff8025dcb4 tick_handle_oneshot_broadcast [vmlinux]:1544
> 0xffffffff803f3dc4 store_scaling_governor [vmlinux]: 1376
> 0xffffffff80279ef4 cpuset_write_resmask [vmlinux]: 1360
> 0xffffffff803f465d cpufreq_add_dev [vmlinux]: 1352
> 0xffffffff803f495b cpufreq_add_dev [vmlinux]: 1352
> 0xffffffff803f3fc4 store_scaling_max_freq [vmlinux]: 1328
> 0xffffffff803f4064 store_scaling_min_freq [vmlinux]: 1328
> 0xffffffff803f44c4 cpufreq_update_policy [vmlinux]: 1328
> ..
>
> and sys_init_module is actually way way down the list. I bet the only
> reason it showed up at all was because dynamically it was such a deep
> callchain, and part of that callchain probably called some of those really
> nasty things.
>
> Anyway, the reason smp_call_function_mask and friends have such _huge_
> stack usages for you is that they contain a 'cpumask_t' on the stack.
>
> For example, for me, using a sane NR_CPUS, the size of the stack frame for
> smp_call_function_mask is under 200 bytes. For you, it's 2736 bytes.
>
> How about you make CONFIG_NR_CPUS something _sane_? Like 16? Or do you
> really have four thousand CPU's in that system?
>
> Oh, I guess you have the MAXSMP config enabled? I really think that was a
> bit too aggressive.
>
> Linus

This probably all started when I was working on a software tool (aiod)
that was failing because somebody ELSE had 4,096 CPUs configured.
[[Seems that glibc had (has?) its max CPU value set to 1,024 (bits/sched.h
__CPU_SETSIZE), so when you issue system calls like sched_getaffinity,
it will "fail" for systems configured w/ 4,096 CPUs. I worked around it
by simply ignoring the glibc values and allocating larger and larger
CPU masks until it worked.]]

I think you're right: the kernel as a whole may not be ready for 4,096
CPUs apparently...

Thanks for taking the time to look into this...

Alan

2008-08-25 22:09:15

by Christoph Lameter

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Alan D. Brunelle wrote:

> I think you're right: the kernel as a whole may not be ready for 4,096
> CPUs apparently...

Mike has been working diligently on getting all these cpumasks off the stack
for the last months and has created an infrastructure to do this. So I think
we are close. It might just be a matter of merging some more patches that are
still left in Ingo's tree.

2008-08-26 07:23:20

by Ingo Molnar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


* Linus Torvalds <[email protected]> wrote:

> On Mon, 25 Aug 2008, Linus Torvalds wrote:
> >
> > checkstack.pl shows these things as the top problems:
> >
> > 0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
> > 0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
> > 0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
> >
> > Anyway, the reason smp_call_function_mask and friends have such _huge_
> > stack usages for you is that they contain a 'cpumask_t' on the stack.
>
> In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in
> size. And they tend to call each other.
>
> Quite frankly, I don't think we were really ready for 4k CPU's. I'm
> going to commit this patch to make sure others don't do that many
> CPU's by mistake. It marks MAXSMP as being 'broken' so you cannot
> select it, and also limits the number of CPU's that you _can_ select
> to "just" 512.

yeah, that's OK i guess - distros can still enable 4K support if they
wish to. Someone interested in improving the stack footprint situation
should dust off the max-stack-footprint tracer so that we can catch
these things in a more structured way.

And i guess the next generation of 4K CPUs support should just get away
from cpumask_t-on-kernel-stack model altogether, as the current model is
not maintainable. We tried the on-kernel-stack variant, and it really
does not work reliably. We can fix this in v2.6.28.

Ingo

2008-08-26 07:46:20

by David Miller

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

From: Ingo Molnar <[email protected]>
Date: Tue, 26 Aug 2008 09:22:20 +0200

> And i guess the next generation of 4K CPUs support should just get away
> from cpumask_t-on-kernel-stack model altogether, as the current model is
> not maintainable. We tried the on-kernel-stack variant, and it really
> does not work reliably. We can fix this in v2.6.28.

I recently did some work on sparc64 to use cpumask pointers
as much as possible.

The only case that didn't work was due to a limitation in
arch interfaces for the new generic smp_call_function() code.
It passes a cpumask_t instead of a pointer to one via
arch_send_call_function_ipi().

But other than that, the whole sparc64 SMP stuff uses cpumask_t
pointers only.

What it comes down to is that you have to do the "self cpu"
and other tests in the cross-call dispatch routines themselves,
instead of at the top-level working on cpumask_t objects.

Otherwise you have to modify cpumask_t objects and thus pluck
them onto the stack where they take up silly amounts of space.
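
The pattern described here can be sketched in user space as follows. This is an illustration under stated assumptions, not the sparc64 code: struct cpu_bitmap, cpu_set_bit(), cross_call() and the send_ipi hook are hypothetical names, and NR_CPUS_SKETCH is an assumed build-time maximum.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4096    /* hypothetical build-time maximum */
#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS_SKETCH + LONG_BITS - 1) / LONG_BITS)

struct cpu_bitmap {
    unsigned long bits[MASK_LONGS];
};

static int cpu_is_set(const struct cpu_bitmap *mask, unsigned int cpu)
{
    return (int)((mask->bits[cpu / LONG_BITS] >> (cpu % LONG_BITS)) & 1UL);
}

void cpu_set_bit(struct cpu_bitmap *mask, unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    mask->bits[cpu / LONG_BITS] |= 1UL << (cpu % LONG_BITS);
}

/*
 * Dispatch to every CPU in *mask except the caller. The "self" test is done
 * here, per CPU, so no modified copy of the 512-byte mask is ever made.
 * send_ipi is a hypothetical per-CPU hook and may be NULL; the return value
 * is the number of CPUs that were targeted.
 */
unsigned int cross_call(const struct cpu_bitmap *mask, unsigned int self,
                        void (*send_ipi)(unsigned int cpu))
{
    unsigned int cpu, sent = 0;

    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        if (cpu == self || !cpu_is_set(mask, cpu))
            continue;
        if (send_ipi)
            send_ipi(cpu);
        sent++;
    }
    return sent;
}
```

The alternative, copying the mask by value, clearing the caller's bit in the copy and passing that on, puts a full mask in the frame of every function that does it.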

2008-08-26 07:55:10

by Ingo Molnar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


* David Miller <[email protected]> wrote:

> From: Ingo Molnar <[email protected]>
> Date: Tue, 26 Aug 2008 09:22:20 +0200
>
> > And i guess the next generation of 4K CPUs support should just get away
> > from cpumask_t-on-kernel-stack model altogether, as the current model is
> > not maintainable. We tried the on-kernel-stack variant, and it really
> > does not work reliably. We can fix this in v2.6.28.
>
> I recently did some work on sparc64 to use cpumask pointers as much
> as possible.
>
> The only case that didn't work was due to a limitation in arch
> interfaces for the new generic smp_call_function() code. It passes a
> cpumask_t instead of a pointer to one via
> arch_send_call_function_ipi().
>
> But other than that, the whole sparc64 SMP stuff uses cpumask_t
> pointers only.

nice!

> What it comes down to is that you have to do the "self cpu" and other
> tests in the cross-call dispatch routines themselves, instead of at
> the top-level working on cpumask_t objects.
>
> Otherwise you have to modify cpumask_t objects and thus pluck them
> onto the stack where they take up silly amounts of space.

What we did was this: we added MAXSMP which just revs up all the SMP
tunables to the maximum, so that we can see any problems early in
testing.

And we triggered problems, and we fixed a couple of regressions all
around stack footprint. But we didn't catch all of them - some were gcc
version dependent and configuration dependent. So i think it's safe to
say that the whole concept of allowing such a large cpumask_t to be on
the stack is fragile.

Hence, i think the best way forward is to change the whole cpumask_t
concept and disallow explicit masks altogether. It's so easy to smack a
cpumask_t variable on the stack and nothing really warns about it, and
any function can become part of a nested call sequence.

So i think the dynamics of it has to be changed: we need a get/put API
and we need to make on-stack cpumask illegal on the build level (in
generic code at least). This has been Rusty's main argument early on i
think, and i now concur.

Ingo

2008-08-26 08:00:27

by Ingo Molnar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


* Christoph Lameter <[email protected]> wrote:

> Alan D. Brunelle wrote:
>
> > I think you're right: the kernel as a whole may not be ready for 4,096
> > CPUs apparently...
>
> Mike has been working diligently on getting all these cpumasks off the
> stack for the last months and has created an infrastructure to do
> this. So I think we are close. It might just be a matter of merging
> some more patches that are still left in Ingo's tree.

hm, there are no such patches left that i know of - the only bits in
-tip are the zero-based percpu, which was found to be a bit fragile in
testing:

earth4:~/tip> git-log-line --author=Travis linus..
d379497: Zero based percpu: infrastructure to rebase the per cpu area to zero
b3a0cb4: x86: extend percpu ops to 64 bit

[and it has no relevance to stack footprint.]

So i don't think the current cpumask_t approach will work. We simply
should not get into an endless fight against the windmills that
introduce on-stack cpumask_t again and again. We should just take the
plunge once and do a clean alloc/free cpumask model. Most of the hotpath
cpumasks are constant or pre-constructed, so they are not a real issue.

Plus, on the general question of stack footprint problems and the
difficulty of debugging them, the worst-case stack footprint tracer i
wrote for -rt some time ago should be dusted off as well and put into
ftrace. David has something quite close to that for Sparc64 already.

Ingo

2008-08-26 08:36:19

by Yinghai Lu

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 12:53 AM, Ingo Molnar <[email protected]> wrote:
>
> * David Miller <[email protected]> wrote:
>
>> From: Ingo Molnar <[email protected]>
>> Date: Tue, 26 Aug 2008 09:22:20 +0200
>>
>> > And i guess the next generation of 4K CPUs support should just get away
>> > from cpumask_t-on-kernel-stack model altogether, as the current model is
>> > not maintainable. We tried the on-kernel-stack variant, and it really
>> > does not work reliably. We can fix this in v2.6.28.
>>
>> I recently did some work on sparc64 to use cpumask pointers as much
>> as possible.
>>
>> The only case that didn't work was due to a limitation in arch
>> interfaces for the new generic smp_call_function() code. It passes a
>> cpumask_t instead of a pointer to one via
>> arch_send_call_function_ipi().
>>
>> But other than that, the whole sparc64 SMP stuff uses cpumask_t
>> pointers only.

wonder if could use "unsigned long *" directly.
so we could use dyn_array directly, like

int cpumask_size;

unsigned long *online_cpu_map;
DEFINE_DYN_ARRAY(online_cpu_map, sizeof(unsigned long), cpumask_size,
PAGE_SIZE, NULL);

and after nr_cpu_ids is assigned, have
cpumask_size = (nr_cpu_ids + BITS_PER_LONG - 1)/BITS_PER_LONG;

then we could boot an NR_CPUS=4096 kernel on a small system. ...

YH

2008-08-26 11:55:26

by Rusty Russell

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tuesday 26 August 2008 06:43:03 Linus Torvalds wrote:
> So now load_module() will still use almost 500 bytes of stack

Hmm, wants neatening anyway; I'll see if I can reduce stack usage side effect.

Your workaround is very random, and that scares me. I think a huge number of
CPUs needs a real solution (an actual cpumask allocator, then do something
clever if we come across an actual fastpath).

Thanks,
Rusty.

2008-08-26 16:54:20

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Yinghai Lu wrote:
>
> wonder if could use "unsigned long *" directly.

I would actually suggest something like this:

- we continue to have a magic "cpumask_t".

- we do different cases for big and small NR_CPUS:

#if NR_CPUS <= BITS_PER_LONG

/*
* Make it an array - that way passing it as an argument will
* always pass it as a pointer!
*/
typedef unsigned long cpumask_t[1];

static inline void create_cpumask(cpumask_t *p)
{
(*p)[0] = 0;
}
static inline void free_cpumask(cpumask_t *p)
{
}

#else

typedef unsigned long *cpumask_t;

static inline void create_cpumask(cpumask_t *p)
{
*p = kcalloc(..);
}

static inline void free_cpumask(cpumask_t *p)
{
kfree(*p);
}

#endif

and now after you do this, you can just do something like

cpumask_t mycpu;

create_cpumask(&mycpu);
..
free_cpumask(&mycpu);

and in between, you can use 'cpumask' as a pointer, because even when it
is an array directly allocated on the stack, the array can always
degenerate into a pointer by C type rules!

And for the small-NR_CPUS case there is zero overhead.

Linus
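
For reference, the proposal above can be turned into something that compiles in user space. This is a sketch under assumptions, not kernel code: calloc()/free() stand in for kcalloc()/kfree(), create_cpumask() returns an int so allocation failure can be reported, NR_CPUS defaults to 4096 here, and cpumask_set()/cpumask_test() are hypothetical helpers added to show the pointer use.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#ifndef NR_CPUS
#define NR_CPUS 4096    /* assumed for this sketch; try -DNR_CPUS=16 */
#endif

/* sizeof cannot be used in #if, so derive the word size from ULONG_MAX. */
#if ULONG_MAX == 0xffffffffUL
#define BITS_PER_LONG 32
#else
#define BITS_PER_LONG 64
#endif

#define CPUMASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

#if NR_CPUS <= BITS_PER_LONG

/* Small case: a one-word array that lives directly in the caller's frame. */
typedef unsigned long cpumask_t[1];

static inline int create_cpumask(cpumask_t *p)
{
    (*p)[0] = 0;    /* index the array: arrays are not assignable as a whole */
    return 0;
}

static inline void free_cpumask(cpumask_t *p)
{
    (void)p;        /* nothing to release */
}

#else

/* Large case: only a pointer sits on the stack; the bits are heap-allocated. */
typedef unsigned long *cpumask_t;

static inline int create_cpumask(cpumask_t *p)
{
    *p = calloc(CPUMASK_LONGS, sizeof(unsigned long));
    return *p ? 0 : -1;
}

static inline void free_cpumask(cpumask_t *p)
{
    free(*p);
    *p = NULL;
}

#endif

/* Works for both variants: the mask is usable as an unsigned long pointer. */
static inline void cpumask_set(unsigned long *mask, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    mask[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static inline int cpumask_test(const unsigned long *mask, unsigned int cpu)
{
    return (int)((mask[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}
```

A caller declares "cpumask_t mycpu;", calls create_cpumask(&mycpu), passes mycpu to cpumask_set() and cpumask_test(), and finishes with free_cpumask(&mycpu). The same source works in both configurations because the one-element array decays to a pointer when passed as an argument.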

2008-08-26 17:08:33

by Yinghai Lu

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 9:51 AM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Tue, 26 Aug 2008, Yinghai Lu wrote:
>>
>> wonder if could use "unsigned long *" directly.
>
> I would actually suggest something like this:
>
> - we continue to have a magic "cpumask_t".
>
> - we do different cases for big and small NR_CPUS:
>
> #if NR_CPUS <= BITS_PER_LONG
>
> /*
> * Make it an array - that way passing it as an argument will
> * always pass it as a pointer!
> */
> typedef unsigned long cpumask_t[1];
>
> static inline void create_cpumask(cpumask_t *p)
> {
> (*p)[0] = 0;
> }
> static inline void free_cpumask(cpumask_t *p)
> {
> }
>
> #else
>
> typedef unsigned long *cpumask_t;
>
> static inline void create_cpumask(cpumask_t *p)
> {
> *p = kcalloc(..);
> }
>
> static inline void free_cpumask(cpumask_t *p)
> {
> kfree(*p);
> }
>
> #endif
>
> and now after you do this, you can just do something like
>
> cpumask_t mycpu;
>
> create_cpumask(&mycpu);
> ..
> free_cpumask(&mycpu);
>
> and in between, you can use 'cpumask' as a pointer, because even when it
> is an array directly allocated on the stack, the array can always
> degenerate into a pointer by C type rules!
>

that is good for local variables.

for global variables, need to allocate them in some point. may need one
int cpumask_size;

cpumask_t online_cpu_map;
DEFINE_DYN_ARRAY(online_cpu_map, sizeof(unsigned long), cpumask_size,
PAGE_SIZE, NULL);

or something like that.

YH

2008-08-26 17:35:37

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Rusty Russell wrote:
>
> Your workaround is very random, and that scares me. I think a huge number of
> CPUs needs a real solution (an actual cpumask allocator, then do something
> clever if we come across an actual fastpath).

The thing is, the inlining thing is a separate issue.

Yes, the cpumasks were what made stack pressure so critical to begin with,
but no, a 400-byte stack frame in a deep callchain isn't acceptable
_regardless_ of any cpumask_t issues.

Gcc inlining is a total and utter pile of shit. And _that_ is the problem.
I seriously think we shouldn't allow gcc to inline anything at all unless
we tell it to. That's how it used to work, and quite frankly, that's how
it _should_ work.

The downsides of inlining are big enough from both a debugging and a real
code generation angle (eg stack usage like this), that the upsides
(_sometimes_ smaller kernel, possibly slightly faster code) simply aren't
relevant.

So the "noinline" was random, yes, but this is a real issue. Looking at
checkstack output for a saner config (NR_CPUS=16), the top entries for me
are things like

ide_generic_init [vmlinux]: 1384
idefloppy_ioctl [vmlinux]: 1208
e1000_check_options [vmlinux]: 1152
...

which are "leaf" functions. They are broken as hell (the e1000 is
apparently because it builds structs on the stack that should all be
"static const", for example), but they are different from something like
the module init sequence in that they are not going to affect anything
else.

It would be interesting to see what "-fno-default-inline" does to the
kernel. It really would get rid of a _lot_ of gcc version issues too.
Inlining behavior of gcc has long been a problem for us.

Linus

2008-08-26 18:31:50

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 10:35:05AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 26 Aug 2008, Rusty Russell wrote:
> >
> > Your workaround is very random, and that scares me. I think a huge number of
> > CPUs needs a real solution (an actual cpumask allocator, then do something
> > clever if we come across an actual fastpath).
>
> The thing is, the inlining thing is a separate issue.
>
> Yes, the cpumasks were what made stack pressure so critical to begin with,
> but no, a 400-byte stack frame in a deep callchain isn't acceptable
> _regardless_ of any cpumask_t issues.
>
> Gcc inlining is a total and utter pile of shit. And _that_ is the problem.
> I seriously think we shouldn't allow gcc to inline anything at all unless
> we tell it to. That's how it used to work, and quite frankly, that's how
> it _should_ work.
>
> The downsides of inlining are big enough from both a debugging and a real
> code generation angle (eg stack usage like this), that the upsides
> (_sometimes_ smaller kernel, possibly slightly faster code) simply aren't
> relevant.
>...
> It would be interesting to see what "-fno-default-inline" does to the
> kernel. It really would get rid of a _lot_ of gcc version issues too.
> Inlining behavior of gcc has long been a problem for us.

I added "-fno-inline-functions-called-once -fno-early-inlining" to
KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel
image by 2%.

And when David's "-fwhole-program --combine" will become ready the cost
of disallowing gcc to inline functions will most likely increase.

A debugging option (for better traces) to disallow gcc some inlining
might make sense (and might even make sense for distributions to
enable in their kernels), but when you go to use cases that require
really small kernels the cost is too high.

But if you don't trust gcc's inlining you should revert
commit 3f9b5cc018566ad9562df0648395649aebdbc5e0 that increases gcc's
freedom regarding what to inline in 2.6.27 - what gcc 4.2 does in the
case of the regression tracked as Bugzilla #11276 is really not funny
(two callers -> function not inlined; gcc seems to emit the function
although both callers later get removed (or at least should be removed)
by dead code elimination).

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-26 18:42:30

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Adrian Bunk wrote:
>
> A debugging option (for better traces) to disallow gcc some inlining
> might make sense (and might even make sense for distributions to
> enable in their kernels), but when you go to use cases that require
> really small kernels the cost is too high.

You ignore the fact that it's really not just about debugging.

Inlining really isn't the great tool some people think it is. Especially
not since gcc stack allocation is so horrid that it won't re-use stack
slots etc (which I don't disagree with per se - it's _hard_ to re-use
stack slots while still allowing code scheduling).

NOTE! I also would never claim that _our_ choices of "inline" are all that
great, and we've often inlined too much or not inlined things that really
could be inlined. But at least when a developer says "inline" (or forgets
to say it), we have somebody to blame. When the compiler does insane
things that don't suit us, we're just screwed.

> But if you don't trust gcc's inlining you should revert
> commit 3f9b5cc018566ad9562df0648395649aebdbc5e0 that increases gcc's
> freedom regarding what to inline in 2.6.27

Actually, that just allows gcc to _not_ inline. Which is probably ok.

(Well, it would be ok if gcc did it well enough, it obviously has some
problems at times).

Linus

2008-08-26 18:48:21

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Adrian Bunk wrote:
>
> I added "-fno-inline-functions-called-once -fno-early-inlining" to
> KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel
> image by 2%.

Btw, did you check with just "-fno-inline-functions-called-once"?

The -fearly-inlining decisions _should_ be mostly right. If gcc sees early
that a function is so small (even without any constant propagation etc)
that it can be inlined, it's probably right.

The inline-functions-called-once thing is what causes even big functions
to be inlined, and that's where you find the big downsides too (eg the
stack usage).

Linus

2008-08-26 19:02:36

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> checkstack.pl shows these things as the top problems:
>>
>> 0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
>> 0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
>> 0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
>>
>> Anyway, the reason smp_call_function_mask and friends have such _huge_
>> stack usages for you is that they contain a 'cpumask_t' on the stack.
>
> In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in
> size. And they tend to call each other.
>
> Quite frankly, I don't think we were really ready for 4k CPU's. I'm going
> to commit this patch to make sure others don't do that many CPU's by
> mistake. It marks MAXSMP as being 'broken' so you cannot select it, and
> also limits the number of CPU's that you _can_ select to "just" 512.
>
> Right now, 4k cpu's is known broken because of the stack usage. I'm not
> willing to debug more of these kinds of stack smashers, they're really
> nasty to work with. I wonder how many other random failures these have
> been involved with?
>
> This patch also makes the ifdef mess in Kconfig much cleaner and avoids
> duplicate definitions by just conditionally suppressing the question and
> giving higher defaults.
>
> We can enable MAXSMP and raise the CPU limits some time in the future. But
> that future is not going to be before 2.6.27 - the code simply isn't ready
> for it.
>
> The reason I picked 512 CPU's as the limit is that we _used_ to limit
> things to 255. So it's higher than it used to be, but low enough to still
> feel safe. Considering that a 4k-bit CPU mask (512 bytes) _almost_ worked,
> the 512-bit (64 bytes) masks are almost certainly fine.
>
> Still, sane people should limit their NR_CPUS to 8 or 16 or something like
> that. Very very few people really need the pain of big NR_CPUS. Not even
> "just" 512 CPU's.
>
> Travis, Ingo and Thomas cc'd, since they were involved in the original
> commit (1184dc2ffe2c8fb9afb766d870850f2c3165ef25) that raised the limit.
>
> Linus

Hi Linus,

[Sorry for the long winded response, but I felt that sufficient background
is needed to address this issue.... YOMV :-)]

The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is
critical to our upcoming SGI systems using what we have been calling
"UV". This system will be capable of supporting 4096 cpu threads in a
single system image (and 16k cpus/2k nodes is right around the corner).
While obviously I cannot divulge too many details, it's sufficient to
say there are customers who not only require this extended capability,
but are extremely excited about it.

But the nature of some of these system environments is that they will
not accept a specially built kernel, but only a kernel that has been
built and certified (both from the application standpoint as well as
the security standpoint) by standard distributions. And you probably
know how extensively these distributions test and certify for many known
defects and absolutely require that incoming source changes come from
the community supported source bases, primarily yours.

Due to the lead time required to accomplish these certifications,
the version of the distributions that will be available when this
system releases will be based on 2.6.27. (They will allow patches
"post-2.6.27-rc.final" as long as those are committed in the source base.)
The two distributions that SGI supports for our customers are SLES
(SUSE Linux Enterprise Server) and RHEL (Red Hat Enterprise Linux).
[They, of course, are free to run any OS of their choosing, but SGI only
provides front line support for those two.]

I started last August to begin analyzing how to accomplish the above goals
and where exactly are the hot spots in the kernel that would require
attention. It quickly became clear that cpumask_t and nodemask_t are
two variables that are very casually used (along with NR_CPUS), because
the assumption was that 64 "was more than sufficient" for an upper limit
and even extending it to 128 or 255 (254 was the maximum IPI broadcast
ID until x2apic), only added a few more bytes here and there.
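
To put numbers on that assumption: a cpumask is a bitmap with one bit per
possible CPU, so its size grows linearly with NR_CPUS. A minimal userspace
sketch of the arithmetic (mask_bytes() is a hypothetical helper for
illustration, not a kernel function):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits held by one unsigned long (64 on x86_64). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Bytes needed for a bitmap with one bit per possible CPU, rounded up
 * to a whole number of longs. Hypothetical helper, for illustration only.
 */
static size_t mask_bytes(size_t nr_cpus)
{
	size_t longs;

	assert(nr_cpus > 0);
	longs = (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
	return longs * sizeof(unsigned long);
}
```

With 64-bit longs, mask_bytes(64) is 8, mask_bytes(255) is 32 and
mask_bytes(4096) is 512, so every by-value copy or on-stack temporary
costs 64 times more at 4096 cpus than at 64.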

I chose not to introduce too many dramatic changes and instead analyzed
every instance where cpumask_t and NR_CPUS were being used (along with the
node counterparts). An initial proposal was to allow the default stack
size to be increased, but this was met with a lot of objections because
of the extensive work that was done to bring it to its current size.

So in summary, the goals of the changes that I have been making since
last October are:

1. Allow a "typical distro" configured kernel with NR_CPUS=4096
and NODES_SHIFT=9 to be booted on an x86_64 system with 2GB
of memory. (Some thought was given to use a 512MB laptop
as the base, but because of other memory bloat from using a
64-bit kernel, that was considered not very useful.)

[Note I frequently use an allyes and allmod config for
testing.]

2. To lessen as much as possible, the impact on memory usage for
that same kernel on that same system.

3. To lessen as much as possible, the impact on system performance
for that same kernel on that same system. [Which mostly
depended on #2.]

I booted the first 4096 cpu kernel last February, and since around March
or April, Ingo has been (build and boot) testing the x86 branch using
"MAXSMP" to trigger the increased defaults quite extensively (IIRC, it
was somewhere between 75% and 90% of all kernels built.) We here at SGI
nightly build 4 trees (linux-2.6, linux-next, linux-mm, linux-x86) to
ensure new checkins don't conflict with changes we've made in the past.
Unfortunately, our run testing wasn't sufficient to catch this latest
error (and I will be quickly fixing that.)

I will also revisit all the past areas to analyze if there have been
other abuses of stack and memory space added since the 4k cpu limit
was "certified" as usable and releasable. (See below for an initial
survey of size increases between a 512cpu/64node configuration and a
4096cpu/512node configuration.)

So perhaps "MAXSMP" is not needed (or perhaps should be more hidden to
reduce accidental uses), but allowing the defaults listed above to be in
the standard x86/Kconfig ensures that the distros can at least attempt
certification with the maximally configured kernels for their enterprise
editions of Linux.

There are many more changes that will be proposed for the 2.6.28 window.
Most certainly your concerns, as well as others, about how to change the
current "cpumask paradigm" to be more easily manageable for systems with
huge cpu counts, will be visited. (And surely be well discussed. :-)

Thanks,
Mike
---

linux-2.6: v2.6.27-rc4-176-gb8e6c91

====== Data (-l 500)
... files 2 vars 1421 all 0 lim 500 unch 0

1 - 512-64-allmodconfig
2 - 4096-512-allmodconfig

.1. .2. ..final..
1671168 +3899392 5570560 +233% irq_desc(.data.cacheline_aligned)
591872 +3899392 4491264 +658% irq_cfg(.data.read_mostly)
76800 +537600 614400 +700% early_node_map(.init.data)
66176 +462336 528512 +698% init_mem_cgroup(.bss)
65536 +458752 524288 +700% boot_pageset(.bss)
63648 +419328 482976 +658% kmalloc_caches(.data.cacheline_aligned)
15328 +61376 76704 +400% def_root_domain(.bss)
10240 +43008 53248 +420% change_point_list(.init.data)
8760 +504 9264 +5% init_task(.data)
8192 +57344 65536 +700% kgdb_info(.bss)
6404 +26880 33284 +419% e820_saved(.bss)
6404 +26880 33284 +419% e820(.bss)
6400 +26880 33280 +420% new_bios(.init.data)
5120 +35840 40960 +700% node_devices(.bss)
5120 +21504 26624 +420% change_point(.init.data)
4160 +29120 33280 +700% cpu_bit_bitmap(.rodata)
4096 +28672 32768 +700% __cpu_pda(.init.data)
3776 +25088 28864 +664% hstates(.bss)
3584 +25088 28672 +700% bootmem_node_data(.init.data)
2560 +10752 13312 +420% overlap_list(.init.data)
2048 +14336 16384 +700% x86_cpu_to_node_map_early_map(.init.data)
2048 +14336 16384 +700% was_in_debug_nmi(.bss)
2048 +14336 16384 +700% rio_devs(.init.data)
2048 +14336 16384 +700% passive_cpu_wait(.bss)
2048 +14336 16384 +700% node_memblk_range(.init.data)
2048 +14336 16384 +700% ints(.init.data)
2048 +14336 16384 +700% cpu_in_kgdb(.bss)
1024 +7168 8192 +700% x86_cpu_to_apicid_early_map(.init.data)
1024 +7168 8192 +700% x86_bios_cpu_apicid_early_map(.init.data)
1024 +1024 2048 +100% pxm_to_node_map(.data)
1024 +7168 8192 +700% nodes_add(.bss)
1024 +7168 8192 +700% nodes(.init.data)
512 +3584 4096 +700% zone_movable_pfn(.init.data)
512 +3584 4096 +700% tvec_base_done(.data)
512 +3584 4096 +700% scal_devs(.init.data)
512 +3584 4096 +700% node_data(.data.read_mostly)
512 +3584 4096 +700% memblk_nodeid(.init.data)
0 +2048 2048 . node_to_pxm_map(.data)
0 +2048 2048 . node_order(.bss)
0 +2048 2048 . node_load(.bss)
0 +2048 2048 . fake_node_to_pxm_map(.init.data)
0 +768 768 . rcu_ctrlblk(.data)
0 +768 768 . rcu_bh_ctrlblk(.data)
0 +768 768 . per_cpu__cpu_info(.data.percpu)
0 +768 768 . boot_cpu_data(.data.read_mostly)
0 +760 760 . per_cpu__phys_domains(.data.percpu)
0 +760 760 . per_cpu__node_domains(.data.percpu)
0 +760 760 . per_cpu__cpu_domains(.data.percpu)
0 +760 760 . per_cpu__core_domains(.data.percpu)
0 +760 760 . per_cpu__allnodes_domains(.data.percpu)
0 +720 720 . top_cpuset(.data)
0 +640 640 . per_cpu__flush_state(.data.percpu)
0 +632 632 . pit_clockevent(.data)
0 +632 632 . per_cpu__lapic_events(.data.percpu)
0 +632 632 . lapic_clockevent(.data)
0 +632 632 . hpet_clockevent(.data)
0 +616 616 . net_dma(.data)
0 +579 579 . do_migrate_pages(.text)
0 +568 568 . irq2(.data)
0 +568 568 . irq0(.data)
0 +528 528 . per_cpu__sched_group_phys(.data.percpu)
0 +528 528 . per_cpu__sched_group_cpus(.data.percpu)
0 +528 528 . per_cpu__sched_group_core(.data.percpu)
0 +528 528 . per_cpu__sched_group_allnodes(.data.percpu)
0 +520 520 . out_of_memory(.text)
0 +520 520 . nohz(.data)
0 +512 512 . tick_broadcast_oneshot_mask(.bss)
0 +512 512 . tick_broadcast_mask(.bss)
0 +512 512 . prof_cpu_mask(.data)
0 +512 512 . per_cpu__local_cpu_mask(.data.percpu)
0 +512 512 . per_cpu__cpu_sibling_map(.data.percpu)
0 +512 512 . per_cpu__cpu_core_map(.data.percpu)
0 +512 512 . nohz_cpu_mask(.bss)
0 +512 512 . mce_device_initialized(.bss)
0 +512 512 . mce_cpus(.bss)
0 +512 512 . marked_cpus(.bss)
0 +512 512 . kmem_cach_cpu_free_init_once(.bss)
0 +512 512 . irq_default_affinity(.data)
0 +512 512 . frozen_cpus(.bss)
0 +512 512 . fallback_doms(.bss)
0 +512 512 . cpu_singlethread_map(.data.read_mostly)
0 +512 512 . cpu_sibling_setup_map(.bss)
0 +512 512 . cpu_present_map(.data.read_mostly)
0 +512 512 . cpu_possible_map(.bss)
0 +512 512 . cpu_populated_map(.data.read_mostly)
0 +512 512 . cpu_online_map(.data.read_mostly)
0 +512 512 . cpu_mask_none(.bss)
0 +512 512 . cpu_mask_all(.data.read_mostly)
0 +512 512 . cpu_isolated_map(.bss)
0 +512 512 . cpu_initialized(.data)
0 +512 512 . cpu_callout_map(.bss)
0 +512 512 . cpu_callin_map(.bss)
0 +512 512 . cpu_active_map(.bss)
0 +512 512 . cache_dev_map(.bss)
0 +512 512 . c1e_mask(.bss)
0 +512 512 . backtrace_mask(.bss)
2647360 +10283499 12930859 +388% Totals

====== Sections (-l 500)
... files 2 vars 36 all 0 lim 500 unch 0

1 - 512-64-allmodconfig
2 - 4096-512-allmodconfig

.1. .2. ..final..
66688274 +10345296 77033570 +15% Total
38237848 +44031 38281879 <1% .debug_info
8441752 +1215872 9657624 +14% .bss
2551715 +3136 2554851 <1% .text
1737600 +4318720 6056320 +248% .data.cacheline_aligned
1640096 +6784 1646880 <1% .data.percpu
1175061 +29104 1204165 +2% .rodata
1073400 +13712 1087112 +1% .debug_abbrev
901760 +1392 903152 <1% .debug_ranges
608192 +3906016 4514208 +642% .data.read_mostly
302704 +13504 316208 +4% .data
244896 +792112 1037008 +323% .init.data
123603298 +20689679 144292977 +16% Totals

====== Text/Data ()
... files 2 vars 6 all 0 lim 0 unch 0

1 - 512-64-allmodconfig
2 - 4096-512-allmodconfig

.1. .2. ..final..
2551808 +2048 2553856 <1% TextSize
1679360 +43008 1722368 +2% DataSize
8441856 +1216512 9658368 +14% BssSize
2138112 +798720 2936832 +37% InitSize
1640448 +6144 1646592 <1% PerCPU
2383872 +8228864 10612736 +345% OtherSize
18835456 +10295296 29130752 +54% Totals

====== PerCPU ()
... files 2 vars 22 all 0 lim 0 unch 0

1 - 512-64-allmodconfig
2 - 4096-512-allmodconfig

.1. .2. ..final..
2048 -2048 . -100% vm_event_states
2048 -2048 . -100% softnet_data
2048 -2048 . -100% init_sched_rt_entity
2048 -2048 . -100% core_domains
0 +2048 2048 . sched_group_core
0 +2048 2048 . node_domains
0 +2048 2048 . lru_add_active_pvecs
0 +2048 2048 . init_rt_rq
0 +2048 2048 . cpu_domains
0 +2048 2048 . cpu_core_map
0 +2048 2048 . cpu_buffer
8192 +6144 14336 +75% Totals

====== Stack (-l 500)
... files 2 vars 126 all 0 lim 500 unch 0

1 - 512-64-allmodconfig
2 - 4096-512-allmodconfig

.1. .2. ..final..
0 +2712 2712 . smp_call_function_mask
0 +1576 1576 . setup_IO_APIC_irq
0 +1576 1576 . move_task_off_dead_cpu
0 +1560 1560 . arch_setup_ht_irq
0 +1560 1560 . __assign_irq_vector
0 +1544 1544 . tick_handle_oneshot_broadcast
0 +1544 1544 . msi_compose_msg
0 +1440 1440 . cpuset_write_resmask
0 +1352 1352 . store_scaling_governor
0 +1352 1352 . cpufreq_add_dev
0 +1320 1320 . cpufreq_update_policy
0 +1312 1312 . store_scaling_min_freq
0 +1312 1312 . store_scaling_max_freq
0 +1176 1176 . threshold_create_device
0 +1128 1128 . setup_IO_APIC
0 +1096 1096 . sched_balance_self
0 +1080 1080 . sched_rt_period_timer
0 +1080 1080 . _cpu_down
0 +1064 1064 . set_ioapic_affinity_irq
0 +1048 1048 . store_interrupt_enable
0 +1048 1048 . setup_timer_IRQ0_pin
0 +1048 1048 . setup_ioapic_dest
0 +1048 1048 . set_msi_irq_affinity
0 +1048 1048 . set_ht_irq_affinity
0 +1048 1048 . native_machine_crash_shutdown
0 +1048 1048 . native_flush_tlb_others
0 +1048 1048 . dmar_msi_set_affinity
0 +1040 1040 . store_threshold_limit
0 +1040 1040 . show_error_count
0 +1040 1040 . acpi_map_lsapic
0 +1032 1032 . tick_do_periodic_broadcast
0 +1032 1032 . sched_setaffinity
0 +1032 1032 . native_send_call_func_ipi
0 +1032 1032 . local_cpus_show
0 +1032 1032 . local_cpulist_show
0 +1032 1032 . irq_select_affinity
0 +1032 1032 . irq_complete_move
0 +1032 1032 . irq_affinity_proc_write
0 +1032 1032 . flush_tlb_mm
0 +1032 1032 . flush_tlb_current_task
0 +1032 1032 . fixup_irqs
0 +1032 1032 . create_irq
0 +1024 1024 . uv_vector_allocation_domain
0 +1024 1024 . uv_send_IPI_allbutself
0 +1024 1024 . store_error_count
0 +1024 1024 . physflat_send_IPI_allbutself
0 +1024 1024 . pci_bus_show_cpuaffinity
0 +1024 1024 . move_masked_irq
0 +1024 1024 . flush_tlb_page
0 +1024 1024 . flat_send_IPI_allbutself
0 +784 784 . sd_init_ALLNODES
0 +776 776 . sd_init_SIBLING
0 +776 776 . sd_init_NODE
0 +768 768 . sd_init_MC
0 +768 768 . sd_init_CPU
0 +728 728 . update_flag
0 +696 696 . init_intel_cacheinfo
0 +680 680 . __build_sched_domains
0 +648 648 . thread_return
0 +648 648 . schedule
0 +640 640 . cpuset_attach
0 +616 616 . rebalance_domains
0 +600 600 . select_task_rq_fair
0 +600 600 . cache_add_dev
0 +584 584 . shmem_getpage
0 +568 568 . pdflush
0 +552 552 . tick_notify
0 +552 552 . partition_sched_domains
0 +552 552 . free_sched_groups
0 +552 552 . __percpu_alloc_mask
0 +544 544 . taskstats_user_cmd
0 +536 536 . sched_init_smp
0 +536 536 . pci_device_probe
0 +536 536 . cpuset_common_file_read
0 +536 536 . cpupri_find
0 +536 536 . acpi_processor_ffh_cstate_probe
0 +536 536 . __cpu_disable
0 +520 520 . uv_send_IPI_all
0 +520 520 . tick_do_broadcast
0 +520 520 . smp_call_function
0 +520 520 . show_related_cpus
0 +520 520 . show_affected_cpus
0 +520 520 . prof_cpu_mask_write_proc
0 +520 520 . physflat_send_IPI_mask
0 +520 520 . physflat_send_IPI_all
0 +520 520 . native_smp_send_reschedule
0 +520 520 . native_send_call_func_single_ipi
0 +520 520 . lapic_timer_broadcast
0 +520 520 . irq_set_affinity
0 +520 520 . flat_vector_allocation_domain
0 +520 520 . flat_send_IPI_all
0 +520 520 . find_lowest_rq
0 +520 520 . cpuset_can_attach
0 +520 520 . cpu_callback
0 +520 520 . compat_sys_sched_setaffinity
0 +520 520 . add_del_listener
0 +512 512 . sys_sched_setaffinity
0 +512 512 . sys_sched_getaffinity
0 +512 512 . run_rebalance_domains
0 +512 512 . ioapic_retrigger_irq
0 +512 512 . generic_processor_info
0 +512 512 . force_quiescent_state
0 +512 512 . destroy_irq
0 +512 512 . default_affinity_write
0 +512 512 . cpu_to_phys_group
0 +512 512 . cpu_to_allnodes_group
0 +512 512 . compat_sys_sched_getaffinity
0 +512 512 . check_preempt_curr_rt
0 +512 512 . assign_irq_vector
0 +92248 92248 +0% Totals

====== MemInfo ()
... files 0 vars 0 all 0 lim 0 unch 0

(runtime meminfo not collected.)

2008-08-26 19:03:21

by Jamie Lokier

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
> The inline-functions-called-once thing is what causes even big functions
> to be inlined, and that's where you find the big downsides too (eg the
> stack usage).

That's a bit bizarre, though, isn't it?

A function which is only called from one place should, if everything
made sense, _never_ use more stack through being inlined. Inlining
should just increase the opportunities that the called function's
local variables can share the same stack slots as the caller's dead
locals.

Whereas not inlining guarantees they occupy separate, immediately
adjacent regions of the stack, and shouldn't be increasing the total
numbers of local variables.

-- Jamie

2008-08-26 19:03:49

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
>
>> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>>> checkstack.pl shows these things as the top problems:
>>>
>>> 0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
>>> 0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
>>> 0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
>>>
>>> Anyway, the reason smp_call_function_mask and friends have such _huge_
>>> stack usages for you is that they contain a 'cpumask_t' on the stack.
>> In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in
>> size. And they tend to call each other.
>>
>> Quite frankly, I don't think we were really ready for 4k CPU's. I'm
>> going to commit this patch to make sure others don't do that many
>> CPU's by mistake. It marks MAXCPU's as being 'broken' so you cannot
>> select it, and also limits the number of CPU's that you _can_ select
>> to "just" 512.
>
> yeah, that's OK i guess - distros can still enable 4K support if they
> wish to. Someone interested in improving the stack footprint situation
> should dust off the max-stack-footprint tracer so that we can catch
> these things in a more structured way.
>
> And i guess the next generation of 4K CPUs support should just get away
> from cpumask_t-on-kernel-stack model altogether, as the current model is
> not maintainable. We tried the on-kernel-stack variant, and it really
> does not work reliably. We can fix this in v2.6.28.
>
> Ingo

I would be most interested in any tools to analyze call-trees and
accumulated stack usages. My current method of using kdb is really
time consuming.

Thanks!
Mike

2008-08-26 19:06:31

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

David Miller wrote:
> From: Ingo Molnar <[email protected]>
> Date: Tue, 26 Aug 2008 09:22:20 +0200
>
>> And i guess the next generation of 4K CPUs support should just get away
>> from cpumask_t-on-kernel-stack model altogether, as the current model is
>> not maintainable. We tried the on-kernel-stack variant, and it really
>> does not work reliably. We can fix this in v2.6.28.
>
> I recently did some work on sparc64 to use cpumask pointers
> as much as possible.
>
> The only case that didn't work was due to a limitation in
> arch interfaces for the new generic smp_call_function() code.
> It passes a cpumask_t instead of a pointer to one via
> arch_send_call_function_ipi().
>
> But other than that, the whole sparc64 SMP stuff uses cpumask_t
> pointers only.
>
> What it comes down to is that you have to do the "self cpu"
> and other tests in the cross-call dispatch routines themselves,
> instead of at the top-level working on cpumask_t objects.
>
> Otherwise you have to modify cpumask_t objects and thus pluck
> them onto the stack where they take up silly amounts of space.

Yes, I had proposed either modifying, or supplementing a new
smp_call function to pass the cpumask_t as a pointer (similar
to set_cpus_allowed_ptr.) But an ABI change such as this was
not well received at the time.
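
As a rough illustration of the difference (a userspace sketch, not the
kernel interfaces; struct mask and the weight_* helpers are hypothetical
stand-ins for cpumask_t and a function that consumes one):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS 4096
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for cpumask_t: a fixed-size bitmap, one bit per CPU. */
struct mask {
	unsigned long bits[MASK_LONGS];
};

static void mask_set(struct mask *m, size_t cpu)
{
	assert(cpu < NR_CPUS);
	m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test(const struct mask *m, size_t cpu)
{
	assert(cpu < NR_CPUS);
	return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}

/* By value: every call copies the whole struct (512 bytes here). */
static size_t weight_by_value(struct mask m)
{
	size_t cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += (size_t)mask_test(&m, cpu);
	return n;
}

/* By pointer: only an address is passed; no copy of the bitmap is made. */
static size_t weight_by_ptr(const struct mask *m)
{
	size_t cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += (size_t)mask_test(m, cpu);
	return n;
}
```

Both functions return the same count; the by-value form just pays for a
512-byte copy on each call, and that copy typically lands on the stack.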

Thanks,
Mike

2008-08-26 19:11:17

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Mike Travis wrote:
>
> The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is
> critical to our upcoming SGI systems using what we have been calling
> "UV".

That's fine. You can do it. The default kernel will not, because it's
clearly not safe.

I really don't care what you do to _your_ images. But I will not
distribute a known-broken kernel, and I will not debug random stack
overflows that happen in it.

If you want the default kernel to support 4k cores, we'll need to fix the
stack usage. I don't think that is impossible, but IT IS NOT GOING TO
HAPPEN for 2.6.27.

And quite frankly, if some vendor like RedHat enables NR_CPUS=4096 by
default, they are totally and utterly crazy.

But some SGI-specific binary that is meant for SGI machines only, and has
been extensively tested with the setup used on SGI machines is a different
thing.

Linus

2008-08-26 19:12:25

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Ingo Molnar wrote:
> * David Miller <[email protected]> wrote:
>
>> From: Ingo Molnar <[email protected]>
>> Date: Tue, 26 Aug 2008 09:22:20 +0200
>>
>>> And i guess the next generation of 4K CPUs support should just get away
>>> from cpumask_t-on-kernel-stack model altogether, as the current model is
>>> not maintainable. We tried the on-kernel-stack variant, and it really
>>> does not work reliably. We can fix this in v2.6.28.
>> I recently did some work on sparc64 to use cpumask pointers as much
>> as possible.
>>
>> The only case that didn't work was due to a limitation in arch
>> interfaces for the new generic smp_call_function() code. It passes a
>> cpumask_t instead of a pointer to one via
>> arch_send_call_function_ipi().
>>
>> But other than that, the whole sparc64 SMP stuff uses cpumask_t
>> pointers only.
>
> nice!
>
>> What it comes down to is that you have to do the "self cpu" and other
>> tests in the cross-call dispatch routines themselves, instead of at
>> the top-level working on cpumask_t objects.
>>
>> Otherwise you have to modify cpumask_t objects and thus pluck them
>> onto the stack where they take up silly amounts of space.
>
> What we did was this: we added MAXSMP which just revs up all the SMP
> tunables to the maximum, so that we can see any problems early in
> testing.
>
> And we triggered problems, and we fixed a couple of regressions all
> around stack footprint. But we didnt catch all of them - some were gcc
> version dependent and configuration dependent. So i think it's safe to
> say that the whole concept of allowing such a large cpumask_t to be on
> the stack is fragile.

IIRC, it was the problem of basing percpu variables at zero that hit
problems with various gcc toolset versions. I don't remember any
version problems with cpumasks on the stack; they all failed the
same way... :-)
>
> Hence, i think the best way forward is to change the whole cpumask_t
> concept and disallow explicit masks altogether. It's so easy to smack a
> cpumask_t variable on the stack and nothing really warns about it, and
> any function can become part of a nested call sequence.

This is a great idea!
>
> So i think the dynamics of it has to be changed: we need a get/put API
> and we need to make on-stack cpumask illegal on the build level (in
> generic code at least). This has been Rusty's main argument early on i
> think, and i now concur.
>
> Ingo

Removing cpumask_t's from the stack is fairly straightforward. The
problem of changing all functions to expect a cpumask pointer via a
global change is much more problematic. And of course all those
functions that return a cpumask value would need to be addressed.
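
One possible shape for the two pieces, sketched in userspace C under the
assumption that masks are heap-allocated and results are written through
a caller-supplied pointer (mask_alloc, mask_free and mask_and are
hypothetical names, not an existing kernel API):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4096
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for cpumask_t: a fixed-size bitmap, one bit per CPU. */
struct mask {
	unsigned long bits[MASK_LONGS];
};

/* Heap-allocate a zeroed mask; returns NULL on allocation failure. */
static struct mask *mask_alloc(void)
{
	return calloc(1, sizeof(struct mask));
}

static void mask_free(struct mask *m)
{
	free(m);
}

/*
 * Instead of "struct mask mask_and(a, b)", which would return 512 bytes
 * by value, the caller supplies the destination.
 */
static void mask_and(struct mask *dst, const struct mask *a,
		     const struct mask *b)
{
	size_t i;

	assert(dst && a && b);
	for (i = 0; i < MASK_LONGS; i++)
		dst->bits[i] = a->bits[i] & b->bits[i];
}
```

The cost is that every former "return a mask" call site now needs
somewhere to put the result and a failure path for the allocation.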

Thanks,
Mike

2008-08-26 19:20:18

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Jamie Lokier wrote:
>
> A function which is only called from one place should, if everything
> made sense, _never_ use more stack through being inlined.

But that's simply not true.

See the whole discussion.

The problem is that if you inline that function, the stack usage of the
newly inlined function is now added to ALL THE OTHER paths too!

So the case we had in module loading was that yes, we had a function with
a big stack footprint, but it was NOT in the deep path.

But by inlining it, it now moved the stack footprint "up" one level to
another function, and now the big stack footprint really _was_ in the deep
path, because the caller was involved in a much deeper chain.

So inlining moves the code up the callchain, and that is a problem for the
backtrace, but that's "just" a debugging issue. But it also moves the
stack footprint up the callchain, and that can actually be a correctness
issue.

Of course, a compiler doesn't _have_ to do that. A compiler _could_ have
multiple different stack footprints for a single function, and do liveness
analysis etc. But no sane compiler probably does that, because it's very
painful indeed, and it's not even an issue if you aren't stack-limited
(and being stack-limited is really just a kernel thing).

(Yeah, it can be an issue even if you have a big stack, in that you get
worse cache behaviour, so a dense stack footprint _would_ help. But the
complexity of stack liveness analysis is almost certainly not worth the
relatively small gains it would get on some odd cases).
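
A toy model of the arithmetic, assuming the compiler gives each function
one fixed frame and does not share slots between the inlined locals and
the rest of the caller (the function names and the byte counts used with
them are made up for illustration):

```c
#include <assert.h>

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Caller either calls a big-frame helper (shallow path) or descends
 * into a deep call chain. Not inlined: the helper's frame exists only
 * while the helper runs, so the two paths never stack up.
 */
static unsigned int depth_not_inlined(unsigned int caller,
				      unsigned int helper,
				      unsigned int deep_chain)
{
	return caller + max_u(helper, deep_chain);
}

/*
 * Inlined: the helper's locals become part of the caller's frame and
 * are therefore present on the deep path as well.
 */
static unsigned int depth_inlined(unsigned int caller,
				  unsigned int helper,
				  unsigned int deep_chain)
{
	return caller + helper + deep_chain;
}
```

With a 100-byte caller, a 1000-byte helper and a 3000-byte deep chain,
the worst case goes from 3100 bytes to 4100 bytes once the helper is
inlined, even though no code was added.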

Linus

2008-08-26 19:35:41

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Tue, 26 Aug 2008, Mike Travis wrote:
>> The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is
>> critical to our upcoming SGI systems using what we have been calling
>> "UV".
>
> That's fine. You can do it. The default kernel will not, because it's
> clearly not safe.
>
> I really don't care what you do to _your_ images. But I will not
> distribute a known-broken kernel, and I will not debug random stack
> overflows that happen in it.
>
> If you want the default kernel to support 4k cores, we'll need to fix the
> stack usage. I don't think that is impossible, but IT IS NOT GOING TO
> HAPPEN for 2.6.27.
>
> And quite frankly, if some vendor like RedHat enables NR_CPUS=4096 by
> default, they are totally and utterly crazy.
>
> But some SGI-specific binary that is meant for SGI machines only, and has
> been extensively tested with the setup used on SGI machines is a different
> thing.
>
> Linus

Ok, thanks for the reply, and looking into this issue. We will "strongly
encourage" our distros to base the relevant releases on 2.6.28. :-)

[Supplying an SGI-specific kernel would not be acceptable to many of our
customers because of the certification issues I mentioned.]

Mike

2008-08-26 19:38:05

by Dave Jones

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 12:09:46PM -0700, Linus Torvalds wrote:

> If you want the default kernel to support 4k cores, we'll need to fix the
> stack usage. I don't think that is impossible, but IT IS NOT GOING TO
> HAPPEN for 2.6.27.
>
> And quite frankly, if some vendor like RedHat enables NR_CPUS=4096 by
> default, they are totally and utterly crazy.

heh. *picks through Fedora changelog*

* Thu Aug 14 2008 Dave Jones <[email protected]>
- Bump max cpus supported on x86-64 to 4096. Just to see what happens.

I never did get to find out unfortunately, because of the security fiasco
in Fedora infrastructure the last week or two.

> But some SGI-specific binary that is meant for SGI machines only, and has
> been extensively tested with the setup used on SGI machines is a different
> thing.

Every extra kernel image a distro vendor ends up shipping has an associated cost.

* build time: It currently takes about 2 hours for a set of Fedora RPMs.
(For RHEL it'll be even worse due to the extra archs.)
Killing off -smp specific builds was a big win for us in this regard.
Adding extra flavours is always painful.

* diskspace (distro kernels aren't small. With the associated debugging symbols,
they take up a shitload of disk space really fast).

* Having everyone running the same kernel makes it much easier to test/debug.
Our QA guys hate adding extra columns to their test matrix.

But yes, for this to be even remotely feasible, there has to be a negligible
performance cost associated with it, which right now, we clearly don't have.
Given that the number of people running 4096 CPU boxes even in a few years time
will still be tiny, punishing the common case is obviously absurd.

Dave

--
http://www.codemonkey.org.uk

2008-08-26 19:42:17

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Mike Travis wrote:
>
> I would be most interested in any tools to analyze call-trees and
> accumulated stack usages. My current method of using kdb is really
> time consuming.

Well, even just scripts/checkstack.pl is quite relevant.

The fact is, anything with a stack footprint of more than a hundred bytes
is suspect. We _do_ have a lot of cases of several hundred bytes, and some
of them are even very intentional.

For an example of _intentional_ and valid large stacks, look at
do_sys_poll and do_select. They both have a big stack footprint in a
normal kernel, and that's on purpose - it's not pretty, but they are very
common and performance-sensitive functions, and using a big stack allows
some basic allocations to be much cheaper by default.
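
The general pattern looks roughly like this. It is a userspace sketch of
the idea, not the actual do_sys_poll code; sum_copy() and STACK_ENTRIES
are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Requests up to this many entries are served from the stack. */
#define STACK_ENTRIES 32

/*
 * Copy n ints into scratch space and sum them. Small requests use the
 * on-stack buffer and never touch the allocator; large ones fall back
 * to the heap. Returns 0 on success, -1 if the allocation fails.
 */
static int sum_copy(const int *src, size_t n, long *out)
{
	int small[STACK_ENTRIES];
	int *buf = small;
	long sum = 0;
	size_t i;

	assert(src && out);
	if (n > STACK_ENTRIES) {
		buf = malloc(n * sizeof(*buf));
		if (!buf)
			return -1;
	}
	memcpy(buf, src, n * sizeof(*buf));
	for (i = 0; i < n; i++)
		sum += buf[i];
	if (buf != small)
		free(buf);
	*out = sum;
	return 0;
}
```

The trade is deliberate: a fixed chunk of stack is spent on every call
so that the common small case skips the allocator entirely.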

Same goes for early_printk(), although I don't think the reasons are
really very strong in that case.

Sadly, while those functions are _fairly_ high up, they aren't at the top,
and we do have a lot of other functions that have huge stack footprints
for totally bogus reasons. But the intentional ones are at least in the
top ten.

But the kernel that Alan had problems with was different. The
_intentional_ ones were way down in the noise. do_sys_poll wasn't in the
top ten, it was barely even in the top 50! (It was in fact #49, to be
exact).

So look at the top ten in my kernel:

1 ide_generic_init [vmlinux]: 1384
2 idefloppy_ioctl [vmlinux]: 1208
3 e1000_check_options [vmlinux]: 1152
4 do_sys_poll [vmlinux]: 904
5 ide_floppy_get_capacity [vmlinux]: 872
6 do_select [vmlinux]: 744
7 early_printk [vmlinux]: 720
8 do_task_stat [vmlinux]: 680
9 mmc_ioctl [vmlinux]: 648
10 elf_kcore_store_hdr [vmlinux]: 576

.. and in Alan's kernel:

1 smp_call_function_mask [vmlinux]: 2736
2 __build_sched_domains [vmlinux]: 2232
3 setup_IO_APIC_irq [vmlinux]: 1616
4 arch_setup_ht_irq [vmlinux]: 1600
5 arch_setup_msi_irq [vmlinux]: 1600
6 __assign_irq_vector [vmlinux]: 1592
7 move_task_off_dead_cpu [vmlinux]: 1592
8 tick_handle_oneshot_broadcast [vmlinux]: 1544
9 store_scaling_governor [vmlinux]: 1376
10 cpuset_write_resmask [vmlinux]: 1360

That's a big difference. The top #1 in my kernel would just _barely_ be in
the top 10 in Alan's kernel (he doesn't have it at all, because he didn't
compile the drivers I did into the kernel).

And the top three in my kernel are just because of crap code. That
"e1000_check_options" thing is there just because it creates multiple
"struct e1000_option" structures. I wrote an ugly but totally trivial
patch to get it down to ~600 bytes, and it would be less if I had bothered
to waste any more time on it.

The others are similar issues of "people just didn't think".

But look at the top ones in Alan's kernel. Not only are they _much_ bigger
than the top ones in a sane kernel, they are _all_ due to cpumask_t, I
think.

Linus

2008-08-26 19:49:18

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Ingo Molnar wrote:
> * Christoph Lameter <[email protected]> wrote:
>
>> Alan D. Brunelle wrote:
>>
>>> I think you're right: the kernel as a whole may not be ready for 4,096
>>> CPUs apparently...
>> Mike has been working diligently on getting all these cpumasks off the
>> stack for the last months and has created an infrastructure to do
>> this. So I think we are close. It might just be a matter of merging
>> some more patches that are still left in Ingo's tree.
>
> hm, there are no such patches left that i know of - the only bits in
> -tip are the zero-based percpu, which was found to be a bit fragile in
> testing:

Yes, it's just a case of new changes abusing the stack.
>
> earth4:~/tip> git-log-line --author=Travis linus..
> d379497: Zero based percpu: infrastructure to rebase the per cpu area to zero
> b3a0cb4: x86: extend percpu ops to 64 bit
>
> [and it has no relevance to stack footprint.]
>
> So i dont think the current cpumask_t approach will work. We simply
> should not get into an endless fight against the windmills that
> introduce on-stack cpumask_t again and again. We should just take the
> plunge once and do a clean alloc/free cpumask model. Most of the hotpath
> cpumasks are constant or pre-constructed, so they are not a real issue.

It would have been nice to know this 9 months ago... ;-)
>
> Plus, on the general question of stack footprint problems and the
> difficulty of debugging them, the worst-case stack footprint tracer i
> wrote for -rt some time ago should be dusted off as well and put into
> ftrace. David has something quite close to that for Sparc64 already.
>
> Ingo

I'll start experimenting with globally changing cpumask_t to be a pointer,
and see what falls out.

Thanks,
Mike

2008-08-26 19:56:10

by Jeff Garzik

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
> The downsides of inlining are big enough from both a debugging and a real
> code generation angle (eg stack usage like this), that the upsides
> (_sometimes_ smaller kernel, possibly slightly faster code) simply aren't
> relevant.
>
> So the "noinline" was random, yes, but this is a real issue. Looking at
> checkstack output for a saner config (NR_CPUS=16), the top entries for me
> are things like
>
> ide_generic_init [vmlinux]: 1384
> idefloppy_ioctl [vmlinux]: 1208
> e1000_check_options [vmlinux]: 1152
> ...
>
> which are "leaf" functions. They are broken as hell (the e1000 is
> apparently because it builds structs on the stack that should all be
> "static const", for example), but they are different from something like
> the module init sequence in that they are not going to affect anything
> else.


e1000_check_options builds a struct (singular) on the stack, really...
struct e1000_option is reasonably small.

The problem, which has also shown itself in large ioctl-style switch{}
statements, is that gcc will generate code such that the stack usage
from independent code branches

	if (cond1) {
		char buster1[1000];
		foo(buster1);
	} else if (cond2) {
		char buster2[1000];
		foo(buster2);
	}

are added together, not noticed as mutually exclusive.

Of course, adding 'static const' as you noted is a reasonable
workaround, but gcc is really annoying WRT stack allocation in this manner.
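
Another workaround, sketched here under the assumption that the compiler
honours the GCC/Clang noinline attribute, is to move each branch's
buffer into its own helper so that each big frame only exists while that
branch runs (branch_one, branch_two and dispatch are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* GCC/Clang extension: keep the helper out of its caller's frame. */
#define NOINLINE __attribute__((noinline))

static NOINLINE size_t branch_one(int x)
{
	char buster1[1000];

	snprintf(buster1, sizeof(buster1), "one:%d", x);
	return strlen(buster1);
}

static NOINLINE size_t branch_two(int x)
{
	char buster2[1000];

	snprintf(buster2, sizeof(buster2), "two:%d", x);
	return strlen(buster2);
}

/* The dispatcher itself no longer carries either 1000-byte buffer. */
static size_t dispatch(int cond, int x)
{
	if (cond == 1)
		return branch_one(x);
	else if (cond == 2)
		return branch_two(x);
	return 0;
}
```

At most one of the two buffers is live at any time, at the price of an
extra call per branch.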

I had problems in the past, before struct ethtool_ops, with like ethtool
ioctl switch statements using gobs of stack. In fact, that was a big
motivation for struct ethtool_ops.

Jeff

2008-08-26 20:01:30

by Mike Travis

[permalink] [raw]
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Dave Jones wrote:
...
>
> But yes, for this to be even remotely feasible, there has to be a negligible
> performance cost associated with it, which right now, we clearly don't have.
> Given that the number of people running 4096 CPU boxes even in a few years time
> will still be tiny, punishing the common case is obviously absurd.
>
> Dave
>

I did do some fairly extensive benchmarking between configs of NR_CPUS = 128 and
4096 and most performance hits were in the neighborhood of < 5% on systems with
8 cpus and 4GB of memory (our most common test system). [But changing cpumask_t's
to be pointers instead of values will likely increase this.] I've tried to be
very sensitive to this issue with all my previous changes, so convincing the distros
to set NR_CPUS=4096 would be as painless for them as possible. ;-)

Btw, huge count cpu systems I don't think are that far away. I believe the nextgen
Larrabee chips will be geared towards HPC applications [instead of just GFX apps],
and putting 4 of these chips on a motherboard would add up to 512 cpu threads (1024
if they support hyperthreading.)

Thanks,
Mike

2008-08-26 20:08:14

by Linus Torvalds

[permalink] [raw]
Subject: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)



On Tue, 26 Aug 2008, Jeff Garzik wrote:
>
> e1000_check_options builds a struct (singular) on the stack, really... struct
> e1000_option is reasonably small.

No it doesn't.

Look a bit more closely.

It builds a struct (singular) MANY MANY times. It also then builds up a
huge e1000_opt_list[] array, even though it is const and should be static
(and const).

I know. I wrote a patch to FIX it.

Here's the patch. It shrinks the stack from 1152 bytes to 192 bytes (the
first version, that only did the e1000_option part, got it down to 600
bytes). About half comes from not using multiple "e1000_option"
structures, the other half comes from turning the "e1000_opt_list[]"
arrays into "static const" instead, so that gcc doesn't copy them onto the
stack.

Most of the patch is actually doing things like turning

struct e1000_option opt = {

(which declares a _new_ e1000_option variable each time) into

opt = (struct e1000_option) {

which just re-uses the single variable.

It becomes slightly larger than that, because some places the "opt = .."
had to be moved around, since it's no longer a variable declaration, but a
regular assignment.

The rest is just adding "const" to the right places, and turning

struct e1000_opt_list speed_list[] = ..

into

static const struct e1000_opt_list speed_list[] = ..

instead, and fixing the indentation to be more straightforward.
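
The two transformations described above can be sketched outside the driver as
follows. This is an illustration under assumptions, not the e1000 code: the
struct layouts are simplified, and validate(), check_options() and
struct settings are hypothetical names. It shows one opt variable re-assigned
through a compound literal in each block, and a lookup table made
"static const" so that it is not rebuilt on the stack on every call.

```c
#include <assert.h>
#include <stddef.h>

struct opt_list { int i; const char *str; };

enum opt_type { range_option, list_option };

struct option {
	enum opt_type type;
	const char *name;
	int def;
	union {
		struct { int min, max; } r;                      /* range_option */
		struct { int nr; const struct opt_list *p; } l;  /* list_option */
	} arg;
};

struct settings { int txd; int speed; };

/* Returns value if it is valid for opt, otherwise the option's default. */
static int validate(int value, const struct option *opt)
{
	int i;

	assert(opt != NULL);
	switch (opt->type) {
	case range_option:
		if (value >= opt->arg.r.min && value <= opt->arg.r.max)
			return value;
		break;
	case list_option:
		for (i = 0; i < opt->arg.l.nr; i++)
			if (opt->arg.l.p[i].i == value)
				return value;
		break;
	}
	return opt->def;
}

struct settings check_options(int txd, int speed)
{
	struct option opt;	/* the only struct option declared here */
	struct settings s;

	{ /* Transmit descriptors: assignment, not a new declaration */
		opt = (struct option) {
			.type = range_option,
			.name = "Transmit Descriptors",
			.def  = 256,
			.arg  = { .r = { .min = 80, .max = 4096 } }
		};
		s.txd = validate(txd, &opt);
	}
	{ /* Speed: the table lives in read-only data, not on the stack */
		static const struct opt_list speed_list[] = {
			{ 0, "" }, { 10, "" }, { 100, "" }, { 1000, "" }
		};

		opt = (struct option) {
			.type = list_option,
			.name = "Speed",
			.def  = 0,
			.arg  = { .l = {
				.nr = sizeof(speed_list) / sizeof(speed_list[0]),
				.p  = speed_list
			}}
		};
		s.speed = validate(speed, &opt);
	}
	return s;
}
```

Note that the pointer member has to become "const struct opt_list *" for the
static const table to be assignable, which is why the real patch also touches
the struct definition and e1000_validate_option().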

I have not tested the dang thing, but I think it's correct. And it turns
stack usage from "totally horrible and broken" into "pretty reasonable".

Linus

---
drivers/net/e1000/e1000_param.c | 81 +++++++++++++++++++++-----------------
1 files changed, 45 insertions(+), 36 deletions(-)

diff --git a/drivers/net/e1000/e1000_param.c b/drivers/net/e1000/e1000_param.c
index b9f90a5..213437d 100644
--- a/drivers/net/e1000/e1000_param.c
+++ b/drivers/net/e1000/e1000_param.c
@@ -208,7 +208,7 @@ struct e1000_option {
} r;
struct { /* list_option info */
int nr;
- struct e1000_opt_list { int i; char *str; } *p;
+ const struct e1000_opt_list { int i; char *str; } *p;
} l;
} arg;
};
@@ -242,7 +242,7 @@ static int __devinit e1000_validate_option(unsigned int *value,
break;
case list_option: {
int i;
- struct e1000_opt_list *ent;
+ const struct e1000_opt_list *ent;

for (i = 0; i < opt->arg.l.nr; i++) {
ent = &opt->arg.l.p[i];
@@ -279,7 +279,9 @@ static void e1000_check_copper_options(struct e1000_adapter *adapter);

void __devinit e1000_check_options(struct e1000_adapter *adapter)
{
+ struct e1000_option opt;
int bd = adapter->bd_number;
+
if (bd >= E1000_MAX_NIC) {
DPRINTK(PROBE, NOTICE,
"Warning: no configuration for board #%i\n", bd);
@@ -287,19 +289,21 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}

{ /* Transmit Descriptor Count */
- struct e1000_option opt = {
+ struct e1000_tx_ring *tx_ring = adapter->tx_ring;
+ int i;
+ e1000_mac_type mac_type = adapter->hw.mac_type;
+
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Transmit Descriptors",
.err = "using default of "
__MODULE_STRING(E1000_DEFAULT_TXD),
.def = E1000_DEFAULT_TXD,
- .arg = { .r = { .min = E1000_MIN_TXD }}
+ .arg = { .r = {
+ .min = E1000_MIN_TXD,
+ .max = mac_type < e1000_82544 ? E1000_MAX_TXD : E1000_MAX_82544_TXD
+ }}
};
- struct e1000_tx_ring *tx_ring = adapter->tx_ring;
- int i;
- e1000_mac_type mac_type = adapter->hw.mac_type;
- opt.arg.r.max = mac_type < e1000_82544 ?
- E1000_MAX_TXD : E1000_MAX_82544_TXD;

if (num_TxDescriptors > bd) {
tx_ring->count = TxDescriptors[bd];
@@ -313,19 +317,21 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
tx_ring[i].count = tx_ring->count;
}
{ /* Receive Descriptor Count */
- struct e1000_option opt = {
+ struct e1000_rx_ring *rx_ring = adapter->rx_ring;
+ int i;
+ e1000_mac_type mac_type = adapter->hw.mac_type;
+
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Receive Descriptors",
.err = "using default of "
__MODULE_STRING(E1000_DEFAULT_RXD),
.def = E1000_DEFAULT_RXD,
- .arg = { .r = { .min = E1000_MIN_RXD }}
+ .arg = { .r = {
+ .min = E1000_MIN_RXD,
+ .max = mac_type < e1000_82544 ? E1000_MAX_RXD : E1000_MAX_82544_RXD
+ }}
};
- struct e1000_rx_ring *rx_ring = adapter->rx_ring;
- int i;
- e1000_mac_type mac_type = adapter->hw.mac_type;
- opt.arg.r.max = mac_type < e1000_82544 ? E1000_MAX_RXD :
- E1000_MAX_82544_RXD;

if (num_RxDescriptors > bd) {
rx_ring->count = RxDescriptors[bd];
@@ -339,7 +345,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
rx_ring[i].count = rx_ring->count;
}
{ /* Checksum Offload Enable/Disable */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = enable_option,
.name = "Checksum Offload",
.err = "defaulting to Enabled",
@@ -363,7 +369,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
{ E1000_FC_FULL, "Flow Control Enabled" },
{ E1000_FC_DEFAULT, "Flow Control Hardware Default" }};

- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = list_option,
.name = "Flow Control",
.err = "reading default settings from EEPROM",
@@ -381,7 +387,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Transmit Interrupt Delay */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Transmit Interrupt Delay",
.err = "using default of " __MODULE_STRING(DEFAULT_TIDV),
@@ -399,7 +405,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Transmit Absolute Interrupt Delay */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Transmit Absolute Interrupt Delay",
.err = "using default of " __MODULE_STRING(DEFAULT_TADV),
@@ -417,7 +423,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Receive Interrupt Delay */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Receive Interrupt Delay",
.err = "using default of " __MODULE_STRING(DEFAULT_RDTR),
@@ -435,7 +441,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Receive Absolute Interrupt Delay */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Receive Absolute Interrupt Delay",
.err = "using default of " __MODULE_STRING(DEFAULT_RADV),
@@ -453,7 +459,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Interrupt Throttling Rate */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = range_option,
.name = "Interrupt Throttling Rate (ints/sec)",
.err = "using default of " __MODULE_STRING(DEFAULT_ITR),
@@ -497,7 +503,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Smart Power Down */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = enable_option,
.name = "PHY Smart Power Down",
.err = "defaulting to Disabled",
@@ -513,7 +519,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
}
}
{ /* Kumeran Lock Loss Workaround */
- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = enable_option,
.name = "Kumeran Lock Loss Workaround",
.err = "defaulting to Enabled",
@@ -578,16 +584,18 @@ static void __devinit e1000_check_fiber_options(struct e1000_adapter *adapter)

static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
{
+ struct e1000_option opt;
unsigned int speed, dplx, an;
int bd = adapter->bd_number;

{ /* Speed */
- struct e1000_opt_list speed_list[] = {{ 0, "" },
- { SPEED_10, "" },
- { SPEED_100, "" },
- { SPEED_1000, "" }};
+ static const struct e1000_opt_list speed_list[] = {
+ { 0, "" },
+ { SPEED_10, "" },
+ { SPEED_100, "" },
+ { SPEED_1000, "" }};

- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = list_option,
.name = "Speed",
.err = "parameter ignored",
@@ -604,11 +612,12 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
}
}
{ /* Duplex */
- struct e1000_opt_list dplx_list[] = {{ 0, "" },
- { HALF_DUPLEX, "" },
- { FULL_DUPLEX, "" }};
+ static const struct e1000_opt_list dplx_list[] = {
+ { 0, "" },
+ { HALF_DUPLEX, "" },
+ { FULL_DUPLEX, "" }};

- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = list_option,
.name = "Duplex",
.err = "parameter ignored",
@@ -637,7 +646,7 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
"parameter ignored\n");
adapter->hw.autoneg_advertised = AUTONEG_ADV_DEFAULT;
} else { /* Autoneg */
- struct e1000_opt_list an_list[] =
+ static const struct e1000_opt_list an_list[] =
#define AA "AutoNeg advertising "
{{ 0x01, AA "10/HD" },
{ 0x02, AA "10/FD" },
@@ -671,7 +680,7 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
{ 0x2e, AA "1000/FD, 100/FD, 100/HD, 10/FD" },
{ 0x2f, AA "1000/FD, 100/FD, 100/HD, 10/FD, 10/HD" }};

- struct e1000_option opt = {
+ opt = (struct e1000_option) {
.type = list_option,
.name = "AutoNeg",
.err = "parameter ignored",

2008-08-26 20:14:35

by Kok, Auke

[permalink] [raw]
Subject: Re: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)

Linus Torvalds wrote:
>
> On Tue, 26 Aug 2008, Jeff Garzik wrote:
>> e1000_check_options builds a struct (singular) on the stack, really... struct
>> e1000_option is reasonably small.
>
> No it doesn't.
>
> Look a bit more closely.
>
> It builds a struct (singular) MANY MANY times. It also then builds up a
> huge e1000_opt_list[] array, even though it is const and should be static
> (and const).
>
> I know. I wrote a patch to FIX it.

totally cool patch afaics - if I still maintained this driver I'd have this tested
and merged right away :)

I suppose Jeff Kirsher is already doing so right now.

I suppose that he'll have to look at the other Intel ethernet drivers as well :)

Jeff, please add my:

Reviewed-by: Auke Kok <[email protected]>

Cheers,

Auke


2008-08-26 20:22:33

by Adrian Bunk

[permalink] [raw]
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 11:40:10AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 26 Aug 2008, Adrian Bunk wrote:
> >
> > A debugging option (for better traces) to disallow gcc some inlining
> > might make sense (and might even make sense for distributions to
> > enable in their kernels), but when you go to use cases that require
> > really small kernels the cost is too high.
>
> You ignore the fact that it's really not just about debugging.

I had in mind that we anyway have to support it for tiny kernels.

I simply don't see that we add kconfig options for 5kB of code for
tiny kernels but remove something like this that can cause size
increases > 1%.

> Inlining really isn't the great tool some people think it is. Especially
> not since gcc stack allocation is so horrid that it won't re-use stack
> slots etc (which I don't disagree with per se - it's _hard_ to re-use
> stack slots while still allowing code scheduling).

gcc's stack allocation has become better
(that's why we disable unit-at-a-time only for gcc 3.4 on i386).

> NOTE! I also would never claim that _our_ choices of "inline" are all that
> great, and we've often inlined too much or not inlined things that really
> could be inlined. But at least when a developer says "inline" (or forgets
> to say it), we have somebody to blame. When the compiler does insane
> things that doesn't suit us, we're just screwed.

Most LOCs of the kernel are not written by people like you or Al Viro or
David Miller, and the average kernel developer is unlikely to do it as
good as gcc.

For the average driver the choice is realistically between
"inline's randomly sprinkled across the driver" and
"no inline's, leave it to gcc".

And code evolves during the years from tiny with 1 caller to huge with
many callers.

BTW:
I just ran checkstack on a (roughly) allyesconfig kernel, and we have a
new driver that allocates "unsigned char recvbuf[1500];" on the stack...

> > But if you don't trust gcc's inlining you should revert
> > commit 3f9b5cc018566ad9562df0648395649aebdbc5e0 that increases gcc's
> > freedom regarding what to inline in 2.6.27
>
> Actually, that just allows gcc to _not_ inline. Which is probably ok.
>
> (Well, it would be ok if gcc did it well enough, it obviously has some
> problems at times).

With the "gcc inline's static functions" you complain about we have
4-5 years of experience.

Suddenly allowing 4 release series of gcc to ignore any inline's is a
completely new area for us. I'd generally agree with giving gcc more
freedom here, but I'd rather do it right by removing tons of wrong
inline's than doing one global change hoping that it will make things
better.

And whether the "optimized inlining" actually makes the kernel bigger or
smaller depends in my experience on the .config and the gcc version.

> Linus

cu
Adrian

[1] there are some rare exceptions

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-26 20:43:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Adrian Bunk wrote:
>
> I had in mind that we anyway have to support it for tiny kernels.

I actually don't think that is true.

If we really were to decide to be stricter about it, and it makes a big
size difference, we can probably also add a tool to warn about functions
that really should be inline.

> > Inlining really isn't the great tool some people think it is. Especially
> > not since gcc stack allocation is so horrid that it won't re-use stack
> > slots etc (which I don't disagree with per se - it's _hard_ to re-use
> > stack slots while still allowing code scheduling).
>
> gcc's stack allocation has become better
> (that's why we disable unit-at-a-time only for gcc 3.4 on i386).


I agree that it has become better. But it still absolutely *sucks*.

For example, see the patch I just posted about e1000 stack usage. Even
though the variables were all in completely separate scopes, they all got
individual space on the stack over the whole lifetime of the function,
causing an explosion of stack-space. As such, gcc used 500 bytes too much
of stack, just because it didn't re-use the stackspace.

That was with gcc-4.3.0, and no, there were hardly any inlining issues
involved, although it is true that inlining actually did make it slightly
worse in that case too (but since it was essentially a leaf function, that
had little real life impact, since there were no deep callchains below it
to care).

So the fact is, "better" simply is not "good enough". We still need to do
a lot of optimizations _manually_, because gcc cannot see that it can
re-use the stack-slots.

And sometimes those "optimizations" are actually performance
pessimizations, because in order to make gcc not use all the stack at the
same time, you simply have to break things out and force-disable inlining.
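
A rough sketch of what "break things out and force-disable inlining" can look
like, assuming a GCC-compatible compiler. The handlers and dispatch() are
hypothetical; the point is only that each large buffer sits in the frame of a
helper that the compiler is not allowed to fold back into the caller, so the
caller's own frame stays small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* GCC/Clang spelling; the kernel wraps this attribute as "noinline". */
#define noinline_sketch __attribute__((noinline))

#define SCRATCH_SIZE 1000

/* Each handler owns its scratch buffer for the duration of the call only. */
static noinline_sketch size_t handle_get(const char *arg)
{
	char scratch[SCRATCH_SIZE];

	assert(arg != NULL);
	snprintf(scratch, sizeof(scratch), "get:%s", arg);
	return strlen(scratch);
}

static noinline_sketch size_t handle_set(const char *arg)
{
	char scratch[SCRATCH_SIZE];

	assert(arg != NULL);
	snprintf(scratch, sizeof(scratch), "set:%s=1", arg);
	return strlen(scratch);
}

/* The dispatcher itself declares no large locals. */
size_t dispatch(int cmd, const char *arg)
{
	switch (cmd) {
	case 0:
		return handle_get(arg);
	case 1:
		return handle_set(arg);
	default:
		return 0;
	}
}
```

This is the pessimization mentioned above: the extra call is paid on every
path, in exchange for a peak stack depth of one buffer instead of two.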

> Most LOCs of the kernel are not written by people like you or Al Viro or
> David Miller, and the average kernel developer is unlikely to do it as
> good as gcc.

Sure. But we do have tools. We do have checkstack.pl, it's just that it
hasn't been an issue in a long time, so I suspect many people didn't even
_realize_ we have it, and I certainly can attest to the fact that even
people who remember it - like me - don't actually tend to run it all that
often.

> For the average driver the choice is realistically between
> "inline's randomly sprinkled across the driver" and
> "no inline's, leave it to gcc".

And neither is likely to be a big problem.

> BTW:
> I just ran checkstack on a (roughly) allyesconfig kernel, and we have a
> new driver that allocates "unsigned char recvbuf[1500];" on the stack...

Yeah, it's _way_ too easy to do bad things.

> With the "gcc inline's static functions" you complain about we have
> 4-5 years of experience.

Sure. And most of it isn't all that great.

But I do agree that letting gcc make more decisions is _dangerous_.
However, in this case, at least, the decisions it makes would at least
make for less inlining, and thus less stack space explosion.

Linus

2008-08-26 20:45:52

by David Miller

[permalink] [raw]
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

From: Mike Travis <[email protected]>
Date: Tue, 26 Aug 2008 12:06:18 -0700

> David Miller wrote:
> > The only case that didn't work was due to a limitation in
> > arch interfaces for the new generic smp_call_function() code.
> > It passes a cpumask_t instead of a pointer to one via
> > arch_send_call_function_ipi().
> >
> > But other than that, the whole sparc64 SMP stuff uses cpumask_t
> > pointers only.
> >
> > What it comes down to is that you have to do the "self cpu"
> > and other tests in the cross-call dispatch routines themselves,
> > instead of at the top-level working on cpumask_t objects.
> >
> > Otherwise you have to modify cpumask_t objects and thus pluck
> > them onto the stack where they take up silly amounts of space.
>
> Yes, I had proposed either modifying, or supplementing a new
> smp_call function to pass the cpumask_t as a pointer (similar
> to set_cpus_allowed_ptr.) But an ABI change such as this was
> not well received at the time.

What it seems to come down to is that any cpumask_t not inside of
a dynamically allocated object should be marked const.

And that is something we can enforce at compile time.

Linus has just suggested dynamically allocating cpumask_t's
for such cases but I don't see that as the fix either.

Just mark them const and enforce that cpumask_t objects can only
be modified when they appear in dynamically allocated objects.

You really don't need to modify the ones that are passed around between functions
anyways. The only code that wants to change bits in these things is
the cpu cross-call dispatch stuff, and that cpu choice logic can just
live where it belongs down in the cross-call dispatch code.
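
A small sketch of the idea, under assumptions: struct cpumask_sketch,
online_mask, mark_online() and cross_call() are made-up names, not kernel
interfaces, and the global mask merely stands in for a mask embedded in a
longer-lived object. The caller hands over a const pointer and the "skip
myself" test happens inside the dispatch loop, so no 512-byte copy is ever
made on the stack just to clear one bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS    4096
#define LONG_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + LONG_BITS - 1) / LONG_BITS)

/* 4096 bits, i.e. 512 bytes: too large to copy around on a kernel stack. */
struct cpumask_sketch {
	unsigned long bits[MASK_LONGS];
};

/* Stand-in for a mask that lives inside some longer-lived object. */
struct cpumask_sketch online_mask;

void mark_online(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	online_mask.bits[cpu / LONG_BITS] |= 1UL << (cpu % LONG_BITS);
}

static int mask_test(const struct cpumask_sketch *mask, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (mask->bits[cpu / LONG_BITS] >> (cpu % LONG_BITS)) & 1UL;
}

/*
 * Counts the CPUs that would receive a cross-call. The mask is read
 * through a const pointer; the sending CPU is filtered out here rather
 * than by clearing its bit in a private copy of the mask.
 */
unsigned int cross_call(const struct cpumask_sketch *mask, unsigned int self)
{
	unsigned int cpu, sent = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == self)
			continue;
		if (mask_test(mask, cpu))
			sent++;
	}
	return sent;
}
```

Because the parameter is const-qualified, an attempt to modify the mask
inside cross_call() is rejected at compile time, which is the enforcement
described above.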

2008-08-26 21:00:25

by Adrian Bunk

[permalink] [raw]
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 11:47:01AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 26 Aug 2008, Adrian Bunk wrote:
> >
> > I added "-fno-inline-functions-called-once -fno-early-inlining" to
> > KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel
> > image by 2%.
>
> Btw, did you check with just "-fno-inline-functions-called-once"?
>
> The -fearly-inlining decisions _should_ be mostly right. If gcc sees early
> that a function is so small (even without any constant propagation etc)
> that it can be inlined, it's probably right.
>
> The inline-functions-called-once thing is what causes even big functions
> to be inlined, and that's where you find the big downsides too (eg the
> stack usage).

-fno-inline-functions-called-once alone costs me nearly 1% in code size.

And I'd expect it to become more with "-fwhole-program --combine".


If you think we have too many stacksize problems I'd suggest to consider
removing the choice of 4k stacks on i386, sh and m68knommu instead of
using -fno-inline-functions-called-once:

Now that 32bit x86 is no longer used for extreme highend configurations
the only serious usecase for 4k stacks are AFAIK space savings on
embedded archs.

4k stacks have caused us much pain [1], and the cases where gcc inlined
too much were the easy ones.

I'm not saying that I'd like removing the choice of 4k stacks, but if we
want to reduce the number of stack related problems that's IMHO the
better alternative.


> Linus

cu
Adrian

[1] AFAIR some callpaths in the kernel are still too big

BTW: In case anyone wonders about why I suggest removing 4k stacks:
My position is that 4k stacks should either be enabled
unconditionally or no longer offered at all.
And if we remove 4k stacks from 32bit x86 it's no longer
realistically maintainable for other architectures.

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-26 21:05:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Adrian Bunk wrote:
>
> If you think we have too many stacksize problems I'd suggest to consider
> removing the choice of 4k stacks on i386, sh and m68knommu instead of
> using -fno-inline-functions-called-once:

Don't be silly. That makes the problem _worse_.

We're much better off with a 1% code-size reduction than forcing big
stacks on people. The 4kB stack option is also a good way of saying "if it
works with this, then 8kB is certainly safe".

And embedded people (the ones that might care about 1% code size) are the
ones that would also want smaller stacks even more!

Linus

2008-08-26 22:04:33

by Jeff Kirsher

[permalink] [raw]
Subject: Re: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)

On Tue, Aug 26, 2008 at 1:14 PM, Kok, Auke <[email protected]> wrote:
> Linus Torvalds wrote:
>>
>> On Tue, 26 Aug 2008, Jeff Garzik wrote:
>>> e1000_check_options builds a struct (singular) on the stack, really... struct
>>> e1000_option is reasonably small.
>>
>> No it doesn't.
>>
>> Look a bit more closely.
>>
>> It builds a struct (singular) MANY MANY times. It also then builds up a
>> huge e1000_opt_list[] array, even though it is const and should be static
>> (and const).
>>
>> I know. I wrote a patch to FIX it.
>
> totally cool patch afaics - if I still maintained this driver I'd have this tested
> and merged right away :)
>
> I suppose Jeff Kirsher is already doing so right now

You suppose correctly.
>
> I suppose that he'll have to look at the other Intel ethernet drivers as well :)
>
> Jeff, please add my:
>
> Reviewed-by: Auke Kok <[email protected]>
>

Will do.

--
Cheers,
Jeff
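
The stack-usage pattern described in the quoted text above (a constant table
rebuilt in a function's stack frame versus a single static const copy) can be
sketched roughly as follows. This is an illustrative sketch only: the struct,
the table contents and the function names are hypothetical and are not taken
from the e1000 driver or from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option descriptor, not the real struct e1000_option. */
struct opt_entry {
	int id;
	int def;	/* default value for this option */
};

/*
 * Automatic storage: the table is an ordinary local, so the compiler may
 * materialize all of it in this function's stack frame on every call.
 */
int default_from_stack_table(int id)
{
	struct opt_entry list[] = {
		{ 0, 10 }, { 1, 15 }, { 2, 20 }, { 3, 25 },
	};

	for (size_t i = 0; i < sizeof(list) / sizeof(list[0]); i++)
		if (list[i].id == id)
			return list[i].def;
	return -1;
}

/* Static storage: one read-only copy, no per-call stack cost. */
static const struct opt_entry opt_list[] = {
	{ 0, 10 }, { 1, 15 }, { 2, 20 }, { 3, 25 },
};
static_assert(sizeof(opt_list) / sizeof(opt_list[0]) == 4, "four entries");

int default_from_static_table(int id)
{
	for (size_t i = 0; i < sizeof(opt_list) / sizeof(opt_list[0]); i++)
		if (opt_list[i].id == id)
			return opt_list[i].def;
	return -1;
}
```

Both lookups return the same result; the difference is only where the table
lives. Whether a given compiler really builds the local table on the stack
depends on its version and flags, which is part of why the static const form
is the more predictable one.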

2008-08-26 22:54:41

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
<[email protected]> wrote:

> And embedded people (the ones that might care about 1% code size) are the
> ones that would also want smaller stacks even more!

This is something I never understood - embedded devices are not going
to run more than a few processes and 4K*(Few Processes)
IMHO is not worth a saving nowadays even in embedded world given
falling memory prices. Or do I misunderstand?

Parag

2008-08-26 23:00:48

by David VomLehn

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Parag Warudkar wrote:
> On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
> <[email protected]> wrote:
>
>> And embedded people (the ones that might care about 1% code size) are the
>> ones that would also want smaller stacks even more!
>
> This is something I never understood - embedded devices are not going
> to run more than a few processes and 4K*(Few Processes)
> IMHO is not worth a saving nowadays even in embedded world given
> falling memory prices. Or do I misunderstand?

Embedded applications span a huge range of sizes, from the very small devices to
which you refer, to quite complex devices. The cable settop boxes we develop have
over a hundred interrupt sources, typically run 250-300 threads, and have 192+
MiB of memory. For all that, we are very cost sensitive and are under constant
pressure to come up with reliable ways to save memory.

> Parag
--
David VomLehn

2008-08-26 23:25:16

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 02:04:57PM -0700, Linus Torvalds wrote:
>
>
> On Tue, 26 Aug 2008, Adrian Bunk wrote:
> >
> > If you think we have too many stacksize problems I'd suggest to consider
> > removing the choice of 4k stacks on i386, sh and m68knommu instead of
> > using -fno-inline-functions-called-once:
>
> Don't be silly. That makes the problem _worse_.
>
> We're much better off with a 1% code-size reduction than forcing big
> stacks on people. The 4kB stack option is also a good way of saying "if it
> works with this, then 8kB is certainly safe".
>...

You implicitly assume both would solve the same problem.

While 4kB stacks are something we anyway never got 100% working, the
cases where gcc inlining functions causes a critical increase in stack
usage are usually not that hard to find, and once found the fix is
trivial.

We should anyway monitor stack usages better since we have frequent
programming errors in this area, and problems caused by gcc can this
way be detected en passant.

You have a good point that aiming at 4kB makes 8kB a very safe choice.

But I do not think the problem you'd solve with
-fno-inline-functions-called-once is big enough to warrant the size
increase it causes.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-26 23:46:38

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 04:00:33PM -0700, David VomLehn wrote:
> Parag Warudkar wrote:
>> On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
>> <[email protected]> wrote:
>>
>>> And embedded people (the ones that might care about 1% code size) are the
>>> ones that would also want smaller stacks even more!
>>
>> This is something I never understood - embedded devices are not going
>> to run more than a few processes and 4K*(Few Processes)
>> IMHO is not worth a saving nowadays even in embedded world given
>> falling memory prices. Or do I misunderstand?
>
> Embedded applications span a huge range of sizes, from the very small
> devices to which you refer, to quite complex devices. The cable settop
> boxes we develop have over a hundred interrupt sources, typically run
> 250-300 threads, and have 192+ MiB of memory. For all that, we are very
> cost sensitive and are under constant pressure to come up with reliable
> ways to save memory.

As you say correctly the term "embedded" gets used for many different
devices.

And if you have 192+ MiB of memory you have so much that all these
kernel size discussions don't really matter.

> David VomLehn

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-26 23:48:33

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Parag Warudkar wrote:
>
> This is something I never understood - embedded devices are not going
> to run more than a few processes and 4K*(Few Processes)
> IMHO is not worth a saving nowadays even in embedded world given
> falling memory prices. Or do I misunderstand?

Well, by that argument, 1% of kernel size doesn't matter either..

1% of a kernel for an embedded device is roughly 10-30kB or so depending
on how small you make the configuration.

If that matters, then so should the difference of 3-8 processes' kernel
stack usage when you have a 4k/8k stack choice.

And they _all_ will have at least 3-8 processes on them. Even the simplest
ones will tend to have many more.

Linus
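
The comparison above works out roughly as follows. This is a minimal sketch of
the arithmetic only; the names are hypothetical, and the figures are the ones
quoted in this message (3-8 processes, 4k versus 8k stacks, 1% of the kernel
being roughly 10-30kB).

```c
#include <assert.h>

enum { SMALL_STACK_KB = 4, BIG_STACK_KB = 8 };

/* Extra kernel stack memory, in kB, when nr_tasks tasks use 8 kB stacks. */
unsigned int extra_stack_kb(unsigned int nr_tasks)
{
	return nr_tasks * (BIG_STACK_KB - SMALL_STACK_KB);
}

/* 3-8 tasks cost 12-32 kB extra, the same range as the 10-30 kB above. */
static_assert(3 * (BIG_STACK_KB - SMALL_STACK_KB) == 12, "3 tasks: 12 kB");
static_assert(8 * (BIG_STACK_KB - SMALL_STACK_KB) == 32, "8 tasks: 32 kB");
```

So with only a handful of processes, the stack-size choice already moves about
as much memory as the 1% code-size difference does.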

2008-08-26 23:52:31

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Wed, 27 Aug 2008, Adrian Bunk wrote:
> >
> > We're much better off with a 1% code-size reduction than forcing big
> > stacks on people. The 4kB stack option is also a good way of saying "if it
> > works with this, then 8kB is certainly safe".
>
> You implicitly assume both would solve the same problem.

I'm just saying that your logic doesn't hold water.

If we can save kernel stack usage, then a 1% increase in kernel size is
more than worth it.

> While 4kB stacks are something we anyway never got 100% working

What? Don't be silly.

Linux _historically_ always used 4kB stacks.

No, they are likely not usable on x86-64, but dammit, they should be more
than usable on x86-32 still.

> But I do not think the problem you'd solve with
> -fno-inline-functions-called-once is big enough to warrant the size
> increase it causes.

You continually try to see the inlining as a single solution to one
problem (debuggability, stack, whatever).

The biggest problem with gcc inlining has always been that it has been
_unpredictable_. It causes problems in many different ways. It has caused
stability issues due to gcc versions doing random things. It causes the
stack expansion. It makes stack traces harder for debugging, etc.

If it was any one thing, I wouldn't care. But it's exactly the fact that
it causes all these problems in different areas.

Linus

2008-08-27 00:24:22

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 04:51:52PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > >
> > > We're much better off with a 1% code-size reduction than forcing big
> > > stacks on people. The 4kB stack option is also a good way of saying "if it
> > > works with this, then 8kB is certainly safe".
> >
> > You implicitly assume both would solve the same problem.
>
> I'm just saying that your logic doesn't hold water.
>
> If we can save kernel stack usage, then a 1% increase in kernel size is
> more than worth it.

From some tests the size increase seems to become bigger for smaller
kernels, but I don't have any really good data.


An interesting question is why most of our architectures for embedded
devices only offer bigger stacks:

The only architectures offering a 4kB stacks option are:
- m68knommu
- sh
- 32bit x86

The following architectures that are used in embedded devices
always use 8kB stacks (or bigger) in your tree:
- arm
- avr32
- blackfin
- cris
- frv
- h8300
- m32r
- m68k
- mips
- mn10300 (has an #ifdef CONFIG_4KSTACKS but no kconfig option)
- powerpc
- xtensa


> > While 4kB stacks are something we anyway never got 100% working
>
> What? Don't be silly.
>
> Linux _historically_ always used 4kB stacks.
>
> No, they are likely not usable on x86-64, but dammit, they should be more
> than usable on x86-32 still.


When did we get callpaths like nfs+xfs+md+scsi reliably
working with 4kB stacks on x86-32?


>...
> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-27 00:30:01

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Wed, 27 Aug 2008, Adrian Bunk wrote:
>
> When did we get callpaths like nfs+xfs+md+scsi reliably
> working with 4kB stacks on x86-32?

XFS may never have been usable, but the rest, sure.

And you seem to be making this whole argument an excuse to SUCK, and an
excuse to let gcc crap even more on our stack space.

Why?

Why aren't you saying that we should be able to do better? Instead, you
seem to be asking us to do even worse than we do now?

Linus

2008-08-27 00:56:10

by Greg Ungerer

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


Linus Torvalds wrote:
> On Tue, 26 Aug 2008, Parag Warudkar wrote:
>> This is something I never understood - embedded devices are not going
>> to run more than a few processes and 4K*(Few Processes)
>> IMHO is not worth a saving nowadays even in embedded world given
>> falling memory prices. Or do I misunderstand?
>
> Well, by that argument, 1% of kernel size doesn't matter either..
>
> 1% of a kernel for an embedded device is roughly 10-30kB or so depending
> on how small you make the configuration.
>
> If that matters, then so should the difference of 3-8 processes' kernel
> stack usage when you have a 4k/8k stack choice.
>
> And they _all_ will have at least 3-8 processes on them. Even the simplest
> ones will tend to have many more.

I have some simple devices (network access/routers) with 8MB of RAM,
at power up not really being configured to do anything running 25
processes. (Heck there is over 10 kernel processes running!). Configure
some interfaces and services and that will easily push past 40.
I'd be happy with a 160k saving :-)

The init memory being freed at the end of the kernel boot is 88k,
4k stacks could save more than that.

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
Secure Computing Corporation PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com

2008-08-27 00:58:27

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 7:47 PM, Linus Torvalds
<[email protected]> wrote:

> If that matters, then so should the difference of 3-8 processes' kernel
> stack usage when you have a 4k/8k stack choice.

The savings part -financial ones- are not always realizable with the
way memory is priced/sized/fitted.
Savings in few Mb of Kernel stack are not necessarily going to allow
getting rid of a single memory chip of 64M or so.
Either that or embedded manufacturing/configurations are different
than the desktop world.

(If my device has 2 memory slots and my user space requires 100Mb
including kernel memory - I anyways have to put in 64Mx2 there to take
advantage of mass manufactured, general purpose memory - so no big
deal if I saved 1.2Mb in Kernel stack or not. And savings of 64Mb
Kernel memory are not feasible anyways to allow user space to work
with 64Mb.)

On the other hand reducing user space memory usage on those devices
(not counting savings from kernel stack size) is a way more attractive
option.

And although you said in your later reply that Linux x86 with 4K
stacks should be more than usable - my experiences running an untainted
desktop/file server with 4K stack have been always disastrous XFS or
not. It _might_ work for some well defined workloads but you would
not want to risk 4K stacks otherwise.

I understand that having the 4K stack option as a non-default for very
specific workloads is a good idea but apart from that I think no one
else seems to bother with reducing stack sizes (by no one I mean other
OSes.)

Parag

2008-08-27 01:08:26

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 8:53 PM, Greg Ungerer <[email protected]> wrote:

> I have some simple devices (network access/routers) with 8MB of RAM,
> at power up not really being configured to do anything running 25
> processes. (Heck there is over 10 kernel processes running!). Configure
> some interfaces and services and that will easily push past 40.
> I'd be happy with a 160k saving :-)
>

So you really need to run all 25 processes on that 8Mb box?
(For reference even the NGW100 development board comes with 16Mb RAM).

Even if you do need those all 25 processes on the 8Mb box, fixing the
memory usage of those user space hogs is lot better than trying to
save 160Kb in kernel stacks.
Last I looked, user space wasn't particularly frugal with memory usage.

Parag

2008-08-27 01:34:19

by Greg Ungerer

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


Parag Warudkar wrote:
> On Tue, Aug 26, 2008 at 8:53 PM, Greg Ungerer <[email protected]> wrote:
>
>> I have some simple devices (network access/routers) with 8MB of RAM,
>> at power up not really being configured to do anything running 25
>> processes. (Heck there is over 10 kernel processes running!). Configure
>> some interfaces and services and that will easily push past 40.
>> I'd be happy with a 160k saving :-)
>>
>
> So you really need to run all 25 processes on that 8Mb box?

Yes, of course. Considerable effort has been put into running
a minimal set of processes (that still fulfills the required function
set of this device).


> (For reference even the NGW100 development board comes with 16Mb RAM).

Lots of development boards are fitted with lots of RAM.

And the pressure will still be on in _real_ products to reduce
the RAM footprint as much as possible. There are exceptions but
generally less is cheaper. Simple economics really.


> Even if you do need those all 25 processes on the 8Mb box, fixing the
> memory usage of those user space hogs is lot better than trying to
> save 160Kb in kernel stacks.

Yep, been done too. You don't squeeze a lot into these smaller
devices without looking at everything in it.


> Last I looked, user space wasn't particularly frugal with memory usage.

Then you haven't looked in the right places :-)

There are plenty of choices for making things small in user space.
Simple stuff like using uClibc, busybox, etc.

In this specific example things like /bin/init is 10k, /bin/inetd
is 10k, /bin/crond is 11k, etc. (Of course it is a shared uClibc setup,
uClibc is ~300k). And XIP can help out here too.

Regards
Greg



------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
Secure Computing Corporation PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com

2008-08-27 01:50:03

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Parag Warudkar wrote:
>
> And although you said in your later reply that Linux x86 with 4K
> stacks should be more than usable - my experiences running an untainted
> desktop/file server with 4K stack have been always disastrous XFS or
> not. It _might_ work for some well defined workloads but you would
> not want to risk 4K stacks otherwise.

Umm. How long?

4kB used to be the _only_ choice. And no, there weren't even irq stacks.
So that 4kB was not just the whole kernel call-chain, it was also all the
irq nesting above it.

And yes, we've gotten much worse over time, and no, I can't really suggest
going back to that in general. The code bloat has certainly been
accompanied by a stack bloat too.

But part of it is definitely gcc. Some versions of gcc used to be
absolutely _horrid_ when it came to stack usage, especially with some
flags, and especially with the crazy inlining that module-at-a-time
caused.

But I'd be really happy if some embedded people tried to take some of that
bloat back, and aim for 4kB stacks. Because it's definitely not
unrealistic. At least it _shouldn't_ be. And a lot of the cases of us
having structures on the stack is actually not worth it, and tends to be
about being lazy rather than anything else.

Linus

2008-08-27 02:16:28

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 9:31 PM, Greg Ungerer <[email protected]> wrote:

>
> And the pressure will still be on in _real_ products to reduce
> the RAM footprint as much as possible. There are exceptions but
> generally less is cheaper. Simple economics really.

Well, sure - but the industry as a whole seems to have gone the other
way - do more with more at the similar or lower price points!
By that definition of less is better we should try and make the kernel
memory pageable (or has someone already done that?) - Windows does it,
by default ;)

Parag

2008-08-27 02:37:18

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 9:49 PM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Tue, 26 Aug 2008, Parag Warudkar wrote:
>>
>> And although you said in your later reply that Linux x86 with 4K
>> stacks should be more than usable - my experiences running an untainted
>> desktop/file server with 4K stack have been always disastrous XFS or
>> not. It _might_ work for some well defined workloads but you would
>> not want to risk 4K stacks otherwise.
>
> Umm. How long?
>

IIRC the last I tried 4K stacks with x86 was on 2.6.21 - Fedora 7
kernel, around June 07 time frame.
The oops included an ugly and long call trace that I still remember.

> And a lot of the cases of us
> having structures on the stack is actually not worth it, and tends to be
> about being lazy rather than anything else.

What about deep call chains? The problem with the uptake of 4K stacks
seems to be that it is not reliably provable that it will work under all
circumstances.

Parag

2008-08-27 02:53:56

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Tue, 26 Aug 2008, Parag Warudkar wrote:
>
> What about deep call chains? The problem with the uptake of 4K stacks
> seems to be that it is not reliably provable that it will work under all
> circumstances.

Umm. Neither is 8k stacks. Nobody "proved" anything.

But yes, some subsystems have insanely deep call chains. And yes, things
like the VFS recursion (for symlinks) makes that deeper yet for
filesystems, although only on the lookup path. And that is exactly the
kind of thing that can exacerbate the problem of the compiler artificially
making for a bigger stack footprint of a function (*).

For things like the VFS layer, right now we allow a nesting level of 8, I
think. If I remember correctly, it was 5 historically. Part of raising
that depth, though, was that we actually moved the recursive part into
fs/namei.c, and the nesting stack-depth was something pretty damn small
when the filesystem used "follow_link" properly and let the VFS do it for
it (ie the callchain to actually look up the link could be deep, but it
would not recurse back, and instead just return a pointer, so that the
actual _recursive_ part was just __do_follow_link() and is just a few
words on the stack).

So yes, we do have some deep callchains, but they tend to be pretty well
managed for _good_ code. The problems tend to be the areas with lots of
indirection layers, and yeah, XFS, MD and ACPI all have those kinds of
things.

In an embedded world, many of those should be a non-issue, though.

Linus

(*) ie the function that _is_ on the deep chain doesn't actually need much
of a stack footprint at all itself, but it may call a helper function that
is _not_ in the deep chain, and if it gets inlined it may give its
excessive stack footprint to the deep chain - and this is _exactly_ the
problem that happened with inlining "load_module()".
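
The effect described in the footnote can be sketched like this. It is a
hypothetical illustration, not the load_module() code: the function names are
made up, the noinline marking is spelled with the GCC attribute directly, and
whether a compiler would actually inline the helper depends on its version and
flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Helper with a large frame of its own; it is not part of the deep chain. */
static __attribute__((noinline)) size_t header_length(const char *src)
{
	char scratch[512];	/* large local buffer */
	size_t n;

	assert(src != NULL);
	n = strlen(src);
	if (n >= sizeof(scratch))
		n = sizeof(scratch) - 1;
	memcpy(scratch, src, n);
	scratch[n] = '\0';
	return strlen(scratch);
}

/* Stand-in for the rest of a deep call chain. */
static size_t continue_chain(size_t len)
{
	return len + 1;
}

/*
 * Function on the deep chain. Its own frame is small. If header_length()
 * were inlined here, scratch[] could become part of this frame and stay
 * allocated for as long as continue_chain() and its callees run.
 */
size_t deep_chain_step(const char *src)
{
	size_t len = header_length(src);	/* helper frame is gone on return */

	return continue_chain(len);
}
```

With the helper kept out of line, its 512 bytes are only on the stack while
the helper itself runs, not underneath everything the deep chain calls next.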

2008-08-27 06:02:35

by Paul Mackerras

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds writes:

> 4kB used to be the _only_ choice. And no, there weren't even irq stacks.
> So that 4kB was not just the whole kernel call-chain, it was also all the
> irq nesting above it.

I think your memory is failing you. In 2.4 and earlier, the kernel
stack was 8kB minus the size of the task_struct, which sat at the
start of the 8kB. For instance, from include/asm-i386/processor.h for
2.4.29:

#define THREAD_SIZE (2*PAGE_SIZE)
#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
#define free_task_struct(p) free_pages((unsigned long) (p), 1)

Paul.

2008-08-27 06:54:49

by Nick Piggin

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wednesday 27 August 2008 06:01, Mike Travis wrote:
> Dave Jones wrote:
> ...
>
> > But yes, for this to be even remotely feasible, there has to be a
> > negligible performance cost associated with it, which right now, we
> > clearly don't have. Given that the number of people running 4096 CPU
> > boxes even in a few years time will still be tiny, punishing the common
> > case is obviously absurd.
> >
> > Dave
>
> I did do some fairly extensive benchmarking between configs of NR_CPUS =
> 128 and 4096 and most performance hits were in the neighborhood of < 5% on
> systems with 8 cpus and 4GB of memory (our most common test system).

5% is a pretty nasty performance hit... what sort of benchmarks are we
talking about here?

I just made some pretty crazy changes to the VM to get "only" around 5
or so % performance improvement in some workloads.

What places are making heavy use of cpumasks that causes such a slowdown?
Hopefully callers can mostly be improved so they don't need to use cpumasks
for common cases.

Until then, it would be kind of sad for a distro to ship a generic x86
kernel and lose 5% performance because it is set to 4096 CPUs...

But if I misunderstand and you're talking about specific microbenchmarks to
find the worst case for huge cpumasks, then I take that back.


> [But
> changing cpumask_t's to be pointers instead of values will likely increase
> this.] I've tried to be very sensitive to this issue with all my previous
> changes, so convincing the distros to set NR_CPUS=4096 would be as painless
> for them as possible. ;-)
>
> Btw, huge count cpu systems I don't think are that far away. I believe the
> nextgen Larrabee chips will be geared towards HPC applications [instead of
> just GFX apps], and putting 4 of these chips on a motherboard would add up
> to 512 cpu threads (1024 if they support hyperthreading.)

It would be quite interesting if they make them cache coherent / MP capable.
Will they be?

2008-08-27 07:05:29

by David Miller

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

From: Nick Piggin <[email protected]>
Date: Wed, 27 Aug 2008 16:54:32 +1000

> 5% is a pretty nasty performance hit... what sort of benchmarks are we
> talking about here?
>
> I just made some pretty crazy changes to the VM to get "only" around 5
> or so % performance improvement in some workloads.
>
> What places are making heavy use of cpumasks that causes such a slowdown?
> Hopefully callers can mostly be improved so they don't need to use cpumasks
> for common cases.

It's almost certainly from the cross-call dispatch call chain.

As just one example, just to do a TLB flush mm->cpu_vm_mask probably
gets passed around as an aggregate two or three times on the way down
to the APIC programming code on x86. That's two or three 512 byte
copies on the stack :)

Look at the sparc64 SMP code for how I solved the problem there.
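
The cost mentioned above can be illustrated with a simplified sketch. The
cpumask_t here is only a stand-in for the kernel type, the helper names are
hypothetical, and this is not the sparc64 code referred to above; it only shows
why passing the mask as an aggregate costs 512 bytes per call with
NR_CPUS=4096, while passing a pointer does not.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS		4096
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for the kernel's cpumask_t. */
typedef struct {
	unsigned long bits[NR_CPUS / BITS_PER_LONG];
} cpumask_t;

/* 4096 bits / 8 = 512 bytes per mask. */
static_assert(sizeof(cpumask_t) == 512, "NR_CPUS=4096 mask is 512 bytes");

static int mask_test_cpu(const cpumask_t *mask, unsigned int cpu)
{
	return (mask->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
}

void mask_set_cpu(cpumask_t *mask, unsigned int cpu)
{
	mask->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* By value: every call copies the whole 512-byte aggregate. */
unsigned int first_cpu_by_value(cpumask_t mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask_test_cpu(&mask, cpu))
			return cpu;
	return NR_CPUS;
}

/* By pointer: only an address is passed, the mask stays where it is. */
unsigned int first_cpu_by_ptr(const cpumask_t *mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask_test_cpu(mask, cpu))
			return cpu;
	return NR_CPUS;
}
```

Both functions compute the same thing; a chain of two or three by-value calls
is what turns one mask into two or three 512-byte copies.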

2008-08-27 07:47:42

by Nick Piggin

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wednesday 27 August 2008 17:05, David Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Wed, 27 Aug 2008 16:54:32 +1000
>
> > 5% is a pretty nasty performance hit... what sort of benchmarks are we
> > talking about here?
> >
> > I just made some pretty crazy changes to the VM to get "only" around 5
> > or so % performance improvement in some workloads.
> >
> > What places are making heavy use of cpumasks that causes such a slowdown?
> > Hopefully callers can mostly be improved so they don't need to use
> > cpumasks for common cases.
>
> It's almost certainly from the cross-call dispatch call chain.
>
> As just one example, just to do a TLB flush mm->cpu_vm_mask probably
> gets passed around as an aggregate two or three times on the way down
> to the APIC programming code on x86. That's two or three 512 byte
> copies on the stack :)

Yeah, I see. That's stupid isn't it? (Well, I guess it was completely
sane when cpumasks were word sized ;))

Hopefully that accounts for a significant chunk...

2008-08-27 08:35:08

by Bernd Petrovitsch

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, 2008-08-26 at 18:54 -0400, Parag Warudkar wrote:
> On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
> <[email protected]> wrote:
>
> > And embedded people (the ones that might care about 1% code size) are the
> > ones that would also want smaller stacks even more!
>
> This is something I never understood - embedded devices are not going
> to run more than a few processes and 4K*(Few Processes)
> IMHO is not worth a saving nowadays even in embedded world given
> falling memory prices. Or do I misunderstand?

Falling prices are no reason to increase the amount of available RAM (or
other hardware).
Especially if you (intend to) build >1E5 devices - where every Euro
counts.

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-08-27 08:44:58

by Alan

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

> You have a good point that aiming at 4kB makes 8kB a very safe choice.

Not really no - we use separate IRQ stacks in 4K but not 8K mode on
x86-32. That means you've actually got no more space if you are unlucky
with the timing of events. The 8K mode is merely harder to debug.

If 4K stacks really are not safe then x86-32 really really needs to
switch to using IRQ stacks in 8K stack mode as well.

Alan

2008-08-27 08:45:23

by Bernd Petrovitsch

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, 2008-08-26 at 22:16 -0400, Parag Warudkar wrote:
[...]
> Well, sure - but the industry as a whole seems to have gone the other

"The industry as a whole" doesn't exist on that low level. You can't
compare the laptop and/or desktop computer market (where one may buy
today hardware that runs in 3 years with the next generation/release of
the OS and applications) with the e.g. "WLAN router" market where - from
the commercial point of view - every Euro counts (and where the
requirements for the lifetime of the device are long frozen before the
thing gets in a shop).

> way - do more with more at the similar or lower price points!
> By that definition of less is better we should try and make the kernel
> memory pageable (or has someone already done that?) - Windows does it,

That doesn't help as in really small devices (like WLAN routers, cable
modems, etc.) you run without any means of paging/swapping. And even
binaries/read-only files are not necessarily executable in place (but
must be loaded into RAM). So you can't flush these pages.

And pageable kernel memory doesn't come for free - even if one only
counts the increased code and its complexity.

> by default ;)

Which is more a sign that it is probably a very bad idea.

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-08-27 08:45:43

by David Miller

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

From: Nick Piggin <[email protected]>
Date: Wed, 27 Aug 2008 17:47:14 +1000

> Yeah, I see. That's stupid isn't it? (Well, I guess it was completely
> sane when cpumasks were word sized ;))
>
> Hopefully that accounts for a significant chunk...

There are a lot of indirect costs that are hard to see as well.

Two things a lot of these cross-call dispatch paths do is:

1) Clear self-cpu

2) AND with cpus_online

#1 can normally be a simple bit clear, but some places can also
implement this with something like "cpus_andn(X, cpumask_of_cpu(cpu))"

It's simply easier to move those two things down to the bottom of
the APIC programming code, they just loop over the cpumask doing
an expensive APIC I/O operation anyways, might as well overlap it
with these "skip self-cpu" and "skip not-online cpus" checks.

And oh yeah we get the stack wastage fixed too, isn't that what we
were talking about? :-)
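
A rough sketch of that idea, under simplifying assumptions: masks are a single
64-bit word (so NR_CPUS is only 64 here), send_ipi_one() is a hypothetical
stand-in for the expensive APIC write, and none of these names are real kernel
interfaces. The point is only that the "skip self-cpu" and "skip not-online
cpus" checks live in the bottom loop, so callers never build a modified copy
of the mask.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64	/* small on purpose: one 64-bit word per mask */

static uint64_t ipi_log;	/* records which CPUs were "signalled" */

/* Stand-in for the expensive per-CPU APIC write. */
static void send_ipi_one(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	ipi_log |= UINT64_C(1) << cpu;
}

/*
 * Walk the caller's mask in place. Self and offline CPUs are skipped
 * here, so no caller has to clear bits in a private copy first.
 * Returns the number of CPUs actually signalled.
 */
unsigned int send_ipi_mask(const uint64_t *targets, const uint64_t *online,
			   unsigned int self)
{
	unsigned int sent = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(*targets & bit))
			continue;
		if (cpu == self)		/* skip self-cpu */
			continue;
		if (!(*online & bit))		/* skip not-online cpus */
			continue;
		send_ipi_one(cpu);
		sent++;
	}
	return sent;
}
```

For example, with targets covering CPUs 0, 1, 2 and 5, CPUs 0-3 online and the
caller running on CPU 1, only CPUs 0 and 2 are signalled, and both masks are
only ever read through pointers.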

2008-08-27 08:51:40

by Alan

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

> What about deep call chains? The problem with the uptake of 4K stacks
> seems to be that it is not reliably provable that it will work under all
> circumstances.

On x86-32 with 8K stacks your IRQ paths share them so that is even harder
to prove (not that you can prove any of them) and the bugs are more
obscure and random.

2008-08-27 09:00:31

by Bernd Petrovitsch

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, 2008-08-26 at 20:58 -0400, Parag Warudkar wrote:
[...]
> The savings part -financial ones- are not always realizable with the
> way memory is priced/sized/fitted.
> Savings in few Mb of Kernel stack are not necessarily going to allow
> getting rid of a single memory chip of 64M or so.

No, but you can put an additional service(s) on it and sales people have
one (or two or ....) line more for their sales brochures.

> Either that or embedded manufacturing/configurations are different
> than the desktop world.

They are different. Think of running the complete system acting as a
bridge, router and/or firewall (Kernel early 2.4 though) from 4MB flash
in 32MB RAM and - listing the outside visible services - having a
command-line interface, web-GUI (implying an HTTP server) and a
(net-)SNMP agent on it.
Running a glibc without thread support is a win there (implying that there
is no thread support available on that device).

> (If my device has 2 memory slots and my user space requires 100Mb
> including kernel memory - I anyways have to put in 64Mx2 there to take
> advantage of mass manufactured, general purpose memory - so no big
> deal if I saved 1.2Mb in Kernel stack or not. And savings of 64Mb
> Kernel memory are not feasible anyways to allow user space to work
> with 64Mb.)

As soon as product management realizes that there is space left on the
device, they get new ideas and/or customer requirements to run more
services on that device.

> On the other hand reducing user space memory usage on those devices
> (not counting savings from kernel stack size) is a way more attractive
> option.

There is no question of whether to save space here or there. You save it - sooner
or later - on all fronts. Period.

> And although you said in your later reply that Linux x86 with 4K
> stacks should be more than usable - my experiences running an untainted
> desktop/file server with 4K stack have been always disastrous XFS or
> not. It _might_ work for some well defined workloads but you would
> not want to risk 4K stacks otherwise.

The embedded world of really small devices usually doesn't run XFS (or
ext? or reiser* or jfs or NFS or ...) or stack block devices on files
or .....

> I understand the having 4K stack option as a non-default for very
> specific workloads is a good idea but apart from that I think no one
> else seems to bother with reducing stack sizes (by no one I mean other
> OSes.)

They probably gave up on the idea pretty soon because you need to
rework/improve large parts of the kernel + drivers (and that has two
major problems - it consumes a lot of man power for "no new features and
everything must be completely tested again"[0] and it adds new risks).
And that is practically impossible if one sells "stable driver APIs" for
3rd party (commercial) drivers because these must be changed too.

Bernd

[0]: Let alone if you (or your customers) need certificates from some
governmental agencies.
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-08-27 10:59:37

by Arjan van de Ven

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Paul Mackerras wrote:
> Linus Torvalds writes:
>
>> 4kB used to be the _only_ choice. And no, there weren't even irq stacks.
>> So that 4kB was not just the whole kernel call-chain, it was also all the
>> irq nesting above it.
>
> I think your memory is failing you. In 2.4 and earlier, the kernel
> stack was 8kB minus the size of the task_struct, which sat at the
> start of the 8kB. For instance, from include/asm-i386/processor.h for
> 2.4.29:

but it was shared with interrupts; so out of the 6kB left, you still really had only 4kB for the user context stack

2008-08-27 11:59:32

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 27 Aug 2008, Adrian Bunk wrote:
> >
> > When did we get callpaths like nfs+xfs+md+scsi reliably
> > working with 4kB stacks on x86-32?
>
> XFS may never have been usable, but the rest, sure.
>
> And you seem to be making this whole argument an excuse to SUCK, and an
> excuse to let gcc crap even more on our stack space.
>
> Why?
>
> Why aren't you saying that we should be able to do better? Instead, you
> seem to be asking us to do even worse than we do now?

My main point is:
- getting 4kB stacks working reliably is a hard task
- having an eye on gcc increasing the stack usage, and fixing it if
required, is relatively easy

If we should be able to do better at getting (and keeping) 4kB stacks
working, then coping with possible inlining problems caused by gcc
should not be a big problem for us.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-27 11:59:51

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Tue, Aug 26, 2008 at 06:49:19PM -0700, Linus Torvalds wrote:
>...
> But part of it is definitely gcc. Some versions of gcc used to be
> absolutely _horrid_ when it came to stack usage, especially with some
> flags, and especially with the crazy inlining that module-at-a-time
> caused.
>...

That was gcc 3.4.

And due to that we disable unit-at-a-time for gcc 3.4 on 32bit x86.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-27 12:52:22

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, Aug 27, 2008 at 4:25 AM, Alan Cox <[email protected]> wrote:
>> You have a good point that aiming at 4kB makes 8kB a very safe choice.
>
> Not really no - we use separate IRQ stacks in 4K but not 8K mode on
> x86-32. That means you've actually got no more space if you are unlucky
> with the timing of events. The 8K mode is merely harder to debug.
>

By your logic though, XFS on x86 should work fine with 4K stacks -
many will attest that it does not and blows up due to stack issues.

I have first hand experiences of things blowing up with deep call
chains when using 4K stacks where 8K worked just fine on same
workload.

So there is definitely some other problem with 4K stacks.

Thanks
Parag

2008-08-27 12:56:31

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, Aug 27, 2008 at 5:00 AM, Bernd Petrovitsch <[email protected]> wrote:

>
> They probably gave up on the idea pretty soon because you need to
> rework/improve large parts of the kernel + drivers (and that has two
> major problems - it consumes a lot of man power for "no new features and
> everything must be completely tested again"[0] and it adds new risks).
> And that is practically impossible if one sells "stable driver APIs" for
> 3rd party (commercial) drivers because these must be changed too.
>

But not many embedded Linux arches support 4K stacks like Adrian
pointed out earlier.
So the same (lot of man power requirement) would apply to Linux.

Sure it will be good - but how reasonable it is to attempt it and how
reliably it will work under all conceived loads - those are the
questions.

Thanks

Parag

2008-08-27 13:17:53

by Bernd Petrovitsch

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


On Wed, 2008-08-27 at 08:56 -0400, Parag Warudkar wrote:
> On Wed, Aug 27, 2008 at 5:00 AM, Bernd Petrovitsch <[email protected]> wrote:
> > They probably gave up on the idea pretty soon because you need to
> > rework/improve large parts of the kernel + drivers (and that has two
> > major problems - it consumes a lot of man power for "no new features and
> > everything must be completely tested again"[0] and it adds new risks).
> > And that is practically impossible if one sells "stable driver APIs" for
> > 3rd party (commercial) drivers because these must be changed too.
>
> But not many embedded Linux arches support 4K stacks like Adrian

What is an "embedded Linux arch"?
Personally I encountered i386, ARM, MIPS and PPC in the embedded world.

> pointed out earlier.
> So the same (lot of man power requirement) would apply to Linux.

Of course. Look at the amount of work done by lots of people in that
area (including stack frame size reductions) and on-going discussions.

> Sure it will be good - but how reasonable it is to attempt it and how
> reliably it will work under all conceived loads - those are the
> questions.

If you "develop" an embedded system (which is partly system integration
of existing apps) to be installed in the field, you don't have that many
conceivable work loads compared to a desktop/server system. And you have
a fixed list of drivers and applications.
A usual approach is to run stress tests on several (or all)
subsystems/services/... in parallel and if the device survives it
functioning correctly, it is at least good enough.

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-08-27 13:40:47

by Alan

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

> By your logic though, XFS on x86 should work fine with 4K stacks -
> many will attest that it does not and blows up due to stack issues.
>
> I have first hand experiences of things blowing up with deep call
> chains when using 4K stacks where 8K worked just fine on same
> workload.
>
> So there is definitely some other problem with 4K stacks.

Nothing of the sort. If it blows up with a 4K stack it will almost
certainly blow up with an 8K stack *eventually* - when a heavy stack usage
coincides with a heavy stack using IRQ handler.

You won't catch it in simple testing, you won't catch it in trivial
simulation and it'll be incredibly hard to reproduce. Not the kind of bug
you want in a production system really. IRQ stacks make things much more
predictable.
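
To make the two models concrete, here is a minimal sketch in C with made-up numbers. The function names and the THREAD_INFO_SIZE reserve are assumptions for illustration only, not kernel code; the real reserve at the bottom of the stack differs per kernel version.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sizes only. */
#define STACK_8K          8192u
#define STACK_4K          4096u
#define THREAD_INFO_SIZE    64u  /* hypothetical reserve at the bottom of each stack */

/* 8K mode without IRQ stacks: process context and IRQ handlers share one stack. */
static bool fits_shared_8k(size_t task_depth, size_t irq_depth)
{
    return task_depth + irq_depth <= STACK_8K - THREAD_INFO_SIZE;
}

/* 4K mode with separate IRQ stacks: each context has its own fixed budget. */
static bool fits_split_4k(size_t task_depth, size_t irq_depth)
{
    return task_depth <= STACK_4K - THREAD_INFO_SIZE &&
           irq_depth  <= STACK_4K - THREAD_INFO_SIZE;
}
```

In this toy model a 5000-byte process-context call chain fails every time with 4K stacks, whatever the interrupts do. With a shared 8K stack the same chain only fails when an IRQ path needing more than about 3100 bytes happens to land on top of it, which is the timing-dependent case that simple testing tends to miss.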

Alan

2008-08-27 14:35:39

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Nick Piggin wrote:
> On Wednesday 27 August 2008 06:01, Mike Travis wrote:
>> Dave Jones wrote:
>> ...
>>
>>> But yes, for this to be even remotely feasible, there has to be a
>>> negligible performance cost associated with it, which right now, we
>>> clearly don't have. Given that the number of people running 4096 CPU
>>> boxes even in a few years time will still be tiny, punishing the common
>>> case is obviously absurd.
>>>
>>> Dave
>> I did do some fairly extensive benchmarking between configs of NR_CPUS =
>> 128 and 4096 and most performance hits were in the neighborhood of < 5% on
>> systems with 8 cpus and 4GB of memory (our most common test system).
>
> 5% is a pretty nasty performance hit... what sort of benchmarks are we
> talking about here?

It's been a while now, I should go back and check my notes. Many of the
BM's did not have any changes. I believe the ones that were right on the
edge of paging were affected by the fact that less memory was available.
>
> I just made some pretty crazy changes to the VM to get "only" around 5
> or so % performance improvement in some workloads.
>
> What places are making heavy use of cpumasks that causes such a slowdown?
> Hopefully callers can mostly be improved so they don't need to use cpumasks
> for common cases.

That's another study I did, and it seemed that maybe 95% of the functions
would not be affected by passing pointers to cpumasks instead of the cpumasks
themselves, because the data was processed by a cpu_xxx function that
uses a pointer. The most common pattern was to create a temp cpumask using
cpus_and(temp_mask, callers_mask, cpu_online_map). The speedup from using nr_cpu_ids
instead of NR_CPUS in the traversal functions helped quite a bit. Using this
same method in the cpus_xxx functions would further speed up things. (As
well as only allocating the cpumask sized by nr_cpu_ids instead of NR_CPUS
as the current cpumask_t definition specifies.)
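
A minimal userspace sketch of those two points, assuming a simplified stand-in for cpumask_t. The names cpumask_sketch, mask_set, mask_test and count_common are hypothetical and are not the kernel's cpumask API. With NR_CPUS=4096 the mask is 4096 / 8 = 512 bytes, so every by-value pass copies 512 bytes, while the pointer version copies nothing and only scans the CPU ids that can exist.

```c
#include <limits.h>
#include <stddef.h>

#define NR_CPUS 4096
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for cpumask_t: 4096 bits = 512 bytes. */
struct cpumask_sketch {
    unsigned long bits[MASK_LONGS];
};

static void mask_set(struct cpumask_sketch *m, unsigned int cpu)
{
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test(const struct cpumask_sketch *m, unsigned int cpu)
{
    return (m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
}

/*
 * Count CPUs set in both masks. Both masks are passed by pointer, no
 * temporary AND-ed mask is built, and only the first nr_ids bits are
 * scanned instead of all NR_CPUS bits.
 */
static unsigned int count_common(const struct cpumask_sketch *a,
                                 const struct cpumask_sketch *b,
                                 unsigned int nr_ids)
{
    unsigned int cpu, n = 0;

    for (cpu = 0; cpu < nr_ids && cpu < NR_CPUS; cpu++)
        if (mask_test(a, cpu) && mask_test(b, cpu))
            n++;
    return n;
}
```

On an 8-cpu box nr_ids would be 8, so the loop touches 8 bits rather than 4096.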

>
> Until then, it would be kind of sad for a distro to ship a generic x86
> kernel and lose 5% performance because it is set to 4096 CPUs...
>
> But if I misunderstand and you're talking about specific microbenchmarks to
> find the worst case for huge cpumasks, then I take that back.

Yes, I was (at the time) trying to determine how many of the cpumask functions
were actually in play by user tasks, so I was zeroing in on those (cpusets,
rescheds, etc.)

>
>
>> [But
>> changing cpumask_t's to be pointers instead of values will likely increase
>> this.] I've tried to be very sensitive to this issue with all my previous
>> changes, so convincing the distros to set NR_CPUS=4096 would be as painless
>> for them as possible. ;-)
>>
>> Btw, huge count cpu systems I don't think are that far away. I believe the
>> nextgen Larrabee chips will be geared towards HPC applications [instead of
>> just GFX apps], and putting 4 of these chips on a motherboard would add up
>> to 512 cpu threads (1024 if they support hyperthreading.)
>
> It would be quite interesting if they make them cache coherent / MP capable.
> Will they be?

There's not been a lot of info available yet, but I think the 128 cores will
share at least an L2 cache + memory controller. How the APIC's interact is
also another big question. And most likely some standard system controller
CPU will be needed, but that could be a tiny VIA processor... ;-)

Thanks,
Mike

2008-08-27 14:36:43

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

David Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Wed, 27 Aug 2008 16:54:32 +1000
>
>> 5% is a pretty nasty performance hit... what sort of benchmarks are we
>> talking about here?
>>
>> I just made some pretty crazy changes to the VM to get "only" around 5
>> or so % performance improvement in some workloads.
>>
>> What places are making heavy use of cpumasks that causes such a slowdown?
>> Hopefully callers can mostly be improved so they don't need to use cpumasks
>> for common cases.
>
> It's almost certainly from the cross-call dispatch call chain.
>
> As just one example, just to do a TLB flush mm->cpu_vm_mask probably
> gets passed around as an aggregate two or three times on the way down
> to the APIC programming code on x86. That's two or three 512 byte
> copies on the stack :)
>
> Look at the sparc64 SMP code for how I solved the problem there.

I will, thanks!

Mike

2008-08-27 14:49:15

by Mike Travis

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

David Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Wed, 27 Aug 2008 17:47:14 +1000
>
>> Yeah, I see. That's stupid isn't it? (Well, I guess it was completely
>> sane when cpumasks were word sized ;))
>>
>> Hopefully that accounts for a significant chunk...
>
> There are a lot of indirect costs that are hard to see as well.
>
> Two things a lot of these cross-call dispatch paths do are:
>
> 1) Clear self-cpu
>
> 2) AND with cpus_online
>
> #1 can normally be a simple bit clear, but some places can also
> implement this with something like "cpus_andn(X, cpumask_of_cpu(cpu))"
>
> It's simply easier to move those two things down to the bottom of
> the APIC programming code, they just loop over the cpumask doing
> an expensive APIC I/O operation anyways, might as well overlap it
> with these "skip self-cpu" and "skip not-online cpus" checks.
>
> And oh yeah we get the stack wastage fixed too, isn't that what we
> were talking about? :-)

Yes, the most time consuming part was determining whether a kmalloc
could safely be used in the context of the function, and what to
do about the out-of-memory problem. Pushing that down to something
like: for_each_cpu_thats_online(cpu, *maskptr) would remove the need for
many of the temp masks. A simple if (cpu != me) would take care of
excluding self. It might have better interaction with cpu hotplug
as well, since the online map would be checked just before the call
to that cpu is made.
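
A minimal sketch of that loop in userspace C, assuming a toy 64-bit mask so it stays short. dispatch_sketch and send_ipi_sketch are hypothetical names; the latter only stands in for the expensive per-cpu APIC write and records which cpus were targeted.

```c
#include <stdint.h>

/* Toy 64-CPU mask; real cpumasks are larger bitmaps. */
typedef uint64_t mask64_t;

/* Hypothetical stand-in for the APIC write: records the target cpu. */
static mask64_t sent_to;

static void send_ipi_sketch(unsigned int cpu)
{
    sent_to |= (mask64_t)1 << cpu;
}

/*
 * Walk the caller's mask by pointer and filter inside the loop, so no
 * temporary "mask & online & ~self" copy is built on the stack.
 * Returns the number of cpus that were signalled.
 */
static unsigned int dispatch_sketch(const mask64_t *mask,
                                    const mask64_t *online,
                                    unsigned int me,
                                    unsigned int nr_ids)
{
    unsigned int cpu, sent = 0;

    for (cpu = 0; cpu < nr_ids && cpu < 64; cpu++) {
        if (!((*mask >> cpu) & 1))
            continue;           /* not requested by the caller */
        if (cpu == me)
            continue;           /* skip self-cpu */
        if (!((*online >> cpu) & 1))
            continue;           /* skip not-online cpus, checked right before sending */
        send_ipi_sketch(cpu);
        sent++;
    }
    return sent;
}
```

With a request mask of cpus {0,1,2,3,5}, cpus 0-3 online and cpu 1 as the caller, only cpus 0, 2 and 3 are signalled.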

Thanks,
Mike

2008-08-27 15:20:16

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Wed, 27 Aug 2008, Paul Mackerras wrote:
>
> I think your memory is failing you. In 2.4 and earlier, the kernel
> stack was 8kB minus the size of the task_struct, which sat at the
> start of the 8kB.

Yup, you're right.

Linus

2008-08-27 15:49:36

by Jamie Lokier

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Bernd Petrovitsch wrote:
> If you "develop" an embedded system (which is partly system integration
> of existing apps) to be installed in the field, you don't have that many
> conceivable work loads compared to a desktop/server system. And you have
> a fixed list of drivers and applications.

Hah! Not in my line of embedded device.

32MB no-MMU ARM boards which people run new things and attach new
devices to rather often - without making new hardware. Volume's too
low per individual application to get new hardware designed and made.

I'm seriously thinking of forward porting the 4 year old firmware
from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
Backporting is tedious, so's feeling wretchedly far from the mainline
world.

> A usual approach is to run stress tests on several (or all)
> subsystems/services/... in parallel and if the device survives it
> functioning correctly, it is at least good enough.

Per application.

Some little devices run hundreds of different applications and
customers expect to customise, script themselves, and attach different
devices (over USB). The next customer in the chain expects the bits
you supplied to work in a variety of unexpected situations, even when
you advise that it probably won't do that.

Much like desktop/server Linux, but on a small device where silly
little things like 'create a process' are a stress for the dear little
thing.

(My biggest lesson: insist on an MMU next time!)

-- Jamie

2008-08-27 16:01:45

by Paul Mundt

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
> > On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > >
> > > When did we get callpaths like nfs+xfs+md+scsi reliably
> > > working with 4kB stacks on x86-32?
> >
> > XFS may never have been usable, but the rest, sure.
> >
> > And you seem to be making this whole argument an excuse to SUCK, and an
> > excuse to let gcc crap even more on our stack space.
> >
> > Why?
> >
> > Why aren't you saying that we should be able to do better? Instead, you
> > seem to be asking us to do even worse than we do now?
>
> My main point is:
> - getting 4kB stacks working reliably is a hard task
> - having an eye on gcc increasing the stack usage, and fixing it if
> required, is relatively easy
>
> If we should be able to do better at getting (and keeping) 4kB stacks
> working, then coping with possible inlining problems caused by gcc
> should not be a big problem for us.
>
Out of the architectures you've mentioned for 4k stacks, they also tend
to do IRQ stacks, which is something you seem to have overlooked.

In addition to that, debugging the runaway stack users on 4k tends to be
easier anyways since you end up blowing the stack a lot sooner. On sh
we've had pretty good luck with it, though most of our users are using
fairly deterministic workloads and continually profiling the footprint.
Anything that runs away or uses an insane amount of stack space needs to
be fixed well before that anyways, so catching it sooner is always
preferable. I imagine the same case is true for m68knommu (even sans IRQ
stacks).

Things might be more sensitive on x86, but it's certainly not something
that's a huge problem for the various embedded platforms to wire up,
whether they want to go the IRQ stack route or not.

In any event, lack of support for something on embedded architectures in
the kernel is more often due to apathy/utter indifference on the part of
the architecture maintainer rather than being indicative of any intrinsic
difficulty in supporting the thing in question. Most new "features" on the
lesser maintained architectures tend to end up there either out of peer
pressure or copying-and-pasting accidents rather than any sort of design.
;-)

2008-08-27 16:22:23

by Jamie Lokier

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
> > Most LOCs of the kernel are not written by people like you or Al Viro or
> > David Miller, and the average kernel developer is unlikely to do it as
> > good as gcc.
>
> Sure. But we do have tools. We do have checkstack.pl, it's just that it
> hasn't been an issue in a long time, so I suspect many people didn't even
> _realize_ we have it, and I certainly can attest to the fact that even
> people who remember it - like me - don't actually tend to run it all that
> often.

Sounds like what's really desired here isn't more worry and
unpredictability, but for GCC+Binutils to gain the ability to
calculate the stack depth over all callchains (doesn't have to be
exact, just an upper bound; annotate recursions) in a way that's good
enough to do on every compile, complain if a depth is exceeded
statically (or it can't be proven), and to gain the
architecture-independent option "optimise to reduce stack usage".
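
As a rough illustration of what such a tool would compute, here is a minimal sketch in C over a hand-built call graph. The func_node structure and both functions are hypothetical; neither GCC nor binutils is claimed to work this way. The sketch takes each function's frame size, follows the deepest callee chain, and treats any recursion as "cannot be proven".

```c
#define MAX_FUNCS       16
#define MAX_CALLEES     4
#define DEPTH_UNBOUNDED ((long)-1)

/* One node of a hypothetical call graph: frame size plus callee indices. */
struct func_node {
    long frame_bytes;
    int  ncallees;
    int  callee[MAX_CALLEES];
};

/*
 * Upper bound on stack depth starting at function f: own frame plus the
 * deepest callee chain. on_path[] marks functions on the current chain;
 * meeting one again means recursion, reported as DEPTH_UNBOUNDED.
 */
static long max_depth(const struct func_node *g, int f, char *on_path)
{
    long deepest = 0;
    int i;

    if (on_path[f])
        return DEPTH_UNBOUNDED;
    on_path[f] = 1;
    for (i = 0; i < g[f].ncallees; i++) {
        long d = max_depth(g, g[f].callee[i], on_path);

        if (d == DEPTH_UNBOUNDED) {
            on_path[f] = 0;
            return DEPTH_UNBOUNDED;
        }
        if (d > deepest)
            deepest = d;
    }
    on_path[f] = 0;
    return g[f].frame_bytes + deepest;
}

/* Returns 1 only if the chain from entry provably fits in limit bytes. */
static int fits_limit(const struct func_node *g, int entry, long limit)
{
    char on_path[MAX_FUNCS] = {0};
    long d = max_depth(g, entry, on_path);

    return d != DEPTH_UNBOUNDED && d <= limit;
}
```

A real implementation would need to memoize results, and indirect calls through function pointers are the hard part that this sketch leaves out entirely.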

> > BTW:
> > I just ran checkstack on a (roughly) allyesconfig kernel, and we have a
> > new driver that allocates "unsigned char recvbuf[1500];" on the stack...
>
> Yeah, it's _way_ too easy to do bad things.

In my userspace code, I have macros tmp_alloc and tmp_free. They must
be matched in the same function:

unsigned char * recvbuf = tmp_alloc(1500);
....
tmp_free(recvbuf);

When stack is plentiful, it maps to alloca() which is roughly
equivalent to using a stack variable.

When stack is constrained (as it is on my little devices), that maps
to xmalloc/free. The kernel equivalent would be kmalloc GFP_ATOMIC
(perhaps).
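
A minimal sketch of how such a pair of macros could look in userspace C. This is a reconstruction under assumptions, not the actual macros: the TMP_ALLOC_USE_STACK switch, xmalloc_sketch and fill_recvbuf_sketch are hypothetical names.

```c
#include <stdlib.h>
#ifdef TMP_ALLOC_USE_STACK
#include <alloca.h>
#endif

/* Hypothetical: the build picks one mapping with -DTMP_ALLOC_USE_STACK. */
#ifdef TMP_ALLOC_USE_STACK
/* Stack is plentiful: the buffer vanishes when the calling function returns. */
#define tmp_alloc(size) alloca(size)
#define tmp_free(ptr)   ((void)(ptr))
#else
/* Stack is constrained: take the buffer from the heap instead. */
static void *xmalloc_sketch(size_t size)
{
    void *p = malloc(size);

    if (!p)
        abort();    /* xmalloc-style: never returns NULL */
    return p;
}
#define tmp_alloc(size) xmalloc_sketch(size)
#define tmp_free(ptr)   free(ptr)
#endif

/* Example caller: the 1500-byte buffer is no longer a plain stack array. */
static int fill_recvbuf_sketch(void)
{
    unsigned char *recvbuf = tmp_alloc(1500);
    int last;

    recvbuf[1499] = 0x5a;
    last = recvbuf[1499];
    tmp_free(recvbuf);
    return last;
}
```

The caller is written once and the build decides where the buffer lives, which is why tmp_alloc and tmp_free have to be matched within the same function.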

With different macros to mine, it may be possible to map small
fixed-size requests exactly onto local variables, and large ones to
kmalloc(). A stab at it (not tested):

    #define LOCAL_ALLOC_THRESHOLD 128

    #define LOCAL_ALLOC(type, ptr) \
        __typeof__(type) __attribute__((__unused__)) ptr##_local_struct; \
        __typeof__(type) * ptr = \
            ((__builtin_constant_p(sizeof(type)) \
              && sizeof(type) <= LOCAL_ALLOC_THRESHOLD) \
             ? &ptr##_local_struct : kmalloc(sizeof(type), GFP_ATOMIC))

    #define LOCAL_FREE(ptr) \
        ((__builtin_constant_p(sizeof (*(ptr))) \
          && sizeof(*(ptr)) <= LOCAL_ALLOC_THRESHOLD) \
         ? (void) 0 : kfree(ptr))

Would that be useful in the kernel?
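
For anyone who wants to try the pattern outside the kernel, here is a userspace approximation, assuming GCC or a compatible compiler (for __typeof__, __attribute__ and __builtin_constant_p). malloc()/free() stand in for kmalloc()/kfree(); the two structs and helper functions are hypothetical test scaffolding.

```c
#include <stdlib.h>

#define LOCAL_ALLOC_THRESHOLD 128

/* Userspace port of the macros: malloc()/free() replace kmalloc()/kfree(). */
#define LOCAL_ALLOC(type, ptr)                                          \
    __typeof__(type) __attribute__((__unused__)) ptr##_local_struct;   \
    __typeof__(type) *ptr =                                             \
        ((__builtin_constant_p(sizeof(type))                            \
          && sizeof(type) <= LOCAL_ALLOC_THRESHOLD)                     \
         ? &ptr##_local_struct : malloc(sizeof(type)))

#define LOCAL_FREE(ptr)                                                 \
    ((__builtin_constant_p(sizeof(*(ptr)))                              \
      && sizeof(*(ptr)) <= LOCAL_ALLOC_THRESHOLD)                       \
     ? (void) 0 : free(ptr))

struct small_req { char buf[64]; };     /* under the threshold */
struct big_req   { char buf[1500]; };   /* over the threshold */

/* Returns 1 if the small request was placed in the local object. */
static int small_is_local(void)
{
    LOCAL_ALLOC(struct small_req, s);
    int local = ((void *)s == (void *)&s_local_struct);

    LOCAL_FREE(s);
    return local;
}

/*
 * Returns 1 if the big request got a heap block instead of the local
 * object. Taking the local's address here is only for the demonstration.
 */
static int big_is_heap(void)
{
    LOCAL_ALLOC(struct big_req, b);
    int heap = (b != NULL && (void *)b != (void *)&b_local_struct);

    if (b)
        b->buf[1499] = 1;
    LOCAL_FREE(b);
    return heap;
}
```

One caveat, as far as I can tell: the local object is declared unconditionally, so for large types the saving depends on the compiler dropping the unused variable, which it may not do without optimisation. The heap path can also return NULL, which callers would have to check.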

I'm thinking if it were a commonly used pattern for temporary buffers,
unknown structures and arrays of macro-determined size, the "new
driver" author would be less likely to accidentally drop a big object
on the stack.

Obviously it would be nicer for GCC to code such a thing
automatically, but that really is wishful thinking.

-- Jamie

2008-08-27 16:25:49

by Parag Warudkar

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, Aug 27, 2008 at 9:21 AM, Alan Cox <[email protected]> wrote:
>> By your logic though, XFS on x86 should work fine with 4K stacks -
>> many will attest that it does not and blows up due to stack issues.
>>
>> I have first hand experiences of things blowing up with deep call
>> chains when using 4K stacks where 8K worked just fine on same
>> workload.
>>
>> So there is definitely some other problem with 4K stacks.
>
> Nothing of the sort. If it blows up with a 4K stack it will almost
> certainly blow up with an 8K stack *eventually* - when a heavy stack usage
> coincides with a heavy stack using IRQ handler.
>
> You won't catch it in simple testing, you won't catch it in trivial
> simulation and it'll be incredibly hard to reproduce. Not the kind of bug
> you want in a production system really. IRQ stacks make things much more
> predictable.


I see - so if I end up having a workload on 8k where heavy stack using
IRQs and deep kernel call chains come at the same time - even 8K will
blow up.
So 4K will blow too except that it doesn't require IRQs also to use
heavy stack, just XFS is good enough :)

It then seems like IRQs using a lot of stack is not so much of a
problem in the current kernel as much as deeper call chains and stack
usage of normal non-irq path code is.
So 8k makes it possible for the deeper call chains of non-irq path to
survive since they get the better part of the 8K to themselves and IRQs
can do with less almost always.

At least that's what I can derive from the fact that we do not have
lots of reports of 8K stack blowing up.

Thanks

Parag

2008-08-27 16:39:48

by Bernd Petrovitsch

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, 2008-08-27 at 16:48 +0100, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > If you "develop" an embedded system (which is partly system integration
> > of existing apps) to be installed in the field, you don't have that many
> > conceivable work loads compared to a desktop/server system. And you have
> > a fixed list of drivers and applications.
>
> Hah! Not in my line of embedded device.
>
> 32MB no-MMU ARM boards which people run new things and attach new
> devices to rather often - without making new hardware. Volume's too
> low per individual application to get new hardware designed and made.

Yes, you may have several products on the same hardware with somewhat
differing requirements (or not). But that is much less than a general
purpose system IMHO.

> I'm seriously thinking of forward porting the 4 year old firmware
> from 2.4.26 to 2.6.current, just to get new drivers and capabilities.

That sounds reasonable (and I never meant maintaining the old system
infinitely. Actually once the thing is shipped it usually enters deep
maintenance mode and the next is more a fork from the old).

> Backporting is tedious, so's feeling wretchedly far from the mainline
> world.

ACK. But that also depends on the amount of local changes (and sorry, but not
all locally necessary patches would be accepted in mainline in any way).

> > A usual approach is to run stress tests on several (or all)
> > subsystems/services/... in parallel and if the device survives it
> > functioning correctly, it is at least good enough.
>
> Per application.
>
> Some little devices run hundreds of different applications and
> customers expect to customise, script themselves, and attach different
> devices (over USB). The next customer in the chain expects the bits
> you supplied to work in a variety of unexpected situations, even when
> you advise that it probably won't do that.

Basically their problem. Yes, "they" actually think they get a Linux
system where they can do everything and it simply works.

Oh, that's obviously not a usual "WLAN-router style" of product (where
you are not expected to actually login on a console or per ssh).

> Much like desktop/server Linux, but on a small device where silly
> little things like 'create a process' are a stress for the dear little
> thing.
>
> (My biggest lesson: insist on an MMU next time!)

ACK. We avoid MMU-less hardware too - especially since there is enough
hardware with a MMU around.

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-08-27 17:36:47

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote:
> On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> > On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
> > > On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > > >
> > > > When did we get callpaths like nfs+xfs+md+scsi reliably
> > > > working with 4kB stacks on x86-32?
> > >
> > > XFS may never have been usable, but the rest, sure.
> > >
> > > And you seem to be making this whole argument an excuse to SUCK, and an
> > > excuse to let gcc crap even more on our stack space.
> > >
> > > Why?
> > >
> > > Why aren't you saying that we should be able to do better? Instead, you
> > > seem to be asking us to do even worse than we do now?
> >
> > My main point is:
> > - getting 4kB stacks working reliably is a hard task
> > - having an eye on gcc increasing the stack usage, and fixing it if
> > required, is relatively easy
> >
> > If we should be able to do better at getting (and keeping) 4kB stacks
> > working, then coping with possible inlining problems caused by gcc
> > should not be a big problem for us.
> >
> Out of the architectures you've mentioned for 4k stacks, they also tend
> to do IRQ stacks, which is something you seem to have overlooked.

No, I am aware of that, and on i386 IRQ stacks are only used with
4kB stacks.

On i386 it is effectively a step from 6kB to 4kB.

> In addition to that, debugging the runaway stack users on 4k tends to be
> easier anyways since you end up blowing the stack a lot sooner. On sh
> we've had pretty good luck with it, though most of our users are using
> fairly deterministic workloads and continually profiling the footprint.
> Anything that runs away or uses an insane amount of stack space needs to
> be fixed well before that anyways, so catching it sooner is always
> preferable. I imagine the same case is true for m68knommu (even sans IRQ
> stacks).

CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
wanted with an arbitrary limit.
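
A minimal sketch of the kind of check meant here, as I understand it: stacks are aligned to their size and grow down, so the low bits of the stack pointer give the bytes still free. The names and the userspace framing are assumptions for illustration, not the kernel's code.

```c
#include <stdint.h>

#define THREAD_SIZE_SKETCH 8192u   /* stack size, must be a power of two */

/*
 * Stacks are THREAD_SIZE-aligned and grow down, so masking the stack
 * pointer yields the number of bytes remaining below it.
 */
static uintptr_t stack_bytes_left(uintptr_t sp)
{
    return sp & (THREAD_SIZE_SKETCH - 1);
}

/*
 * Returns 1 when fewer than (reserved + limit) bytes remain. reserved
 * models the structure kept at the bottom of the stack; limit is the
 * arbitrary warning threshold.
 */
static int stack_low(uintptr_t sp, uintptr_t reserved, uintptr_t limit)
{
    return stack_bytes_left(sp) < reserved + limit;
}
```

Setting the limit to 4096 on an 8kB stack would flag every path that would not have fit in 4kB, without actually having to crash on it.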

> Things might be more sensitive on x86, but it's certainly not something
> that's a huge problem for the various embedded platforms to wire up,
> whether they want to go the IRQ stack route or not.

How many platforms use 4kB stacks on sh?

Only 1 out of 34 defconfigs uses it.

Are there any numbers for real life usage?

> In any event, lack of support for something on embedded architectures in
> the kernel is more often due to apathy/utter indifference on the part of
> the architecture maintainer rather than being indicative of any intrinsic
> difficulty in supporting the thing in question. Most new "features" on the
> lesser maintained architectures tend to end up there either out of peer
> pressure or copying-and-pasting accidents rather than any sort of design.
> ;-)

arm or powerpc aren't exactly lesser maintained architectures.

4kB has been shown to be a hard limit to achieve. After more than 4 years of
being available in mainline on i386 there are still cases where 4kB is not
enough.

IMHO there currently seems to be a mismatch between its maintenance
cost and the actual number of users. That's in my opinion the main
problem with it, no matter in which direction it gets resolved.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-27 17:53:00

by Jamie Lokier

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Bernd Petrovitsch wrote:
> > 32MB no-MMU ARM boards which people run new things and attach new
> > devices to rather often - without making new hardware. Volume's too
> > low per individual application to get new hardware designed and made.
>
> Yes, you may have several products on the same hardware with somewhat
> differing requirements (or not). But that is much less than a general
> purpose system IMHO.

It is, but the idea that small embedded systems go through a 'all
components are known, drivers are known, test and if it passes it's
shippable' does not always apply.

> > I'm seriously thinking of forward porting the 4 year old firmware
> > from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
>
> That sounds reasonable (and I never meant maintaining the old system
> infinitely.

Sounds reasonable, but it's vetoed for anticipated time and cost,
compared with backporting on demand. Fair enough, since 2.6.current
doesn't support ARM no-MMU last I heard ('soon'?).

On the other hand, the 2.6 anti-fragmentation patches, including
latest SLUB stuff, ironically meant to help big machines, sound really
appealing for my current problem and totally unrealistic to
backport...

> ACK. We avoid MMU-less hardware too - especially since there is enough
> hardware with a MMU around.

I can't emphasise enough how much difference MMU makes to Linux userspace.

It's practically: MMU = standard Linux (with less RAM), have everything.
No-MMU = lots of familiar 'Linux' things not available or break.

-- Jamie

2008-08-27 19:31:38

by Bernd Petrovitsch

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, 2008-08-27 at 18:51 +0100, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
[...]
> It is, but the idea that small embedded systems go through a 'all
> components are known, drivers are known, test and if it passes it's
> shippable' does not always apply.

Not always but often enough. And yes, there is ARM-based embedded
hardware with 1GB Flash-RAM and 128MB RAM.

> > > I'm seriously thinking of forward porting the 4 year old firmware
> > > from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
> >
> > That sounds reasonable (and I never meant maintaining the old system
> > infinitely).
>
> Sounds reasonable, but it's vetoed for anticipated time and cost,

That is to be expected;-)

[....]
> > ACK. We avoid MMU-less hardware too - especially since there is enough
> > hardware with a MMU around.
>
> I can't emphasise enough how much difference MMU makes to Linux userspace.
>
> It's practically: MMU = standard Linux (with less RAM), have everything.
> No-MMU = lots of familiar 'Linux' things not available or break.

ACK. And try telling a customer that everything is more effort and more
risk, and not just "simply cross-compile it as it runs on my desktop
too".

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-08-27 20:41:37

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Wed, 27 Aug 2008, Peter Osterlund wrote:
>
> Why not just revert the offending change and try again during the next
> merge window, assuming someone has figured out an acceptable way to
> handle this mess by then?

Well, for 2.6.27 that's what we'll have to do. But there's actually a
real problem here - the unlocked ioctl's (which we _should_ prefer) have a
strictly weaker and worse interface. I also wonder if any other
block_ioctl users were converted..

Anyway, I'll take your email as an ack for the revert.

Linus

2008-08-27 20:47:22

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Wed, 27 Aug 2008, Linus Torvalds wrote:
>
> I also wonder if any other block_ioctl users were converted..

Well, doing

git log -p v2.6.26.. -Sunlocked_ioctl

and looking for blkdev_ioctl, that does seem to be the only one. So
hopefully no other case like this is lurking, although it is possible that
non-block areas have similar issues.

Linus

2008-08-27 21:28:56

by Peter Osterlund

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

Linus Torvalds <[email protected]> writes:

> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11401
>> Subject : pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
>> Submitter : Laurent Riffard <[email protected]>
>> Date : 2008-08-22 08:16 (2 days old)
>
> This one looks irritating.
>
> It's bisected to 5b6155ee70e9c4d2ad7e6f514c8eee06e2711c3a ("pktcdvd: push
> BKL down into driver"), but the problem goes deeper than that.
...
> Grr.
...
> Double grr.
...
> Hmm?
>
> We need to fix this.

Why not just revert the offending change and try again during the next
merge window, assuming someone has figured out an acceptable way to
handle this mess by then?

--
Peter Osterlund - [email protected]
http://web.telia.com/~u89404340

2008-08-27 22:27:23

by Alan

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

> >
> > We need to fix this.
>
> Why not just revert the offending change and try again during the next
> merge window, assuming someone has figured out an acceptable way to
> handle this mess by then?

Easier just to fix it. It's a case of building everything until it
compiles with the prototype change. Almost all stuff will just take the
argument initially and not use it.

Anyone else plan to do it or shall I hit all the x86 cases and post a
patch ?

2008-08-27 22:40:44

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Wed, 27 Aug 2008, Alan Cox wrote:
>
> Easier just to fix it. It's a case of building everything until it
> compiles with the prototype change. Almost all stuff will just take the
> argument initially and not use it.
>
> Anyone else plan to do it or shall I hit all the x86 cases and post a
> patch ?

Well, I already reverted it, but if you actually fix unlocked_ioctl() to
have the same calling convention as regular ioctl() then a lot of the
noise from ioctl conversion goes away, and all that remains is literally
just the BKL part.

Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that
too? That's another difference between the "unlocked" and the traditional
version..

As to the "x86 cases", I think you should try to hit them all. Doing a
"git grep unlocked_ioctl" gets 185 entries, and it looks like only
something like 8 of them are non-x86 (3 in the arch/ directory, five in
s390 drivers).

Of course, some of them may be drivers that aren't available on x86 for
other reasons (ie the ARM embedded stuff), but regardless..

Anyway, the pure size of that patch makes me suspect that we might as well
leave it until the next merge window, but if you do it and it's obviously
totally mechanical, I'd be likely to just let it slip in early.

Linus

2008-08-27 22:43:56

by David Miller

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

From: Linus Torvalds <[email protected]>
Date: Wed, 27 Aug 2008 15:38:16 -0700 (PDT)

> Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that
> too? That's another difference between the "unlocked" and the traditional
> version..

The return values want to be "long" sign extended all the way back
down to syscall dispatch, I think this is just an effort to add
some consistency here so that the int --> long extension eventually
can be eliminated once unlocked_ioctl is the only case left.
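
The sign-extension point can be illustrated in plain userspace C. This is
only a sketch, not kernel code: legacy_ioctl_ret(), syscall_ret() and
zero_extended_ret() are hypothetical names made up for the illustration, and
the last check assumes a target where long is wider than int.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an int-returning ioctl handler. */
static int legacy_ioctl_ret(void)
{
	return -ENOTTY;
}

/*
 * Hypothetical stand-in for a return path that hands a long back.
 * The int -> long conversion sign-extends, so a negative errno stays
 * negative in the wider type.
 */
static long syscall_ret(void)
{
	return legacy_ioctl_ret();
}

/*
 * What the value looks like if the upper bits are zero-filled instead:
 * on targets where long is wider than int it no longer compares below
 * zero, so it would not be recognised as an error.
 */
static long zero_extended_ret(void)
{
	return (long)(unsigned int)legacy_ioctl_ret();
}

static void sign_extension_selfcheck(void)
{
	assert(syscall_ret() == -ENOTTY);
	if (sizeof(long) > sizeof(int))
		assert(zero_extended_ret() > 0);
}
```

If every handler already returns long, the conversion in the middle
function disappears, which is the consistency argument made above.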

2008-08-27 22:44:37

by Alexey Dobriyan

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Wed, Aug 27, 2008 at 03:38:16PM -0700, Linus Torvalds wrote:
> On Wed, 27 Aug 2008, Alan Cox wrote:
> >
> > Easier just to fix it. It's a case of building everything until it
> > compiles with the prototype change. Almost all stuff will just take the
> > argument initially and not use it.
> >
> > Anyone else plan to do it or shall I hit all the x86 cases and post a
> > patch ?
>
> Well, I already reverted it, but if you actually fix unlocked_ioctl() to
> have the same calling convention as regular ioctl() then a lot of the
> noise from ioctl conversion goes away, and all that remains is literally
> just the BKL part.
>
> Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that
> too? That's another difference between the "unlocked" and the traditional
> version..
>
> As to the "x86 cases", I think you should try to hit them all. Doing a
> "git grep unlocked_ioctl" gets 185 entries, and it looks like only
> something like 8 of them are non-x86 (3 in the arch/ directory, five in
> s390 drivers).
>
> Of course, some of them may be drivers that aren't available on x86 for
> other reasons (ie the ARM embedded stuff), but regardless..
>
> Anyway, the pure size of that patch makes me suspect that we might as well
> leave it until the next merge window, but if you do it and it's obviously
> totally mechanical, I'd be likely to just let it slip in early.

Anybody doing this, don't forget to actually use "inode" instead of all those
dereferences:

struct inode *inode = filp->f_path.dentry->d_inode;
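
A rough before/after of what that means for a handler, as a userspace
mock-up. The structures below are cut-down stand-ins for the kernel ones
(only the fields needed for the dereference chain), and demo_ioctl_old() and
demo_ioctl_new() are hypothetical handlers, not taken from any driver.

```c
#include <assert.h>

/* Cut-down userspace stand-ins for the kernel structures. */
struct inode { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path { struct dentry *dentry; };
struct file { struct path f_path; };

/* Current unlocked_ioctl shape: no inode argument, so dig it out. */
static long demo_ioctl_old(struct file *filp, unsigned int cmd,
			   unsigned long arg)
{
	struct inode *inode = filp->f_path.dentry->d_inode;

	(void)cmd; (void)arg;
	return (long)inode->i_ino;
}

/* Proposed shape: the inode arrives as an argument, the lookup goes away. */
static int demo_ioctl_new(struct inode *inode, struct file *filp,
			  unsigned int cmd, unsigned long arg)
{
	(void)filp; (void)cmd; (void)arg;
	return (int)inode->i_ino;
}

static void inode_arg_selfcheck(void)
{
	struct inode ino = { 42 };
	struct dentry de = { &ino };
	struct file f = { { &de } };

	assert(demo_ioctl_old(&f, 0, 0) == 42);
	assert(demo_ioctl_new(&ino, &f, 0, 0) == 42);
}
```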

2008-08-27 22:46:37

by Alan

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

> Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that
> too? That's another difference between the "unlocked" and the traditional
> version..

I don't know - a lot of syscall returns got defined as long and I guess
someone thought propagating the right type was a good idea?
>
> As to the "x86 cases", I think you should try to hit them all. Doing a
> "git grep unlocked_ioctl" gets 185 entries, and it looks like only
> something like 8 of them are non-x86 (3 in the arch/ directory, five in
> s390 drivers).
>
> Of course, some of them may be drivers that aren't available on x86 for
> other reasons (ie the ARM embedded stuff), but regardless..
>
> Anyway, the pure size of that patch makes me suspect that we might as well
> leave it until the next merge window, but if you do it and it's obviously
> totally mechanical, I'd be likely to just let it slip in early.

I'll take a crack at it tomorrow - but if it's 185 entries then it
probably wants to go into -next instead.

Alan

2008-08-27 23:01:32

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Wed, 27 Aug 2008, Alan Cox wrote:
>
> I'll take a crack at it tomorrow - but if it's 185 entries then it
> probably wants to go into -next instead.

Being more careful.. This:

git grep 'unlocked_ioctl.*=' |
sed 's/^.*=[ ]*\([_a-zA-Z0-9]*\).*$/\1/' |
uniq | wc

says that there are 160 distinct cases. I'm not sure it catches everything
exactly, but it will be reasonably close, at least.

I wonder if I could essentially automate something to do the conversion..

Linus

2008-08-27 23:14:09

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Wed, 27 Aug 2008, Linus Torvalds wrote:
>
> I wonder if I could essentially automate something to do the conversion..

Hmm. compat_ioctl() actually has exactly the same issue. Damn.

So you can't just add the new argument, you also have to _pass_ the
argument in the compat_ioctl handlers to the non-compat ones.
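
In userspace mock-up form, the forwarding looks roughly like this. It is a
sketch under assumptions: the two structures are toy stand-ins, f_inode is a
hypothetical field name, and demo_ioctl() / demo_compat_ioctl() are made-up
handlers that only report which inode they were given.

```c
#include <assert.h>

/* Toy userspace stand-ins; the real structures carry far more state. */
struct inode { int id; };
struct file { struct inode *f_inode; };	/* hypothetical field name */

/* Native handler using the four-argument convention. */
static int demo_ioctl(struct inode *inode, struct file *file,
		      unsigned int cmd, unsigned long arg)
{
	(void)file; (void)cmd; (void)arg;
	return inode->id;	/* report which inode the handler was given */
}

/*
 * compat handler: a real one would translate 32-bit argument layouts
 * before forwarding.  The point here is only that the inode it was
 * handed is passed on, not re-derived from the file.
 */
static int demo_compat_ioctl(struct inode *inode, struct file *file,
			     unsigned int cmd, unsigned long arg)
{
	return demo_ioctl(inode, file, cmd, arg);
}

static void compat_forward_selfcheck(void)
{
	struct inode on_disk = { 1 }, host = { 2 };
	struct file f = { &on_disk };

	/* The caller picks "host"; the native handler must see that one. */
	assert(demo_compat_ioctl(&host, &f, 0, 0) == 2);
}
```

The inode the caller chooses need not be the one reachable from the file,
which is why the compat path cannot simply look it up again.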

Linus

2008-08-28 00:09:26

by Greg Ungerer

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
>>> 32MB no-MMU ARM boards which people run new things and attach new
>>> devices to rather often - without making new hardware. Volume's too
>>> low per individual application to get new hardware designed and made.
>> Yes, you may have several products on the same hardware with somewhat
>> differing requirements (or not). But that is much less than a general
>> purpose system IMHO.
>
> It is, but the idea that small embedded systems go through a 'all
> components are known, drivers are known, test and if it passes it's
> shippable' does not always apply.
>
>>> I'm seriously thinking of forward porting the 4 year old firmware
>>> from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
>> That sounds reasonable (and I never meant maintaining the old system
>> infinitely).
>
> Sounds reasonable, but it's vetoed for anticipated time and cost,
> compared with backporting on demand. Fair enough, since 2.6.current
> doesn't support ARM no-MMU last I heard ('soon'?).
>
> On the other hand, the 2.6 anti-fragmentation patches, including
> latest SLUB stuff, ironically meant to help big machines, sound really
> appealing for my current problem and totally unrealistic to
> backport...
>
>> ACK. We avoid MMU-less hardware too - especially since there is enough
>> hardware with a MMU around.
>
> I can't emphasise enough how much difference MMU makes to Linux userspace.
>
> It's practically: MMU = standard Linux (with less RAM), have everything.
> No-MMU = lots of familiar 'Linux' things not available or break.

And lots of things work in the usual way...

Of course the flip side is that for people who have platforms
without MMU they can run something more than the mostly "toy"
like operating systems typically available. There are plenty of
problem domains that the non-MMU limitations are not a problem for.
(Yours doesn't sound like one of them :-)

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
Secure Computing Corporation PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com

2008-08-28 00:14:24

by Greg Ungerer

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
>> If you "develop" an embedded system (which is partly system integration
>> of existing apps) to be installed in the field, you don't have that many
>> conceivable work loads compared to a desktop/server system. And you have
>> a fixed list of drivers and applications.
>
> Hah! Not in my line of embedded device.
>
> 32MB no-MMU ARM boards which people run new things and attach new
> devices to rather often - without making new hardware. Volume's too
> low per individual application to get new hardware designed and made.
>
> I'm seriously thinking of forward porting the 4 year old firmware
> from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
> Backporting is tedious, so's feeling wretchedly far from the mainline
> world.
>
>> A usual approach is to run stress tests on several (or all)
>> subsystems/services/... in parallel and if the device survives it
>> functioning correctly, it is at least good enough.
>
> Per application.
>
> Some little devices run hundreds of different applications and
> customers expect to customise, script themselves, and attach different
> devices (over USB). The next customer in the chain expects the bits
> you supplied to work in a variety of unexpected situations, even when
> you advise that it probably won't do that.
>
> Much like desktop/server Linux, but on a small device where silly
> little things like 'create a process' are a stress for the dear little
> thing.
>
> (My biggest lesson: insist on an MMU next time!)

But given you have hardware you can't change would you choose
to not run Linux, even with the limitations of non-MMU?

Hell no :-)

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
Secure Computing Corporation PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com

2008-08-28 00:33:08

by Paul Mundt

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote:
> > On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> > In addition to that, debugging the runaway stack users on 4k tends to be
> > easier anyways since you end up blowing the stack a lot sooner. On sh
> > we've had pretty good luck with it, though most of our users are using
> > fairly deterministic workloads and continually profiling the footprint.
> > Anything that runs away or uses an insane amount of stack space needs to
> > be fixed well before that anyways, so catching it sooner is always
> > preferable. I imagine the same case is true for m68knommu (even sans IRQ
> > stacks).
>
> CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> wanted with an arbitrary limit.
>
In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
only performed from do_IRQ(), which is sporadic at best, especially on
tickless. While it catches some things, it's not a complete solution in
and of itself.
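
To make the "sporadic" part concrete, here is a userspace model of a check
that only runs at interrupt time. Everything in it is an assumption for
illustration: the DEMO_* sizes are invented, the stack is taken to grow
downwards from a DEMO_THREAD_SIZE-aligned region, and the sample arrays stand
in for real execution. It is not the code from any architecture.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE 4096u	/* 4k stack; must be a power of two */
#define DEMO_THREAD_INFO 64u	/* hypothetical bookkeeping at the stack base */
#define DEMO_STACK_WARN  512u	/* hypothetical warning threshold, in bytes */

/* Bytes left between the base of the stack region and sp. */
static unsigned int demo_stack_left(uintptr_t sp)
{
	return (unsigned int)(sp & (DEMO_THREAD_SIZE - 1));
}

static int demo_stack_low(uintptr_t sp)
{
	return demo_stack_left(sp) < DEMO_THREAD_INFO + DEMO_STACK_WARN;
}

/*
 * Count the low-stack moments that get noticed when the check runs only
 * on samples where an interrupt happened to arrive.
 */
static int demo_detected(const uintptr_t *sp, const int *in_irq, int n)
{
	int hits = 0;

	for (int i = 0; i < n; i++)
		if (in_irq[i] && demo_stack_low(sp[i]))
			hits++;
	return hits;
}

static void stack_check_selfcheck(void)
{
	const uintptr_t base = (uintptr_t)0x10000;	/* aligned region */
	const uintptr_t sp[] = { base + 3000, base + 300, base + 2000 };
	const int in_irq[]   = { 1, 0, 1 };

	assert(demo_stack_low(sp[1]));			/* the dip is real */
	assert(demo_detected(sp, in_irq, 3) == 0);	/* but goes unseen */
}
```

The deep excursion in the middle sample never coincides with an interrupt,
so a check of this kind stays silent.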

In addition to this, there are even fewer platforms that support it than
there are platforms that do 4k stacks. At first glance, it looks like
it's only m32r, powerpc, sh, x86, and xtensa. Others support the Kconfig
option, but don't seem to realize that it's not an option that the kernel
does anything with by itself, and so don't actually do anything (ie,
FRV).

> > Things might be more sensitive on x86, but it's certainly not something
> > that's a huge problem for the various embedded platforms to wire up,
> > whether they want to go the IRQ stack route or not.
>
> How many platforms use 4kB stacks on sh?
>
> Only 1 out of 34 defconfigs uses it.
>
The defconfigs tend to enable as much random stuff as people are
interested in for development and testing purposes. Most of these end up
being reference boards and are the basis for products, rather than
shipping products themselves. In the latter case, everything is gradually
tightened down, and 4k stack utilization in that case is the norm, rather
than the exception.

> > In any event, lack of support for something on embedded architectures in
> > the kernel is more often due to apathy/utter indifference on the part of
> > the architecture maintainer rather than being indicative of any intrinsic
> > difficulty in supporting the thing in question. Most new "features" on the
> > lesser maintained architectures tend to end up there either out of peer
> > pressure or copying-and-pasting accidents rather than any sort of design.
> > ;-)
>
> arm or powerpc aren't exactly lesser maintained architectures.
>
Indeed, which is why I find it bizarre that you would even bother
applying what was said to those platforms. Specifically I was referring
to the embedded platforms that don't do 4k stacks today. The fact they
don't support them today has much less to do with 4k being an
unattainable limit than it does with people simply not bothering to
implement it on their platform.

> IMHO there seems to currently be a mismatch between its maintenance
> cost and the actual number of users. That's in my opinion the main
> problem with it, no matter in which direction it gets resolved.
>
Perhaps that's true on x86, but in general I take issue with that. On sh
we've had to do very little maintenance for it and most shipping products
are using it today (at least on MMU-Linux, we don't bother with it on
nommu). Most of the problems we ran into with 4k stacks tended to be
stuff that we wanted to fix for 8k anyways. I suspect that this case is
true for the other embedded platforms also.

Note that on sh we also conditionalize IRQ stacks separately, so while
they are often used together, it's possible to use 4k stacks without
resorting to IRQ stacks (as m68knommu also seems to do).

2008-08-28 00:37:39

by Linus Torvalds

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26



On Wed, 27 Aug 2008, Linus Torvalds wrote:
>
> Hmm. compat_ioctl() actually has exactly the same issue. Damn.
>
> So you can't just add the new argument, you also have to _pass_ the
> argument in the compat_ioctl handlers to the non-compat ones.

What the hell. Here's a test patch. A largish part of it was generated
through a stupid script that basically did a number of grep + 'sed' on a
lot of files, and then the rest was fixed up manually after running "make
allmodconfig".

I'm not going to guarantee anything, but it gets close. A starting point
for somebody else, and considering that it is

208 files changed, 370 insertions(+), 376 deletions(-)

this is definitely linux-next material.

The extra deletions are mainly because the passing of "inode" as an
argument means that some functions don't need to look it up manually any
more.

And yeah, I changed the return type to "int". There's no way the kernel
can validly return anything bigger than that anyway. And this way all the
ioctl functions have the same type, no confusion.

TOTALLY UNTESTED apart from the fact that it compiles.

Linus

---
arch/mips/sibyte/common/sb_tbprof.c | 2 +-
arch/parisc/kernel/perf.c | 4 +-
arch/sparc/kernel/apc.c | 2 +-
arch/x86/kernel/apm_32.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce_64.c | 2 +-
arch/x86/kernel/cpu/mtrr/if.c | 3 +-
block/bsg.c | 2 +-
block/compat_ioctl.c | 18 +++++++++++-----
block/ioctl.c | 3 +-
drivers/block/DAC960.c | 2 +-
drivers/block/cciss.c | 4 +-
drivers/block/loop.c | 3 +-
drivers/block/paride/pt.c | 4 +-
drivers/char/agp/agp.h | 2 +-
drivers/char/agp/compat_ioctl.c | 2 +-
drivers/char/agp/frontend.c | 2 +-
drivers/char/ds1302.c | 2 +-
drivers/char/dsp56k.c | 2 +-
drivers/char/efirtc.c | 4 +-
drivers/char/ip2/ip2main.c | 6 ++--
drivers/char/ip27-rtc.c | 4 +-
drivers/char/ipmi/ipmi_devintf.c | 2 +-
drivers/char/mmtimer.c | 4 +-
drivers/char/mwave/mwavedd.c | 4 +-
drivers/char/pcmcia/cm4000_cs.c | 3 +-
drivers/char/ppdev.c | 2 +-
drivers/char/random.c | 2 +-
drivers/char/rio/rio_linux.c | 4 +-
drivers/char/rtc.c | 4 +-
drivers/char/sx.c | 4 +-
drivers/char/tty_io.c | 14 +++++-------
drivers/char/viotape.c | 2 +-
drivers/firewire/fw-cdev.c | 8 +++---
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/i915/i915_ioc32.c | 2 +-
drivers/gpu/drm/mga/mga_drv.h | 2 +-
drivers/gpu/drm/mga/mga_ioc32.c | 2 +-
drivers/gpu/drm/r128/r128_drv.h | 2 +-
drivers/gpu/drm/r128/r128_ioc32.c | 2 +-
drivers/gpu/drm/radeon/radeon_drv.h | 2 +-
drivers/gpu/drm/radeon/radeon_ioc32.c | 2 +-
drivers/hid/hidraw.c | 3 +-
drivers/hid/usbhid/hiddev.c | 6 ++--
drivers/i2c/i2c-dev.c | 2 +-
drivers/ieee1394/dv1394.c | 20 +++++++++---------
drivers/ieee1394/raw1394.c | 4 +-
drivers/ieee1394/video1394.c | 34 ++++++++++++++++----------------
drivers/infiniband/core/user_mad.c | 4 +-
drivers/input/evdev.c | 4 +-
drivers/input/joydev.c | 4 +-
drivers/input/misc/uinput.c | 2 +-
drivers/md/dm-ioctl.c | 6 ++--
drivers/media/video/compat_ioctl32.c | 26 ++++++++++++------------
drivers/message/fusion/mptctl.c | 8 +++---
drivers/message/i2o/i2o_config.c | 2 +-
drivers/misc/phantom.c | 6 ++--
drivers/misc/sgi-gru/grufile.c | 2 +-
drivers/net/ppp_generic.c | 2 +-
drivers/pci/proc.c | 2 +-
drivers/rtc/rtc-dev.c | 2 +-
drivers/s390/block/dasd_int.h | 2 +-
drivers/s390/char/tape_char.c | 2 +-
drivers/s390/char/vmcp.c | 2 +-
drivers/s390/cio/chsc_sch.c | 2 +-
drivers/s390/crypto/zcrypt_api.c | 4 +-
drivers/s390/scsi/zfcp_cfdc.c | 2 +-
drivers/sbus/char/cpwatchdog.c | 2 +-
drivers/sbus/char/display7seg.c | 2 +-
drivers/sbus/char/openprom.c | 2 +-
drivers/scsi/aacraid/linit.c | 2 +-
drivers/scsi/ch.c | 6 ++--
drivers/scsi/dpt_i2o.c | 7 +----
drivers/scsi/megaraid/megaraid_mm.c | 10 ++++----
drivers/scsi/megaraid/megaraid_sas.c | 10 ++++----
drivers/scsi/osst.c | 2 +-
drivers/scsi/sd.c | 2 +-
drivers/scsi/sg.c | 2 +-
drivers/scsi/st.c | 4 +-
drivers/spi/spidev.c | 4 +-
drivers/telephony/ixj.c | 2 +-
drivers/usb/class/usblp.c | 2 +-
drivers/usb/gadget/inode.c | 6 ++--
drivers/usb/gadget/printer.c | 4 +-
drivers/usb/misc/iowarrior.c | 2 +-
drivers/usb/misc/rio500.c | 2 +-
drivers/usb/misc/sisusbvga/sisusb.c | 10 ++++----
drivers/usb/misc/usblcd.c | 2 +-
drivers/video/fbmem.c | 4 +--
drivers/watchdog/acquirewdt.c | 2 +-
drivers/watchdog/advantechwdt.c | 2 +-
drivers/watchdog/alim1535_wdt.c | 2 +-
drivers/watchdog/alim7101_wdt.c | 2 +-
drivers/watchdog/ar7_wdt.c | 2 +-
drivers/watchdog/at32ap700x_wdt.c | 2 +-
drivers/watchdog/at91rm9200_wdt.c | 2 +-
drivers/watchdog/bfin_wdt.c | 2 +-
drivers/watchdog/booke_wdt.c | 2 +-
drivers/watchdog/cpu5wdt.c | 2 +-
drivers/watchdog/davinci_wdt.c | 2 +-
drivers/watchdog/ep93xx_wdt.c | 2 +-
drivers/watchdog/eurotechwdt.c | 2 +-
drivers/watchdog/hpwdt.c | 2 +-
drivers/watchdog/i6300esb.c | 2 +-
drivers/watchdog/iTCO_wdt.c | 2 +-
drivers/watchdog/ib700wdt.c | 2 +-
drivers/watchdog/ibmasr.c | 2 +-
drivers/watchdog/indydog.c | 2 +-
drivers/watchdog/iop_wdt.c | 2 +-
drivers/watchdog/it8712f_wdt.c | 2 +-
drivers/watchdog/ixp2000_wdt.c | 2 +-
drivers/watchdog/ixp4xx_wdt.c | 2 +-
drivers/watchdog/ks8695_wdt.c | 2 +-
drivers/watchdog/machzwd.c | 2 +-
drivers/watchdog/mixcomwd.c | 2 +-
drivers/watchdog/mpc5200_wdt.c | 2 +-
drivers/watchdog/mpc8xxx_wdt.c | 2 +-
drivers/watchdog/mpcore_wdt.c | 2 +-
drivers/watchdog/mtx-1_wdt.c | 2 +-
drivers/watchdog/mv64x60_wdt.c | 2 +-
drivers/watchdog/omap_wdt.c | 2 +-
drivers/watchdog/pc87413_wdt.c | 2 +-
drivers/watchdog/pcwd.c | 2 +-
drivers/watchdog/pcwd_pci.c | 2 +-
drivers/watchdog/pcwd_usb.c | 2 +-
drivers/watchdog/pnx4008_wdt.c | 2 +-
drivers/watchdog/rm9k_wdt.c | 4 +-
drivers/watchdog/s3c2410_wdt.c | 2 +-
drivers/watchdog/sa1100_wdt.c | 2 +-
drivers/watchdog/sb_wdog.c | 2 +-
drivers/watchdog/sbc60xxwdt.c | 2 +-
drivers/watchdog/sbc7240_wdt.c | 2 +-
drivers/watchdog/sbc_epx_c3.c | 2 +-
drivers/watchdog/sc1200wdt.c | 2 +-
drivers/watchdog/sc520_wdt.c | 2 +-
drivers/watchdog/scx200_wdt.c | 2 +-
drivers/watchdog/shwdt.c | 2 +-
drivers/watchdog/smsc37b787_wdt.c | 2 +-
drivers/watchdog/softdog.c | 2 +-
drivers/watchdog/txx9wdt.c | 2 +-
drivers/watchdog/w83627hf_wdt.c | 2 +-
drivers/watchdog/w83697hf_wdt.c | 2 +-
drivers/watchdog/w83877f_wdt.c | 2 +-
drivers/watchdog/w83977f_wdt.c | 2 +-
drivers/watchdog/wafer5823wdt.c | 2 +-
drivers/watchdog/wdrtas.c | 2 +-
drivers/watchdog/wdt.c | 2 +-
drivers/watchdog/wdt285.c | 2 +-
drivers/watchdog/wdt977.c | 2 +-
drivers/watchdog/wdt_pci.c | 2 +-
fs/bad_inode.c | 4 +-
fs/block_dev.c | 7 +++++-
fs/cifs/cifsfs.h | 2 +-
fs/cifs/ioctl.c | 3 +-
fs/compat_ioctl.c | 3 +-
fs/ext2/ext2.h | 4 +-
fs/ext2/ioctl.c | 7 ++---
fs/ext3/ioctl.c | 3 +-
fs/ext4/ext4.h | 4 +-
fs/ext4/ioctl.c | 7 ++---
fs/fat/dir.c | 3 +-
fs/gfs2/ops_file.c | 2 +-
fs/inotify_user.c | 2 +-
fs/ioctl.c | 8 +++---
fs/jffs2/ioctl.c | 2 +-
fs/jffs2/os-linux.h | 2 +-
fs/jfs/ioctl.c | 7 ++---
fs/jfs/jfs_inode.h | 4 +-
fs/ncpfs/ioctl.c | 3 +-
fs/ocfs2/ioctl.c | 7 ++---
fs/ocfs2/ioctl.h | 4 +-
fs/pipe.c | 3 +-
fs/proc/inode.c | 14 ++++++------
fs/reiserfs/ioctl.c | 3 +-
fs/ubifs/ioctl.c | 7 ++---
fs/ubifs/ubifs.h | 4 +-
fs/xfs/linux-2.6/xfs_file.c | 8 +++---
fs/xfs/linux-2.6/xfs_ioctl32.c | 6 +++-
fs/xfs/linux-2.6/xfs_ioctl32.h | 4 +-
include/linux/ext3_fs.h | 2 +-
include/linux/fs.h | 10 ++++----
include/linux/ncp_fs.h | 2 +-
include/linux/reiserfs_fs.h | 2 +-
include/linux/tty.h | 2 +-
include/linux/wanrouter.h | 2 +-
include/media/v4l2-ioctl.h | 2 +-
kernel/power/user.c | 2 +-
net/irda/irnet/irnet_ppp.c | 3 +-
net/irda/irnet/irnet_ppp.h | 5 ++-
net/socket.c | 8 +++---
net/wanrouter/wanmain.c | 3 +-
sound/core/control.c | 2 +-
sound/core/control_compat.c | 4 +-
sound/core/hwdep.c | 2 +-
sound/core/hwdep_compat.c | 4 +-
sound/core/info.c | 2 +-
sound/core/init.c | 2 +-
sound/core/oss/mixer_oss.c | 2 +-
sound/core/oss/pcm_oss.c | 2 +-
sound/core/pcm_compat.c | 2 +-
sound/core/pcm_native.c | 4 +-
sound/core/rawmidi.c | 2 +-
sound/core/rawmidi_compat.c | 4 +-
sound/core/seq/oss/seq_oss.c | 6 ++--
sound/core/seq/seq_clientmgr.c | 2 +-
sound/core/seq/seq_compat.c | 2 +-
sound/core/timer.c | 2 +-
sound/core/timer_compat.c | 4 +-
virt/kvm/kvm_main.c | 8 +++---
208 files changed, 370 insertions(+), 376 deletions(-)

diff --git a/arch/mips/sibyte/common/sb_tbprof.c b/arch/mips/sibyte/common/sb_tbprof.c
index 66e3e3f..5419f85 100644
--- a/arch/mips/sibyte/common/sb_tbprof.c
+++ b/arch/mips/sibyte/common/sb_tbprof.c
@@ -507,7 +507,7 @@ static ssize_t sbprof_tb_read(struct file *filp, char *buf,
return count;
}

-static long sbprof_tb_ioctl(struct file *filp,
+static int sbprof_tb_ioctl(struct inode *inode, struct file *filp,
unsigned int command,
unsigned long arg)
{
diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index f696f57..6d98acc 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -198,7 +198,7 @@ static int perf_open(struct inode *inode, struct file *file);
static ssize_t perf_read(struct file *file, char __user *buf, size_t cnt, loff_t *ppos);
static ssize_t perf_write(struct file *file, const char __user *buf, size_t count,
loff_t *ppos);
-static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int perf_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
static void perf_start_counters(void);
static int perf_stop_counters(uint32_t *raddr);
static const struct rdr_tbl_ent * perf_rdr_get_entry(uint32_t rdr_num);
@@ -442,7 +442,7 @@ static void perf_patch_images(void)
* must be running on the processor that you wish to change.
*/

-static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int perf_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
long error_start;
uint32_t raddr[4];
diff --git a/arch/sparc/kernel/apc.c b/arch/sparc/kernel/apc.c
index 5267d48..d49a35a 100644
--- a/arch/sparc/kernel/apc.c
+++ b/arch/sparc/kernel/apc.c
@@ -85,7 +85,7 @@ static int apc_release(struct inode *inode, struct file *f)
return 0;
}

-static long apc_ioctl(struct file *f, unsigned int cmd, unsigned long __arg)
+static int apc_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long __arg)
{
__u8 inarg, __user *arg;

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 9ee24e6..329e4c5 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -1460,7 +1460,7 @@ static unsigned int do_poll(struct file *fp, poll_table *wait)
return 0;
}

-static long do_ioctl(struct file *filp, u_int cmd, u_long arg)
+static int do_ioctl(struct inode *inode, struct file *filp, u_int cmd, u_long arg)
{
struct apm_user *as;
int ret;
diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index 726a5fc..91f970f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -645,7 +645,7 @@ static unsigned int mce_poll(struct file *file, poll_table *wait)
return 0;
}

-static long mce_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int mce_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
{
int __user *p = (int __user *)arg;

diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
index 84c480b..d6b053b 100644
--- a/arch/x86/kernel/cpu/mtrr/if.c
+++ b/arch/x86/kernel/cpu/mtrr/if.c
@@ -145,8 +145,7 @@ mtrr_write(struct file *file, const char __user *buf, size_t len, loff_t * ppos)
return -EINVAL;
}

-static long
-mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
+static int mtrr_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long __arg)
{
int err = 0;
mtrr_type type;
diff --git a/block/bsg.c b/block/bsg.c
index 0aae8d7..1ec2e02 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -872,7 +872,7 @@ static unsigned int bsg_poll(struct file *file, poll_table *wait)
return mask;
}

-static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int bsg_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct bsg_device *bd = file->private_data;
int __user *uarg = (int __user *) arg;
diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index c23177e..2c32818 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -709,7 +709,7 @@ static int compat_blkdev_driver_ioctl(struct inode *inode, struct file *file,
}

if (disk->fops->unlocked_ioctl)
- return disk->fops->unlocked_ioctl(file, cmd, arg);
+ return disk->fops->unlocked_ioctl(inode, file, cmd, arg);

if (disk->fops->ioctl) {
lock_kernel();
@@ -773,10 +773,16 @@ static int compat_blkdev_locked_ioctl(struct inode *inode, struct file *file,
return -ENOIOCTLCMD;
}

-/* Most of the generic ioctls are handled in the normal fallback path.
- This assumes the blkdev's low level compat_ioctl always returns
- ENOIOCTLCMD for unknown ioctls. */
-long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+/*
+ * Most of the generic ioctls are handled in the normal fallback path.
+ * This assumes the blkdev's low level compat_ioctl always returns
+ * ENOIOCTLCMD for unknown ioctls.
+ *
+ * NOTE! We ignore the on-disk inode that was passed as
+ * an argument, and use the "f_mapping->host" inode for
+ * all block ioctls!
+ */
+int compat_blkdev_ioctl(struct inode *unused, struct file *file, unsigned cmd, unsigned long arg)
{
int ret = -ENOIOCTLCMD;
struct inode *inode = file->f_mapping->host;
@@ -806,7 +812,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
ret = compat_blkdev_locked_ioctl(inode, file, bdev, cmd, arg);
/* FIXME: why do we assume -> compat_ioctl needs the BKL? */
if (ret == -ENOIOCTLCMD && disk->fops->compat_ioctl)
- ret = disk->fops->compat_ioctl(file, cmd, arg);
+ ret = disk->fops->compat_ioctl(inode, file, cmd, arg);
unlock_kernel();

if (ret != -ENOIOCTLCMD)
diff --git a/block/ioctl.c b/block/ioctl.c
index 77185e5..a85824e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -204,8 +204,9 @@ int blkdev_driver_ioctl(struct inode *inode, struct file *file,
struct gendisk *disk, unsigned cmd, unsigned long arg)
{
int ret;
+
if (disk->fops->unlocked_ioctl)
- return disk->fops->unlocked_ioctl(file, cmd, arg);
+ return disk->fops->unlocked_ioctl(inode, file, cmd, arg);

if (disk->fops->ioctl) {
lock_kernel();
diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index a002a38..972539d 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -6628,7 +6628,7 @@ static void DAC960_DestroyProcEntries(DAC960_Controller_T *Controller)
* DAC960_gam_ioctl is the ioctl function for performing RAID operations.
*/

-static long DAC960_gam_ioctl(struct file *file, unsigned int Request,
+static int DAC960_gam_ioctl(struct inode *inode, struct file *file, unsigned int Request,
unsigned long Argument)
{
long ErrorCode = 0;
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index b73116e..67404dd 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -192,7 +192,7 @@ static void cciss_procinit(int i)
#endif /* CONFIG_PROC_FS */

#ifdef CONFIG_COMPAT
-static long cciss_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg);
+static int cciss_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg);
#endif

static struct block_device_operations cciss_fops = {
@@ -618,7 +618,7 @@ static int cciss_ioctl32_passthru(struct file *f, unsigned cmd,
static int cciss_ioctl32_big_passthru(struct file *f, unsigned cmd,
unsigned long arg);

-static long cciss_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg)
+static int cciss_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg)
{
switch (cmd) {
case CCISS_GETPCIINFO:
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d3a25b0..bfa4f44 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1292,9 +1292,8 @@ loop_get_status_compat(struct loop_device *lo,
return err;
}

-static long lo_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int lo_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
struct loop_device *lo = inode->i_bdev->bd_disk->private_data;
int err;

diff --git a/drivers/block/paride/pt.c b/drivers/block/paride/pt.c
index 673b8b2..5a6fe4a 100644
--- a/drivers/block/paride/pt.c
+++ b/drivers/block/paride/pt.c
@@ -190,7 +190,7 @@ module_param_array(drive3, int, NULL, 0);
#define ATAPI_LOG_SENSE 0x4d

static int pt_open(struct inode *inode, struct file *file);
-static long pt_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int pt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
static int pt_release(struct inode *inode, struct file *file);
static ssize_t pt_read(struct file *filp, char __user *buf,
size_t count, loff_t * ppos);
@@ -690,7 +690,7 @@ out:
return err;
}

-static long pt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int pt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct pt_unit *tape = file->private_data;
struct mtop __user *p = (void __user *)arg;
diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index 4bada0e..acdeee0 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -313,7 +313,7 @@ extern const struct aper_size_info_16 agp3_generic_sizes[];
extern int agp_off;
extern int agp_try_unsupported_boot;

-long compat_agp_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int compat_agp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);

/* Chipset independant registers (from AGP Spec) */
#define AGP_APBASE 0x10
diff --git a/drivers/char/agp/compat_ioctl.c b/drivers/char/agp/compat_ioctl.c
index 58c57cb..abd8974 100644
--- a/drivers/char/agp/compat_ioctl.c
+++ b/drivers/char/agp/compat_ioctl.c
@@ -202,7 +202,7 @@ static int compat_agpioc_unbind_wrap(struct agp_file_private *priv, void __user
return agp_unbind_memory(memory);
}

-long compat_agp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int compat_agp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct agp_file_private *curr_priv = file->private_data;
int ret_val = -ENOTTY;
diff --git a/drivers/char/agp/frontend.c b/drivers/char/agp/frontend.c
index a96f319..0a2d134 100644
--- a/drivers/char/agp/frontend.c
+++ b/drivers/char/agp/frontend.c
@@ -971,7 +971,7 @@ int agpioc_chipset_flush_wrap(struct agp_file_private *priv)
return 0;
}

-static long agp_ioctl(struct file *file,
+static int agp_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct agp_file_private *curr_priv = file->private_data;
diff --git a/drivers/char/ds1302.c b/drivers/char/ds1302.c
index c5e67a6..95aac80 100644
--- a/drivers/char/ds1302.c
+++ b/drivers/char/ds1302.c
@@ -154,7 +154,7 @@ static unsigned char days_in_mo[] =

/* ioctl that supports RTC_RD_TIME and RTC_SET_TIME (read and set time/date). */

-static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
unsigned long flags;

diff --git a/drivers/char/dsp56k.c b/drivers/char/dsp56k.c
index ca7c72a..e4866bc 100644
--- a/drivers/char/dsp56k.c
+++ b/drivers/char/dsp56k.c
@@ -303,7 +303,7 @@ static ssize_t dsp56k_write(struct file *file, const char __user *buf, size_t co
}
}

-static long dsp56k_ioctl(struct file *file, unsigned int cmd,
+static int dsp56k_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int dev = iminor(file->f_path.dentry->d_inode) & 0x0f;
diff --git a/drivers/char/efirtc.c b/drivers/char/efirtc.c
index 34d15d5..3131dc0 100644
--- a/drivers/char/efirtc.c
+++ b/drivers/char/efirtc.c
@@ -51,7 +51,7 @@

static DEFINE_SPINLOCK(efi_rtc_lock);

-static long efi_rtc_ioctl(struct file *file, unsigned int cmd,
+static int efi_rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg);

#define is_leap(year) \
@@ -146,7 +146,7 @@ convert_from_efi_time(efi_time_t *eft, struct rtc_time *wtime)
}
}

-static long efi_rtc_ioctl(struct file *file, unsigned int cmd,
+static int efi_rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{

diff --git a/drivers/char/ip2/ip2main.c b/drivers/char/ip2/ip2main.c
index 689f9dc..5ac4c8d 100644
--- a/drivers/char/ip2/ip2main.c
+++ b/drivers/char/ip2/ip2main.c
@@ -203,7 +203,7 @@ static int set_serial_info(i2ChanStrPtr, struct serial_struct __user *);

static ssize_t ip2_ipl_read(struct file *, char __user *, size_t, loff_t *);
static ssize_t ip2_ipl_write(struct file *, const char __user *, size_t, loff_t *);
-static long ip2_ipl_ioctl(struct file *, UINT, ULONG);
+static int ip2_ipl_ioctl(struct inode *inode, struct file *, UINT, ULONG);
static int ip2_ipl_open(struct inode *, struct file *);

static int DumpTraceBuffer(char __user *, int);
@@ -2845,8 +2845,8 @@ ip2_ipl_write(struct file *pFile, const char __user *pData, size_t count, loff_t
/* */
/* */
/******************************************************************************/
-static long
-ip2_ipl_ioctl (struct file *pFile, UINT cmd, ULONG arg )
+static int
+ip2_ipl_ioctl(struct inode *inode, struct file *pFile, UINT cmd, ULONG arg )
{
unsigned int iplminor = iminor(pFile->f_path.dentry->d_inode);
int rc = 0;
diff --git a/drivers/char/ip27-rtc.c b/drivers/char/ip27-rtc.c
index ec9d044..f85a353 100644
--- a/drivers/char/ip27-rtc.c
+++ b/drivers/char/ip27-rtc.c
@@ -47,7 +47,7 @@
#include <asm/sn/sn0/hub.h>
#include <asm/sn/sn_private.h>

-static long rtc_ioctl(struct file *filp, unsigned int cmd,
+static int rtc_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);

static int rtc_read_proc(char *page, char **start, off_t off,
@@ -76,7 +76,7 @@ static unsigned long epoch = 1970; /* year corresponding to 0x00 */
static const unsigned char days_in_mo[] =
{0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

-static long rtc_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int rtc_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{

struct rtc_time wtime;
diff --git a/drivers/char/ipmi/ipmi_devintf.c b/drivers/char/ipmi/ipmi_devintf.c
index 64e1c16..02a8511 100644
--- a/drivers/char/ipmi/ipmi_devintf.c
+++ b/drivers/char/ipmi/ipmi_devintf.c
@@ -762,7 +762,7 @@ static long put_compat_ipmi_recv(struct ipmi_recv *p64,
/*
* Handle compatibility ioctls
*/
-static long compat_ipmi_ioctl(struct file *filep, unsigned int cmd,
+static int compat_ipmi_ioctl(struct inode *inode, struct file *filep, unsigned int cmd,
unsigned long arg)
{
int rc;
diff --git a/drivers/char/mmtimer.c b/drivers/char/mmtimer.c
index 918711a..e2b2463 100644
--- a/drivers/char/mmtimer.c
+++ b/drivers/char/mmtimer.c
@@ -58,7 +58,7 @@ extern unsigned long sn_rtc_cycles_per_second;

#define rtc_time() (*RTC_COUNTER_ADDR)

-static long mmtimer_ioctl(struct file *file, unsigned int cmd,
+static int mmtimer_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg);
static int mmtimer_mmap(struct file *file, struct vm_area_struct *vma);

@@ -365,7 +365,7 @@ restart:
* %MMTIMER_GETCOUNTER - Gets the current value in the counter and places it
* in the address specified by @arg.
*/
-static long mmtimer_ioctl(struct file *file, unsigned int cmd,
+static int mmtimer_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int ret = 0;
diff --git a/drivers/char/mwave/mwavedd.c b/drivers/char/mwave/mwavedd.c
index 4f8d67f..41a3af0 100644
--- a/drivers/char/mwave/mwavedd.c
+++ b/drivers/char/mwave/mwavedd.c
@@ -86,7 +86,7 @@ module_param(mwave_uart_io, int, 0);

static int mwave_open(struct inode *inode, struct file *file);
static int mwave_close(struct inode *inode, struct file *file);
-static long mwave_ioctl(struct file *filp, unsigned int iocmd,
+static int mwave_ioctl(struct inode *inode, struct file *filp, unsigned int iocmd,
unsigned long ioarg);

MWAVE_DEVICE_DATA mwave_s_mdd;
@@ -119,7 +119,7 @@ static int mwave_close(struct inode *inode, struct file *file)
return retval;
}

-static long mwave_ioctl(struct file *file, unsigned int iocmd,
+static int mwave_ioctl(struct inode *inode, struct file *file, unsigned int iocmd,
unsigned long ioarg)
{
unsigned int retval = 0;
diff --git a/drivers/char/pcmcia/cm4000_cs.c b/drivers/char/pcmcia/cm4000_cs.c
index f070ae7..f556c56 100644
--- a/drivers/char/pcmcia/cm4000_cs.c
+++ b/drivers/char/pcmcia/cm4000_cs.c
@@ -1406,11 +1406,10 @@ static void stop_monitor(struct cm4000_dev *dev)
DEBUGP(3, dev, "<- stop_monitor\n");
}

-static long cmm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int cmm_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
struct cm4000_dev *dev = filp->private_data;
unsigned int iobase = dev->p_dev->io.BasePort1;
- struct inode *inode = filp->f_path.dentry->d_inode;
struct pcmcia_device *link;
int size;
int rc;
diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c
index bee39fd..fafcc15 100644
--- a/drivers/char/ppdev.c
+++ b/drivers/char/ppdev.c
@@ -633,7 +633,7 @@ static int pp_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return 0;
}

-static long pp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int pp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
long ret;
lock_kernel();
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 1838aa3..93e26d0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1061,7 +1061,7 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
return (ssize_t)count;
}

-static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int random_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
{
int size, ent_count;
int __user *p = (int __user *)arg;
diff --git a/drivers/char/rio/rio_linux.c b/drivers/char/rio/rio_linux.c
index a8f68a3..1fad0e4 100644
--- a/drivers/char/rio/rio_linux.c
+++ b/drivers/char/rio/rio_linux.c
@@ -179,7 +179,7 @@ static int rio_set_real_termios(void *ptr);
static void rio_hungup(void *ptr);
static void rio_close(void *ptr);
static int rio_chars_in_buffer(void *ptr);
-static long rio_fw_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+static int rio_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);
static int rio_init_drivers(void);

static void my_hd(void *addr, int len);
@@ -560,7 +560,7 @@ static void rio_close(void *ptr)



-static long rio_fw_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int rio_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
int rc = 0;
func_enter();
diff --git a/drivers/char/rtc.c b/drivers/char/rtc.c
index f53d4d0..3bb7b51 100644
--- a/drivers/char/rtc.c
+++ b/drivers/char/rtc.c
@@ -142,7 +142,7 @@ static DEFINE_TIMER(rtc_irq_timer, rtc_dropped_irq, 0, 0);
static ssize_t rtc_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos);

-static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
static void rtc_get_rtc_time(struct rtc_time *rtc_tm);

#ifdef RTC_IRQ
@@ -717,7 +717,7 @@ static int rtc_do_ioctl(unsigned int cmd, unsigned long arg, int kernel)
&wtime, sizeof wtime) ? -EFAULT : 0;
}

-static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
long ret;
lock_kernel();
diff --git a/drivers/char/sx.c b/drivers/char/sx.c
index c385206..54d0c48 100644
--- a/drivers/char/sx.c
+++ b/drivers/char/sx.c
@@ -286,7 +286,7 @@ static void sx_close(void *ptr);
static int sx_chars_in_buffer(void *ptr);
static int sx_init_board(struct sx_board *board);
static int sx_init_portstructs(int nboards, int nports);
-static long sx_fw_ioctl(struct file *filp, unsigned int cmd,
+static int sx_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);
static int sx_init_drivers(void);

@@ -1686,7 +1686,7 @@ static int do_memtest_w(struct sx_board *board, int min, int max)
}
#endif

-static long sx_fw_ioctl(struct file *filp, unsigned int cmd,
+static int sx_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
long rc = 0;
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index daeb8f7..835658b 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -150,9 +150,9 @@ ssize_t redirected_tty_write(struct file *, const char __user *,
static unsigned int tty_poll(struct file *, poll_table *);
static int tty_open(struct inode *, struct file *);
static int tty_release(struct inode *, struct file *);
-long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
#ifdef CONFIG_COMPAT
-static long tty_compat_ioctl(struct file *file, unsigned int cmd,
+static int tty_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg);
#else
#define tty_compat_ioctl NULL
@@ -785,13 +785,13 @@ static unsigned int hung_up_tty_poll(struct file *filp, poll_table *wait)
return POLLIN | POLLOUT | POLLERR | POLLHUP | POLLRDNORM | POLLWRNORM;
}

-static long hung_up_tty_ioctl(struct file *file, unsigned int cmd,
+static int hung_up_tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
}

-static long hung_up_tty_compat_ioctl(struct file *file,
+static int hung_up_tty_compat_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
@@ -2941,13 +2941,12 @@ static int tty_tiocmset(struct tty_struct *tty, struct file *file, unsigned int
/*
* Split this up, as gcc can choke on it otherwise..
*/
-long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct tty_struct *tty, *real_tty;
void __user *p = (void __user *)arg;
int retval;
struct tty_ldisc *ld;
- struct inode *inode = file->f_dentry->d_inode;

tty = (struct tty_struct *)file->private_data;
if (tty_paranoia_check(tty, inode, "tty_ioctl"))
@@ -3075,10 +3074,9 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
}

#ifdef CONFIG_COMPAT
-static long tty_compat_ioctl(struct file *file, unsigned int cmd,
+static int tty_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
- struct inode *inode = file->f_dentry->d_inode;
struct tty_struct *tty = file->private_data;
struct tty_ldisc *ld;
int retval = -ENOIOCTLCMD;
diff --git a/drivers/char/viotape.c b/drivers/char/viotape.c
index 7a70a40..649b50e 100644
--- a/drivers/char/viotape.c
+++ b/drivers/char/viotape.c
@@ -678,7 +678,7 @@ free_op:
return ret;
}

-static long viotap_unlocked_ioctl(struct file *file,
+static int viotap_unlocked_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
long rc;
diff --git a/drivers/firewire/fw-cdev.c b/drivers/firewire/fw-cdev.c
index 2e6d584..c7b1e3d 100644
--- a/drivers/firewire/fw-cdev.c
+++ b/drivers/firewire/fw-cdev.c
@@ -916,8 +916,8 @@ dispatch_ioctl(struct client *client, unsigned int cmd, void __user *arg)
return 0;
}

-static long
-fw_device_op_ioctl(struct file *file,
+static int
+fw_device_op_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct client *client = file->private_data;
@@ -929,8 +929,8 @@ fw_device_op_ioctl(struct file *file,
}

#ifdef CONFIG_COMPAT
-static long
-fw_device_op_compat_ioctl(struct file *file,
+static int
+fw_device_op_compat_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct client *client = file->private_data;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d7326d9..ecc9ce6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -216,7 +216,7 @@ extern void i915_driver_lastclose(struct drm_device * dev);
extern void i915_driver_preclose(struct drm_device *dev,
struct drm_file *file_priv);
extern int i915_driver_device_is_agp(struct drm_device * dev);
-extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int i915_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);

/* i915_irq.c */
diff --git a/drivers/gpu/drm/i915/i915_ioc32.c b/drivers/gpu/drm/i915/i915_ioc32.c
index 1fe68a2..f8f623e 100644
--- a/drivers/gpu/drm/i915/i915_ioc32.c
+++ b/drivers/gpu/drm/i915/i915_ioc32.c
@@ -199,7 +199,7 @@ drm_ioctl_compat_t *i915_compat_ioctls[] = {
* \param arg user argument.
* \return zero on success or negative number on failure.
*/
-long i915_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int i915_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
unsigned int nr = DRM_IOCTL_NR(cmd);
drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/gpu/drm/mga/mga_drv.h b/drivers/gpu/drm/mga/mga_drv.h
index f6ebd24..dfe6cd7 100644
--- a/drivers/gpu/drm/mga/mga_drv.h
+++ b/drivers/gpu/drm/mga/mga_drv.h
@@ -187,7 +187,7 @@ extern irqreturn_t mga_driver_irq_handler(DRM_IRQ_ARGS);
extern void mga_driver_irq_preinstall(struct drm_device * dev);
extern void mga_driver_irq_postinstall(struct drm_device * dev);
extern void mga_driver_irq_uninstall(struct drm_device * dev);
-extern long mga_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int mga_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);

#define mga_flush_write_combine() DRM_WRITEMEMORYBARRIER()
diff --git a/drivers/gpu/drm/mga/mga_ioc32.c b/drivers/gpu/drm/mga/mga_ioc32.c
index 30d0047..b5d0826 100644
--- a/drivers/gpu/drm/mga/mga_ioc32.c
+++ b/drivers/gpu/drm/mga/mga_ioc32.c
@@ -208,7 +208,7 @@ drm_ioctl_compat_t *mga_compat_ioctls[] = {
* \param arg user argument.
* \return zero on success or negative number on failure.
*/
-long mga_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int mga_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
unsigned int nr = DRM_IOCTL_NR(cmd);
drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/gpu/drm/r128/r128_drv.h b/drivers/gpu/drm/r128/r128_drv.h
index 011105e..e145952 100644
--- a/drivers/gpu/drm/r128/r128_drv.h
+++ b/drivers/gpu/drm/r128/r128_drv.h
@@ -159,7 +159,7 @@ extern void r128_driver_lastclose(struct drm_device * dev);
extern void r128_driver_preclose(struct drm_device * dev,
struct drm_file *file_priv);

-extern long r128_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int r128_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);

/* Register definitions, register access macros and drmAddMap constants
diff --git a/drivers/gpu/drm/r128/r128_ioc32.c b/drivers/gpu/drm/r128/r128_ioc32.c
index d3cb676..f242fdb 100644
--- a/drivers/gpu/drm/r128/r128_ioc32.c
+++ b/drivers/gpu/drm/r128/r128_ioc32.c
@@ -198,7 +198,7 @@ drm_ioctl_compat_t *r128_compat_ioctls[] = {
* \param arg user argument.
* \return zero on success or negative number on failure.
*/
-long r128_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int r128_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
unsigned int nr = DRM_IOCTL_NR(cmd);
drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/gpu/drm/radeon/radeon_drv.h b/drivers/gpu/drm/radeon/radeon_drv.h
index 0993816..4b55abd 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.h
+++ b/drivers/gpu/drm/radeon/radeon_drv.h
@@ -401,7 +401,7 @@ extern void radeon_driver_preclose(struct drm_device * dev, struct drm_file *fil
extern void radeon_driver_postclose(struct drm_device * dev, struct drm_file * filp);
extern void radeon_driver_lastclose(struct drm_device * dev);
extern int radeon_driver_open(struct drm_device * dev, struct drm_file * filp_priv);
-extern long radeon_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int radeon_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);

/* r300_cmdbuf.c */
diff --git a/drivers/gpu/drm/radeon/radeon_ioc32.c b/drivers/gpu/drm/radeon/radeon_ioc32.c
index 56decda..6b518cb 100644
--- a/drivers/gpu/drm/radeon/radeon_ioc32.c
+++ b/drivers/gpu/drm/radeon/radeon_ioc32.c
@@ -401,7 +401,7 @@ drm_ioctl_compat_t *radeon_compat_ioctls[] = {
* \param arg user argument.
* \return zero on success or negative number on failure.
*/
-long radeon_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int radeon_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
unsigned int nr = DRM_IOCTL_NR(cmd);
drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
index c40f040..0a15260 100644
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -217,10 +217,9 @@ static int hidraw_release(struct inode * inode, struct file * file)
return 0;
}

-static long hidraw_ioctl(struct file *file, unsigned int cmd,
+static int hidraw_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
unsigned int minor = iminor(inode);
long ret = 0;
/* FIXME: What stops hidraw_table going NULL */
diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
index 842e9ed..0b08caf 100644
--- a/drivers/hid/usbhid/hiddev.c
+++ b/drivers/hid/usbhid/hiddev.c
@@ -544,7 +544,7 @@ static noinline int hiddev_ioctl_string(struct hiddev *hiddev, unsigned int cmd,
return len;
}

-static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int hiddev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct hiddev_list *list = file->private_data;
struct hiddev *hiddev = list->hiddev;
@@ -761,9 +761,9 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
}

#ifdef CONFIG_COMPAT
-static long hiddev_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int hiddev_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- return hiddev_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+ return hiddev_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
}
#endif

diff --git a/drivers/i2c/i2c-dev.c b/drivers/i2c/i2c-dev.c
index af4491f..98ec3d2 100644
--- a/drivers/i2c/i2c-dev.c
+++ b/drivers/i2c/i2c-dev.c
@@ -367,7 +367,7 @@ static noinline int i2cdev_ioctl_smbus(struct i2c_client *client,
return res;
}

-static long i2cdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int i2cdev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct i2c_client *client = (struct i2c_client *)file->private_data;
unsigned long funcs;
diff --git a/drivers/ieee1394/dv1394.c b/drivers/ieee1394/dv1394.c
index b6eb2cf..a8bdc2c 100644
--- a/drivers/ieee1394/dv1394.c
+++ b/drivers/ieee1394/dv1394.c
@@ -158,7 +158,7 @@ static void it_tasklet_func(unsigned long data);
static void ir_tasklet_func(unsigned long data);

#ifdef CONFIG_COMPAT
-static long dv1394_compat_ioctl(struct file *file, unsigned int cmd,
+static int dv1394_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg);
#endif

@@ -1533,7 +1533,7 @@ static ssize_t dv1394_read(struct file *file, char __user *buffer, size_t count

/*** DEVICE IOCTL INTERFACE ************************************************/

-static long dv1394_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int dv1394_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct video_card *video = file_to_video_card(file);
unsigned long flags;
@@ -2457,7 +2457,7 @@ struct dv1394_status32 {

/* RED-PEN: this should use compat_alloc_userspace instead */

-static int handle_dv1394_init(struct file *file, unsigned int cmd, unsigned long arg)
+static int handle_dv1394_init(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct dv1394_init32 dv32;
struct dv1394_init dv;
@@ -2480,13 +2480,13 @@ static int handle_dv1394_init(struct file *file, unsigned int cmd, unsigned long

old_fs = get_fs();
set_fs(KERNEL_DS);
- ret = dv1394_ioctl(file, DV1394_IOC_INIT, (unsigned long)&dv);
+ ret = dv1394_ioctl(inode, file, DV1394_IOC_INIT, (unsigned long)&dv);
set_fs(old_fs);

return ret;
}

-static int handle_dv1394_get_status(struct file *file, unsigned int cmd, unsigned long arg)
+static int handle_dv1394_get_status(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct dv1394_status32 dv32;
struct dv1394_status dv;
@@ -2498,7 +2498,7 @@ static int handle_dv1394_get_status(struct file *file, unsigned int cmd, unsigne

old_fs = get_fs();
set_fs(KERNEL_DS);
- ret = dv1394_ioctl(file, DV1394_IOC_GET_STATUS, (unsigned long)&dv);
+ ret = dv1394_ioctl(inode, file, DV1394_IOC_GET_STATUS, (unsigned long)&dv);
set_fs(old_fs);

if (!ret) {
@@ -2523,7 +2523,7 @@ static int handle_dv1394_get_status(struct file *file, unsigned int cmd, unsigne



-static long dv1394_compat_ioctl(struct file *file, unsigned int cmd,
+static int dv1394_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
switch (cmd) {
@@ -2532,12 +2532,12 @@ static long dv1394_compat_ioctl(struct file *file, unsigned int cmd,
case DV1394_IOC_WAIT_FRAMES:
case DV1394_IOC_RECEIVE_FRAMES:
case DV1394_IOC_START_RECEIVE:
- return dv1394_ioctl(file, cmd, arg);
+ return dv1394_ioctl(inode, file, cmd, arg);

case DV1394_IOC32_INIT:
- return handle_dv1394_init(file, cmd, arg);
+ return handle_dv1394_init(inode, file, cmd, arg);
case DV1394_IOC32_GET_STATUS:
- return handle_dv1394_get_status(file, cmd, arg);
+ return handle_dv1394_get_status(inode, file, cmd, arg);
default:
return -ENOIOCTLCMD;
}
diff --git a/drivers/ieee1394/raw1394.c b/drivers/ieee1394/raw1394.c
index 6fa9e4a..6cf46fa 100644
--- a/drivers/ieee1394/raw1394.c
+++ b/drivers/ieee1394/raw1394.c
@@ -2656,7 +2656,7 @@ static long do_raw1394_ioctl(struct file *file, unsigned int cmd,
return -EINVAL;
}

-static long raw1394_ioctl(struct file *file, unsigned int cmd,
+static int raw1394_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
long ret;
@@ -2717,7 +2717,7 @@ static long raw1394_read_cycle_timer32(struct file_info *fi, void __user * uaddr
return err;
}

-static long raw1394_compat_ioctl(struct file *file,
+static int raw1394_compat_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct file_info *fi = file->private_data;
diff --git a/drivers/ieee1394/video1394.c b/drivers/ieee1394/video1394.c
index 25db6e6..ed4eb78 100644
--- a/drivers/ieee1394/video1394.c
+++ b/drivers/ieee1394/video1394.c
@@ -716,7 +716,7 @@ static inline unsigned video1394_buffer_state(struct dma_iso_ctx *d,
return ret;
}

-static long video1394_ioctl(struct file *file,
+static int video1394_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct file_ctx *ctx = (struct file_ctx *)file->private_data;
@@ -1272,7 +1272,7 @@ static int video1394_release(struct inode *inode, struct file *file)
}

#ifdef CONFIG_COMPAT
-static long video1394_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg);
+static int video1394_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg);
#endif

static struct cdev video1394_cdev;
@@ -1386,7 +1386,7 @@ struct video1394_wait32 {
struct compat_timeval filltime;
};

-static int video1394_wr_wait32(struct file *file, unsigned int cmd, unsigned long arg)
+static int video1394_wr_wait32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct video1394_wait32 __user *argp = (void __user *)arg;
struct video1394_wait32 wait32;
@@ -1405,11 +1405,11 @@ static int video1394_wr_wait32(struct file *file, unsigned int cmd, unsigned lon
old_fs = get_fs();
set_fs(KERNEL_DS);
if (cmd == VIDEO1394_IOC32_LISTEN_WAIT_BUFFER)
- ret = video1394_ioctl(file,
+ ret = video1394_ioctl(inode, file,
VIDEO1394_IOC_LISTEN_WAIT_BUFFER,
(unsigned long) &wait);
else
- ret = video1394_ioctl(file,
+ ret = video1394_ioctl(inode, file,
VIDEO1394_IOC_LISTEN_POLL_BUFFER,
(unsigned long) &wait);
set_fs(old_fs);
@@ -1427,7 +1427,7 @@ static int video1394_wr_wait32(struct file *file, unsigned int cmd, unsigned lon
return ret;
}

-static int video1394_w_wait32(struct file *file, unsigned int cmd, unsigned long arg)
+static int video1394_w_wait32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct video1394_wait32 wait32;
struct video1394_wait wait;
@@ -1445,11 +1445,11 @@ static int video1394_w_wait32(struct file *file, unsigned int cmd, unsigned long
old_fs = get_fs();
set_fs(KERNEL_DS);
if (cmd == VIDEO1394_IOC32_LISTEN_QUEUE_BUFFER)
- ret = video1394_ioctl(file,
+ ret = video1394_ioctl(inode, file,
VIDEO1394_IOC_LISTEN_QUEUE_BUFFER,
(unsigned long) &wait);
else
- ret = video1394_ioctl(file,
+ ret = video1394_ioctl(inode, file,
VIDEO1394_IOC_TALK_WAIT_BUFFER,
(unsigned long) &wait);
set_fs(old_fs);
@@ -1457,33 +1457,33 @@ static int video1394_w_wait32(struct file *file, unsigned int cmd, unsigned long
return ret;
}

-static int video1394_queue_buf32(struct file *file, unsigned int cmd, unsigned long arg)
+static int video1394_queue_buf32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
return -EFAULT; /* ??? was there before. */

- return video1394_ioctl(file,
+ return video1394_ioctl(inode, file,
VIDEO1394_IOC_TALK_QUEUE_BUFFER, arg);
}

-static long video1394_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg)
+static int video1394_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg)
{
switch (cmd) {
case VIDEO1394_IOC_LISTEN_CHANNEL:
case VIDEO1394_IOC_UNLISTEN_CHANNEL:
case VIDEO1394_IOC_TALK_CHANNEL:
case VIDEO1394_IOC_UNTALK_CHANNEL:
- return video1394_ioctl(f, cmd, arg);
+ return video1394_ioctl(inode, f, cmd, arg);

case VIDEO1394_IOC32_LISTEN_QUEUE_BUFFER:
- return video1394_w_wait32(f, cmd, arg);
+ return video1394_w_wait32(inode, f, cmd, arg);
case VIDEO1394_IOC32_LISTEN_WAIT_BUFFER:
- return video1394_wr_wait32(f, cmd, arg);
+ return video1394_wr_wait32(inode, f, cmd, arg);
case VIDEO1394_IOC_TALK_QUEUE_BUFFER:
- return video1394_queue_buf32(f, cmd, arg);
+ return video1394_queue_buf32(inode, f, cmd, arg);
case VIDEO1394_IOC32_TALK_WAIT_BUFFER:
- return video1394_w_wait32(f, cmd, arg);
+ return video1394_w_wait32(inode, f, cmd, arg);
case VIDEO1394_IOC32_LISTEN_POLL_BUFFER:
- return video1394_wr_wait32(f, cmd, arg);
+ return video1394_wr_wait32(inode, f, cmd, arg);
default:
return -ENOIOCTLCMD;
}
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 268a2d2..6cd0bc3 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -743,7 +743,7 @@ static long ib_umad_enable_pkey(struct ib_umad_file *file)
return ret;
}

-static long ib_umad_ioctl(struct file *filp, unsigned int cmd,
+static int ib_umad_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
switch (cmd) {
@@ -759,7 +759,7 @@ static long ib_umad_ioctl(struct file *filp, unsigned int cmd,
}

#ifdef CONFIG_COMPAT
-static long ib_umad_compat_ioctl(struct file *filp, unsigned int cmd,
+static int ib_umad_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
switch (cmd) {
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 3524bef..9fd8fa9 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -888,13 +888,13 @@ static long evdev_ioctl_handler(struct file *file, unsigned int cmd,
return retval;
}

-static long evdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int evdev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
return evdev_ioctl_handler(file, cmd, (void __user *)arg, 0);
}

#ifdef CONFIG_COMPAT
-static long evdev_ioctl_compat(struct file *file,
+static int evdev_ioctl_compat(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
return evdev_ioctl_handler(file, cmd, compat_ptr(arg), 1);
diff --git a/drivers/input/joydev.c b/drivers/input/joydev.c
index 65d7077..d4db145 100644
--- a/drivers/input/joydev.c
+++ b/drivers/input/joydev.c
@@ -555,7 +555,7 @@ static int joydev_ioctl_common(struct joydev *joydev,
}

#ifdef CONFIG_COMPAT
-static long joydev_compat_ioctl(struct file *file,
+static int joydev_compat_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct joydev_client *client = file->private_data;
@@ -622,7 +622,7 @@ static long joydev_compat_ioctl(struct file *file,
}
#endif /* CONFIG_COMPAT */

-static long joydev_ioctl(struct file *file,
+static int joydev_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
struct joydev_client *client = file->private_data;
diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c
index 223d56d..a37877e 100644
--- a/drivers/input/misc/uinput.c
+++ b/drivers/input/misc/uinput.c
@@ -455,7 +455,7 @@ static int uinput_release(struct inode *inode, struct file *file)
__ret; \
})

-static long uinput_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int uinput_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int retval;
struct uinput_device *udev;
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index b262c00..c21cbdc 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1470,15 +1470,15 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
return r;
}

-static long dm_ctl_ioctl(struct file *file, uint command, ulong u)
+static int dm_ctl_ioctl(struct inode *inode, struct file *file, uint command, ulong u)
{
return (long)ctl_ioctl(command, (struct dm_ioctl __user *)u);
}

#ifdef CONFIG_COMPAT
-static long dm_compat_ctl_ioctl(struct file *file, uint command, ulong u)
+static int dm_compat_ctl_ioctl(struct inode *inode, struct file *file, uint command, ulong u)
{
- return (long)dm_ctl_ioctl(file, command, (ulong) compat_ptr(u));
+ return dm_ctl_ioctl(inode, file, command, (ulong) compat_ptr(u));
}
#else
#define dm_compat_ctl_ioctl NULL
diff --git a/drivers/media/video/compat_ioctl32.c b/drivers/media/video/compat_ioctl32.c
index bd5d9de..7eacc2d 100644
--- a/drivers/media/video/compat_ioctl32.c
+++ b/drivers/media/video/compat_ioctl32.c
@@ -110,15 +110,15 @@ struct video_window32 {
};
#endif

-static int native_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int native_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int ret = -ENOIOCTLCMD;

if (file->f_op->unlocked_ioctl)
- ret = file->f_op->unlocked_ioctl(file, cmd, arg);
+ ret = file->f_op->unlocked_ioctl(inode, file, cmd, arg);
else if (file->f_op->ioctl) {
lock_kernel();
- ret = file->f_op->ioctl(file->f_path.dentry->d_inode, file, cmd, arg);
+ ret = file->f_op->ioctl(inode, file, cmd, arg);
unlock_kernel();
}

@@ -549,7 +549,7 @@ enum {
MaxClips = (~0U-sizeof(struct video_window))/sizeof(struct video_clip)
};

-static int do_set_window(struct file *file, unsigned int cmd, unsigned long arg)
+static int do_set_window(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct video_window32 __user *up = compat_ptr(arg);
struct video_window __user *vw;
@@ -607,11 +607,11 @@ static int do_set_window(struct file *file, unsigned int cmd, unsigned long arg)
}
}

- return native_ioctl(file, VIDIOCSWIN, (unsigned long)vw);
+ return native_ioctl(inode, file, VIDIOCSWIN, (unsigned long)vw);
}
#endif

-static int do_video_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int do_video_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
union {
#ifdef CONFIG_VIDEO_V4L1_COMPAT
@@ -754,12 +754,12 @@ static int do_video_ioctl(struct file *file, unsigned int cmd, unsigned long arg
goto out;

if(compatible_arg)
- err = native_ioctl(file, realcmd, (unsigned long)up);
+ err = native_ioctl(inode, file, realcmd, (unsigned long)up);
else {
mm_segment_t old_fs = get_fs();

set_fs(KERNEL_DS);
- err = native_ioctl(file, realcmd, (unsigned long) &karg);
+ err = native_ioctl(inode, file, realcmd, (unsigned long) &karg);
set_fs(old_fs);
}
if(err == 0) {
@@ -827,7 +827,7 @@ out:
return err;
}

-long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
+int v4l_compat_ioctl32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int ret = -ENOIOCTLCMD;

@@ -837,7 +837,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
switch (cmd) {
#ifdef CONFIG_VIDEO_V4L1_COMPAT
case VIDIOCSWIN32:
- ret = do_set_window(file, cmd, arg);
+ ret = do_set_window(inode, file, cmd, arg);
break;
case VIDIOCGTUNER32:
case VIDIOCSTUNER32:
@@ -885,7 +885,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
case VIDIOC_S_INPUT32:
case VIDIOC_TRY_FMT32:
case VIDIOC_S_HW_FREQ_SEEK:
- ret = do_video_ioctl(file, cmd, arg);
+ ret = do_video_ioctl(inode, file, cmd, arg);
break;

#ifdef CONFIG_VIDEO_V4L1_COMPAT
@@ -913,7 +913,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
case _IOR('v' , BASE_VIDIOCPRIVATE+5, int):
case _IOR('v' , BASE_VIDIOCPRIVATE+6, int):
case _IOR('v' , BASE_VIDIOCPRIVATE+7, int):
- ret = native_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+ ret = native_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
break;
#endif
default:
@@ -922,7 +922,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
return ret;
}
#else
-long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
+int v4l_compat_ioctl32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
return -ENOIOCTLCMD;
}
diff --git a/drivers/message/fusion/mptctl.c b/drivers/message/fusion/mptctl.c
index f5233f3..3d20a3c 100644
--- a/drivers/message/fusion/mptctl.c
+++ b/drivers/message/fusion/mptctl.c
@@ -116,7 +116,7 @@ static int mptctl_probe(struct pci_dev *, const struct pci_device_id *);
static void mptctl_remove(struct pci_dev *);

#ifdef CONFIG_COMPAT
-static long compat_mpctl_ioctl(struct file *f, unsigned cmd, unsigned long arg);
+static int compat_mpctl_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg);
#endif
/*
* Private function calls.
@@ -652,8 +652,8 @@ __mptctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return ret;
}

-static long
-mptctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+mptctl_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
long ret;
lock_kernel();
@@ -2818,7 +2818,7 @@ compat_mpt_command(struct file *filp, unsigned int cmd,
return ret;
}

-static long compat_mpctl_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int compat_mpctl_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
{
long ret;
lock_kernel();
diff --git a/drivers/message/i2o/i2o_config.c b/drivers/message/i2o/i2o_config.c
index 4238de9..442cdb3 100644
--- a/drivers/message/i2o/i2o_config.c
+++ b/drivers/message/i2o/i2o_config.c
@@ -746,7 +746,7 @@ out:
return rcode;
}

-static long i2o_cfg_compat_ioctl(struct file *file, unsigned cmd,
+static int i2o_cfg_compat_ioctl(struct inode *inode, struct file *file, unsigned cmd,
unsigned long arg)
{
int ret;
diff --git a/drivers/misc/phantom.c b/drivers/misc/phantom.c
index daf5856..4902f28 100644
--- a/drivers/misc/phantom.c
+++ b/drivers/misc/phantom.c
@@ -83,7 +83,7 @@ static int phantom_status(struct phantom_device *dev, unsigned long newstat)
* File ops
*/

-static long phantom_ioctl(struct file *file, unsigned int cmd,
+static int phantom_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct phantom_device *dev = file->private_data;
@@ -195,14 +195,14 @@ static long phantom_ioctl(struct file *file, unsigned int cmd,
}

#ifdef CONFIG_COMPAT
-static long phantom_compat_ioctl(struct file *filp, unsigned int cmd,
+static int phantom_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
if (_IOC_NR(cmd) <= 3 && _IOC_SIZE(cmd) == sizeof(compat_uptr_t)) {
cmd &= ~(_IOC_SIZEMASK << _IOC_SIZESHIFT);
cmd |= sizeof(void *) << _IOC_SIZESHIFT;
}
- return phantom_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
+ return phantom_ioctl(inode, filp, cmd, (unsigned long)compat_ptr(arg));
}
#else
#define phantom_compat_ioctl NULL
diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 23c91f5..fb6d7ad 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -233,7 +233,7 @@ static long gru_get_chiplet_status(unsigned long arg)
*
* Called to update file attributes via IOCTL calls.
*/
-static long gru_file_unlocked_ioctl(struct file *file, unsigned int req,
+static int gru_file_unlocked_ioctl(struct inode *inode, struct file *file, unsigned int req,
unsigned long arg)
{
int err = -EBADRQC;
diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index ddccc07..3ec394d 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -547,7 +547,7 @@ static int get_filter(void __user *arg, struct sock_filter **p)
}
#endif /* CONFIG_PPP_FILTER */

-static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int ppp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct ppp_file *pf = file->private_data;
struct ppp *ppp;
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index e1098c3..db5903f 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -201,7 +201,7 @@ struct pci_filp_private {
int write_combine;
};

-static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
+static int proc_bus_pci_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
const struct proc_dir_entry *dp = PDE(file->f_dentry->d_inode);
diff --git a/drivers/rtc/rtc-dev.c b/drivers/rtc/rtc-dev.c
index f118252..ac41969 100644
--- a/drivers/rtc/rtc-dev.c
+++ b/drivers/rtc/rtc-dev.c
@@ -203,7 +203,7 @@ static unsigned int rtc_dev_poll(struct file *file, poll_table *wait)
return (data != 0) ? (POLLIN | POLLRDNORM) : 0;
}

-static long rtc_dev_ioctl(struct file *file,
+static int rtc_dev_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int err = 0;
diff --git a/drivers/s390/block/dasd_int.h b/drivers/s390/block/dasd_int.h
index 31ecaa4..ab20e29 100644
--- a/drivers/s390/block/dasd_int.h
+++ b/drivers/s390/block/dasd_int.h
@@ -611,7 +611,7 @@ void dasd_destroy_partitions(struct dasd_block *);

/* externals in dasd_ioctl.c */
int dasd_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
-long dasd_compat_ioctl(struct file *, unsigned int, unsigned long);
+int dasd_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);

/* externals in dasd_proc.c */
int dasd_proc_init(void);
diff --git a/drivers/s390/char/tape_char.c b/drivers/s390/char/tape_char.c
index be0ce22..5b42a1f 100644
--- a/drivers/s390/char/tape_char.c
+++ b/drivers/s390/char/tape_char.c
@@ -37,7 +37,7 @@ static int tapechar_open(struct inode *,struct file *);
static int tapechar_release(struct inode *,struct file *);
static int tapechar_ioctl(struct inode *, struct file *, unsigned int,
unsigned long);
-static long tapechar_compat_ioctl(struct file *, unsigned int,
+static int tapechar_compat_ioctl(struct inode *inode, struct file *, unsigned int,
unsigned long);

static const struct file_operations tape_fops =
diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c
index 09e7d9b..3de2abe 100644
--- a/drivers/s390/char/vmcp.c
+++ b/drivers/s390/char/vmcp.c
@@ -138,7 +138,7 @@ vmcp_write(struct file *file, const char __user *buff, size_t count,
* let userspace to change the response size, if userspace expects a bigger
* response
*/
-static long vmcp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int vmcp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct vmcp_session *session;
int temp;
diff --git a/drivers/s390/cio/chsc_sch.c b/drivers/s390/cio/chsc_sch.c
index 91ca87a..6a0904e 100644
--- a/drivers/s390/cio/chsc_sch.c
+++ b/drivers/s390/cio/chsc_sch.c
@@ -737,7 +737,7 @@ out_free:
return ret;
}

-static long chsc_ioctl(struct file *filp, unsigned int cmd,
+static int chsc_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
CHSC_MSG(2, "chsc_ioctl called, cmd=%x\n", cmd);
diff --git a/drivers/s390/crypto/zcrypt_api.c b/drivers/s390/crypto/zcrypt_api.c
index cb22b97..6e82f85 100644
--- a/drivers/s390/crypto/zcrypt_api.c
+++ b/drivers/s390/crypto/zcrypt_api.c
@@ -621,7 +621,7 @@ static long zcrypt_ica_status(struct file *filp, unsigned long arg)
return ret;
}

-static long zcrypt_unlocked_ioctl(struct file *filp, unsigned int cmd,
+static int zcrypt_unlocked_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
int rc;
@@ -872,7 +872,7 @@ static long trans_xcRB32(struct file *filp, unsigned int cmd,
return rc;
}

-static long zcrypt_compat_ioctl(struct file *filp, unsigned int cmd,
+static int zcrypt_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
if (cmd == ICARSAMODEXPO)
diff --git a/drivers/s390/scsi/zfcp_cfdc.c b/drivers/s390/scsi/zfcp_cfdc.c
index ec2abce..de0380f 100644
--- a/drivers/s390/scsi/zfcp_cfdc.c
+++ b/drivers/s390/scsi/zfcp_cfdc.c
@@ -160,7 +160,7 @@ static void zfcp_cfdc_req_to_sense(struct zfcp_cfdc_data *data,
sizeof(req->qtcb->bottom.support.els));
}

-static long zfcp_cfdc_dev_ioctl(struct file *file, unsigned int command,
+static int zfcp_cfdc_dev_ioctl(struct inode *inode, struct file *file, unsigned int command,
unsigned long buffer)
{
struct zfcp_cfdc_data *data;
diff --git a/drivers/sbus/char/cpwatchdog.c b/drivers/sbus/char/cpwatchdog.c
index 23abfdf..1d272b6 100644
--- a/drivers/sbus/char/cpwatchdog.c
+++ b/drivers/sbus/char/cpwatchdog.c
@@ -397,7 +397,7 @@ static int wd_ioctl(struct inode *inode, struct file *file,
return(0);
}

-static long wd_compat_ioctl(struct file *file, unsigned int cmd,
+static int wd_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int rval = -ENOIOCTLCMD;
diff --git a/drivers/sbus/char/display7seg.c b/drivers/sbus/char/display7seg.c
index d8f5c0c..74842f3 100644
--- a/drivers/sbus/char/display7seg.c
+++ b/drivers/sbus/char/display7seg.c
@@ -117,7 +117,7 @@ static int d7s_release(struct inode *inode, struct file *f)
return 0;
}

-static long d7s_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int d7s_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
__u8 regs = readb(d7s_regs);
__u8 ireg = 0;
diff --git a/drivers/sbus/char/openprom.c b/drivers/sbus/char/openprom.c
index 29dc735..9a37df0 100644
--- a/drivers/sbus/char/openprom.c
+++ b/drivers/sbus/char/openprom.c
@@ -650,7 +650,7 @@ static int openprom_ioctl(struct inode * inode, struct file * file,
};
}

-static long openprom_compat_ioctl(struct file *file, unsigned int cmd,
+static int openprom_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
long rval = -ENOTTY;
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 9aa301c..ff0ec51 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -751,7 +751,7 @@ static int aac_compat_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
return aac_compat_do_ioctl(dev, cmd, (unsigned long)arg);
}

-static long aac_compat_cfg_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+static int aac_compat_cfg_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
{
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index 3c257fe..a9ac914 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -596,7 +596,7 @@ ch_checkrange(scsi_changer *ch, unsigned int type, unsigned int unit)
return 0;
}

-static long ch_ioctl(struct file *file,
+static int ch_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
scsi_changer *ch = file->private_data;
@@ -843,7 +843,7 @@ struct changer_element_status32 {
};
#define CHIOGSTATUS32 _IOW('c', 8,struct changer_element_status32)

-static long ch_ioctl_compat(struct file * file,
+static int ch_ioctl_compat(struct inode *inode, struct file * file,
unsigned int cmd, unsigned long arg)
{
scsi_changer *ch = file->private_data;
@@ -858,7 +858,7 @@ static long ch_ioctl_compat(struct file * file,
case CHIOINITELEM:
case CHIOSVOLTAG:
/* compatible */
- return ch_ioctl(file, cmd, arg);
+ return ch_ioctl(inode, file, cmd, arg);
case CHIOGSTATUS32:
{
struct changer_element_status32 ces32;
diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 1fe0901..0c4e821 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -115,7 +115,7 @@ static int hba_count = 0;
static struct class *adpt_sysfs_class;

#ifdef CONFIG_COMPAT
-static long compat_adpt_ioctl(struct file *, unsigned int, unsigned long);
+static int compat_adpt_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
#endif

static const struct file_operations adpt_fops = {
@@ -2147,14 +2147,11 @@ static int adpt_ioctl(struct inode *inode, struct file *file, uint cmd,
}

#ifdef CONFIG_COMPAT
-static long compat_adpt_ioctl(struct file *file,
+static int compat_adpt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
- struct inode *inode;
long ret;

- inode = file->f_dentry->d_inode;
-
lock_kernel();

switch(cmd) {
diff --git a/drivers/scsi/megaraid/megaraid_mm.c b/drivers/scsi/megaraid/megaraid_mm.c
index f680561..e3d9a55 100644
--- a/drivers/scsi/megaraid/megaraid_mm.c
+++ b/drivers/scsi/megaraid/megaraid_mm.c
@@ -44,7 +44,7 @@ static void mraid_mm_free_adp_resources(mraid_mmadp_t *);
static void mraid_mm_teardown_dma_pools(mraid_mmadp_t *);

#ifdef CONFIG_COMPAT
-static long mraid_mm_compat_ioctl(struct file *, unsigned int, unsigned long);
+static int mraid_mm_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
#endif

MODULE_AUTHOR("LSI Logic Corporation");
@@ -1218,13 +1218,13 @@ mraid_mm_init(void)
* @cmd : ioctl command
* @arg : user ioctl packet
*/
-static long
-mraid_mm_compat_ioctl(struct file *filep, unsigned int cmd,
- unsigned long arg)
+static int
+mraid_mm_compat_ioctl(struct inode *inode, struct file *filep,
+ unsigned int cmd, unsigned long arg)
{
int err;

- err = mraid_mm_ioctl(NULL, filep, cmd, arg);
+ err = mraid_mm_ioctl(inode, filep, cmd, arg);

return err;
}
diff --git a/drivers/scsi/megaraid/megaraid_sas.c b/drivers/scsi/megaraid/megaraid_sas.c
index 97b7633..8bcd1bd 100644
--- a/drivers/scsi/megaraid/megaraid_sas.c
+++ b/drivers/scsi/megaraid/megaraid_sas.c
@@ -3269,8 +3269,8 @@ static int megasas_mgmt_ioctl_aen(struct file *file, unsigned long arg)
/**
* megasas_mgmt_ioctl - char node ioctl entry point
*/
-static long
-megasas_mgmt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+megasas_mgmt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
case MEGASAS_IOC_FIRMWARE:
@@ -3324,9 +3324,9 @@ static int megasas_mgmt_compat_ioctl_fw(struct file *file, unsigned long arg)
return error;
}

-static long
-megasas_mgmt_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
+static int
+megasas_mgmt_compat_ioctl(struct inode *inode, struct file *file,
+ unsigned int cmd, unsigned long arg)
{
switch (cmd) {
case MEGASAS_IOC_FIRMWARE32:
diff --git a/drivers/scsi/osst.c b/drivers/scsi/osst.c
index 1c79f97..4d6867f 100644
--- a/drivers/scsi/osst.c
+++ b/drivers/scsi/osst.c
@@ -5191,7 +5191,7 @@ out:
}

#ifdef CONFIG_COMPAT
-static long osst_compat_ioctl(struct file * file, unsigned int cmd_in, unsigned long arg)
+static int osst_compat_ioctl(struct inode *inode, struct file * file, unsigned int cmd_in, unsigned long arg)
{
struct osst_tape *STp = file->private_data;
struct scsi_device *sdev = STp->device;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e5e7d78..e283650 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -921,7 +921,7 @@ static void sd_rescan(struct device *dev)
* This gets directly called from VFS. When the ioctl
* is not recognized we go back to the other translation paths.
*/
-static long sd_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int sd_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct block_device *bdev = file->f_path.dentry->d_inode->i_bdev;
struct gendisk *disk = bdev->bd_disk;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 661f9f2..5d4e1aa 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1113,7 +1113,7 @@ sg_ioctl(struct inode *inode, struct file *filp,
}

#ifdef CONFIG_COMPAT
-static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
+static int sg_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd_in, unsigned long arg)
{
Sg_device *sdp;
Sg_fd *sfp;
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index c2bb53e..245c8ba 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -3233,7 +3233,7 @@ static int partition_tape(struct scsi_tape *STp, int size)


/* The ioctl command */
-static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
+static int st_ioctl(struct inode *inode, struct file *file, unsigned int cmd_in, unsigned long arg)
{
int i, cmd_nr, cmd_type, bt;
int retval = 0;
@@ -3586,7 +3586,7 @@ static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
}

#ifdef CONFIG_COMPAT
-static long st_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int st_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct scsi_tape *STp = file->private_data;
struct scsi_device *sdev = STp->device;
diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index e5e0cfe..70b3a16 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -299,8 +299,8 @@ done:
return status;
}

-static long
-spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int
+spidev_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
int err = 0;
int retval = 0;
diff --git a/drivers/telephony/ixj.c b/drivers/telephony/ixj.c
index ec7aeb5..afea51d 100644
--- a/drivers/telephony/ixj.c
+++ b/drivers/telephony/ixj.c
@@ -6661,7 +6661,7 @@ static long do_ixj_ioctl(struct file *file_p, unsigned int cmd, unsigned long ar
return retval;
}

-static long ixj_ioctl(struct file *file_p, unsigned int cmd, unsigned long arg)
+static int ixj_ioctl(struct inode *inode, struct file *file_p, unsigned int cmd, unsigned long arg)
{
long ret;
lock_kernel();
diff --git a/drivers/usb/class/usblp.c b/drivers/usb/class/usblp.c
index 0647164..0fdca42 100644
--- a/drivers/usb/class/usblp.c
+++ b/drivers/usb/class/usblp.c
@@ -487,7 +487,7 @@ static unsigned int usblp_poll(struct file *file, struct poll_table_struct *wait
return ret;
}

-static long usblp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int usblp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct usblp *usblp = file->private_data;
int length, err, i;
diff --git a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
index f4585d3..c772d34 100644
--- a/drivers/usb/gadget/inode.c
+++ b/drivers/usb/gadget/inode.c
@@ -482,7 +482,7 @@ ep_release (struct inode *inode, struct file *fd)
return 0;
}

-static long ep_ioctl(struct file *fd, unsigned code, unsigned long value)
+static int ep_ioctl(struct inode *inode, struct file *fd, unsigned code, unsigned long value)
{
struct ep_data *data = fd->private_data;
int status;
@@ -1292,7 +1292,7 @@ out:
return mask;
}

-static long dev_ioctl (struct file *fd, unsigned code, unsigned long value)
+static int dev_ioctl(struct inode *inode, struct file *fd, unsigned code, unsigned long value)
{
struct dev_data *dev = fd->private_data;
struct usb_gadget *gadget = dev->gadget;
@@ -1300,7 +1300,7 @@ static long dev_ioctl (struct file *fd, unsigned code, unsigned long value)

if (gadget->ops->ioctl) {
lock_kernel();
- ret = gadget->ops->ioctl (gadget, code, value);
+ ret = gadget->ops->ioctl(gadget, code, value);
unlock_kernel();
}
return ret;
diff --git a/drivers/usb/gadget/printer.c b/drivers/usb/gadget/printer.c
index e009008..d02ce89 100644
--- a/drivers/usb/gadget/printer.c
+++ b/drivers/usb/gadget/printer.c
@@ -828,8 +828,8 @@ printer_poll(struct file *fd, poll_table *wait)
return status;
}

-static long
-printer_ioctl(struct file *fd, unsigned int code, unsigned long arg)
+static int
+printer_ioctl(struct inode *inode, struct file *fd, unsigned int code, unsigned long arg)
{
struct printer_dev *dev = fd->private_data;
unsigned long flags;
diff --git a/drivers/usb/misc/iowarrior.c b/drivers/usb/misc/iowarrior.c
index a4ef77e..5e3411a 100644
--- a/drivers/usb/misc/iowarrior.c
+++ b/drivers/usb/misc/iowarrior.c
@@ -473,7 +473,7 @@ exit:
/**
* iowarrior_ioctl
*/
-static long iowarrior_ioctl(struct file *file, unsigned int cmd,
+static int iowarrior_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct iowarrior *dev = NULL;
diff --git a/drivers/usb/misc/rio500.c b/drivers/usb/misc/rio500.c
index 248a12a..3ba8ef2 100644
--- a/drivers/usb/misc/rio500.c
+++ b/drivers/usb/misc/rio500.c
@@ -104,7 +104,7 @@ static int close_rio(struct inode *inode, struct file *file)
return 0;
}

-static long ioctl_rio(struct file *file, unsigned int cmd, unsigned long arg)
+static int ioctl_rio(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct RioCommand rio_cmd;
struct rio_usb_data *rio = &rio_instance;
diff --git a/drivers/usb/misc/sisusbvga/sisusb.c b/drivers/usb/misc/sisusbvga/sisusb.c
index 69c34a5..26142aa 100644
--- a/drivers/usb/misc/sisusbvga/sisusb.c
+++ b/drivers/usb/misc/sisusbvga/sisusb.c
@@ -2982,8 +2982,8 @@ sisusb_handle_command(struct sisusb_usb_data *sisusb, struct sisusb_command *y,
return retval;
}

-static long
-sisusb_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+sisusb_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct sisusb_usb_data *sisusb;
struct sisusb_info x;
@@ -3058,8 +3058,8 @@ err_out:
}

#ifdef SISUSB_NEW_CONFIG_COMPAT
-static long
-sisusb_compat_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int
+sisusb_compat_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
{
long retval;

@@ -3067,7 +3067,7 @@ sisusb_compat_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
case SISUSB_GET_CONFIG_SIZE:
case SISUSB_GET_CONFIG:
case SISUSB_COMMAND:
- retval = sisusb_ioctl(f, cmd, arg);
+ retval = sisusb_ioctl(inode, f, cmd, arg);
return retval;

default:
diff --git a/drivers/usb/misc/usblcd.c b/drivers/usb/misc/usblcd.c
index 2db4228..3f46226 100644
--- a/drivers/usb/misc/usblcd.c
+++ b/drivers/usb/misc/usblcd.c
@@ -146,7 +146,7 @@ static ssize_t lcd_read(struct file *file, char __user * buffer, size_t count, l
return retval;
}

-static long lcd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int lcd_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct usb_lcd *dev;
u16 bcdDevice;
diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index 98843c2..aebf6f0 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -1223,10 +1223,8 @@ static int fb_get_fscreeninfo(struct inode *inode, struct file *file,
return err;
}

-static long
-fb_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fb_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
int fbidx = iminor(inode);
struct fb_info *info = registered_fb[fbidx];
struct fb_ops *fb = info->fbops;
diff --git a/drivers/watchdog/acquirewdt.c b/drivers/watchdog/acquirewdt.c
index 6e46a55..7579e79 100644
--- a/drivers/watchdog/acquirewdt.c
+++ b/drivers/watchdog/acquirewdt.c
@@ -145,7 +145,7 @@ static ssize_t acq_write(struct file *file, const char __user *buf,
return count;
}

-static long acq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int acq_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int options, retval = -EINVAL;
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/advantechwdt.c b/drivers/watchdog/advantechwdt.c
index a5110f9..518c159 100644
--- a/drivers/watchdog/advantechwdt.c
+++ b/drivers/watchdog/advantechwdt.c
@@ -132,7 +132,7 @@ static ssize_t advwdt_write(struct file *file, const char __user *buf,
return count;
}

-static long advwdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int advwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int new_timeout;
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/alim1535_wdt.c b/drivers/watchdog/alim1535_wdt.c
index 2a7690e..bde5fbc 100644
--- a/drivers/watchdog/alim1535_wdt.c
+++ b/drivers/watchdog/alim1535_wdt.c
@@ -176,7 +176,7 @@ static ssize_t ali_write(struct file *file, const char __user *data,
* we want an extension to enable irq ack monitoring and the like
*/

-static long ali_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int ali_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/alim7101_wdt.c b/drivers/watchdog/alim7101_wdt.c
index a045ef8..4c0ef21 100644
--- a/drivers/watchdog/alim7101_wdt.c
+++ b/drivers/watchdog/alim7101_wdt.c
@@ -234,7 +234,7 @@ static int fop_close(struct inode *inode, struct file *file)
return 0;
}

-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/ar7_wdt.c b/drivers/watchdog/ar7_wdt.c
index 55dcbfe..2dcd13e 100644
--- a/drivers/watchdog/ar7_wdt.c
+++ b/drivers/watchdog/ar7_wdt.c
@@ -240,7 +240,7 @@ static ssize_t ar7_wdt_write(struct file *file, const char *data,
return len;
}

-static long ar7_wdt_ioctl(struct file *file,
+static int ar7_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
static struct watchdog_info ident = {
diff --git a/drivers/watchdog/at32ap700x_wdt.c b/drivers/watchdog/at32ap700x_wdt.c
index e8ae638..aafa445 100644
--- a/drivers/watchdog/at32ap700x_wdt.c
+++ b/drivers/watchdog/at32ap700x_wdt.c
@@ -212,7 +212,7 @@ static struct watchdog_info at32_wdt_info = {
/*
* Handle commands from user-space.
*/
-static long at32_wdt_ioctl(struct file *file,
+static int at32_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/at91rm9200_wdt.c b/drivers/watchdog/at91rm9200_wdt.c
index 993e5f5..8658fc7 100644
--- a/drivers/watchdog/at91rm9200_wdt.c
+++ b/drivers/watchdog/at91rm9200_wdt.c
@@ -128,7 +128,7 @@ static struct watchdog_info at91_wdt_info = {
/*
* Handle commands from user-space.
*/
-static long at91_wdt_ioctl(struct file *file,
+static int at91_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/bfin_wdt.c b/drivers/watchdog/bfin_wdt.c
index 31b4225..e82f2ed 100644
--- a/drivers/watchdog/bfin_wdt.c
+++ b/drivers/watchdog/bfin_wdt.c
@@ -248,7 +248,7 @@ static ssize_t bfin_wdt_write(struct file *file, const char __user *data,
* Query basic information from the device or ping it, as outlined by the
* watchdog API.
*/
-static long bfin_wdt_ioctl(struct file *file,
+static int bfin_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index c3b78a7..c5a7ce1 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -82,7 +82,7 @@ static struct watchdog_info ident = {
.identity = "PowerPC Book-E Watchdog",
};

-static long booke_wdt_ioctl(struct file *file,
+static int booke_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
u32 tmp = 0;
diff --git a/drivers/watchdog/cpu5wdt.c b/drivers/watchdog/cpu5wdt.c
index 71f6d7e..feb30cd 100644
--- a/drivers/watchdog/cpu5wdt.c
+++ b/drivers/watchdog/cpu5wdt.c
@@ -148,7 +148,7 @@ static int cpu5wdt_release(struct inode *inode, struct file *file)
return 0;
}

-static long cpu5wdt_ioctl(struct file *file, unsigned int cmd,
+static int cpu5wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/davinci_wdt.c b/drivers/watchdog/davinci_wdt.c
index 2e13602..81f676a 100644
--- a/drivers/watchdog/davinci_wdt.c
+++ b/drivers/watchdog/davinci_wdt.c
@@ -142,7 +142,7 @@ static struct watchdog_info ident = {
.identity = "DaVinci Watchdog",
};

-static long davinci_wdt_ioctl(struct file *file,
+static int davinci_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/ep93xx_wdt.c b/drivers/watchdog/ep93xx_wdt.c
index e9f950f..496a5fa 100644
--- a/drivers/watchdog/ep93xx_wdt.c
+++ b/drivers/watchdog/ep93xx_wdt.c
@@ -135,7 +135,7 @@ static struct watchdog_info ident = {
.identity = "EP93xx Watchdog",
};

-static long ep93xx_wdt_ioctl(struct file *file,
+static int ep93xx_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/eurotechwdt.c b/drivers/watchdog/eurotechwdt.c
index bbd14e3..ecb704d 100644
--- a/drivers/watchdog/eurotechwdt.c
+++ b/drivers/watchdog/eurotechwdt.c
@@ -233,7 +233,7 @@ size_t count, loff_t *ppos)
* according to their available features.
*/

-static long eurwdt_ioctl(struct file *file,
+static int eurwdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index a3765e0..de4a065 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -556,7 +556,7 @@ static struct watchdog_info ident = {
.identity = "HP iLO2 HW Watchdog Timer",
};

-static long hpwdt_ioctl(struct file *file, unsigned int cmd,
+static int hpwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/i6300esb.c b/drivers/watchdog/i6300esb.c
index c13383f..013eb0d 100644
--- a/drivers/watchdog/i6300esb.c
+++ b/drivers/watchdog/i6300esb.c
@@ -256,7 +256,7 @@ static ssize_t esb_write(struct file *file, const char __user *data,
return len;
}

-static long esb_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int esb_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int new_options, retval = -EINVAL;
int new_heartbeat;
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index bfb93bc..4d1015a 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -510,7 +510,7 @@ static ssize_t iTCO_wdt_write(struct file *file, const char __user *data,
return len;
}

-static long iTCO_wdt_ioctl(struct file *file, unsigned int cmd,
+static int iTCO_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_options, retval = -EINVAL;
diff --git a/drivers/watchdog/ib700wdt.c b/drivers/watchdog/ib700wdt.c
index 05a2810..53bf64d 100644
--- a/drivers/watchdog/ib700wdt.c
+++ b/drivers/watchdog/ib700wdt.c
@@ -187,7 +187,7 @@ static ssize_t ibwdt_write(struct file *file, const char __user *buf,
return count;
}

-static long ibwdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int ibwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int new_margin;
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/ibmasr.c b/drivers/watchdog/ibmasr.c
index b82405c..14a32af 100644
--- a/drivers/watchdog/ibmasr.c
+++ b/drivers/watchdog/ibmasr.c
@@ -270,7 +270,7 @@ static ssize_t asr_write(struct file *file, const char __user *buf,
return count;
}

-static long asr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int asr_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
static const struct watchdog_info ident = {
.options = WDIOF_KEEPALIVEPING |
diff --git a/drivers/watchdog/indydog.c b/drivers/watchdog/indydog.c
index 73c9e79..97e8619 100644
--- a/drivers/watchdog/indydog.c
+++ b/drivers/watchdog/indydog.c
@@ -108,7 +108,7 @@ static ssize_t indydog_write(struct file *file, const char *data,
return len;
}

-static long indydog_ioctl(struct file *file, unsigned int cmd,
+static int indydog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int options, retval = -EINVAL;
diff --git a/drivers/watchdog/iop_wdt.c b/drivers/watchdog/iop_wdt.c
index 96eb2cb..91070a7 100644
--- a/drivers/watchdog/iop_wdt.c
+++ b/drivers/watchdog/iop_wdt.c
@@ -130,7 +130,7 @@ static const struct watchdog_info ident = {
.identity = "iop watchdog",
};

-static long iop_wdt_ioctl(struct file *file,
+static int iop_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int options;
diff --git a/drivers/watchdog/it8712f_wdt.c b/drivers/watchdog/it8712f_wdt.c
index 2270ee0..a2851a6 100644
--- a/drivers/watchdog/it8712f_wdt.c
+++ b/drivers/watchdog/it8712f_wdt.c
@@ -231,7 +231,7 @@ static ssize_t it8712f_wdt_write(struct file *file, const char __user *data,
return len;
}

-static long it8712f_wdt_ioctl(struct file *file, unsigned int cmd,
+static int it8712f_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/ixp2000_wdt.c b/drivers/watchdog/ixp2000_wdt.c
index 4f4b35a..4e8c501 100644
--- a/drivers/watchdog/ixp2000_wdt.c
+++ b/drivers/watchdog/ixp2000_wdt.c
@@ -105,7 +105,7 @@ static struct watchdog_info ident = {
.identity = "IXP2000 Watchdog",
};

-static long ixp2000_wdt_ioctl(struct file *file, unsigned int cmd,
+static int ixp2000_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/ixp4xx_wdt.c b/drivers/watchdog/ixp4xx_wdt.c
index 8302ef0..0933442 100644
--- a/drivers/watchdog/ixp4xx_wdt.c
+++ b/drivers/watchdog/ixp4xx_wdt.c
@@ -96,7 +96,7 @@ static struct watchdog_info ident = {
};


-static long ixp4xx_wdt_ioctl(struct file *file, unsigned int cmd,
+static int ixp4xx_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/ks8695_wdt.c b/drivers/watchdog/ks8695_wdt.c
index 0b798fd..76b310f 100644
--- a/drivers/watchdog/ks8695_wdt.c
+++ b/drivers/watchdog/ks8695_wdt.c
@@ -152,7 +152,7 @@ static struct watchdog_info ks8695_wdt_info = {
/*
* Handle commands from user-space.
*/
-static long ks8695_wdt_ioctl(struct file *file, unsigned int cmd,
+static int ks8695_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/machzwd.c b/drivers/watchdog/machzwd.c
index 2dfc275..fb840c5 100644
--- a/drivers/watchdog/machzwd.c
+++ b/drivers/watchdog/machzwd.c
@@ -303,7 +303,7 @@ static ssize_t zf_write(struct file *file, const char __user *buf, size_t count,
return count;
}

-static long zf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int zf_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/mixcomwd.c b/drivers/watchdog/mixcomwd.c
index 407b025..cc1d238 100644
--- a/drivers/watchdog/mixcomwd.c
+++ b/drivers/watchdog/mixcomwd.c
@@ -195,7 +195,7 @@ static ssize_t mixcomwd_write(struct file *file, const char __user *data,
return len;
}

-static long mixcomwd_ioctl(struct file *file,
+static int mixcomwd_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/mpc5200_wdt.c b/drivers/watchdog/mpc5200_wdt.c
index db91892..614bc2c 100644
--- a/drivers/watchdog/mpc5200_wdt.c
+++ b/drivers/watchdog/mpc5200_wdt.c
@@ -94,7 +94,7 @@ static struct watchdog_info mpc5200_wdt_info = {
.options = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING,
.identity = "mpc5200 watchdog on GPT0",
};
-static long mpc5200_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mpc5200_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct mpc5200_wdt *wdt = file->private_data;
diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
index 38c588e..243bce7 100644
--- a/drivers/watchdog/mpc8xxx_wdt.c
+++ b/drivers/watchdog/mpc8xxx_wdt.c
@@ -143,7 +143,7 @@ static int mpc8xxx_wdt_release(struct inode *inode, struct file *file)
return 0;
}

-static long mpc8xxx_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mpc8xxx_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/mpcore_wdt.c b/drivers/watchdog/mpcore_wdt.c
index 2a9bfa8..91db86e 100644
--- a/drivers/watchdog/mpcore_wdt.c
+++ b/drivers/watchdog/mpcore_wdt.c
@@ -218,7 +218,7 @@ static struct watchdog_info ident = {
.identity = "MPcore Watchdog",
};

-static long mpcore_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mpcore_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct mpcore_wdt *wdt = file->private_data;
diff --git a/drivers/watchdog/mtx-1_wdt.c b/drivers/watchdog/mtx-1_wdt.c
index b4b7b0a..fd7f85d 100644
--- a/drivers/watchdog/mtx-1_wdt.c
+++ b/drivers/watchdog/mtx-1_wdt.c
@@ -136,7 +136,7 @@ static int mtx1_wdt_release(struct inode *inode, struct file *file)
return 0;
}

-static long mtx1_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mtx1_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/mv64x60_wdt.c b/drivers/watchdog/mv64x60_wdt.c
index acf589d..b3dea2d 100644
--- a/drivers/watchdog/mv64x60_wdt.c
+++ b/drivers/watchdog/mv64x60_wdt.c
@@ -173,7 +173,7 @@ static ssize_t mv64x60_wdt_write(struct file *file, const char __user *data,
return len;
}

-static long mv64x60_wdt_ioctl(struct file *file,
+static int mv64x60_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int timeout;
diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
index 3a11dad..8bbc9bf 100644
--- a/drivers/watchdog/omap_wdt.c
+++ b/drivers/watchdog/omap_wdt.c
@@ -185,7 +185,7 @@ static ssize_t omap_wdt_write(struct file *file, const char __user *data,
return len;
}

-static long omap_wdt_ioctl(struct file *file, unsigned int cmd,
+static int omap_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_margin;
diff --git a/drivers/watchdog/pc87413_wdt.c b/drivers/watchdog/pc87413_wdt.c
index 484c215..9417f9c 100644
--- a/drivers/watchdog/pc87413_wdt.c
+++ b/drivers/watchdog/pc87413_wdt.c
@@ -397,7 +397,7 @@ static ssize_t pc87413_write(struct file *file, const char __user *data,
* querying capabilities and current status.
*/

-static long pc87413_ioctl(struct file *file, unsigned int cmd,
+static int pc87413_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_timeout;
diff --git a/drivers/watchdog/pcwd.c b/drivers/watchdog/pcwd.c
index 9e1331a..0f6f9a6 100644
--- a/drivers/watchdog/pcwd.c
+++ b/drivers/watchdog/pcwd.c
@@ -594,7 +594,7 @@ static int pcwd_get_temperature(int *temperature)
* /dev/watchdog handling
*/

-static long pcwd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int pcwd_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int rv;
int status;
diff --git a/drivers/watchdog/pcwd_pci.c b/drivers/watchdog/pcwd_pci.c
index 90eb1d4..dae0372 100644
--- a/drivers/watchdog/pcwd_pci.c
+++ b/drivers/watchdog/pcwd_pci.c
@@ -453,7 +453,7 @@ static ssize_t pcipcwd_write(struct file *file, const char __user *data,
return len;
}

-static long pcipcwd_ioctl(struct file *file, unsigned int cmd,
+static int pcipcwd_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/pcwd_usb.c b/drivers/watchdog/pcwd_usb.c
index c1685c9..68419ee 100644
--- a/drivers/watchdog/pcwd_usb.c
+++ b/drivers/watchdog/pcwd_usb.c
@@ -368,7 +368,7 @@ static ssize_t usb_pcwd_write(struct file *file, const char __user *data,
return len;
}

-static long usb_pcwd_ioctl(struct file *file, unsigned int cmd,
+static int usb_pcwd_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/pnx4008_wdt.c b/drivers/watchdog/pnx4008_wdt.c
index 0ed8416..44ab680 100644
--- a/drivers/watchdog/pnx4008_wdt.c
+++ b/drivers/watchdog/pnx4008_wdt.c
@@ -173,7 +173,7 @@ static const struct watchdog_info ident = {
.identity = "PNX4008 Watchdog",
};

-static long pnx4008_wdt_ioctl(struct inode *inode, struct file *file,
+static int pnx4008_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/rm9k_wdt.c b/drivers/watchdog/rm9k_wdt.c
index f1ae372..384738c 100644
--- a/drivers/watchdog/rm9k_wdt.c
+++ b/drivers/watchdog/rm9k_wdt.c
@@ -55,7 +55,7 @@ static int wdt_gpi_open(struct inode *, struct file *);
static int wdt_gpi_release(struct inode *, struct file *);
static ssize_t wdt_gpi_write(struct file *, const char __user *, size_t,
loff_t *);
-static long wdt_gpi_ioctl(struct file *, unsigned int, unsigned long);
+static int wdt_gpi_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
static int wdt_gpi_notify(struct notifier_block *, unsigned long, void *);
static const struct resource *wdt_gpi_get_resource(struct platform_device *,
const char *, unsigned int);
@@ -244,7 +244,7 @@ static ssize_t wdt_gpi_write(struct file *f, const char __user *d, size_t s,
return s ? 1 : 0;
}

-static long wdt_gpi_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int wdt_gpi_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
{
long res = -ENOTTY;
const long size = _IOC_SIZE(cmd);
diff --git a/drivers/watchdog/s3c2410_wdt.c b/drivers/watchdog/s3c2410_wdt.c
index 86d4280..28e9488 100644
--- a/drivers/watchdog/s3c2410_wdt.c
+++ b/drivers/watchdog/s3c2410_wdt.c
@@ -272,7 +272,7 @@ static const struct watchdog_info s3c2410_wdt_ident = {
};


-static long s3c2410wdt_ioctl(struct file *file, unsigned int cmd,
+static int s3c2410wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/sa1100_wdt.c b/drivers/watchdog/sa1100_wdt.c
index 31a4843..30d2bda 100644
--- a/drivers/watchdog/sa1100_wdt.c
+++ b/drivers/watchdog/sa1100_wdt.c
@@ -86,7 +86,7 @@ static const struct watchdog_info ident = {
.identity = "SA1100/PXA255 Watchdog",
};

-static long sa1100dog_ioctl(struct file *file, unsigned int cmd,
+static int sa1100dog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/sb_wdog.c b/drivers/watchdog/sb_wdog.c
index 27e526a..55aa97b 100644
--- a/drivers/watchdog/sb_wdog.c
+++ b/drivers/watchdog/sb_wdog.c
@@ -164,7 +164,7 @@ static ssize_t sbwdog_write(struct file *file, const char __user *data,
return len;
}

-static long sbwdog_ioctl(struct file *file, unsigned int cmd,
+static int sbwdog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int ret = -ENOTTY;
diff --git a/drivers/watchdog/sbc60xxwdt.c b/drivers/watchdog/sbc60xxwdt.c
index 3266daa..9507175 100644
--- a/drivers/watchdog/sbc60xxwdt.c
+++ b/drivers/watchdog/sbc60xxwdt.c
@@ -225,7 +225,7 @@ static int fop_close(struct inode *inode, struct file *file)
return 0;
}

-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/sbc7240_wdt.c b/drivers/watchdog/sbc7240_wdt.c
index 67ddeb1..74d648c 100644
--- a/drivers/watchdog/sbc7240_wdt.c
+++ b/drivers/watchdog/sbc7240_wdt.c
@@ -168,7 +168,7 @@ static const struct watchdog_info ident = {
};


-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
case WDIOC_GETSUPPORT:
diff --git a/drivers/watchdog/sbc_epx_c3.c b/drivers/watchdog/sbc_epx_c3.c
index e5e470c..a367d9f 100644
--- a/drivers/watchdog/sbc_epx_c3.c
+++ b/drivers/watchdog/sbc_epx_c3.c
@@ -100,7 +100,7 @@ static ssize_t epx_c3_write(struct file *file, const char __user *data,
return len;
}

-static long epx_c3_ioctl(struct file *file, unsigned int cmd,
+static int epx_c3_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int options, retval = -EINVAL;
diff --git a/drivers/watchdog/sc1200wdt.c b/drivers/watchdog/sc1200wdt.c
index 23da3cc..0f57878 100644
--- a/drivers/watchdog/sc1200wdt.c
+++ b/drivers/watchdog/sc1200wdt.c
@@ -182,7 +182,7 @@ static int sc1200wdt_open(struct inode *inode, struct file *file)
}


-static long sc1200wdt_ioctl(struct file *file, unsigned int cmd,
+static int sc1200wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_timeout;
diff --git a/drivers/watchdog/sc520_wdt.c b/drivers/watchdog/sc520_wdt.c
index a2b6c10..d2851bf 100644
--- a/drivers/watchdog/sc520_wdt.c
+++ b/drivers/watchdog/sc520_wdt.c
@@ -279,7 +279,7 @@ static int fop_close(struct inode *inode, struct file *file)
return 0;
}

-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/scx200_wdt.c b/drivers/watchdog/scx200_wdt.c
index 9e19a10..8203518 100644
--- a/drivers/watchdog/scx200_wdt.c
+++ b/drivers/watchdog/scx200_wdt.c
@@ -155,7 +155,7 @@ static ssize_t scx200_wdt_write(struct file *file, const char __user *data,
return 0;
}

-static long scx200_wdt_ioctl(struct file *file, unsigned int cmd,
+static int scx200_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/shwdt.c b/drivers/watchdog/shwdt.c
index cdc7138..8eebcb3 100644
--- a/drivers/watchdog/shwdt.c
+++ b/drivers/watchdog/shwdt.c
@@ -338,7 +338,7 @@ static int sh_wdt_mmap(struct file *file, struct vm_area_struct *vma)
* Query basic information from the device or ping it, as outlined by the
* watchdog API.
*/
-static long sh_wdt_ioctl(struct file *file, unsigned int cmd,
+static int sh_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_heartbeat;
diff --git a/drivers/watchdog/smsc37b787_wdt.c b/drivers/watchdog/smsc37b787_wdt.c
index 988ff1d..09828f1 100644
--- a/drivers/watchdog/smsc37b787_wdt.c
+++ b/drivers/watchdog/smsc37b787_wdt.c
@@ -423,7 +423,7 @@ static ssize_t wb_smsc_wdt_write(struct file *file, const char __user *data,

/* ioctl => control interface */

-static long wb_smsc_wdt_ioctl(struct file *file,
+static int wb_smsc_wdt_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
int new_timeout;
diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
index c650464..9a2d3fa 100644
--- a/drivers/watchdog/softdog.c
+++ b/drivers/watchdog/softdog.c
@@ -192,7 +192,7 @@ static ssize_t softdog_write(struct file *file, const char __user *data,
return len;
}

-static long softdog_ioctl(struct file *file, unsigned int cmd,
+static int softdog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/txx9wdt.c b/drivers/watchdog/txx9wdt.c
index 6adab77..8184c55 100644
--- a/drivers/watchdog/txx9wdt.c
+++ b/drivers/watchdog/txx9wdt.c
@@ -127,7 +127,7 @@ static ssize_t txx9wdt_write(struct file *file, const char __user *data,
return len;
}

-static long txx9wdt_ioctl(struct file *file, unsigned int cmd,
+static int txx9wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 69396ad..9ec4bed 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -191,7 +191,7 @@ static ssize_t wdt_write(struct file *file, const char __user *buf,
return count;
}

-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/w83697hf_wdt.c b/drivers/watchdog/w83697hf_wdt.c
index 445d30a..b969baa 100644
--- a/drivers/watchdog/w83697hf_wdt.c
+++ b/drivers/watchdog/w83697hf_wdt.c
@@ -229,7 +229,7 @@ static ssize_t wdt_write(struct file *file, const char __user *buf,
return count;
}

-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/w83877f_wdt.c b/drivers/watchdog/w83877f_wdt.c
index 24587d2..36ed0b2 100644
--- a/drivers/watchdog/w83877f_wdt.c
+++ b/drivers/watchdog/w83877f_wdt.c
@@ -242,7 +242,7 @@ static int fop_close(struct inode *inode, struct file *file)
return 0;
}

-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/w83977f_wdt.c b/drivers/watchdog/w83977f_wdt.c
index 2525da5..ab029dd 100644
--- a/drivers/watchdog/w83977f_wdt.c
+++ b/drivers/watchdog/w83977f_wdt.c
@@ -377,7 +377,7 @@ static struct watchdog_info ident = {
.identity = WATCHDOG_NAME,
};

-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int status;
int new_options, retval = -EINVAL;
diff --git a/drivers/watchdog/wafer5823wdt.c b/drivers/watchdog/wafer5823wdt.c
index 68377ae..06e6cae 100644
--- a/drivers/watchdog/wafer5823wdt.c
+++ b/drivers/watchdog/wafer5823wdt.c
@@ -121,7 +121,7 @@ static ssize_t wafwdt_write(struct file *file, const char __user *buf,
return count;
}

-static long wafwdt_ioctl(struct file *file, unsigned int cmd,
+static int wafwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_timeout;
diff --git a/drivers/watchdog/wdrtas.c b/drivers/watchdog/wdrtas.c
index 5d3b1a8..823ed73 100644
--- a/drivers/watchdog/wdrtas.c
+++ b/drivers/watchdog/wdrtas.c
@@ -305,7 +305,7 @@ out:
* wdrtas_ioctl implements the watchdog API ioctls
*/

-static long wdrtas_ioctl(struct file *file, unsigned int cmd,
+static int wdrtas_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/wdt.c b/drivers/watchdog/wdt.c
index deeebb2..cfbba80 100644
--- a/drivers/watchdog/wdt.c
+++ b/drivers/watchdog/wdt.c
@@ -349,7 +349,7 @@ static ssize_t wdt_write(struct file *file, const char __user *buf,
* querying capabilities and current status.
*/

-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = (void __user *)arg;
int __user *p = argp;
diff --git a/drivers/watchdog/wdt285.c b/drivers/watchdog/wdt285.c
index db362c3..e799311 100644
--- a/drivers/watchdog/wdt285.c
+++ b/drivers/watchdog/wdt285.c
@@ -132,7 +132,7 @@ static const struct watchdog_info ident = {
.identity = "Footbridge Watchdog",
};

-static long watchdog_ioctl(struct file *file, unsigned int cmd,
+static int watchdog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
unsigned int new_margin;
diff --git a/drivers/watchdog/wdt977.c b/drivers/watchdog/wdt977.c
index 60e28d4..348674f 100644
--- a/drivers/watchdog/wdt977.c
+++ b/drivers/watchdog/wdt977.c
@@ -351,7 +351,7 @@ static const struct watchdog_info ident = {
* according to their available features.
*/

-static long wdt977_ioctl(struct file *file, unsigned int cmd,
+static int wdt977_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int status;
diff --git a/drivers/watchdog/wdt_pci.c b/drivers/watchdog/wdt_pci.c
index ed02bdb..2d6a3e5 100644
--- a/drivers/watchdog/wdt_pci.c
+++ b/drivers/watchdog/wdt_pci.c
@@ -403,7 +403,7 @@ static ssize_t wdtpci_write(struct file *file, const char __user *buf,
* querying capabilities and current status.
*/

-static long wdtpci_ioctl(struct file *file, unsigned int cmd,
+static int wdtpci_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
int new_heartbeat;
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index 5f1538c..acb7af1 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -61,13 +61,13 @@ static int bad_file_ioctl (struct inode *inode, struct file *filp,
return -EIO;
}

-static long bad_file_unlocked_ioctl(struct file *file, unsigned cmd,
+static int bad_file_unlocked_ioctl(struct inode *inode, struct file *file, unsigned cmd,
unsigned long arg)
{
return -EIO;
}

-static long bad_file_compat_ioctl(struct file *file, unsigned int cmd,
+static int bad_file_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
return -EIO;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aff5421..d1384f0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1150,8 +1150,13 @@ static int blkdev_close(struct inode * inode, struct file * filp)
return blkdev_put(bdev);
}

-static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+static int block_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
{
+ /*
+ * NOTE! We ignore the on-disk inode that was passed as
+ * an argument, and use the "f_mapping->host" inode for
+ * all block ioctls!
+ */
return blkdev_ioctl(file->f_mapping->host, file, cmd, arg);
}

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 135c965..f46a281 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -95,7 +95,7 @@ extern int cifs_setxattr(struct dentry *, const char *, const void *,
size_t, int);
extern ssize_t cifs_getxattr(struct dentry *, const char *, void *, size_t);
extern ssize_t cifs_listxattr(struct dentry *, char *, size_t);
-extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+extern int cifs_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg);

#ifdef CONFIG_CIFS_EXPERIMENTAL
extern const struct export_operations cifs_export_ops;
diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 0088a5b..c6b9fa4 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -30,9 +30,8 @@

#define CIFS_IOC_CHECKUMOUNT _IO(0xCF, 2)

-long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
+int cifs_ioctl(struct inode *inode, struct file *filep, unsigned int command, unsigned long arg)
{
- struct inode *inode = filep->f_dentry->d_inode;
int rc = -ENOTTY; /* strange error - but the precedent */
int xid;
struct cifs_sb_info *cifs_sb;
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 5235c67..d3a3093 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -2804,7 +2804,8 @@ asmlinkage long compat_sys_ioctl(unsigned int fd, unsigned int cmd,

default:
if (filp->f_op && filp->f_op->compat_ioctl) {
- error = filp->f_op->compat_ioctl(filp, cmd, arg);
+ struct inode *inode = filp->f_dentry->d_inode;
+ error = filp->f_op->compat_ioctl(inode, filp, cmd, arg);
if (error != -ENOIOCTLCMD)
goto out_fput;
}
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 47d88da..6924f85 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -138,8 +138,8 @@ int __ext2_write_begin(struct file *file, struct address_space *mapping,
struct page **pagep, void **fsdata);

/* ioctl.c */
-extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
-extern long ext2_compat_ioctl(struct file *, unsigned int, unsigned long);
+extern int ext2_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
+extern int ext2_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);

/* namei.c */
struct dentry *ext2_get_parent(struct dentry *child);
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index de876fa..ba84585 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -18,9 +18,8 @@
#include <asm/uaccess.h>


-long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int ext2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = filp->f_dentry->d_inode;
struct ext2_inode_info *ei = EXT2_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
@@ -156,7 +155,7 @@ setflags_out:
}

#ifdef CONFIG_COMPAT
-long ext2_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ext2_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
/* These are just misnamed, they actually get/put from/to user an int */
switch (cmd) {
@@ -175,6 +174,6 @@ long ext2_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
default:
return -ENOIOCTLCMD;
}
- return ext2_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+ return ext2_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
}
#endif
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 0d0c701..7cf4617 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -294,9 +294,8 @@ group_add_out:
}

#ifdef CONFIG_COMPAT
-long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ext3_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
int ret;

/* These are just misnamed, they actually get/put from/to user an int */
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2950032..4bee000 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1079,8 +1079,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);

/* ioctl.c */
-extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
-extern long ext4_compat_ioctl (struct file *, unsigned int, unsigned long);
+extern int ext4_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
+extern int ext4_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);

/* migrate.c */
extern int ext4_ext_migrate(struct inode *, struct file *, unsigned int,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 7a6c2f1..f72db70 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -18,9 +18,8 @@
#include "ext4_jbd2.h"
#include "ext4.h"

-long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int ext4_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = filp->f_dentry->d_inode;
struct ext4_inode_info *ei = EXT4_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
@@ -275,7 +274,7 @@ setversion_out:
}

#ifdef CONFIG_COMPAT
-long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ext4_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
/* These are just misnamed, they actually get/put from/to user an int */
switch (cmd) {
@@ -316,6 +315,6 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
default:
return -ENOIOCTLCMD;
}
- return ext4_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+ return ext4_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
}
#endif
diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index cd4a016..32c94b1 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -796,10 +796,9 @@ static int fat_dir_ioctl(struct inode *inode, struct file *filp,

FAT_IOCTL_FILLDIR_FUNC(fat_compat_ioctl_filldir, compat_dirent)

-static long fat_compat_dir_ioctl(struct file *filp, unsigned cmd,
+static int fat_compat_dir_ioctl(struct inode *inode, struct file *filp, unsigned cmd,
unsigned long arg)
{
- struct inode *inode = filp->f_path.dentry->d_inode;
struct compat_dirent __user *d1 = compat_ptr(arg);
int short_only, both;

diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index e9a366d..b7bf87c 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -289,7 +289,7 @@ static int gfs2_set_flags(struct file *filp, u32 __user *ptr)
return do_gfs2_set_flags(filp, gfsflags, ~GFS2_DIF_JDATA);
}

-static long gfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int gfs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
switch(cmd) {
case FS_IOC_GETFLAGS:
diff --git a/fs/inotify_user.c b/fs/inotify_user.c
index 6024942..9cfde4e 100644
--- a/fs/inotify_user.c
+++ b/fs/inotify_user.c
@@ -533,7 +533,7 @@ static int inotify_release(struct inode *ignored, struct file *file)
return 0;
}

-static long inotify_ioctl(struct file *file, unsigned int cmd,
+static int inotify_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct inotify_device *dev;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 7db32b3..2adb993 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -31,20 +31,20 @@
static long vfs_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg)
{
+ struct inode *inode;
int error = -ENOTTY;

if (!filp->f_op)
goto out;

+ inode = filp->f_path.dentry->d_inode;
if (filp->f_op->unlocked_ioctl) {
- error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
+ error = filp->f_op->unlocked_ioctl(inode, filp, cmd, arg);
if (error == -ENOIOCTLCMD)
error = -EINVAL;
- goto out;
} else if (filp->f_op->ioctl) {
lock_kernel();
- error = filp->f_op->ioctl(filp->f_path.dentry->d_inode,
- filp, cmd, arg);
+ error = filp->f_op->ioctl(inode, filp, cmd, arg);
unlock_kernel();
}

diff --git a/fs/jffs2/ioctl.c b/fs/jffs2/ioctl.c
index 9d41f43..80aa967 100644
--- a/fs/jffs2/ioctl.c
+++ b/fs/jffs2/ioctl.c
@@ -12,7 +12,7 @@
#include <linux/fs.h>
#include "nodelist.h"

-long jffs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int jffs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
/* Later, this will provide for lsattr.jffs2 and chattr.jffs2, which
will include compression support etc. */
diff --git a/fs/jffs2/os-linux.h b/fs/jffs2/os-linux.h
index 5e194a5..7ef2c62 100644
--- a/fs/jffs2/os-linux.h
+++ b/fs/jffs2/os-linux.h
@@ -167,7 +167,7 @@ int jffs2_fsync(struct file *, struct dentry *, int);
int jffs2_do_readpage_unlock (struct inode *inode, struct page *pg);

/* ioctl.c */
-long jffs2_ioctl(struct file *, unsigned int, unsigned long);
+int jffs2_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);

/* symlink.c */
extern const struct inode_operations jffs2_symlink_inode_operations;
diff --git a/fs/jfs/ioctl.c b/fs/jfs/ioctl.c
index afe222b..0fdf047 100644
--- a/fs/jfs/ioctl.c
+++ b/fs/jfs/ioctl.c
@@ -52,9 +52,8 @@ static long jfs_map_ext2(unsigned long flags, int from)
}


-long jfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int jfs_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = filp->f_dentry->d_inode;
struct jfs_inode_info *jfs_inode = JFS_IP(inode);
unsigned int flags;

@@ -129,7 +128,7 @@ setflags_out:
}

#ifdef CONFIG_COMPAT
-long jfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int jfs_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
/* While these ioctl numbers defined with 'long' and have different
* numbers than the 64bit ABI,
@@ -143,6 +142,6 @@ long jfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
cmd = JFS_IOC_SETFLAGS;
break;
}
- return jfs_ioctl(filp, cmd, arg);
+ return jfs_ioctl(inode, filp, cmd, arg);
}
#endif
diff --git a/fs/jfs/jfs_inode.h b/fs/jfs/jfs_inode.h
index adb2faf..a94ca32 100644
--- a/fs/jfs/jfs_inode.h
+++ b/fs/jfs/jfs_inode.h
@@ -22,8 +22,8 @@ struct fid;

extern struct inode *ialloc(struct inode *, umode_t);
extern int jfs_fsync(struct file *, struct dentry *, int);
-extern long jfs_ioctl(struct file *, unsigned int, unsigned long);
-extern long jfs_compat_ioctl(struct file *, unsigned int, unsigned long);
+extern int jfs_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
+extern int jfs_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
extern struct inode *jfs_iget(struct super_block *, unsigned long);
extern int jfs_commit_inode(struct inode *, int);
extern int jfs_write_inode(struct inode*, int);
diff --git a/fs/ncpfs/ioctl.c b/fs/ncpfs/ioctl.c
index 3a97c95..75c3c29 100644
--- a/fs/ncpfs/ioctl.c
+++ b/fs/ncpfs/ioctl.c
@@ -874,9 +874,8 @@ int ncp_ioctl(struct inode *inode, struct file *filp,
}

#ifdef CONFIG_COMPAT
-long ncp_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ncp_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
int ret;

lock_kernel();
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index 7b142f0..bf5c6a2 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -109,9 +109,8 @@ bail:
return status;
}

-long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int ocfs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = filp->f_path.dentry->d_inode;
unsigned int flags;
int new_clusters;
int status;
@@ -168,7 +167,7 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}

#ifdef CONFIG_COMPAT
-long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+int ocfs2_compat_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
{
switch (cmd) {
case OCFS2_IOC32_GETFLAGS:
@@ -189,6 +188,6 @@ long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
return -ENOIOCTLCMD;
}

- return ocfs2_ioctl(file, cmd, arg);
+ return ocfs2_ioctl(inode, file, cmd, arg);
}
#endif
diff --git a/fs/ocfs2/ioctl.h b/fs/ocfs2/ioctl.h
index cf9a5ee..0632b05 100644
--- a/fs/ocfs2/ioctl.h
+++ b/fs/ocfs2/ioctl.h
@@ -10,7 +10,7 @@
#ifndef OCFS2_IOCTL_H
#define OCFS2_IOCTL_H

-long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
-long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+int ocfs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);
+int ocfs2_compat_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg);

#endif /* OCFS2_IOCTL_H */
diff --git a/fs/pipe.c b/fs/pipe.c
index fcba654..8765108 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -577,9 +577,8 @@ bad_pipe_w(struct file *filp, const char __user *buf, size_t count,
return -EBADF;
}

-static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int pipe_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = filp->f_path.dentry->d_inode;
struct pipe_inode_info *pipe;
int count, buf, nrbufs;

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 8bb03f0..711bb4f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -239,11 +239,11 @@ static unsigned int proc_reg_poll(struct file *file, struct poll_table_struct *p
return rv;
}

-static long proc_reg_unlocked_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int proc_reg_unlocked_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
long rv = -ENOTTY;
- long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long);
+ int (*unlocked_ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
int (*ioctl)(struct inode *, struct file *, unsigned int, unsigned long);

spin_lock(&pde->pde_unload_lock);
@@ -257,12 +257,12 @@ static long proc_reg_unlocked_ioctl(struct file *file, unsigned int cmd, unsigne
spin_unlock(&pde->pde_unload_lock);

if (unlocked_ioctl) {
- rv = unlocked_ioctl(file, cmd, arg);
+ rv = unlocked_ioctl(inode, file, cmd, arg);
if (rv == -ENOIOCTLCMD)
rv = -EINVAL;
} else if (ioctl) {
lock_kernel();
- rv = ioctl(file->f_path.dentry->d_inode, file, cmd, arg);
+ rv = ioctl(inode, file, cmd, arg);
unlock_kernel();
}

@@ -271,11 +271,11 @@ static long proc_reg_unlocked_ioctl(struct file *file, unsigned int cmd, unsigne
}

#ifdef CONFIG_COMPAT
-static long proc_reg_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int proc_reg_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
long rv = -ENOTTY;
- long (*compat_ioctl)(struct file *, unsigned int, unsigned long);
+ int (*compat_ioctl)(struct inode *, struct file *, unsigned int, unsigned long);

spin_lock(&pde->pde_unload_lock);
if (!pde->proc_fops) {
@@ -287,7 +287,7 @@ static long proc_reg_compat_ioctl(struct file *file, unsigned int cmd, unsigned
spin_unlock(&pde->pde_unload_lock);

if (compat_ioctl)
- rv = compat_ioctl(file, cmd, arg);
+ rv = compat_ioctl(inode, file, cmd, arg);

pde_users_dec(pde);
return rv;
diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c
index 8303320..d85fe0d 100644
--- a/fs/reiserfs/ioctl.c
+++ b/fs/reiserfs/ioctl.c
@@ -115,10 +115,9 @@ setversion_out:
}

#ifdef CONFIG_COMPAT
-long reiserfs_compat_ioctl(struct file *file, unsigned int cmd,
+int reiserfs_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
int ret;

/* These are just misnamed, they actually get/put from/to user an int */
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index 5e82cff..08cf595 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -145,10 +145,9 @@ out_unlock:
return err;
}

-long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ubifs_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
int flags, err;
- struct inode *inode = file->f_path.dentry->d_inode;

switch (cmd) {
case FS_IOC_GETFLAGS:
@@ -187,7 +186,7 @@ long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
}

#ifdef CONFIG_COMPAT
-long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ubifs_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
case FS_IOC32_GETFLAGS:
@@ -199,6 +198,6 @@ long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
default:
return -ENOIOCTLCMD;
}
- return ubifs_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+ return ubifs_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
}
#endif
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index d7f706f..d82737e 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1639,10 +1639,10 @@ int ubifs_recover_size(struct ubifs_info *c);
void ubifs_destroy_size_tree(struct ubifs_info *c);

/* ioctl.c */
-long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int ubifs_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
void ubifs_set_inode_flags(struct inode *inode);
#ifdef CONFIG_COMPAT
-long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int ubifs_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
#endif

/* compressor.c */
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 5311c1a..a4c1d10 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -376,14 +376,14 @@ xfs_file_mmap(
return 0;
}

-STATIC long
+STATIC int
xfs_file_ioctl(
+ struct inode *inode,
struct file *filp,
unsigned int cmd,
unsigned long p)
{
int error;
- struct inode *inode = filp->f_path.dentry->d_inode;

error = xfs_ioctl(XFS_I(inode), filp, 0, cmd, (void __user *)p);
xfs_iflags_set(XFS_I(inode), XFS_IMODIFIED);
@@ -397,14 +397,14 @@ xfs_file_ioctl(
return error;
}

-STATIC long
+STATIC int
xfs_file_ioctl_invis(
+ struct inode *inode,
struct file *filp,
unsigned int cmd,
unsigned long p)
{
int error;
- struct inode *inode = filp->f_path.dentry->d_inode;

error = xfs_ioctl(XFS_I(inode), filp, IO_INVIS, cmd, (void __user *)p);
xfs_iflags_set(XFS_I(inode), XFS_IMODIFIED);
diff --git a/fs/xfs/linux-2.6/xfs_ioctl32.c b/fs/xfs/linux-2.6/xfs_ioctl32.c
index a4b254e..dfe42ab 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl32.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl32.c
@@ -469,8 +469,9 @@ xfs_compat_ioctl(
return error;
}

-long
+int
xfs_file_compat_ioctl(
+ struct inode *inode,
struct file *file,
unsigned cmd,
unsigned long arg)
@@ -478,8 +479,9 @@ xfs_file_compat_ioctl(
return xfs_compat_ioctl(0, file, cmd, arg);
}

-long
+int
xfs_file_compat_invis_ioctl(
+ struct inode *inode,
struct file *file,
unsigned cmd,
unsigned long arg)
diff --git a/fs/xfs/linux-2.6/xfs_ioctl32.h b/fs/xfs/linux-2.6/xfs_ioctl32.h
index 02de6e6..7e64783 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl32.h
+++ b/fs/xfs/linux-2.6/xfs_ioctl32.h
@@ -18,7 +18,7 @@
#ifndef __XFS_IOCTL32_H__
#define __XFS_IOCTL32_H__

-extern long xfs_file_compat_ioctl(struct file *, unsigned, unsigned long);
-extern long xfs_file_compat_invis_ioctl(struct file *, unsigned, unsigned long);
+extern int xfs_file_compat_ioctl(struct inode *inode, struct file *, unsigned, unsigned long);
+extern int xfs_file_compat_invis_ioctl(struct inode *inode, struct file *, unsigned, unsigned long);

#endif /* __XFS_IOCTL32_H__ */
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 80171ee..c30e0ab 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -841,7 +841,7 @@ extern void ext3_set_aops(struct inode *inode);
/* ioctl.c */
extern int ext3_ioctl (struct inode *, struct file *, unsigned int,
unsigned long);
-extern long ext3_compat_ioctl (struct file *, unsigned int, unsigned long);
+extern int ext3_compat_ioctl (struct inode *, struct file *, unsigned int, unsigned long);

/* namei.c */
extern int ext3_orphan_add(handle_t *, struct inode *);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 580b513..9bcfbcd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1211,8 +1211,8 @@ struct block_device_operations {
int (*open) (struct inode *, struct file *);
int (*release) (struct inode *, struct file *);
int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
- long (*unlocked_ioctl) (struct file *, unsigned, unsigned long);
- long (*compat_ioctl) (struct file *, unsigned, unsigned long);
+ int (*unlocked_ioctl) (struct inode *, struct file *, unsigned, unsigned long);
+ int (*compat_ioctl) (struct inode *, struct file *, unsigned, unsigned long);
int (*direct_access) (struct block_device *, sector_t,
void **, unsigned long *);
int (*media_changed) (struct gendisk *);
@@ -1242,8 +1242,8 @@ struct file_operations {
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
- long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
- long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+ int (*unlocked_ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+ int (*compat_ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
@@ -1656,7 +1656,7 @@ extern int blkdev_ioctl(struct inode *, struct file *, unsigned, unsigned long);
extern int blkdev_driver_ioctl(struct inode *inode, struct file *file,
struct gendisk *disk, unsigned cmd,
unsigned long arg);
-extern long compat_blkdev_ioctl(struct file *, unsigned, unsigned long);
+extern int compat_blkdev_ioctl(struct inode *inode, struct file *, unsigned, unsigned long);
extern int blkdev_get(struct block_device *, mode_t, unsigned);
extern int blkdev_put(struct block_device *);
extern int bd_claim(struct block_device *, void *);
diff --git a/include/linux/ncp_fs.h b/include/linux/ncp_fs.h
index 9f2d763..af7d026 100644
--- a/include/linux/ncp_fs.h
+++ b/include/linux/ncp_fs.h
@@ -211,7 +211,7 @@ void ncp_date_unix2dos(int unix_date, __le16 * time, __le16 * date);

/* linux/fs/ncpfs/ioctl.c */
int ncp_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
-long ncp_compat_ioctl(struct file *, unsigned int, unsigned long);
+int ncp_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);

/* linux/fs/ncpfs/sock.c */
int ncp_request2(struct ncp_server *server, int function,
diff --git a/include/linux/reiserfs_fs.h b/include/linux/reiserfs_fs.h
index e9963af..3422037 100644
--- a/include/linux/reiserfs_fs.h
+++ b/include/linux/reiserfs_fs.h
@@ -2174,7 +2174,7 @@ __u32 r5_hash(const signed char *msg, int len);
/* prototypes from ioctl.c */
int reiserfs_ioctl(struct inode *inode, struct file *filp,
unsigned int cmd, unsigned long arg);
-long reiserfs_compat_ioctl(struct file *filp,
+int reiserfs_compat_ioctl(struct inode *inode, struct file *filp,
unsigned int cmd, unsigned long arg);
int reiserfs_unpack(struct inode *inode, struct file *filp);

diff --git a/include/linux/tty.h b/include/linux/tty.h
index 0cbec74..bdb65a2 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -365,7 +365,7 @@ extern const struct file_operations tty_ldiscs_proc_fops;
extern void tty_wakeup(struct tty_struct *tty);
extern void tty_ldisc_flush(struct tty_struct *tty);

-extern long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+extern int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
extern int tty_mode_ioctl(struct tty_struct *tty, struct file *file,
unsigned int cmd, unsigned long arg);
extern int tty_perform_flush(struct tty_struct *tty, unsigned long arg);
diff --git a/include/linux/wanrouter.h b/include/linux/wanrouter.h
index e0aa396..82d2547 100644
--- a/include/linux/wanrouter.h
+++ b/include/linux/wanrouter.h
@@ -522,7 +522,7 @@ extern int wanrouter_proc_init(void);
extern void wanrouter_proc_cleanup(void);
extern int wanrouter_proc_add(struct wan_device *wandev);
extern int wanrouter_proc_delete(struct wan_device *wandev);
-extern long wanrouter_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+extern int wanrouter_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);

/* Public Data */
/* list of registered devices */
diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h
index dc64046..9ab9474 100644
--- a/include/media/v4l2-ioctl.h
+++ b/include/media/v4l2-ioctl.h
@@ -286,7 +286,7 @@ int v4l_compat_translate_ioctl(struct inode *inode, struct file *file,
#endif

/* 32 Bits compatibility layer for 64 bits processors */
-extern long v4l_compat_ioctl32(struct file *file, unsigned int cmd,
+extern int v4l_compat_ioctl32(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg);

extern int video_ioctl2(struct inode *inode, struct file *file,
diff --git a/kernel/power/user.c b/kernel/power/user.c
index a6332a3..6f8b19d 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -187,7 +187,7 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
return res;
}

-static long snapshot_ioctl(struct file *filp, unsigned int cmd,
+static int snapshot_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
{
int error = 0;
diff --git a/net/irda/irnet/irnet_ppp.c b/net/irda/irnet/irnet_ppp.c
index 6d8ae03..ae45f37 100644
--- a/net/irda/irnet/irnet_ppp.c
+++ b/net/irda/irnet/irnet_ppp.c
@@ -631,8 +631,9 @@ dev_irnet_poll(struct file * file,
* This is the way pppd configure us and control us while the PPP
* instance is active.
*/
-static long
+static int
dev_irnet_ioctl(
+ struct inode * inode,
struct file * file,
unsigned int cmd,
unsigned long arg)
diff --git a/net/irda/irnet/irnet_ppp.h b/net/irda/irnet/irnet_ppp.h
index d9f8bd4..44bd8ec 100644
--- a/net/irda/irnet/irnet_ppp.h
+++ b/net/irda/irnet/irnet_ppp.h
@@ -76,8 +76,9 @@ static ssize_t
static unsigned int
dev_irnet_poll(struct file *,
poll_table *);
-static long
- dev_irnet_ioctl(struct file *,
+static int
+ dev_irnet_ioctl(struct inode *,
+ struct file *,
unsigned int,
unsigned long);
/* ------------------------ PPP INTERFACE ------------------------ */
diff --git a/net/socket.c b/net/socket.c
index 8ef8ba8..5d6824b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -107,9 +107,9 @@ static int sock_mmap(struct file *file, struct vm_area_struct *vma);
static int sock_close(struct inode *inode, struct file *file);
static unsigned int sock_poll(struct file *file,
struct poll_table_struct *wait);
-static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int sock_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
#ifdef CONFIG_COMPAT
-static long compat_sock_ioctl(struct file *file,
+static int compat_sock_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg);
#endif
static int sock_fasync(int fd, struct file *filp, int on);
@@ -850,7 +850,7 @@ EXPORT_SYMBOL(dlci_ioctl_set);
* what to do with it - that's up to the protocol still.
*/

-static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+static int sock_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
{
struct socket *sock;
struct sock *sk;
@@ -2316,7 +2316,7 @@ void socket_seq_show(struct seq_file *seq)
#endif /* CONFIG_PROC_FS */

#ifdef CONFIG_COMPAT
-static long compat_sock_ioctl(struct file *file, unsigned cmd,
+static int compat_sock_ioctl(struct inode *inode, struct file *file, unsigned cmd,
unsigned long arg)
{
struct socket *sock = file->private_data;
diff --git a/net/wanrouter/wanmain.c b/net/wanrouter/wanmain.c
index 7f07152..2974428 100644
--- a/net/wanrouter/wanmain.c
+++ b/net/wanrouter/wanmain.c
@@ -349,9 +349,8 @@ __be16 wanrouter_type_trans(struct sk_buff *skb, struct net_device *dev)
* o execute requested action or pass command to the device driver
*/

-long wanrouter_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int wanrouter_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
int err = 0;
struct proc_dir_entry *dent;
struct wan_device *wandev;
diff --git a/sound/core/control.c b/sound/core/control.c
index 281b2e2..f10a3f0 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -1149,7 +1149,7 @@ static int snd_ctl_tlv_ioctl(struct snd_ctl_file *file,
return err;
}

-static long snd_ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_ctl_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_ctl_file *ctl;
struct snd_card *card;
diff --git a/sound/core/control_compat.c b/sound/core/control_compat.c
index 6101259..0af5c5b 100644
--- a/sound/core/control_compat.c
+++ b/sound/core/control_compat.c
@@ -390,7 +390,7 @@ enum {
SNDRV_CTL_IOCTL_ELEM_REPLACE32 = _IOWR('U', 0x18, struct snd_ctl_elem_info32),
};

-static inline long snd_ctl_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static inline int snd_ctl_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_ctl_file *ctl;
struct snd_kctl_ioctl *p;
@@ -412,7 +412,7 @@ static inline long snd_ctl_ioctl_compat(struct file *file, unsigned int cmd, uns
case SNDRV_CTL_IOCTL_TLV_READ:
case SNDRV_CTL_IOCTL_TLV_WRITE:
case SNDRV_CTL_IOCTL_TLV_COMMAND:
- return snd_ctl_ioctl(file, cmd, (unsigned long)argp);
+ return snd_ctl_ioctl(inode, file, cmd, (unsigned long)argp);
case SNDRV_CTL_IOCTL_ELEM_LIST32:
return snd_ctl_elem_list_compat(ctl->card, argp);
case SNDRV_CTL_IOCTL_ELEM_INFO32:
diff --git a/sound/core/hwdep.c b/sound/core/hwdep.c
index 6d6589f..7518eaa 100644
--- a/sound/core/hwdep.c
+++ b/sound/core/hwdep.c
@@ -231,7 +231,7 @@ static int snd_hwdep_dsp_load(struct snd_hwdep *hw,
return 0;
}

-static long snd_hwdep_ioctl(struct file * file, unsigned int cmd,
+static int snd_hwdep_ioctl(struct inode *inode, struct file * file, unsigned int cmd,
unsigned long arg)
{
struct snd_hwdep *hw = file->private_data;
diff --git a/sound/core/hwdep_compat.c b/sound/core/hwdep_compat.c
index 3827c0c..3c7cc2a 100644
--- a/sound/core/hwdep_compat.c
+++ b/sound/core/hwdep_compat.c
@@ -59,7 +59,7 @@ enum {
SNDRV_HWDEP_IOCTL_DSP_LOAD32 = _IOW('H', 0x03, struct snd_hwdep_dsp_image32)
};

-static long snd_hwdep_ioctl_compat(struct file * file, unsigned int cmd,
+static int snd_hwdep_ioctl_compat(struct inode *inode, struct file * file, unsigned int cmd,
unsigned long arg)
{
struct snd_hwdep *hw = file->private_data;
@@ -68,7 +68,7 @@ static long snd_hwdep_ioctl_compat(struct file * file, unsigned int cmd,
case SNDRV_HWDEP_IOCTL_PVERSION:
case SNDRV_HWDEP_IOCTL_INFO:
case SNDRV_HWDEP_IOCTL_DSP_STATUS:
- return snd_hwdep_ioctl(file, cmd, (unsigned long)argp);
+ return snd_hwdep_ioctl(inode, file, cmd, (unsigned long)argp);
case SNDRV_HWDEP_IOCTL_DSP_LOAD32:
return snd_hwdep_dsp_load_compat(hw, argp);
}
diff --git a/sound/core/info.c b/sound/core/info.c
index c67773a..5f8e1e9 100644
--- a/sound/core/info.c
+++ b/sound/core/info.c
@@ -465,7 +465,7 @@ static unsigned int snd_info_entry_poll(struct file *file, poll_table * wait)
return mask;
}

-static long snd_info_entry_ioctl(struct file *file, unsigned int cmd,
+static int snd_info_entry_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct snd_info_private_data *data;
diff --git a/sound/core/init.c b/sound/core/init.c
index df46bbc..82323a2 100644
--- a/sound/core/init.c
+++ b/sound/core/init.c
@@ -275,7 +275,7 @@ static unsigned int snd_disconnect_poll(struct file * file, poll_table * wait)
return POLLERR | POLLNVAL;
}

-static long snd_disconnect_ioctl(struct file *file,
+static int snd_disconnect_ioctl(struct inode *inode, struct file *file,
unsigned int cmd, unsigned long arg)
{
return -ENODEV;
diff --git a/sound/core/oss/mixer_oss.c b/sound/core/oss/mixer_oss.c
index 581aa2c..273f177 100644
--- a/sound/core/oss/mixer_oss.c
+++ b/sound/core/oss/mixer_oss.c
@@ -359,7 +359,7 @@ static int snd_mixer_oss_ioctl1(struct snd_mixer_oss_file *fmixer, unsigned int
return -ENXIO;
}

-static long snd_mixer_oss_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_mixer_oss_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
return snd_mixer_oss_ioctl1((struct snd_mixer_oss_file *) file->private_data, cmd, arg);
}
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index 4c601b1..229513c 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -2428,7 +2428,7 @@ static int snd_pcm_oss_release(struct inode *inode, struct file *file)
return 0;
}

-static long snd_pcm_oss_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_pcm_oss_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_pcm_oss_file *pcm_oss_file;
int __user *p = (int __user *)arg;
diff --git a/sound/core/pcm_compat.c b/sound/core/pcm_compat.c
index 49aa693..f480fda 100644
--- a/sound/core/pcm_compat.c
+++ b/sound/core/pcm_compat.c
@@ -460,7 +460,7 @@ enum {

};

-static long snd_pcm_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_pcm_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_pcm_file *pcm_file;
struct snd_pcm_substream *substream;
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index c49b9d9..4c703aa 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2726,7 +2726,7 @@ static int snd_pcm_capture_ioctl1(struct file *file,
return snd_pcm_common_ioctl1(file, substream, cmd, arg);
}

-static long snd_pcm_playback_ioctl(struct file *file, unsigned int cmd,
+static int snd_pcm_playback_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct snd_pcm_file *pcm_file;
@@ -2740,7 +2740,7 @@ static long snd_pcm_playback_ioctl(struct file *file, unsigned int cmd,
(void __user *)arg);
}

-static long snd_pcm_capture_ioctl(struct file *file, unsigned int cmd,
+static int snd_pcm_capture_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct snd_pcm_file *pcm_file;
diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
index f7ea728..8c103ca 100644
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -687,7 +687,7 @@ static int snd_rawmidi_input_status(struct snd_rawmidi_substream *substream,
return 0;
}

-static long snd_rawmidi_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_rawmidi_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_rawmidi_file *rfile;
void __user *argp = (void __user *)arg;
diff --git a/sound/core/rawmidi_compat.c b/sound/core/rawmidi_compat.c
index 5268c1f..2764275 100644
--- a/sound/core/rawmidi_compat.c
+++ b/sound/core/rawmidi_compat.c
@@ -99,7 +99,7 @@ enum {
SNDRV_RAWMIDI_IOCTL_STATUS32 = _IOWR('W', 0x20, struct snd_rawmidi_status32),
};

-static long snd_rawmidi_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_rawmidi_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_rawmidi_file *rfile;
void __user *argp = compat_ptr(arg);
@@ -110,7 +110,7 @@ static long snd_rawmidi_ioctl_compat(struct file *file, unsigned int cmd, unsign
case SNDRV_RAWMIDI_IOCTL_INFO:
case SNDRV_RAWMIDI_IOCTL_DROP:
case SNDRV_RAWMIDI_IOCTL_DRAIN:
- return snd_rawmidi_ioctl(file, cmd, (unsigned long)argp);
+ return snd_rawmidi_ioctl(inode, file, cmd, (unsigned long)argp);
case SNDRV_RAWMIDI_IOCTL_PARAMS32:
return snd_rawmidi_ioctl_params_compat(rfile, argp);
case SNDRV_RAWMIDI_IOCTL_STATUS32:
diff --git a/sound/core/seq/oss/seq_oss.c b/sound/core/seq/oss/seq_oss.c
index 777796e..b1fd18e 100644
--- a/sound/core/seq/oss/seq_oss.c
+++ b/sound/core/seq/oss/seq_oss.c
@@ -63,7 +63,7 @@ static int odev_open(struct inode *inode, struct file *file);
static int odev_release(struct inode *inode, struct file *file);
static ssize_t odev_read(struct file *file, char __user *buf, size_t count, loff_t *offset);
static ssize_t odev_write(struct file *file, const char __user *buf, size_t count, loff_t *offset);
-static long odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int odev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
static unsigned int odev_poll(struct file *file, poll_table * wait);


@@ -178,8 +178,8 @@ odev_write(struct file *file, const char __user *buf, size_t count, loff_t *offs
return snd_seq_oss_write(dp, buf, count, file);
}

-static long
-odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+odev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct seq_oss_devinfo *dp;
dp = file->private_data;
diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c
index 7a1545d..d9ebb9d 100644
--- a/sound/core/seq/seq_clientmgr.c
+++ b/sound/core/seq/seq_clientmgr.c
@@ -2191,7 +2191,7 @@ static int snd_seq_do_ioctl(struct snd_seq_client *client, unsigned int cmd,
}


-static long snd_seq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_seq_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_seq_client *client = file->private_data;

diff --git a/sound/core/seq/seq_compat.c b/sound/core/seq/seq_compat.c
index 9628c06..f1a7060 100644
--- a/sound/core/seq/seq_compat.c
+++ b/sound/core/seq/seq_compat.c
@@ -87,7 +87,7 @@ enum {
SNDRV_SEQ_IOCTL_QUERY_NEXT_PORT32 = _IOWR('S', 0x52, struct snd_seq_port_info32),
};

-static long snd_seq_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_seq_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
struct snd_seq_client *client = file->private_data;
void __user *argp = compat_ptr(arg);
diff --git a/sound/core/timer.c b/sound/core/timer.c
index 0af337e..f505d69 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1756,7 +1756,7 @@ enum {
SNDRV_TIMER_IOCTL_PAUSE_OLD = _IO('T', 0x23),
};

-static long snd_timer_user_ioctl(struct file *file, unsigned int cmd,
+static int snd_timer_user_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
unsigned long arg)
{
struct snd_timer_user *tu;
diff --git a/sound/core/timer_compat.c b/sound/core/timer_compat.c
index 5512f53..2dc4785 100644
--- a/sound/core/timer_compat.c
+++ b/sound/core/timer_compat.c
@@ -93,7 +93,7 @@ enum {
SNDRV_TIMER_IOCTL_STATUS32 = _IOW('T', 0x14, struct snd_timer_status32),
};

-static long snd_timer_user_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_timer_user_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
void __user *argp = compat_ptr(arg);

@@ -114,7 +114,7 @@ static long snd_timer_user_ioctl_compat(struct file *file, unsigned int cmd, uns
case SNDRV_TIMER_IOCTL_PAUSE:
case SNDRV_TIMER_IOCTL_PAUSE_OLD:
case SNDRV_TIMER_IOCTL_NEXT_DEVICE:
- return snd_timer_user_ioctl(file, cmd, (unsigned long)argp);
+ return snd_timer_user_ioctl(inode, file, cmd, (unsigned long)argp);
case SNDRV_TIMER_IOCTL_INFO32:
return snd_timer_user_info_compat(file, argp);
case SNDRV_TIMER_IOCTL_STATUS32:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7dd9b0b..51368d7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,7 +66,7 @@ static __read_mostly struct preempt_ops kvm_preempt_ops;

struct dentry *kvm_debugfs_dir;

-static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
+static int kvm_vcpu_ioctl(struct inode *inode, struct file *file, unsigned int ioctl,
unsigned long arg);

bool kvm_rebooting;
@@ -1112,7 +1112,7 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
return 0;
}

-static long kvm_vcpu_ioctl(struct file *filp,
+static int kvm_vcpu_ioctl(struct inode *inode, struct file *filp,
unsigned int ioctl, unsigned long arg)
{
struct kvm_vcpu *vcpu = filp->private_data;
@@ -1295,7 +1295,7 @@ out:
return r;
}

-static long kvm_vm_ioctl(struct file *filp,
+static int kvm_vm_ioctl(struct inode *inode, struct file *filp,
unsigned int ioctl, unsigned long arg)
{
struct kvm *kvm = filp->private_data;
@@ -1415,7 +1415,7 @@ static int kvm_dev_ioctl_create_vm(void)
return fd;
}

-static long kvm_dev_ioctl(struct file *filp,
+static int kvm_dev_ioctl(struct inode *inode, struct file *filp,
unsigned int ioctl, unsigned long arg)
{
long r = -EINVAL;

2008-08-28 00:46:34

by David Miller

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

From: Paul Mundt <[email protected]>
Date: Thu, 28 Aug 2008 09:32:13 +0900

> On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> > CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> > wanted with an arbitrary limit.
>
> In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
> only performed from do_IRQ(), which is sporadic at best, especially on
> tickless. While it catches some things, it's not a complete solution in
> and of itself.

BTW, on sparc64 we have a stack overflow checker that runs via
the profiling _mcount hook. So every function call we check
if the stack is getting overused.

If so, we jump onto a special static debugging stack and print
the stack overflow message.

And yes it works with IRQ stacks which is all that sparc64 uses
nowadays.

Perhaps this is useful enough to make generic.
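
As an illustration of the kind of check described above, here is a minimal
sketch in plain C. It is not the sparc64 code: the helper names, the
downward-growing stack, and the caller-chosen margin are all assumptions made
for the example. A per-call hook (such as one driven from _mcount) could run a
test like this against the current stack pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes remaining between the current stack pointer
 * and the lowest valid address of a downward-growing stack. Returns 0 if
 * the stack pointer has already run past the bottom.
 */
static uintptr_t demo_stack_bytes_left(uintptr_t sp, uintptr_t stack_lowest)
{
	return sp > stack_lowest ? sp - stack_lowest : 0;
}

/*
 * Hypothetical check: true when fewer than `margin` bytes remain, i.e. the
 * point at which a checker would switch to a debugging stack and print a
 * warning.
 */
static bool demo_stack_overused(uintptr_t sp, uintptr_t stack_lowest,
				uintptr_t margin)
{
	return demo_stack_bytes_left(sp, stack_lowest) < margin;
}
```

Because the test is only a subtraction and a compare, running it on every
function entry is cheap, which is what lets it catch overflows that a check
made only from do_IRQ() would miss.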

2008-08-28 01:03:19

by Paul Mundt

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Wed, Aug 27, 2008 at 05:46:05PM -0700, David Miller wrote:
> From: Paul Mundt <[email protected]>
> Date: Thu, 28 Aug 2008 09:32:13 +0900
>
> > On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> > > CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> > > wanted with an arbitrary limit.
> >
> > In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
> > only performed from do_IRQ(), which is sporadic at best, especially on
> > tickless. While it catches some things, it's not a complete solution in
> > and of itself.
>
> BTW, on sparc64 we have a stack overflow checker that runs via
> the profiling _mcount hook. So every function call we check
> if the stack is getting overused.
>
> If so, we jump onto a special static debugging stack and print
> the stack overflow message.
>
> And yes it works with IRQ stacks which is all that sparc64 uses
> nowadays.
>
> Perhaps this is useful enough to make generic.

Thanks for the pointer, I'll take a look at it!

2008-08-28 01:08:50

by Greg Ungerer

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected


Paul Mundt wrote:
> On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
>> On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
>>> On Wed, 27 Aug 2008, Adrian Bunk wrote:
>>>> When did we get callpaths like nfs+xfs+md+scsi reliably
>>>> working with 4kB stacks on x86-32?
>>> XFS may never have been usable, but the rest, sure.
>>>
>>> And you seem to be making this whole argument an excuse to SUCK, and an
>>> excuse to let gcc crap even more on our stack space.
>>>
>>> Why?
>>>
>>> Why aren't you saying that we should be able to do better? Instead, you
>>> seem to be asking us to do even worse than we do now?
>> My main point is:
>> - getting 4kB stacks working reliably is a hard task
>> - having an eye on gcc increasing the stack usage, and fixing it if
>> required, is relatively easy
>>
>> If we should be able to do better at getting (and keeping) 4kB stacks
>> working, then coping with possible inlining problems caused by gcc
>> should not be a big problem for us.
>>
> Out of the architectures you've mentioned for 4k stacks, they also tend
> to do IRQ stacks, which is something you seem to have overlooked.
>
> In addition to that, debugging the runaway stack users on 4k tends to be
> easier anyways since you end up blowing the stack a lot sooner. On sh
> we've had pretty good luck with it, though most of our users are using
> fairly deterministic workloads and continually profiling the footprint.
> Anything that runs away or uses an insane amount of stack space needs to
> be fixed well before that anyways, so catching it sooner is always
> preferable. I imagine the same case is true for m68knommu (even sans IRQ
> stacks).

Yep, definitely true for m68knommu in my experience. I haven't had
any problems with 4k stacks recently. But yes the workloads do tend
to be constrained - and almost never use any of the more exotic
filesystems or drivers.



> Things might be more sensitive on x86, but it's certainly not something
> that's a huge problem for the various embedded platforms to wire up,
> whether they want to go the IRQ stack route or not.
>
> In any event, lack of support for something on embedded architectures in
> the kernel is more often due to apathy/utter indifference on the part of
> the architecture maintainer rather than being indicative of any intrinsic
> difficulty in supporting the thing in question. Most new "features" on the
> lesser maintained architectures tend to end up there either out of peer
> pressure or copying-and-pasting accidents rather than any sort of design.
> ;-)

Indeed :-)

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
Secure Computing Corporation PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com

2008-08-28 13:53:27

by Christoph Hellwig

Subject: Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26

On Wed, Aug 27, 2008 at 01:40:10PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 27 Aug 2008, Peter Osterlund wrote:
> >
> > Why not just revert the offending change and try again during the next
> > merge window, assuming someone has figured out an acceptable way to
> > handle this mess by then?
>
> Well, for 2.6.27 that's what we'll have to do. But there's actually a
> real problem here - the unlocked ioctl's (which we _should_ prefer) have a
> strictly weaker and worse interface. I also wonder if any other
> block_ioctl users were converted..

Actually both interfaces are a fscking disaster. The right thing to
pass is neither an inode nor a file but a struct block_device. Al had
all this work done a while ago and it just needs rebasing to a current tree:

http://git.kernel.org/?p=linux/kernel/git/viro/bdev.git;a=summary

2008-08-28 16:17:42

by Adrian Bunk

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Thu, Aug 28, 2008 at 09:32:13AM +0900, Paul Mundt wrote:
> On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> > On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote:
> > > On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> > > In addition to that, debugging the runaway stack users on 4k tends to be
> > > easier anyways since you end up blowing the stack a lot sooner. On sh
> > > we've had pretty good luck with it, though most of our users are using
> > > fairly deterministic workloads and continually profiling the footprint.
> > > Anything that runs away or uses an insane amount of stack space needs to
> > > be fixed well before that anyways, so catching it sooner is always
> > > preferable. I imagine the same case is true for m68knommu (even sans IRQ
> > > stacks).
> >
> > CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> > wanted with an arbitrary limit.
> >
> In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
> only performed from do_IRQ(), which is sporadic at best, especially on
> tickless. While it catches some things, it's not a complete solution in
> and of itself.
>
> In addition to this, there are even fewer platforms that support it than
> there are platforms that do 4k stacks. At first glance, it looks like
> it's only m32r, powerpc, sh, x86, and xtensa.
>...

As far as I can see the only architectures that optionally offer 4kB
stacks today are m68knommu, s390, sh and x86.

Did I miss some architectures or is 5 < 4 ;) ?

> Others support the Kconfig
> option, but don't seem to realize that it's not an option that the kernel
> does anything with by itself, and so don't actually do anything (ie,
> FRV).

Unless I miss anything these "others" include only FRV.

> > IMHO there seems to currently be a mismatch between its maintenance
> > cost and the actual number of users. That's in my opinion the main
> > problem with it, no matter in which direction it gets resolved.
> >
> Perhaps that's true on x86, but in general I take issue with that. On sh
> we've had to do very little maintenance for it and most shipping products
> are using it today (at least on MMU-Linux, we don't bother with it on
> nommu). Most of the problems we ran in to with 4k stacks tended to be
> stuff that we wanted to fix for 8k anyways. I suspect that this case is
> true for the other embedded platforms also.
>...

Most stack issues are not platform or architecture specific.

The maintenance effort therefore mostly depends on whether a non-zero
number of architectures uses 4kB stacks.

And if something is considered to be important for small embedded
systems, but not supported on ARM, MIPS or PowerPC, then that's
a bit strange.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-08-29 12:42:28

by Jes Sorensen

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Change smp_call_function_mask() to take a pointer to the cpumask_t
rather than passing it by value. This avoids recursive copies of the
cpumask_t on the stack in the IPI call. For large NR_CPUS, this is
particularly bad, and the cost of doing this for
NR_CPUS < bits_per_long is negligible.

Signed-off-by: Jes Sorensen <[email protected]>

---
arch/alpha/include/asm/smp.h | 2 +-
arch/alpha/kernel/smp.c | 4 ++--
arch/arm/include/asm/smp.h | 2 +-
arch/arm/kernel/smp.c | 4 ++--
arch/ia64/include/asm/smp.h | 2 +-
arch/ia64/kernel/smp.c | 6 +++---
arch/m32r/kernel/smp.c | 4 ++--
arch/mips/kernel/smp.c | 4 ++--
arch/parisc/kernel/smp.c | 6 +++---
arch/powerpc/include/asm/smp.h | 2 +-
arch/powerpc/kernel/smp.c | 4 ++--
arch/sh/include/asm/smp.h | 2 +-
arch/sh/kernel/smp.c | 4 ++--
arch/sparc/include/asm/smp_64.h | 2 +-
arch/sparc64/kernel/smp.c | 4 ++--
include/asm-m32r/smp.h | 2 +-
include/asm-mips/smp.h | 2 +-
include/asm-parisc/smp.h | 2 +-
include/asm-x86/smp.h | 4 ++--
include/linux/smp.h | 2 +-
kernel/smp.c | 15 ++++++++-------
virt/kvm/kvm_main.c | 4 ++--
22 files changed, 42 insertions(+), 41 deletions(-)

Index: linux-2.6.git/arch/alpha/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/alpha/include/asm/smp.h
+++ linux-2.6.git/arch/alpha/include/asm/smp.h
@@ -48,7 +48,7 @@
#define cpu_possible_map cpu_present_map

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#else /* CONFIG_SMP */

Index: linux-2.6.git/arch/alpha/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/alpha/kernel/smp.c
+++ linux-2.6.git/arch/alpha/kernel/smp.c
@@ -637,9 +637,9 @@
send_ipi_message(to_whom, IPI_CPU_STOP);
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
- send_ipi_message(mask, IPI_CALL_FUNC);
+ send_ipi_message(*mask, IPI_CALL_FUNC);
}

void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/arm/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/arm/include/asm/smp.h
+++ linux-2.6.git/arch/arm/include/asm/smp.h
@@ -102,7 +102,7 @@
extern void platform_cpu_enable(unsigned int cpu);

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

/*
* Local timer interrupt handling function (can be IPI'ed).
Index: linux-2.6.git/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/arm/kernel/smp.c
+++ linux-2.6.git/arch/arm/kernel/smp.c
@@ -356,9 +356,9 @@
local_irq_restore(flags);
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
- send_ipi_message(mask, IPI_CALL_FUNC);
+ send_ipi_message(*mask, IPI_CALL_FUNC);
}

void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/ia64/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/ia64/include/asm/smp.h
+++ linux-2.6.git/arch/ia64/include/asm/smp.h
@@ -127,7 +127,7 @@
extern int is_multithreading_enabled(void);

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#else /* CONFIG_SMP */

Index: linux-2.6.git/arch/ia64/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/ia64/kernel/smp.c
+++ linux-2.6.git/arch/ia64/kernel/smp.c
@@ -166,11 +166,11 @@
* Called with preemption disabled.
*/
static inline void
-send_IPI_mask(cpumask_t mask, int op)
+send_IPI_mask(cpumask_t *mask, int op)
{
unsigned int cpu;

- for_each_cpu_mask(cpu, mask) {
+ for_each_cpu_mask(cpu, *mask) {
send_IPI_single(cpu, op);
}
}
@@ -316,7 +316,7 @@
send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE);
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
send_IPI_mask(mask, IPI_CALL_FUNC);
}
Index: linux-2.6.git/arch/m32r/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/m32r/kernel/smp.c
+++ linux-2.6.git/arch/m32r/kernel/smp.c
@@ -546,9 +546,9 @@
for ( ; ; );
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
- send_IPI_mask(mask, CALL_FUNCTION_IPI, 0);
+ send_IPI_mask(*mask, CALL_FUNCTION_IPI, 0);
}

void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/mips/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/mips/kernel/smp.c
+++ linux-2.6.git/arch/mips/kernel/smp.c
@@ -131,9 +131,9 @@
cpu_idle();
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
- mp_ops->send_ipi_mask(mask, SMP_CALL_FUNCTION);
+ mp_ops->send_ipi_mask(*mask, SMP_CALL_FUNCTION);
}

/*
Index: linux-2.6.git/arch/parisc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/parisc/kernel/smp.c
+++ linux-2.6.git/arch/parisc/kernel/smp.c
@@ -228,11 +228,11 @@
}

static void
-send_IPI_mask(cpumask_t mask, enum ipi_message_type op)
+send_IPI_mask(cpumask_t *mask, enum ipi_message_type op)
{
int cpu;

- for_each_cpu_mask(cpu, mask)
+ for_each_cpu_mask(cpu, *mask)
ipi_send(cpu, op);
}

@@ -274,7 +274,7 @@
send_IPI_allbutself(IPI_NOP);
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
send_IPI_mask(mask, IPI_CALL_FUNC);
}
Index: linux-2.6.git/arch/powerpc/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/powerpc/include/asm/smp.h
+++ linux-2.6.git/arch/powerpc/include/asm/smp.h
@@ -119,7 +119,7 @@
extern struct smp_ops_t *smp_ops;

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#endif /* __ASSEMBLY__ */

Index: linux-2.6.git/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.git/arch/powerpc/kernel/smp.c
@@ -135,11 +135,11 @@
smp_ops->message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
unsigned int cpu;

- for_each_cpu_mask(cpu, mask)
+ for_each_cpu_mask(cpu, *mask)
smp_ops->message_pass(cpu, PPC_MSG_CALL_FUNCTION);
}

Index: linux-2.6.git/arch/sh/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/sh/include/asm/smp.h
+++ linux-2.6.git/arch/sh/include/asm/smp.h
@@ -39,7 +39,7 @@
int plat_register_ipi_handler(unsigned int message,
void (*handler)(void *), void *arg);
extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#else

Index: linux-2.6.git/arch/sh/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/sh/kernel/smp.c
+++ linux-2.6.git/arch/sh/kernel/smp.c
@@ -171,11 +171,11 @@
smp_call_function(stop_this_cpu, 0, 0);
}

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
int cpu;

- for_each_cpu_mask(cpu, mask)
+ for_each_cpu_mask(cpu, *mask)
plat_send_ipi(cpu, SMP_MSG_FUNCTION);
}

Index: linux-2.6.git/arch/sparc/include/asm/smp_64.h
===================================================================
--- linux-2.6.git.orig/arch/sparc/include/asm/smp_64.h
+++ linux-2.6.git/arch/sparc/include/asm/smp_64.h
@@ -35,7 +35,7 @@
extern int sparc64_multi_core;

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

/*
* General functions that each host system must provide.
Index: linux-2.6.git/arch/sparc64/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/sparc64/kernel/smp.c
+++ linux-2.6.git/arch/sparc64/kernel/smp.c
@@ -810,9 +810,9 @@

extern unsigned long xcall_call_function;

-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
{
- xcall_deliver((u64) &xcall_call_function, 0, 0, &mask);
+ xcall_deliver((u64) &xcall_call_function, 0, 0, mask);
}

extern unsigned long xcall_call_function_single;
Index: linux-2.6.git/include/asm-m32r/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-m32r/smp.h
+++ linux-2.6.git/include/asm-m32r/smp.h
@@ -90,7 +90,7 @@
extern unsigned long send_IPI_mask_phys(cpumask_t, int, int);

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#endif /* not __ASSEMBLY__ */

Index: linux-2.6.git/include/asm-mips/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-mips/smp.h
+++ linux-2.6.git/include/asm-mips/smp.h
@@ -58,6 +58,6 @@
extern asmlinkage void smp_call_function_interrupt(void);

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#endif /* __ASM_SMP_H */
Index: linux-2.6.git/include/asm-parisc/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-parisc/smp.h
+++ linux-2.6.git/include/asm-parisc/smp.h
@@ -31,7 +31,7 @@
extern void smp_send_all_nop(void);

extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);

#endif /* !ASSEMBLY */

Index: linux-2.6.git/include/asm-x86/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/smp.h
+++ linux-2.6.git/include/asm-x86/smp.h
@@ -101,9 +101,9 @@
smp_ops.send_call_func_single_ipi(cpu);
}

-static inline void arch_send_call_function_ipi(cpumask_t mask)
+static inline void arch_send_call_function_ipi(cpumask_t *mask)
{
- smp_ops.send_call_func_ipi(mask);
+ smp_ops.send_call_func_ipi(*mask);
}

void native_smp_prepare_boot_cpu(void);
Index: linux-2.6.git/include/linux/smp.h
===================================================================
--- linux-2.6.git.orig/include/linux/smp.h
+++ linux-2.6.git/include/linux/smp.h
@@ -62,7 +62,7 @@
* Call a function on all other processors
*/
int smp_call_function(void(*func)(void *info), void *info, int wait);
-int smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
+int smp_call_function_mask(cpumask_t *mask, void(*func)(void *info), void *info,
int wait);
int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
int wait);
Index: linux-2.6.git/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/kernel/smp.c
+++ linux-2.6.git/kernel/smp.c
@@ -318,7 +318,7 @@
* hardware interrupt handler or from a bottom half handler. Preemption
* must be disabled when calling this function.
*/
-int smp_call_function_mask(cpumask_t mask, void (*func)(void *), void *info,
+int smp_call_function_mask(cpumask_t *mask, void (*func)(void *), void *info,
int wait)
{
struct call_function_data d;
@@ -334,8 +334,8 @@
cpu = smp_processor_id();
allbutself = cpu_online_map;
cpu_clear(cpu, allbutself);
- cpus_and(mask, mask, allbutself);
- num_cpus = cpus_weight(mask);
+ cpus_and(*mask, *mask, allbutself);
+ num_cpus = cpus_weight(*mask);

/*
* If zero CPUs, return. If just a single CPU, turn this request
@@ -344,7 +344,7 @@
if (!num_cpus)
return 0;
else if (num_cpus == 1) {
- cpu = first_cpu(mask);
+ cpu = first_cpu(*mask);
return smp_call_function_single(cpu, func, info, wait);
}

@@ -364,7 +364,7 @@
data->csd.func = func;
data->csd.info = info;
data->refs = num_cpus;
- data->cpumask = mask;
+ data->cpumask = *mask;

spin_lock_irqsave(&call_function_lock, flags);
list_add_tail_rcu(&data->csd.list, &call_function_queue);
@@ -377,7 +377,7 @@
if (wait) {
csd_flag_wait(&data->csd);
if (unlikely(slowpath))
- smp_call_function_mask_quiesce_stack(mask);
+ smp_call_function_mask_quiesce_stack(*mask);
}

return 0;
@@ -402,9 +402,10 @@
int smp_call_function(void (*func)(void *), void *info, int wait)
{
int ret;
+ cpumask_t tmp_online_map = cpu_online_map;

preempt_disable();
- ret = smp_call_function_mask(cpu_online_map, func, info, wait);
+ ret = smp_call_function_mask(&tmp_online_map, func, info, wait);
preempt_enable();
return ret;
}
Index: linux-2.6.git/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.git.orig/virt/kvm/kvm_main.c
+++ linux-2.6.git/virt/kvm/kvm_main.c
@@ -124,7 +124,7 @@
if (cpus_empty(cpus))
goto out;
++kvm->stat.remote_tlb_flush;
- smp_call_function_mask(cpus, ack_flush, NULL, 1);
+ smp_call_function_mask(&cpus, ack_flush, NULL, 1);
out:
put_cpu();
}
@@ -149,7 +149,7 @@
}
if (cpus_empty(cpus))
goto out;
- smp_call_function_mask(cpus, ack_flush, NULL, 1);
+ smp_call_function_mask(&cpus, ack_flush, NULL, 1);
out:
put_cpu();
}


Attachments:
0040-smp-call-cpumask.patch (13.94 kB)
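
One consequence of the pointer form is worth spelling out: in the patch,
cpus_and(*mask, *mask, allbutself) now writes through to the caller's mask,
which is presumably why smp_call_function() copies cpu_online_map into
tmp_online_map before the call. The following is a minimal user-space sketch
of that difference. The one-word mask type and the demo_* names are
hypothetical and are not kernel API.

```c
#include <stdint.h>

/* Hypothetical one-word stand-in for cpumask_t (up to 64 CPUs). */
typedef struct {
	uint64_t bits;
} demo_mask_t;

/* Count the set bits, playing the role of cpus_weight(). */
static int demo_weight(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* By value: the masking is applied to a private copy of the mask. */
static int demo_call_by_value(demo_mask_t mask, uint64_t online, int self)
{
	mask.bits &= online & ~(UINT64_C(1) << self);
	return demo_weight(mask.bits);
}

/* By pointer: the same masking now modifies the caller's object. */
static int demo_call_by_pointer(demo_mask_t *mask, uint64_t online, int self)
{
	mask->bits &= online & ~(UINT64_C(1) << self);
	return demo_weight(mask->bits);
}
```

Both variants return the same count, but after the pointer variant the
caller's mask has lost the bit for the calling CPU and any offline CPUs, so
callers that reuse their mask afterwards would need a copy of their own.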

2008-08-29 16:15:30

by Linus Torvalds

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected



On Fri, 29 Aug 2008, Jes Sorensen wrote:
>
> I have only tested this on ia64, but it boots, so it's obviously
> perfect<tm> :-)

Well, it probably boots because it doesn't really seem to _change_ much of
anything.

Things like this:

-static inline void arch_send_call_function_ipi(cpumask_t mask)
+static inline void arch_send_call_function_ipi(cpumask_t *mask)
{
- smp_ops.send_call_func_ipi(mask);
+ smp_ops.send_call_func_ipi(*mask);
}

will still do that stack allocation at the time of the call. You'd have to
pass the thing all the way down as a pointer..

Linus
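
The point above can be reproduced outside the kernel. The sketch below is only
an illustration: the demo_* names and the 4096-CPU mask size are assumptions
for the example, not kernel code. Each leaf reports whether the mask it
received is the caller's own object, so a result of 0 means a fresh copy was
made for the call.

```c
#define DEMO_NR_CPUS		4096
#define DEMO_BITS_PER_LONG	(8 * sizeof(unsigned long))

/* Hypothetical stand-in for cpumask_t: 512 bytes with 4096 CPUs. */
typedef struct {
	unsigned long bits[DEMO_NR_CPUS / DEMO_BITS_PER_LONG];
} demo_cpumask_t;

/* Leaf taking the mask by value: `mask` is a distinct object, i.e. a copy. */
static int demo_leaf_by_value(demo_cpumask_t mask,
			      const demo_cpumask_t *callers)
{
	return &mask == callers;	/* always 0: separate storage */
}

/* Leaf taking a pointer: no copy, it sees the caller's object. */
static int demo_leaf_by_pointer(const demo_cpumask_t *mask,
				const demo_cpumask_t *callers)
{
	return mask == callers;
}

/* Pointer in, but dereferenced for the call: the copy still happens here. */
static int demo_wrapper_deref(const demo_cpumask_t *mask)
{
	return demo_leaf_by_value(*mask, mask);
}

/* Pointer passed all the way down: no copy anywhere on the path. */
static int demo_wrapper_passthrough(const demo_cpumask_t *mask)
{
	return demo_leaf_by_pointer(mask, mask);
}
```

demo_wrapper_deref() has the same shape as the x86
arch_send_call_function_ipi() hunk quoted above: its own signature takes a
pointer, yet the full mask is still copied when the by-value callee is
invoked. Only the passthrough form avoids the copy.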

2008-08-29 20:05:16

by David Miller

Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

From: Linus Torvalds <[email protected]>
Date: Fri, 29 Aug 2008 09:14:44 -0700 (PDT)

> Well, it probably boots because it doesn't really seem to _change_ much of
> anything.
>
> Things like this:
>
> -static inline void arch_send_call_function_ipi(cpumask_t mask)
> +static inline void arch_send_call_function_ipi(cpumask_t *mask)
> {
> - smp_ops.send_call_func_ipi(mask);
> + smp_ops.send_call_func_ipi(*mask);
> }
>
> will still do that stack allocation at the time of the call. You'd have to
> pass the thing all the way down as a pointer..

True, but we have to get there one step at a time.

BTW, sparc64 already wants a pointer here, so it's completely ready for
this:

void arch_send_call_function_ipi(cpumask_t mask)
{
xcall_deliver((u64) &xcall_call_function, 0, 0, &mask);
}