2008-08-24 16:41:28

by Avi Kivity

Subject: oops due to smp_call_function_single changes

My 2s x 2c Intel server (Xeon 5150) won't boot anymore. I bisected this to

commit cc7a486cac78f6fc1a24e8cd63036bae8d2ab431
Author: Nick Piggin <[email protected]>
Date: Mon Aug 11 13:49:30 2008 +1000

generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()

* Venki Pallipadi <[email protected]> wrote:

> Found a OOPS on a big SMP box during an overnight reboot test with
> upstream git.
>
> Suresh and I looked at the oops and looks like the root cause is in
> generic_smp_call_function_interrupt() and smp_call_function_mask() with
> wait parameter.
>
[...]
Nice debugging work.

I'd suggest something like the attached (boot tested) patch as the simple
fix for now.

I expect the benefits from the less synchronized, multiple-in-flight-data
global queue will still outweigh the costs of dynamic allocations. But
if worst comes to worst then we just go back to a globally synchronous
one-at-a-time implementation, but that would be pretty sad!

Signed-off-by: Ingo Molnar <[email protected]>


Reverting this commit (and cc7a486cac78f6fc1a24e8cd63036bae8d2ab431,
which is an add-on fix) allows my guest to boot.

My .config can be found at
http://userweb.kernel.org/~avi/scf-oops/config. I have an oops
somewhere inside a mobile phone but have yet to find a way to dig it
out. Netconsole doesn't work for me when built in, for some reason, and
the oops happens during boot (I think during the loading of the ahci modules).

--
error compiling committee.c: too many arguments to function