2011-02-01 08:28:05

by Jan Beulich

Subject: calling smp_call_function_many() with non-stable CPU mask

There are a couple of examples of smp_call_function_many() getting
called with mm_cpumask() as the first argument. Since that mask
generally can change while smp_call_function_many() is executing,
it seems there might be a problem with the case where that mask
becomes empty after the initial checks, but before the mask is made
permanent (by copying into data->cpumask).

Shouldn't there be a check of data->refs being zero right after
setting it (to avoid having csd_lock_wait() wait for a remote CPU
to clear the lock flag, and to avoid adding the entry to
call_function.queue)?

If that isn't considered necessary, is it then incorrect to pass
in-flight CPU masks to smp_call_function_many() (and should
this requirement then be documented somewhere, and the
existing calls all be inspected for correctness)?
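
To make the window concrete, here is a minimal userspace sketch of the
check being asked about. It is not the kernel code: struct
call_data_sketch, mask_has_targets() and snapshot_and_queue() are
hypothetical names, a 64-bit word stands in for a cpumask, and a
boolean stands in for adding the entry to call_function.queue. It
only assumes that refs is derived from the private copy of the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the call data; not the kernel's structure. */
struct call_data_sketch {
    uint64_t cpumask; /* private copy of the caller's mask */
    int refs;         /* number of remote CPUs expected to run the function */
    bool queued;      /* stands in for being on call_function.queue */
};

/* Count the bits set in a mask. */
static int mask_weight(uint64_t m)
{
    int n = 0;

    while (m) {
        m &= m - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Initial check: does the live mask name any CPU other than this one? */
static bool mask_has_targets(uint64_t live, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < 64);
    return (live & ~(1ULL << this_cpu)) != 0;
}

/*
 * Copy the live mask, derive refs from the copy, and queue the request
 * only if the copy still names a remote CPU. Returns false when the
 * mask emptied between the initial check and the copy, so the caller
 * knows there is nothing to wait for.
 */
static bool snapshot_and_queue(struct call_data_sketch *data,
                               uint64_t live, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < 64);

    data->queued = false;
    data->cpumask = live & ~(1ULL << this_cpu);
    data->refs = mask_weight(data->cpumask);

    /* The proposed check: nobody left to call, so do not queue or wait. */
    if (data->refs == 0)
        return false;

    data->queued = true;
    return true;
}
```

Without the refs == 0 test the entry would be queued with no CPU left
to drop the last reference, which is the case where the wait for the
lock flag could never be satisfied.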

Thanks, Jan


2011-02-01 08:44:43

by Milton Miller

Subject: Re: calling smp_call_function_many() with non-stable CPU mask

> There are a couple of examples of smp_call_function_many() getting
> called with mm_cpumask() as the first argument. Since that mask
> generally can change while smp_call_function_many() is executing,
> it seems there might be a problem with the case where that mask
> becomes empty after the initial checks, but before the mask is made
> permanent (by copying into data->cpumask).
>
> Shouldn't there be a check of data->refs being zero right after
> setting it (to avoid having csd_lock_wait() wait for a remote CPU
> to clear the lock flag, and to avoid adding the entry to
> call_function.queue)?
>
> If that isn't considered necessary, is it then incorrect to pass
> in-flight CPU masks to smp_call_function_many() (and should
> this requirement then be documented somewhere, and the
> existing calls all be inspected for correctness)?
>

Mike Galbraith just brought this up, and I supplied a patch, and even
a rediff on top of other changes:

http://lkml.indiana.edu/hypermail/linux/kernel/1101.3/02813.html
http://lkml.indiana.edu/hypermail/linux/kernel/1101.3/03172.html

http://lkml.indiana.edu/hypermail/linux/kernel/1102.0/00017.html


This doesn't address https://bugzilla.kernel.org/show_bug.cgi?id=23042
which is x86 not expecting the mask to be cleared while it's thinking
about the mask.

milton

2011-02-01 09:00:31

by Mike Galbraith

Subject: Re: calling smp_call_function_many() with non-stable CPU mask

On Tue, 2011-02-01 at 09:27 +0100, Jan Beulich wrote:
> There are a couple of examples of smp_call_function_many() getting
> called with mm_cpumask() as the first argument. Since that mask
> generally can change while smp_call_function_many() is executing,
> it seems there might be a problem with the case where that mask
> becomes empty after the initial checks, but before the mask is made
> permanent (by copying into data->cpumask).
>
> Shouldn't there be a check of data->refs being zero right after
> setting it (to avoid having csd_lock_wait() wait for a remote CPU
> to clear the lock flag, and to avoid adding the entry to
> call_function.queue)?
>
> If that isn't considered necessary, is it then incorrect to pass
> in-flight CPU masks to smp_call_function_many() (and should
> this requirement then be documented somewhere, and the
> existing calls all be inspected for correctness)?


Freshly baked.

[PATCH 2/3 v2] smp_call_function_many: handle concurrent clearing of mask
http://lkml.org/lkml/2011/2/1/18

-Mike