2005-12-05 11:01:16

by Srivatsa Vaddagiri

Subject: [PATCH] Fix RCU race in access of nohz_cpu_mask

Accessing nohz_cpu_mask before incrementing rcp->cur is racy. It can
cause tickless idle CPUs to be included in rsp->cpumask, which will
extend grace periods unnecessarily.
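
The racy interleaving looks roughly like this (an illustrative timeline,
not code from the patch):

    CPU0 (rcu_start_batch)                  CPU1 (going idle)

    cpus_andnot(rsp->cpumask,
        cpu_online_map, nohz_cpu_mask);
    /* CPU1's bit is not yet set, so
       CPU1 ends up in rsp->cpumask */
                                            cpu_set(1, nohz_cpu_mask);
                                            rcu_pending()? No -- rcp->cur
                                            not incremented yet.
                                            CPU1 goes tickless.
    rcp->cur++; /* new GP starts */

CPU1 is now in rsp->cpumask but tickless: it never reports a quiescent
state, so the grace period is extended until something else wakes it up.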

Patch below (against 2.6.15-rc5-mm1) fixes this race. It has been tested using
extensions to the RCU torture module that force various CPUs to become idle.

Signed-off-by: Srivatsa Vaddagiri <[email protected]>

---

linux-2.6.15-rc5-mm1-root/kernel/rcupdate.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff -puN kernel/rcupdate.c~rcu-lock kernel/rcupdate.c
--- linux-2.6.15-rc5-mm1/kernel/rcupdate.c~rcu-lock 2005-12-05 15:28:33.000000000 +0530
+++ linux-2.6.15-rc5-mm1-root/kernel/rcupdate.c 2005-12-05 15:28:40.000000000 +0530
@@ -244,15 +244,15 @@ static void rcu_start_batch(struct rcu_c

if (rcp->next_pending &&
rcp->completed == rcp->cur) {
- /* Can't change, since spin lock held. */
- cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);
-
rcp->next_pending = 0;
/* next_pending == 0 must be visible in __rcu_process_callbacks()
* before it can see new value of cur.
*/
smp_wmb();
rcp->cur++;
+ smp_mb();
+ cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);
+
}
}


_


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


2005-12-08 00:36:28

by Andrew Morton

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

Srivatsa Vaddagiri <[email protected]> wrote:
>
> + smp_mb();
> + cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);

Please always include a comment when adding a barrier, because it's often
not obvious to the reader what that barrier is actually doing.

2005-12-08 18:16:13

by Oleg Nesterov

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

Srivatsa Vaddagiri wrote:
>
> Accessing nohz_cpu_mask before incrementing rcp->cur is racy. It can
> cause tickless idle CPUs to be included in rsp->cpumask, which will
> extend grace periods unnecessarily.
> ...
> @@ -244,15 +244,15 @@ static void rcu_start_batch(struct rcu_c
>
> if (rcp->next_pending &&
> rcp->completed == rcp->cur) {
> - /* Can't change, since spin lock held. */
> - cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);
> -
> rcp->next_pending = 0;
> /* next_pending == 0 must be visible in __rcu_process_callbacks()
> * before it can see new value of cur.
> */
> smp_wmb();
> rcp->cur++;
> + smp_mb();
> + cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);
> +

Could you explain why this patch makes any difference?

grep shows the only users of nohz_cpu_mask are in arch/s390/kernel/time.c:
start_hz_timer() and stop_hz_timer().

I can't see how this change can prevent idle cpus from being included in
->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
cpu has started a new grace period.

I think the cpu should call cpu_quiet() after adding itself to the nohz
mask to eliminate this race.

No?

Oleg.

2005-12-09 02:45:58

by Srivatsa Vaddagiri

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> I can't see how this change can prevent idle cpus from being included in
> ->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
> cpu has started a new grace period.

Yes, that can happen, but if they check for rcu_pending() right after that,
it will at least prevent them from going tickless (which will prevent grace
periods from being unnecessarily extended). Something like below:


    CPU0                                    CPU1

    rcp->cur++; /* New GP */

    smp_mb();

    rsp->cpumask = 0x3

                                            cpu_set(1, nohz_cpu_mask);

                                            rcu_pending()?
                                            - Yes,
                                              cpu_clear(1, nohz_cpu_mask);
                                              Abort going tickless


Ideally we would have needed an smp_mb() on CPU1 as well, between setting CPU1
in nohz_cpu_mask and checking rcu_pending(), but I guess it is not needed
on s390 because of its strong ordering?

--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-12-09 02:56:29

by Srivatsa Vaddagiri

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> I think the cpu should call cpu_quiet() after adding itself to the nohz
> mask to eliminate this race.

That would require rsp->lock to be taken on every idle CPU that wishes to go
tickless. IMO that may not be a good idea.

--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-12-09 18:02:44

by Oleg Nesterov

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

Srivatsa Vaddagiri wrote:
>
> On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> > I can't see how this change can prevent idle cpus from being included in
> > ->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
> > cpu has started a new grace period.
>
> Yes, that can happen, but if they check for rcu_pending() right after that,
> it will at least prevent them from going tickless (which will prevent grace
> periods from being unnecessarily extended). Something like below:
>
> CPU0 CPU1
>
> rcp->cur++; /* New GP */
>
> smp_mb();

I think I need some education on memory barriers.

Does this mb() guarantee that the new value of ->cur will be visible
on other cpus immediately after smp_mb() (so that rcu_pending() will
notice it)?

My understanding is that it only guarantees that all stores before it
must be visible before any store after the mb. (Yes, mb implies rmb, but
I think it does not matter if CPU1 adds itself to the nohz mask after
CPU0 reads nohz_cpu_mask.) This means that CPU1 can read a stale
value of ->cur. I guess I am wrong, but I can't prove it to myself.

Could you please clarify this?

Even simpler question:

CPU0
var = 1;
wmb();

after that CPU1 does rmb().

Does it guarantee that CPU1 will see the new value of var?

> Ideally we would have needed an smp_mb() on CPU1 as well, between setting CPU1
> in nohz_cpu_mask and checking rcu_pending(), but I guess it is not needed
> on s390 because of its strong ordering?

I don't know, but please note that s390's definition of smp_mb__after_atomic_inc()
is not a 'nop'.

Oleg.

2005-12-10 15:24:21

by Srivatsa Vaddagiri

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

On Fri, Dec 09, 2005 at 10:17:38PM +0300, Oleg Nesterov wrote:
> Srivatsa Vaddagiri wrote:
> >
> > On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> > > I can't see how this change can prevent idle cpus from being included in
> > > ->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
> > > cpu has started a new grace period.
> >
> > Yes, that can happen, but if they check for rcu_pending() right after that,
> > it will at least prevent them from going tickless (which will prevent grace
> > periods from being unnecessarily extended). Something like below:
> >
> > CPU0 CPU1
> >
> > rcp->cur++; /* New GP */
> >
> > smp_mb();
>
> I think I need some education on memory barriers.
>
> Does this mb() guarantee that the new value of ->cur will be visible
> on other cpus immediately after smp_mb() (so that rcu_pending() will
> notice it)?

AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
returns. Here's a definition of smp_mb() from Paul's article:

smp_mb(): "memory barrier" that orders both loads and stores. This means loads
and stores preceding the memory barrier are committed to memory before any
loads and stores following the memory barrier.

[ http://www.linuxjournal.com/article/8211 ]

> My understanding is that it only guarantees that all stores before it
> must be visible before any store after the mb. (Yes, mb implies rmb, but
> I think it does not matter if CPU1 adds itself to the nohz mask after
> CPU0 reads nohz_cpu_mask.) This means that CPU1 can read a stale
> value of ->cur. I guess I am wrong, but I can't prove it to myself.
>
> Could you please clarify this?
>
> Even simpler question:
>
> CPU0
> var = 1;
> wmb();
>
> after that CPU1 does rmb().
>
> Does it guarantee that CPU1 will see the new value of var?

Again, going by the above article, I would expect CPU1 to see the new value of
var.


--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-12-10 17:40:34

by Oleg Nesterov

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

Srivatsa Vaddagiri wrote:
>
> On Fri, Dec 09, 2005 at 10:17:38PM +0300, Oleg Nesterov wrote:
> > > rcp->cur++; /* New GP */
> > >
> > > smp_mb();
> >
> > I think I need some education on memory barriers.
> >
> > Does this mb() guarantee that the new value of ->cur will be visible
> > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > notice it)?
>
> AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
> returns. Here's a definition of smp_mb() from Paul's article:
>
> smp_mb(): "memory barrier" that orders both loads and stores. This means loads
> and stores preceding the memory barrier are committed to memory before any
> loads and stores following the memory barrier.

Thanks, but this definition talks about ordering; it does not say
anything about the time when stores are really committed to memory
(and does it also mean that cache-lines are invalidated on other
cpus?).

> [ http://www.linuxjournal.com/article/8211 ]

And thanks for this link, now I have some understanding about
read_barrier_depends() ...

Oleg.

2005-12-11 17:40:50

by Srivatsa Vaddagiri

Subject: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

[Changed the subject line to be more generic in the interest of wider audience]

We seem to be having some confusion over the exact semantics of smp_mb().

Specifically, are all stores preceding smp_mb() guaranteed to have finished
(committed to memory/corresponding cache-lines on other CPUs invalidated)
*before* successive loads are issued?


From the IA-32 manual (download.intel.com/design/Pentium4/manuals/25366817.pdf,
page 271):

"Memory mapped devices and other I/O devices on the bus are often sensitive to
the order of writes to their I/O buffers. The I/O instructions (the IN and OUT
instructions) can be used to impose strong write ordering on such accesses as
follows. Prior to executing an I/O instruction, the processor waits for all
___________________________
previous instructions in the program to complete and for all buffered
_____________________________________________________________________
writes to drain to memory. Only instruction fetch and page tables walks can
_________________________
pass I/O instructions. Execution of subsequent instructions do not begin until
the processor determines that the I/O instruction has been completed."

Synchronization mechanisms in multiple-processor systems may depend upon a
strong memory-ordering model. Here, a program can use a locking instruction
such as the XCHG instruction or the LOCK prefix to insure that a
read-modify-write operation on memory is carried out atomically. Locking
operations typically operate like I/O operations in that they wait for all
_________________
previous instructions to complete and for all buffered writes to drain to
_________________________________________________________________________
memory (see Section 7.1.2, “Bus Locking”).
______

Program synchronization can also be carried out with serializing instructions
(see Section 7.4). These instructions are typically used at critical procedure
or task boundaries to force completion of all previous instructions before a
jump to a new section of code or a context switch occurs. Like the I/O and
locking instructions, the processor waits until all previous instructions have
________________________________________________________
been completed and all buffered writes have been drained to memory before
_________________________________________________________________________
executing the serializing instruction."
_____________________________________


From this, it looks like we can infer that an IA-32 processor will wait for all
writes to drain to memory (which implies cache invalidation on other CPUs?)
*before* executing the serializing instruction?

The question is: can this be generalized to other CPUs too?


On Sat, Dec 10, 2005 at 09:55:35PM +0300, Oleg Nesterov wrote:
> Srivatsa Vaddagiri wrote:
> >
> > On Fri, Dec 09, 2005 at 10:17:38PM +0300, Oleg Nesterov wrote:
> > > > rcp->cur++; /* New GP */
> > > >
> > > > smp_mb();
> > >
> > > I think I need some education on memory barriers.
> > >
> > > Does this mb() guarantee that the new value of ->cur will be visible
> > > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > > notice it)?
> >
> > AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
> > returns. Here's a definition of smp_mb() from Paul's article:
> >
> > smp_mb(): "memory barrier" that orders both loads and stores. This means loads
> > and stores preceding the memory barrier are committed to memory before any
> > loads and stores following the memory barrier.
>
> Thanks, but this definition talks about ordering; it does not say
> anything about the time when stores are really committed to memory
> (and does it also mean that cache-lines are invalidated on other
> cpus?).
>
> > [ http://www.linuxjournal.com/article/8211 ]
>
> And thanks for this link, now I have some understanding about
> read_barrier_depends() ...
>
> Oleg.

--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-12-11 21:21:37

by Andrew James Wade

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote:
> [Changed the subject line to be more generic in the interest of wider audience]
>
> We seem to be having some confusion over the exact semantics of smp_mb().
>
> Specifically, are all stores preceding smp_mb() guaranteed to have finished
> (committed to memory/corresponding cache-lines on other CPUs invalidated)
> *before* successive loads are issued?

I doubt it. That's definitely not true of smp_wmb(), which boils down to
__asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
how the compiler orders write instructions, but is otherwise a nop; i386
has in-order writes).

And it makes sense that wmb() wouldn't wait for writes: RCU needs
constraints on the order in which writes become visible, but has very weak
constraints on when they do. Waiting for writes to flush would hurt
performance.

Andrew Wade

2005-12-11 23:45:17

by Rusty Russell

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote:
> On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote:
> > [Changed the subject line to be more generic in the interest of wider audience]
> >
> > We seem to be having some confusion over the exact semantics of smp_mb().
> >
> > Specifically, are all stores preceding smp_mb() guaranteed to have finished
> > (committed to memory/corresponding cache-lines on other CPUs invalidated)
> > *before* successive loads are issued?
>
> I doubt it. That's definitely not true of smp_wmb(), which boils down to
> __asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
> how the compiler orders write instructions, but is otherwise a nop; i386
> has in-order writes).
>
> And it makes sense that wmb() wouldn't wait for writes: RCU needs
> constraints on the order in which writes become visible, but has very weak
> constraints on when they do. Waiting for writes to flush would hurt
> performance.

On the contrary. I did some digging and asking and thinking about this
for the Unreliable Guide to Kernel Locking, years ago:

wmb() means all writes preceding will complete before any writes
following are started.
rmb() means all reads preceding will complete before any reads
following are started.
mb() means all reads and writes preceding will complete before any
reads and writes following are started.

This does not map to all the possibilities, nor does it take advantage
of all architectures, but it's generally sufficient without getting
insanely complex.
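
A minimal sketch of the classic pairing these definitions imply (the
variable names here are made up for illustration):

    /* Writer: publish the data, then set the flag. */
    data = 42;
    wmb();          /* the data write completes before the flag write */
    flag = 1;

    /* Reader: test the flag, then read the data. */
    if (flag) {
        rmb();      /* the flag read completes before the data read */
        val = data;
    }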

Hope that clarifies,
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol

2005-12-12 00:49:37

by Keith Owens

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Mon, 12 Dec 2005 10:45:16 +1100,
Rusty Russell <[email protected]> wrote:
>On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote:
>> On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote:
>> > We seem to be having some confusion over the exact semantics of smp_mb().
>> >
>> > Specifically, are all stores preceding smp_mb() guaranteed to have finished
>> > (committed to memory/corresponding cache-lines on other CPUs invalidated)
>> > *before* successive loads are issued?
>>
>> I doubt it. That's definitely not true of smp_wmb(), which boils down to
>> __asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
>> how the compiler orders write instructions, but is otherwise a nop; i386
>> has in-order writes).
>>
>> And it makes sense that wmb() wouldn't wait for writes: RCU needs
>> constraints on the order in which writes become visible, but has very weak
>> constraints on when they do. Waiting for writes to flush would hurt
>> performance.
>
>On the contrary. I did some digging and asking and thinking about this
>for the Unreliable Guide to Kernel Locking, years ago:
>
>wmb() means all writes preceding will complete before any writes
>following are started.
>rmb() means all reads preceding will complete before any reads
>following are started.
>mb() means all reads and writes preceding will complete before any
>reads and writes following are started.

FWIW, wmb() on IA64 does not require that preceding stores are flushed
to main memory. It only requires that they be "made visible to other
processors in the coherence domain". "visible" means that the updated
value must reach (at least) an externally snooped cache. There is no
requirement that the preceding stores be flushed all the way to main
memory, the updates only have to get as far as a cache level that other
cpus can see. The cache snooping takes care of flushing to main memory
when necessary.

IA64 does have a memory fence that stalls the cpu until the data is
"accepted by the external platform". That form is expensive and is
only used for memory mapped I/O, where the data really does have to
reach the memory before the cpu can perform its next operation. For
example, in the mmiowb() case.

2005-12-12 03:10:20

by Paul E. McKenney

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

On Sat, Dec 10, 2005 at 09:55:35PM +0300, Oleg Nesterov wrote:
> Srivatsa Vaddagiri wrote:
> >
> > On Fri, Dec 09, 2005 at 10:17:38PM +0300, Oleg Nesterov wrote:
> > > > rcp->cur++; /* New GP */
> > > >
> > > > smp_mb();
> > >
> > > I think I need some education on memory barriers.
> > >
> > > Does this mb() guarantee that the new value of ->cur will be visible
> > > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > > notice it)?
> >
> > AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
> > returns. Here's a definition of smp_mb() from Paul's article:
> >
> > smp_mb(): "memory barrier" that orders both loads and stores. This means loads
> > and stores preceding the memory barrier are committed to memory before any
> > loads and stores following the memory barrier.
>
> Thanks, but this definition talks about ordering; it does not say
> anything about the time when stores are really committed to memory
> (and does it also mean that cache-lines are invalidated on other
> cpus?)

Think of a pair of CPUs (creatively named "CPU 0" and "CPU 1") with
a cache-coherent interconnect between them. Then:

1. wmb() guarantees that any writes preceding the wmb() will
be seen by the interconnect before any writes following the
wmb(). But this applies -only- to the writes executed by
the CPU doing the wmb().

2. rmb() guarantees that any changes seen by the interconnect
preceding the rmb() will be seen by any reads following the
rmb(). Again, this applies only to reads executed by the
CPU doing the rmb(). However, the changes might be due to
any CPU.

3. mb() combines the guarantees made by rmb() and wmb().

All CPUs but Alpha make stronger guarantees. On those CPUs, you can
replace "interconnect" in #1 above with "all CPUs", and you can replace
"seen by the interconnect" in #2 above with "caused by any CPU".
Rumor has it that more recent Alpha CPUs also have stronger memory
barriers, but I have thus far been unable to confirm this.

On with the rest of the definitions:

4. smp_wmb(), smp_rmb(), and smp_mb() do nothing in UP kernels,
but do the corresponding memory barrier in SMP kernels.

5. read_barrier_depends() is rmb() on Alpha, and a nop on other
CPUs. Non-Alpha CPUs guarantee that if CPU 0 does the
following, where p->a is initially zero and global_pointer->a
is initially 1:

p->a = 2;
smp_wmb();
global_pointer = p;

and if CPU 1 does:

x = global_pointer->a;

then the value of x will be either 1 or 2, never zero. On Alpha,
strange though it may seem (and it did seem strange to me when I
first encountered it!), the value of x could well be zero.
To get the same guarantee on Alpha as on the other CPUs, the
assignment to x must instead be as follows:

tmp = global_pointer;
read_barrier_depends();
x = tmp->a;

or, equivalently:

x = rcu_dereference(global_pointer)->a;

There is an smp_read_barrier_depends() that takes effect only
in an SMP kernel, similar to smp_wmb() and friends.

This example may seem quite strange, but the need for the barrier
follows directly from definition #1, which makes its ordering
guarantee only to the -interconnect-, -not- to the other CPUs.

> > [ http://www.linuxjournal.com/article/8211 ]

Hmm... That one does look familiar. ;-)

> And thanks for this link, now I have some understanding about
> read_barrier_depends() ...

Let's just say it required much patience and perseverence on the part
of the Alpha architects to explain it to me the first time around. ;-)

Thanx, Paul

2005-12-12 04:32:58

by Andrew Morton

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

"Paul E. McKenney" <[email protected]> wrote:
>
> 1. wmb() guarantees that any writes preceding the wmb() will
> be seen by the interconnect before any writes following the
> wmb(). But this applies -only- to the writes executed by
> the CPU doing the wmb().
>
> 2. rmb() guarantees that any changes seen by the interconnect
> preceding the rmb() will be seen by any reads following the
> rmb(). Again, this applies only to reads executed by the
> CPU doing the rmb(). However, the changes might be due to
> any CPU.
>
> 3. mb() combines the guarantees made by rmb() and wmb().

So foo_mb() in preemptible code is potentially buggy.

I guess we assume that a context switch accidentally did enough of the
right types of barriers for things to work OK.

2005-12-12 04:38:29

by David Miller

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

From: Andrew Morton <[email protected]>
Date: Sun, 11 Dec 2005 20:32:26 -0800

> So foo_mb() in preemptible code is potentially buggy.
>
> I guess we assume that a context switch accidentally did enough of the
> right types of barriers for things to work OK.

A trap ought to flush all memory operations.

There are some incredibly difficult memory error handling cases if the
cpu does not synchronize all pending memory operations when a trap
occurs.

Failing that, yes, to be absolutely safe we'd need to have some
explicit memory barrier in the context switch.

2005-12-12 04:49:59

by Paul Mackerras

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

Andrew Morton writes:

> So foo_mb() in preemptible code is potentially buggy.
>
> I guess we assume that a context switch accidentally did enough of the
> right types of barriers for things to work OK.

The context switch code on powerpc includes an mb() for exactly this
reason.

Paul.

2005-12-12 04:48:17

by Nick Piggin

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

David S. Miller wrote:
> From: Andrew Morton <[email protected]>
> Date: Sun, 11 Dec 2005 20:32:26 -0800
>
>
>>So foo_mb() in preemptible code is potentially buggy.
>>
>>I guess we assume that a context switch accidentally did enough of the
>>right types of barriers for things to work OK.
>
>
> A trap ought to flush all memory operations.
>
> There are some incredibly difficult memory error handling cases if the
> cpu does not synchronize all pending memory operations when a trap
> occurs.
>
> Failing that, yes, to be absolutely safe we'd need to have some
> explicit memory barrier in the context switch.

But it isn't that mbs in preemptible code are buggy. If the task is
scheduled off and then back onto the same CPU, the barrier will still
be executed in the expected instruction sequence for that process.

I think the minimum needed is for cpu *migrations* to be memory
barriers. Again, this isn't any particular problem of mb()
instructions - if cpu migrations weren't memory barriers, then
preemptible code isn't even guaranteed ordering with its own memory
operations, which would be quite interesting :)

--
SUSE Labs, Novell Inc.


2005-12-12 06:27:49

by Keith Owens

Subject: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask

On Sun, 11 Dec 2005 20:32:26 -0800,
Andrew Morton <[email protected]> wrote:
>"Paul E. McKenney" <[email protected]> wrote:
>>
>> 1. wmb() guarantees that any writes preceding the wmb() will
>> be seen by the interconnect before any writes following the
>> wmb(). But this applies -only- to the writes executed by
>> the CPU doing the wmb().
>>
>> 2. rmb() guarantees that any changes seen by the interconnect
>> preceding the rmb() will be seen by any reads following the
>> rmb(). Again, this applies only to reads executed by the
>> CPU doing the rmb(). However, the changes might be due to
>> any CPU.
>>
>> 3. mb() combines the guarantees made by rmb() and wmb().
>
>So foo_mb() in preemptible code is potentially buggy.
>
>I guess we assume that a context switch accidentally did enough of the
>right types of barriers for things to work OK.

Not by accident. Any context switch must flush the memory state from
the old cpu's internal buffers, and that flush must get at least as far
as the globally snoopable cache. Otherwise the old cpu could still own
partial memory updates from the process, even though the process was
now running on a new cpu.

2005-12-12 08:40:45

by Srivatsa Vaddagiri

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Mon, Dec 12, 2005 at 11:49:07AM +1100, Keith Owens wrote:
> >On the contrary. I did some digging and asking and thinking about this
> >for the Unreliable Guide to Kernel Locking, years ago:
> >
> >wmb() means all writes preceding will complete before any writes
> >following are started.
> >rmb() means all reads preceding will complete before any reads
> >following are started.
> >mb() means all reads and writes preceding will complete before any
> >reads and writes following are started.
>
> FWIW, wmb() on IA64 does not require that preceding stores are flushed
> to main memory. It only requires that they be "made visible to other
> processors in the coherence domain". "visible" means that the updated
> value must reach (at least) an externally snooped cache. There is no
> requirement that the preceding stores be flushed all the way to main
> memory, the updates only have to get as far as a cache level that other
> cpus can see. The cache snooping takes care of flushing to main memory
> when necessary.

For the context of the problem that we are dealing with, I think the fact
that writes are made "visible" to other CPUs (before smp_mb() finishes and
before other reads are started) is good enough.

Oleg, with all these inputs, I consider the patch I had sent to be correct.
Let me know if you still have some lingering doubts!

P.S.: Thanks to everybody who responded and helped clarify this subject.


--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-12-13 05:08:04

by Andrew James Wade

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Sunday 11 December 2005 18:45, Rusty Russell wrote:
> On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote:
> > On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote:
> > > [Changed the subject line to be more generic in the interest of wider audience]
> > >
> > > We seem to be having some confusion over the exact semantics of smp_mb().
> > >
> > > Specifically, are all stores preceding smp_mb() guaranteed to have finished
> > > (committed to memory/corresponding cache-lines on other CPUs invalidated)
> > > *before* successive loads are issued?
> >
> > I doubt it. That's definitely not true of smp_wmb(), which boils down to
> > __asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
> > how the compiler orders write instructions, but is otherwise a nop; i386
> > has in-order writes).

Hrrm, after doing a bit of reading on cache-coherency, verifying that the
corresponding cache-lines on other CPUs are invalidated can (sometimes)
be done quite quickly, so waiting for that before issuing reads might not
destroy performance. I still doubt that i386es do things this way, but I
don't think it matters (see below).

> >
> > And it makes sense that wmb() wouldn't wait for writes: RCU needs
> > constraints on the order in which writes become visible, but has very weak
> > constraints on when they do. Waiting for writes to flush would hurt
> > performance.
>
> On the contrary. I did some digging and asking and thinking about this
> for the Unreliable Guide to Kernel Locking, years ago:
>
> wmb() means all writes preceding will complete before any writes
> following are started.

What does it mean for a write to start? For that matter, what does it mean
for a write to complete?

I think my focusing on the hardware details (of which I am woefully
ignorant) was a mistake: the hardware can really do whatever it wants so
long as it implements the expected abstract machine model, and it is
ordering that matters in that model. So it makes sense that ordering, not
time, is what the definitions of the memory barriers focus on.

I think that Oleg's question:
> Does this mb() guarantee that the new value of ->cur will be visible
> on other cpus immediately after smp_mb() (so that rcu_pending() will
> notice it)?
is really just about the timeliness with which writes before an smp_mb()
become visible to other CPUs. Does Linux run on architectures in which
writes are not propagated in a timely manner (that is: other CPUs can read
stale values for a "considerable" time)? I rather doubt it.

But with a proviso to my answer: the compiler will happily hoist reads out
of loops. So writes won't necessarily get read in a timely manner.

Andrew

2005-12-13 05:42:45

by Paul E. McKenney

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Tue, Dec 13, 2005 at 12:07:47AM -0500, Andrew James Wade wrote:
> On Sunday 11 December 2005 18:45, Rusty Russell wrote:
> > On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote:
> > > On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote:
> > > > [Changed the subject line to be more generic in the interest of wider audience]
> > > >
> > > > We seem to be having some confusion over the exact semantics of smp_mb().
> > > >
> > > > Specifically, are all stores preceding smp_mb() guaranteed to have finished
> > > > (committed to memory/corresponding cache-lines on other CPUs invalidated)
> > > > *before* successive loads are issued?
> > >
> > > I doubt it. That's definitely not true of smp_wmb(), which boils down to
> > > __asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
> > > how the compiler orders write instructions, but is otherwise a nop; i386
> > > has in-order writes).
>
> Hrrm, after doing a bit of reading on cache-coherency, verifying that the
> corresponding cache-lines on other CPUs are invalidated can (sometimes)
> be done quite quickly, so waiting for that before issuing reads might not
> destroy performance. I still doubt that i386es do things this way, but I
> don't think it matters (see below).

Hardware designers need only be concerned with how things -appear-
to the software. They can and do use optimizations that get the
effect of waiting without actually waiting. For example, an x86
CPU need not -really- keep writes ordered as long as the software
cannot tell that they have become misordered. This may sound
strange, but one important example would be a CPU doing writes that
are to variables that it knows do not appear in any other CPU's
cache. In this sort of case, the CPU could reorder the writes until
such time as it detected a request from another CPU requesting a
cache line containing one of the variables.

> > > And it makes sense that wmb() wouldn't wait for writes: RCU needs
> > > constraints on the order in which writes become visible, but has very weak
> > > constraints on when they do. Waiting for writes to flush would hurt
> > > performance.
> >
> > On the contrary. I did some digging and asking and thinking about this
> > for the Unreliable Guide to Kernel Locking, years ago:
> >
> > wmb() means all writes preceding will complete before any writes
> > following are started.
>
> What does it mean for a write to start? For that matter, what does it mean
> for a write to complete?

Potentially many things in both cases:

o Fetching the store instruction that will do the write.

o Determining that all instructions whose execution logically
precedes the write are free of fatal errors (e.g., page
faults, exceptions, traps, ...).

o Determining the exact value that is to be written (which might
have been computed by prior instructions).

o Determining the exact address that is to be written to (which
also might have been computed by prior instructions).

o Determining that the store instruction itself is free of
fatal errors. Note that it might be necessary to compute
the address to be sure.

o Determining that no hardware interrupt will precede the
completion of the store instruction.

o Storing the value to one of the CPU's write buffers (which
is not yet in the coherence region represented by the CPU's
cache).

o Starting the process of fetching the cache line into which
the value is to be stored.

o Starting the process of invalidating the corresponding cache
line from all the other CPUs' caches.

o Receiving the cache line into which the value is to be stored.

o Storing the value to the CPU's cache, which involves combining
the value to be written with the rest of the cache line.

o Responding to the first request for the corresponding cache
line from some other CPU.

o Completing the process of invalidating the corresponding cache
line from all the other CPUs' caches. Note that Alpha's
weak memory-barrier semantics allow this step to complete
long after the value is stored into the writing CPU's cache.

o Storing the value to physical DRAM.

Most of these steps can execute in parallel or out of order. And a
CPU designer would gleefully point out -lots- of steps I have omitted,
some for the sake of brevity, others out of ignorance.

You asked!!!

> I think my focusing on the hardware details (of which I am woefully
> ignorant) was a mistake: the hardware can really do whatever it wants so
> long as it implements the expected abstract machine model, and it is
> ordering that matters in that model. So it makes sense that ordering, not
> time, is what the definitions of the memory barriers focus on.

Yes, maintaining one's sanity requires taking a very simplified view
of memory ordering, especially since no two CPUs order memory references
in exactly the same way.

> I think that Oleg's question:
> > Does this mb() guarantee that the new value of ->cur will be visible
> > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > notice it)?
> is really just about the timeliness with which writes before a smp_mb()
> become visible to other CPUs. Does Linux run on architectures in which
> writes are not propagated in a timely manner (that is: other CPUs can read
> stale values for a "considerable" time)? I rather doubt it.

Remember that smp_mb() is officially about controlling the order in
which values are communicated between the CPU executing the smp_mb()
and the interconnect, -not- between a pair of CPUs. In the general
case (which includes Alpha), controlling the order in which values
are communicated from one CPU to another requires that -both- CPUs
execute memory barriers.

> But with a proviso to my answer: the compiler will happily hoist reads out
> of loops. So writes won't necessarily get read in a timely manner.

Very good!

And this is in fact why smp_wmb() and friends also contain directives
instructing the compiler not to mess with ordering.
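
For instance, the compiler-only piece is the "memory" clobber quoted
earlier in this thread; the 2.6-era definition of barrier() is just:

    #define barrier() __asm__ __volatile__("": : :"memory")

which keeps the compiler from reordering or caching memory accesses
across it, while emitting no instruction at all.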

Thanx, Paul

2005-12-13 06:48:40

by Andi Kleen

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

Andrew James Wade <[email protected]> writes:
>
> What does it mean for a write to start? For that matter, what does it mean
> for a write to complete?

wmb() or smp_wmb() really makes no guarantees about when a write
completes (e.g. when another CPU or a device sees the new value).
It just guarantees ordering. This means that when you do e.g.

*p = 1;
wmb();
*p = 2;
wmb();

then another CPU will never see 2 before 1 (but possibly it might
miss the "1" state completely). When it actually sees the values
(or whether it keeps using whatever value was in *p before the first
assignment) is undefined.

The definition quoted above is wrong, I think. Rusty, can you
perhaps fix it up?

> I think my focusing on the hardware details (of which I am woefully
> ignorant) was a mistake: the hardware can really do whatever it wants so
> long as it implements the expected abstract machine model, and it is
> ordering that matters in that model. So it makes sense that ordering, not
> time, is what the definitions of the memory barriers focus on.

Exactly.

> > Does this mb() guarantee that the new value of ->cur will be visible
> > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > notice it)?
> is really just about the timeliness with which writes before a smp_mb()
> become visible to other CPUs. Does Linux run on architectures in which
> writes are not propagated in a timely manner (that is: other CPUs can read
> stale values for a "considerable" time)? I rather doubt it.
>
> But with a proviso to my answer: the compiler will happily hoist reads out
> of loops. So writes won't necessarily get read in a timely manner.

Or just consider interrupts. Any CPU can stop for a very long time
at any time as seen by the other CPUs. Or it might take a machine check
and come back. You can never make any assumptions about when another
CPU sees a write unless you add explicit synchronization like a spinlock.

-Andi

2005-12-13 16:20:20

by Paul E. McKenney

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Tue, Dec 13, 2005 at 04:20:13AM -0700, Andi Kleen wrote:
> Andrew James Wade <[email protected]> writes:
> >
> > What does it mean for a write to start? For that matter, what does it mean
> > for a write to complete?
>
> wmb() or smp_wmb() really makes no guarantees about when a write
> completes (e.g. when another CPU or a device sees the new value).
> It just guarantees ordering. This means that when you do e.g.
>
> *p = 1;
> wmb();
> *p = 2;
> wmb();
>
> then another CPU will never see 2 before 1 (but possibly it might
> miss the "1" state completely). When it actually sees the values
> (or whether it keeps using whatever value was in *p before the first
> assignment) is undefined.

It is OK on the write side, but the corresponding barriers are required
on the read side (even on non-Alpha CPUs, since the value of the pointer
remains constant).

    Writer                          Reader

    p = 1;                          p1 = p;
    smp_wmb();                      smp_rmb();
    p = 2;                          p2 = p;

With these memory barriers in place, if p2 is set to 1, then p1 must
also be set to 1. Similarly, if p1 is set to 2, then p2 must also
be set to 2. If the smp_rmb() is not present on the read side, then
the reading CPU (and compiler) are free to interchange the order
of the two assignment statements. As Andi says, memory barriers do
not necessarily make reads or writes happen faster, instead they only
enforce limited ordering constraints.

If the variable p references MMIO rather than normal memory, then
wmb() and rmb() are needed instead of smp_wmb() and smp_rmb().
This is because the I/O device needs to see the accesses ordered
even in a UP system.

Thanx, Paul

> The definition quoted above is wrong I think. Rusty can you
> perhaps fix it up?
>
> > I think my focusing on the hardware details (of which I am woefully
> > ignorant) was a mistake: the hardware can really do whatever it wants so
> > long as it implements the expected abstract machine model, and it is
> > ordering that matters in that model. So it makes sense that ordering, not
> > time, is what the definitions of the memory barriers focus on.
>
> Exactly.
>
> > > Does this mb() guarantee that the new value of ->cur will be visible
> > > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > > notice it)?
> > is really just about the timeliness with which writes before a smp_mb()
> > become visible to other CPUs. Does Linux run on architectures in which
> > writes are not propagated in a timely manner (that is: other CPUs can read
> > stale values for a "considerable" time)? I rather doubt it.
> >
> > But with a proviso to my answer: the compiler will happily hoist reads out
> > of loops. So writes won't necessarily get read in a timely manner.
>
> Or just consider interrupts. Any CPU can stop for a very long time
> at any time as seen by the other CPUs. Or it might take a machine check
> and come back. You can never make any assumptions about when another
> CPU sees a write unless you add explicit synchronization like a spinlock.
>
> -Andi
>

2005-12-13 22:28:07

by Keith Owens

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Tue, 13 Dec 2005 08:20:27 -0800,
"Paul E. McKenney" <[email protected]> wrote:
>If the variable p references MMIO rather than normal memory, then
>wmb() and rmb() are needed instead of smp_wmb() and smp_rmb().

mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O
space compared to normal memory. MIPS also has a non-empty form of
mmiowb().

>This is because the I/O device needs to see the accesses ordered
>even in a UP system.

Why does mmiowb() map to empty on most systems, including Alpha?
Should it not map to wmb() for everything except IA64 and MIPS?

2005-12-13 22:50:35

by Paul E. McKenney

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Wed, Dec 14, 2005 at 09:27:41AM +1100, Keith Owens wrote:
> On Tue, 13 Dec 2005 08:20:27 -0800,
> "Paul E. McKenney" <[email protected]> wrote:
> >If the variable p references MMIO rather than normal memory, then
> >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb().
>
> mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O
> space compared to normal memory. MIPS also has a non-empty form of
> mmiowb().

New one on me!

> >This is because the I/O device needs to see the accesses ordered
> >even in a UP system.
>
> Why does mmiowb() map to empty on most systems, including Alpha?
> Should it not map to wmb() for everything except IA64 and MIPS?

Sounds like I am not the only one that it is new to...

There are over four hundred instances of wmb() still in the drivers tree
in 2.6.14. I suspect that most of them are for MMIO fencing -- the ones
I looked at quickly certainly seem to be.

But, given mmiowb(), what is the point of having wmb()? Why are
write memory barriers needed in UP kernels if not for MMIO and other
hardware-specific accesses? Is your thought that use of wmb() should
be phased out in favor of mmiowb()? (Might be a good idea, but doesn't
look like we are there yet.)

Thanx, Paul

2005-12-14 01:13:01

by Andi Kleen

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Tue, Dec 13, 2005 at 02:50:59PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 14, 2005 at 09:27:41AM +1100, Keith Owens wrote:
> > On Tue, 13 Dec 2005 08:20:27 -0800,
> > "Paul E. McKenney" <[email protected]> wrote:
> > >If the variable p references MMIO rather than normal memory, then
> > >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb().
> >
> > mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O
> > space compared to normal memory. MIPS also has a non-empty form of
> > mmiowb().
>
> New one on me!

Didn't it only make a difference on the Altix or something like that?
I suppose they added it only to the drivers for devices supported by SGI.

-Andi

2005-12-14 01:46:08

by Paul E. McKenney

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Wed, Dec 14, 2005 at 02:12:53AM +0100, Andi Kleen wrote:
> On Tue, Dec 13, 2005 at 02:50:59PM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 14, 2005 at 09:27:41AM +1100, Keith Owens wrote:
> > > On Tue, 13 Dec 2005 08:20:27 -0800,
> > > "Paul E. McKenney" <[email protected]> wrote:
> > > >If the variable p references MMIO rather than normal memory, then
> > > >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb().
> > >
> > > mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O
> > > space compared to normal memory. MIPS also has a non-empty form of
> > > mmiowb().
> >
> > New one on me!
>
> Didn't it make only a difference on the Altix or something like that?
> I suppose they added it only on the drivers for devices supported by SGI.

It could potentially help on a few other CPUs, but quite a few driver
changes would be needed to really bring out the full benefits. I am
concerned that the current state leaves a number of CPU families broken --
the empty definitions cannot be good for other weakly ordered CPUs!

Thanx, Paul

2005-12-15 21:16:35

by Roland Dreier

Subject: Re: Semantics of smp_mb()

Keith> Why does mmiowb() map to empty on most systems, including
Keith> Alpha? Should it not map to wmb() for everything except
Keith> IA64 and MIPS?

I think the intention (as spelled out in Documentation/DocBook/deviceiobook.tmpl)
is that mmiowb() must be used in conjunction with spinlocks or some
other SMP synchronization mechanism. The locks themselves are
sufficient ordering on the archs where mmiowb() is a NOP.

One way of thinking about this is that the usual barrier operations
like wmb() affect the order of a single CPU's operations -- that is,
wmb() is saying that all writes from the current thread of execution
before the wmb() become visible before any of the writes from the
current thread after the wmb(). But wmb() doesn't say anything about
how one CPU's writes are ordered against anything a different CPU does.

mmiowb() is something else -- it controls the visibility of writes
from different CPUs. It says that all writes before the mmiowb()
become visible before any writes from any CPU after the mmiowb().
However, this isn't sensible unless we can order the writes between
CPUs, and that only makes sense if there's a lock that lets us say
that one CPU is executing something after the mmiowb().
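
The canonical usage pattern (roughly as described in
Documentation/DocBook/deviceiobook.tmpl; the device names here are made
up for illustration):

    spin_lock(&dev->lock);
    writel(val, dev->regs + SOME_REG);  /* MMIO write under the lock */
    mmiowb();       /* order the MMIO write before the unlock, so the
                     * next holder of the lock sees its own MMIO writes
                     * reach the device after ours */
    spin_unlock(&dev->lock);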

- R.

2005-12-16 07:46:58

by Jeremy Higdon

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

Roland Dreier got this right. The purpose of the mmiowb is
to ensure that writes to I/O devices while holding a spinlock
are ordered with respect to writes issued after the original
processor releases and a second processor acquires said
spinlock.

A MMIO read would be sufficient, but is much heavier weight.

On the SGI MIPS-based systems, the "sync" instruction was used.
On the Altix systems, a register on the hub chip is read.

From comments by jejb, we're looking at modifying the mmiowb
API by adding an argument which would be a register to read
from if the architecture in question needs ordering in this
way but does not have a lighter weight mechanism like the Altix
mmiowb. Since there will now need to be a width indication,
mmiowb will be replaced with mmiowb[bwlq].

jeremy

2006-03-13 18:39:20

by Paul E. McKenney

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Thu, Dec 15, 2005 at 11:46:26PM -0800, Jeremy Higdon wrote:
> Roland Dreier got this right. The purpose of the mmiowb is
> to ensure that writes to I/O devices while holding a spinlock
> are ordered with respect to writes issued after the original
> processor releases and a second processor acquires said
> spinlock.
>
> A MMIO read would be sufficient, but is much heavier weight.
>
> On the SGI MIPS-based systems, the "sync" instruction was used.
> On the Altix systems, a register on the hub chip is read.
>
> From comments by jejb, we're looking at modifying the mmiowb
> API by adding an argument which would be a register to read
> from if the architecture in question needs ordering in this
> way but does not have a lighter weight mechanism like the Altix
> mmiowb. Since there will now need to be a width indication,
> mmiowb will be replaced with mmiowb[bwlq].

Any progress on this front? I figured that I would wait to update
the ordering document until after this change happened, but if it
is going to be awhile, I should proceed with the current API.

Thoughts?

Thanx, Paul

2006-03-31 04:57:37

by Jeremy Higdon

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Mon, Mar 13, 2006 at 10:39:32AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2005 at 11:46:26PM -0800, Jeremy Higdon wrote:
> > Roland Dreier got this right. The purpose of the mmiowb is
> > to ensure that writes to I/O devices while holding a spinlock
> > are ordered with respect to writes issued after the original
> > processor releases and a second processor acquires said
> > spinlock.
> >
> > A MMIO read would be sufficient, but is much heavier weight.
> >
> > On the SGI MIPS-based systems, the "sync" instruction was used.
> > On the Altix systems, a register on the hub chip is read.
> >
> > From comments by jejb, we're looking at modifying the mmiowb
> > API by adding an argument which would be a register to read
> > from if the architecture in question needs ordering in this
> > way but does not have a lighter weight mechanism like the Altix
> > mmiowb. Since there will now need to be a width indication,
> > mmiowb will be replaced with mmiowb[bwlq].
>
> Any progress on this front? I figured that I would wait to update
> the ordering document until after this change happened, but if it
> is going to be awhile, I should proceed with the current API.
>
> Thoughts?
>
> Thanx, Paul

Brent Casavant was going to be working on this. I'll CC him so that
he can indicate the status.

jeremy

2006-03-31 06:18:01

by Paul E. McKenney

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

On Thu, Mar 30, 2006 at 08:56:27PM -0800, Jeremy Higdon wrote:
> On Mon, Mar 13, 2006 at 10:39:32AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 15, 2005 at 11:46:26PM -0800, Jeremy Higdon wrote:
> > > Roland Dreier got this right. The purpose of the mmiowb is
> > > to ensure that writes to I/O devices while holding a spinlock
> > > are ordered with respect to writes issued after the original
> > > processor releases and a second processor acquires said
> > > spinlock.
> > >
> > > A MMIO read would be sufficient, but is much heavier weight.
> > >
> > > On the SGI MIPS-based systems, the "sync" instruction was used.
> > > On the Altix systems, a register on the hub chip is read.
> > >
> > > From comments by jejb, we're looking at modifying the mmiowb
> > > API by adding an argument which would be a register to read
> > > from if the architecture in question needs ordering in this
> > > way but does not have a lighter weight mechanism like the Altix
> > > mmiowb. Since there will now need to be a width indication,
> > > mmiowb will be replaced with mmiowb[bwlq].
> >
> > Any progress on this front? I figured that I would wait to update
> > the ordering document until after this change happened, but if it
> > is going to be awhile, I should proceed with the current API.
> >
> > Thoughts?
>
> Brent Casavant was going to be working on this. I'll CC him so that
> he can indicate the status.

David Howells is documenting memory barriers in the Documentation
directory as well.

Thanx, Paul

2006-03-31 23:38:46

by Jesse Barnes

Subject: Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]

[Unknown author]
> > > From comments by jejb, we're looking at modifying the mmiowb
> > > API by adding an argument which would be a register to read
> > > from if the architecture in question needs ordering in this
> > > way but does not have a lighter weight mechanism like the Altix
> > > mmiowb. Since there will now need to be a width indication,
> > > mmiowb will be replaced with mmiowb[bwlq].
> >
> > Any progress on this front? I figured that I would wait to update
> > the ordering document until after this change happened, but if it
> > is going to be awhile, I should proceed with the current API.

I avoided doing this initially on the premise that I shouldn't do things
'just because we might need it in the future' since that way seems to
lead to madness. Is there actual hardware that needs an argument to
mmiowb() (i.e. that can't just do a read from the system's single bridge
or something)?

Jesse