2006-11-05 19:35:20

by Oleg Nesterov

Subject: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

When task->array != NULL, try_to_wake_up() just goes to "out_running" and sets
task->state = TASK_RUNNING.

In that case hrtimer_wakeup() does:

timeout->task = NULL; <----- [1]

spin_lock(runqueues->lock);

task->state = TASK_RUNNING; <----- [2]

from Documentation/memory-barriers.txt

Memory operations that occur before a LOCK operation may appear to
happen after it completes.

This means that [2] may be completed before [1], and

CPU_0                                   CPU_1
rt_mutex_slowlock:

    for (;;) {
        ...
        if (timeout && !timeout->task)
            return -ETIMEDOUT;
        ...

        schedule();
                                        hrtimer_wakeup() sets
        ...                             task->state = TASK_RUNNING,
                                        but "timeout->task = NULL"
                                        is not completed
        set_current_state(TASK_INTERRUPTIBLE);
    }

we can miss a timeout.

Of course, this is all scholasticism and can't happen in practice, but
maybe this patch makes sense as a documentation update.

Signed-off-by: Oleg Nesterov <[email protected]>

--- STATS/kernel/hrtimer.c~1_hrtw 2006-10-22 18:24:03.000000000 +0400
+++ STATS/kernel/hrtimer.c 2006-11-05 22:32:36.000000000 +0300
@@ -662,9 +662,12 @@ static int hrtimer_wakeup(struct hrtimer
 		container_of(timer, struct hrtimer_sleeper, timer);
 	struct task_struct *task = t->task;
 
-	t->task = NULL;
-	if (task)
+	if (task) {
+		t->task = NULL;
+		/* must be visible before task->state = TASK_RUNNING */
+		smp_wmb();
 		wake_up_process(task);
+	}
 
 	return HRTIMER_NORESTART;
 }


2006-11-05 22:29:54

by Linus Torvalds

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()



On Sun, 5 Nov 2006, Oleg Nesterov wrote:
>
> When task->array != NULL, try_to_wake_up() just goes to "out_running" and sets
> task->state = TASK_RUNNING.
>
> In that case hrtimer_wakeup() does:
>
> timeout->task = NULL; <----- [1]
>
> spin_lock(runqueues->lock);
>
> task->state = TASK_RUNNING; <----- [2]
>
> from Documentation/memory-barriers.txt
>
> Memory operations that occur before a LOCK operation may appear to
> happen after it completes.
>
> This means that [2] may be completed before [1], and

Yes. On x86 (and x86-64) you'll never see this, because writes are always
seen in order regardless, and in addition, the spin_lock is actually
totally serializing anyway. On most other architectures, the spin_lock
will serialize all the writes too, but it's not guaranteed, so in theory
you're right. I suspect no actual architecture will do this, but hey,
when talking memory ordering, safe is a lot better than sorry.

That said, since "task->state" is only tested _inside_ the runqueue lock,
there is no race that I can see. Since we've gotten the runqueue lock in
order to even check task-state, the processor that _sets_ task state must
not only have done the "spin_lock()", it must also have done the
"spin_unlock()", and _that_ will not allow either the timeout or the task
state to have leaked out from under it (because that would imply that the
critical region leaked out too).

So I don't think the race exists anyway - the schedule() will return
immediately (because it will see TASK_RUNNING), and we'll just retry.

Linus

2006-11-05 22:53:53

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()


On Sun, 5 Nov 2006, Linus Torvalds wrote:

>
> That said, since "task->state" is only tested _inside_ the runqueue lock,
> there is no race that I can see. Since we've gotten the runqueue lock in
> order to even check task-state, the processor that _sets_ task state must
> not only have done the "spin_lock()", it must also have done the
> "spin_unlock()", and _that_ will not allow either the timeout or the task
> state to have leaked out from under it (because that would imply that the
> critical region leaked out too).
>
> So I don't think the race exists anyway - the schedule() will return
> immediately (because it will see TASK_RUNNING), and we'll just retry.
>

This whole situation is very theoretical, but I think this actually can
happen *theoretically*.


OK, the spin_lock doesn't do any serialization, but the unlock does. But
the problem can happen before the unlock. Because of the loop.

CPU 1                                   CPU 2

task_rq_lock()

p->state = TASK_RUNNING;

                                        (from bottom of for loop)
                                        set_current_state(TASK_INTERRUPTIBLE);

                                        for (;;) { (looping)

                                        if (timeout && !timeout->task)

(now CPU implements)
t->task = NULL

task_rq_unlock();

                                        schedule() (with state == TASK_INTERRUPTIBLE)


Again, this is very theoretical, and I don't even think that this can
happen if you tried to make it happen. But I guess if hardware were to
change in the future, while keeping the same barrier rules that we have
today, this could be a race.

-- Steve


2006-11-05 22:54:20

by Oleg Nesterov

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On 11/05, Linus Torvalds wrote:
>
>
> On Sun, 5 Nov 2006, Oleg Nesterov wrote:
> >
> > When task->array != NULL, try_to_wake_up() just goes to "out_running" and sets
> > task->state = TASK_RUNNING.
> >
> > In that case hrtimer_wakeup() does:
> >
> > timeout->task = NULL; <----- [1]
> >
> > spin_lock(runqueues->lock);
> >
> > task->state = TASK_RUNNING; <----- [2]
> >
> > from Documentation/memory-barriers.txt
> >
> > Memory operations that occur before a LOCK operation may appear to
> > happen after it completes.
> >
> > This means that [2] may be completed before [1], and
>
> Yes. On x86 (and x86-64) you'll never see this, because writes are always
> seen in order regardless, and in addition, the spin_lock is actually
> totally serializing anyway. On most other architectures, the spin_lock
> will serialize all the writes too, but it's not guaranteed, so in theory
> you're right. I suspect no actual architecture will do this, but hey,
> when talking memory ordering, safe is a lot better than sorry.
>
> That said, since "task->state" is only tested _inside_ the runqueue lock,
> there is no race that I can see. Since we've gotten the runqueue lock in
> order to even check task-state, the processor that _sets_ task state must
> not only have done the "spin_lock()", it must also have done the
> "spin_unlock()", and _that_ will not allow either the timeout or the task
> state to have leaked out from under it (because that would imply that the
> critical region leaked out too).
>
> So I don't think the race exists anyway - the schedule() will return
> immediately (because it will see TASK_RUNNING), and we'll just retry.

schedule() will see TASK_INTERRUPTIBLE: hrtimer_wakeup() sets TASK_RUNNING,
and rt_mutex_slowlock() does set_current_state(TASK_INTERRUPTIBLE) after that.
schedule() takes the runqueue lock, yes, but we are testing timeout->task
before that.

Oleg.

2006-11-05 23:08:47

by Oleg Nesterov

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On 11/05, Steven Rostedt wrote:
>
> On Sun, 5 Nov 2006, Linus Torvalds wrote:
>
> >
> > That said, since "task->state" is only tested _inside_ the runqueue lock,
> > there is no race that I can see. Since we've gotten the runqueue lock in
> > order to even check task-state, the processor that _sets_ task state must
> > not only have done the "spin_lock()", it must also have done the
> > "spin_unlock()", and _that_ will not allow either the timeout or the task
> > state to have leaked out from under it (because that would imply that the
> > critical region leaked out too).
> >
> > So I don't think the race exists anyway - the schedule() will return
> > immediately (because it will see TASK_RUNNING), and we'll just retry.
> >
>
> This whole situation is very theoretical, but I think this actually can
> happen *theoretically*.
>
>
> OK, the spin_lock doesn't do any serialization, but the unlock does. But
> the problem can happen before the unlock. Because of the loop.

Yes, yes, thanks.

( Actually, this was more "is my understanding correct?" than a patch )

Thanks!

> CPU 1                                 CPU 2
>
> task_rq_lock()
>
> p->state = TASK_RUNNING;
>
>                                       (from bottom of for loop)
>                                       set_current_state(TASK_INTERRUPTIBLE);
>
>                                       for (;;) { (looping)
>
>                                       if (timeout && !timeout->task)
>
> (now CPU implements)
> t->task = NULL
>
> task_rq_unlock();
>
>                                       schedule() (with state == TASK_INTERRUPTIBLE)
>
>
> Again, this is very theoretical, and I don't even think that this can
> happen if you tried to make it happen. But I guess if hardware were to
> change in the future, while keeping the same barrier rules that we have
> today, this could be a race.
>
> -- Steve
>
>

2006-11-06 03:08:54

by Linus Torvalds

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()



On Sun, 5 Nov 2006, Steven Rostedt wrote:
>
> This whole situation is very theoretical, but I think this actually can
> happen *theoretically*.
>
> OK, the spin_lock doesn't do any serialization, but the unlock does. But
> the problem can happen before the unlock. Because of the loop.
>
> CPU 1                                 CPU 2
>
> task_rq_lock()
>
> p->state = TASK_RUNNING;
>
>                                       (from bottom of for loop)
>                                       set_current_state(TASK_INTERRUPTIBLE);
>
>                                       for (;;) { (looping)
>
>                                       if (timeout && !timeout->task)
>
> (now CPU implements)
> t->task = NULL
>
> task_rq_unlock();
>
>                                       schedule() (with state == TASK_INTERRUPTIBLE)

Yeah, that seems like a real bug. You _always_ need to actually do the thing
that you wait for _before_ you wake it up. That's what all the scheduling
primitives depend on - you can't wake people up first, and then set the
condition variable.

So if an rt_mutex depends on something that is set inside the rq-lock, it
needs to get the task rq-lock in order to check it.

Linus

2006-11-06 08:58:19

by Benjamin Herrenschmidt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()


> Yes. On x86 (and x86-64) you'll never see this, because writes are always
> seen in order regardless, and in addition, the spin_lock is actually
> totally serializing anyway. On most other architectures, the spin_lock
> will serialize all the writes too, but it's not guaranteed, so in theory
> you're right. I suspect no actual architecture will do this, but hey,
> when talking memory ordering, safe is a lot better than sorry.

PowerPC doesn't serialize the writes on spin_lock, only on spin_unlock.

(That is, previous writes can "leak" into the lock, but writes done
before the unlock can't leak out of the spinlock).

Now, I've just glanced at the thread, so I don't know if that's relevant
to the problems you guys are talking about :-)

Ben.


2006-11-06 11:07:43

by Oleg Nesterov

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On 11/05, Linus Torvalds wrote:
>
> On Sun, 5 Nov 2006, Steven Rostedt wrote:
> >
> > This whole situation is very theoretical, but I think this actually can
> > happen *theoretically*.
> >
> > OK, the spin_lock doesn't do any serialization, but the unlock does. But
> > the problem can happen before the unlock. Because of the loop.
> >
> > CPU 1                               CPU 2
> >
> > task_rq_lock()
> >
> > p->state = TASK_RUNNING;
> >
> >                                     (from bottom of for loop)
> >                                     set_current_state(TASK_INTERRUPTIBLE);
> >
> >                                     for (;;) { (looping)
> >
> >                                     if (timeout && !timeout->task)
> >
> > (now CPU implements)
> > t->task = NULL
> >
> > task_rq_unlock();
> >
> >                                     schedule() (with state == TASK_INTERRUPTIBLE)
>
> Yeah, that seems like a real bug. You _always_ need to actually do the thing
> that you wait for _before_ you wake it up. That's what all the scheduling
> primitives depend on - you can't wake people up first, and then set the
> condition variable.
>
> So if an rt_mutex depends on something that is set inside the rq-lock, it
> needs to get the task rq-lock in order to check it.

No, rt_mutex is fine (I think).

My changelog was very unclear and confusing; I'll try again. What we are
doing is:

rt_mutex_slowlock:

task->state = TASK_INTERRUPTIBLE;

mb();

if (CONDITION)
return -ETIMEDOUT;

schedule();

This is common and correct.

hrtimer_wakeup:

CONDITION = 1; // [1]

spin_lock(rq->lock);

task->state = TASK_RUNNING; // [2]

This needs 'wmb()' between [1] and [2] unless spin_lock() guarantees memory
ordering. Of course, rt_mutex can take rq->lock to solve this, but I don't
think it would be right.
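
For reference, a minimal sketch of the two sides of that pattern with the
barrier in place (illustrative only, not the actual kernel code; CONDITION
stands in for the !timeout->task test):

	/* sleeper, rt_mutex_slowlock-style */
	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);	/* implies a full barrier */
		if (CONDITION)
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);

	/* waker, hrtimer_wakeup-style */
	CONDITION = 1;		/* [1], i.e. t->task = NULL */
	smp_wmb();		/* make [1] visible before [2] */
	wake_up_process(task);	/* [2], ends up setting task->state = TASK_RUNNING */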

Oleg.

2006-11-06 12:27:18

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()


On Mon, 6 Nov 2006, Oleg Nesterov wrote:

> On 11/05, Linus Torvalds wrote:
> >
> > Yeah, that seems like a real bug. You _always_ need to actually do the thing
> > that you wait for _before_ you wake it up. That's what all the scheduling
> > primitives depend on - you can't wake people up first, and then set the
> > condition variable.
> >
> > So if an rt_mutex depends on something that is set inside the rq-lock, it
> > needs to get the task rq-lock in order to check it.
>
> No, rt_mutex is fine (I think).

I agree. The problem isn't with rt_mutex. The rt_mutex doesn't depend on
that condition to break out of the loop. That condition is an extra one
that's there to stop in the case where we timed out before the owner
released the lock. The real condition for breaking out of the loop is the
owner releasing the lock, and there are no known races there.

So the time out is what needs to make sure that rt_mutex sees the event.

>
> My changelog was very unclear and confusing; I'll try again. What we are
> doing is:
>
> rt_mutex_slowlock:
>
> task->state = TASK_INTERRUPTIBLE;
>
> mb();
>
> if (CONDITION)
> return -ETIMEDOUT;
>
> schedule();
>
> This is common and correct.
>
> hrtimer_wakeup:
>
> CONDITION = 1; // [1]

Right! The changing of the condition isn't even inside any spin lock. But that
condition needs to be completed before the changes done within the spin
lock, and unfortunately the spin lock itself doesn't guarantee anything.

>
> spin_lock(rq->lock);
>
> task->state = TASK_RUNNING; // [2]
>
> This needs 'wmb()' between [1] and [2] unless spin_lock() guarantees memory
> ordering. Of course, rt_mutex can take rq->lock to solve this, but I don't
> think it would be right.

Correct, the solution should NOT change rt_mutex. Your original patch is
the correct approach.

-- Steve

>
> Oleg.
>
>

2006-11-06 12:32:21

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()



On Sun, 5 Nov 2006, Oleg Nesterov wrote:

> When task->array != NULL, try_to_wake_up() just goes to "out_running" and sets
> task->state = TASK_RUNNING.
>
> In that case hrtimer_wakeup() does:
>
> timeout->task = NULL; <----- [1]
>
> spin_lock(runqueues->lock);
>
> task->state = TASK_RUNNING; <----- [2]
>
> from Documentation/memory-barriers.txt
>
> Memory operations that occur before a LOCK operation may appear to
> happen after it completes.
>
> This means that [2] may be completed before [1], and
>
> CPU_0                                 CPU_1
> rt_mutex_slowlock:
>
>     for (;;) {
>         ...
>         if (timeout && !timeout->task)
>             return -ETIMEDOUT;
>         ...
>
>         schedule();
>                                       hrtimer_wakeup() sets
>         ...                           task->state = TASK_RUNNING,
>                                       but "timeout->task = NULL"
>                                       is not completed
>         set_current_state(TASK_INTERRUPTIBLE);
>     }
>
> we can miss a timeout.
>
> Of course, this all is scholasticism, this can't happen in practice, but
> may be this patch makes sense as a documentation update.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- STATS/kernel/hrtimer.c~1_hrtw 2006-10-22 18:24:03.000000000 +0400
> +++ STATS/kernel/hrtimer.c 2006-11-05 22:32:36.000000000 +0300
> @@ -662,9 +662,12 @@ static int hrtimer_wakeup(struct hrtimer
>  		container_of(timer, struct hrtimer_sleeper, timer);
>  	struct task_struct *task = t->task;
>  
> -	t->task = NULL;
> -	if (task)
> +	if (task) {
> +		t->task = NULL;
> +		/* must be visible before task->state = TASK_RUNNING */
> +		smp_wmb();
>  		wake_up_process(task);
> +	}
>  
>  	return HRTIMER_NORESTART;
>  }
>
>

Ingo, Thomas, could you ack this too, or nack it with an explanation?

Ingo, you might want to read the whole thread here.

Acked-by: Steven Rostedt <[email protected]>

-- Steve

2006-11-06 12:35:28

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On Mon, 6 Nov 2006, Benjamin Herrenschmidt wrote:

>
> > Yes. On x86 (and x86-64) you'll never see this, because writes are always
> > seen in order regardless, and in addition, the spin_lock is actually
> > totally serializing anyway. On most other architectures, the spin_lock
> > will serialize all the writes too, but it's not guaranteed, so in theory
> > you're right. I suspect no actual architecture will do this, but hey,
> > when talking memory ordering, safe is a lot better than sorry.
>
> PowerPC doesn't serialize the writes on spin_lock, only on spin_unlock.
>
> (That is, previous writes can "leak" into the lock, but writes done
> before the unlock can't leak out of the spinlock).
>
> Now, I've just glanced at the thread, so I don't know if that's relevant
> to the problems you guys are talking about :-)
>

It is relevant. In powerpc, can one write happen before another write?


x = 1;
barrier(); (only compiler barrier)
b = 2;


And have CPU 2 see b=2 before seeing x=1?

If so, then I guess this is indeed a bug on powerpc.
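
To make the question concrete, a small litmus-test sketch (purely
illustrative, nothing from the kernel; the function names are invented):

	int x, b;		/* both initially 0 */

	void writer(void)	/* CPU 1 */
	{
		x = 1;
		barrier();	/* compiler barrier only - the CPU may still reorder */
		b = 2;		/* an smp_wmb() above would forbid that */
	}

	void reader(void)	/* CPU 2 */
	{
		if (b == 2)
			BUG_ON(x != 1);	/* can fire on a weakly ordered CPU;
					   an smp_rmb() before testing x,
					   pairing with the writer's smp_wmb(),
					   would prevent it */
	}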

-- Steve

2006-11-06 13:12:54

by Oleg Nesterov

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On 11/06, Steven Rostedt wrote:
>
> Acked-by: Steven Rostedt <[email protected]>

Thanks.

But on the other hand we probably have similar code (set condition +
wake_up_process()) in other places too, and __wake_up(wait_queue_head_t)
has (in theory) the same problem. Probably we can add something like

smp_wmb_unless_spin_lock_implies_memory_barrier_on_this_arch()

somewhere in try_to_wake_up(). I dunno.
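
(Just to illustrate the idea - no such helper exists, and the config symbol
below is invented:)

	/*
	 * Illustration only: expand to nothing on architectures where
	 * spin_lock() already orders prior stores against stores done
	 * inside the critical section, and to a real write barrier
	 * everywhere else.
	 */
	#ifdef CONFIG_ARCH_SPIN_LOCK_ORDERS_STORES	/* invented symbol */
	#define smp_wmb_unless_spin_lock_implies_memory_barrier_on_this_arch() barrier()
	#else
	#define smp_wmb_unless_spin_lock_implies_memory_barrier_on_this_arch() smp_wmb()
	#endif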

Oleg.

2006-11-06 14:12:48

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()


On Mon, 6 Nov 2006, Oleg Nesterov wrote:

> On 11/06, Steven Rostedt wrote:
> >
> > Acked-by: Steven Rostedt <[email protected]>
>
> Thanks.
>
> But on the other hand we probably have similar code (set condition +
> wake_up_process()) in other places too, and __wake_up(wait_queue_head_t)
> has (in theory) the same problem. Probably we can add something like
>
> smp_wmb_unless_spin_lock_implies_memory_barrier_on_this_arch()

Probably smp_wmb_before_spin_lock would be a good name.

>
> somewhere in try_to_wake_up(). I dunno.
>

But I don't think it belongs in try_to_wake_up. That's just assuming too
much in that function. The event to wake up on could simply be that an
interrupt took place, in which case the smp_wmb would not be needed.

As Linus has stated, in most cases where a process is going to sleep on an
event, it had better check to make sure that the event didn't happen
between the time it set itself to something other than TASK_RUNNING and
the time it calls schedule.

I guess what this points out is that in such a case, the event maker must
make sure that the event is visible before it wakes up the process that's
waiting on the event. But putting that logic into try_to_wake_up is, IMHO,
overkill.

But...

Doing a quick search on wake_up, I came across loop.c (also my first hit).

Here we have: (some funcs pulled in).

loop_make_request:
lo->lo_bio = lo->lo_biotail = bio;
wake_up(&lo->lo_event);

And with wait_event_interruptible(lo->lo_event, lo->lo_bio)
we have:

prepare_to_wait(..., TASK_INTERRUPTIBLE);
if (lo->lo_bio)
	break;
schedule_timeout();


So this probably also has the same bug!

There's nothing to prevent the CPU from showing that lo->lo_bio has changed
only *after* the wake_up has set the task to TASK_RUNNING.

So maybe this *is* a race condition that is all over the kernel for other
architectures!

-- Steve


2006-11-06 15:06:09

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()


On Mon, 6 Nov 2006, Steven Rostedt wrote:

> But...
>
> Doing a quick search on wake_up, I came across loop.c (also my first hit).
>
> Here we have: (some funcs pulled in).
>
> loop_make_request:
> lo->lo_bio = lo->lo_biotail = bio;
> wake_up(&lo->lo_event);
>
> And with wait_event_interruptible(lo->lo_event, lo->lo_bio)
> we have:
>
> prepare_to_wait(..., TASK_INTERRUPTIBLE);
> if (lo->lo_bio)
> 	break;
> schedule_timeout();
>
>
> So this probably also has the same bug!
>
> There's nothing to prevent the CPU from showing that lo->lo_bio has changed
> only *after* the wake_up has set the task to TASK_RUNNING.
>
> So maybe this *is* a race condition that is all over the kernel for other
> architectures!

I talked this over with Stephen Tweedie, and he pointed out to me that the
setting of the task state is done within the wait queue lock, and thus
the above is not a problem.

But the old paradigm of:

	add_wait_queue();
	for (;;) {
		task->state = TASK_INTERRUPTIBLE;
		if (condition)
			break;
		schedule();
	}

can still be a bug, since the other CPU might hold off the store of the
condition until after it has set the state to TASK_RUNNING.
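
For contrast, a sketch of the prepare_to_wait() form that Stephen's point
covers (illustrative; q and condition are placeholders) - here the task
state is set under the waitqueue lock inside prepare_to_wait():

	DEFINE_WAIT(wait);

	for (;;) {
		/* takes q->lock and sets current->state under it */
		prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
		if (condition)
			break;
		schedule();
	}
	finish_wait(&q, &wait);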

-- Steve

2006-11-06 20:59:15

by Benjamin Herrenschmidt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On Mon, 2006-11-06 at 07:35 -0500, Steven Rostedt wrote:

> It is relevant. In powerpc, can one write happen before another write?
>
>
> x = 1;
> barrier(); (only compiler barrier)
> b = 2;
>
>
> And have CPU 2 see b=2 before seeing x=1?

Yes. Definitely.

> If so, then I guess this is indeed a bug on powerpc.

Ben.


2006-11-06 21:18:10

by Steven Rostedt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()


On Tue, 7 Nov 2006, Benjamin Herrenschmidt wrote:

> On Mon, 2006-11-06 at 07:35 -0500, Steven Rostedt wrote:
>
> > It is relevant. In powerpc, can one write happen before another write?
> >
> >
> > x = 1;
> > barrier(); (only compiler barrier)
> > b = 2;
> >
> >
> > And have CPU 2 see b=2 before seeing x=1?
>
> Yes. Definitely.

OK, I see that in powerpc the spin_lock calls isync. That just clears the
pipeline; it doesn't touch the loads and stores, right?

So basically saying this:

x=1;
asm ("isync");
b=2;

Would that have the same problem too? Where another CPU can see b=2
before seeing x=1?

Thanks!

-- Steve

2006-11-06 21:41:35

by Benjamin Herrenschmidt

Subject: Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

On Mon, 2006-11-06 at 16:17 -0500, Steven Rostedt wrote:
> On Tue, 7 Nov 2006, Benjamin Herrenschmidt wrote:
>
> > On Mon, 2006-11-06 at 07:35 -0500, Steven Rostedt wrote:
> >
> > > It is relevant. In powerpc, can one write happen before another write?
> > >
> > >
> > > x = 1;
> > > barrier(); (only compiler barrier)
> > > b = 2;
> > >
> > >
> > > And have CPU 2 see b=2 before seeing x=1?
> >
> > Yes. Definitely.
>
> OK, I see that in powerpc the spin_lock calls isync. That just clears the
> pipeline; it doesn't touch the loads and stores, right?

Yes. That isync is there to prevent loads from being speculated across the
spin_lock, thus leaking out of the lock at the top. In fact, it doesn't act
on the loads per se, but it prevents speculative execution across the
conditional branch in the spin_lock.

> So basically saying this:
>
> x=1;
> asm ("isync");
> b=2;
>
> Would that have the same problem too? Where another CPU can see b=2
> before seeing x=1?

Yes.

What isync provides is

a = *foo
spin_lock_loop_with_conditional_branch
isync
b = *bar

It prevents the read of b from being speculated by the CPU ahead of the
read of a.

Ben.