Hi,
Waiman Long noticed that the memory barriers in sem_lock() are not
really documented, and while adding documentation, I ended up with one
case where I'm not certain about the wake_q code:
Questions:
- Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
ordering guarantee?
- Is it ok that wake_up_q just writes wake_q->next, shouldn't
smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
provides any ordering.
Example:
- CPU2 never touches lock a. It is just an unrelated wake_q user that also
wants to wake up task 1234.
- I've noticed already that smp_store_acquire() doesn't exist.
  So smp_store_mb() is required. But from a semantic point of view, we
  would need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
- May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
or should the function be safe by itself?
CPU1: /current=1234, inside do_semtimedop()/
g_wakee = current;
current->state = TASK_INTERRUPTIBLE;
spin_unlock(a);
CPU2: / arbitrary kernel thread that uses wake_q /
wake_q_add(&unrelated_q, 1234);
wake_up_q(&unrelated_q);
<...ongoing>
CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
spin_lock(a);
wake_q_add(,g_wakee);
< within wake_q_add() >:
smp_mb__before_atomic();
if (unlikely(cmpxchg_relaxed(&node->next,
NULL, WAKE_Q_TAIL)))
return false; /* -> this happens */
CPU2:
<within wake_up_q>
1234->wake_q.next = NULL; <<<<<<<<< Ok? Is store_acquire() missing? >>>>>>>>>>>>
wake_up_process(1234);
< within wake_up_process/try_to_wake_up():
raw_spin_lock_irqsave()
smp_mb__after_spinlock()
if (1234->state == TASK_RUNNING) return;
>
rewritten:
start condition: A = 1; B = 0;
CPU1:
B = 1;
RELEASE, unlock LockX;
CPU2:
lock LockX, ACQUIRE
if (LOAD A == 1) return; /* using cmpxchg_relaxed */
CPU2:
A = 0;
ACQUIRE, lock LockY
smp_mb__after_spinlock();
READ B
Question: is A = 1, B = 0 possible?
--
Manfred
On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> Hi,
>
> Waiman Long noticed that the memory barriers in sem_lock() are not really
> documented, and while adding documentation, I ended up with one case where
> I'm not certain about the wake_q code:
>
> Questions:
> - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>   ordering guarantee?
Yep. Either the atomic instruction implies ordering (e.g. the x86 LOCK
prefix) or it doesn't (most RISC LL/SC). If it does,
smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
unconditional; if it does not, then smp_mb__{before,after}_atomic() are
unconditional barriers.
IOW, the only way to get a cmpxchg without barriers on failure is with
LL/SC, and in that case smp_mb__{before,after}_atomic() are
unconditional.
For instance, the way ARM64 does cmpxchg() is:
	cmpxchg(p, o, n)
		do {
			v = LL(p);
			if (v != o)
				return v;
		} while (!SC_RELEASE(p, n))
		smp_mb();
		return v;
And you'll note how on success the store-release constraints all prior
memory operations, and the smp_mb() constraints all later memops. But on
failure there's not a barrier to be found.
> - Is it ok that wake_up_q just writes wake_q->next, shouldn't
>   smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
>   happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
>   provides any ordering.
There is no such thing as store_acquire; it is either load_acquire or
store_release. But just like how we can write load-acquire like
load+smp_mb(), so too I suppose we could write store-acquire like
store+smp_mb(), and that is exactly what is there (through the implied
barrier of wake_up_process()).
(arguably it should've been WRITE_ONCE() I suppose)
>
> Example:
> - CPU2 never touches lock a. It is just an unrelated wake_q user that also
>   wants to wake up task 1234.
> - I've noticed already that smp_store_acquire() doesn't exist.
>   So smp_store_mb() is required. But from a semantic point of view, we would
>   need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
> - May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
>   or should the function be safe by itself?
>
> CPU1: /current=1234, inside do_semtimedop()/
>         g_wakee = current;
>         current->state = TASK_INTERRUPTIBLE;
>         spin_unlock(a);
>
> CPU2: / arbitrary kernel thread that uses wake_q /
>                 wake_q_add(&unrelated_q, 1234);
>                 wake_up_q(&unrelated_q);
>                 <...ongoing>
>
> CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
>                         spin_lock(a);
>                         wake_q_add(,g_wakee);
>                         < within wake_q_add() >:
>                           smp_mb__before_atomic();
>                           if (unlikely(cmpxchg_relaxed(&node->next, NULL,
>                                                        WAKE_Q_TAIL)))
>                               return false; /* -> this happens */
>
> CPU2:
>                 <within wake_up_q>
>                 1234->wake_q.next = NULL; <<<<<<<<< Ok? Is store_acquire()
>                                           missing? >>>>>>>>>>>>
/* smp_mb(); implied by the following wake_up_process() */
>                 wake_up_process(1234);
>                 < within wake_up_process/try_to_wake_up():
>                     raw_spin_lock_irqsave()
>                     smp_mb__after_spinlock()
>                     if (1234->state == TASK_RUNNING) return;
>                 >
>
>
> rewritten:
>
> start condition: A = 1; B = 0;
>
> CPU1:
>     B = 1;
>     RELEASE, unlock LockX;
>
> CPU2:
>     lock LockX, ACQUIRE
>     if (LOAD A == 1) return; /* using cmpxchg_relaxed */
>
> CPU2:
>     A = 0;
>     ACQUIRE, lock LockY
>     smp_mb__after_spinlock();
>     READ B
>
> Question: is A = 1, B = 0 possible?
Your example is incomplete (there is no A=1 assignment for example), but
I'm thinking I can guess where that should go given the earlier text.
I don't think this is broken.
Hi Peter,
On 10/10/19 1:42 PM, Peter Zijlstra wrote:
> On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
>> Hi,
>>
>> Waiman Long noticed that the memory barriers in sem_lock() are not really
>> documented, and while adding documentation, I ended up with one case where
>> I'm not certain about the wake_q code:
>>
>> Questions:
>> - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>> ordering guarantee?
> Yep. Either the atomic instruction implies ordering (e.g. the x86 LOCK
> prefix) or it doesn't (most RISC LL/SC). If it does,
> smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
> unconditional; if it does not, then smp_mb__{before,after}_atomic() are
> unconditional barriers.
And _relaxed() differs from "normal" cmpxchg only for LL/SC
architectures, correct?
Therefore smp_mb__{before,after}_atomic() may be combined with
cmpxchg_relaxed, to form a full memory barrier, on all archs.
[...]
>> - Is it ok that wake_up_q just writes wake_q->next, shouldn't
>> smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
>> happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
>> provides any ordering.
> There is no such thing as store_acquire; it is either load_acquire or
> store_release. But just like how we can write load-acquire like
> load+smp_mb(), so too I suppose we could write store-acquire like
> store+smp_mb(), and that is exactly what is there (through the implied
> barrier of wake_up_process()).
Thanks for confirming my assumption:
The code is correct, due to the implied barrier inside wake_up_process().
[...]
>> rewritten:
>>
>> start condition: A = 1; B = 0;
>>
>> CPU1:
>> B = 1;
>> RELEASE, unlock LockX;
>>
>> CPU2:
>> lock LockX, ACQUIRE
>> if (LOAD A == 1) return; /* using cmpxchg_relaxed */
>>
>> CPU2:
>> A = 0;
>> ACQUIRE, lock LockY
>> smp_mb__after_spinlock();
>> READ B
>>
>> Question: is A = 1, B = 0 possible?
> Your example is incomplete (there is no A=1 assignment for example), but
> I'm thinking I can guess where that should go given the earlier text.
A=1 is listed as start condition. Way before, someone did wake_q_add().
> I don't think this is broken.
Thanks.
--
Manfred
On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
> Hi Peter,
>
> On 10/10/19 1:42 PM, Peter Zijlstra wrote:
> > On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> > > Hi,
> > >
> > > Waiman Long noticed that the memory barriers in sem_lock() are not really
> > > documented, and while adding documentation, I ended up with one case where
> > > I'm not certain about the wake_q code:
> > >
> > > Questions:
> > > - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
> > >   ordering guarantee?
> > Yep. Either the atomic instruction implies ordering (e.g. the x86 LOCK
> > prefix) or it doesn't (most RISC LL/SC). If it does,
> > smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
> > unconditional; if it does not, then smp_mb__{before,after}_atomic() are
> > unconditional barriers.
>
> And _relaxed() differs from "normal" cmpxchg only for LL/SC architectures,
> correct?
Indeed.
> Therefore smp_mb__{before,after}_atomic() may be combined with
> cmpxchg_relaxed, to form a full memory barrier, on all archs.
Just so.
> > > - Is it ok that wake_up_q just writes wake_q->next, shouldn't
> > >   smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
> > >   happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
> > >   provides any ordering.
> > There is no such thing as store_acquire; it is either load_acquire or
> > store_release. But just like how we can write load-acquire like
> > load+smp_mb(), so too I suppose we could write store-acquire like
> > store+smp_mb(), and that is exactly what is there (through the implied
> > barrier of wake_up_process()).
>
> Thanks for confirming my assumption:
> The code is correct, due to the implied barrier inside wake_up_process().
It has a comment there, trying to state this.
	task->wake_q.next = NULL;

	/*
	 * wake_up_process() executes a full barrier, which pairs with
	 * the queueing in wake_q_add() so as not to miss wakeups.
	 */
	wake_up_process(task);
On Thu, 10 Oct 2019, Peter Zijlstra wrote:
>On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
>> Hi Peter,
>>
>> On 10/10/19 1:42 PM, Peter Zijlstra wrote:
>> > On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
>> > > Hi,
>> > >
>> > > Waiman Long noticed that the memory barriers in sem_lock() are not really
>> > > documented, and while adding documentation, I ended up with one case where
>> > > I'm not certain about the wake_q code:
>> > >
>> > > Questions:
>> > > - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>> > >   ordering guarantee?
>> > Yep. Either the atomic instruction implies ordering (e.g. the x86 LOCK
>> > prefix) or it doesn't (most RISC LL/SC). If it does,
>> > smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
>> > unconditional; if it does not, then smp_mb__{before,after}_atomic() are
>> > unconditional barriers.
>>
>> And _relaxed() differs from "normal" cmpxchg only for LL/SC architectures,
>> correct?
>
>Indeed.
>
>> Therefore smp_mb__{before,after}_atomic() may be combined with
>> cmpxchg_relaxed, to form a full memory barrier, on all archs.
>
>Just so.
We might want something like this?
----8<---------------------------------------------------------
From: Davidlohr Bueso <[email protected]>
Subject: [PATCH] Documentation/memory-barriers.txt: Mention smp_mb__{before,after}_atomic() and CAS
Explicitly mention possible usages to guarantee serialization even upon
failed cmpxchg (or similar) calls along with smp_mb__{before,after}_atomic().
Signed-off-by: Davidlohr Bueso <[email protected]>
---
Documentation/memory-barriers.txt | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..5d2873d4b442 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1890,6 +1890,18 @@ There are some more advanced barrier functions:
This makes sure that the death mark on the object is perceived to be set
*before* the reference counter is decremented.
+ Similarly, these barriers can be used to guarantee serialization for atomic
+ RMW calls on architectures which may not imply memory barriers upon failure.
+
+ obj->next = NULL;
+ smp_mb__before_atomic();
+ if (cmpxchg(&obj->ptr, NULL, val))
+ return;
+
+ This makes sure that the store to the next pointer always has smp_store_mb()
+ semantics. As such, smp_mb__{before,after}_atomic() calls allow optimizing
+ the barrier usage by finer grained serialization.
+
See Documentation/atomic_{t,bitops}.txt for more information.
--
2.16.4
Hi Davidlohr,
On 10/10/19 9:25 PM, Davidlohr Bueso wrote:
> On Thu, 10 Oct 2019, Peter Zijlstra wrote:
>
>> On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
>>
>>> Therefore smp_mb__{before,after}_atomic() may be combined with
>>> cmpxchg_relaxed, to form a full memory barrier, on all archs.
>>
>> Just so.
>
> We might want something like this?
>
> ----8<---------------------------------------------------------
>
> From: Davidlohr Bueso <[email protected]>
> Subject: [PATCH] Documentation/memory-barriers.txt: Mention smp_mb__{before,after}_atomic() and CAS
>
> Explicitly mention possible usages to guarantee serialization even upon
> failed cmpxchg (or similar) calls along with
> smp_mb__{before,after}_atomic().
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> Documentation/memory-barriers.txt | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..5d2873d4b442 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1890,6 +1890,18 @@ There are some more advanced barrier functions:
> This makes sure that the death mark on the object is perceived to be set
> *before* the reference counter is decremented.
>
> + Similarly, these barriers can be used to guarantee serialization for atomic
> + RMW calls on architectures which may not imply memory barriers upon failure.
> +
> + obj->next = NULL;
> + smp_mb__before_atomic();
> + if (cmpxchg(&obj->ptr, NULL, val))
> + return;
> +
> + This makes sure that the store to the next pointer always has smp_store_mb()
> + semantics. As such, smp_mb__{before,after}_atomic() calls allow optimizing
> + the barrier usage by finer grained serialization.
> +
> See Documentation/atomic_{t,bitops}.txt for more information.
>
>
I don't know. The new documentation would not have answered my question
(is it ok to combine smp_mb__before_atomic() with atomic_relaxed()?).
And it copies content already present in atomic_t.txt.
Thus I would prefer it if the first sentence of the paragraph were replaced:
the list of operations should end with "...", and it should match what
is in atomic_t.txt.
Ok?
--
Manfred
On Fri, 11 Oct 2019, Manfred Spraul wrote:
>I don't know. The new documentation would not have answered my
>question (is it ok to combine smp_mb__before_atomic() with
>atomic_relaxed()?). And it copies content already present in
>atomic_t.txt.
>Well, the _relaxed (and release/acquire) extensions refer to a
_successful_ operation (LL/SC), and whether it has barriers on
each of the sides before and after. I thought you were mainly
worried about the failed CAS scenario, not the relaxed itself.
I don't know how this copies content from atomic_t.txt, at no
point does it talk about failed CAS.
>
>Thus: I would prefer if the first sentence of the paragraph is
>replaced: The list of operations should end with "...", and it should
>match what is in atomic_t.txt
I'll see about combining some of your changes in patch 5/5 of
your new series, but have to say I still prefer my documentation
change.
Thanks,
Davidlohr