2019-10-12 05:51:25

by Manfred Spraul

Subject: [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc

Hi,

Updated series, based on input from Davidlohr:

- Mixing WRITE_ONCE(), when not holding a lock, and "normal" writes,
when holding a lock, makes the code less readable.
Thus use _ONCE() everywhere, for both WRITE_ONCE() and READ_ONCE().

- According to my understanding, wake_q_add() does not contain a barrier
that protects the refcount increase. Document that, and add the barrier
to the ipc code (see the sketch below).

- Based on patch review: The V1 patch for ipc/sem.c is incorrect,
q->status must be set to -EINTR, not EINTR.
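
A simplified sketch of the waker-side pattern I have in mind (names as
in ipc/sem.c, see patch 5):

	wake_q_add(wake_q, q->sleeper);		/* takes a task reference */
	/*
	 * Without a barrier here, the store below could become visible
	 * before the refcount increase inside wake_q_add(); the sleeper
	 * could then return, exit and drop the last reference too early.
	 */
	smp_store_release(&q->status, error);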

From V1:

The memory barriers in ipc are not properly documented, and at least
for some architectures they are insufficient:
Reading xyz->status provides only a control dependency, thus
smp_acquire__after_ctrl_dep() was missing in mqueue.c and msg.c.
sem.c contained a full smp_mb(), which is not required.
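
The sleeper side, simplified (do_semtimedop(), see patch 5); the
READ_ONCE() alone gives only a control dependency, which has to be
upgraded to an ACQUIRE:

	error = READ_ONCE(queue.status);
	if (error != -EINTR) {
		/* upgrade the control dependency to an ACQUIRE */
		smp_acquire__after_ctrl_dep();
		goto out_free;
	}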

Patches:
Patch 1: Document the barrier rules for wake_q_add().

Patch 2: remove code duplication
@Davidlohr: There is no "Signed-off-by" in your mail, otherwise I would
list you as author.

Patch 3-5: Update the ipc code, especially add the missing
smp_acquire__after_ctrl_dep().

Patch 6: Documentation for smp_mb__{before,after}_atomic():
Clarify that smp_mb__{before,after}_atomic() are compatible with all
RMW atomic operations, not just the operations that do not return a
value.

Open issues:
- Is my analysis regarding the refcount correct?

- Review other users of wake_q_add().

- More testing. I did some tests, but I doubt that they would be
sufficient to show issues caused by incorrect memory barriers.

- Should I add a "Fixes:" or "Cc:stable"? The issues that I see are
the missing smp_acquire__after_ctrl_dep(), WRITE_ONCE() vs.
"ptr = NULL", and a risk regarding the refcount that I can't evaluate.


What do you think?

--
Manfred


2019-10-12 05:52:12

by Manfred Spraul

Subject: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

The documentation in memory-barriers.txt claims that
smp_mb__{before,after}_atomic() are for atomic ops that do not return a
value.

This is misleading and doesn't match the example in atomic_t.txt;
e.g. smp_mb__before_atomic() can be, and is, used together with
cmpxchg_relaxed() in the wake_q code.
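
For reference, the wake_q pattern in question is roughly (paraphrased
from kernel/sched/core.c, error handling omitted):

	smp_mb__before_atomic();
	if (cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL))
		return;			/* already queued */
	get_task_struct(task);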

The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
RMW atomic operation to a full memory barrier.
The return code of the atomic operation has no impact, so all of the
following examples are valid:

1)
smp_mb__before_atomic();
atomic_add();

2)
smp_mb__before_atomic();
atomic_xchg_relaxed();

3)
smp_mb__before_atomic();
atomic_fetch_add_relaxed();

Invalid would be:
smp_mb__before_atomic();
atomic_set();
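
(atomic_set() is not a RMW operation, it is a plain store, so
smp_mb__before_atomic() is not guaranteed to order anything against it.
A sketch of what such code would need instead - illustrative only, not
part of this patch:

	smp_mb();		/* explicit full barrier ... */
	atomic_set(&v, 1);	/* ... a plain store implies no ordering */

or atomic_set_release(&v, 1) when only RELEASE ordering is required.)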

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
Documentation/memory-barriers.txt | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..52076b057400 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();

- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting. These functions do not imply memory barriers.
+ These are for use with atomic RMW functions (such as add, subtract,
+ increment, decrement, failed conditional operations, ...) that do
+ not imply memory barriers, but where the code needs a memory barrier,
+ for example when used for reference counting.

- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
+ These are also used for atomic RMW bitop functions that do imply a full
+ memory barrier (such as set_bit and clear_bit).

As an example, consider a piece of code that marks an object as being dead
and then decrements the object's reference count:
--
2.21.0

2019-10-12 05:54:55

by Manfred Spraul

Subject: [PATCH 5/6] ipc/sem.c: Document and update memory barriers

The patch documents and updates the memory barriers in ipc/sem.c:
- Add smp_store_release() to wake_up_sem_queue_prepare() and
document why it is needed.

- Read q->status using READ_ONCE() + smp_acquire__after_ctrl_dep(),
as the pair for the barrier inside wake_up_sem_queue_prepare()
(see the sketch below).

- Add comments to all barriers, and mention the rules in the block
regarding locking.
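
In short, the two paths that read q->status (simplified sketch):

	/* case a: lockless read */
	error = READ_ONCE(queue.status);
	if (error != -EINTR) {
		smp_acquire__after_ctrl_dep();	/* pairs with the release */
		...
	}

	/* case b: read while holding sem_lock(), the lock orders it */
	locknum = sem_lock(sma, sops, nsops);
	...
	error = READ_ONCE(queue.status);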

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
---
ipc/sem.c | 63 ++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index ec97a7072413..c6c5954a2030 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -205,7 +205,9 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
*
* Memory ordering:
* Most ordering is enforced by using spin_lock() and spin_unlock().
- * The special case is use_global_lock:
+ *
+ * Exceptions:
+ * 1) use_global_lock:
* Setting it from non-zero to 0 is a RELEASE, this is ensured by
* using smp_store_release().
* Testing if it is non-zero is an ACQUIRE, this is ensured by using
@@ -214,6 +216,24 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
* this smp_load_acquire(), this is guaranteed because the smp_load_acquire()
* is inside a spin_lock() and after a write from 0 to non-zero a
* spin_lock()+spin_unlock() is done.
+ *
+ * 2) queue.status:
+ * Initialization is done while holding sem_lock(), so no further barrier is
+ * required.
+ * Setting it to a result code is a RELEASE, this is ensured both by
+ * using smp_store_release() (for case a) and by holding sem_lock()
+ * (for case b).
+ * The ACQUIRE when reading the result code without holding sem_lock()
+ * is achieved by using READ_ONCE() + smp_acquire__after_ctrl_dep()
+ * (case a above).
+ * Reading the result code while holding sem_lock() needs no further
+ * barriers, the locks inside sem_lock() enforce ordering (case b above).
+ *
+ * 3) current->state:
+ * current->state is set to TASK_INTERRUPTIBLE while holding sem_lock().
+ * The wakeup is handled using the wake_q infrastructure. wake_q wakeups may
+ * happen immediately after calling wake_q_add(). As wake_q_add() is called
+ * while holding sem_lock(), no further barriers are required.
*/

#define sc_semmsl sem_ctls[0]
@@ -766,15 +786,24 @@ static int perform_atomic_semop(struct sem_array *sma, struct sem_queue *q)
static inline void wake_up_sem_queue_prepare(struct sem_queue *q, int error,
struct wake_q_head *wake_q)
{
+ /*
+ * When the wakeup is performed, q->sleeper->state is read and later
+ * set to TASK_RUNNING. This may happen at any time, even before
+ * wake_q_add() returns. Memory ordering for q->sleeper->state is
+ * enforced by sem_lock(): we own sem_lock now (that was the ACQUIRE),
+ * and q->sleeper wrote q->sleeper->state before calling sem_unlock()
+ * (->RELEASE).
+ */
wake_q_add(wake_q, q->sleeper);
/*
- * Rely on the above implicit barrier, such that we can
- * ensure that we hold reference to the task before setting
- * q->status. Otherwise we could race with do_exit if the
- * task is awoken by an external event before calling
- * wake_up_process().
+ * Here, we need a barrier to protect the refcount increase inside
+ * wake_q_add().
+ * case a: The barrier inside wake_q_add() pairs with
+ * READ_ONCE(q->status) + smp_acquire__after_ctrl_dep() in
+ * do_semtimedop().
+ * case b: nothing, ordering is enforced by the locks in sem_lock().
*/
- WRITE_ONCE(q->status, error);
+ smp_store_release(&q->status, error);
}

static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
@@ -2148,9 +2177,11 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
}

do {
+ /* memory ordering ensured by the lock in sem_lock() */
WRITE_ONCE(queue.status, -EINTR);
queue.sleeper = current;

+ /* memory ordering is ensured by the lock in sem_lock() */
__set_current_state(TASK_INTERRUPTIBLE);
sem_unlock(sma, locknum);
rcu_read_unlock();
@@ -2174,12 +2205,16 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
error = READ_ONCE(queue.status);
if (error != -EINTR) {
/*
- * User space could assume that semop() is a memory
- * barrier: Without the mb(), the cpu could
- * speculatively read in userspace stale data that was
- * overwritten by the previous owner of the semaphore.
+ * Memory barrier for queue.status, case a):
+ * The smp_acquire__after_ctrl_dep(), together with the
+ * READ_ONCE() above pairs with the barrier inside
+ * wake_up_sem_queue_prepare().
+ * The barrier protects user space, too: User space may
+ * assume that all data from the CPU that did the wakeup
+ * semop() is visible on the wakee CPU when the sleeping
+ * semop() returns.
*/
- smp_mb();
+ smp_acquire__after_ctrl_dep();
goto out_free;
}

@@ -2189,6 +2224,10 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
if (!ipc_valid_object(&sma->sem_perm))
goto out_unlock_free;

+ /*
+ * No barrier is required:
+ * we are protected by sem_lock() (case b).
+ */
error = READ_ONCE(queue.status);

/*
--
2.21.0

2019-10-12 05:55:14

by Manfred Spraul

Subject: [PATCH 4/6] ipc/msg.c: Update and document memory barriers.

Transfer the findings from ipc/mqueue.c to ipc/msg.c:
- A control barrier was missing for the lockless receive case.
So in theory, not yet initialized data could be copied to user
space - obviously only on architectures where a control barrier
is not a NOP (illustrated below).

- Use smp_store_release(). In theory, the refcount may already
have been decreased to 0 when wake_q_add() tries to get a
reference.
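
The lockless receive pairing, simplified:

	/* sender, holds the queue lock */
	<store the message payload>
	smp_store_release(&msr->r_msg, msg);

	/* receiver, lockless path */
	msg = READ_ONCE(msr_d.r_msg);
	if (msg != ERR_PTR(-EAGAIN)) {
		/*
		 * Without this, only a control dependency orders the
		 * payload reads, which is not sufficient on all
		 * architectures.
		 */
		smp_acquire__after_ctrl_dep();
		<copy the payload to user space>
	}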

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 44 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 8dec945fa030..e6b20a7e6341 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -184,6 +184,10 @@ static inline void ss_add(struct msg_queue *msq,
{
mss->tsk = current;
mss->msgsz = msgsz;
+ /*
+ * No memory barrier required: we did ipc_lock_object(),
+ * and the waker obtains that lock before calling wake_q_add().
+ */
__set_current_state(TASK_INTERRUPTIBLE);
list_add_tail(&mss->list, &msq->q_senders);
}
@@ -238,7 +242,14 @@ static void expunge_all(struct msg_queue *msq, int res,

list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, ERR_PTR(res));
+
+ /*
+ * The barrier is required to ensure that the refcount increase
+ * inside wake_q_add() is completed before the state is updated.
+ *
+ * The barrier pairs with READ_ONCE() + smp_acquire__after_ctrl_dep().
+ */
+ smp_store_release(&msr->r_msg, ERR_PTR(res));
}
}

@@ -798,13 +809,17 @@ static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg,
list_del(&msr->r_list);
if (msr->r_maxsize < msg->m_ts) {
wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, ERR_PTR(-E2BIG));
+
+ /* See expunge_all regarding memory barrier */
+ smp_store_release(&msr->r_msg, ERR_PTR(-E2BIG));
} else {
ipc_update_pid(&msq->q_lrpid, task_pid(msr->r_tsk));
msq->q_rtime = ktime_get_real_seconds();

wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, msg);
+
+ /* See expunge_all regarding memory barrier */
+ smp_store_release(&msr->r_msg, msg);
return 1;
}
}
@@ -1154,7 +1169,11 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
msr_d.r_maxsize = INT_MAX;
else
msr_d.r_maxsize = bufsz;
- msr_d.r_msg = ERR_PTR(-EAGAIN);
+
+ /* memory barrier not required due to ipc_lock_object() */
+ WRITE_ONCE(msr_d.r_msg, ERR_PTR(-EAGAIN));
+
+ /* memory barrier not required, we own ipc_lock_object() */
__set_current_state(TASK_INTERRUPTIBLE);

ipc_unlock_object(&msq->q_perm);
@@ -1183,8 +1202,21 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
* signal) it will either see the message and continue ...
*/
msg = READ_ONCE(msr_d.r_msg);
- if (msg != ERR_PTR(-EAGAIN))
+ if (msg != ERR_PTR(-EAGAIN)) {
+ /*
+ * Memory barrier for msr_d.r_msg
+ * The smp_acquire__after_ctrl_dep(), together with the
+ * READ_ONCE() above pairs with the smp_store_release() in
+ * expunge_all() and pipelined_send().
+ * The barrier protects the accesses to the message in
+ * do_msg_fill(). In addition, the barrier protects user
+ * space, too: User space may assume that all data from
+ * the CPU that sent the message is visible.
+ */
+ smp_acquire__after_ctrl_dep();
+
goto out_unlock1;
+ }

/*
* ... or see -EAGAIN, acquire the lock to check the message
@@ -1192,7 +1224,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
*/
ipc_lock_object(&msq->q_perm);

- msg = msr_d.r_msg;
+ msg = READ_ONCE(msr_d.r_msg);
if (msg != ERR_PTR(-EAGAIN))
goto out_unlock0;

--
2.21.0

2019-10-12 05:55:55

by Manfred Spraul

Subject: [PATCH 2/6] ipc/mqueue.c: Remove duplicated code

Patch entirely from Davidlohr:
pipelined_send() and pipelined_receive() are identical, so merge them.

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
---
ipc/mqueue.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 3d920ff15c80..be48c0ba92f7 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -918,17 +918,12 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
* The same algorithm is used for senders.
*/

-/* pipelined_send() - send a message directly to the task waiting in
- * sys_mq_timedreceive() (without inserting message into a queue).
- */
-static inline void pipelined_send(struct wake_q_head *wake_q,
+static inline void __pipelined_op(struct wake_q_head *wake_q,
struct mqueue_inode_info *info,
- struct msg_msg *message,
- struct ext_wait_queue *receiver)
+ struct ext_wait_queue *this)
{
- receiver->msg = message;
- list_del(&receiver->list);
- wake_q_add(wake_q, receiver->task);
+ list_del(&this->list);
+ wake_q_add(wake_q, this->task);
/*
* Rely on the implicit cmpxchg barrier from wake_q_add such
* that we can ensure that updating receiver->state is the last
@@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
* yet, at that point we can later have a use-after-free
* condition and bogus wakeup.
*/
- receiver->state = STATE_READY;
+ this->state = STATE_READY;
+}
+
+/* pipelined_send() - send a message directly to the task waiting in
+ * sys_mq_timedreceive() (without inserting message into a queue).
+ */
+static inline void pipelined_send(struct wake_q_head *wake_q,
+ struct mqueue_inode_info *info,
+ struct msg_msg *message,
+ struct ext_wait_queue *receiver)
+{
+ receiver->msg = message;
+ __pipelined_op(wake_q, info, receiver);
}

/* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
@@ -955,9 +962,7 @@ static inline void pipelined_receive(struct wake_q_head *wake_q,
if (msg_insert(sender->msg, info))
return;

- list_del(&sender->list);
- wake_q_add(wake_q, sender->task);
- sender->state = STATE_READY;
+ __pipelined_op(wake_q, info, sender);
}

static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
--
2.21.0

2019-10-14 06:38:25

by Davidlohr Bueso

Subject: Re: [PATCH 2/6] ipc/mqueue.c: Remove duplicated code

On Sat, 12 Oct 2019, Manfred Spraul wrote:

>Patch entirely from Davidlohr:
>pipelined_send() and pipelined_receive() are identical, so merge them.
>

Sorry, yeah feel free to add my:

Signed-off-by: Davidlohr Bueso <[email protected]>

>Signed-off-by: Manfred Spraul <[email protected]>
>Cc: Davidlohr Bueso <[email protected]>
>---
> ipc/mqueue.c | 31 ++++++++++++++++++-------------
> 1 file changed, 18 insertions(+), 13 deletions(-)
>
>diff --git a/ipc/mqueue.c b/ipc/mqueue.c
>index 3d920ff15c80..be48c0ba92f7 100644
>--- a/ipc/mqueue.c
>+++ b/ipc/mqueue.c
>@@ -918,17 +918,12 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> * The same algorithm is used for senders.
> */
>
>-/* pipelined_send() - send a message directly to the task waiting in
>- * sys_mq_timedreceive() (without inserting message into a queue).
>- */
>-static inline void pipelined_send(struct wake_q_head *wake_q,
>+static inline void __pipelined_op(struct wake_q_head *wake_q,
> struct mqueue_inode_info *info,
>- struct msg_msg *message,
>- struct ext_wait_queue *receiver)
>+ struct ext_wait_queue *this)
> {
>- receiver->msg = message;
>- list_del(&receiver->list);
>- wake_q_add(wake_q, receiver->task);
>+ list_del(&this->list);
>+ wake_q_add(wake_q, this->task);
> /*
> * Rely on the implicit cmpxchg barrier from wake_q_add such
> * that we can ensure that updating receiver->state is the last
>@@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
> * yet, at that point we can later have a use-after-free
> * condition and bogus wakeup.
> */
>- receiver->state = STATE_READY;
>+ this->state = STATE_READY;
>+}
>+
>+/* pipelined_send() - send a message directly to the task waiting in
>+ * sys_mq_timedreceive() (without inserting message into a queue).
>+ */
>+static inline void pipelined_send(struct wake_q_head *wake_q,
>+ struct mqueue_inode_info *info,
>+ struct msg_msg *message,
>+ struct ext_wait_queue *receiver)
>+{
>+ receiver->msg = message;
>+ __pipelined_op(wake_q, info, receiver);
> }
>
> /* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
>@@ -955,9 +962,7 @@ static inline void pipelined_receive(struct wake_q_head *wake_q,
> if (msg_insert(sender->msg, info))
> return;
>
>- list_del(&sender->list);
>- wake_q_add(wake_q, sender->task);
>- sender->state = STATE_READY;
>+ __pipelined_op(wake_q, info, sender);
> }
>
> static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
>--
>2.21.0
>

2019-10-14 13:04:39

by Peter Zijlstra

Subject: Re: [PATCH 2/6] ipc/mqueue.c: Remove duplicated code

On Sat, Oct 12, 2019 at 07:49:54AM +0200, Manfred Spraul wrote:
> +static inline void __pipelined_op(struct wake_q_head *wake_q,
> struct mqueue_inode_info *info,
> + struct ext_wait_queue *this)
> {
> + list_del(&this->list);
> + wake_q_add(wake_q, this->task);
> /*
> * Rely on the implicit cmpxchg barrier from wake_q_add such
> * that we can ensure that updating receiver->state is the last
> @@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
> * yet, at that point we can later have a use-after-free
> * condition and bogus wakeup.
> */
> + this->state = STATE_READY;

^^^^^^^^^^ whitespace damage

> +}

2019-10-14 13:08:48

by Peter Zijlstra

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On Sat, Oct 12, 2019 at 07:49:58AM +0200, Manfred Spraul wrote:
> The documentation in memory-barriers.txt claims that
> smp_mb__{before,after}_atomic() are for atomic ops that do not return a
> value.
>
> This is misleading and doesn't match the example in atomic_t.txt,
> and e.g. smp_mb__before_atomic() may and is used together with
> cmpxchg_relaxed() in the wake_q code.
>
> The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
> RMW atomic operation to a full memory barrier.
> The return code of the atomic operation has no impact, so all of the
> following examples are valid:

The return value of atomic ops is relevant insofar as (traditionally)
all value-returning atomic ops already implied full barriers. That of
course changed when we added the _release/_acquire/_relaxed variants.
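
I.e. (illustrative only):

	atomic_add_return(1, &v);		/* fully ordered by itself */

	smp_mb__before_atomic();
	atomic_add_return_relaxed(1, &v);	/* relaxed variant, needs
						   the explicit upgrade */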

>
> 1)
> smp_mb__before_atomic();
> atomic_add();
>
> 2)
> smp_mb__before_atomic();
> atomic_xchg_relaxed();
>
> 3)
> smp_mb__before_atomic();
> atomic_fetch_add_relaxed();
>
> Invalid would be:
> smp_mb__before_atomic();
> atomic_set();
>
> Signed-off-by: Manfred Spraul <[email protected]>
> Cc: Waiman Long <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> Documentation/memory-barriers.txt | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..52076b057400 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> + These are for use with atomic RMW functions (such as add, subtract,
> + increment, decrement, failed conditional operations, ...) that do
> + not imply memory barriers, but where the code needs a memory barrier,
> + for example when used for reference counting.
>
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> + These are also used for atomic RMW bitop functions that do imply a full

s/do/do not/ ?

> + memory barrier (such as set_bit and clear_bit).


2019-10-14 19:34:51

by Manfred Spraul

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

Hello Peter,

On 10/14/19 3:03 PM, Peter Zijlstra wrote:
> On Sat, Oct 12, 2019 at 07:49:58AM +0200, Manfred Spraul wrote:
>> The documentation in memory-barriers.txt claims that
>> smp_mb__{before,after}_atomic() are for atomic ops that do not return a
>> value.
>>
>> This is misleading and doesn't match the example in atomic_t.txt,
>> and e.g. smp_mb__before_atomic() may and is used together with
>> cmpxchg_relaxed() in the wake_q code.
>>
>> The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
>> RMW atomic operation to a full memory barrier.
>> The return code of the atomic operation has no impact, so all of the
>> following examples are valid:
> The value return of atomic ops is relevant in so far that
> (traditionally) all value returning atomic ops already implied full
> barriers. That of course changed when we added
> _release/_acquire/_relaxed variants.
I've updated the Change description accordingly
>> 1)
>> smp_mb__before_atomic();
>> atomic_add();
>>
>> 2)
>> smp_mb__before_atomic();
>> atomic_xchg_relaxed();
>>
>> 3)
>> smp_mb__before_atomic();
>> atomic_fetch_add_relaxed();
>>
>> Invalid would be:
>> smp_mb__before_atomic();
>> atomic_set();
>>
>> Signed-off-by: Manfred Spraul <[email protected]>
>> Cc: Waiman Long <[email protected]>
>> Cc: Davidlohr Bueso <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> ---
>> Documentation/memory-barriers.txt | 11 ++++++-----
>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>> index 1adbb8a371c7..52076b057400 100644
>> --- a/Documentation/memory-barriers.txt
>> +++ b/Documentation/memory-barriers.txt
>> @@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
>> (*) smp_mb__before_atomic();
>> (*) smp_mb__after_atomic();
>>
>> - These are for use with atomic (such as add, subtract, increment and
>> - decrement) functions that don't return a value, especially when used for
>> - reference counting. These functions do not imply memory barriers.
>> + These are for use with atomic RMW functions (such as add, subtract,
>> + increment, decrement, failed conditional operations, ...) that do
>> + not imply memory barriers, but where the code needs a memory barrier,
>> + for example when used for reference counting.
>>
>> - These are also used for atomic bitop functions that do not return a
>> - value (such as set_bit and clear_bit).
>> + These are also used for atomic RMW bitop functions that do imply a full
> s/do/do not/ ?
Sorry, yes, of course
>> + memory barrier (such as set_bit and clear_bit).



Attachments:
0001-Update-Documentation-for-_-acquire-release-relaxed.patch (2.39 kB)

2019-10-14 23:47:40

by Davidlohr Bueso

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On Mon, 14 Oct 2019, Manfred Spraul wrote:
>I've updated the Change description accordingly

I continue to think my memory-barriers.txt change regarding failed
RMW operations is still good to have. Unless there are any strong
objections, could you also add that patch to the series?

Thanks,
Davidlohr

2019-10-15 05:37:02

by Davidlohr Bueso

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On Sat, 12 Oct 2019, Manfred Spraul wrote:
>Invalid would be:
> smp_mb__before_atomic();
> atomic_set();

fyi I've caught a couple of naughty users:

drivers/crypto/cavium/nitrox/nitrox_main.c
drivers/gpu/drm/msm/adreno/a5xx_preempt.c

Thanks,
Davidlohr

2019-10-15 07:30:52

by Peter Zijlstra

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On Mon, Oct 14, 2019 at 06:26:04PM -0700, Davidlohr Bueso wrote:
> On Sat, 12 Oct 2019, Manfred Spraul wrote:
> > Invalid would be:
> > smp_mb__before_atomic();
> > atomic_set();
>
> fyi I've caught a couple of naughty users:
>
> drivers/crypto/cavium/nitrox/nitrox_main.c
> drivers/gpu/drm/msm/adreno/a5xx_preempt.c

Yes, there's still some of that. Andrea went and killed a bunch a while
ago, I think.

2019-10-15 07:46:43

by Peter Zijlstra

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On Mon, Oct 14, 2019 at 07:49:56PM +0200, Manfred Spraul wrote:

> From 61c85a56994e32ea393af9debef4cccd9cd24abd Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <[email protected]>
> Date: Fri, 11 Oct 2019 10:33:26 +0200
> Subject: [PATCH] Update Documentation for _{acquire|release|relaxed}()
>
> When adding the _{acquire|release|relaxed}() variants of some atomic
> operations, it was forgotten to update Documentation/memory_barrier.txt:
>
> smp_mb__{before,after}_atomic() is now indended for all RMW operations

"intended"

Otherwise it looks fine, thanks!

2019-10-15 20:19:08

by Davidlohr Bueso

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On Tue, 15 Oct 2019, Peter Zijlstra wrote:

>On Mon, Oct 14, 2019 at 06:26:04PM -0700, Davidlohr Bueso wrote:
>> On Sat, 12 Oct 2019, Manfred Spraul wrote:
>> > Invalid would be:
>> > smp_mb__before_atomic();
>> > atomic_set();
>>
>> fyi I've caught a couple of naughty users:
>>
>> drivers/crypto/cavium/nitrox/nitrox_main.c
>> drivers/gpu/drm/msm/adreno/a5xx_preempt.c
>
>Yes, there's still some of that. Andrea went and killed a buch a while
>ago I think.

I sent these, which just use smp_mb():

https://lore.kernel.org/lkml/[email protected]
https://lore.kernel.org/lkml/20191015162144.fuyc25tdwvc6ddu3@linux-p48b

Similarly, I was doing some barrier auditing in btrfs code recently
(completely unrelated to these topics) and noted that there are some
cases where we can invert this exercise. IOW, callers doing an atomic
RMW along with an smp_mb(), which we can replace with these upgradable
calls and benefit, for example on x86.
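
Something like this (hypothetical, the flag and variable names are
made up):

	/* before */
	set_bit(MY_FLAG, &obj->flags);
	smp_mb();

	/* after: cheaper on x86, where the locked RMW already implies
	   a full barrier */
	set_bit(MY_FLAG, &obj->flags);
	smp_mb__after_atomic();
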

Thanks,
Davidlohr

2019-10-16 03:45:22

by Waiman Long

Subject: Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()

On 10/14/19 1:49 PM, Manfred Spraul wrote:
> Hello Peter,
>
> On 10/14/19 3:03 PM, Peter Zijlstra wrote:
>> On Sat, Oct 12, 2019 at 07:49:58AM +0200, Manfred Spraul wrote:
>>> The documentation in memory-barriers.txt claims that
>>> smp_mb__{before,after}_atomic() are for atomic ops that do not return a
>>> value.
>>>
>>> This is misleading and doesn't match the example in atomic_t.txt,
>>> and e.g. smp_mb__before_atomic() may and is used together with
>>> cmpxchg_relaxed() in the wake_q code.
>>>
>>> The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
>>> RMW atomic operation to a full memory barrier.
>>> The return code of the atomic operation has no impact, so all of the
>>> following examples are valid:
>> The value return of atomic ops is relevant in so far that
>> (traditionally) all value returning atomic ops already implied full
>> barriers. That of course changed when we added
>> _release/_acquire/_relaxed variants.
> I've updated the Change description accordingly
>>> 1)
>>>     smp_mb__before_atomic();
>>>     atomic_add();
>>>
>>> 2)
>>>     smp_mb__before_atomic();
>>>     atomic_xchg_relaxed();
>>>
>>> 3)
>>>     smp_mb__before_atomic();
>>>     atomic_fetch_add_relaxed();
>>>
>>> Invalid would be:
>>>     smp_mb__before_atomic();
>>>     atomic_set();
>>>
>>> Signed-off-by: Manfred Spraul <[email protected]>
>>> Cc: Waiman Long <[email protected]>
>>> Cc: Davidlohr Bueso <[email protected]>
>>> Cc: Peter Zijlstra <[email protected]>
>>> ---
>>>   Documentation/memory-barriers.txt | 11 ++++++-----
>>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/memory-barriers.txt
>>> b/Documentation/memory-barriers.txt
>>> index 1adbb8a371c7..52076b057400 100644
>>> --- a/Documentation/memory-barriers.txt
>>> +++ b/Documentation/memory-barriers.txt
>>> @@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
>>>    (*) smp_mb__before_atomic();
>>>    (*) smp_mb__after_atomic();
>>>   -     These are for use with atomic (such as add, subtract,
>>> increment and
>>> -     decrement) functions that don't return a value, especially
>>> when used for
>>> -     reference counting.  These functions do not imply memory
>>> barriers.
>>> +     These are for use with atomic RMW functions (such as add,
>>> subtract,
>>> +     increment, decrement, failed conditional operations, ...) that do
>>> +     not imply memory barriers, but where the code needs a memory
>>> barrier,
>>> +     for example when used for reference counting.
>>>   -     These are also used for atomic bitop functions that do not
>>> return a
>>> -     value (such as set_bit and clear_bit).
>>> +     These are also used for atomic RMW bitop functions that do
>>> imply a full
>> s/do/do not/ ?
> Sorry, yes, of course

I was wondering the same thing. With the revised patch,

Acked-by: Waiman Long <[email protected]>

>>> +     memory barrier (such as set_bit and clear_bit).
>
>