2019-10-20 12:33:49

by Manfred Spraul

Subject: [PATCH 0/5] V3: Clarify/standardize memory barriers for ipc

Hi,

Updated series, based on input from Davidlohr and Peter Zijlstra:

- I've dropped the documentation update for wake_q_add, as what it
states is the normal rule: when you call a function and pass it a
pointer to a structure, you as the caller are responsible for ensuring
that the pointer is valid, and remains valid for the duration of the
function call, including any tearing due to memory reordering.
In addition, I've switched ipc to wake_q_add_safe(). (A condensed
sketch of the old vs. new pattern follows after this list.)

- The patch to Documentation/memory-barriers.txt is now the first change.
@Davidlohr: You proposed to have two paragraphs: first one for
add/subtract, then one for failed cmpxchg. I didn't like that:
we have one rule (the functions can be combined with non-mb RMW ops),
and then examples of what non-mb RMW ops are. Listing special cases
just asks for issues later.
What I don't know is whether there should be examples at all in
Documentation/memory-barriers.txt, or just
"See Documentation/atomic_t.txt for examples of RMW ops that
do not contain a memory barrier"

- For the memory barrier pairs in ipc/<whatever>, I have now added
/* See ABC_BARRIER for purpose/pairing */ as the standard comment,
and then a block near the relevant structure where purpose, pairing,
races, ... are explained. I think this makes it easier to read,
compared to adding it to both the _release and _acquire branches.
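
For reference, a condensed sketch of the conversion (old wake_q_add()
pattern vs. the wake_q_add_safe() pattern; the names are taken from the
sem.c hunk in patch 5):

	/* old: wake_q_add() takes its own reference on the task */
	wake_q_add(wake_q, q->sleeper);
	WRITE_ONCE(q->status, error);

	/* new: the caller takes the reference itself, then publishes the
	 * result; wake_q_add_safe() consumes that reference */
	get_task_struct(q->sleeper);
	smp_store_release(&q->status, error);
	wake_q_add_safe(wake_q, q->sleeper);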

Description/purpose:

The memory barriers in ipc are not properly documented, and, at least
for some architectures, insufficient:
Reading xyz->status is only a control barrier, thus
smp_acquire__after_ctrl_dep() was missing in mqueue.c and msg.c.
sem.c contained a full smp_mb(), which is not required.
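
In condensed form (taken from the do_semtimedop() hunk in patch 5), the
lockless exit path changes from a full smp_mb() after the READ_ONCE()
to:

	error = READ_ONCE(queue.status);
	if (error != -EINTR) {
		/* The control dependency only orders later stores.
		 * Upgrade it to an ACQUIRE so that data written by the
		 * waker cannot be read before queue.status was observed.
		 */
		smp_acquire__after_ctrl_dep();
		goto out_free;
	}

mqueue.c and msg.c gain the same smp_acquire__after_ctrl_dep() in their
lockless receive paths, where it was missing entirely.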

Patches:
Patch 1: Documentation for smp_mb__{before,after}_atomic().

Patch 2: Remove code duplication inside ipc/mqueue.c

Patches 3-5: Update the ipc code, especially add the missing
smp_mb__after_ctrl_dep() and switch to wake_q_add_safe().

Patch 1 clarifies that smp_mb__{before,after}_atomic() are compatible with
all RMW atomic operations, not just the operations that do not return a value.

Open issues:
- More testing. I did some tests, but I doubt that they are sufficient
to expose issues caused by incorrect memory barriers.

What do you think?

--
Manfred


2019-10-20 12:33:52

by Manfred Spraul

Subject: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation

When adding the _{acquire|release|relaxed}() variants of some atomic
operations, it was forgotten to update Documentation/memory_barrier.txt:

smp_mb__{before,after}_atomic() is now intended for all RMW operations
that do not imply a memory barrier.

1)
smp_mb__before_atomic();
atomic_add();

2)
smp_mb__before_atomic();
atomic_xchg_relaxed();

3)
smp_mb__before_atomic();
atomic_fetch_add_relaxed();

Invalid would be:
smp_mb__before_atomic();
atomic_set();
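
The classic reference-counting example from memory-barriers.txt
illustrates the intended use and remains valid under the clarified
wording:

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

This makes sure that the death mark on the object is perceived to be set
*before* the reference counter is decremented.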

In addition, the patch splits the long sentence into multiple shorter
sentences.

Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")

Signed-off-by: Manfred Spraul <[email protected]>
Acked-by: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
---
Documentation/memory-barriers.txt | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..fe43f4b30907 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,16 @@ There are some more advanced barrier functions:
(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();

- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting. These functions do not imply memory barriers.
-
- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
+ These are for use with atomic RMW functions that do not imply memory
+ barriers, but where the code needs a memory barrier. Examples for atomic
+ RMW functions that do not imply are memory barrier are e.g. add,
+ subtract, (failed) conditional operations, _relaxed functions,
+ but not atomic_read or atomic_set. A common example where a memory
+ barrier may be required is when atomic ops are used for reference
+ counting.
+
+ These are also used for atomic RMW bitop functions that do not imply a
+ memory barrier (such as set_bit and clear_bit).

As an example, consider a piece of code that marks an object as being dead
and then decrements the object's reference count:
--
2.21.0

2019-10-20 12:34:00

by Manfred Spraul

Subject: [PATCH 5/5] ipc/sem.c: Document and update memory barriers

The patch documents and updates the memory barriers in ipc/sem.c:
- Add smp_store_release() to wake_up_sem_queue_prepare() and
document why it is needed.

- Read q->status using READ_ONCE() + smp_acquire__after_ctrl_dep()
as the pair for the barrier inside wake_up_sem_queue_prepare()
(a condensed sketch of the pairing follows below).

- Add comments to all barriers, and mention the rules in the block
regarding locking.

- Switch to using wake_q_add_safe().
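
Condensed, the resulting release/acquire pair looks like this (a sketch
only, see the diff below for the full context):

	/* waker, wake_up_sem_queue_prepare(): */
	get_task_struct(q->sleeper);
	smp_store_release(&q->status, error);	/* RELEASE */
	wake_q_add_safe(wake_q, q->sleeper);

	/* sleeper, do_semtimedop(), lockless exit path: */
	error = READ_ONCE(queue.status);
	if (error != -EINTR) {
		smp_acquire__after_ctrl_dep();	/* ACQUIRE */
		goto out_free;
	}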

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
---
ipc/sem.c | 66 ++++++++++++++++++++++++++++++++++---------------------
1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index ec97a7072413..c89734b200c6 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -205,15 +205,38 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
*
* Memory ordering:
* Most ordering is enforced by using spin_lock() and spin_unlock().
- * The special case is use_global_lock:
+ *
+ * Exceptions:
+ * 1) use_global_lock: (SEM_BARRIER_1)
* Setting it from non-zero to 0 is a RELEASE, this is ensured by
- * using smp_store_release().
+ * using smp_store_release(): Immediately after setting it to 0,
+ * a simple op can start.
* Testing if it is non-zero is an ACQUIRE, this is ensured by using
* smp_load_acquire().
* Setting it from 0 to non-zero must be ordered with regards to
* this smp_load_acquire(), this is guaranteed because the smp_load_acquire()
* is inside a spin_lock() and after a write from 0 to non-zero a
* spin_lock()+spin_unlock() is done.
+ *
+ * 2) queue.status: (SEM_BARRIER_2)
+ * Initialization is done while holding sem_lock(), so no further barrier is
+ * required.
+ * Setting it to a result code is a RELEASE, this is ensured by both a
+ * smp_store_release() (for case a) and while holding sem_lock()
+ * (for case b).
+ * The ACQUIRE when reading the result code without holding sem_lock() is
+ * achieved by using READ_ONCE() + smp_acquire__after_ctrl_dep().
+ * (case a above).
+ * Reading the result code while holding sem_lock() needs no further barriers,
+ * the locks inside sem_lock() enforce ordering (case b above).
+ *
+ * 3) current->state:
+ * current->state is set to TASK_INTERRUPTIBLE while holding sem_lock().
+ * The wakeup is handled using the wake_q infrastructure. wake_q wakeups may
+ * happen immediately after calling wake_q_add. As wake_q_add_safe() is called
+ * when holding sem_lock(), no further barriers are required.
+ *
+ * See also ipc/mqueue.c for more details on the covered races.
*/

#define sc_semmsl sem_ctls[0]
@@ -344,12 +367,8 @@ static void complexmode_tryleave(struct sem_array *sma)
return;
}
if (sma->use_global_lock == 1) {
- /*
- * Immediately after setting use_global_lock to 0,
- * a simple op can start. Thus: all memory writes
- * performed by the current operation must be visible
- * before we set use_global_lock to 0.
- */
+
+ /* See SEM_BARRIER_1 for purpose/pairing */
smp_store_release(&sma->use_global_lock, 0);
} else {
sma->use_global_lock--;
@@ -400,7 +419,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
*/
spin_lock(&sem->lock);

- /* pairs with smp_store_release() */
+ /* see SEM_BARRIER_1 for purpose/pairing */
if (!smp_load_acquire(&sma->use_global_lock)) {
/* fast path successful! */
return sops->sem_num;
@@ -766,15 +785,12 @@ static int perform_atomic_semop(struct sem_array *sma, struct sem_queue *q)
static inline void wake_up_sem_queue_prepare(struct sem_queue *q, int error,
struct wake_q_head *wake_q)
{
- wake_q_add(wake_q, q->sleeper);
- /*
- * Rely on the above implicit barrier, such that we can
- * ensure that we hold reference to the task before setting
- * q->status. Otherwise we could race with do_exit if the
- * task is awoken by an external event before calling
- * wake_up_process().
- */
- WRITE_ONCE(q->status, error);
+ get_task_struct(q->sleeper);
+
+ /* see SEM_BARRIER_2 for purpose/pairing */
+ smp_store_release(&q->status, error);
+
+ wake_q_add_safe(wake_q, q->sleeper);
}

static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
@@ -2148,9 +2164,11 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
}

do {
+ /* memory ordering ensured by the lock in sem_lock() */
WRITE_ONCE(queue.status, -EINTR);
queue.sleeper = current;

+ /* memory ordering is ensured by the lock in sem_lock() */
__set_current_state(TASK_INTERRUPTIBLE);
sem_unlock(sma, locknum);
rcu_read_unlock();
@@ -2173,13 +2191,8 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
*/
error = READ_ONCE(queue.status);
if (error != -EINTR) {
- /*
- * User space could assume that semop() is a memory
- * barrier: Without the mb(), the cpu could
- * speculatively read in userspace stale data that was
- * overwritten by the previous owner of the semaphore.
- */
- smp_mb();
+ /* see SEM_BARRIER_2 for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
goto out_free;
}

@@ -2189,6 +2202,9 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
if (!ipc_valid_object(&sma->sem_perm))
goto out_unlock_free;

+ /*
+ * No necessity for any barrier: we are protected by sem_lock().
+ */
error = READ_ONCE(queue.status);

/*
--
2.21.0

2019-10-20 12:34:01

by Manfred Spraul

Subject: [PATCH 4/5] ipc/msg.c: Update and document memory barriers.

Transfer findings from ipc/mqueue.c:
- A control barrier was missing for the lockless receive case.
So in theory, not-yet-initialized data may have been copied
to user space - obviously only for architectures where
control barriers are not a NOP.

- Use smp_store_release(). In theory, the refcount
may have been decreased to 0 already when wake_q_add()
tries to get a reference.

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 43 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 8dec945fa030..192a9291a8ab 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -61,6 +61,16 @@ struct msg_queue {
struct list_head q_senders;
} __randomize_layout;

+/*
+ * MSG_BARRIER Locking:
+ *
+ * Similar to the optimization used in ipc/mqueue.c, one syscall return path
+ * does not acquire any locks when it sees that a message exists in
+ * msg_receiver.r_msg. Therefore r_msg is set using smp_store_release()
+ * and accessed using READ_ONCE()+smp_acquire__after_ctrl_dep(). In addition,
+ * wake_q_add_safe() is used. See ipc/mqueue.c for more details.
+ */
+
/* one msg_receiver structure for each sleeping receiver */
struct msg_receiver {
struct list_head r_list;
@@ -184,6 +194,10 @@ static inline void ss_add(struct msg_queue *msq,
{
mss->tsk = current;
mss->msgsz = msgsz;
+ /*
+ * No memory barrier required: we did ipc_lock_object(),
+ * and the waker obtains that lock before calling wake_q_add().
+ */
__set_current_state(TASK_INTERRUPTIBLE);
list_add_tail(&mss->list, &msq->q_senders);
}
@@ -237,8 +251,11 @@ static void expunge_all(struct msg_queue *msq, int res,
struct msg_receiver *msr, *t;

list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
- wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, ERR_PTR(res));
+ get_task_struct(msr->r_tsk);
+
+ /* see MSG_BARRIER for purpose/pairing */
+ smp_store_release(&msr->r_msg, ERR_PTR(res));
+ wake_q_add_safe(wake_q, msr->r_tsk);
}
}

@@ -798,13 +815,17 @@ static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg,
list_del(&msr->r_list);
if (msr->r_maxsize < msg->m_ts) {
wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, ERR_PTR(-E2BIG));
+
+ /* See expunge_all regarding memory barrier */
+ smp_store_release(&msr->r_msg, ERR_PTR(-E2BIG));
} else {
ipc_update_pid(&msq->q_lrpid, task_pid(msr->r_tsk));
msq->q_rtime = ktime_get_real_seconds();

wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, msg);
+
+ /* See expunge_all regarding memory barrier */
+ smp_store_release(&msr->r_msg, msg);
return 1;
}
}
@@ -1154,7 +1175,11 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
msr_d.r_maxsize = INT_MAX;
else
msr_d.r_maxsize = bufsz;
- msr_d.r_msg = ERR_PTR(-EAGAIN);
+
+ /* memory barrier not required due to ipc_lock_object() */
+ WRITE_ONCE(msr_d.r_msg, ERR_PTR(-EAGAIN));
+
+ /* memory barrier not required, we own ipc_lock_object() */
__set_current_state(TASK_INTERRUPTIBLE);

ipc_unlock_object(&msq->q_perm);
@@ -1183,8 +1208,12 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
* signal) it will either see the message and continue ...
*/
msg = READ_ONCE(msr_d.r_msg);
- if (msg != ERR_PTR(-EAGAIN))
+ if (msg != ERR_PTR(-EAGAIN)) {
+ /* see MSG_BARRIER for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
+
goto out_unlock1;
+ }

/*
* ... or see -EAGAIN, acquire the lock to check the message
@@ -1192,7 +1221,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
*/
ipc_lock_object(&msq->q_perm);

- msg = msr_d.r_msg;
+ msg = READ_ONCE(msr_d.r_msg);
if (msg != ERR_PTR(-EAGAIN))
goto out_unlock0;

--
2.21.0

2019-10-20 12:35:20

by Manfred Spraul

Subject: [PATCH 2/5] ipc/mqueue.c: Remove duplicated code

Patch from Davidlohr, I just added this change log.
pipelined_send() and pipelined_receive() are identical, so merge them.

Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/mqueue.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 3d920ff15c80..270456530f6a 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -918,17 +918,12 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
* The same algorithm is used for senders.
*/

-/* pipelined_send() - send a message directly to the task waiting in
- * sys_mq_timedreceive() (without inserting message into a queue).
- */
-static inline void pipelined_send(struct wake_q_head *wake_q,
+static inline void __pipelined_op(struct wake_q_head *wake_q,
struct mqueue_inode_info *info,
- struct msg_msg *message,
- struct ext_wait_queue *receiver)
+ struct ext_wait_queue *this)
{
- receiver->msg = message;
- list_del(&receiver->list);
- wake_q_add(wake_q, receiver->task);
+ list_del(&this->list);
+ wake_q_add(wake_q, this->task);
/*
* Rely on the implicit cmpxchg barrier from wake_q_add such
* that we can ensure that updating receiver->state is the last
@@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
* yet, at that point we can later have a use-after-free
* condition and bogus wakeup.
*/
- receiver->state = STATE_READY;
+ this->state = STATE_READY;
+}
+
+/* pipelined_send() - send a message directly to the task waiting in
+ * sys_mq_timedreceive() (without inserting message into a queue).
+ */
+static inline void pipelined_send(struct wake_q_head *wake_q,
+ struct mqueue_inode_info *info,
+ struct msg_msg *message,
+ struct ext_wait_queue *receiver)
+{
+ receiver->msg = message;
+ __pipelined_op(wake_q, info, receiver);
}

/* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
@@ -955,9 +962,7 @@ static inline void pipelined_receive(struct wake_q_head *wake_q,
if (msg_insert(sender->msg, info))
return;

- list_del(&sender->list);
- wake_q_add(wake_q, sender->task);
- sender->state = STATE_READY;
+ __pipelined_op(wake_q, info, sender);
}

static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
--
2.21.0

2019-10-20 12:35:54

by Manfred Spraul

Subject: [PATCH 3/5] ipc/mqueue.c: Update/document memory barriers

Update and document memory barriers for mqueue.c:
- ewp->state is read without any locks, thus READ_ONCE is required.

- add smp_acquire__after_ctrl_dep() after the READ_ONCE(); we need
acquire semantics if the value is STATE_READY.

- use wake_q_add_safe()

- document why __set_current_state() may be used:
Reading task->state cannot happen before the wake_q_add() call,
which happens while holding info->lock. Thus the spin_unlock()
is the RELEASE, and the spin_lock() is the ACQUIRE (a condensed
sketch of this case follows after the list below).

For completeness: there is also a 3-CPU scenario, if the task to be
woken up is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
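
A condensed sketch of the common 2-CPU case, using the names from the
mqueue.c diff (the actual wakeup happens later, via wake_up_q(), after
info->lock has been dropped):

	/* sleeper, wq_sleep(), info->lock held: */
	__set_current_state(TASK_INTERRUPTIBLE);
	spin_unlock(&info->lock);		/* RELEASE */
	schedule_hrtimeout_range_clock(...);

	/* waker, __pipelined_op(), called with info->lock held,
	 * i.e. after the pairing spin_lock(&info->lock) ACQUIRE */
	smp_store_release(&this->state, STATE_READY);
	wake_q_add_safe(wake_q, this->task);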

Signed-off-by: Manfred Spraul <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Cc: Waiman Long <[email protected]>
---
ipc/mqueue.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 78 insertions(+), 14 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 270456530f6a..49a05ba3000d 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -63,6 +63,66 @@ struct posix_msg_tree_node {
int priority;
};

+/*
+ * Locking:
+ *
+ * Accesses to a message queue are synchronized by acquiring info->lock.
+ *
+ * There are two notable exceptions:
+ * - The actual wakeup of a sleeping task is performed using the wake_q
+ * framework. info->lock is already released when wake_up_q is called.
+ * - The exit codepaths after sleeping check ext_wait_queue->state without
+ * any locks. If it is STATE_READY, then the syscall is completed without
+ * acquiring info->lock.
+ *
+ * MQ_BARRIER:
+ * To achieve proper release/acquire memory barrier pairing, the state is set to
+ * STATE_READY with smp_store_release(), and it is read with READ_ONCE followed
+ * by smp_acquire__after_ctrl_dep(). In addition, wake_q_add_safe() is used.
+ *
+ * This prevents the following races:
+ *
+ * 1) With the simple wake_q_add(), the task could be gone already before
+ *    the increase of the reference happens
+ * Thread A
+ *                                  Thread B
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ *                                  wake_q_add(A)
+ *                                  if (cmpxchg()) // success
+ *                                     ->state = STATE_READY (reordered)
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * sysret to user space
+ * sys_exit()
+ *                                  get_task_struct() // UaF
+ *
+ * Solution: Use wake_q_add_safe() and perform the get_task_struct() before
+ * the smp_store_release() that does ->state = STATE_READY.
+ *
+ * 2) Without proper _release/_acquire barriers, the woken up task
+ *    could read stale data
+ *
+ * Thread A
+ *                                  Thread B
+ * do_mq_timedreceive
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ *                                  state = STATE_READY;
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * msg_ptr = wait.msg;              // Access to stale data!
+ *                                  receiver->msg = message; (reordered)
+ *
+ * Solution: use _release and _acquire barriers.
+ *
+ * 3) There is intentionally no barrier when setting current->state
+ * to TASK_INTERRUPTIBLE: spin_unlock(&info->lock) provides the
+ * release memory barrier, and the wakeup is triggered when holding
+ * info->lock, i.e. spin_lock(&info->lock) provides a pairing
+ * acquire memory barrier.
+ */
+
struct ext_wait_queue { /* queue of sleeping tasks */
struct task_struct *task;
struct list_head list;
@@ -646,18 +706,23 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr,
wq_add(info, sr, ewp);

for (;;) {
+ /* memory barrier not required, we hold info->lock */
__set_current_state(TASK_INTERRUPTIBLE);

spin_unlock(&info->lock);
time = schedule_hrtimeout_range_clock(timeout, 0,
HRTIMER_MODE_ABS, CLOCK_REALTIME);

- if (ewp->state == STATE_READY) {
+ if (READ_ONCE(ewp->state) == STATE_READY) {
+ /* see MQ_BARRIER for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
retval = 0;
goto out;
}
spin_lock(&info->lock);
- if (ewp->state == STATE_READY) {
+
+ /* we hold info->lock, so no memory barrier required */
+ if (READ_ONCE(ewp->state) == STATE_READY) {
retval = 0;
goto out_unlock;
}
@@ -923,16 +988,11 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
struct ext_wait_queue *this)
{
list_del(&this->list);
- wake_q_add(wake_q, this->task);
- /*
- * Rely on the implicit cmpxchg barrier from wake_q_add such
- * that we can ensure that updating receiver->state is the last
- * write operation: As once set, the receiver can continue,
- * and if we don't have the reference count from the wake_q,
- * yet, at that point we can later have a use-after-free
- * condition and bogus wakeup.
- */
- this->state = STATE_READY;
+ get_task_struct(this->task);
+
+ /* see MQ_BARRIER for purpose/pairing */
+ smp_store_release(&this->state, STATE_READY);
+ wake_q_add_safe(wake_q, this->task);
}

/* pipelined_send() - send a message directly to the task waiting in
@@ -1049,7 +1109,9 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
} else {
wait.task = current;
wait.msg = (void *) msg_ptr;
- wait.state = STATE_NONE;
+
+ /* memory barrier not required, we hold info->lock */
+ WRITE_ONCE(wait.state, STATE_NONE);
ret = wq_sleep(info, SEND, timeout, &wait);
/*
* wq_sleep must be called with info->lock held, and
@@ -1152,7 +1214,9 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
ret = -EAGAIN;
} else {
wait.task = current;
- wait.state = STATE_NONE;
+
+ /* memory barrier not required, we hold info->lock */
+ WRITE_ONCE(wait.state, STATE_NONE);
ret = wq_sleep(info, RECV, timeout, &wait);
msg_ptr = wait.msg;
}
--
2.21.0

2019-10-23 00:15:19

by Andrew Morton

Subject: Re: [PATCH 2/5] ipc/mqueue.c: Remove duplicated code

On Sun, 20 Oct 2019 14:33:02 +0200 Manfred Spraul <[email protected]> wrote:

> Patch from Davidlohr, I just added this change log.
> pipelined_send() and pipelined_receive() are identical, so merge them.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> Signed-off-by: Manfred Spraul <[email protected]>

I redid the changelog as below:

From: Davidlohr Bueso <[email protected]>
Subject: ipc/mqueue.c: remove duplicated code

pipelined_send() and pipelined_receive() are identical, so merge them.

[[email protected]: add changelog]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Manfred Spraul <[email protected]>


2019-11-01 17:23:57

by Will Deacon

Subject: Re: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation

Hi Manfred,

On Sun, Oct 20, 2019 at 02:33:01PM +0200, Manfred Spraul wrote:
> When adding the _{acquire|release|relaxed}() variants of some atomic
> operations, it was forgotten to update Documentation/memory_barrier.txt:
>
> smp_mb__{before,after}_atomic() is now intended for all RMW operations
> that do not imply a memory barrier.

[...]

> Documentation/memory-barriers.txt | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..fe43f4b30907 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1873,12 +1873,16 @@ There are some more advanced barrier functions:
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> -
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> + These are for use with atomic RMW functions that do not imply memory
> + barriers, but where the code needs a memory barrier. Examples for atomic
> + RMW functions that do not imply are memory barrier are e.g. add,

typo: "are memory barrier"

> + subtract, (failed) conditional operations, _relaxed functions,
> + but not atomic_read or atomic_set. A common example where a memory
> + barrier may be required is when atomic ops are used for reference
> + counting.
> +
> + These are also used for atomic RMW bitop functions that do not imply a
> + memory barrier (such as set_bit and clear_bit).

Although I think this is correct, I really think we should instead refer to
Documentation/atomic_t.txt and get rid of this out of memory-barriers.txt
entirely. It's just duplication and is out of date.

Will

2019-11-06 19:24:42

by Manfred Spraul

Subject: Re: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation

Hi Will,

On 11/1/19 5:49 PM, Will Deacon wrote:
> Hi Manfred,
>
> On Sun, Oct 20, 2019 at 02:33:01PM +0200, Manfred Spraul wrote:
>> When adding the _{acquire|release|relaxed}() variants of some atomic
>> operations, it was forgotten to update Documentation/memory_barrier.txt:
>>
>> smp_mb__{before,after}_atomic() is now intended for all RMW operations
>> that do not imply a memory barrier.
> [...]
>
> Although I think this is correct, I really think we should instead refer to
> Documentation/atomic_t.txt and get rid of this out of memory-barriers.txt
> entirely. It's just duplication and is out of date.

So you would prefer the attached patch?

For me this would be fine, too.


--

    Manfred


Attachments:
0001-smp_mb__-before-after-_atomic-Update-Documentation.patch (2.35 kB)

2019-11-07 11:24:22

by Will Deacon

Subject: Re: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation

Hi Manfred,

On Wed, Nov 06, 2019 at 08:23:03PM +0100, Manfred Spraul wrote:
> From 8d2b219794221e3ef1a1ec90e0f4fe344af9a55d Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <[email protected]>
> Date: Fri, 11 Oct 2019 10:33:26 +0200
> Subject: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
>
> When adding the _{acquire|release|relaxed}() variants of some atomic
> operations, it was forgotten to update Documentation/memory_barrier.txt:
>
> smp_mb__before_atomic and smp_mb__after_atomic can be combined with
> all RMW operations that do not imply memory barriers.
>
> In order to avoid that this happens again:
> Remove the paragraph from Documentation/memory_barrier.txt, the functions
> are sufficiently documented in Documentation/atomic_{t,bitops}.txt
>
> Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
>
> Signed-off-by: Manfred Spraul <[email protected]>
> Acked-by: Waiman Long <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> Documentation/memory-barriers.txt | 20 +-------------------
> 1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..16dfb4cde1e1 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1873,25 +1873,7 @@ There are some more advanced barrier functions:
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> -
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> -
> - As an example, consider a piece of code that marks an object as being dead
> - and then decrements the object's reference count:
> -
> - obj->dead = 1;
> - smp_mb__before_atomic();
> - atomic_dec(&obj->ref_count);
> -
> - This makes sure that the death mark on the object is perceived to be set
> - *before* the reference counter is decremented.
> -
> - See Documentation/atomic_{t,bitops}.txt for more information.
> -
> + See Documentation/atomic_{t,bitops}.txt for information.
>
> (*) dma_wmb();
> (*) dma_rmb();

Thanks, I much prefer this approach:

Acked-by: Will Deacon <[email protected]>

Will