2015-06-06 13:38:18

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH -next 0/5] ipc: EIDRM/EINVAL returns & misc updates

Hello,

Patches 1,2: Are a resend, I've incorporated them to the set,
based on Manfred's comments.

Patch 3: is a trivial function rename.

Patches 4,5: are attempts to order how Linux ipc deals with EIDRM
and EINVAL return error codes. By looking at corresponding manpages
two possible inverted return codes are returned, these patches
make the manpages accurate now -- but I may have missed something,
and we are changing semantics. afaik EIDRM is specific to Linux
(other OSes only rely on EINVAL), which is already messy, so lets
try to make this consistent at least.

Passes all ipc related ltp tests.

Thanks!

Davidlohr Bueso (5):
ipc,shm: move BUG_ON check into shm_lock
ipc,msg: provide barrier pairings for lockless receive
ipc: rename ipc_obtain_object
ipc,sysv: make return -EIDRM when racing with RMID consistent
ipc,sysv: return -EINVAL upon incorrect id/seqnum

ipc/msg.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
ipc/sem.c | 4 ++--
ipc/shm.c | 13 ++++++++-----
ipc/util.c | 23 +++++++++++++----------
ipc/util.h | 2 +-
5 files changed, 63 insertions(+), 29 deletions(-)

--
2.1.4


2015-06-06 13:38:30

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 1/5] ipc,shm: move BUG_ON check into shm_lock

Upon every shm_lock call, we BUG_ON if an error was returned,
indicating racing either in idr or in shm_destroy. Move this logic
into the locking.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/shm.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index 6d76707..6dbac3b 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -155,8 +155,14 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
{
struct kern_ipc_perm *ipcp = ipc_lock(&shm_ids(ns), id);

- if (IS_ERR(ipcp))
+ if (IS_ERR(ipcp)) {
+ /*
+ * We raced in the idr lookup or with shm_destroy(),
+ * either way, the ID is busted.
+ */
+ BUG();
return (struct shmid_kernel *)ipcp;
+ }

return container_of(ipcp, struct shmid_kernel, shm_perm);
}
@@ -191,7 +197,6 @@ static void shm_open(struct vm_area_struct *vma)
struct shmid_kernel *shp;

shp = shm_lock(sfd->ns, sfd->id);
- BUG_ON(IS_ERR(shp));
shp->shm_atim = get_seconds();
shp->shm_lprid = task_tgid_vnr(current);
shp->shm_nattch++;
@@ -258,7 +263,6 @@ static void shm_close(struct vm_area_struct *vma)
down_write(&shm_ids(ns).rwsem);
/* remove from the list of attaches of the shm segment */
shp = shm_lock(ns, sfd->id);
- BUG_ON(IS_ERR(shp));
shp->shm_lprid = task_tgid_vnr(current);
shp->shm_dtim = get_seconds();
shp->shm_nattch--;
@@ -1191,7 +1195,6 @@ out_fput:
out_nattch:
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
- BUG_ON(IS_ERR(shp));
shp->shm_nattch--;
if (shm_may_destroy(ns, shp))
shm_destroy(ns, shp);
--
2.1.4

2015-06-06 13:39:10

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 2/5] ipc,msg: provide barrier pairings for lockless receive

We currently use a full barrier on the sender side to
to avoid receiver tasks disappearing on us while still
performing on the sender side wakeup. We lack however,
the proper CPU-CPU interactions pairing on the receiver
side which busy-waits for the message. Similarly, we do
not need a full smp_mb, and can relax the semantics for
the writer and reader sides of the message. This is safe
as we are only ordering loads and stores to r_msg. And in
both smp_wmb and smp_rmb, there are no stores after the
calls _anyway_.

This obviously applies for pipelined_send and expunge_all,
for EIRDM when destroying a queue.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 48 ++++++++++++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 2b6fdbb..a9c3c51 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -196,7 +196,7 @@ static void expunge_all(struct msg_queue *msq, int res)
* or dealing with -EAGAIN cases. See lockless receive part 1
* and 2 in do_msgrcv().
*/
- smp_mb();
+ smp_wmb(); /* barrier (B) */
msr->r_msg = ERR_PTR(res);
}
}
@@ -580,7 +580,8 @@ static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg)
/* initialize pipelined send ordering */
msr->r_msg = NULL;
wake_up_process(msr->r_tsk);
- smp_mb(); /* see barrier comment below */
+ /* barrier (B) see barrier comment below */
+ smp_wmb();
msr->r_msg = ERR_PTR(-E2BIG);
} else {
msr->r_msg = NULL;
@@ -589,11 +590,12 @@ static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg)
wake_up_process(msr->r_tsk);
/*
* Ensure that the wakeup is visible before
- * setting r_msg, as the receiving end depends
- * on it. See lockless receive part 1 and 2 in
- * do_msgrcv().
+ * setting r_msg, as the receiving can otherwise
+ * exit - once r_msg is set, the receiver can
+ * continue. See lockless receive part 1 and 2
+ * in do_msgrcv(). Barrier (B).
*/
- smp_mb();
+ smp_wmb();
msr->r_msg = msg;

return 1;
@@ -932,12 +934,38 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
/* Lockless receive, part 2:
* Wait until pipelined_send or expunge_all are outside of
* wake_up_process(). There is a race with exit(), see
- * ipc/mqueue.c for the details.
+ * ipc/mqueue.c for the details. The correct serialization
+ * ensures that a receiver cannot continue without the wakeup
+ * being visibible _before_ setting r_msg:
+ *
+ * CPU 0 CPU 1
+ * <loop receiver>
+ * smp_rmb(); (A) <-- pair -. <waker thread>
+ * <load ->r_msg> | msr->r_msg = NULL;
+ * | wake_up_process();
+ * <continue> `------> smp_wmb(); (B)
+ * msr->r_msg = msg;
+ *
+ * Where (A) orders the message value read and where (B) orders
+ * the write to the r_msg -- done in both pipelined_send and
+ * expunge_all.
*/
- msg = (struct msg_msg *)msr_d.r_msg;
- while (msg == NULL) {
- cpu_relax();
+ for (;;) {
+ /*
+ * Pairs with writer barrier in pipelined_send
+ * or expunge_all.
+ */
+ smp_rmb(); /* barrier (A) */
msg = (struct msg_msg *)msr_d.r_msg;
+ if (msg)
+ break;
+
+ /*
+ * The cpu_relax() call is a compiler barrier
+ * which forces everything in this loop to be
+ * re-loaded.
+ */
+ cpu_relax();
}

/* Lockless receive, part 3:
--
2.1.4

2015-06-06 13:39:03

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 3/5] ipc: rename ipc_obtain_object

... to ipc_obtain_object_idr, which is more meaningful
and makes the code slightly easier to follow.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 2 +-
ipc/sem.c | 4 ++--
ipc/shm.c | 2 +-
ipc/util.c | 8 ++++----
ipc/util.h | 2 +-
5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index a9c3c51..66c4f56 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -76,7 +76,7 @@ struct msg_sender {

static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
{
- struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
+ struct kern_ipc_perm *ipcp = ipc_obtain_object_idr(&msg_ids(ns), id);

if (IS_ERR(ipcp))
return ERR_CAST(ipcp);
diff --git a/ipc/sem.c b/ipc/sem.c
index d1a6edd..bc3d530 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -391,7 +391,7 @@ static inline struct sem_array *sem_obtain_lock(struct ipc_namespace *ns,
struct kern_ipc_perm *ipcp;
struct sem_array *sma;

- ipcp = ipc_obtain_object(&sem_ids(ns), id);
+ ipcp = ipc_obtain_object_idr(&sem_ids(ns), id);
if (IS_ERR(ipcp))
return ERR_CAST(ipcp);

@@ -410,7 +410,7 @@ static inline struct sem_array *sem_obtain_lock(struct ipc_namespace *ns,

static inline struct sem_array *sem_obtain_object(struct ipc_namespace *ns, int id)
{
- struct kern_ipc_perm *ipcp = ipc_obtain_object(&sem_ids(ns), id);
+ struct kern_ipc_perm *ipcp = ipc_obtain_object_idr(&sem_ids(ns), id);

if (IS_ERR(ipcp))
return ERR_CAST(ipcp);
diff --git a/ipc/shm.c b/ipc/shm.c
index 6dbac3b..3323c49 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -129,7 +129,7 @@ void __init shm_init(void)

static inline struct shmid_kernel *shm_obtain_object(struct ipc_namespace *ns, int id)
{
- struct kern_ipc_perm *ipcp = ipc_obtain_object(&shm_ids(ns), id);
+ struct kern_ipc_perm *ipcp = ipc_obtain_object_idr(&shm_ids(ns), id);

if (IS_ERR(ipcp))
return ERR_CAST(ipcp);
diff --git a/ipc/util.c b/ipc/util.c
index ff3323e..adb8f89 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -558,7 +558,7 @@ void ipc64_perm_to_ipc_perm(struct ipc64_perm *in, struct ipc_perm *out)
* Call inside the RCU critical section.
* The ipc object is *not* locked on exit.
*/
-struct kern_ipc_perm *ipc_obtain_object(struct ipc_ids *ids, int id)
+struct kern_ipc_perm *ipc_obtain_object_idr(struct ipc_ids *ids, int id)
{
struct kern_ipc_perm *out;
int lid = ipcid_to_idx(id);
@@ -584,7 +584,7 @@ struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
struct kern_ipc_perm *out;

rcu_read_lock();
- out = ipc_obtain_object(ids, id);
+ out = ipc_obtain_object_idr(ids, id);
if (IS_ERR(out))
goto err1;

@@ -608,7 +608,7 @@ err1:
* @ids: ipc identifier set
* @id: ipc id to look for
*
- * Similar to ipc_obtain_object() but also checks
+ * Similar to ipc_obtain_object_idr() but also checks
* the ipc object reference counter.
*
* Call inside the RCU critical section.
@@ -616,7 +616,7 @@ err1:
*/
struct kern_ipc_perm *ipc_obtain_object_check(struct ipc_ids *ids, int id)
{
- struct kern_ipc_perm *out = ipc_obtain_object(ids, id);
+ struct kern_ipc_perm *out = ipc_obtain_object_idr(ids, id);

if (IS_ERR(out))
goto out;
diff --git a/ipc/util.h b/ipc/util.h
index 1a5a0fc..3a8a5a0 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -132,7 +132,7 @@ void ipc_rcu_putref(void *ptr, void (*func)(struct rcu_head *head));
void ipc_rcu_free(struct rcu_head *head);

struct kern_ipc_perm *ipc_lock(struct ipc_ids *, int);
-struct kern_ipc_perm *ipc_obtain_object(struct ipc_ids *ids, int id);
+struct kern_ipc_perm *ipc_obtain_object_idr(struct ipc_ids *ids, int id);

void kernel_to_ipc64_perm(struct kern_ipc_perm *in, struct ipc64_perm *out);
void ipc64_perm_to_ipc_perm(struct ipc64_perm *in, struct ipc_perm *out);
--
2.1.4

2015-06-06 13:38:54

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 4/5] ipc,sysv: make return -EIDRM when racing with RMID consistent

The ipc_lock helper is used by all forms of sysv ipc to acquire
the ipc object's spinlock. Upon error (bogus identifier), we
always return -EINVAL, whether the problem be in the idr path or
because we raced with a task performing RMID. For the later,
however, all ipc related manpages, state the that for:

EIDRM <ID> points to a removed identifier.

And return:

EINVAL Invalid <ID> value, or unaligned, etc.

Which (EINVAL) should only return once the ipc resource is deleted.
For all types of ipc this is done immediately upon a RMID command.
However, shared memory behaves slightly different as it can merely
mark a segment for deletion, and delay the actual freeing until
there are no more active consumers. Per shmctl(IPC_RMID) manpage:

""
Mark the segment to be destroyed. The segment will only actually
be destroyed after the last process detaches it (i.e., when the
shm_nattch member of the associated structure shmid_ds is zero).
""

Unlike ipc_lock, paths that behave "correctly", at least per the
manpage, involve controlling the ipc resource via *ctl(), doing
the exact same validity check as ipc_lock after right acquiring
the spinlock:

if (!ipc_valid_object()) {
err = -EIDRM;
goto out_unlock;
}

Thus make ipc_lock consistent with the rest of ipc code and return
-EIDRM in ipc_lock when !ipc_valid_object().

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/util.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/ipc/util.c b/ipc/util.c
index adb8f89..15e750d 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -586,19 +586,22 @@ struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
rcu_read_lock();
out = ipc_obtain_object_idr(ids, id);
if (IS_ERR(out))
- goto err1;
+ goto err;

spin_lock(&out->lock);

- /* ipc_rmid() may have already freed the ID while ipc_lock
- * was spinning: here verify that the structure is still valid
+ /*
+ * ipc_rmid() may have already freed the ID while ipc_lock()
+ * was spinning: here verify that the structure is still valid.
+ * Upon races with RMID, return -EIDRM, thus indicating that
+ * the ID points to a removed identifier.
*/
if (ipc_valid_object(out))
return out;

spin_unlock(&out->lock);
- out = ERR_PTR(-EINVAL);
-err1:
+ out = ERR_PTR(-EIDRM);
+err:
rcu_read_unlock();
return out;
}
--
2.1.4

2015-06-06 13:38:45

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 5/5] ipc,sysv: return -EINVAL upon incorrect id/seqnum

In ipc_obtain_object_check we return -EIDRM when a bogus
sequence number is detected via ipc_checkid, while the ipc
manpages state the following return codes for such errors:

EIDRM <ID> points to a removed identifier.
EINVAL Invalid <ID> value, or unaligned, etc.

EIDRM should only be returned upon a RMID call (->deleted
check), and thus return EINVAL for wrong seq. This difference
in semantics has also caused real bugs, ie:
https://bugzilla.redhat.com/show_bug.cgi?id=246509

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/util.c b/ipc/util.c
index 15e750d..468b225 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -625,7 +625,7 @@ struct kern_ipc_perm *ipc_obtain_object_check(struct ipc_ids *ids, int id)
goto out;

if (ipc_checkid(out, id))
- return ERR_PTR(-EIDRM);
+ return ERR_PTR(-EINVAL);
out:
return out;
}
--
2.1.4

2015-06-09 22:29:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/5] ipc,shm: move BUG_ON check into shm_lock

On Sat, 6 Jun 2015 06:37:56 -0700 Davidlohr Bueso <[email protected]> wrote:

> Upon every shm_lock call, we BUG_ON if an error was returned,
> indicating racing either in idr or in shm_destroy. Move this logic
> into the locking.
>
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -155,8 +155,14 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
> {
> struct kern_ipc_perm *ipcp = ipc_lock(&shm_ids(ns), id);
>
> - if (IS_ERR(ipcp))
> + if (IS_ERR(ipcp)) {
> + /*
> + * We raced in the idr lookup or with shm_destroy(),
> + * either way, the ID is busted.
> + */
> + BUG();
> return (struct shmid_kernel *)ipcp;
> + }

Was there any particular reason to still do it this way? It's a bit
klunky.

--- a/ipc/shm.c~ipcshm-move-bug_on-check-into-shm_lock-fix
+++ a/ipc/shm.c
@@ -155,14 +155,11 @@ static inline struct shmid_kernel *shm_l
{
struct kern_ipc_perm *ipcp = ipc_lock(&shm_ids(ns), id);

- if (IS_ERR(ipcp)) {
- /*
- * We raced in the idr lookup or with shm_destroy(),
- * either way, the ID is busted.
- */
- BUG();
- return (struct shmid_kernel *)ipcp;
- }
+ /*
+ * We raced in the idr lookup or with shm_destroy(). Either way, the
+ * ID is busted.
+ */
+ BUG_ON(IS_ERR(ipcp));

return container_of(ipcp, struct shmid_kernel, shm_perm);
}

One benefit of the code you sent is that the unreachable `return' will
prevent a compile warning when CONFIG_BUG=n, but CONFIG_BUG=n is silly
and I never worry about it.

2015-06-10 00:13:28

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 1/5] ipc,shm: move BUG_ON check into shm_lock

On Tue, 2015-06-09 at 15:28 -0700, Andrew Morton wrote:
> --- a/ipc/shm.c~ipcshm-move-bug_on-check-into-shm_lock-fix
> +++ a/ipc/shm.c
> @@ -155,14 +155,11 @@ static inline struct shmid_kernel *shm_l
> {
> struct kern_ipc_perm *ipcp = ipc_lock(&shm_ids(ns), id);
>
> - if (IS_ERR(ipcp)) {
> - /*
> - * We raced in the idr lookup or with shm_destroy(),
> - * either way, the ID is busted.
> - */
> - BUG();
> - return (struct shmid_kernel *)ipcp;
> - }
> + /*
> + * We raced in the idr lookup or with shm_destroy(). Either way, the
> + * ID is busted.
> + */
> + BUG_ON(IS_ERR(ipcp));
>
> return container_of(ipcp, struct shmid_kernel, shm_perm);
> }
>
> One benefit of the code you sent is that the unreachable `return' will
> prevent a compile warning when CONFIG_BUG=n, but CONFIG_BUG=n is silly
> and I never worry about it.

I took a closer look and the above looks Ok, and I don't care either way
about CONFIG_BUG=n.

Thanks,
Davidlohr