2013-05-16 01:08:15

by Davidlohr Bueso

Subject: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily

This patchset continues the work that began in the sysv ipc semaphore scaling
series: https://lkml.org/lkml/2013/3/20/546

Just like semaphores used to be, sysv shared memory and msg queues also abuse the ipc
lock, unnecessarily holding it for operations such as permission and security checks. This
patchset mostly deals with mqueues, and while shared mem can be done in a very similar way,
I want to get these patches out in the open first. It also does some pending cleanups,
mostly focused on the two level locking we have in ipc code, taking care of ipc_addid()
and ipcctl_pre_down_nolock() - yes there are still functions that need to be updated as well.

I have tried to split each patch to be as readable and specific as possible, especially when
shortening the critical regions, going one function at a time.

Patch 1 moves the locking to be explicitly done by the callers of ipc_addid.
It updates msg, sem and shm.

Patches 2-3 are just wrappers around the ipc lock, initially suggested by Linus.

Patch 4 is similar to patch 1, moving the rcu and rw_mutex locking out of
ipcctl_pre_down_nolock so that the callers explicitly deal with them. It updates msg, sem
and shm.

Patch 5 shortens the critical region in msgctl_down(), using the lockless
ipcctl_pre_down_nolock() function and only acquiring the ipc lock for the RMID and SET commands.

Patch 6 simply moves the what-should-be lockless logic of the *_INFO and *_STAT commands
out of msgctl() into a new function, msgctl_nolock().

Patch 7 introduces the necessary wrappers around ipc_obtain_object[_check]()
that will later enable us to look up the ipc object without holding the lock.

Patch 8 updates the previously added msgctl_nolock() to actually be lockless, reducing
the critical region for the STAT commands.

Patch 9 redoes the locking for msgsend().

Patch 10 redoes the locking for msgrcv().

Patch 11 removes the now unused msg_lock and msg_lock_check functions, replaced by
a smarter combination of rcu, ipc_obtain_object and ipc_lock_object (sketched below).

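For reference, the locking pattern the series converges on looks roughly
like this. It is a simplified sketch built from the helpers the patches
introduce (msq_obtain_object_check(), ipc_lock_object(), ipc_unlock_object()),
with error paths trimmed - not literal kernel code:

	/* old: lookup and lock fused; perm.lock held across all the checks */
	msq = msg_lock_check(ns, msqid);		/* rcu + spin_lock */
	/* ... permission/security checks and updates ... */
	msg_unlock(msq);

	/* new: lockless lookup; spinlock only around the actual update */
	rcu_read_lock();
	msq = msq_obtain_object_check(ns, msqid);	/* no spinlock taken */
	if (IS_ERR(msq))
		goto out_rcu;

	if (ipcperms(ns, &msq->q_perm, S_IRUGO))	/* checks under rcu only */
		goto out_rcu;

	ipc_lock_object(&msq->q_perm);			/* spin_lock(&perm->lock) */
	/* ... update msq ... */
	ipc_unlock_object(&msq->q_perm);
out_rcu:
	rcu_read_unlock();
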
Davidlohr Bueso (11):
ipc: move rcu lock out of ipc_addid
ipc: introduce ipc object locking helpers
ipc: close open coded spin lock calls
ipc: move locking out of ipcctl_pre_down_nolock
ipc,msg: shorten critical region in msgctl_down
ipc,msg: introduce msgctl_nolock
ipc,msg: introduce lockless functions to obtain the ipc object
ipc,msg: make msgctl_nolock lockless
ipc,msg: shorten critical region in msgsnd
ipc,msg: shorten critical region in msgrcv
ipc: remove unused functions

ipc/msg.c | 227 ++++++++++++++++++++++++++++++++++++++-----------------------
ipc/sem.c | 42 +++++++-----
ipc/shm.c | 32 ++++++---
ipc/util.c | 25 ++-----
ipc/util.h | 22 ++++--
5 files changed, 211 insertions(+), 137 deletions(-)

Thanks,
Davidlohr

--
1.7.11.7


2013-05-16 01:08:27

by Davidlohr Bueso

Subject: [PATCH 11/11] ipc: remove unused functions

We can now drop the msg_lock and msg_lock_check functions along
with a bogus comment introduced previously in semctl_down.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 25 -------------------------
ipc/sem.c | 1 -
2 files changed, 26 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 1654be4..3b7b4b5 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -141,31 +141,6 @@ void __init msg_init(void)
IPC_MSG_IDS, sysvipc_msg_proc_show);
}

-/*
- * msg_lock_(check_) routines are called in the paths where the rw_mutex
- * is not held.
- */
-static inline struct msg_queue *msg_lock(struct ipc_namespace *ns, int id)
-{
- struct kern_ipc_perm *ipcp = ipc_lock(&msg_ids(ns), id);
-
- if (IS_ERR(ipcp))
- return (struct msg_queue *)ipcp;
-
- return container_of(ipcp, struct msg_queue, q_perm);
-}
-
-static inline struct msg_queue *msg_lock_check(struct ipc_namespace *ns,
- int id)
-{
- struct kern_ipc_perm *ipcp = ipc_lock_check(&msg_ids(ns), id);
-
- if (IS_ERR(ipcp))
- return (struct msg_queue *)ipcp;
-
- return container_of(ipcp, struct msg_queue, q_perm);
-}
-
static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
{
struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
diff --git a/ipc/sem.c b/ipc/sem.c
index f7d99f9..8223f17 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1281,7 +1281,6 @@ static int semctl_down(struct ipc_namespace *ns, int semid,
&semid64.sem_perm, 0);
if (IS_ERR(ipcp)) {
err = PTR_ERR(ipcp);
- /* the ipc lock is not held upon failure */
goto out_unlock1;
}

--
1.7.11.7

2013-05-16 01:08:30

by Davidlohr Bueso

Subject: [PATCH 10/11] ipc,msg: shorten critical region in msgrcv

do_msgrcv() is the last msg queue function that abuses the ipc lock.
Take it only when needed, when actually updating msq.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 57 ++++++++++++++++++++++++++++++++-------------------------
1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index e60f0e0..1654be4 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -887,21 +887,19 @@ static struct msg_msg *find_msg(struct msg_queue *msq, long *msgtyp, int mode)
return ERR_PTR(-EAGAIN);
}

-
-long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
- int msgflg,
+long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgflg,
long (*msg_handler)(void __user *, struct msg_msg *, size_t))
{
- struct msg_queue *msq;
- struct msg_msg *msg;
int mode;
+ struct msg_queue *msq;
struct ipc_namespace *ns;
- struct msg_msg *copy = NULL;
+ struct msg_msg *msg, *copy = NULL;

ns = current->nsproxy->ipc_ns;

if (msqid < 0 || (long) bufsz < 0)
return -EINVAL;
+
if (msgflg & MSG_COPY) {
copy = prepare_copy(buf, min_t(size_t, bufsz, ns->msg_ctlmax));
if (IS_ERR(copy))
@@ -909,8 +907,10 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
}
mode = convert_mode(&msgtyp, msgflg);

- msq = msg_lock_check(ns, msqid);
+ rcu_read_lock();
+ msq = msq_obtain_object_check(ns, msqid);
if (IS_ERR(msq)) {
+ rcu_read_unlock();
free_copy(copy);
return PTR_ERR(msq);
}
@@ -920,10 +920,9 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,

msg = ERR_PTR(-EACCES);
if (ipcperms(ns, &msq->q_perm, S_IRUGO))
- goto out_unlock;
+ goto out_unlock1;

msg = find_msg(msq, &msgtyp, mode);
-
if (!IS_ERR(msg)) {
/*
* Found a suitable message.
@@ -931,7 +930,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
*/
if ((bufsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) {
msg = ERR_PTR(-E2BIG);
- goto out_unlock;
+ goto out_unlock1;
}
/*
* If we are copying, then do not unlink message and do
@@ -939,8 +938,10 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
*/
if (msgflg & MSG_COPY) {
msg = copy_msg(msg, copy);
- goto out_unlock;
+ goto out_unlock1;
}
+
+ ipc_lock_object(&msq->q_perm);
list_del(&msg->m_list);
msq->q_qnum--;
msq->q_rtime = get_seconds();
@@ -949,14 +950,17 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
atomic_sub(msg->m_ts, &ns->msg_bytes);
atomic_dec(&ns->msg_hdrs);
ss_wakeup(&msq->q_senders, 0);
- msg_unlock(msq);
- break;
+
+ goto out_unlock0;
}
+
/* No message waiting. Wait for a message */
if (msgflg & IPC_NOWAIT) {
msg = ERR_PTR(-ENOMSG);
- goto out_unlock;
+ goto out_unlock1;
}
+
+ ipc_lock_object(&msq->q_perm);
list_add_tail(&msr_d.r_list, &msq->q_receivers);
msr_d.r_tsk = current;
msr_d.r_msgtype = msgtyp;
@@ -967,8 +971,9 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
msr_d.r_maxsize = bufsz;
msr_d.r_msg = ERR_PTR(-EAGAIN);
current->state = TASK_INTERRUPTIBLE;
- msg_unlock(msq);

+ ipc_unlock_object(&msq->q_perm);
+ rcu_read_unlock();
schedule();

/* Lockless receive, part 1:
@@ -998,32 +1003,34 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
* If there is a message or an error then accept it without
* locking.
*/
- if (msg != ERR_PTR(-EAGAIN)) {
- rcu_read_unlock();
- break;
- }
+ if (msg != ERR_PTR(-EAGAIN))
+ goto out_unlock1;

/* Lockless receive, part 3:
* Acquire the queue spinlock.
*/
- ipc_lock_by_ptr(&msq->q_perm);
- rcu_read_unlock();
+ ipc_lock_object(&msq->q_perm);

/* Lockless receive, part 4:
* Repeat test after acquiring the spinlock.
*/
msg = (struct msg_msg*)msr_d.r_msg;
if (msg != ERR_PTR(-EAGAIN))
- goto out_unlock;
+ goto out_unlock0;

list_del(&msr_d.r_list);
if (signal_pending(current)) {
msg = ERR_PTR(-ERESTARTNOHAND);
-out_unlock:
- msg_unlock(msq);
- break;
+ goto out_unlock0;
}
+
+ ipc_unlock_object(&msq->q_perm);
}
+
+out_unlock0:
+ ipc_unlock_object(&msq->q_perm);
+out_unlock1:
+ rcu_read_unlock();
if (IS_ERR(msg)) {
free_copy(copy);
return PTR_ERR(msg);
--
1.7.11.7

2013-05-16 01:08:32

by Davidlohr Bueso

Subject: [PATCH 09/11] ipc,msg: shorten critical region in msgsnd

do_msgsnd() is another function that does too many things
with the ipc lock acquired. Take it only when needed, when
actually updating msq.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 2d89b00..e60f0e0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -700,10 +700,11 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
msg->m_type = mtype;
msg->m_ts = msgsz;

- msq = msg_lock_check(ns, msqid);
+ rcu_read_lock();
+ msq = msq_obtain_object_check(ns, msqid);
if (IS_ERR(msq)) {
err = PTR_ERR(msq);
- goto out_free;
+ goto out_unlock1;
}

for (;;) {
@@ -711,11 +712,11 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,

err = -EACCES;
if (ipcperms(ns, &msq->q_perm, S_IWUGO))
- goto out_unlock_free;
+ goto out_unlock1;

err = security_msg_queue_msgsnd(msq, msg, msgflg);
if (err)
- goto out_unlock_free;
+ goto out_unlock1;

if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
1 + msq->q_qnum <= msq->q_qbytes) {
@@ -725,32 +726,41 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
/* queue full, wait: */
if (msgflg & IPC_NOWAIT) {
err = -EAGAIN;
- goto out_unlock_free;
+ goto out_unlock1;
}
+
+ ipc_lock_object(&msq->q_perm);
ss_add(msq, &s);

if (!ipc_rcu_getref(msq)) {
err = -EIDRM;
- goto out_unlock_free;
+ goto out_unlock0;
}

- msg_unlock(msq);
+ ipc_unlock_object(&msq->q_perm);
+ rcu_read_unlock();
schedule();

- ipc_lock_by_ptr(&msq->q_perm);
+ rcu_read_lock();
+ ipc_lock_object(&msq->q_perm);
+
ipc_rcu_putref(msq);
if (msq->q_perm.deleted) {
err = -EIDRM;
- goto out_unlock_free;
+ goto out_unlock0;
}
+
ss_del(&s);

if (signal_pending(current)) {
err = -ERESTARTNOHAND;
- goto out_unlock_free;
+ goto out_unlock0;
}
+
+ ipc_unlock_object(&msq->q_perm);
}

+ ipc_lock_object(&msq->q_perm);
msq->q_lspid = task_tgid_vnr(current);
msq->q_stime = get_seconds();

@@ -766,9 +776,10 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
err = 0;
msg = NULL;

-out_unlock_free:
- msg_unlock(msq);
-out_free:
+out_unlock0:
+ ipc_unlock_object(&msq->q_perm);
+out_unlock1:
+ rcu_read_unlock();
if (msg != NULL)
free_msg(msg);
return err;
--
1.7.11.7

2013-05-16 01:08:25

by Davidlohr Bueso

Subject: [PATCH 05/11] ipc,msg: shorten critical region in msgctl_down

Instead of holding the ipc lock for the entire function, use
the lockless ipcctl_pre_down_nolock() and only acquire the
lock for specific commands: RMID and SET.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 4eaeb08..264ebe0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -412,11 +412,10 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
down_write(&msg_ids(ns).rw_mutex);
rcu_read_lock();

- ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
- &msqid64.msg_perm, msqid64.msg_qbytes);
+ ipcp = ipcctl_pre_down_nolock(ns, &msg_ids(ns), msqid, cmd,
+ &msqid64.msg_perm, msqid64.msg_qbytes);
if (IS_ERR(ipcp)) {
err = PTR_ERR(ipcp);
- /* the ipc lock is not held upon failure */
goto out_unlock1;
}

@@ -424,10 +423,11 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,

err = security_msg_queue_msgctl(msq, cmd);
if (err)
- goto out_unlock0;
+ goto out_unlock1;

switch (cmd) {
case IPC_RMID:
+ ipc_lock_object(&msq->q_perm);
/* freeque unlocks the ipc object and rcu */
freeque(ns, ipcp);
goto out_up;
@@ -435,9 +435,10 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
!capable(CAP_SYS_RESOURCE)) {
err = -EPERM;
- goto out_unlock0;
+ goto out_unlock1;
}

+ ipc_lock_object(&msq->q_perm);
err = ipc_update_perm(&msqid64.msg_perm, ipcp);
if (err)
goto out_unlock0;
@@ -456,6 +457,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
break;
default:
err = -EINVAL;
+ goto out_unlock1;
}

out_unlock0:
--
1.7.11.7

2013-05-16 01:08:23

by Davidlohr Bueso

Subject: [PATCH 03/11] ipc: close open coded spin lock calls

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 2 +-
ipc/sem.c | 14 +++++++-------
ipc/shm.c | 4 ++--
ipc/util.h | 4 ++--
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 8270511..7a6f344 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -218,7 +218,7 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
INIT_LIST_HEAD(&msq->q_receivers);
INIT_LIST_HEAD(&msq->q_senders);

- spin_unlock(&msq->q_perm.lock);
+ ipc_unlock_object(&msq->q_perm);
rcu_read_unlock();

return msq->q_perm.id;
diff --git a/ipc/sem.c b/ipc/sem.c
index cee96e6..7854fc8 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -246,7 +246,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
* their critical section while the array lock is held.
*/
lock_array:
- spin_lock(&sma->sem_perm.lock);
+ ipc_lock_object(&sma->sem_perm);
for (i = 0; i < sma->sem_nsems; i++) {
struct sem *sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
@@ -259,7 +259,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
static inline void sem_unlock(struct sem_array *sma, int locknum)
{
if (locknum == -1) {
- spin_unlock(&sma->sem_perm.lock);
+ ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
spin_unlock(&sem->lock);
@@ -857,7 +857,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
int i;

/* Free the existing undo structures for this semaphore set. */
- assert_spin_locked(&sma->sem_perm.lock);
+ ipc_assert_locked_object(&sma->sem_perm);
list_for_each_entry_safe(un, tu, &sma->list_id, list_id) {
list_del(&un->list_id);
spin_lock(&un->ulp->lock);
@@ -1055,7 +1055,7 @@ static int semctl_setval(struct ipc_namespace *ns, int semid, int semnum,

curr = &sma->sem_base[semnum];

- assert_spin_locked(&sma->sem_perm.lock);
+ ipc_assert_locked_object(&sma->sem_perm);
list_for_each_entry(un, &sma->list_id, list_id)
un->semadj[semnum] = 0;

@@ -1184,7 +1184,7 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
for (i = 0; i < nsems; i++)
sma->sem_base[i].semval = sem_io[i];

- assert_spin_locked(&sma->sem_perm.lock);
+ ipc_assert_locked_object(&sma->sem_perm);
list_for_each_entry(un, &sma->list_id, list_id) {
for (i = 0; i < nsems; i++)
un->semadj[i] = 0;
@@ -1481,7 +1481,7 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
new->semid = semid;
assert_spin_locked(&ulp->lock);
list_add_rcu(&new->list_proc, &ulp->list_proc);
- assert_spin_locked(&sma->sem_perm.lock);
+ ipc_assert_locked_object(&sma->sem_perm);
list_add(&new->list_id, &sma->list_id);
un = new;

@@ -1818,7 +1818,7 @@ void exit_sem(struct task_struct *tsk)
}

/* remove un from the linked lists */
- assert_spin_locked(&sma->sem_perm.lock);
+ ipc_assert_locked_object(&sma->sem_perm);
list_del(&un->list_id);

spin_lock(&ulp->lock);
diff --git a/ipc/shm.c b/ipc/shm.c
index 11dec70..1fd3d05f 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -141,7 +141,7 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
{
rcu_read_lock();
- spin_lock(&ipcp->shm_perm.lock);
+ ipc_lock_object(&ipcp->shm_perm);
}

static inline struct shmid_kernel *shm_lock_check(struct ipc_namespace *ns,
@@ -547,7 +547,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
ns->shm_tot += numpages;
error = shp->shm_perm.id;

- spin_unlock(&shp->shm_perm.lock);
+ ipc_unlock_object(&shp->shm_perm);
rcu_read_unlock();
return error;

diff --git a/ipc/util.h b/ipc/util.h
index da65e8a..b6a6a88 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -177,12 +177,12 @@ static inline void ipc_assert_locked_object(struct kern_ipc_perm *perm)
static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
{
rcu_read_lock();
- spin_lock(&perm->lock);
+ ipc_lock_object(perm);
}

static inline void ipc_unlock(struct kern_ipc_perm *perm)
{
- spin_unlock(&perm->lock);
+ ipc_unlock_object(perm);
rcu_read_unlock();
}

--
1.7.11.7

2013-05-16 01:08:21

by Davidlohr Bueso

Subject: [PATCH 02/11] ipc: introduce ipc object locking helpers

Simple helpers around the (kern_ipc_perm *)->lock spinlock.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/util.h | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/ipc/util.h b/ipc/util.h
index 2b0bdd5..da65e8a 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -159,23 +159,33 @@ static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int uid)
return uid / SEQ_MULTIPLIER != ipcp->seq;
}

-static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
+static inline void ipc_lock_object(struct kern_ipc_perm *perm)
{
- rcu_read_lock();
spin_lock(&perm->lock);
}

-static inline void ipc_unlock(struct kern_ipc_perm *perm)
+static inline void ipc_unlock_object(struct kern_ipc_perm *perm)
{
spin_unlock(&perm->lock);
- rcu_read_unlock();
}

-static inline void ipc_lock_object(struct kern_ipc_perm *perm)
+static inline void ipc_assert_locked_object(struct kern_ipc_perm *perm)
{
+ assert_spin_locked(&perm->lock);
+}
+
+static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
+{
+ rcu_read_lock();
spin_lock(&perm->lock);
}

+static inline void ipc_unlock(struct kern_ipc_perm *perm)
+{
+ spin_unlock(&perm->lock);
+ rcu_read_unlock();
+}
+
struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id);
struct kern_ipc_perm *ipc_obtain_object_check(struct ipc_ids *ids, int id);
int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
--
1.7.11.7

2013-05-16 01:08:16

by Davidlohr Bueso

Subject: [PATCH 01/11] ipc: move rcu lock out of ipc_addid

Make all callers explicitly take and release the RCU read lock.

This addresses the two level locking seen in newary(), newseg()
and newque(). For the last two, explicitly unlock the ipc object
and the rcu lock, instead of calling the custom shm_unlock and
msg_unlock functions. The next patch will deal with the open
coded locking for ->perm.lock.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 9 +++++----
ipc/sem.c | 2 ++
ipc/shm.c | 7 ++++++-
ipc/util.c | 4 +---
4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index d0c6d96..8270511 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -199,11 +199,11 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
return retval;
}

- /*
- * ipc_addid() locks msq
- */
+ /* ipc_addid() locks msq upon success. */
+ rcu_read_lock();
id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
if (id < 0) {
+ rcu_read_unlock();
security_msg_queue_free(msq);
ipc_rcu_putref(msq);
return id;
@@ -218,7 +218,8 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
INIT_LIST_HEAD(&msq->q_receivers);
INIT_LIST_HEAD(&msq->q_senders);

- msg_unlock(msq);
+ spin_unlock(&msq->q_perm.lock);
+ rcu_read_unlock();

return msq->q_perm.id;
}
diff --git a/ipc/sem.c b/ipc/sem.c
index a7e40ed..cee96e6 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -407,8 +407,10 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
return retval;
}

+ rcu_read_lock();
id = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni);
if (id < 0) {
+ rcu_read_unlock();
security_sem_free(sma);
ipc_rcu_putref(sma);
return id;
diff --git a/ipc/shm.c b/ipc/shm.c
index 7e199fa..11dec70 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -521,9 +521,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (IS_ERR(file))
goto no_file;

+ rcu_read_lock();
id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
if (id < 0) {
error = id;
+ rcu_read_unlock();
goto no_id;
}

@@ -535,6 +537,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
shp->shm_nattch = 0;
shp->shm_file = file;
shp->shm_creator = current;
+
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
* proc-ps tools use this. Changing this will break them.
@@ -543,7 +546,9 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)

ns->shm_tot += numpages;
error = shp->shm_perm.id;
- shm_unlock(shp);
+
+ spin_unlock(&shp->shm_perm.lock);
+ rcu_read_unlock();
return error;

no_id:
diff --git a/ipc/util.c b/ipc/util.c
index 809ec5e..92f0242 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -246,9 +246,8 @@ int ipc_get_maxid(struct ipc_ids *ids)
* is returned. The 'new' entry is returned in a locked state on success.
* On failure the entry is not locked and a negative err-code is returned.
*
- * Called with ipc_ids.rw_mutex held as a writer.
+ * Called with RCU read lock and writer ipc_ids.rw_mutex held.
*/
-
int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
{
kuid_t euid;
@@ -266,7 +265,6 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)

spin_lock_init(&new->lock);
new->deleted = 0;
- rcu_read_lock();
spin_lock(&new->lock);

id = idr_alloc(&ids->ipcs_idr, new,
--
1.7.11.7

2013-05-16 01:09:39

by Davidlohr Bueso

Subject: [PATCH 08/11] ipc,msg: make msgctl_nolock lockless

While the INFO cmd doesn't take the ipc lock, the STAT
commands do acquire it unnecessarily. We can do the
permission and security checks while holding only the
rcu read lock.

This function now mimics semctl_nolock().

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 9ff295b..2d89b00 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -547,17 +547,25 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
if (!buf)
return -EFAULT;

+ memset(&tbuf, 0, sizeof(tbuf));
+
+ rcu_read_lock();
if (cmd == MSG_STAT) {
- msq = msg_lock(ns, msqid);
- if (IS_ERR(msq))
- return PTR_ERR(msq);
+ msq = msq_obtain_object(ns, msqid);
+ if (IS_ERR(msq)) {
+ err = PTR_ERR(msq);
+ goto out_unlock;
+ }
success_return = msq->q_perm.id;
} else {
- msq = msg_lock_check(ns, msqid);
- if (IS_ERR(msq))
- return PTR_ERR(msq);
+ msq = msq_obtain_object_check(ns, msqid);
+ if (IS_ERR(msq)) {
+ err = PTR_ERR(msq);
+ goto out_unlock;
+ }
success_return = 0;
}
+
err = -EACCES;
if (ipcperms(ns, &msq->q_perm, S_IRUGO))
goto out_unlock;
@@ -566,8 +574,6 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
if (err)
goto out_unlock;

- memset(&tbuf, 0, sizeof(tbuf));
-
kernel_to_ipc64_perm(&msq->q_perm, &tbuf.msg_perm);
tbuf.msg_stime = msq->q_stime;
tbuf.msg_rtime = msq->q_rtime;
@@ -577,7 +583,8 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
tbuf.msg_qbytes = msq->q_qbytes;
tbuf.msg_lspid = msq->q_lspid;
tbuf.msg_lrpid = msq->q_lrpid;
- msg_unlock(msq);
+ rcu_read_unlock();
+
if (copy_msqid_to_user(buf, &tbuf, version))
return -EFAULT;
return success_return;
@@ -589,7 +596,7 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,

return err;
out_unlock:
- msg_unlock(msq);
+ rcu_read_unlock();
return err;
}

--
1.7.11.7

2013-05-16 01:10:01

by Davidlohr Bueso

Subject: [PATCH 06/11] ipc,msg: introduce msgctl_nolock

Similar to semctl, when calling msgctl the *_INFO and *_STAT
commands can be performed without acquiring the ipc object lock.

Add a msgctl_nolock() function and move the logic of *_INFO
and *_STAT out of msgctl(). This change still takes the lock;
it will become properly lockless in the next patch.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 49 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 264ebe0..20361b5 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -469,17 +469,11 @@ out_up:
return err;
}

-SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
+static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
+ int cmd, int version, void __user *buf)
{
+ int err;
struct msg_queue *msq;
- int err, version;
- struct ipc_namespace *ns;
-
- if (msqid < 0 || cmd < 0)
- return -EINVAL;
-
- version = ipc_parse_version(&cmd);
- ns = current->nsproxy->ipc_ns;

switch (cmd) {
case IPC_INFO:
@@ -490,6 +484,7 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)

if (!buf)
return -EFAULT;
+
/*
* We must not return kernel stack data.
* due to padding, it's not enough
@@ -521,7 +516,8 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
return -EFAULT;
return (max_id < 0) ? 0 : max_id;
}
- case MSG_STAT: /* msqid is an index rather than a msg queue id */
+
+ case MSG_STAT:
case IPC_STAT:
{
struct msqid64_ds tbuf;
@@ -565,19 +561,42 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
return -EFAULT;
return success_return;
}
- case IPC_SET:
- case IPC_RMID:
- err = msgctl_down(ns, msqid, cmd, buf, version);
- return err;
+
default:
- return -EINVAL;
+ return -EINVAL;
}

+ return err;
out_unlock:
msg_unlock(msq);
return err;
}

+SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
+{
+ int version;
+ struct ipc_namespace *ns;
+
+ if (msqid < 0 || cmd < 0)
+ return -EINVAL;
+
+ version = ipc_parse_version(&cmd);
+ ns = current->nsproxy->ipc_ns;
+
+ switch (cmd) {
+ case IPC_INFO:
+ case MSG_INFO:
+ case MSG_STAT: /* msqid is an index rather than a msg queue id */
+ case IPC_STAT:
+ return msgctl_nolock(ns, msqid, cmd, version, buf);
+ case IPC_SET:
+ case IPC_RMID:
+ return msgctl_down(ns, msqid, cmd, buf, version);
+ default:
+ return -EINVAL;
+ }
+}
+
static int testmsg(struct msg_msg *msg, long type, int mode)
{
switch(mode)
--
1.7.11.7

2013-05-16 01:09:59

by Davidlohr Bueso

Subject: [PATCH 07/11] ipc,msg: introduce lockless functions to obtain the ipc object

Add msq_obtain_object() and msq_obtain_object_check(), which will
allow us to get the ipc object without acquiring the lock. Just
as with semaphores, these functions are basically wrappers around
ipc_obtain_object*().

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/ipc/msg.c b/ipc/msg.c
index 20361b5..9ff295b 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -166,6 +166,27 @@ static inline struct msg_queue *msg_lock_check(struct ipc_namespace *ns,
return container_of(ipcp, struct msg_queue, q_perm);
}

+static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
+{
+ struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
+
+ if (IS_ERR(ipcp))
+ return ERR_CAST(ipcp);
+
+ return container_of(ipcp, struct msg_queue, q_perm);
+}
+
+static inline struct msg_queue *msq_obtain_object_check(struct ipc_namespace *ns,
+ int id)
+{
+ struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&msg_ids(ns), id);
+
+ if (IS_ERR(ipcp))
+ return ERR_CAST(ipcp);
+
+ return container_of(ipcp, struct msg_queue, q_perm);
+}
+
static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s)
{
ipc_rmid(&msg_ids(ns), &s->q_perm);
--
1.7.11.7

2013-05-16 01:10:39

by Davidlohr Bueso

Subject: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock

This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them, creating
another two level locking situation.

Make the callers (including those that still use ipcctl_pre_down()) explicitly
lock and unlock the rwsem and rcu lock.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 24 +++++++++++++++++-------
ipc/sem.c | 27 ++++++++++++++++-----------
ipc/shm.c | 23 +++++++++++++++++------
ipc/util.c | 21 ++++++---------------
4 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 7a6f344..4eaeb08 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -409,31 +409,38 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
return -EFAULT;
}

+ down_write(&msg_ids(ns).rw_mutex);
+ rcu_read_lock();
+
ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
&msqid64.msg_perm, msqid64.msg_qbytes);
- if (IS_ERR(ipcp))
- return PTR_ERR(ipcp);
+ if (IS_ERR(ipcp)) {
+ err = PTR_ERR(ipcp);
+ /* the ipc lock is not held upon failure */
+ goto out_unlock1;
+ }

msq = container_of(ipcp, struct msg_queue, q_perm);

err = security_msg_queue_msgctl(msq, cmd);
if (err)
- goto out_unlock;
+ goto out_unlock0;

switch (cmd) {
case IPC_RMID:
+ /* freeque unlocks the ipc object and rcu */
freeque(ns, ipcp);
goto out_up;
case IPC_SET:
if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
!capable(CAP_SYS_RESOURCE)) {
err = -EPERM;
- goto out_unlock;
+ goto out_unlock0;
}

err = ipc_update_perm(&msqid64.msg_perm, ipcp);
if (err)
- goto out_unlock;
+ goto out_unlock0;

msq->q_qbytes = msqid64.msg_qbytes;

@@ -450,8 +457,11 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
default:
err = -EINVAL;
}
-out_unlock:
- msg_unlock(msq);
+
+out_unlock0:
+ ipc_unlock_object(&msq->q_perm);
+out_unlock1:
+ rcu_read_unlock();
out_up:
up_write(&msg_ids(ns).rw_mutex);
return err;
diff --git a/ipc/sem.c b/ipc/sem.c
index 7854fc8..f7d99f9 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1274,39 +1274,44 @@ static int semctl_down(struct ipc_namespace *ns, int semid,
return -EFAULT;
}

+ down_write(&sem_ids(ns).rw_mutex);
+ rcu_read_lock();
+
ipcp = ipcctl_pre_down_nolock(ns, &sem_ids(ns), semid, cmd,
&semid64.sem_perm, 0);
- if (IS_ERR(ipcp))
- return PTR_ERR(ipcp);
+ if (IS_ERR(ipcp)) {
+ err = PTR_ERR(ipcp);
+ /* the ipc lock is not held upon failure */
+ goto out_unlock1;
+ }

sma = container_of(ipcp, struct sem_array, sem_perm);

err = security_sem_semctl(sma, cmd);
- if (err) {
- rcu_read_unlock();
- goto out_up;
- }
+ if (err)
+ goto out_unlock1;

- switch(cmd){
+ switch (cmd) {
case IPC_RMID:
sem_lock(sma, NULL, -1);
+ /* freeary unlocks the ipc object and rcu */
freeary(ns, ipcp);
goto out_up;
case IPC_SET:
sem_lock(sma, NULL, -1);
err = ipc_update_perm(&semid64.sem_perm, ipcp);
if (err)
- goto out_unlock;
+ goto out_unlock0;
sma->sem_ctime = get_seconds();
break;
default:
- rcu_read_unlock();
err = -EINVAL;
- goto out_up;
+ goto out_unlock1;
}

-out_unlock:
+out_unlock0:
sem_unlock(sma, -1);
+out_unlock1:
rcu_read_unlock();
out_up:
up_write(&sem_ids(ns).rw_mutex);
diff --git a/ipc/shm.c b/ipc/shm.c
index 1fd3d05f..078cc19 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -759,31 +759,42 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
return -EFAULT;
}

+ down_write(&shm_ids(ns).rw_mutex);
+ rcu_read_lock();
+
ipcp = ipcctl_pre_down(ns, &shm_ids(ns), shmid, cmd,
&shmid64.shm_perm, 0);
- if (IS_ERR(ipcp))
- return PTR_ERR(ipcp);
+ if (IS_ERR(ipcp)) {
+ err = PTR_ERR(ipcp);
+ /* the ipc lock is not held upon failure */
+ goto out_unlock1;
+ }

shp = container_of(ipcp, struct shmid_kernel, shm_perm);

err = security_shm_shmctl(shp, cmd);
if (err)
- goto out_unlock;
+ goto out_unlock0;
+
switch (cmd) {
case IPC_RMID:
+ /* do_shm_rmid unlocks the ipc object and rcu */
do_shm_rmid(ns, ipcp);
goto out_up;
case IPC_SET:
err = ipc_update_perm(&shmid64.shm_perm, ipcp);
if (err)
- goto out_unlock;
+ goto out_unlock0;
shp->shm_ctim = get_seconds();
break;
default:
err = -EINVAL;
}
-out_unlock:
- shm_unlock(shp);
+
+out_unlock0:
+ ipc_unlock_object(&shp->shm_perm);
+out_unlock1:
+ rcu_read_unlock();
out_up:
up_write(&shm_ids(ns).rw_mutex);
return err;
diff --git a/ipc/util.c b/ipc/util.c
index 92f0242..a746abb 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -745,8 +745,10 @@ int ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)
* It must be called without any lock held and
* - retrieves the ipc with the given id in the given table.
* - performs some audit and permission check, depending on the given cmd
- * - returns the ipc with both ipc and rw_mutex locks held in case of success
+ * - returns the ipc with the ipc lock held in case of success
* or an err-code without any lock held otherwise.
+ *
+ * Call holding both the rw_mutex and the rcu read lock.
*/
struct kern_ipc_perm *ipcctl_pre_down(struct ipc_namespace *ns,
struct ipc_ids *ids, int id, int cmd,
@@ -771,13 +773,10 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
int err = -EPERM;
struct kern_ipc_perm *ipcp;

- down_write(&ids->rw_mutex);
- rcu_read_lock();
-
ipcp = ipc_obtain_object_check(ids, id);
if (IS_ERR(ipcp)) {
err = PTR_ERR(ipcp);
- goto out_up;
+ goto err;
}

audit_ipc_obj(ipcp);
@@ -788,16 +787,8 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
euid = current_euid();
if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid) ||
ns_capable(ns->user_ns, CAP_SYS_ADMIN))
- return ipcp;
-
-out_up:
- /*
- * Unsuccessful lookup, unlock and return
- * the corresponding error.
- */
- rcu_read_unlock();
- up_write(&ids->rw_mutex);
-
+ return ipcp; /* successful lookup */
+err:
return ERR_PTR(err);
}

--
1.7.11.7

2013-05-16 14:12:46

by Linus Torvalds

Subject: Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily

On Wed, May 15, 2013 at 6:07 PM, Davidlohr Bueso <[email protected]> wrote:
> This patchset continues the work that began in the sysv ipc semaphore scaling

Looks fine to me, but with the problems we had on the semaphore side,
and since we're out of the merge window, I'm not going to take it into
3.10 (I assume you weren't expecting that either). But hopefully it
can go into Andrew's -mm tree and get some testing.

Linus

2013-05-16 16:41:23

by Davidlohr Bueso

Subject: Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily

On Thu, 2013-05-16 at 07:12 -0700, Linus Torvalds wrote:
> On Wed, May 15, 2013 at 6:07 PM, Davidlohr Bueso <[email protected]> wrote:
> > This patchset continues the work that began in the sysv ipc semaphore scaling
>
> Looks fine to me, but with the problems we had on the semaphore side,
> and since we're out of the merge window, I'm not going to take it into
> 3.10 (I assume you weren't expecting that either). But hopefully it
> can go into Andrew's -mm tree and get some testing.

Yes, it is intended for Andrew's tree.

2013-05-16 17:55:33

by Andi Kleen

Subject: Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock

Davidlohr Bueso <[email protected]> writes:

> This function currently acquires both the rw_mutex and the rcu lock on
> successful lookups, leaving the callers to explicitly unlock them, creating
> another two level locking situation.

Note that the rcu read lock is not a real lock, so it doesn't have the
usual ordering problems.
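
An rcu read-side critical section never blocks and nests freely, so
there is no acquisition order to get wrong; e.g. this toy fragment
is perfectly legal:

	rcu_read_lock();
	rcu_read_lock();	/* nesting is fine */
	rcu_read_unlock();
	rcu_read_unlock();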

-Andi

--
[email protected] -- Speaking for myself only

2013-05-24 01:45:33

by Davidlohr Bueso

Subject: Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily

ping, Andrew?

On Wed, 2013-05-15 at 18:07 -0700, Davidlohr Bueso wrote:
> This patchset continues the work that began in the sysv ipc semaphore scaling
> series: https://lkml.org/lkml/2013/3/20/546
>
> [...]

2013-05-24 20:16:55

by Andrew Morton

Subject: Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock

On Wed, 15 May 2013 18:08:03 -0700 Davidlohr Bueso <[email protected]> wrote:

> This function currently acquires both the rw_mutex and the rcu lock on
> successful lookups, leaving the callers to explicitly unlock them, creating
> another two level locking situation.
>
> Make the callers (including those that still use ipcctl_pre_down()) explicitly
> lock and unlock the rwsem and rcu lock.
>
> ...
>
> @@ -409,31 +409,38 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
> return -EFAULT;
> }
>
> + down_write(&msg_ids(ns).rw_mutex);
> + rcu_read_lock();
> +
> ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
> &msqid64.msg_perm, msqid64.msg_qbytes);
> - if (IS_ERR(ipcp))
> - return PTR_ERR(ipcp);
> + if (IS_ERR(ipcp)) {
> + err = PTR_ERR(ipcp);
> + /* the ipc lock is not held upon failure */

Terms like "the ipc lock" are unnecessarily vague. It's better to
identify the lock by name, eg msg_queue.q_perm.lock.

Where should readers go to understand the overall locking scheme? A
description of the overall object hierarchy and the role which the
various locks play?

Have you done any performance testing of this patchset? Just from
squinting at it, I'd expect the effects to be small...

2013-05-24 22:21:39

by Davidlohr Bueso

Subject: Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock

On Fri, 2013-05-24 at 13:16 -0700, Andrew Morton wrote:
> On Wed, 15 May 2013 18:08:03 -0700 Davidlohr Bueso <[email protected]> wrote:
>
> > This function currently acquires both the rw_mutex and the rcu lock on
> > successful lookups, leaving the callers to explicitly unlock them, creating
> > another two level locking situation.
> >
> > Make the callers (including those that still use ipcctl_pre_down()) explicitly
> > lock and unlock the rwsem and rcu lock.
> >
> > ...
> >
> > @@ -409,31 +409,38 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
> > return -EFAULT;
> > }
> >
> > + down_write(&msg_ids(ns).rw_mutex);
> > + rcu_read_lock();
> > +
> > ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
> > &msqid64.msg_perm, msqid64.msg_qbytes);
> > - if (IS_ERR(ipcp))
> > - return PTR_ERR(ipcp);
> > + if (IS_ERR(ipcp)) {
> > + err = PTR_ERR(ipcp);
> > + /* the ipc lock is not held upon failure */
>
> Terms like "the ipc lock" are unnecessarily vague. It's better to
> identify the lock by name, eg msg_queue.q_perm.lock.

Ok, I can send a patch to rephrase that to perm.lock when I send the shm
patchset (which will be very similar to this one).

>
> Where should readers go to understand the overall locking scheme? A
> description of the overall object hierarchy and the role which the
> various locks play?

That can be done, how about something like
Documentation/ipc-locking.txt?

>
> Have you done any performance testing of this patchset? Just from
> squinting at it, I'd expect the effects to be small...
>

Right, I don't expect much performance benefit: (a) unlike sems, I
haven't seen mqueues ever show up as any source of contention, and (b) I
think sysv mqueues have mostly been replaced by posix ones...

For testing, I did run these patches with ipcmd
(http://code.google.com/p/ipcmd/), pgbench, aim7 and Oracle on large
machines - no regressions but nothing new in terms of performance.

I suspect that shm could have a little more impact, but haven't looked
too much into it.

Thanks,
Davidlohr

2013-05-24 22:29:47

by Andrew Morton

Subject: Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock

On Fri, 24 May 2013 15:21:36 -0700 Davidlohr Bueso <[email protected]> wrote:

> >
> > Where should readers go to understand the overall locking scheme? A
> > description of the overall object hierarchy and the role which the
> > various locks play?
>
> That can be done, how about something like
> Documentation/ipc-locking.txt?

I find that code comments at definition sites work better, if possible.
Standalone documentation ends up being too verbose and never gets
updated or read...