Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751860AbbEAVwL (ORCPT ); Fri, 1 May 2015 17:52:11 -0400 Received: from ns.horizon.com ([71.41.210.147]:63747 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751040AbbEAVwI (ORCPT ); Fri, 1 May 2015 17:52:08 -0400 Date: 1 May 2015 17:52:07 -0400 Message-ID: <20150501215207.25731.qmail@ns.horizon.com> From: "George Spelvin" To: dave@stgolabs.net, mingo@redhat.com, peterz@infradead.org, tglx@linutronix.de Subject: Re: [PATCH 3/3] ipc/mqueue: lockless pipelined wakeups Cc: bigeasy@linutronix.de, clm@fb.com, dbueso@suse.de, linux-kernel@vger.kernel.org, linux@horizon.com, manfred@colorfullife.com, rostedt@goodmis.org, torvalds@linux-foundation.org In-Reply-To: <1430494072-30283-4-git-send-email-dave@stgolabs.net> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2845 Lines: 76 In general, Acked-by, but you're making me fix all your comments. :-) This is a nice use of the wake queue, since the code was already handling the same problem in a similar way with STATE_PENDING. > * The receiver accepts the message and returns without grabbing the queue >+ * spinlock. The used algorithm is different from sysv semaphores (ipc/sem.c): Is that last sentence even wanted? >+ * >+ * - Set pointer to message. >+ * - Queue the receiver task's for later wakeup (without the info->lock). It's "task" singular, and the apostrophe would be wrong if it were plural. >+ * - Update its state to STATE_READY. Now the receiver can continue. >+ * - Wake up the process after the lock is dropped. Should the process wake up >+ * before this wakeup (due to a timeout or a signal) it will either see >+ * STATE_READY and continue or acquire the lock to check the sate again. "check the sTate again". >+ wake_q_add(wake_q, receiver->task); >+ /* >+ * Rely on the implicit cmpxchg barrier from wake_q_add such >+ * that we can ensure that updating receiver->state is the last >+ * write operation: As once set, the receiver can continue, >+ * and if we don't have the reference count from the wake_q, >+ * yet, at that point we can later have a use-after-free >+ * condition and bogus wakeup. >+ */ > receiver->state = STATE_READY; How about: /* * There must be a write barrier here; setting STATE_READY * lets the receiver proceed without further synchronization. * The cmpxchg inside wake_q_add serves as the barrier here. */ The need for a wake queue to take a reference to avoid use-after-free is generic to wake queues, and handled in generic code; I don't see why it needs a comment here. >@@ -1084,6 +1094,7 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr, > ktime_t expires, *timeout = NULL; > struct timespec ts; > struct posix_msg_tree_node *new_leaf = NULL; >+ WAKE_Q(wake_q); > > if (u_abs_timeout) { > int res = prepare_timeout(u_abs_timeout, &expires, &ts); >@@ -1155,8 +1166,9 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr, > CURRENT_TIME; > > /* There is now free space in queue. */ >- pipelined_receive(info); >+ pipelined_receive(&wake_q, info); > spin_unlock(&info->lock); >+ wake_up_q(&wake_q); > ret = 0; > } > if (ret == 0) { Since WAKE_Q actually involves some initialization, would it make sense to move its declaration to inside the condition that needs it? (I'm also a fan of declaring variables in the smallest scope possible, just on general principles.) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/