Date: Sat, 24 Feb 2007 22:23:49 -0800
From: "Paul E. McKenney"
To: jens.axboe@oracle.com
Cc: linux-kernel@vger.kernel.org, oleg@tv-sign.ru, a.p.zijlstra@chello.nl,
	mingo@elte.hu, hch@infradead.org, akpm@osdl.org, stern@rowland.harvard.edu
Subject: [PATCH] QRCU with lockless fastpath
Message-ID: <20070225062349.GA17468@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com

Hello!

This is an updated version of Oleg Nesterov's QRCU that avoids the
earlier lock acquisition on the synchronize_qrcu() fastpath.

This passes rcutorture on x86 and on the weakly ordered POWER.  A Promela
model of the code passes, as noted before, for 2 readers and 3 updaters
and for 3 readers and 2 updaters.  The 3-readers-and-3-updaters case runs
every machine that I have access to out of memory -- nothing like a
little combinatorial explosion!  However, after some thought, the proof
ended up being simple enough:

1.	If synchronize_qrcu() exits too soon, then by definition there
	has been a reader present during synchronize_qrcu()'s full
	execution.

2.	The counter corresponding to this reader will be at least 1 at
	all times.

3.	The synchronize_qrcu() code forces at least one of the counters
	to be at least 1 at all times -- so if there is a reader, the
	sum will be at least 2.  (Unfortunately, we cannot fetch the
	pair of counters atomically.)

4.	Therefore, the only way that synchronize_qrcu()'s fastpath can
	see a sum of 1 is if it races with another synchronize_qrcu() --
	the first synchronize_qrcu() must read one of the counters before
	the second synchronize_qrcu() increments it, and must read the
	other counter after the second synchronize_qrcu() decrements it.
	There can be at most one reader present through this entire
	operation -- otherwise, the first synchronize_qrcu() will see a
	sum of 2 or greater.

5.	But the second synchronize_qrcu() will not release the mutex
	until after the reader is done.  During this time, the first
	synchronize_qrcu() will always see a sum of at least 2, and
	therefore cannot take the remainder of the fastpath until the
	reader is done.

6.	Because the second synchronize_qrcu() holds the mutex, no other
	synchronize_qrcu() can manipulate the counters until the reader
	is done.  A repeat of the race called out in #4 above therefore
	cannot happen until after the reader is done, in which case it
	is safe for the first synchronize_qrcu() to proceed.

Therefore, two summations of the counters separated by a memory barrier
suffice, and the implementation shown below also suffices.  (And, yes,
the fastpath -could- check for a sum of zero and exit immediately, but
this would help only in the case of a three-way race between two
synchronize_qrcu()s and a qrcu_read_unlock(), and would add another
compare, so it is not worth it.)
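To make the double-summation reasoning above easy to experiment with
outside the kernel, here is a minimal userspace sketch using C11 atomics.
It is purely illustrative and not part of the patch: the qrcu_mock
structure and the fastpath_no_readers() helper are hypothetical names of
mine, and the acquire fence merely stands in for the smp_rmb() used in
the real code.

	#include <stdatomic.h>
	#include <stdbool.h>

	/* Illustrative stand-in for the pair of QRCU reader counters. */
	struct qrcu_mock {
		atomic_int ctr[2];
	};

	/*
	 * Sample both counters, order the second sampling after the
	 * first with a fence, and report "no readers" only if both
	 * sums are at most 1 -- the situation analyzed in steps 4-6.
	 */
	static bool fastpath_no_readers(struct qrcu_mock *qp)
	{
		if (atomic_load(&qp->ctr[0]) + atomic_load(&qp->ctr[1]) > 1)
			return false;
		atomic_thread_fence(memory_order_acquire); /* stands in for smp_rmb() */
		return atomic_load(&qp->ctr[0]) + atomic_load(&qp->ctr[1]) <= 1;
	}

A true return corresponds to the case where, by the argument above, the
updater may safely skip the slowpath.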
Signed-off-by: Paul E. McKenney

---

 include/linux/srcu.h |   22 +++++++++++++
 kernel/srcu.c        |   86 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff -urpNa -X dontdiff linux-2.6.19/include/linux/srcu.h linux-2.6.19-qrcu-fp/include/linux/srcu.h
--- linux-2.6.19/include/linux/srcu.h	2006-11-29 13:57:37.000000000 -0800
+++ linux-2.6.19-qrcu-fp/include/linux/srcu.h	2007-02-05 16:00:18.000000000 -0800
@@ -27,6 +27,8 @@
 #ifndef _LINUX_SRCU_H
 #define _LINUX_SRCU_H
 
+#include <linux/wait.h>
+
 struct srcu_struct_array {
 	int c[2];
 };
@@ -50,4 +52,24 @@ void srcu_read_unlock(struct srcu_struct
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
+/*
+ * fully compatible with srcu, but optimized for writers.
+ */
+
+struct qrcu_struct {
+	int completed;
+	atomic_t ctr[2];
+	wait_queue_head_t wq;
+	struct mutex mutex;
+};
+
+int init_qrcu_struct(struct qrcu_struct *qp);
+int qrcu_read_lock(struct qrcu_struct *qp);
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
+void synchronize_qrcu(struct qrcu_struct *qp);
+
+static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
+{
+}
+
 #endif
diff -urpNa -X dontdiff linux-2.6.19/kernel/srcu.c linux-2.6.19-qrcu-fp/kernel/srcu.c
--- linux-2.6.19/kernel/srcu.c	2006-11-29 13:57:37.000000000 -0800
+++ linux-2.6.19-qrcu-fp/kernel/srcu.c	2007-02-11 18:10:35.000000000 -0800
@@ -256,3 +256,89 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
 EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
 EXPORT_SYMBOL_GPL(srcu_readers_active);
+
+int init_qrcu_struct(struct qrcu_struct *qp)
+{
+	qp->completed = 0;
+	atomic_set(qp->ctr + 0, 1);
+	atomic_set(qp->ctr + 1, 0);
+	init_waitqueue_head(&qp->wq);
+	mutex_init(&qp->mutex);
+
+	return 0;
+}
+
+int qrcu_read_lock(struct qrcu_struct *qp)
+{
+	for (;;) {
+		int idx = qp->completed & 0x1;
+		if (likely(atomic_inc_not_zero(qp->ctr + idx)))
+			return idx;
+	}
+}
+
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
+{
+	if (atomic_dec_and_test(qp->ctr + idx))
+		wake_up(&qp->wq);
+}
+
+void synchronize_qrcu(struct qrcu_struct *qp)
+{
+	int idx;
+
+	smp_mb();  /* Force preceding change to happen before fastpath check. */
+
+	/*
+	 * Fastpath: If the two counters sum to "1" at a given point in
+	 * time, there are no readers.  However, it takes two separate
+	 * loads to sample both counters, which won't occur simultaneously.
+	 * So we might race with a counter switch, so that we might see
+	 * ctr[0]==0, then the counter might switch, then we might see
+	 * ctr[1]==1 (unbeknownst to us because there is a reader still
+	 * there).  So we do a read memory barrier and recheck.  If the
+	 * same race happens again, there must have been a second counter
+	 * switch.  This second counter switch could not have happened
+	 * until all preceding readers finished, so if the condition
+	 * is true both times, we may safely proceed.
+	 *
+	 * This relies critically on the atomic increment and atomic
+	 * decrement being seen as executing in order.
+	 */
+
+	if (atomic_read(&qp->ctr[0]) + atomic_read(&qp->ctr[1]) <= 1) {
+		smp_rmb();  /* Keep two checks independent. */
+		if (atomic_read(&qp->ctr[0]) + atomic_read(&qp->ctr[1]) <= 1)
+			goto out;
+	}
+
+	mutex_lock(&qp->mutex);
+
+	idx = qp->completed & 0x1;
+	if (atomic_read(qp->ctr + idx) == 1)
+		goto out_unlock;
+
+	atomic_inc(qp->ctr + (idx ^ 0x1));
+
+	/*
+	 * Prevent subsequent decrement from being seen before previous
+	 * increment -- such an inversion could cause the fastpath
+	 * above to falsely conclude that there were no readers.
Also, + * reduce the likelihood that qrcu_read_lock() will loop. + */ + + smp_mb__after_atomic_inc(); + qp->completed++; + + atomic_dec(qp->ctr + idx); + __wait_event(qp->wq, !atomic_read(qp->ctr + idx)); +out_unlock: + mutex_unlock(&qp->mutex); +out: + smp_mb(); /* force subsequent free after qrcu_read_unlock(). */ +} + +EXPORT_SYMBOL_GPL(init_qrcu_struct); +EXPORT_SYMBOL_GPL(qrcu_read_lock); +EXPORT_SYMBOL_GPL(qrcu_read_unlock); +EXPORT_SYMBOL_GPL(synchronize_qrcu); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
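For anyone wanting to try the new interface, a typical reader/updater
pairing might look like the sketch below.  This is an illustration added
for this write-up rather than part of the patch: struct foo, foo_qrcu,
global_foo, foo_reader(), and foo_update() are all hypothetical, foo_qrcu
is assumed to have been set up with init_qrcu_struct() at init time, and
updates are assumed to be serialized by the caller.

	#include <linux/kernel.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/srcu.h>	/* qrcu declarations added by this patch */

	struct foo {			/* hypothetical QRCU-protected structure */
		int a;
	};

	static struct qrcu_struct foo_qrcu;	/* init_qrcu_struct(&foo_qrcu) at boot */
	static struct foo *global_foo;

	/* Reader: qrcu_read_lock() returns the counter index to pass back at unlock. */
	static void foo_reader(void)
	{
		struct foo *p;
		int idx;

		idx = qrcu_read_lock(&foo_qrcu);
		p = rcu_dereference(global_foo);
		if (p)
			printk(KERN_DEBUG "foo: %d\n", p->a);
		qrcu_read_unlock(&foo_qrcu, idx);
	}

	/* Updater: publish the new version, wait for pre-existing readers, free the old. */
	static void foo_update(struct foo *new_foo)
	{
		struct foo *old_foo = global_foo;

		rcu_assign_pointer(global_foo, new_foo);
		synchronize_qrcu(&foo_qrcu);	/* all pre-existing readers have finished */
		kfree(old_foo);
	}

Since the patch describes QRCU as "fully compatible with srcu, but
optimized for writers", this pattern is aimed at update-side-heavy uses,
with the lockless fastpath above keeping synchronize_qrcu() cheap when no
readers are present.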