Date: Sun, 19 Nov 2006 13:43:15 -0800
From: "Paul E. McKenney"
To: Alan Stern
Cc: "Paul E. McKenney", Oleg Nesterov, Kernel development list
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
Message-ID: <20061119214315.GI4427@us.ibm.com>
Reply-To: paulmck@us.ibm.com
References: <20061118171410.GB4427@us.ibm.com>

On Sat, Nov 18, 2006 at 04:00:35PM -0500, Alan Stern wrote:
> On Sat, 18 Nov 2006, Paul E. McKenney wrote:
> 
> > > > @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> > > >  	WARN_ON(sum);  /* Leakage unless caller handles error. */
> > > >  	if (sum != 0)
> > > >  		return;
> > > > -	free_percpu(sp->per_cpu_ref);
> > > > +	if (sp->per_cpu_ref != NULL)
> > > > +		free_percpu(sp->per_cpu_ref);
> > > 
> > > Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
> > > remove the test here.
> > 
> > OK.  I thought that there was some sort of error-checking involved,
> > but if not, will fix.
> 
> Just make sure that _you_ have the free_percpu(NULL) patch installed on
> your machine before testing this -- otherwise you'll get a nice hard
> crash!

'Nuff said -- will leave this fixup till later.  ;-)

> > > >  	preempt_disable();
> > > >  	idx = sp->completed & 0x1;
> > > > -	barrier();  /* ensure compiler looks -once- at sp->completed. */
> > > > -	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > > -	srcu_barrier();  /* ensure compiler won't misorder critical section. */
> > > > +	sap = rcu_dereference(sp->per_cpu_ref);
> > > > +	if (likely(sap != NULL)) {
> > > > +		barrier();  /* ensure compiler looks -once- at sp->completed. */
> > > 
> > > Put this barrier() back where the old one was (outside the "if").
> > 
> > Why?  Outside this "if", I don't use "sap".
> 
> Because it looks funny to see the comment here talking about sp->completed
> when sp->completed hasn't been used for several lines.  (Maybe it looks
> less funny in the patched source than in the patch itself.)  The best
> place to prevent extra accesses of sp->completed is immediately after
> the required access.

Good point -- and with the addition of the second element of hardluckref,
it has to be hoisted out of the "if" in any case.

> > > > +			      smp_processor_id())->c[idx]++;
> > > > +		smp_mb();
> > > > +		preempt_enable();
> > > > +		return idx;
> > > > +	}
> > > > +	if (mutex_trylock(&sp->mutex)) {
> > > > +		preempt_enable();
> > > 
> > > Move the preempt_enable() before the "if", then get rid of the
> > > preempt_enable() after the "if" block.
> > 
> > No can do.  The preempt_enable() must follow the increment and
> > the memory barrier, otherwise the synchronize_sched() inside
> > synchronize_srcu() can't do its job.
> 
> You misunderstood -- I was talking about the preempt_enable() in the last
> line quoted above (not the one in the third line) and the "if
> (mutex_trylock" (not the earlier "if (likely").

OK, I see your point -- but this has changed thoroughly with the addition
of the second element of hardluckref.
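For concreteness, the fast-path ordering I was arguing for looks like the
following.  This is only an illustrative sketch (the function name is made
up and the NULL check is omitted), not the patch text itself:

	/*
	 * Sketch of the srcu_read_lock() fast path, showing why the
	 * counter increment and the memory barrier must precede
	 * preempt_enable().  Illustration only, not part of the patch.
	 */
	static int srcu_read_lock_fastpath_sketch(struct srcu_struct *sp)
	{
		int idx;
		struct srcu_struct_array *sap;

		preempt_disable();		/* region that synchronize_sched() must wait for */
		idx = sp->completed & 0x1;	/* sample the current counter rank */
		barrier();			/* compiler must read sp->completed only once */
		sap = rcu_dereference(sp->per_cpu_ref);	/* fast path: assume non-NULL */
		per_cpu_ptr(sap, smp_processor_id())->c[idx]++;	/* count this reader */
		smp_mb();			/* order the increment before the critical section */
		preempt_enable();		/* only now may synchronize_sched() get past this CPU */
		return idx;
	}

If either the increment or the smp_mb() could drift past the
preempt_enable(), the synchronize_sched() in synchronize_srcu() could
return before this reader had been counted, and the grace period could
end too soon.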
> > > > +	if (sp->per_cpu_ref == NULL)
> > > > +		sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > > 
> > > It would be cleaner to put the mutex_unlock() and closing '}' right here.
> > 
> > I can move the mutex_unlock() to this point, but I cannot otherwise
> > merge the two following pieces of code -- at least not without doing
> > an otherwise-gratuitous preempt_disable().  Which I suppose I could
> > do, but seems like it would be more confusing than would the
> > separate code.  I will play with this a bit and see if I can eliminate
> > the duplication.
> 
> If you follow the advice above then you won't need to add a gratuitous
> preempt_disable().  Try it and see how it comes out; the idea is that
> you can use the same code for testing sp->per_cpu_ref regardless of
> whether the mutex_trylock() or the call to alloc_srcu_struct_percpu()
> succeeded.

Understood, finally -- but the two-element hardluckref now requires
greater preempt_disable() coverage.

> > > What happens if a prior reader failed to allocate the memory but this call
> > > succeeds?  You need to check hardluckref before doing this.  The same is
> > > true in srcu_read_lock().
> > 
> > All accounted for by the fact that hardluckref is unconditionally
> > added in by srcu_readers_active().  Right?
> 
> Yes, you're right.
> 
> > Will spin a new patch...
> 
> Good -- it's getting pretty messy to look at this one!
> 
> By the way, I think the fastpath for synchronize_srcu() should be safe,
> now that you have added the memory barriers into srcu_read_lock() and
> srcu_read_unlock().  You might as well try putting it in.
> 
> Although now that I look at it again, you have forgotten to put smp_mb()
> after the atomic_inc() call and before the atomic_dec().  In
> srcu_read_unlock() you could just move the existing smp_mb() back before
> the test of idx.

Good catch again -- added smp_mb__before_atomic_dec() and
smp_mb__after_atomic_inc().  The reason for avoiding moving the smp_mb()
is that atomic_dec() implies a memory barrier on some architectures,
such as x86.  In these cases, smp_mb__before_atomic_dec() is a no-op.
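To make the pairing explicit, the hardluck slow paths look like the
following in kernel/srcu.c terms.  Again, just an illustrative sketch
with made-up helper names, not additional patch text:

	/* Sketch: slow-path entry, as in srcu_read_lock() when no per-CPU array exists. */
	static void hardluck_lock_sketch(struct srcu_struct *sp, int idx)
	{
		/* idx is 0 or 1 here; srcu_read_lock() returns -1 - idx to the caller. */
		atomic_inc(&sp->hardluckref[idx]);
		smp_mb__after_atomic_inc();	/* order the increment before the critical
						 * section; a no-op where atomic_inc()
						 * already implies a full barrier */
	}

	/* Sketch: slow-path exit, as in srcu_read_unlock() for a negative index. */
	static void hardluck_unlock_sketch(struct srcu_struct *sp, int idx)
	{
		/* idx is the negative value returned by srcu_read_lock(). */
		smp_mb__before_atomic_dec();	/* order the critical section before the
						 * decrement; a no-op on e.g. x86, where
						 * atomic_dec() already implies a barrier */
		atomic_dec(&sp->hardluckref[-1 - idx]);
	}

The smp_mb() calls in the per-CPU fast paths play the same role for the
c[] counters.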
Signed-off-by: Paul E. McKenney (was @us.ibm.com)
---

 include/linux/srcu.h |    8 ---
 kernel/srcu.c        |  126 +++++++++++++++++++++++++++------------------------
 2 files changed, 71 insertions(+), 63 deletions(-)

diff -urpNa -X dontdiff linux-2.6.19-rc5/include/linux/srcu.h linux-2.6.19-rc5-dsrcu/include/linux/srcu.h
--- linux-2.6.19-rc5/include/linux/srcu.h	2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/include/linux/srcu.h	2006-11-19 13:33:35.000000000 -0800
@@ -35,19 +35,15 @@ struct srcu_struct {
 	int completed;
 	struct srcu_struct_array *per_cpu_ref;
 	struct mutex mutex;
+	atomic_t hardluckref[2];
 };
 
-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
-
 int init_srcu_struct(struct srcu_struct *sp);
 void cleanup_srcu_struct(struct srcu_struct *sp);
 int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
 void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
+int srcu_readers_active(struct srcu_struct *sp);
 
 #endif
diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
--- linux-2.6.19-rc5/kernel/srcu.c	2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c	2006-11-19 13:40:33.000000000 -0800
@@ -34,6 +34,18 @@
 #include 
 #include 
 
+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+	struct srcu_struct_array *sap;
+
+	sap = alloc_percpu(struct srcu_struct_array);
+	smp_wmb();
+	return (sap);
+}
+
 /**
  * init_srcu_struct - initialize a sleep-RCU structure
  * @sp: structure to initialize.
@@ -46,7 +58,9 @@ int init_srcu_struct(struct srcu_struct
 {
 	sp->completed = 0;
 	mutex_init(&sp->mutex);
-	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+	sp->per_cpu_ref = alloc_srcu_struct_percpu();
+	atomic_set(&sp->hardluckref[0], 0);
+	atomic_set(&sp->hardluckref[1], 0);
 	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
 
@@ -58,12 +72,15 @@ int init_srcu_struct(struct srcu_struct
 static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
 {
 	int cpu;
+	struct srcu_struct_array *sap;
 	int sum;
 
 	sum = 0;
-	for_each_possible_cpu(cpu)
-		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
-	return sum;
+	sap = rcu_dereference(sp->per_cpu_ref);
+	if (likely(sap != NULL))
+		for_each_possible_cpu(cpu)
+			sum += per_cpu_ptr(sap, cpu)->c[idx];
+	return sum + atomic_read(&sp->hardluckref[idx]);
 }
 
 /**
@@ -94,7 +111,8 @@ void cleanup_srcu_struct(struct srcu_str
 	WARN_ON(sum);  /* Leakage unless caller handles error. */
 	if (sum != 0)
 		return;
-	free_percpu(sp->per_cpu_ref);
+	if (sp->per_cpu_ref != NULL)
+		free_percpu(sp->per_cpu_ref);
 	sp->per_cpu_ref = NULL;
 }
 
@@ -105,18 +123,41 @@ void cleanup_srcu_struct(struct srcu_str
  * Counts the new reader in the appropriate per-CPU element of the
  * srcu_struct.  Must be called from process context.
  * Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is mapped to negative numbers if the srcu_struct is not and
+ * cannot be initialized.
  */
 int srcu_read_lock(struct srcu_struct *sp)
 {
 	int idx;
+	struct srcu_struct_array *sap;
 
 	preempt_disable();
 	idx = sp->completed & 0x1;
 	barrier();  /* ensure compiler looks -once- at sp->completed. */
-	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
-	srcu_barrier();  /* ensure compiler won't misorder critical section. */
+	sap = rcu_dereference(sp->per_cpu_ref);
+	if (likely(sap != NULL)) {
+		per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
+		smp_mb();
+		preempt_enable();
+		return idx;
+	}
+	if (mutex_trylock(&sp->mutex)) {
+		preempt_enable();
+		if (sp->per_cpu_ref == NULL)
+			sp->per_cpu_ref = alloc_srcu_struct_percpu();
+		if (sp->per_cpu_ref == NULL) {
+			mutex_unlock(&sp->mutex);
+			preempt_disable();
+			idx = sp->completed & 0x1;
+		} else {
+			mutex_unlock(&sp->mutex);
+			return srcu_read_lock(sp);
+		}
+	}
+	atomic_inc(&sp->hardluckref[idx]);
+	smp_mb__after_atomic_inc();
 	preempt_enable();
-	return idx;
+	return -1 - idx;
 }
 
 /**
@@ -131,10 +172,16 @@ int srcu_read_lock(struct srcu_struct *s
  */
 void srcu_read_unlock(struct srcu_struct *sp, int idx)
 {
-	preempt_disable();
-	srcu_barrier();  /* ensure compiler won't misorder critical section. */
-	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
-	preempt_enable();
+	if (likely(idx >= 0)) {
+		preempt_disable();
+		smp_mb();
+		per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+			    smp_processor_id())->c[idx]--;
+		preempt_enable();
+		return;
+	}
+	smp_mb__before_atomic_dec();
+	atomic_dec(&sp->hardluckref[-1 - idx]);
 }
 
 /**
@@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
 	idx = sp->completed;
 	mutex_lock(&sp->mutex);
 
+	/* Initialize if not already initialized. */
+
+	if (sp->per_cpu_ref == NULL)
+		sp->per_cpu_ref = alloc_srcu_struct_percpu();
+
 	/*
 	 * Check to see if someone else did the work for us while we were
 	 * waiting to acquire the lock.  We need -two- advances of
@@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
 		return;
 	}
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * The preceding synchronize_sched() ensures that any CPU that
-	 * sees the new value of sp->completed will also see any preceding
-	 * changes to data structures made by this CPU.  This prevents
-	 * some other CPU from reordering the accesses in its SRCU
-	 * read-side critical section to precede the corresponding
-	 * srcu_read_lock() -- ensuring that such references will in
-	 * fact be protected.
-	 *
-	 * So it is now safe to do the flip.
-	 */
-
+	smp_mb();  /* ensure srcu_read_lock() sees prior change first! */
 	idx = sp->completed & 0x1;
 	sp->completed++;
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	synchronize_sched();
 
 	/*
 	 * At this point, because of the preceding synchronize_sched(),
 	 * all srcu_read_lock() calls using the old counters have completed.
 	 * Their corresponding critical sections might well be still
 	 * executing, but the srcu_read_lock() primitives themselves
-	 * will have finished executing.
+	 * will have finished executing.  The "old" rank of counters
+	 * can therefore only decrease, never increase in value.
 	 */
 	while (srcu_readers_active_idx(sp, idx))
 		schedule_timeout_interruptible(1);
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * The preceding synchronize_sched() forces all srcu_read_unlock()
-	 * primitives that were executing concurrently with the preceding
-	 * for_each_possible_cpu() loop to have completed by this point.
-	 * More importantly, it also forces the corresponding SRCU read-side
-	 * critical sections to have also completed, and the corresponding
-	 * references to SRCU-protected data items to be dropped.
-	 *
-	 * Note:
-	 *
-	 * 	Despite what you might think at first glance, the
-	 * 	preceding synchronize_sched() -must- be within the
-	 * 	critical section ended by the following mutex_unlock().
-	 * 	Otherwise, a task taking the early exit can race
-	 * 	with a srcu_read_unlock(), which might have executed
-	 * 	just before the preceding srcu_readers_active() check,
-	 * 	and whose CPU might have reordered the srcu_read_unlock()
-	 * 	with the preceding critical section.  In this case, there
-	 * 	is nothing preventing the synchronize_sched() task that is
-	 * 	taking the early exit from freeing a data structure that
-	 * 	is still being referenced (out of order) by the task
-	 * 	doing the srcu_read_unlock().
-	 *
-	 * 	Alternatively, the comparison with "2" on the early exit
-	 * 	could be changed to "3", but this increases synchronize_srcu()
-	 * 	latency for bulk loads.  So the current code is preferred.
-	 */
+	smp_mb();  /* must see critical section prior to srcu_read_unlock() */
 
 	mutex_unlock(&sp->mutex);
 }
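
For anyone trying the patch, the read side is used exactly as before; the
only visible difference is that the index handed back by srcu_read_lock()
may now be negative, and the caller must simply pass it back unchanged.
A minimal usage sketch (example_srcu, example_reader, and example_updater
are made-up names for illustration only):

	#include <linux/slab.h>
	#include <linux/srcu.h>

	static struct srcu_struct example_srcu;	/* set up elsewhere via init_srcu_struct() */

	static void example_reader(void)
	{
		int idx;

		idx = srcu_read_lock(&example_srcu);	/* may return a negative index */
		/* SRCU read-side critical section: may block, but must not
		 * call synchronize_srcu() on example_srcu. */
		srcu_read_unlock(&example_srcu, idx);	/* pass the index back unchanged */
	}

	static void example_updater(void *victim)
	{
		/* ... unlink victim from the reader-visible data structure ... */
		synchronize_srcu(&example_srcu);	/* wait for pre-existing readers */
		kfree(victim);				/* now safe to free */
	}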