Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934213AbcKQR6P (ORCPT ); Thu, 17 Nov 2016 12:58:15 -0500 Received: from mail.efficios.com ([167.114.142.141]:54297 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752275AbcKQR6M (ORCPT ); Thu, 17 Nov 2016 12:58:12 -0500 Date: Thu, 17 Nov 2016 15:31:32 +0000 (UTC) From: Mathieu Desnoyers To: Lai Jiangshan Cc: Boqun Feng , "Paul E. McKenney" , linux-kernel , Ingo Molnar , dipankar , Andrew Morton , Josh Triplett , Thomas Gleixner , Peter Zijlstra , rostedt , David Howells , Eric Dumazet , dvhart , fweisbec , Oleg Nesterov , bobby prani , ldr709 Message-ID: <1189271890.5446.1479396692585.JavaMail.zimbra@efficios.com> In-Reply-To: References: <20161114183636.GA28589@linux.vnet.ibm.com> <20161115014445.GC12110@tardis.cn.ibm.com> <20161115143700.GZ4127@linux.vnet.ibm.com> <20161117143012.GB5227@tardis.cn.ibm.com> Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [167.114.142.141] X-Mailer: Zimbra 8.7.1_GA_1670 (ZimbraWebClient - FF45 (Linux)/8.7.1_GA_1670) Thread-Topic: SRCU rewrite Thread-Index: BF4VZ+iOuWw9IatsZvzJ3gaea+drVA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5079 Lines: 141 ----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@gmail.com wrote: > On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng wrote: >> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote: >>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney >>> wrote: >>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote: >>> >>> >> >>> >> __srcu_read_lock() used to be called with preemption disabled. I guess >>> >> the reason was because we have two percpu variables to increase. So with >>> >> only one percpu right, could we remove the preempt_{dis,en}able() in >>> >> srcu_read_lock() and use this_cpu_inc() here? >>> > >>> > Quite possibly... >>> > >>> >> >> Hello, Lai ;-) >> >>> it will be nicer if it is removed. >>> >>> The reason for the preemption-disabled was also because we >>> have to disallow any preemption between the fetching of the idx >>> and the increasement. so that we have at most NR_CPUS worth >>> of readers using the old index that haven't incremented the counters. >>> >> >> After reading the comment for a while, I actually got a question, maybe >> I miss something ;-) >> >> Why "at most NR_CPUS worth of readers using the old index haven't >> incremented the counters" could save us from overflow the counter? >> >> Please consider the following case in current implementation: >> >> >> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...) >> >> one thread A is currently in __srcu_read_lock() and using idx = 1 and >> about to increase the percpu c[idx], and ULONG_MAX __srcu_read_lock()s >> have been called and returned with idx = 1, please note I think this is >> possible because I assume we may have some code like this: >> >> unsigned long i = 0; >> for (; i < ULONG_MAX; i++) >> srcu_read_lock(); // return the same idx 1; > > this is the wrong usage of the api. > > > you might rewrite it as: > > unsigned long index[2] = {0, 0}; > unsigned long i = 0; > for (; index[1] < ULONG_MAX; i++) > index[srcu_read_lock()]++; > > > I think we should add document to disallow this kind of usage. > a reader should eat 4bytes on the memory at least. > (the analysis below refers to the rewritten SRCU scheme) Let's presume we use the API correctly, as you describe (saving the returned index of srcu_read_lock() somewhere). So for the sake of argument, we can either call srcu_read_lock in a loop (during which we can be migrated), or call it concurrently from various threads. The effect in terms of overflow is pretty much the same. What is done here is incrementing per-cpu split-counters. In the worse-case scenario, let's assume we're incrementing those counters for a single index (0 or 1). If we think about this system-wide, we don't really care about the overflow of a single CPU counter, because what matters is the difference between the overall nr_lock - nr_unlock counts for a given index, once summed up by synchronize_srcu(). So the only situation that could lead to an overflow that matters is if synchronize_srcu() see ULONG_MAX more increments of nr_lock than the observed number of nr_unlock increments. So the bound is not only about the number of concurrent SRCU readers, but also about the number of SRCU readers that may appear between the moment synchronize_srcu() reads the nr_unlock per-cpu counters and the moment it reads the nr_lock counters. This maximum bound of ULONG_MAX - 1 therefore applies to the sum of: - numner of concurrent SRCU read-side critical sections active at the same time, - number of SRCU read-side critical sections beginning after synchronize_srcu() has read the nr_unlock counters, before it reads the nr_lock counters. You guys seem to see cases that would require a lower max nr reader bound, but I'm afraid I don't quite understand them. Thanks, Mathieu >> >> And none of the corresponding srcu_read_unlock() has been called; >> >> In this case, at the time thread A increases the percpu c[idx], that >> will result in an overflow, right? So even one reader using old idx will >> result in overflow. >> >> >> I think we won't be hit by overflow is not because we have few readers >> using old idx, it's because there are unlikely ULONG_MAX + 1 >> __srcu_read_lock() called for the same idx, right? And the reason of >> this is much complex: because we won't have a fair mount of threads in >> the system, because no thread will nest srcu many levels, because there >> won't be a lot readers using old idx. >> >> And this will still be true if we use new mechanism and shrink the >> preemption disabled section, right? >> >> Regards, >> Boqun >> >>> if we remove the preempt_{dis,en}able(). we must change the >>> "NR_CPUS" in the comment into ULONG_MAX/4. (I assume >>> one on-going reader needs at least need 4bytes at the stack). it is still safe. >>> >>> but we still need to think more if we want to remove the preempt_{dis,en}able(). >>> >>> Thanks > >> Lai -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com