Date: Thu, 17 Nov 2016 16:33:14 +0000 (UTC)
From: Mathieu Desnoyers
To: "Paul E. McKenney"
Cc: Lai Jiangshan, Boqun Feng, linux-kernel, Ingo Molnar, dipankar,
    Andrew Morton, Josh Triplett, Thomas Gleixner, Peter Zijlstra,
    rostedt, David Howells, Eric Dumazet, dvhart, fweisbec,
    Oleg Nesterov, bobby prani, ldr709
Message-ID: <1091181941.5761.1479400394025.JavaMail.zimbra@efficios.com>
In-Reply-To: <20161117155326.GZ3612@linux.vnet.ibm.com>
References: <20161114183636.GA28589@linux.vnet.ibm.com>
 <20161115143700.GZ4127@linux.vnet.ibm.com>
 <20161117143012.GB5227@tardis.cn.ibm.com>
 <1189271890.5446.1479396692585.JavaMail.zimbra@efficios.com>
 <1452112.5481.1479397088449.JavaMail.zimbra@efficios.com>
 <20161117155326.GZ3612@linux.vnet.ibm.com>
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite

----- On Nov 17, 2016, at 10:53 AM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Thu, Nov 17, 2016 at 03:38:08PM +0000, Mathieu Desnoyers wrote:
>> ----- On Nov 17, 2016, at 10:31 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>> > ----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@gmail.com wrote:
>> >
>> >> On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng wrote:
>> >>> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote:
>> >>>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney
>> >>>> wrote:
>> >>>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote:
>> >>>> >>
>> >>>> >> __srcu_read_lock() used to be called with preemption disabled. I guess
>> >>>> >> the reason was that we had two percpu variables to increase. So with
>> >>>> >> only one percpu variable now, could we remove the preempt_{dis,en}able()
>> >>>> >> in srcu_read_lock() and use this_cpu_inc() here?
>> >>>> >
>> >>>> > Quite possibly...
>> >>>> >
>> >>>
>> >>> Hello, Lai ;-)
>> >>>
>> >>>> It would be nicer if it were removed.
>> >>>>
>> >>>> The reason for disabling preemption was also that we have to disallow
>> >>>> any preemption between fetching the idx and the increment, so that we
>> >>>> have at most NR_CPUS worth of readers using the old index that haven't
>> >>>> incremented the counters.
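For concreteness, here is a minimal sketch of the kind of preemption-free
fast path being discussed, assuming a single per-CPU lock counter per
index (the field names below are illustrative only, not taken from the
actual patch):

	int __srcu_read_lock(struct srcu_struct *sp)
	{
		int idx;

		idx = READ_ONCE(sp->completed) & 0x1;
		this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
		smp_mb(); /* Order the increment before the critical section. */
		return idx;
	}

Note that this_cpu_inc() only makes the increment itself safe without
disabling preemption: a reader can still be preempted or migrated between
reading ->completed and performing its increment, which is exactly the
window discussed below.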
>> >>> After reading the comment for a while, I actually got a question,
>> >>> maybe I'm missing something ;-)
>> >>>
>> >>> Why does "at most NR_CPUS worth of readers using the old index that
>> >>> haven't incremented the counters" save us from overflowing the counter?
>> >>>
>> >>> Please consider the following case in the current implementation:
>> >>>
>> >>> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...)
>> >>>
>> >>> One thread A is currently in __srcu_read_lock(), using idx = 1 and
>> >>> about to increment the percpu c[idx], and ULONG_MAX __srcu_read_lock()s
>> >>> have been called and returned with idx = 1. Please note I think this is
>> >>> possible because I assume we may have some code like this:
>> >>>
>> >>> 	unsigned long i = 0;
>> >>>
>> >>> 	for (; i < ULONG_MAX; i++)
>> >>> 		srcu_read_lock(sp); /* returns the same idx, 1 */
>> >>
>> >> This is the wrong usage of the API.
>> >>
>> >> You might rewrite it as:
>> >>
>> >> 	unsigned long index[2] = {0, 0};
>> >> 	unsigned long i = 0;
>> >>
>> >> 	for (; index[1] < ULONG_MAX; i++)
>> >> 		index[srcu_read_lock(sp)]++;
>> >>
>> >> I think we should add documentation to disallow this kind of usage.
>> >> A reader should eat at least 4 bytes of memory.
>> >>
>> >
>> > (the analysis below refers to the rewritten SRCU scheme)
>> >
>> > Let's presume we use the API correctly, as you describe (saving
>> > the returned index of srcu_read_lock() somewhere).
>> >
>> > So for the sake of argument, we can either call srcu_read_lock
>> > in a loop (during which we can be migrated), or call it
>> > concurrently from various threads. The effect in terms of
>> > overflow is pretty much the same.
>> >
>> > What is done here is incrementing per-cpu split-counters. In
>> > the worst-case scenario, let's assume we're incrementing those
>> > counters for a single index (0 or 1).
>> >
>> > If we think about this system-wide, we don't really care about
>> > the overflow of a single CPU counter, because what matters is
>> > the difference between the overall nr_lock - nr_unlock counts
>> > for a given index, once summed up by synchronize_srcu().
>> >
>> > So the only situation that could lead to an overflow that matters
>> > is if synchronize_srcu() sees ULONG_MAX more increments of nr_lock
>> > than the observed number of nr_unlock increments.
>> >
>> > So the bound is not only about the number of concurrent SRCU
>> > readers, but also about the number of SRCU readers that may
>> > appear between the moment synchronize_srcu() reads the nr_unlock
>> > per-cpu counters and the moment it reads the nr_lock counters.
>> >
>> > This maximum bound of ULONG_MAX - 1 therefore applies to the
>> > sum of:
>> > - number of concurrent SRCU read-side critical sections active
>> >   at the same time,
>> > - number of SRCU read-side critical sections beginning after
>> >   synchronize_srcu() has read the nr_unlock counters, before
>> >   it reads the nr_lock counters.
>>
>> Now that I think of it, since we flip the period before summing
>> the nr_unlock counter, we cannot have any newly arriving readers
>> appearing within the target period while we execute
>> synchronize_srcu(). So it ends up being a limit on the number of
>> concurrent SRCU read-side c.s. active at the same time. (you can
>> scratch the second bullet above).
>
> We can have NR_CPUS worth of them -- those that have fetched the
> index, but not yet incremented their counter.

Ah, yes, due to preemption being disabled across those operations.
More precisely, it would be NR_CPUS - 1, because synchronize_srcu()
needs to run somewhere ;-)

>
> But if the updater fails to see their counter increment, then
> their next srcu_read_lock() is guaranteed to see the new index.
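To make the nr_lock/nr_unlock accounting above concrete, here is a rough
sketch of the kind of per-CPU summation the updater side does for a given
index (again with illustrative field names; the real code also has to
handle the index flip and the associated memory barriers):

	static bool srcu_readers_active_idx_sketch(struct srcu_struct *sp, int idx)
	{
		unsigned long nr_unlock = 0, nr_lock = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			nr_unlock += per_cpu_ptr(sp->per_cpu_ref, cpu)->unlock_count[idx];

		smp_mb(); /* Sum unlock counts before lock counts. */

		for_each_possible_cpu(cpu)
			nr_lock += per_cpu_ptr(sp->per_cpu_ref, cpu)->lock_count[idx];

		/*
		 * Unsigned wraparound keeps nr_lock - nr_unlock meaningful as
		 * long as fewer than ULONG_MAX lock-side increments are missed
		 * between the two summation passes.
		 */
		return nr_lock != nr_unlock;
	}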
Got it, thanks!

Mathieu

>
>                                                        Thanx, Paul
>
>> Thanks,
>>
>> Mathieu
>>
>>
>> > You guys seem to see cases that would require a lower max nr
>> > reader bound, but I'm afraid I don't quite understand them.
>> >
>> > Thanks,
>> >
>> > Mathieu
>> >
>> >
>> >>>
>> >>> And none of the corresponding srcu_read_unlock() has been called;
>> >>>
>> >>> In this case, at the time thread A increments the percpu c[idx], that
>> >>> will result in an overflow, right? So even one reader using the old idx
>> >>> will result in overflow.
>> >>>
>> >>> I think the reason we won't be hit by overflow is not that we have few
>> >>> readers using the old idx, it's that it is unlikely that ULONG_MAX + 1
>> >>> __srcu_read_lock() calls happen for the same idx, right? And the reason
>> >>> for this is more complex: we won't have that large a number of threads
>> >>> in the system, no thread will nest srcu many levels deep, and there
>> >>> won't be a lot of readers using the old idx.
>> >>>
>> >>> And this will still be true if we use the new mechanism and shrink the
>> >>> preemption-disabled section, right?
>> >>>
>> >>> Regards,
>> >>> Boqun
>> >>>
>> >>>> If we remove the preempt_{dis,en}able(), we must change the "NR_CPUS"
>> >>>> in the comment into ULONG_MAX/4 (I assume one on-going reader needs at
>> >>>> least 4 bytes on the stack). It is still safe.
>> >>>>
>> >>>> But we still need to think more if we want to remove the
>> >>>> preempt_{dis,en}able().
>> >>>>
>> >>>> Thanks
>> >>
>> >> Lai
>> >
>> > --
>> > Mathieu Desnoyers
>> > EfficiOS Inc.
>> > http://www.efficios.com
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
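(For reference, the usage pattern Lai's ULONG_MAX/4 bound assumes: each
outstanding reader has to keep the index returned by srcu_read_lock()
alive, costing at least sizeof(int) == 4 bytes of storage, until the
matching srcu_read_unlock(). A minimal sketch, with a hypothetical SRCU
domain name:

	DEFINE_SRCU(my_srcu);	/* hypothetical SRCU domain */

	static void reader(void)
	{
		int idx;

		idx = srcu_read_lock(&my_srcu);	/* idx must stay live ... */
		/* ... read-side critical section ... */
		srcu_read_unlock(&my_srcu, idx);	/* ... until here */
	}

Since every outstanding reader pins at least 4 bytes this way, no more
than ULONG_MAX/4 readers can be outstanding at once, far fewer than the
ULONG_MAX + 1 increments needed to wrap a per-index counter.)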