Date: Thu, 17 Nov 2016 07:53:26 -0800
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Mathieu Desnoyers
Cc: Lai Jiangshan, Boqun Feng, linux-kernel, Ingo Molnar, dipankar,
	Andrew Morton, Josh Triplett, Thomas Gleixner, Peter Zijlstra,
	rostedt, David Howells, Eric Dumazet, dvhart, fweisbec,
	Oleg Nesterov, bobby prani, ldr709
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite
In-Reply-To: <1452112.5481.1479397088449.JavaMail.zimbra@efficios.com>
Message-Id: <20161117155326.GZ3612@linux.vnet.ibm.com>

On Thu, Nov 17, 2016 at 03:38:08PM +0000, Mathieu Desnoyers wrote:
> ----- On Nov 17, 2016, at 10:31 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> 
> > ----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@gmail.com wrote:
> > 
> >> On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng wrote:
> >>> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote:
> >>>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney wrote:
> >>>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote:
> >>>>
> >>>> >>
> >>>> >> __srcu_read_lock() used to be called with preemption disabled. I guess
> >>>> >> the reason was because we have two percpu variables to increase. So with
> >>>> >> only one percpu variable left, could we remove the preempt_{dis,en}able()
> >>>> >> in srcu_read_lock() and use this_cpu_inc() here?
> >>>> >
> >>>> > Quite possibly...
> >>>> >
> >>>>
> >>>
> >>> Hello, Lai ;-)
> >>>
> >>>> It will be nicer if it is removed.
> >>>>
> >>>> The reason for disabling preemption was also that we have to disallow
> >>>> any preemption between the fetching of the idx and the increment, so
> >>>> that we have at most NR_CPUS worth of readers using the old index that
> >>>> haven't incremented the counters.
> >>>>
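For concreteness, the change being discussed here (dropping the
preempt_disable()/preempt_enable() pair and doing the increment with
this_cpu_inc() on a single per-CPU counter per index) might look roughly
like the sketch below. This is only an illustration of the idea; the
struct layout and field names are assumptions, not the actual patch.

	/*
	 * Illustrative sketch only.  A single per-CPU lock counter per
	 * index, incremented with this_cpu_inc(), so no preemption
	 * disabling is needed between reading ->completed and doing
	 * the increment: a preempted reader is simply counted on
	 * whatever CPU it lands on.
	 */
	int __srcu_read_lock(struct srcu_struct *sp)
	{
		int idx;

		idx = READ_ONCE(sp->completed) & 0x1;
		this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
		smp_mb(); /* Order the increment before the critical section. */
		return idx;
	}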
> >>>
> >>> After reading the comment for a while, I actually got a question; maybe
> >>> I am missing something ;-)
> >>>
> >>> Why does "at most NR_CPUS worth of readers using the old index that
> >>> haven't incremented the counters" save us from overflowing the counter?
> >>>
> >>> Please consider the following case in the current implementation:
> >>>
> >>> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...)
> >>>
> >>> One thread A is currently in __srcu_read_lock(), using idx = 1 and
> >>> about to increase the percpu c[idx], and ULONG_MAX __srcu_read_lock()s
> >>> have been called and returned with idx = 1. Please note I think this is
> >>> possible because I assume we may have some code like this:
> >>>
> >>>	unsigned long i = 0;
> >>>	for (; i < ULONG_MAX; i++)
> >>>		srcu_read_lock(); // returns the same idx, 1
> >>
> >> This is the wrong usage of the API.
> >>
> >> You might rewrite it as:
> >>
> >>	unsigned long index[2] = { 0, 0 };
> >>	unsigned long i = 0;
> >>	for (; index[1] < ULONG_MAX; i++)
> >>		index[srcu_read_lock()]++;
> >>
> >> I think we should add documentation to disallow this kind of usage.
> >> A reader should consume at least 4 bytes of memory.
> >>
> > 
> > (the analysis below refers to the rewritten SRCU scheme)
> > 
> > Let's presume we use the API correctly, as you describe (saving
> > the returned index of srcu_read_lock() somewhere).
> > 
> > So for the sake of argument, we can either call srcu_read_lock()
> > in a loop (during which we can be migrated), or call it
> > concurrently from various threads. The effect in terms of
> > overflow is pretty much the same.
> > 
> > What is done here is incrementing per-cpu split-counters. In
> > the worst-case scenario, let's assume we're incrementing those
> > counters for a single index (0 or 1).
> > 
> > If we think about this system-wide, we don't really care about
> > the overflow of a single CPU counter, because what matters is
> > the difference between the overall nr_lock - nr_unlock counts
> > for a given index, once summed up by synchronize_srcu().
> > 
> > So the only situation that could lead to an overflow that matters
> > is if synchronize_srcu() sees ULONG_MAX more increments of nr_lock
> > than the observed number of nr_unlock increments.
> > 
> > So the bound is not only about the number of concurrent SRCU
> > readers, but also about the number of SRCU readers that may
> > appear between the moment synchronize_srcu() reads the nr_unlock
> > per-cpu counters and the moment it reads the nr_lock counters.
> > 
> > This maximum bound of ULONG_MAX - 1 therefore applies to the
> > sum of:
> > - number of concurrent SRCU read-side critical sections active
> >   at the same time,
> > - number of SRCU read-side critical sections beginning after
> >   synchronize_srcu() has read the nr_unlock counters, before
> >   it reads the nr_lock counters.
> 
> Now that I think of it, since we flip the period before summing
> the nr_unlock counter, we cannot have any newly arriving readers
> appearing within the target period while we execute synchronize_srcu().
> So it ends up being a limit on the number of concurrent SRCU
> read-side c.s. active at the same time. (You can scratch the
> second bullet above.)

We can have NR_CPUS worth of them -- those that have fetched the index,
but not yet incremented their counter.  But if the updater fails to see
their counter increment, then their next srcu_read_lock() is guaranteed
to see the new index.

							Thanx, Paul
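To make the split-counter argument concrete with numbers, here is a
minimal standalone sketch (ordinary userspace C, for illustration only,
not kernel code): even when the summed counters wrap, the
lock-minus-unlock difference computed for an index remains exact,
because unsigned arithmetic is modulo 2^BITS_PER_LONG, as long as the
true number of in-flight readers stays below ULONG_MAX.

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		/* Per-CPU lock/unlock counts for one index on a 2-CPU system. */
		unsigned long locks[2]   = { ULONG_MAX - 2, 10 };
		unsigned long unlocks[2] = { ULONG_MAX - 5, 9 };

		/* These sums wrap around, which is fine. */
		unsigned long nr_locks   = locks[0] + locks[1];
		unsigned long nr_unlocks = unlocks[0] + unlocks[1];

		/* Prints 4: readers still in their read-side critical sections. */
		printf("in flight: %lu\n", nr_locks - nr_unlocks);
		return 0;
	}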
> Thanks,
> 
> Mathieu
> 
> > 
> > You guys seem to see cases that would require a lower max nr
> > reader bound, but I'm afraid I don't quite understand them.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> >>>
> >>> And none of the corresponding srcu_read_unlock()s has been called;
> >>>
> >>> In this case, at the time thread A increases the percpu c[idx], that
> >>> will result in an overflow, right? So even one reader using the old
> >>> idx will result in an overflow.
> >>>
> >>> I think the reason we won't be hit by overflow is not that we have few
> >>> readers using the old idx; it's that there are unlikely to be
> >>> ULONG_MAX + 1 __srcu_read_lock() calls for the same idx, right? And
> >>> the reason for this is more involved: because we won't have an
> >>> enormous number of threads in the system, because no thread will nest
> >>> srcu many levels deep, and because there won't be a lot of readers
> >>> using the old idx.
> >>>
> >>> And this will still be true if we use the new mechanism and shrink the
> >>> preemption-disabled section, right?
> >>>
> >>> Regards,
> >>> Boqun
> >>>
> >>>> If we remove the preempt_{dis,en}able(), we must change the
> >>>> "NR_CPUS" in the comment into ULONG_MAX/4. (I assume one on-going
> >>>> reader needs at least 4 bytes on the stack.) It is still safe.
> >>>>
> >>>> But we still need to think more if we want to remove the
> >>>> preempt_{dis,en}able().
> >>>>
> >>>> Thanks
> >>
> >> Lai
> > 
> > -- 
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 
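For reference, the ULONG_MAX/4 figure mentioned by Lai above can be
justified as follows (assuming, as he does, that each on-going reader
must keep the int returned by srcu_read_lock(), which takes at least
4 bytes):

	outstanding readers  <=  addressable memory in bytes / 4
	                     <=  (ULONG_MAX + 1) / 4  ~  ULONG_MAX / 4

so the lock-minus-unlock difference seen by the grace period stays far
below ULONG_MAX under correct use of the API, even with preemption
allowed between fetching the index and incrementing the counter.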