Date: Fri, 18 Nov 2016 05:27:54 -0800
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Lance Roy
Cc: Lai Jiangshan, LKML, Ingo Molnar, dipankar@in.ibm.com,
	akpm@linux-foundation.org, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, dvhart@linux.intel.com, Frédéric Weisbecker,
	oleg@redhat.com, pranith kumar
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite
Message-Id: <20161118132754.GP3612@linux.vnet.ibm.com>
In-Reply-To: <20161117115304.0ff3f84e@gmail.com>
References: <20161114183636.GA28589@linux.vnet.ibm.com> <20161117115304.0ff3f84e@gmail.com>

On Thu, Nov 17, 2016 at 11:53:04AM -0800, Lance Roy wrote:
> On Thu, 17 Nov 2016 21:58:34 +0800
> Lai Jiangshan wrote:
> > From the changelog, it sounds like "ULONG_MAX - NR_CPUS" is the limit
> > of the implementations (old or this one), but actually the real maximum
> > number of active readers is much smaller.  I think ULONG_MAX/4 can be
> > used here instead, and that part of the changelog can be removed.
> In the old version, there are two separate limits.  The first is that
> there can be no more than ULONG_MAX nested or parallel readers, as
> otherwise ->c[] would overflow.
> 
> The other limit is to prevent ->seq[] from overflowing during
> srcu_readers_active_idx_check().  For this to happen, there must be
> ULONG_MAX+1 readers that loaded ->completed before srcu_flip() was run
> and that then increment ->seq[].  The ->seq[] array is supposed to
> prevent srcu_readers_active_idx_check() from completing successfully if
> any such readers increment ->seq[], because otherwise they could
> decrement ->c[] while it is being read, which could cause it to
> incorrectly report that there are no active readers.  If ->seq[]
> overflows, then there is nothing (except how improbable it is) to
> prevent this from happening.
> 
> I used to think (because of the previous comment) that there could be
> at most one such increment of ->seq[] per CPU, as they would have to be
> using the old value of ->completed and preemption would be disabled.
> This is not the case, because there are no barriers around srcu_flip(),
> so the processor is not required to increment ->completed before reading
> ->seq[] the first time, nor is it required to wait until it is done
> reading ->seq[] the second time before incrementing ->completed.  This
> means that the following code could cause ->seq[] to increment an
> arbitrarily large number of times between the two ->seq[] loads in
> srcu_readers_active_idx_check().
> 
> 	while (true) {
> 		int idx = srcu_read_lock(sp);
> 		srcu_read_unlock(sp, idx);
> 	}

I also initially thought that there would need to be a memory barrier
immediately after srcu_flip().  But after further thought, I don't
believe that this is the case.

The key point is that updaters do the flip, sum the unlock counters,
execute a full memory barrier, and then sum the lock counters.  We
therefore know that if an updater sees an unlock, it is guaranteed to
see the corresponding lock, which prevents negative sums.  However, it
is true that the flip and the unlock reads can be interchanged.  This
can result in failing to see a count of zero, but it cannot result in
spuriously seeing a count of zero.

More to this point, if an updater fails to see a lock, the next time
that CPU/task does an srcu_read_lock(), that CPU/task is guaranteed to
see the new value of the index.  This limits the number of CPUs/tasks
that can be using the old value of the index.  Given that preemption is
disabled across the fetch of the index and the increment of the lock
count, that number is NR_CPUS-1, given that the updater has to be
running on one of the CPUs (as Mathieu pointed out earlier in this
thread).

Or am I missing something?
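To make the ordering I have in mind concrete, here is a rough standalone
sketch, with C11 atomics standing in for the kernel primitives.  All of
the names (example_*, NCPUS, the lock_count[]/unlock_count[] arrays) are
made up for illustration; this is the shape of the argument, not the
actual SRCU code:

/*
 * Rough sketch only: illustrative names, C11 atomics in place of the
 * kernel primitives, and "CPUs" modeled as array slots so that it
 * compiles standalone.  Not the actual SRCU code.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4

struct example_srcu {
	atomic_ulong completed;			/* Low bit selects the current index. */
	atomic_ulong lock_count[NCPUS][2];	/* Reader entries, per "CPU" and index. */
	atomic_ulong unlock_count[NCPUS][2];	/* Reader exits, per "CPU" and index. */
};

/* Reader entry: fetch the current index, then count the entry. */
static int example_read_lock(struct example_srcu *sp, int cpu)
{
	int idx = atomic_load(&sp->completed) & 1;

	atomic_fetch_add(&sp->lock_count[cpu][idx], 1);
	atomic_thread_fence(memory_order_seq_cst);	/* Critical section follows. */
	return idx;
}

/* Reader exit: count the exit against the index used at entry. */
static void example_read_unlock(struct example_srcu *sp, int cpu, int idx)
{
	atomic_thread_fence(memory_order_seq_cst);	/* Critical section precedes. */
	atomic_fetch_add(&sp->unlock_count[cpu][idx], 1);
}

/* Updater's flip: direct new readers onto the other index. */
static void example_flip(struct example_srcu *sp)
{
	atomic_fetch_add(&sp->completed, 1);
}

/*
 * Updater's scan of the now-old index: sum the unlock counts, execute a
 * full memory barrier, then sum the lock counts.  If the scan sees a
 * reader's unlock, the barrier guarantees that it also sees that reader's
 * lock, so locks - unlocks cannot appear negative and cannot read as zero
 * while a counted reader is still inside its critical section.
 */
static bool example_readers_gone(struct example_srcu *sp, int idx)
{
	unsigned long unlocks = 0, locks = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		unlocks += atomic_load(&sp->unlock_count[cpu][idx]);

	atomic_thread_fence(memory_order_seq_cst);	/* The full barrier in question. */

	for (int cpu = 0; cpu < NCPUS; cpu++)
		locks += atomic_load(&sp->lock_count[cpu][idx]);

	return locks == unlocks;
}

As noted above, the flip can still be reordered with the unlock sums, so
a given pass can fail to see a count of zero, but any unlock it counts
comes with its counted lock, so it can never see a spurious zero.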
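And the NR_CPUS-1 bound comes from the reader-side window, sketched
below with made-up field names rather than the real srcu_read_lock()
internals:

/* Illustrative sketch of the reader-side window only, not the actual code. */
int example_srcu_read_lock(struct example_srcu_struct *sp)
{
	int idx;

	preempt_disable();
	idx = READ_ONCE(sp->completed) & 0x1;		/* Fetch the current index... */
	__this_cpu_inc(sp->per_cpu_ref->lock_count[idx]); /* ...and count the entry. */
	smp_mb();	/* Order the critical section after the increment. */
	preempt_enable();
	return idx;
}

A reader that has fetched the old value of the index but not yet
incremented the lock count cannot be preempted, so it is still running
on its CPU.  With the updater occupying one of the CPUs, at most
NR_CPUS-1 such readers can exist at any given time.

							Thanx, Paul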