Date: Fri, 20 Jan 2023 13:37:51 -0500
From: Alan Stern
To: "Paul E. McKenney"
Cc: Andrea Parri, Jonas Oberhauser, Peter Zijlstra, will, "boqun.feng",
	npiggin, dhowells, "j.alglave", "luc.maranget", akiyks, dlustig,
	joel, urezki, quic_neeraju, frederic, Kernel development list
Subject: Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus test)
References: <20230117174308.GK2948950@paulmck-ThinkPad-P17-Gen-1>
	<20230118035041.GQ2948950@paulmck-ThinkPad-P17-Gen-1>
	<20230118200601.GH2948950@paulmck-ThinkPad-P17-Gen-1>
	<20230119000214.GM2948950@paulmck-ThinkPad-P17-Gen-1>
	<20230120175804.GN2948950@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20230120175804.GN2948950@paulmck-ThinkPad-P17-Gen-1>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 20, 2023 at 09:58:04AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 20, 2023 at 11:01:03AM -0500, Alan Stern wrote:
> > On Wed, Jan 18, 2023 at 04:02:14PM -0800, Paul E. McKenney wrote:
> > > There are pairs of per-CPU counters.  One pair (->srcu_lock_count[])
> > > counts the number of srcu_down_read() operations that took place on
> > > that CPU and another pair (->srcu_unlock_count[]) counts the number
> > > of srcu_up_read() operations that took place on that CPU.  There is
> > > an ->srcu_idx that selects which of the ->srcu_lock_count[] elements
> > > should be incremented by srcu_down_read().  Of course, srcu_down_read()
> > > returns the value of ->srcu_idx that it used so that the matching
> > > srcu_up_read() will use that same index when incrementing its CPU's
> > > ->srcu_unlock_count[].
> > >
> > > Grace periods go something like this:
> > >
> > > 1.	Sum up the ->srcu_unlock_count[!ssp->srcu_idx] counters.
> > >
> > > 2.	smp_mb().
> > >
> > > 3.	Sum up the ->srcu_unlock_count[!ssp->srcu_idx] counters.
> >
> > Presumably you meant to write "lock" here, not "unlock".
>
> You are quite right, and apologies for my confusion.
>
> > > 4.	If the sums are not equal, retry from #1.
> > >
> > > 5.	smp_mb().
> > >
> > > 6.	WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
> > >
> > > 7.	smp_mb().
> > >
> > > 8.	Same loop as #1-4.
> > >
> > > So similar to r/w semaphores, but with two separate distributed counts.
> > > This means that the number of readers need not go to zero at any given
> > > point in time, consistent with the need to wait only on old readers.
> >
> > Reasoning from first principles, I deduce the following:
> >
> > You didn't describe exactly how srcu_down_read() and srcu_up_read()
> > work.  Evidently the unlock increment in srcu_up_read() should have
> > release semantics, to prevent accesses from escaping out the end of
> > the critical section.  But the lock increment in srcu_down_read() has
> > to be stronger than an acquire; to prevent accesses in the critical
> > section escaping out the start, the increment has to be followed by
> > smp_mb().
>
> You got it!  There is some work going on to see if srcu_read_lock()'s
> smp_mb() can be weakened to pure release, but we will see.

That doesn't make sense.  Release ordering in srcu_read_lock() would
only prevent accesses from leaking _in_ to the critical section.  It
would do nothing to prevent accesses from leaking _out_.

> > The smp_mb() fences in steps 5 and 7 appear to be completely
> > unnecessary.
>
> For correctness, agreed.  Their purpose is instead forward progress.

It's hard to say whether they would be effective at that.  smp_mb()
forces the processor to wait until some time when previous writes have
become visible to all CPUs.  But if you don't wait for that potentially
excessively long delay, you may be able to continue and be lucky enough
to find that all previous writes have already become visible to all the
CPUs that matter.

As far as I know, smp_mb() doesn't expedite the process of making
previous writes visible.  However, I am very far from being an expert
on system architecture design.
> One can argue that step 5 is redundant due to control dependency, but
> control dependencies are fragile, and as you say below, this code is
> nowhere near a fastpath.

Also, control dependencies do not contribute to forward progress.

> > Provided an smp_mb() is added at the very start and end of the grace
> > period, the memory barrier in step 2 and its copy in step 8 can be
> > demoted to smp_rmb().
>
> This might need to be smp_mb() to allow srcu_read_unlock() to be
> demoted to release ordering.  Work in progress.

I thought srcu_read_unlock() already _is_ a release operation.  The
smp_mb() fence mentioned earlier needs to be in srcu_read_lock(), not
unlock().  And there's no way that one can be demoted.
srcu_read_unlock() does not need a full smp_mb().

> > These changes would be small optimizations at best, and you may
> > consider them unimportant in view of the fact that grace periods
> > often last quite a long time.
>
> Agreed, keeping it simple and obvious is important on this code, which
> is nowhere near a fastpath.  The case of srcu_read_unlock() is another
> thing altogether.

Unfortunately, the full fence in srcu_read_lock() is unavoidable
without very major changes to the algorithm -- probably a complete
redesign.  Without it, a read inside the critical section could be
executed before the store part of the increment, which could lead
synchronize_srcu() to believe that the critical section had not yet
started when in fact it had.

Alan
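[Editor's note: the failure mode described in the final paragraph is
the forbidden outcome of a standard SRCU message-passing litmus test.
The sketch below is hand-written (the test name is invented), in the
litmus format used by herd7's LKMM, which models srcu_read_lock(),
srcu_read_unlock(), and synchronize_srcu().  If P0's load of *x could
execute before the store half of the lock-count increment became
visible, the grace period could end before the critical section was
accounted for, and the "exists" clause could be satisfied; the
smp_mb() after the increment is what rules that out.]

```
C C-srcu-mp-sketch

{}

P0(int *x, int *y, struct srcu_struct *s)
{
	int r0;
	int r1;
	int idx;

	idx = srcu_read_lock(s);
	r0 = READ_ONCE(*y);
	r1 = READ_ONCE(*x);
	srcu_read_unlock(s, idx);
}

P1(int *x, int *y, struct srcu_struct *s)
{
	WRITE_ONCE(*x, 1);
	synchronize_srcu(s);
	WRITE_ONCE(*y, 1);
}

exists (0:r0=1 /\ 0:r1=0)
```

Seeing r0=1 means the critical section extended past the end of the
grace period, in which case it must have begun before the grace period
started and r1 must observe the pre-grace-period store to x.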