From: Joel Fernandes
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier
Date: Wed, 21 Dec 2022 00:02:48 -0500
Message-Id: <969CAAB7-5CBE-45F4-AE12-93E51D13F146@joelfernandes.org>
To: Mathieu Desnoyers
Cc: Neeraj Upadhyay, linux-kernel@vger.kernel.org, Josh Triplett, Lai Jiangshan, "Paul E. McKenney", rcu@vger.kernel.org, Steven Rostedt
X-Mailing-List: linux-kernel@vger.kernel.org

> On Dec 20, 2022, at 10:51 PM, Mathieu Desnoyers wrote:
>
> On 2022-12-20 15:55, Joel Fernandes wrote:
>>>> On Dec 20, 2022, at 1:29 PM, Joel Fernandes wrote:
>>>
>>>>> On Dec 20, 2022, at 1:13 PM, Mathieu Desnoyers wrote:
>>>>>
>>>>> On 2022-12-20 13:05, Joel Fernandes wrote:
>>>>> Hi Mathieu,
>>>>>> On Tue, Dec 20, 2022 at 5:00 PM Mathieu Desnoyers wrote:
>>>>>>
>>>>>> On 2022-12-19 20:04, Joel Fernandes wrote:
>>>>>>>> On Mon, Dec 19, 2022 at 7:55 PM Joel Fernandes wrote:
>>>>> [...]
>>>>>>>>> On a 64-bit system, where 64-bit counters are used, AFAIU this needs to
>>>>>>>>> be exactly 2^64 read-side critical sections.
>>>>>>>>
>>>>>>>> Yes, but what about 32-bit systems?
>>>>>>
>>>>>> The overflow indeed happens after 2^32 increments, just like seqlock.
>>>>>> The question we need to ask is therefore: if 2^32 is good enough for
>>>>>> seqlock, why isn't it good enough for SRCU?
>>>>> I think Paul said wrap-around does happen with SRCU on 32-bit, but I'll
>>>>> let him talk more about it. If 32-bit is good enough, let us also drop
>>>>> the size of the counters for 64-bit then?
>>>>>>>>> There are other synchronization algorithms such as seqlocks which are
>>>>>>>>> quite happy with much less protection against overflow (using a 32-bit
>>>>>>>>> counter even on 64-bit architectures).
>>>>>>>>
>>>>>>>> The seqlock is an interesting point.
>>>>>>>>
>>>>>>>>> For practical purposes, I suspect this issue is really just theoretical.
>>>>>>>>
>>>>>>>> I have to ask, what is the benefit of avoiding a flip and scanning
>>>>>>>> active readers? Is the issue about grace period delay or performance?
>>>>>>>> If so, it might be worth prototyping that approach and measuring using
>>>>>>>> rcutorture/rcuscale. If there is a significant benefit to the current
>>>>>>>> approach, then IMO it is worth exploring.
>>>>>>
>>>>>> The main benefit I expect is improved performance of the grace period
>>>>>> implementation in common cases where there are few or no readers
>>>>>> present, especially on machines with many CPUs.
>>>>>>
>>>>>> It allows scanning both periods (0/1) for each CPU within the same pass,
>>>>>> therefore loading both periods' unlock counters sitting in the same
>>>>>> cache line at once (improved locality), and then loading both periods'
>>>>>> lock counters, also sitting in the same cache line.
>>>>>>
>>>>>> It also allows skipping the period flip entirely if there are no readers
>>>>>> present, which is an -arguably- tiny performance improvement as well.
>>>>> The issue of counter wrap aside, what if a new reader always shows up
>>>>> in the active index being scanned? Then can you not delay the GP
>>>>> indefinitely? It seems like writer starvation is possible then (sure,
>>>>> it is possible also with preemption after reader-index sampling, but
>>>>> scanning the active index deliberately will make that worse). Seqlock does
>>>>> not have such writer starvation simply because the writer does not care
>>>>> about what the readers are doing.
>>>>
>>>> No, it's not possible for "current index" readers to starve the g.p. with the side-rcu scheme, because the initial pass (sampling both periods) only opportunistically skips flipping the period if there happen to be no readers in both periods.
>>>>
>>>> If there are readers in the "non-current" period, the grace period waits for them.
>>>>
>>>> If there are readers in the "current" period, it flips the period and then waits for them.
>>>
>>> Ok, glad you already do that; this is what I was sort of leaning toward in my previous email as well, that is, doing a hybrid approach. Sorry, I did not know the details of your side-RCU well enough to know you were already doing something like that.
>>>
>>>>
>>>>> That said, the approach of scanning both counters does seem attractive
>>>>> for when there are no readers, for the reasons you mentioned. Maybe a
>>>>> heuristic to count the number of readers might help? If we are not
>>>>> reader-heavy, then scan both. Otherwise, just scan the inactive ones,
>>>>> and also couple that heuristic with the number of CPUs. I am
>>>>> interested in working on such a design with you! Let us do it and
>>>>> prototype/measure. ;-)
>>>>
>>>> Considering that it would add extra complexity, I'm unsure what that extra heuristic would improve over just scanning both periods in the first pass.
>>>
>>> Makes sense. I think you indirectly implement a form of heuristic already by flipping in case scanning both was not fruitful.
>>>
>>>> I'll be happy to work with you on such a design :) I think we can borrow quite a few concepts from side-rcu for this. Please be aware that my time is limited, though, as I'm currently supposed to be on vacation. :)
>>>
>>> Oh, I was more referring to after the holidays. I am also starting vacation soon and am limited in cycles ;-). It is probably better to enjoy the holidays and come back to this after.
>>>
>>> I do want to finish my memory barrier studies of SRCU over the holidays, since I have been deep in the hole with that already.
>>> Back to the post-flip memory barrier here, since I think now even that might not be needed…
>> In my view, the mb between the totaling of unlocks and the totaling of locks serves as the mb that is required to enforce the GP guarantee, which I think is what Mathieu is referring to.
>
> No, AFAIU you also need barriers at the beginning and end of synchronize_srcu to provide those guarantees:

My bad, I got too hung up on the scan code. Indeed, we need additional ordering on the synchronize side.

Anyway, the full memory barriers are already implemented in the synchronize code AFAICS (beginning and end). At least one of those full memory barriers appears directly at the end of __synchronize_srcu(). But I don't want to say something stupid in the middle of the night, so I will take my time to get back on that.

Thanks,

Joel

>
> * There are memory-ordering constraints implied by synchronize_srcu().
>
> Need for a barrier at the end of synchronize_srcu():
>
> * On systems with more than one CPU, when synchronize_srcu() returns,
> * each CPU is guaranteed to have executed a full memory barrier since
> * the end of its last corresponding SRCU read-side critical section
> * whose beginning preceded the call to synchronize_srcu().
>
> Need for a barrier at the beginning of synchronize_srcu():
>
> * In addition,
> * each CPU having an SRCU read-side critical section that extends beyond
> * the return from synchronize_srcu() is guaranteed to have executed a
> * full memory barrier after the beginning of synchronize_srcu() and before
> * the beginning of that SRCU read-side critical section. Note that these
> * guarantees include CPUs that are offline, idle, or executing in user mode,
> * as well as CPUs that are executing in the kernel.
>
> Thanks,
>
> Mathieu
>
>> Neeraj, do you agree?
>> Thanks.
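For readers of the archive, the "mb between the totaling of unlocks and totaling of locks" discussed above can be sketched as a simplified user-space model. The real check is srcu_readers_active_idx_check() in the kernel, which additionally deals with counter wrap and per-CPU accessors; the struct layout, NR_CPUS value, and function name below are illustrative assumptions, not kernel code:

```c
/* Simplified sketch: decide whether period `idx` has drained by
 * summing unlock counts, issuing a full barrier, then summing lock
 * counts. Illustrative only; not srcu_readers_active_idx_check(). */
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

struct toy_srcu_cpu {
	atomic_ulong srcu_lock_count[2];
	atomic_ulong srcu_unlock_count[2];
};

static struct toy_srcu_cpu cpu_data[NR_CPUS];

static bool toy_readers_active_idx_check(int idx)
{
	unsigned long unlocks = 0, locks = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		unlocks += atomic_load(&cpu_data[cpu].srcu_unlock_count[idx]);

	/* Full barrier: any reader whose srcu_read_unlock() was counted
	 * above must have its srcu_read_lock() visible to the sum below. */
	atomic_thread_fence(memory_order_seq_cst);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		locks += atomic_load(&cpu_data[cpu].srcu_lock_count[idx]);

	return locks == unlocks;
}
```

Roughly speaking, the fence ensures the lock sum cannot miss a reader that the unlock sum already accounted for, so equal sums indicate a drained period.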
>>>
>>> Cheers,
>>>
>>> - Joel
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
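As a closing sketch of the grace-period pass discussed earlier in the thread (sample both periods first, skip the flip entirely when no readers are present at all, flip only when the current period still has readers), here is a toy single-file model. It is deliberately not side-rcu or kernel SRCU; all names and the busy-wait loop are illustrative assumptions:

```c
/* Toy model of the discussed grace-period pass. A real implementation
 * would use per-CPU counters and sleep rather than busy-wait. */
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

static atomic_ulong readers[2];		/* active readers per period (toy) */
static atomic_int current_period;	/* 0 or 1 */

static bool period_has_readers(int p)
{
	return atomic_load(&readers[p]) != 0;
}

static void wait_for_period(int p)
{
	while (period_has_readers(p))
		sched_yield();	/* real code would block, not spin */
}

static void toy_synchronize(void)
{
	int cur = atomic_load(&current_period);

	/* First pass samples both periods: no readers anywhere means
	 * the flip can be skipped entirely. */
	if (!period_has_readers(0) && !period_has_readers(1))
		return;

	/* Readers only in the non-current period: just wait for them. */
	wait_for_period(1 - cur);

	/* Readers in the current period: flip, then wait them out. */
	if (period_has_readers(cur)) {
		atomic_store(&current_period, 1 - cur);
		wait_for_period(cur);
	}
}
```

Current-period readers cannot starve the writer here: at most one flip happens per pass, after which newly arriving readers enter the other period.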