Date: Tue, 9 Apr 2019 10:55:49 -0700
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Mathieu Desnoyers
Cc: "Joel Fernandes, Google", rcu, linux-kernel, Ingo Molnar,
	Lai Jiangshan, dipankar, Andrew Morton, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, rostedt, David Howells,
	Eric Dumazet, fweisbec, Oleg Nesterov, linux-nvdimm,
	dri-devel, amd-gfx
Subject: Re: [PATCH RFC tip/core/rcu 0/4] Forbid static SRCU use in modules
Reply-To: paulmck@linux.ibm.com
References: <20190402142816.GA13084@linux.ibm.com>
	<20190408142230.GJ14111@linux.ibm.com>
	<1447252022.1166.1554734972823.JavaMail.zimbra@efficios.com>
	<20190408154616.GO14111@linux.ibm.com>
	<1489474416.1465.1554744287985.JavaMail.zimbra@efficios.com>
	<20190409154012.GC248418@google.com>
	<534133139.2374.1554825363211.JavaMail.zimbra@efficios.com>
	<20190409164031.GE14111@linux.ibm.com>
	<1958511501.2412.1554828325809.JavaMail.zimbra@efficios.com>
In-Reply-To: <1958511501.2412.1554828325809.JavaMail.zimbra@efficios.com>
Message-Id: <20190409175549.GG14111@linux.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 09, 2019 at 12:45:25PM -0400, Mathieu Desnoyers wrote:
> ----- On Apr 9, 2019, at 12:40 PM, paulmck paulmck@linux.ibm.com wrote:
> 
> > On Tue, Apr 09, 2019 at 11:56:03AM -0400, Mathieu Desnoyers wrote:
> >> ----- On Apr 9, 2019, at 11:40 AM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> >> 
> >> > On Mon, Apr 08, 2019 at 01:24:47PM -0400, Mathieu Desnoyers wrote:
> >> >> ----- On Apr 8, 2019, at 11:46 AM, paulmck paulmck@linux.ibm.com wrote:
> >> >> 
> >> >> > On Mon, Apr 08, 2019 at 10:49:32AM -0400, Mathieu Desnoyers wrote:
> >> >> >> ----- On Apr 8, 2019, at 10:22 AM, paulmck paulmck@linux.ibm.com wrote:
> >> >> >> 
> >> >> >> > On Mon, Apr 08, 2019 at 09:05:34AM -0400, Mathieu Desnoyers wrote:
> >> >> >> >> ----- On Apr 7, 2019, at 10:27 PM, paulmck paulmck@linux.ibm.com wrote:
> >> >> >> >> 
> >> >> >> >> > On Sun, Apr 07, 2019 at 09:07:18PM +0000, Joel Fernandes wrote:
> >> >> >> >> >> On Sun, Apr 07, 2019 at 04:41:36PM -0400, Mathieu Desnoyers wrote:
> >> >> >> >> >> > ----- On Apr 7, 2019, at 3:32 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> >> >> >> >> >> > 
> >> >> >> >> >> > > On Sun, Apr 07, 2019 at 03:26:16PM -0400, Mathieu Desnoyers wrote:
> >> >> >> >> >> > >> ----- On Apr 7, 2019, at 9:59 AM, paulmck paulmck@linux.ibm.com wrote:
> >> >> >> >> >> > >> 
> >> >> >> >> >> > >> > On Sun, Apr 07, 2019 at 06:39:41AM -0700, Paul E. McKenney wrote:
> >> >> >> >> >> > >> >> On Sat, Apr 06, 2019 at 07:06:13PM -0400, Joel Fernandes wrote:
> >> >> >> >> >> > >> > 
> >> >> >> >> >> > >> > [ . . . ]
> >> >> >> >> >> > >> > 
> >> >> >> >> >> > >> >> > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> >> >> >> >> >> > >> >> > > index f8f6f04c4453..c2d919a1566e 100644
> >> >> >> >> >> > >> >> > > --- a/include/asm-generic/vmlinux.lds.h
> >> >> >> >> >> > >> >> > > +++ b/include/asm-generic/vmlinux.lds.h
> >> >> >> >> >> > >> >> > > @@ -338,6 +338,10 @@
> >> >> >> >> >> > >> >> > >  	KEEP(*(__tracepoints_ptrs)) /* Tracepoints: pointer array */ \
> >> >> >> >> >> > >> >> > >  	__stop___tracepoints_ptrs = .;				\
> >> >> >> >> >> > >> >> > >  	*(__tracepoints_strings)/* Tracepoints: strings */	\
> >> >> >> >> >> > >> >> > > +	. = ALIGN(8);						\
> >> >> >> >> >> > >> >> > > +	__start___srcu_struct = .;				\
> >> >> >> >> >> > >> >> > > +	*(___srcu_struct_ptrs)					\
> >> >> >> >> >> > >> >> > > +	__end___srcu_struct = .;				\
> >> >> >> >> >> > >> >> > >  }								\
> >> >> >> >> >> > >> >> > 
> >> >> >> >> >> > >> >> > This vmlinux linker modification is not needed. I tested without it and srcu
> >> >> >> >> >> > >> >> > torture works fine with rcutorture built as a module. Putting further prints
> >> >> >> >> >> > >> >> > in kernel/module.c verified that the kernel is able to find the srcu structs
> >> >> >> >> >> > >> >> > just fine. You could squash the below patch into this one or apply it on top
> >> >> >> >> >> > >> >> > of the dev branch.
> >> >> >> >> >> > >> >> 
> >> >> >> >> >> > >> >> Good point, given that otherwise FORTRAN named common blocks would not
> >> >> >> >> >> > >> >> work.
> >> >> >> >> >> > >> >> 
> >> >> >> >> >> > >> >> But isn't one advantage of leaving that stuff in the RO_DATA_SECTION()
> >> >> >> >> >> > >> >> macro that it can be mapped read-only?  Or am I suffering from excessive
> >> >> >> >> >> > >> >> optimism?
> >> >> >> >> >> > >> > 
> >> >> >> >> >> > >> > And to answer the other question, in the case where I am suffering from
> >> >> >> >> >> > >> > excessive optimism, it should be a separate commit.  Please see below
> >> >> >> >> >> > >> > for the updated original commit thus far.
> >> >> >> >> >> > >> > 
> >> >> >> >> >> > >> > And may I have your Tested-by?
> >> >> >> >> >> > >> 
> >> >> >> >> >> > >> Just to confirm: does the cleanup performed in the module going
> >> >> >> >> >> > >> notifier end up acting as a barrier first, before freeing the memory?
> >> >> >> >> >> > >> If not, is it explicitly stated that a barrier must be issued before
> >> >> >> >> >> > >> module unload?
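
[ For reference, the unload path in kernel/module.c orders these steps
  roughly as follows -- a simplified paraphrase for context, not
  verbatim kernel code:

	/* delete_module() syscall, heavily elided: */
	if (mod->exit != NULL)
		mod->exit();			/* module_exit() handler */
	blocking_notifier_call_chain(&module_notify_list,
				     MODULE_STATE_GOING, mod);
	/* ... */
	free_module(mod);			/* text and data freed here */

  That is, the GOING notifiers run before the module's memory is freed,
  as Mathieu notes below. ]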
> >> >> >> >> >> > >> 
> >> >> >> >> >> > > 
> >> >> >> >> >> > > You mean rcu_barrier? It is mentioned in the documentation that this is the
> >> >> >> >> >> > > responsibility of the module writer, to prevent delays for all modules.
> >> >> >> >> >> > 
> >> >> >> >> >> > It's an srcu barrier, yes. Considering it would be a barrier specific to the
> >> >> >> >> >> > srcu domain within that module, I don't see how it would cause delays for
> >> >> >> >> >> > "all" modules if we implicitly issue the barrier on module unload. What
> >> >> >> >> >> > am I missing?
> >> >> >> >> >> 
> >> >> >> >> >> Yes, you are right. I thought of this after I just sent my email. I think it
> >> >> >> >> >> makes sense for the srcu case to do this, and it could avoid a class of bugs.
> >> >> >> >> > 
> >> >> >> >> > If there are call_srcu() callbacks outstanding, the module writer still
> >> >> >> >> > needs the srcu_barrier() because otherwise callbacks arrive after
> >> >> >> >> > the module text has gone, which will disappoint the CPU when it
> >> >> >> >> > tries fetching instructions that are no longer mapped.  If there are
> >> >> >> >> > no call_srcu() callbacks from that module, then there is no need for
> >> >> >> >> > srcu_barrier() either way.
> >> >> >> >> > 
> >> >> >> >> > So if an srcu_barrier() is needed, the module developer needs to
> >> >> >> >> > supply it.
> >> >> >> >> 
> >> >> >> >> When you say "callbacks arrive after the module text has gone",
> >> >> >> >> I think you assume that free_module() is invoked before the
> >> >> >> >> MODULE_STATE_GOING notifiers are called. But it's done in the
> >> >> >> >> opposite order: going notifiers are called first, and then
> >> >> >> >> free_module() is invoked.
> >> >> >> >> 
> >> >> >> >> So AFAIU it would be safe to issue the srcu_barrier() from the module
> >> >> >> >> going notifier.
> >> >> >> >> 
> >> >> >> >> Or am I missing something?
> >> >> >> > 
> >> >> >> > We do seem to be talking past each other.  ;-)
> >> >> >> > 
> >> >> >> > This has nothing to do with the order of events at module-unload time.
> >> >> >> > 
> >> >> >> > So please let me try again.
> >> >> >> > 
> >> >> >> > If a given srcu_struct in a module never has call_srcu() invoked, there
> >> >> >> > is no need to invoke rcu_barrier() at any time, whether at module-unload
> >> >> >> > time or not.  Adding rcu_barrier() in this case adds overhead and latency
> >> >> >> > for no good reason.
> >> >> >> 
> >> >> >> Not if we invoke srcu_barrier() for that specific domain. If
> >> >> >> call_srcu() was never invoked for an srcu domain, I don't see why
> >> >> >> srcu_barrier() should be more expensive than a simple check that
> >> >> >> the domain does not have any srcu work queued.
> >> >> > 
> >> >> > But that simple check does involve a cache miss for each possible CPU (not
> >> >> > just each online CPU), so it is non-trivial, especially on large systems.
> >> >> > 
> >> >> >> > If a given srcu_struct in a module does have at least one call_srcu()
> >> >> >> > invoked, it is already that module's responsibility to make sure that
> >> >> >> > the code sticks around long enough for the callback to be invoked.
> >> >> >> 
> >> >> >> I understand that when users do explicit dynamic allocation/cleanup of
> >> >> >> srcu domains, they indeed need to take care of doing explicit srcu_barrier().
> >> >> >> However, if they do static definition of srcu domains, it would be nice
> >> >> >> if we can handle the barriers under the hood.
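
[ For concreteness, the explicit pattern for a dynamically initialized
  domain referred to above looks roughly like the following sketch; the
  names are hypothetical and this is not code from any particular
  module:

	#include <linux/module.h>
	#include <linux/srcu.h>

	static struct srcu_struct my_srcu;	/* hypothetical domain */

	static int __init mymod_init(void)
	{
		return init_srcu_struct(&my_srcu);
	}

	static void __exit mymod_exit(void)
	{
		/* Wait for all outstanding call_srcu() callbacks, so
		 * that none can run after the module text is unmapped. */
		srcu_barrier(&my_srcu);
		cleanup_srcu_struct(&my_srcu);
	}

	module_init(mymod_init);
	module_exit(mymod_exit);
	MODULE_LICENSE("GPL");
]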
> >> >> > 
> >> >> > All else being equal, of course.  But...
> >> >> > 
> >> >> >> > This means that correct SRCU users that invoke call_srcu() already
> >> >> >> > have srcu_barrier() at module-unload time.  Incorrect SRCU users, with
> >> >> >> > reasonable probability, now get a WARN_ON() at module-unload time, with
> >> >> >> > the per-CPU state getting leaked.  Before this change, they would (also
> >> >> >> > with reasonable probability) instead get an instruction-fetch fault when
> >> >> >> > the SRCU callback was invoked after the completion of the module unload.
> >> >> >> > Furthermore, in all cases where they would previously have gotten the
> >> >> >> > instruction-fetch fault, they now get the WARN_ON(), like this:
> >> >> >> > 
> >> >> >> > 	if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))
> >> >> >> > 		return; /* Forgot srcu_barrier(), so just leak it! */
> >> >> >> > 
> >> >> >> > So this change already represents an improvement in usability.
> >> >> >> 
> >> >> >> Considering that we can do an srcu_barrier() for the specific domain,
> >> >> >> and that it should add no noticeable overhead if there are no queued
> >> >> >> callbacks, I don't see a good reason for leaving the srcu_barrier()
> >> >> >> invocation to the user rather than implicitly doing it from the
> >> >> >> module going notifier.
> >> >> > 
> >> >> > Now, I could automatically add an indicator of whether or not a
> >> >> > call_srcu() had happened, but then again, that would either add a
> >> >> > call_srcu() scalability bottleneck or again require a scan of all possible
> >> >> > CPUs... to figure out if it was necessary to scan all possible CPUs.
> >> >> > 
> >> >> > Or is scanning all possible CPUs down in the noise in this case?  Or
> >> >> > am I missing a trick that would reduce the overhead?
> >> >> 
> >> >> Module unloading implicitly does a synchronize_rcu() (for RCU-sched), and
> >> >> a stop_machine(). So I would be tempted to say that the overhead of iterating
> >> >> over all CPUs might not matter that much, considering the rest.
> >> >> 
> >> >> About notifying that a call_srcu() has happened for the srcu domain in a
> >> >> scalable fashion, let's see... We could have a flag "call_srcu_used"
> >> >> for each srcu domain. Whenever call_srcu() is invoked, it would
> >> >> load that flag, and set it on first use.
> >> >> 
> >> >> The idea here is to only use that flag when srcu_barrier() is performed
> >> >> right before the srcu domain cleanup (it could become part of that
> >> >> cleanup). Using it in all srcu_barrier() calls might be tricky, because
> >> >> we may then need to add memory barriers or locking to the call_srcu()
> >> >> fast path, which is an overhead we try to avoid.
> >> >> 
> >> >> However, if we only use that flag as part of the srcu domain cleanup,
> >> >> it's already prohibited to invoke call_srcu() concurrently with the
> >> >> cleanup of the same domain, so I don't think we would need any
> >> >> memory barriers in call_srcu().
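
[ A minimal sketch of the flag scheme proposed above, assuming the
  cleanup-time-only use just described; the "srcu_used" field is
  hypothetical, not actual kernel API:

	void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
		       rcu_callback_t func)
	{
		if (!READ_ONCE(ssp->srcu_used))	/* set once, on first use */
			WRITE_ONCE(ssp->srcu_used, true);
		/* ... existing call_srcu() queueing path ... */
	}

	void cleanup_srcu_struct(struct srcu_struct *ssp)
	{
		/* call_srcu() may not run concurrently with cleanup of
		 * the same domain, so a plain read suffices here and no
		 * barriers are added to the call_srcu() fast path. */
		if (READ_ONCE(ssp->srcu_used))
			srcu_barrier(ssp);	/* flush outstanding callbacks */
		/* ... existing per-CPU checks and freeing ... */
	}
]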
> >> > About the last part of your email, it seems to me that if, after call_srcu()
> >> > has returned, the module could be unloaded on some other CPU, then it would
> >> > need to see the flag stored by the preceding call_srcu(), so I believe there
> >> > would be a memory barrier between the two operations (call_srcu() and module
> >> > unload).
> >> 
> >> In order for the module unload not to race against module execution, it needs
> >> to happen after the call_srcu() in a way that is already ordered by other means,
> >> else module unload races against the module code.
> >> 
> >> > Also, about doing the unconditional srcu_barrier(): since a module could be
> >> > unloaded at any time, don't all SRCU-using modules need to invoke
> >> > srcu_barrier() during their cleanup anyway, so we are incurring the barrier
> >> > overhead anyway? Or am I missing a design pattern here? It seems to me the
> >> > rcutorture module definitely calls srcu_barrier() before it is unloaded.
> >> 
> >> I think a valid approach which is even simpler might be: if a module statically
> >> defines an SRCU domain, it should be expected to use it. So adding an
> >> srcu_barrier() to its module going notifier should not hurt. The case where a
> >> module defines a static SRCU domain *and* does not actually use it with
> >> call_srcu() seems rare, and is not worth optimizing for.
> >> 
> >> Thoughts?
> > 
> > Most SRCU users use only synchronize_srcu(), and don't ever use
> > call_srcu().  Which is not too surprising given that call_srcu() showed
> > up late in the game.
> > 
> > But something still bothers me about this, and I am not yet sure
> > what.  One thing that seems to reduce anxiety somewhat is doing the
> > srcu_barrier() on all calls to cleanup_srcu_struct() rather than just
> > those invoked from the modules infrastructure, but I don't see why at
> > the moment.
> 
> Indeed, providing similar guarantees for the dynamic allocation case
> would be nice.
> 
> The one thing that is making me anxious here is use cases where
> users would decide to chain their call_srcu() invocations. Then they would
> need as many srcu_barrier() calls as chain hops. This would be a valid
> reason for leaving invocation of srcu_barrier() to the user and
> not hiding it under the hood.
> 
> Thoughts?

The current state is not horrible, so my thought would be to give it
some time to see if better thoughts arise.  Either way,
cleanup_srcu_struct() keeps its current checks for callbacks still
being in flight, which is why I believe that the current state is not
horrible.  ;-)

							Thanx, Paul
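
------------------------------------------------------------------------

For concreteness, the implicit-barrier approach discussed above might
look something like the sketch below.  This is illustrative only, not
the actual patch: the srcu_struct_ptrs/num_srcu_structs fields are
stand-ins for however the module loader would record the contents of a
module's ___srcu_struct_ptrs section.

	#include <linux/module.h>
	#include <linux/notifier.h>
	#include <linux/srcu.h>

	static int srcu_module_notify(struct notifier_block *self,
				      unsigned long val, void *data)
	{
		struct module *mod = data;
		int i;

		/* GOING notifiers run before free_module(), so the
		 * module text is still mapped while we flush. */
		if (val == MODULE_STATE_GOING)
			for (i = 0; i < mod->num_srcu_structs; i++)
				srcu_barrier(mod->srcu_struct_ptrs[i]);
		return NOTIFY_OK;
	}

	static struct notifier_block srcu_module_nb = {
		.notifier_call = srcu_module_notify,
	};

	static int __init init_srcu_module_notifier(void)
	{
		return register_module_notifier(&srcu_module_nb);
	}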