Date: Mon, 3 Jun 2019 10:46:40 +0800
From: Herbert Xu
To: Linus Torvalds
Cc: "Paul E. McKenney", Frederic Weisbecker, Boqun Feng,
    Fengguang Wu, LKP, LKML, Netdev, "David S. Miller"
Miller" Subject: Re: rcu_read_lock lost its compiler barrier Message-ID: <20190603024640.2soysu4rpkwjuash@gondor.apana.org.au> References: <20150910005708.GA23369@wfg-t540p.sh.intel.com> <20150910102513.GA1677@fixme-laptop.cn.ibm.com> <20150910171649.GE4029@linux.vnet.ibm.com> <20150911021933.GA1521@fixme-laptop.cn.ibm.com> <20150921193045.GA13674@lerouge> <20150921204327.GH4029@linux.vnet.ibm.com> <20190602055607.bk5vgmwjvvt4wejd@gondor.apana.org.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 02, 2019 at 01:54:12PM -0700, Linus Torvalds wrote: > On Sat, Jun 1, 2019 at 10:56 PM Herbert Xu wrote: > > > > You can't then go and decide to remove the compiler barrier! To do > > that you'd need to audit every single use of rcu_read_lock in the > > kernel to ensure that they're not depending on the compiler barrier. > > What's the possible case where it would matter when there is no preemption? The case we were discussing is from net/ipv4/inet_fragment.c from the net-next tree: void fqdir_exit(struct fqdir *fqdir) { ... fqdir->dead = true; /* call_rcu is supposed to provide memory barrier semantics, * separating the setting of fqdir->dead with the destruction * work. This implicit barrier is paired with inet_frag_kill(). */ INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn); queue_rcu_work(system_wq, &fqdir->destroy_rwork); } and void inet_frag_kill(struct inet_frag_queue *fq) { ... rcu_read_lock(); /* The RCU read lock provides a memory barrier * guaranteeing that if fqdir->dead is false then * the hash table destruction will not start until * after we unlock. Paired with inet_frags_exit_net(). */ if (!fqdir->dead) { rhashtable_remove_fast(&fqdir->rhashtable, &fq->node, fqdir->f->rhash_params); ... } ... rcu_read_unlock(); ... } I simplified this to Initial values: a = 0 b = 0 CPU1 CPU2 ---- ---- a = 1 rcu_read_lock synchronize_rcu if (a == 0) b = 2 b = 1 rcu_read_unlock On exit we want this to be true: b == 2 Now what Paul was telling me is that unless every memory operation is done with READ_ONCE/WRITE_ONCE then his memory model shows that the exit constraint won't hold. IOW, we need CPU1 CPU2 ---- ---- WRITE_ONCE(a, 1) rcu_read_lock synchronize_rcu if (READ_ONCE(a) == 0) WRITE_ONCE(b, 2) WRITE_ONCE(b, 1) rcu_read_unlock Now I think this bullshit because if we really needed these compiler barriers then we surely would need real memory barriers to go with them. In fact, the sole purpose of the RCU mechanism is to provide those memory barriers. Quoting from Documentation/RCU/Design/Requirements/Requirements.html:
Now I think this is bullshit, because if we really needed these
compiler barriers then we would surely need real memory barriers
to go with them.  In fact, the sole purpose of the RCU mechanism
is to provide those memory barriers.  Quoting from
Documentation/RCU/Design/Requirements/Requirements.html:

  • Each CPU that has an RCU read-side critical section that begins
    before synchronize_rcu() starts is guaranteed to execute a full
    memory barrier between the time that the RCU read-side critical
    section ends and the time that synchronize_rcu() returns.
    Without this guarantee, a pre-existing RCU read-side critical
    section might hold a reference to the newly removed struct foo
    after the kfree() on line 14 of remove_gp_synchronous().

  • Each CPU that has an RCU read-side critical section that ends
    after synchronize_rcu() returns is guaranteed to execute a full
    memory barrier between the time that synchronize_rcu() begins
    and the time that the RCU read-side critical section begins.
    Without this guarantee, a later RCU read-side critical section
    running after the kfree() on line 14 of remove_gp_synchronous()
    might later run do_something_gp() and find the newly deleted
    struct foo.

My review of the RCU code shows that these memory barriers are
indeed present (at least when we're not in tiny mode, where all of
this discussion would be moot anyway).  For example, in call_rcu
we eventually get down to rcu_segcblist_enqueue, which has an
smp_mb.  On the reader side (correct me if I'm wrong, Paul) the
memory barrier is implicitly provided by the scheduler.

My point is that within our kernel, whenever we have a CPU memory
barrier we always have a compiler barrier too.  Therefore my code
example above does not need any extra compiler barriers such as
the ones provided by READ_ONCE/WRITE_ONCE.
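To illustrate, here is roughly what the primitives look like on
x86-64 (a simplified sketch, paraphrased from memory of
include/linux/compiler.h and arch/x86/include/asm/barrier.h; the
exact definitions vary by architecture and kernel version):

/* Pure compiler barrier: the "memory" clobber tells the compiler
 * that memory may change across this point, so it must not cache
 * memory values in registers across it.  No instruction is emitted.
 */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Full CPU memory barrier: a locked read-modify-write on the stack.
 * Note that it carries the same "memory" clobber, so every smp_mb()
 * is automatically a compiler barrier as well.
 */
#define __smp_mb() asm volatile("lock; addl $0,-4(%%rsp)" \
				: : : "memory", "cc")

The "memory" clobber in the smp_mb() asm is exactly what barrier()
provides, which is why a real memory barrier in the kernel always
implies a compiler barrier.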
I think Paul was perhaps thinking that I expect
rcu_read_lock/rcu_read_unlock themselves to provide the memory or
compiler barriers.  That would indeed be wrong, but it is not what
I need.  All I need is the RCU semantics as documented: memory and
compiler barriers around the whole grace period.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt