Date: Thu, 25 Jan 2018 15:30:33 +0800
From: Boqun Feng
To: "Paul E.
McKenney"
Cc: lianglihao@huawei.com, guohanjun@huawei.com, heng.z@huawei.com,
 hb.chen@huawei.com, lihao.liang@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
Message-ID: <20180125073033.4rl7bun62newplb3@tardis>
References: <1516694381-20333-1-git-send-email-lianglihao@huawei.com>
 <1516694381-20333-2-git-send-email-lianglihao@huawei.com>
 <20180125061618.GU3741@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="vaizlpdknh5hghcj"
Content-Disposition: inline
In-Reply-To: <20180125061618.GU3741@linux.vnet.ibm.com>
User-Agent: NeoMutt/20171215
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--vaizlpdknh5hghcj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jan 24, 2018 at 10:16:18PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, lianglihao@huawei.com wrote:
> > From: Heng Zhang
> >
> > This RCU implementation (PRCU) is based on a fast consensus protocol
> > published in the following paper:
> >
> > Fast Consensus Using Bounded Staleness for Scalable Read-mostly Synchronization.
> > Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> > IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> > https://dl.acm.org/citation.cfm?id=3024114.3024143
> >
> > Signed-off-by: Heng Zhang
> > Signed-off-by: Lihao Liang
>
> A few comments and questions interspersed.
>
> 							Thanx, Paul
>
> > ---
> >  include/linux/prcu.h |  37 +++++++++++++++
> >  kernel/rcu/Makefile  |   2 +-
> >  kernel/rcu/prcu.c    | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/sched/core.c  |   2 +
> >  4 files changed, 165 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/prcu.h
> >  create mode 100644 kernel/rcu/prcu.c
> >
> > diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> > new file mode 100644
> > index 00000000..653b4633
> > --- /dev/null
> > +++ b/include/linux/prcu.h
> > @@ -0,0 +1,37 @@
> > +#ifndef __LINUX_PRCU_H
> > +#define __LINUX_PRCU_H
> > +
> > +#include
> > +#include
> > +#include
> > +
> > +#define CONFIG_PRCU
> > +
> > +struct prcu_local_struct {
> > +	unsigned int locked;
> > +	unsigned int online;
> > +	unsigned long long version;
> > +};
> > +
> > +struct prcu_struct {
> > +	atomic64_t global_version;
> > +	atomic_t active_ctr;
> > +	struct mutex mtx;
> > +	wait_queue_head_t wait_q;
> > +};
> > +
> > +#ifdef CONFIG_PRCU
> > +void prcu_read_lock(void);
> > +void prcu_read_unlock(void);
> > +void synchronize_prcu(void);
> > +void prcu_note_context_switch(void);
> > +
> > +#else /* #ifdef CONFIG_PRCU */
> > +
> > +#define prcu_read_lock() do {} while (0)
> > +#define prcu_read_unlock() do {} while (0)
> > +#define synchronize_prcu() do {} while (0)
> > +#define prcu_note_context_switch() do {} while (0)
>
> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
> get a build error rather than an error-free but inoperative PRCU?
>
> Of course, Peter's question about purpose of the patch set applies
> here as well.
>
> > +
> > +#endif /* #ifdef CONFIG_PRCU */
> > +#endif /* __LINUX_PRCU_H */
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 23803c7d..8791419c 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -2,7 +2,7 @@
> >  # and is generally not a function of system call inputs.
> >  KCOV_INSTRUMENT := n
> >
> > -obj-y += update.o sync.o
> > +obj-y += update.o sync.o prcu.o
> >  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
> >  obj-$(CONFIG_TREE_SRCU) += srcutree.o
> >  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> > diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> > new file mode 100644
> > index 00000000..a00b9420
> > --- /dev/null
> > +++ b/kernel/rcu/prcu.c
> > @@ -0,0 +1,125 @@
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#include
> > +
> > +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> > +
> > +struct prcu_struct global_prcu = {
> > +	.global_version = ATOMIC64_INIT(0),
> > +	.active_ctr = ATOMIC_INIT(0),
> > +	.mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> > +	.wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> > +};
> > +struct prcu_struct *prcu = &global_prcu;
> > +
> > +static inline void prcu_report(struct prcu_local_struct *local)
> > +{
> > +	unsigned long long global_version;
> > +	unsigned long long local_version;
> > +
> > +	global_version = atomic64_read(&prcu->global_version);
> > +	local_version = local->version;
> > +	if (global_version > local_version)
> > +		cmpxchg(&local->version, local_version, global_version);
> > +}
> > +
> > +void prcu_read_lock(void)
> > +{
> > +	struct prcu_local_struct *local;
> > +
> > +	local = get_cpu_ptr(&prcu_local);
> > +	if (!local->online) {
> > +		WRITE_ONCE(local->online, 1);
> > +		smp_mb();
> > +	}
> > +
> > +	local->locked++;
> > +	put_cpu_ptr(&prcu_local);
> > +}
> > +EXPORT_SYMBOL(prcu_read_lock);
> > +
> > +void prcu_read_unlock(void)
> > +{
> > +	int locked;
> > +	struct prcu_local_struct *local;
> > +
> > +	barrier();
> > +	local = get_cpu_ptr(&prcu_local);
> > +	locked = local->locked;
> > +	if (locked) {
> > +		local->locked--;
> > +		if (locked == 1)
> > +			prcu_report(local);
>
> Is ordering important here?
> It looks to me that the compiler could
> rearrange some of the accesses within prcu_report() with the local->locked
> decrement.  There appears to be some potential for load and store tearing,
> though perhaps you have verified that your compiler avoids this on
> the architecture that you are using.
>
> > +		put_cpu_ptr(&prcu_local);
> > +	} else {
>
> Hmmm...  We get here if the RCU read-side critical section was preempted.
> If none of them are preempted, ->active_ctr remains zero.
>
> > +		put_cpu_ptr(&prcu_local);
> > +		if (!atomic_dec_return(&prcu->active_ctr))
> > +			wake_up(&prcu->wait_q);
> > +	}
> > +}
> > +EXPORT_SYMBOL(prcu_read_unlock);
> > +
> > +static void prcu_handler(void *info)
> > +{
> > +	struct prcu_local_struct *local;
> > +
> > +	local = this_cpu_ptr(&prcu_local);
> > +	if (!local->locked)

And I think an smp_mb() is needed here, because in the following case:

	CPU 0				CPU 1
	==================		==========================
	{X is initially 0}

	WRITE_ONCE(X, 1);
					prcu_read_unlock(void):
					if (locked) {
	synchronize_prcu(void):
	  ...
					  local->locked--;
					  # switch to IPI
					  WRITE_ONCE(local->version,....)
					  r1 = READ_ONCE(X);

r1 could be 0, which breaks RCU guarantees.
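
To make the suggested placement concrete, here is a minimal user-space
sketch. It assumes C11 atomics as a stand-in for the kernel primitives:
atomic_thread_fence(memory_order_seq_cst) in place of smp_mb(), and relaxed
loads and stores in place of READ_ONCE()/WRITE_ONCE(). The names
reader_state, global_version_sketch and handler_sketch() are made up for
illustration and are not part of the patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space analogue of struct prcu_local_struct. */
struct reader_state {
	atomic_uint locked;		/* read-side nesting depth */
	atomic_uint_fast64_t version;	/* last version this reader reported */
};

/* Hypothetical analogue of prcu->global_version. */
static atomic_uint_fast64_t global_version_sketch;

/*
 * Analogue of prcu_handler() with the extra full barrier. The fence sits
 * between the ->locked check and the ->version store, so accesses made
 * before the check cannot be reordered after the store that tells the
 * updater this CPU has passed through a quiescent state.
 */
static void handler_sketch(struct reader_state *local)
{
	uint_fast64_t gv;

	/* Still inside a read-side critical section: report nothing. */
	if (atomic_load_explicit(&local->locked, memory_order_relaxed))
		return;

	/* Assumed placement of the smp_mb() discussed above. */
	atomic_thread_fence(memory_order_seq_cst);

	gv = atomic_load_explicit(&global_version_sketch, memory_order_relaxed);
	atomic_store_explicit(&local->version, gv, memory_order_relaxed);
}
```

This only shows where the barrier would sit. It says nothing about whether
smp_mb() is the right primitive for the real code, and running it on one
thread exercises the control flow but not the ordering.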
> > +		WRITE_ONCE(local->version, atomic64_read(&prcu->global_version));
> > +}
> > +
> > +void synchronize_prcu(void)
> > +{
> > +	int cpu;
> > +	cpumask_t cpus;
> > +	unsigned long long version;
> > +	struct prcu_local_struct *local;
> > +
> > +	version = atomic64_add_return(1, &prcu->global_version);
> > +	mutex_lock(&prcu->mtx);
> > +
> > +	local = get_cpu_ptr(&prcu_local);
> > +	local->version = version;
> > +	put_cpu_ptr(&prcu_local);
> > +
> > +	cpumask_clear(&cpus);
> > +	for_each_possible_cpu(cpu) {
> > +		local = per_cpu_ptr(&prcu_local, cpu);
> > +		if (!READ_ONCE(local->online))
> > +			continue;
> > +		if (READ_ONCE(local->version) < version) {
>
> On 32-bit systems, given that ->version is long long, you might see
> load tearing.  And on some 32-bit systems, the cmpxchg() in prcu_handler()
> might not build.
>

/me curious about why an atomic64_t is used here for the global version. I
think a 32-bit global version may still suffice.

Regards,
Boqun

> Or is the idea that only prcu_handler() updates ->version?  But in that
> case, you wouldn't need the READ_ONCE() above.  What am I missing here?
>
> > +			smp_call_function_single(cpu, prcu_handler, NULL, 0);
> > +			cpumask_set_cpu(cpu, &cpus);
> > +		}
> > +	}
> > +
> > +	for_each_cpu(cpu, &cpus) {
> > +		local = per_cpu_ptr(&prcu_local, cpu);
> > +		while (READ_ONCE(local->version) < version)
>
> This ->version read can also tear on some 32-bit systems, and this
> one most definitely can race with the prcu_handler() above.  Does the
> algorithm operate correctly in that case?  (It doesn't look that way
> to me, but I might be missing something.) Or are 32-bit systems excluded?
>
> > +			cpu_relax();
> > +	}
>
> I might be missing something, but I believe we need a memory barrier
> here on non-TSO systems.  Without that, couldn't we miss a preemption?
>
> > +
> > +	if (atomic_read(&prcu->active_ctr))
> > +		wait_event(prcu->wait_q, !atomic_read(&prcu->active_ctr));
> > +
> > +	mutex_unlock(&prcu->mtx);
> > +}
> > +EXPORT_SYMBOL(synchronize_prcu);
> > +
> > +void prcu_note_context_switch(void)
> > +{
> > +	struct prcu_local_struct *local;
> > +
> > +	local = get_cpu_ptr(&prcu_local);
> > +	if (local->locked) {
> > +		atomic_add(local->locked, &prcu->active_ctr);
> > +		local->locked = 0;
> > +	}
> > +	local->online = 0;
> > +	prcu_report(local);
> > +	put_cpu_ptr(&prcu_local);
> > +}
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 326d4f88..a308581b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -15,6 +15,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> > @@ -3383,6 +3384,7 @@ static void __sched notrace __schedule(bool preempt)
> >
> >  	local_irq_disable();
> >  	rcu_note_context_switch(preempt);
> > +	prcu_note_context_switch();
> >
> >  	/*
> >  	 * Make sure that signal_pending_state()->signal_pending() below
> > -- 
> > 2.14.1.729.g59c0ea183
> >
>

--vaizlpdknh5hghcj
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAlpph4YACgkQSXnow7UH
+rgLNggApGRdEu7tYB1wsnbaCWba77pJfR+iW6g2j1W1bKvW/PGJ/2r5R67c9zTG
fWmYOr5wLEFc2rQODWaANkyN/NVuXrZ5JwEzh28unx8Rpko3p4bp2KJ7J7Kmfe/K
+RmxW8CvdaxhuQKI+JhWV8u9rc0bjJz5CIASEWQwNphd/jP0cEAQsfy8boEgIlcC
xg5VGbPla06E79TexncB0fmqIYJJtTZQpZ5Gd5vj/ZruzJH4kIcwC/mWxf1TtdgB
oDuoSA9veOSDDKD390GzMJXbsF8M+8aZODkNolImjwCU1rro7mN6ZCBlyYsFYaU4
5Gm/imgWtTpViUf03E0e//f6Yxb9dQ==
=DrHH
-----END PGP SIGNATURE-----

--vaizlpdknh5hghcj--