Date: Wed, 24 Jan 2018 22:16:18 -0800
From: "Paul E. McKenney"
To: lianglihao@huawei.com
Cc: guohanjun@huawei.com, heng.z@huawei.com, hb.chen@huawei.com,
	lihao.liang@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
Reply-To: paulmck@linux.vnet.ibm.com
References: <1516694381-20333-1-git-send-email-lianglihao@huawei.com>
	<1516694381-20333-2-git-send-email-lianglihao@huawei.com>
In-Reply-To: <1516694381-20333-2-git-send-email-lianglihao@huawei.com>
Message-Id: <20180125061618.GU3741@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 23, 2018 at 03:59:26PM +0800, lianglihao@huawei.com wrote:
> From: Heng Zhang
>
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
>
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143
>
> Signed-off-by: Heng Zhang
> Signed-off-by: Lihao Liang

A few comments and questions interspersed.
							Thanx, Paul

> ---
>  include/linux/prcu.h |  37 +++++++++++++++
>  kernel/rcu/Makefile  |   2 +-
>  kernel/rcu/prcu.c    | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/core.c  |   2 +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/prcu.h
>  create mode 100644 kernel/rcu/prcu.c
>
> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> new file mode 100644
> index 00000000..653b4633
> --- /dev/null
> +++ b/include/linux/prcu.h
> @@ -0,0 +1,37 @@
> +#ifndef __LINUX_PRCU_H
> +#define __LINUX_PRCU_H
> +
> +#include
> +#include
> +#include
> +
> +#define CONFIG_PRCU
> +
> +struct prcu_local_struct {
> +	unsigned int locked;
> +	unsigned int online;
> +	unsigned long long version;
> +};
> +
> +struct prcu_struct {
> +	atomic64_t global_version;
> +	atomic_t active_ctr;
> +	struct mutex mtx;
> +	wait_queue_head_t wait_q;
> +};
> +
> +#ifdef CONFIG_PRCU
> +void prcu_read_lock(void);
> +void prcu_read_unlock(void);
> +void synchronize_prcu(void);
> +void prcu_note_context_switch(void);
> +
> +#else /* #ifdef CONFIG_PRCU */
> +
> +#define prcu_read_lock() do {} while (0)
> +#define prcu_read_unlock() do {} while (0)
> +#define synchronize_prcu() do {} while (0)
> +#define prcu_note_context_switch() do {} while (0)

If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
get a build error rather than an error-free but inoperative PRCU?

Of course, Peter's question about the purpose of the patch set applies
here as well.

> +
> +#endif /* #ifdef CONFIG_PRCU */
> +#endif /* __LINUX_PRCU_H */
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 23803c7d..8791419c 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -2,7 +2,7 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
>
> -obj-y += update.o sync.o
> +obj-y += update.o sync.o prcu.o
>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> new file mode 100644
> index 00000000..a00b9420
> --- /dev/null
> +++ b/kernel/rcu/prcu.c
> @@ -0,0 +1,125 @@
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +
> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> +
> +struct prcu_struct global_prcu = {
> +	.global_version = ATOMIC64_INIT(0),
> +	.active_ctr = ATOMIC_INIT(0),
> +	.mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> +	.wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> +};
> +struct prcu_struct *prcu = &global_prcu;
> +
> +static inline void prcu_report(struct prcu_local_struct *local)
> +{
> +	unsigned long long global_version;
> +	unsigned long long local_version;
> +
> +	global_version = atomic64_read(&prcu->global_version);
> +	local_version = local->version;
> +	if (global_version > local_version)
> +		cmpxchg(&local->version, local_version, global_version);
> +}
> +
> +void prcu_read_lock(void)
> +{
> +	struct prcu_local_struct *local;
> +
> +	local = get_cpu_ptr(&prcu_local);
> +	if (!local->online) {
> +		WRITE_ONCE(local->online, 1);
> +		smp_mb();
> +	}
> +
> +	local->locked++;
> +	put_cpu_ptr(&prcu_local);
> +}
> +EXPORT_SYMBOL(prcu_read_lock);
> +
> +void prcu_read_unlock(void)
> +{
> +	int locked;
> +	struct prcu_local_struct *local;
> +
> +	barrier();
> +	local = get_cpu_ptr(&prcu_local);
> +	locked = local->locked;
> +	if (locked) {
> +		local->locked--;
> +		if (locked == 1)
> +			prcu_report(local);

Is ordering important here? It looks to me that the compiler could
rearrange some of the accesses within prcu_report() with the
local->locked decrement.
There appears to be some potential for load and store tearing, though
perhaps you have verified that your compiler avoids this on the
architecture that you are using.

> +		put_cpu_ptr(&prcu_local);
> +	} else {

Hmmm... We get here if the RCU read-side critical section was preempted.
If none of them are preempted, ->active_ctr remains zero.

> +		put_cpu_ptr(&prcu_local);
> +		if (!atomic_dec_return(&prcu->active_ctr))
> +			wake_up(&prcu->wait_q);
> +	}
> +}
> +EXPORT_SYMBOL(prcu_read_unlock);
> +
> +static void prcu_handler(void *info)
> +{
> +	struct prcu_local_struct *local;
> +
> +	local = this_cpu_ptr(&prcu_local);
> +	if (!local->locked)
> +		WRITE_ONCE(local->version, atomic64_read(&prcu->global_version));
> +}
> +
> +void synchronize_prcu(void)
> +{
> +	int cpu;
> +	cpumask_t cpus;
> +	unsigned long long version;
> +	struct prcu_local_struct *local;
> +
> +	version = atomic64_add_return(1, &prcu->global_version);
> +	mutex_lock(&prcu->mtx);
> +
> +	local = get_cpu_ptr(&prcu_local);
> +	local->version = version;
> +	put_cpu_ptr(&prcu_local);
> +
> +	cpumask_clear(&cpus);
> +	for_each_possible_cpu(cpu) {
> +		local = per_cpu_ptr(&prcu_local, cpu);
> +		if (!READ_ONCE(local->online))
> +			continue;
> +		if (READ_ONCE(local->version) < version) {

On 32-bit systems, given that ->version is long long, you might see
load tearing. And on some 32-bit systems, the cmpxchg() in prcu_report()
might not build.

Or is the idea that only prcu_handler() updates ->version? But in that
case, you wouldn't need the READ_ONCE() above. What am I missing here?

> +			smp_call_function_single(cpu, prcu_handler, NULL, 0);
> +			cpumask_set_cpu(cpu, &cpus);
> +		}
> +	}
> +
> +	for_each_cpu(cpu, &cpus) {
> +		local = per_cpu_ptr(&prcu_local, cpu);
> +		while (READ_ONCE(local->version) < version)

This ->version read can also tear on some 32-bit systems, and this
one most definitely can race with the prcu_handler() above. Does the
algorithm operate correctly in that case?
(It doesn't look that way to me, but I might be missing something.)
Or are 32-bit systems excluded?

> +			cpu_relax();
> +	}

I might be missing something, but I believe we need a memory barrier
here on non-TSO systems. Without that, couldn't we miss a preemption?

> +
> +	if (atomic_read(&prcu->active_ctr))
> +		wait_event(prcu->wait_q, !atomic_read(&prcu->active_ctr));
> +
> +	mutex_unlock(&prcu->mtx);
> +}
> +EXPORT_SYMBOL(synchronize_prcu);
> +
> +void prcu_note_context_switch(void)
> +{
> +	struct prcu_local_struct *local;
> +
> +	local = get_cpu_ptr(&prcu_local);
> +	if (local->locked) {
> +		atomic_add(local->locked, &prcu->active_ctr);
> +		local->locked = 0;
> +	}
> +	local->online = 0;
> +	prcu_report(local);
> +	put_cpu_ptr(&prcu_local);
> +}
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 326d4f88..a308581b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -15,6 +15,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -3383,6 +3384,7 @@ static void __sched notrace __schedule(bool preempt)
>
>  	local_irq_disable();
>  	rcu_note_context_switch(preempt);
> +	prcu_note_context_switch();
>
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
> --
> 2.14.1.729.g59c0ea183
>
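
To make the 32-bit tearing concern above concrete, here is a minimal
userspace sketch, not kernel code. It assumes a 32-bit CPU that loads
an unsigned long long as two separate 32-bit words. The names split64,
torn_read() and wait_loop_exits() are hypothetical and appear nowhere
in the patch. The concurrent update is interleaved by hand rather than
by a real second CPU, and the values are chosen so that the counter
crosses a 32-bit boundary, which is where a torn load does the most
damage.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 64-bit counter as two 32-bit words, the way
 * a 32-bit CPU lacking a single 64-bit load instruction would see it.
 */
struct split64 {
	uint32_t lo;
	uint32_t hi;
};

/* Store a 64-bit value as two independent 32-bit words. */
static void split64_store(struct split64 *s, uint64_t v)
{
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
}

/* Reassemble a 64-bit value from the two words a reader observed. */
static uint64_t split64_value(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Simulate a torn load: the reader fetches the low word, a writer then
 * stores new_val in full, and only afterwards does the reader fetch
 * the high word.  The result mixes halves of two different values.
 */
static uint64_t torn_read(struct split64 *s, uint64_t new_val)
{
	uint32_t lo = s->lo;		/* low half of the old value */
	split64_store(s, new_val);	/* concurrent update lands here */
	uint32_t hi = s->hi;		/* high half of the new value */

	return split64_value(lo, hi);
}

/* Models the "while (version_seen < version)" wait: nonzero means exit. */
static int wait_loop_exits(uint64_t observed, uint64_t target)
{
	return !(observed < target);
}
```

With the counter going from 0x00000000ffffffff to 0x0000000100000000,
the torn load returns 0x00000001ffffffff, a value the counter never
held. A "< version" check then passes for any target up to that value,
even though the CPU being waited on has not reached it.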