From: Lihao Liang
Date: Sat, 27 Jan 2018 07:35:20 +0000
Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
To: Paul McKenney, Peter Zijlstra
Cc: "Guohanjun (Hanjun Guo)", heng.z@huawei.com, hb.chen@huawei.com, linux-kernel@vger.kernel.org
In-Reply-To: <20180125061618.GU3741@linux.vnet.ibm.com>
References: <1516694381-20333-1-git-send-email-lianglihao@huawei.com>
        <1516694381-20333-2-git-send-email-lianglihao@huawei.com>
        <20180125061618.GU3741@linux.vnet.ibm.com>

On Thu, Jan 25, 2018 at 6:16 AM, Paul E. McKenney wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, lianglihao@huawei.com wrote:
>> From: Heng Zhang
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>>
>> Signed-off-by: Heng Zhang
>> Signed-off-by: Lihao Liang
>
> A few comments and questions interspersed.
>
>                                                        Thanx, Paul
>
>> ---
>>  include/linux/prcu.h |  37 +++++++++++++++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c    | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)
>>  create mode 100644 include/linux/prcu.h
>>  create mode 100644 kernel/rcu/prcu.c
>>
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
>> new file mode 100644
>> index 00000000..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include
>> +#include
>> +#include
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> +        unsigned int locked;
>> +        unsigned int online;
>> +        unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> +        atomic64_t global_version;
>> +        atomic_t active_ctr;
>> +        struct mutex mtx;
>> +        wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0)
>> +#define prcu_read_unlock() do {} while (0)
>> +#define synchronize_prcu() do {} while (0)
>> +#define prcu_note_context_switch() do {} while (0)
>
> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
> get a build error rather than an error-free but inoperative PRCU?
>

Very good point, thank you!
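One possible alternative is sketched below.  This is only an illustration on
my side, not what the patch currently does; it assumes it is acceptable for
any user of the PRCU API to fail the build when CONFIG_PRCU=n, and it relies
on BUILD_BUG() from <linux/build_bug.h>, which only fires if the stub is
actually used somewhere:

#else /* #ifdef CONFIG_PRCU */

/* Sketch: using PRCU without CONFIG_PRCU now fails at compile time. */
static inline void prcu_read_lock(void)   { BUILD_BUG(); }
static inline void prcu_read_unlock(void) { BUILD_BUG(); }
static inline void synchronize_prcu(void) { BUILD_BUG(); }
/* __schedule() calls this unconditionally in this series, so keep it a no-op. */
static inline void prcu_note_context_switch(void) { }

#endif /* #ifdef CONFIG_PRCU */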
> Of course, Peter's question about purpose of the patch set applies
> here as well.
>

The main motivation of this patch set is the comparison results of rcuperf
between PRCU and Tree RCU, in which PRCU outperformed Tree RCU by a large
margin.  As you indicated in your reply to the cover letter of this series,
[PATCH RFC 00/16] A new RCU implementation based on a fast consensus protocol,
this may be a bug either in expedited RCU grace-period latency or in rcuperf's
measurements.

Many thanks,
Lihao.

>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
>> index 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>>
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
>> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
>> new file mode 100644
>> index 00000000..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +#include
>> +
>> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>> +
>> +struct prcu_struct global_prcu = {
>> +        .global_version = ATOMIC64_INIT(0),
>> +        .active_ctr = ATOMIC_INIT(0),
>> +        .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
>> +        .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
>> +};
>> +struct prcu_struct *prcu = &global_prcu;
>> +
>> +static inline void prcu_report(struct prcu_local_struct *local)
>> +{
>> +        unsigned long long global_version;
>> +        unsigned long long local_version;
>> +
>> +        global_version = atomic64_read(&prcu->global_version);
>> +        local_version = local->version;
>> +        if (global_version > local_version)
>> +                cmpxchg(&local->version, local_version, global_version);
>> +}
>> +
>> +void prcu_read_lock(void)
>> +{
>> +        struct prcu_local_struct *local;
>> +
>> +        local = get_cpu_ptr(&prcu_local);
>> +        if (!local->online) {
>> +                WRITE_ONCE(local->online, 1);
>> +                smp_mb();
>> +        }
>> +
>> +        local->locked++;
>> +        put_cpu_ptr(&prcu_local);
>> +}
>> +EXPORT_SYMBOL(prcu_read_lock);
>> +
>> +void prcu_read_unlock(void)
>> +{
>> +        int locked;
>> +        struct prcu_local_struct *local;
>> +
>> +        barrier();
>> +        local = get_cpu_ptr(&prcu_local);
>> +        locked = local->locked;
>> +        if (locked) {
>> +                local->locked--;
>> +                if (locked == 1)
>> +                        prcu_report(local);
>
> Is ordering important here?  It looks to me that the compiler could
> rearrange some of the accesses within prcu_report() with the local->locked
> decrement.  There appears to be some potential for load and store tearing,
> though perhaps you have verified that your compiler avoids this on
> the architecture that you are using.
>
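Just to illustrate the kind of marking I understand this to suggest (a sketch
only, not part of the posted patch): marking the reader-side accesses limits
what the compiler may tear or reorder among them, although any CPU-level
ordering would still need separate barriers:

        locked = READ_ONCE(local->locked);
        if (locked) {
                /* Marked store: the compiler may not tear or re-issue the decrement. */
                WRITE_ONCE(local->locked, locked - 1);
                if (locked == 1)
                        prcu_report(local);
                put_cpu_ptr(&prcu_local);
        }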
>> +                put_cpu_ptr(&prcu_local);
>> +        } else {
>
> Hmmm...  We get here if the RCU read-side critical section was preempted.
> If none of them are preempted, ->active_ctr remains zero.
>
>> +                put_cpu_ptr(&prcu_local);
>> +                if (!atomic_dec_return(&prcu->active_ctr))
>> +                        wake_up(&prcu->wait_q);
>> +        }
>> +}
>> +EXPORT_SYMBOL(prcu_read_unlock);
>> +
>> +static void prcu_handler(void *info)
>> +{
>> +        struct prcu_local_struct *local;
>> +
>> +        local = this_cpu_ptr(&prcu_local);
>> +        if (!local->locked)
>> +                WRITE_ONCE(local->version, atomic64_read(&prcu->global_version));
>> +}
>> +
>> +void synchronize_prcu(void)
>> +{
>> +        int cpu;
>> +        cpumask_t cpus;
>> +        unsigned long long version;
>> +        struct prcu_local_struct *local;
>> +
>> +        version = atomic64_add_return(1, &prcu->global_version);
>> +        mutex_lock(&prcu->mtx);
>> +
>> +        local = get_cpu_ptr(&prcu_local);
>> +        local->version = version;
>> +        put_cpu_ptr(&prcu_local);
>> +
>> +        cpumask_clear(&cpus);
>> +        for_each_possible_cpu(cpu) {
>> +                local = per_cpu_ptr(&prcu_local, cpu);
>> +                if (!READ_ONCE(local->online))
>> +                        continue;
>> +                if (READ_ONCE(local->version) < version) {
>
> On 32-bit systems, given that ->version is long long, you might see
> load tearing.  And on some 32-bit systems, the cmpxchg() in prcu_handler()
> might not build.
>
> Or is the idea that only prcu_handler() updates ->version?  But in that
> case, you wouldn't need the READ_ONCE() above.  What am I missing here?
>
>> +                        smp_call_function_single(cpu, prcu_handler, NULL, 0);
>> +                        cpumask_set_cpu(cpu, &cpus);
>> +                }
>> +        }
>> +
>> +        for_each_cpu(cpu, &cpus) {
>> +                local = per_cpu_ptr(&prcu_local, cpu);
>> +                while (READ_ONCE(local->version) < version)
>
> This ->version read can also tear on some 32-bit systems, and this
> one most definitely can race with the prcu_handler() above.  Does the
> algorithm operate correctly in that case?  (It doesn't look that way
> to me, but I might be missing something.)  Or are 32-bit systems excluded?
>
>> +                        cpu_relax();
>> +        }
>
> I might be missing something, but I believe we need a memory barrier
> here on non-TSO systems.  Without that, couldn't we miss a preemption?
>

(A possible placement for such a barrier is sketched at the end of this mail.)

>> +
>> +        if (atomic_read(&prcu->active_ctr))
>> +                wait_event(prcu->wait_q, !atomic_read(&prcu->active_ctr));
>> +
>> +        mutex_unlock(&prcu->mtx);
>> +}
>> +EXPORT_SYMBOL(synchronize_prcu);
>> +
>> +void prcu_note_context_switch(void)
>> +{
>> +        struct prcu_local_struct *local;
>> +
>> +        local = get_cpu_ptr(&prcu_local);
>> +        if (local->locked) {
>> +                atomic_add(local->locked, &prcu->active_ctr);
>> +                local->locked = 0;
>> +        }
>> +        local->online = 0;
>> +        prcu_report(local);
>> +        put_cpu_ptr(&prcu_local);
>> +}
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 326d4f88..a308581b 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -15,6 +15,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #include
>>  #include
>> @@ -3383,6 +3384,7 @@ static void __sched notrace __schedule(bool preempt)
>>
>>          local_irq_disable();
>>          rcu_note_context_switch(preempt);
>> +        prcu_note_context_switch();
>>
>>          /*
>>           * Make sure that signal_pending_state()->signal_pending() below
>> --
>> 2.14.1.729.g59c0ea183
>>
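P.S.  On the memory-barrier question in synchronize_prcu() above: the
following is only my reading of where such a barrier would go (an assumption
on my part, to be confirmed), namely a full barrier between the polling loop
and the subsequent ->active_ctr check, so that the check cannot be ordered
before the loop completes:

        for_each_cpu(cpu, &cpus) {
                local = per_cpu_ptr(&prcu_local, cpu);
                while (READ_ONCE(local->version) < version)
                        cpu_relax();
        }

        /* Order the polling loop above against the ->active_ctr check below. */
        smp_mb();

        if (atomic_read(&prcu->active_ctr))
                wait_event(prcu->wait_q, !atomic_read(&prcu->active_ctr));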