In-Reply-To: <1516694381-20333-2-git-send-email-lianglihao@huawei.com>
References: <1516694381-20333-1-git-send-email-lianglihao@huawei.com> <1516694381-20333-2-git-send-email-lianglihao@huawei.com>
From: Lai Jiangshan
Date: Mon, 29 Jan 2018 17:10:49 +0800
Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
To: lianglihao@huawei.com
Cc: "Paul E. McKenney", guohanjun@huawei.com, heng.z@huawei.com, hb.chen@huawei.com, lihao.liang@gmail.com, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 23, 2018 at 3:59 PM, wrote:
> From: Heng Zhang
>
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
>
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143
>
> Signed-off-by: Heng Zhang
> Signed-off-by: Lihao Liang
> ---
>  include/linux/prcu.h |  37 +++++++++++++++
>  kernel/rcu/Makefile  |   2 +-
>  kernel/rcu/prcu.c    | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/core.c  |   2 +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/prcu.h
>  create mode 100644 kernel/rcu/prcu.c
>
> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> new file mode 100644
> index 00000000..653b4633
> --- /dev/null
> +++ b/include/linux/prcu.h
> @@ -0,0 +1,37 @@
> +#ifndef __LINUX_PRCU_H
> +#define __LINUX_PRCU_H
> +
> +#include
> +#include
> +#include
> +
> +#define CONFIG_PRCU
> +
> +struct prcu_local_struct {
> +	unsigned int locked;
> +	unsigned int online;
> +	unsigned long long version;
> +};
> +
> +struct prcu_struct {
> +	atomic64_t global_version;
> +	atomic_t active_ctr;
> +	struct mutex mtx;
> +	wait_queue_head_t wait_q;
> +};
> +
> +#ifdef CONFIG_PRCU
> +void prcu_read_lock(void);
> +void prcu_read_unlock(void);
> +void synchronize_prcu(void);
> +void prcu_note_context_switch(void);
> +
> +#else /* #ifdef CONFIG_PRCU */
> +
> +#define prcu_read_lock() do {} while (0)
> +#define prcu_read_unlock() do {} while (0)
> +#define synchronize_prcu() do {} while (0)
> +#define prcu_note_context_switch() do {} while (0)
> +
> +#endif /* #ifdef CONFIG_PRCU */
> +#endif /* __LINUX_PRCU_H */
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 23803c7d..8791419c 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -2,7 +2,7 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
>
> -obj-y += update.o sync.o
> +obj-y += update.o sync.o prcu.o
>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> new file mode 100644
> index 00000000..a00b9420
> --- /dev/null
> +++ b/kernel/rcu/prcu.c
> @@ -0,0 +1,125 @@
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +
> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> +
> +struct prcu_struct global_prcu = {
> +	.global_version = ATOMIC64_INIT(0),
> +	.active_ctr = ATOMIC_INIT(0),
> +	.mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> +	.wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> +};
> +struct prcu_struct *prcu = &global_prcu;
> +
> +static inline void prcu_report(struct prcu_local_struct *local)
> +{
> +	unsigned long long global_version;
> +	unsigned long long local_version;
> +
> +	global_version = atomic64_read(&prcu->global_version);
> +	local_version = local->version;
> +	if (global_version > local_version)
> +		cmpxchg(&local->version, local_version, global_version);

This is called with IRQs disabled, and local->version can't be modified
on another CPU. Why is cmpxchg() needed?

> +}
> +
> +void prcu_read_lock(void)
> +{
> +	struct prcu_local_struct *local;
> +
> +	local = get_cpu_ptr(&prcu_local);
> +	if (!local->online) {
> +		WRITE_ONCE(local->online, 1);
> +		smp_mb();

What is the paired code?

>	}
> +
> +	local->locked++;
> +	put_cpu_ptr(&prcu_local);
> +}
> +EXPORT_SYMBOL(prcu_read_lock);
> +
> +void prcu_read_unlock(void)
> +{
> +	int locked;
> +	struct prcu_local_struct *local;
> +
> +	barrier();
> +	local = get_cpu_ptr(&prcu_local);
> +	locked = local->locked;
> +	if (locked) {
> +		local->locked--;
> +		if (locked == 1)
> +			prcu_report(local);
> +		put_cpu_ptr(&prcu_local);
> +	} else {
> +		put_cpu_ptr(&prcu_local);
> +		if (!atomic_dec_return(&prcu->active_ctr))
> +			wake_up(&prcu->wait_q);
> +	}
> +}
> +EXPORT_SYMBOL(prcu_read_unlock);
> +
> +static void prcu_handler(void *info)
> +{
> +	struct prcu_local_struct *local;
> +
> +	local = this_cpu_ptr(&prcu_local);
> +	if (!local->locked)
> +		WRITE_ONCE(local->version, atomic64_read(&prcu->global_version));
> +}
> +
> +void synchronize_prcu(void)
> +{
> +	int cpu;
> +	cpumask_t cpus;

This might overflow the stack if the cpumask is large; please move it
to struct prcu.

> +	unsigned long long version;
> +	struct prcu_local_struct *local;
> +
> +	version = atomic64_add_return(1, &prcu->global_version);

I think this line causes at least the following problem.

> +	mutex_lock(&prcu->mtx);
> +
> +	local = get_cpu_ptr(&prcu_local);
> +	local->version = version;

The order in which callers acquire the mutex might not match the order
of their atomic64_add_return() calls. In that case, local->version will
decrease.

prcu_report() can also run here, so it is unclear whose update to
local->version will win.

> +	put_cpu_ptr(&prcu_local);
> +
> +	cpumask_clear(&cpus);
> +	for_each_possible_cpu(cpu) {
> +		local = per_cpu_ptr(&prcu_local, cpu);
> +		if (!READ_ONCE(local->online))
> +			continue;

Reading local->online here seems unreliable.

> +		if (READ_ONCE(local->version) < version) {

Please handle the case where version wraps around its maximum.

> +			smp_call_function_single(cpu, prcu_handler, NULL, 0);

This smells bad inside a for_each_possible_cpu() loop.

> +			cpumask_set_cpu(cpu, &cpus);
> +		}
> +	}
> +
> +	for_each_cpu(cpu, &cpus) {
> +		local = per_cpu_ptr(&prcu_local, cpu);
> +		while (READ_ONCE(local->version) < version)
> +			cpu_relax();
> +	}
>

Ouch, the cpu_relax() loop could take a long time, since it waits until
all the relevant CPUs (those with active PRCU readers) have scheduled.
So this block of code is equivalent to synchronize_sched() in many cases
when PRCU is heavily used, isn't it?

	smp_mb() /* A paired with B */

> +	if (atomic_read(&prcu->active_ctr))
> +		wait_event(prcu->wait_q, !atomic_read(&prcu->active_ctr));
> +
> +	mutex_unlock(&prcu->mtx);
> +}
> +EXPORT_SYMBOL(synchronize_prcu);
> +
> +void prcu_note_context_switch(void)
> +{
> +	struct prcu_local_struct *local;
> +
> +	local = get_cpu_ptr(&prcu_local);
> +	if (local->locked) {
> +		atomic_add(local->locked, &prcu->active_ctr);

	smp_mb() /* B paired with A */

> +		local->locked = 0;
> +	}
> +	local->online = 0;
> +	prcu_report(local);
> +	put_cpu_ptr(&prcu_local);
> +}
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 326d4f88..a308581b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -15,6 +15,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -3383,6 +3384,7 @@ static void __sched notrace __schedule(bool preempt)
>
>  	local_irq_disable();
>  	rcu_note_context_switch(preempt);
> +	prcu_note_context_switch();
>
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
> --
> 2.14.1.729.g59c0ea183
>
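
On the wrap-around comment in synchronize_prcu(): a minimal sketch of one
possible wrap-safe comparison, assuming a 64-bit version counter and a
hypothetical helper prcu_version_before() that is not part of the patch.
It uses the same idea as the ULONG_CMP_LT() macro in the RCU code:
subtract as unsigned, then check which half of the range the difference
lands in. It is only meaningful while the two versions are less than
2^63 apart.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not in the patch).
 * Returns true if version a is older than version b, treating the
 * counter as circular. Unsigned subtraction wraps modulo 2^64, so a
 * difference in the upper half of the range means a is behind b.
 */
static inline bool prcu_version_before(uint64_t a, uint64_t b)
{
	return (a - b) > UINT64_MAX / 2;
}
```

With such a helper, the plain "local->version < version" tests could
become prcu_version_before(local->version, version), which still gives
the intended answer after global_version wraps past its maximum.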