Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756166AbdDMJyd (ORCPT ); Thu, 13 Apr 2017 05:54:33 -0400
Received: from merlin.infradead.org ([205.233.59.134]:49502 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750811AbdDMJyb (ORCPT );
	Thu, 13 Apr 2017 05:54:31 -0400
Date: Thu, 13 Apr 2017 11:54:20 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com,
	bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 40/40] srcu: Parallelize callback handling
Message-ID: <20170413095420.a2p2ygddz26gaugw@hirez.programming.kicks-ass.net>
References: <20170412174003.GA23207@linux.vnet.ibm.com>
	<1492018825-25634-40-git-send-email-paulmck@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1492018825-25634-40-git-send-email-paulmck@linux.vnet.ibm.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3074
Lines: 50

On Wed, Apr 12, 2017 at 10:40:25AM -0700, Paul E. McKenney wrote:
> Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1],
> however, there are workloads that could result in a high volume of
> concurrent invocations of call_srcu(), which with current SRCU would
> result in excessive lock contention on the srcu_struct structure's
> ->queue_lock, which protects SRCU's callback lists.  This commit therefore
> moves SRCU to per-CPU callback lists, thus greatly reducing contention.
>
> Because a given SRCU instance no longer has a single centralized callback
> list, starting grace periods and invoking callbacks each require a bit
> more work.  These are handled using an srcu_node tree that is in some ways
> similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched
> (for example, the srcu_node tree shape is controlled by exactly the
> same Kconfig options and boot parameters that control the shape of the
> rcu_node tree).
>
> In addition, the old per-CPU srcu_array structure is now named srcu_data
> and contains an rcu_segcblist structure named ->srcu_cblist for its
> callbacks (and a spinlock to protect this).  The srcu_struct gets
> an srcu_gp_seq that is used to associate callback segments with the
> corresponding completion-time grace-period number.  These completion-time
> grace-period numbers are propagated up the srcu_node tree so that the
> grace-period workqueue handler can determine whether additional grace
> periods are needed on the one hand and where to look for callbacks that
> are ready to be invoked.
>
> The srcu_barrier() function must now wait on all instances of the
> per-CPU ->srcu_cblist.  Because each ->srcu_cblist is protected
> by ->lock, srcu_barrier() can remotely add the needed callbacks.
> In theory, it could also remotely start grace periods, but this gets
> complex and racy.  And interestingly enough, it is never necessary to
> start a grace period in this case because srcu_barrier() only enqueues
> a callback when a callback is already present.  And a grace period has
> to have already been started for this pre-existing callback.  And it is
> only the callback that srcu_barrier() needs to wait on, not any particular
> grace period.  Therefore, a new rcu_segcblist_entrain() function enqueues
> the srcu_barrier() function's callback into the same segment occupied by
> the pre-existing callback.
> The special case where all the pre-existing
> callbacks are on a different list being invoked is handled by enqueuing
> srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on
> the done-callbacks check that takes place after all callbacks are invoked.
>
> Note that the readers use the same algorithm as before.  Note that there
> is a separate srcu_idx that tells the readers what counter to increment.
> This unfortunately cannot be combined with srcu_gp_seq because they
> need to be incremented at different times.

So one thing I think I've asked before: would it not be possible to
abstract PREEMPT_RCU and use the exact same code for PREEMPT_RCU and
SRCU?
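
For readers following the changelog, the entrain step it describes can be
modeled in userspace roughly as below. This is only a sketch under stated
assumptions, not the kernel's rcu_segcblist code: every toy_* name is
hypothetical, locking and memory ordering are left out entirely, and the
segment names simply mirror RCU_DONE_TAIL and friends. The idea shown is
that a barrier callback is appended to the last non-empty segment, and
falls back to the DONE segment when the list is empty but callbacks are
still accounted for (i.e. they are off being invoked).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Segment indices; names mirror the changelog, values are illustrative. */
enum { DONE_TAIL, WAIT_TAIL, NEXT_READY_TAIL, NEXT_TAIL, NSEGS };

struct toy_cb {			/* hypothetical stand-in for a callback head */
	struct toy_cb *next;
	const char *name;
};

/* One singly linked list; tails[i] points at the ->next slot ending segment i. */
struct toy_segcblist {
	struct toy_cb *head;
	struct toy_cb **tails[NSEGS];
	long len;		/* includes callbacks currently being invoked */
};

static void toy_init(struct toy_segcblist *l)
{
	l->head = NULL;
	for (int i = 0; i < NSEGS; i++)
		l->tails[i] = &l->head;
	l->len = 0;
}

/* Ordinary enqueue: new callbacks go at the end of the NEXT segment. */
static void toy_enqueue(struct toy_segcblist *l, struct toy_cb *cb)
{
	assert(cb != NULL);
	cb->next = NULL;
	*l->tails[NEXT_TAIL] = cb;
	l->tails[NEXT_TAIL] = &cb->next;
	l->len++;
}

/* Test helper: pretend a grace period was assigned to everything queued. */
static void toy_move_all_to_wait(struct toy_segcblist *l)
{
	l->tails[WAIT_TAIL] = l->tails[NEXT_TAIL];
	l->tails[NEXT_READY_TAIL] = l->tails[NEXT_TAIL];
}

/*
 * Entrain: place cb right behind the last pre-existing callback so that it
 * rides along with that callback's grace period.  Returns false when no
 * callback is accounted for, in which case there is nothing to wait on.
 */
static bool toy_entrain(struct toy_segcblist *l, struct toy_cb *cb)
{
	int i;

	assert(cb != NULL);
	if (l->len == 0)
		return false;
	l->len++;
	cb->next = NULL;
	/* Find the last non-empty segment; ends at DONE_TAIL if all are empty. */
	for (i = NEXT_TAIL; i > DONE_TAIL; i--)
		if (l->tails[i] != l->tails[i - 1])
			break;
	*l->tails[i] = cb;
	for (; i < NSEGS; i++)
		l->tails[i] = &cb->next;
	return true;
}

/* Test helper: return the segment index holding cb, or -1 if not queued. */
static int toy_seg_of(const struct toy_segcblist *l, const struct toy_cb *cb)
{
	struct toy_cb *const *pp = &l->head;
	int seg = 0;

	while (*pp) {
		while (seg < NSEGS && pp == l->tails[seg])
			seg++;
		if (*pp == cb)
			return seg;
		pp = &(*pp)->next;
	}
	return -1;
}
```

With one callback sitting in the WAIT segment, toy_entrain() puts the
barrier callback into WAIT as well; with an empty list but a nonzero len,
it lands in DONE, which is the special case the changelog calls out.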