Date: Wed, 19 Apr 2017 09:58:05 -0700
From: "Paul E. McKenney"
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com
Subject: [PATCH v3 tip/core/rcu 0/40] SRCU callback parallelization for 4.12
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170412174003.GA23207@linux.vnet.ibm.com>
	<20170417234452.GB19013@linux.vnet.ibm.com>
In-Reply-To: <20170417234452.GB19013@linux.vnet.ibm.com>
Message-Id: <20170419165805.GB10874@linux.vnet.ibm.com>

Hello!

This v3 series moves SRCU from its traditional single per-srcu_struct
callback queue to per-srcu_struct/per-CPU callback queues. This involves
abstracting functionality from Tree RCU, which results in a large
conflict footprint, which in turn results in some otherwise unrelated
patches coming along for the ride.

1.	Maintain special bits at bottom of ->dynticks counter. This is
	for some upcoming MM work. My intent was to hold it until
	that work was ready, but merge conflicts dictated otherwise.
	If the MM work does not appear soonish, I will manually revert
	this patch.

2.	Make arch select smp_mb__after_unlock_lock() strength, which
	gets rid of an arch-specific #ifdef.

3.	Consolidate SRCU batch checking into rcu_all_batches_empty().

4.	Check for tardy grace-period activity in cleanup_srcu_struct().

5-7.	Semicolon inside RCU_TRACE() for various parts of RCU.

8.	Pull rcu_sched_qs_mask into rcu_dynticks structure in order to
	eliminate an isolated per-CPU variable.

9.	Pull rcu_qs_ctr into rcu_dynticks structure.

10.	Eliminate flavor scan in rcu_momentary_dyntick_idle() to reduce
	semi-common-case context-switch overhead.

11.	Place guard on rcu_all_qs() and rcu_note_context_switch() actions
	to reduce common-case scheduler-fastpath overhead.

12.	Default RCU_FANOUT_LEAF to 16 unless explicitly changed.

13.	Abstract multi-tail callback list handling for SRCU.

14.	Allow SRCU to access rcu_scheduler_active.

15.	Allow early boot use of synchronize_srcu(), though not yet
	mid-boot use.

16.	Add single-element dequeue functions to rcu_segcblist for
	debug use.

17.	Move rcu_seq_start() and friends to rcu.h for SRCU's benefit.

18.	Expedited wakeups need to be fully ordered.

19.	Fix warning in rcu_seq_end().

20.	Push srcu_advance_batches() fastpath into common case as a step
	towards callback parallelization.

21.	Move to state-based grace-period sequencing, also as a step
	towards callback parallelization.

22.	Add grace-period sequence numbers to SRCU.

23.	Use rcu_segcblist to track SRCU callbacks.

24.	Move combining-tree definitions for SRCU's benefit.

25.	Move rcu_init_levelspread() to rcu_tree_node.h for SRCU's benefit.

26.	Remove redundant levelcnt[] array from rcu_init_one().

27.	Move rcu_node traversal macros to rcu.h for SRCU's benefit.

28.	Make num_rcu_lvl[] array be external for SRCU's benefit.

29.	Fix bogus try_check_zero() comment.

30.	Improve rcu_seq grace-period-counter abstraction for SRCU's
	benefit.

31.	Allow a second bit in rcu_seq for SRCU state.

32.	Merge ->srcu_state into ->srcu_gp_seq to allow atomic updates.

33.	Provide crude control of expedited SRCU grace periods.

34.	Use static initialization for "srcu" in mm/mmu_notifier.c.

35.	Create a tiny SRCU for bloatwatch/tinification.

36.	Print Tiny SRCU reader statistics in rcutorture.

37.	Introduce CLASSIC_SRCU Kconfig option for those who do not wish
	to help debug Tree SRCU.

38.	Parallelize SRCU callback handling.

39.	Make expedited parallel SRCU callback handling really be fully
	expedited.

40.	Make non-preemptive schedule be Tasks RCU quiescent state.

Updates since v2:

o	Apply Josh Triplett feedback.

o	Fix a performance regression found by Marc Zyngier.

Updates since v1:

o	Incorporate feedback from Peter Zijlstra.

o	Dropped v1 patches 8-10 ("Make various parts of RCU do deferred
	NOCB wakeups in order to prevent callback blockages, and thus
	hangs"). These patches turned out to be papering over a no-CBs
	CPU design flaw. There will be patches in v4.13 to fix the
	design flaw directly.

o	Added v2 patch #34 ("Use static initialization for "srcu" in
	mm/mmu_notifier.c"), moving it from its v1 location in the
	fixes series.

o	Added v2 patch #39 ("Make non-preemptive schedule be Tasks RCU
	quiescent state") for the benefit of upcoming ftrace work at
	Steve Rostedt's request.

							Thanx, Paul

------------------------------------------------------------------------

 /kernel/rcu/rcu_segcblist.h                                     |  670 -----
 b/Documentation/RCU/Design/Data-Structures/Data-Structures.html |   36
 b/arch/Kconfig                                                  |    3
 b/arch/powerpc/Kconfig                                          |    1
 b/include/linux/rcu_node_tree.h                                 |  105
 b/include/linux/rcu_segcblist.h                                 |  720 +++++
 b/include/linux/rcupdate.h                                      |   17
 b/include/linux/rcutiny.h                                       |   24
 b/include/linux/rcutree.h                                       |    5
 b/include/linux/srcu.h                                          |  112
 b/include/linux/srcuclassic.h                                   |  101
 b/include/linux/srcutiny.h                                      |   81
 b/include/linux/srcutree.h                                      |  171 +
 b/init/Kconfig                                                  |   33
 b/kernel/rcu/Makefile                                           |    6
 b/kernel/rcu/rcu.h                                              |  165 +
 b/kernel/rcu/rcu_segcblist.h                                    |  670 +++++
 b/kernel/rcu/rcutorture.c                                       |   39
 b/kernel/rcu/srcu.c                                             |  856 +++---
 b/kernel/rcu/srcutiny.c                                         |  215 +
 b/kernel/rcu/srcutree.c                                         | 1255 ++++++++--
 b/kernel/rcu/tiny.c                                             |   20
 b/kernel/rcu/tiny_plugin.h                                      |   13
 b/kernel/rcu/tree.c                                             |  661 ++---
 b/kernel/rcu/tree.h                                             |  174 -
 b/kernel/rcu/tree_exp.h                                         |   25
 b/kernel/rcu/tree_plugin.h                                      |   62
 b/kernel/rcu/tree_trace.c                                       |   26
 b/kernel/rcu/update.c                                           |   53
 b/kernel/sched/core.c                                           |    2
 b/mm/mmu_notifier.c                                             |   14
 31 files changed, 4299 insertions(+), 2036 deletions(-)
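
Illustrative note on patches 30-32: those entries describe keeping SRCU's
grace-period state in the low-order bits of the grace-period sequence
counter so that counter and state can be updated together. The following
is a minimal userspace sketch of that general idea, not the kernel's
rcu_seq code. Every name in it (SEQ_STATE_BITS, seq_start(), seq_snap(),
and so on) is hypothetical, the two-bit state width is an assumption
based only on the "second bit" wording in patch 31, and it uses no
atomics or memory barriers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's rcu_seq implementation. */
#define SEQ_STATE_BITS	2UL	/* assumed width of the state field */
#define SEQ_STATE_MASK	((1UL << SEQ_STATE_BITS) - 1)

/* Upper bits: number of grace periods that have completed. */
static unsigned long seq_ctr(unsigned long s)
{
	return s >> SEQ_STATE_BITS;
}

/* Lower bits: current state, where zero means "idle". */
static unsigned long seq_state(unsigned long s)
{
	return s & SEQ_STATE_MASK;
}

/* Start a grace period: idle -> state 1. */
static void seq_start(unsigned long *sp)
{
	assert(seq_state(*sp) == 0);
	*sp += 1;
}

/* Move an in-progress grace period to a later state. */
static void seq_set_state(unsigned long *sp, unsigned long newstate)
{
	assert(newstate > seq_state(*sp) && newstate <= SEQ_STATE_MASK);
	*sp = (*sp & ~SEQ_STATE_MASK) | newstate;
}

/* End a grace period: one store both clears the state and bumps the count. */
static void seq_end(unsigned long *sp)
{
	assert(seq_state(*sp) != 0);
	*sp = (*sp | SEQ_STATE_MASK) + 1;
}

/*
 * Given the current value, return the value the sequence must reach
 * before a full grace period is known to have elapsed.  If a grace
 * period is already in progress it cannot be trusted, so the result
 * waits for the following one as well.
 */
static unsigned long seq_snap(unsigned long s)
{
	return (s + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Wrap-tolerant check: has the sequence reached the snapshot value? */
static bool seq_done(unsigned long cur, unsigned long snap)
{
	return cur - snap <= ULONG_MAX / 2;
}
```

In this sketch, a snapshot taken while idle at 0 yields 4. After
seq_start(), seq_set_state(&s, 2), and seq_end(), the value is 4 (count
1, state idle), so seq_done() reports completion. A snapshot taken at 1,
that is, with a grace period already in progress, yields 8 instead.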