Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755448AbdGSQyd (ORCPT ); Wed, 19 Jul 2017 12:54:33 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:42338 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753094AbdGSQyb (ORCPT ); Wed, 19 Jul 2017 12:54:31 -0400 Date: Wed, 19 Jul 2017 09:54:25 -0700 From: "Paul E. McKenney" To: Christopher Lameter Cc: "Li, Aubrey" , Thomas Gleixner , Andi Kleen , Peter Zijlstra , Frederic Weisbecker , Aubrey Li , len.brown@intel.com, rjw@rjwysocki.net, tim.c.chen@linux.intel.com, arjan@linux.intel.com, yang.zhang.wz@gmail.com, x86@kernel.org, linux-kernel@vger.kernel.org, daniel.lezcano@linaro.org Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods Reply-To: paulmck@linux.vnet.ibm.com References: <20170714162619.GJ3441@tassilo.jf.intel.com> <20170717192309.ubn5muvc3u7htuaw@hirez.programming.kicks-ass.net> <34371ef8-b8bc-d2bf-93de-3fccd6beb032@linux.intel.com> <20170718044521.GO3441@tassilo.jf.intel.com> <20170718152014.GB3981@linux.vnet.ibm.com> <20170719144827.GB3730@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17071916-0040-0000-0000-00000382AB85 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007388; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000214; SDB=6.00889885; UDB=6.00444544; IPR=6.00670057; BA=6.00005479; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00016277; XFM=3.00000015; UTC=2017-07-19 16:54:29 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17071916-0041-0000-0000-00000776C0DF Message-Id: <20170719165425.GE3730@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-07-19_10:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1706020000 definitions=main-1707190274 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9678 Lines: 215 On Wed, Jul 19, 2017 at 10:03:22AM -0500, Christopher Lameter wrote: > On Wed, 19 Jul 2017, Paul E. McKenney wrote: > > > > Do we have any problem if we skip RCU idle enter/exit under a fast idle scenario? > > > My understanding is, if tick is not stopped, then we don't need inform RCU in > > > idle path, it can be informed in irq exit. > > > > Indeed, the problem arises when the tick is stopped. > > Well is there a boundary when you would want the notification calls? I > would think that even an idle period of a couple of seconds does not > necessarily require a callback to rcu. Had some brokenness here where RCU > calls did not occur for hours or so. At some point the system ran out of > memory but thats far off. Yeah, I should spell this out more completely, shouldn't I? And get it into the documentation if it isn't already there... Where it is currently at best hinted at. :-/ 1. If a CPU is either idle or executing in usermode, and RCU believes it is non-idle, the scheduling-clock tick had better be running. Otherwise, you will get RCU CPU stall warnings. Or at best, very long (11-second) grace periods, with a pointless IPI waking the CPU each time. 2. If a CPU is in a portion of the kernel that executes RCU read-side critical sections, and RCU believes this CPU to be idle, you can get random memory corruption. DON'T DO THIS!!! This is one reason to test with lockdep, which will complain about this sort of thing. 3. If a CPU is in a portion of the kernel that is absolutely positively no-joking guaranteed to never execute any RCU read-side critical sections, and RCU believes this CPU to to be idle, no problem. This sort of thing is used by some architectures for light-weight exception handlers, which can then avoid the overhead of rcu_irq_enter() and rcu_irq_exit(). Some go further and avoid the entireties of irq_enter() and irq_exit(). Just make very sure you are running some of your tests with CONFIG_PROVE_RCU=y. 4. If a CPU is executing in the kernel with the scheduling-clock interrupt disabled and RCU believes this CPU to be non-idle, and if the CPU goes idle (from an RCU perspective) every few jiffies, no problem. It is usually OK for there to be the occasional gap between idle periods of up to a second or so. If the gap grows too long, you get RCU CPU stall warnings. 5. If a CPU is either idle or executing in usermode, and RCU believes it to be idle, of course no problem. 6. If a CPU is executing in the kernel, the kernel code path is passing through quiescent states at a reasonable frequency (preferably about once per few jiffies, but the occasional excursion to a second or so is usually OK) and the scheduling-clock interrupt is enabled, of course no problem. If the gap between a successive pair of quiescent states grows too long, you get RCU CPU stall warnings. For purposes of comparison, the default time until RCU CPU stall warning in mainline is 21 seconds. A number of distros set this to 60 seconds. Back in the 90s, the analogous timeout for DYNIX/ptx was 1.5 seconds. :-/ Hence "a second or so" instead of your "a couple of seconds". ;-) Please see below for the corresponding patch to RCU's requirements. Thoughts? Thanx, Paul ------------------------------------------------------------------------ commit 693c9bfa43f92570dd362d8834440b418bbb994a Author: Paul E. McKenney Date: Wed Jul 19 09:52:58 2017 -0700 doc: Set down RCU's scheduling-clock-interrupt needs This commit documents the situations in which RCU needs the scheduling-clock interrupt to be enabled, along with the consequences of failing to meet RCU's needs in this area. Signed-off-by: Paul E. McKenney diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 95b30fa25d56..7980bee5607f 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -2080,6 +2080,8 @@ Some of the relevant points of interest are as follows:
  • Scheduler and RCU.
  • Tracing and RCU.
  • Energy Efficiency. +
  • + Scheduling-Clock Interrupts and RCU.
  • Memory Efficiency.
  • Performance, Scalability, Response Time, and Reliability. @@ -2532,6 +2534,113 @@ I learned of many of these requirements via angry phone calls: Flaming me on the Linux-kernel mailing list was apparently not sufficient to fully vent their ire at RCU's energy-efficiency bugs! +

    +Scheduling-Clock Interrupts and RCU

    + +

    +The kernel transitions between in-kernel non-idle execution, userspace +execution, and the idle loop. +Depending on kernel configuration, RCU handles these states differently: + + + + + + + + + + + + + + + + + + +
    HZ KconfigIn-KernelUsermodeIdle
    HZ_PERIODICCan rely on scheduling-clock interrupt.Can rely on scheduling-clock interrupt and its + detection of interrupt from usermode.Can rely on RCU's dyntick-idle detection.
    NO_HZ_IDLECan rely on scheduling-clock interrupt.Can rely on scheduling-clock interrupt and its + detection of interrupt from usermode.Can rely on RCU's dyntick-idle detection.
    NO_HZ_FULLCan only sometimes rely on scheduling-clock interrupt. + In other cases, it is necessary to bound kernel execution + times and/or use IPIs.Can rely on RCU's dyntick-idle detection.Can rely on RCU's dyntick-idle detection.
    + + + + + + + + +
     
    Quick Quiz:
    + Why can't NO_HZ_FULL in-kernel execution rely on the + scheduling-clock interrupt, just like HZ_PERIODIC + and NO_HZ_IDLE do? +
    Answer:
    + Because, as a performance optimization, NO_HZ_FULL + does not necessarily re-enable the scheduling-clock interrupt + on entry to each and every system call. +
     
    + +

    +However, RCU must be reliably informed as to whether any given +CPU is currently in the idle loop, and, for NO_HZ_FULL, +also whether that CPU is executing in usermode, as discussed +earlier. +It also requires that the scheduling-clock interrupt be enabled when +RCU needs it to be: + +

      +
    1. If a CPU is either idle or executing in usermode, and RCU believes + it is non-idle, the scheduling-clock tick had better be running. + Otherwise, you will get RCU CPU stall warnings. Or at best, + very long (11-second) grace periods, with a pointless IPI waking + the CPU from time to time. +
    2. If a CPU is in a portion of the kernel that executes RCU read-side + critical sections, and RCU believes this CPU to be idle, you will get + random memory corruption. DON'T DO THIS!!! + +
      This is one reason to test with lockdep, which will complain + about this sort of thing. +
    3. If a CPU is in a portion of the kernel that is absolutely + positively no-joking guaranteed to never execute any RCU read-side + critical sections, and RCU believes this CPU to to be idle, + no problem. This sort of thing is used by some architectures + for light-weight exception handlers, which can then avoid the + overhead of rcu_irq_enter() and rcu_irq_exit() + at exception entry and exit, respectively. + Some go further and avoid the entireties of irq_enter() + and irq_exit(). + +
      Just make very sure you are running some of your tests with + CONFIG_PROVE_RCU=y, just in case one of your code paths + was in fact joking about not doing RCU read-side critical sections. +
    4. If a CPU is executing in the kernel with the scheduling-clock + interrupt disabled and RCU believes this CPU to be non-idle, + and if the CPU goes idle (from an RCU perspective) every few + jiffies, no problem. It is usually OK for there to be the + occasional gap between idle periods of up to a second or so. + +
      If the gap grows too long, you get RCU CPU stall warnings. +
    5. If a CPU is either idle or executing in usermode, and RCU believes + it to be idle, of course no problem. +
    6. If a CPU is executing in the kernel, the kernel code + path is passing through quiescent states at a reasonable + frequency (preferably about once per few jiffies, but the + occasional excursion to a second or so is usually OK) and the + scheduling-clock interrupt is enabled, of course no problem. + +
      If the gap between a successive pair of quiescent states grows + too long, you get RCU CPU stall warnings. +
    + +

    +But as long as RCU is properly informed of kernel state transitions between +in-kernel execution, usermode execution, and idle, and as long as the +scheduling-clock interrupt is enabled when RCU needs it to be, you +can rest assured that the bugs you encounter will be in some other +part of RCU or some other part of the kernel! +

    Memory Efficiency