From: Thomas Gleixner
To: "Paul E. McKenney"
Cc: LKML, rcu@vger.kernel.org, Andrew Lutomirski, X86 ML,
    Frederic Weisbecker, Steven Rostedt, Joel Fernandes,
    Mathieu Desnoyers, Will Deacon, Peter Zijlstra
Subject: [PATCH] x86/entry: Force rcu_irq_enter() when in idle task
In-Reply-To: <871rmkzcc8.fsf@nanos.tec.linutronix.de>
References: <20200611235305.GA32342@paulmck-ThinkPad-P72>
    <871rmkzcc8.fsf@nanos.tec.linutronix.de>
Date: Fri, 12 Jun 2020 15:55:00 +0200
Message-ID: <87wo4cxubv.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

The idea of conditionally calling into rcu_irq_enter() only when RCU is
not watching turned out to be not completely thought through.

Paul noticed occasional premature end of grace periods in RCU torture
testing. Bisection led to the commit which made the invocation of
rcu_irq_enter() conditional on !rcu_is_watching().

It turned out that this conditional breaks RCU assumptions about the idle
task when the scheduler tick happens to be a nested interrupt. Nested
interrupts can happen when the first interrupt invokes softirq processing
on return, which enables interrupts. If that nested tick interrupt does
not invoke rcu_irq_enter(), then the nesting accounting in RCU claims that
this is the first interrupt, which might mark a quiescent state and end
grace periods prematurely.

Change the condition from !rcu_is_watching() to is_idle_task(current),
which enforces that interrupts in the idle task unconditionally invoke
rcu_irq_enter() independent of the RCU state.

This is also correct vs. user mode entries in NOHZ full scenarios because
user mode entries bring RCU out of EQS and force the RCU irq nesting state
accounting to nested.
As only the first interrupt can enter from user mode, a nested tick
interrupt will enter from kernel mode, and as the nesting state accounting
is forced to nested, it will not do anything stupid even if
rcu_irq_enter() has not been invoked.

Fixes: 3eeec3858488 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()")
Reported-by: "Paul E. McKenney"
Signed-off-by: Thomas Gleixner
---
 arch/x86/entry/common.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -557,14 +557,34 @@ bool noinstr idtentry_enter_cond_rcu(str
 		return false;
 	}
 
-	if (!__rcu_is_watching()) {
+	/*
+	 * If this entry hit the idle task invoke rcu_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interrupts can nest when the first interrupt invokes softirq
+	 * processing on return which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for __rcu_is_watching() here would prevent the nesting
+	 * interrupt to invoke rcu_irq_enter(). If that nested interrupt is
+	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
+	 * assume that it is the first interrupt and eventually claim
+	 * quiescent state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke rcu_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
 		/*
 		 * If RCU is not watching then the same careful
 		 * sequence vs. lockdep and tracing is required
 		 * as in enter_from_user_mode().
-		 *
-		 * This only happens for IRQs that hit the idle
-		 * loop, i.e. if idle is not using MWAIT.
 		 */
 		lockdep_hardirqs_off(CALLER_ADDR0);
 		rcu_irq_enter();
@@ -576,9 +596,10 @@ bool noinstr idtentry_enter_cond_rcu(str
 	}
 
 	/*
-	 * If RCU is watching then RCU only wants to check
-	 * whether it needs to restart the tick in NOHZ
-	 * mode.
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
 	 */
 	instrumentation_begin();
 	rcu_irq_enter_check_tick();
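
The nesting argument from the changelog can be illustrated with a small
userspace model. This is only a sketch under stated assumptions and is not
kernel code: struct cpu_model and all model_* functions are hypothetical
names invented for illustration, the per-CPU state is reduced to a single
counter, and the "tick sees itself as the first interrupt" rule is assumed
to be "idle task with exactly one accounted interrupt level". The real RCU
accounting is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; none of these names exist in the kernel. */
struct cpu_model {
	int irq_nesting;	/* interrupt levels RCU has been told about */
	bool is_idle_task;	/* models is_idle_task(current) */
	bool rcu_watching;	/* models __rcu_is_watching() */
};

/* Stand-in for rcu_irq_enter(): account one level and leave EQS. */
void model_rcu_irq_enter(struct cpu_model *cpu)
{
	cpu->irq_nesting++;
	cpu->rcu_watching = true;
}

/* Stand-in for rcu_irq_exit(): drop one level, idle re-enters EQS at zero. */
void model_rcu_irq_exit(struct cpu_model *cpu)
{
	assert(cpu->irq_nesting > 0);
	if (--cpu->irq_nesting == 0 && cpu->is_idle_task)
		cpu->rcu_watching = false;
}

/* Entry condition before the patch: only when RCU is not watching. */
bool model_entry_old(struct cpu_model *cpu)
{
	if (cpu->rcu_watching)
		return false;
	model_rcu_irq_enter(cpu);
	return true;
}

/* Entry condition after the patch: unconditionally in the idle task. */
bool model_entry_new(struct cpu_model *cpu)
{
	if (!cpu->is_idle_task)
		return false;
	model_rcu_irq_enter(cpu);
	return true;
}

/*
 * Assumed tick rule: in the idle task, exactly one accounted level means
 * the tick believes it interrupted idle directly and may report a
 * quiescent state.
 */
bool model_tick_sees_first_irq(const struct cpu_model *cpu)
{
	return cpu->is_idle_task && cpu->irq_nesting == 1;
}

/*
 * Scenario from the changelog: an interrupt hits the idle task, softirq
 * processing on return enables interrupts, and the tick nests inside.
 * Returns what the nested tick concludes under the given entry policy.
 */
bool nested_tick_sees_first_irq(bool (*entry)(struct cpu_model *))
{
	struct cpu_model cpu = {
		.irq_nesting = 0,
		.is_idle_task = true,
		.rcu_watching = false,
	};
	bool outer = entry(&cpu);	/* first interrupt out of idle */
	bool inner = entry(&cpu);	/* nested tick interrupt */
	bool first = model_tick_sees_first_irq(&cpu);

	/* Unwind only the levels that were actually accounted. */
	if (inner)
		model_rcu_irq_exit(&cpu);
	if (outer)
		model_rcu_irq_exit(&cpu);
	return first;
}
```

In this model, nested_tick_sees_first_irq(model_entry_old) returns true:
the nested tick skips the accounting because RCU is already watching, so it
wrongly looks like the first interrupt. With model_entry_new the nested
tick is accounted as a second level and the function returns false.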