From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, "Paul E. McKenney", Sasha Levin
Subject: [PATCH 4.18 207/235] rcu: Fix grace-period hangs due to race with CPU offline
Date: Mon, 24 Sep 2018 13:53:13 +0200
Message-Id: <20180924113124.637508442@linuxfoundation.org>
In-Reply-To: <20180924113103.999624566@linuxfoundation.org>
References: <20180924113103.999624566@linuxfoundation.org>
X-stable: review
X-Mailing-List: linux-kernel@vger.kernel.org

4.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Paul E. McKenney"

[ Upstream commit 1e64b15a4b102e1cd059d4d798b7a78f93341333 ]

Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:

1.  CPU 1 goes offline.

2.  Because CPU 1 is the only CPU in the system blocking the current
    grace period, the grace period ends as soon as
    rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
    returns.

3.  At this point, the leaf rcu_node structure's ->lock is no longer
    held: rcu_report_qs_rnp() has released it, as it must in order
    to awaken the RCU grace-period kthread.

4.  At this point, that same leaf rcu_node structure's ->qsmaskinitnext
    field still records CPU 1 as being online.  This is absolutely
    necessary because the scheduler uses RCU (in this case on the
    wake-up path while awakening RCU's grace-period kthread), and
    ->qsmaskinitnext contains RCU's idea as to which CPUs are online.
    Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
    bit from ->qsmaskinitnext would result in a lockdep-RCU splat
    due to RCU being used from an offline CPU.

5.  RCU's grace-period kthread awakens, sees that the old grace period
    has completed and that a new one is needed.  It therefore starts
    a new grace period, but because CPU 1's leaf rcu_node structure's
    ->qsmaskinitnext field still shows CPU 1 as being online, this new
    grace period is initialized to wait for a quiescent state from the
    now-offline CPU 1.

6.  Without the fail-safe force-quiescent-state checks, there would
    be no quiescent state from the now-offline CPU 1, which would
    eventually result in RCU CPU stall warnings and memory exhaustion.

It would be good to get rid of the special fail-safe quiescent-state
propagation checks, and thus it would be good to fix things so that
the above scenario cannot happen.  This commit therefore adds a new
->ofl_lock to the rcu_state structure.
This lock is held by rcu_gp_init() across the applying of buffered online
and offline operations to the rcu_node tree, and it is also held by
rcu_cleanup_dying_idle_cpu() when buffering a new offline operation.
This prevents rcu_gp_init() from acquiring the leaf rcu_node structure's
lock during the interval between when rcu_cleanup_dying_idle_cpu() invokes
rcu_report_qs_rnp(), which releases ->lock and the re-acquisition of that
same lock.  This in turn prevents the failure scenario outlined above,
and will hopefully eventually allow removal of the offline-CPU checks
from the force-quiescent-state code path.

Signed-off-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 kernel/rcu/tree.c |    6 ++++++
 kernel/rcu/tree.h |    4 ++++
 2 files changed, 10 insertions(+)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -102,6 +102,7 @@ struct rcu_state sname##_state = { \
 	.abbr = sabbr, \
 	.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
 	.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
+	.ofl_lock = __SPIN_LOCK_UNLOCKED(sname##_state.ofl_lock), \
 }
 
 RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
@@ -1925,11 +1926,13 @@ static bool rcu_gp_init(struct rcu_state
 	 */
 	rcu_for_each_leaf_node(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_preinit_delay);
+		spin_lock(&rsp->ofl_lock);
 		raw_spin_lock_irq_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
 		    !rnp->wait_blkd_tasks) {
 			/* Nothing to do on this leaf rcu_node structure. */
 			raw_spin_unlock_irq_rcu_node(rnp);
+			spin_unlock(&rsp->ofl_lock);
 			continue;
 		}
 
@@ -1964,6 +1967,7 @@ static bool rcu_gp_init(struct rcu_state
 		}
 
 		raw_spin_unlock_irq_rcu_node(rnp);
+		spin_unlock(&rsp->ofl_lock);
 	}
 
 	/*
@@ -3725,9 +3729,11 @@ static void rcu_cleanup_dying_idle_cpu(i
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
+	spin_lock(&rsp->ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rnp->qsmaskinitnext &= ~mask;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	spin_unlock(&rsp->ofl_lock);
 }
 
 /*
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -384,6 +384,10 @@ struct rcu_state {
 	const char *name;			/* Name of structure. */
 	char abbr;				/* Abbreviated name. */
 	struct list_head flavors;		/* List of RCU flavors. */
+
+	spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+						/* Synchronize offline with */
+						/* GP pre-initialization. */
 };
 
 /* Values for rcu_state structure's gp_flags field. */
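
For readers who want to experiment with the locking rule outside the
kernel, the following is a minimal user-space sketch of the ordering the
changelog describes: an outer lock held across the whole offline operation,
including the interval in which the leaf lock is dropped and re-acquired.
It is not kernel code.  Every name in it (model_state, model_node,
model_gp_init_leaf(), model_cpu_offline(), try_gp_init(), window_result)
is hypothetical, pthread mutexes stand in for the kernel's spinlocks, and
the grace-period side uses a trylock only so that a single-threaded run
can observe the exclusion; the kernel side simply waits for the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct model_node {
	pthread_mutex_t lock;          /* plays the role of rnp->lock */
	unsigned long qsmaskinit;      /* CPUs the next grace period waits on */
	unsigned long qsmaskinitnext;  /* CPUs currently considered online */
};

struct model_state {
	pthread_mutex_t ofl_lock;      /* plays the role of rsp->ofl_lock */
	struct model_node leaf;
};

static void model_init(struct model_state *s, unsigned long online)
{
	pthread_mutex_init(&s->ofl_lock, NULL);
	pthread_mutex_init(&s->leaf.lock, NULL);
	s->leaf.qsmaskinit = 0;
	s->leaf.qsmaskinitnext = online;
}

/*
 * Grace-period pre-initialization for one leaf: apply buffered
 * online/offline changes.  Returns 0 on success, or EBUSY if an
 * offline operation currently holds ofl_lock (assumption: the real
 * code blocks on the lock rather than returning).
 */
static int model_gp_init_leaf(struct model_state *s)
{
	int ret = pthread_mutex_trylock(&s->ofl_lock);

	if (ret)
		return ret;
	pthread_mutex_lock(&s->leaf.lock);
	s->leaf.qsmaskinit = s->leaf.qsmaskinitnext;
	pthread_mutex_unlock(&s->leaf.lock);
	pthread_mutex_unlock(&s->ofl_lock);
	return 0;
}

/*
 * Offline path.  in_window(), if non-NULL, runs while the leaf lock
 * is dropped, modeling the interval described in the changelog.
 * ofl_lock stays held for the whole operation.
 */
static void model_cpu_offline(struct model_state *s, unsigned long mask,
			      void (*in_window)(struct model_state *))
{
	pthread_mutex_lock(&s->ofl_lock);
	pthread_mutex_lock(&s->leaf.lock);
	pthread_mutex_unlock(&s->leaf.lock);  /* leaf lock dropped */
	if (in_window)
		in_window(s);
	pthread_mutex_lock(&s->leaf.lock);    /* leaf lock re-acquired */
	s->leaf.qsmaskinitnext &= ~mask;
	pthread_mutex_unlock(&s->leaf.lock);
	pthread_mutex_unlock(&s->ofl_lock);
}

/* Records what a grace-period init attempted inside the window saw. */
static int window_result = -1;

static void try_gp_init(struct model_state *s)
{
	window_result = model_gp_init_leaf(s);
}
```

Calling model_cpu_offline(&s, mask, try_gp_init) from a small main()
exercises the window: the nested model_gp_init_leaf() is refused while
the offline operation is in flight, so the stale online mask is never
copied into qsmaskinit.  A later model_gp_init_leaf() call then sees the
CPU's bit already cleared.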