From: Ben Hutchings
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
CC: akpm@linux-foundation.org, "Thomas Gleixner", "Sebastian Sewior", "Paul E. 
McKenney", "Peter Zijlstra", "Anna-Maria Gleixner"
Date: Wed, 28 Feb 2018 15:20:18 +0000
X-Mailer: LinuxStableQueue (scripts by bwh)
Subject: [PATCH 3.16 230/254] hrtimer: Reset hrtimer cpu base proper on CPU hotplug
X-Mailing-List: linux-kernel@vger.kernel.org

3.16.55-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner

commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.

The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents a long-delayed hrtimer interrupt from causing
continuous retriggering of interrupts that keeps the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag.

As a consequence the CPU does not receive hrtimer interrupts and no timers
expire on that CPU, which results in RCU stalls and other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Sebastian Sewior
Cc: Anna-Maria Gleixner
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
[bwh: Backported to 3.16:
 - There's no next_timer field to reset
 - Adjust filename, context]
Signed-off-by: Ben Hutchings
---
 kernel/hrtimer.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -659,6 +659,7 @@ static int hrtimer_reprogram(struct hrti
 static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
 {
 	base->expires_next.tv64 = KTIME_MAX;
+	base->hang_detected = 0;
 	base->hres_active = 0;
 }
 
@@ -1680,6 +1681,7 @@ static void init_hrtimers_cpu(int cpu)
 		timerqueue_init_head(&cpu_base->clock_base[i].active);
 	}
 
+	cpu_base->active_bases = 0;
 	hrtimer_init_hres(cpu_base);
 }
 
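
For reviewers who want to see the failure mode in isolation, here is a
minimal userspace sketch of the state machine the changelog describes. It
is not kernel code: struct sim_cpu_base, the sim_* helpers and the hw_armed
field are hypothetical stand-ins invented for illustration, and only the
field names hang_detected, hres_active, active_bases and expires_next are
taken from the patch. The with_fix parameter models whether CPU bringup
clears the stale state.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_KTIME_MAX INT64_MAX

/* Hypothetical model of the hrtimer_cpu_base members touched by the patch. */
struct sim_cpu_base {
	int64_t expires_next;
	unsigned int active_bases;
	int hang_detected;
	int hres_active;
	bool hw_armed;	/* assumption: models "timer hardware has an event programmed" */
};

/* Enqueue path: refuses to touch the hardware while hang_detected is set. */
static bool sim_reprogram(struct sim_cpu_base *b, int64_t expires)
{
	if (b->hang_detected)
		return false;
	if (expires < b->expires_next) {
		b->expires_next = expires;
		b->hw_armed = true;
	}
	return b->hw_armed;
}

/* Interrupt that detects a hang: arm a delayed event and set the flag. */
static void sim_hang_interrupt(struct sim_cpu_base *b, int64_t delayed_expiry)
{
	b->hang_detected = 1;
	b->expires_next = delayed_expiry;
	b->hw_armed = true;
}

/* CPU unplug: the hardware is shut down, the per-CPU state is left behind. */
static void sim_cpu_offline(struct sim_cpu_base *b)
{
	b->hw_armed = false;
}

/* CPU bringup: with_fix additionally resets the members the patch adds. */
static void sim_cpu_online(struct sim_cpu_base *b, bool with_fix)
{
	if (with_fix) {
		b->active_bases = 0;
		b->hang_detected = 0;
	}
	b->expires_next = SIM_KTIME_MAX;
	b->hres_active = 0;
}

/* Returns true if a timer can arm the hardware after a hang plus hotplug. */
static bool sim_timer_works_after_hotplug(bool with_fix)
{
	struct sim_cpu_base b = { 0 };

	sim_cpu_online(&b, with_fix);
	sim_hang_interrupt(&b, 1000);	/* hang in the last interrupt before unplug */
	sim_cpu_offline(&b);
	sim_cpu_online(&b, with_fix);
	return sim_reprogram(&b, 2000);
}
```

In this model, sim_timer_works_after_hotplug(false) returns false because
the stale flag blocks the only path that could arm the hardware, while
sim_timer_works_after_hotplug(true) returns true. The real code has more
state and more paths, so treat this only as a reading aid for the changelog.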