From: "Paul E.
 McKenney"
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, mingo@kernel.org,
 jiangshanlai@gmail.com, akpm@linux-foundation.org,
 mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de,
 peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
 edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com,
 joel@joelfernandes.org, Sergey Senozhatsky,
 "Signed-off-by : Paul E . McKenney"
Subject: [PATCH rcu 07/18] rcu/tree: Handle VM stoppage in stall detection
Date: Wed, 21 Jul 2021 13:21:15 -0700
Message-Id: <20210721202127.2129660-7-paulmck@kernel.org>
X-Mailer: git-send-email 2.31.1.189.g2e36527f23
In-Reply-To: <20210721202042.GA1472052@paulmck-ThinkPad-P17-Gen-1>
References: <20210721202042.GA1472052@paulmck-ThinkPad-P17-Gen-1>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sergey Senozhatsky

The soft watchdog timer function checks whether a virtual machine was
suspended, in which case what looks like a lockup is in fact a false
positive. This is what kvm_check_and_clear_guest_paused() does: it
tests the guest's PVCLOCK_GUEST_STOPPED flag (which is set by the host)
and, if the flag is set, we need to touch all watchdogs and bail out.

The watchdog timer function runs from IRQ, so the PVCLOCK_GUEST_STOPPED
check works fine.

There is, however, one more watchdog that runs from IRQ, so the watchdog
timer function races with it, and that watchdog is not aware of
PVCLOCK_GUEST_STOPPED: the RCU stall detector.

    apic_timer_interrupt()
     smp_apic_timer_interrupt()
      hrtimer_interrupt()
       __hrtimer_run_queues()
        tick_sched_timer()
         tick_sched_handle()
          update_process_times()
           rcu_sched_clock_irq()

This triggers RCU stalls on our devices during VM resume.
If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU before
watchdog_timer_fn()->kvm_check_and_clear_guest_paused(), then nothing
on this VCPU touches the watchdogs, and RCU reads a stale gp stall
timestamp together with a new jiffies value, which makes it think that
RCU has stalled.

Make the RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and don't
report RCU stalls when we resume the VM.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Paul E. McKenney
---
 kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index bd4de5bc5807e..0e7a60706d1c0 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,8 @@
  * Author: Paul E. McKenney
  */
 
+#include <linux/kvm_para.h>
+
 //////////////////////////////////////////////////////////////////////////////
 //
 // Controlling CPU stall warnings, including delay calculation.
@@ -698,6 +700,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
@@ -707,6 +717,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
 		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(gs2, gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
-- 
2.31.1.189.g2e36527f23