From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sergey Senozhatsky, "Signed-off-by: Paul E.
McKenney", Sasha Levin
Subject: [PATCH 5.4 008/144] rcu/tree: Handle VM stoppage in stall detection
Date: Mon, 13 Sep 2021 15:13:09 +0200
Message-Id: <20210913131048.247694521@linuxfoundation.org>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210913131047.974309396@linuxfoundation.org>
References: <20210913131047.974309396@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sergey Senozhatsky

[ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ]

The soft watchdog timer function checks if a virtual machine
was suspended and hence what looks like a lockup in fact
is a false positive.

This is what kvm_check_and_clear_guest_paused() does: it
tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
and if it's set then we need to touch all watchdogs and bail
out.

Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
check works fine.

There is, however, one more watchdog that runs from IRQ, so
watchdog timer fn races with it, and that watchdog is not aware
of PVCLOCK_GUEST_STOPPED - RCU stall detector.

apic_timer_interrupt()
 smp_apic_timer_interrupt()
  hrtimer_interrupt()
   __hrtimer_run_queues()
    tick_sched_timer()
     tick_sched_handle()
      update_process_times()
       rcu_sched_clock_irq()

This triggers RCU stalls on our devices during VM resume.

If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
then there is nothing on this VCPU that touches watchdogs and
RCU reads stale gp stall timestamp and new jiffies value, which
makes it think that RCU has stalled.

Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
don't report RCU stalls when we resume the VM.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Paul E.
McKenney
Signed-off-by: Sasha Levin
---
 kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index c0b8c458d8a6..b8c9744ad595 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,8 @@
  * Author: Paul E. McKenney
  */
 
+#include
+
 //////////////////////////////////////////////////////////////////////////////
 //
 // Controlling CPU stall warnings, including delay calculation.
@@ -525,6 +527,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall();
 		if (rcu_cpu_stall_ftrace_dump)
@@ -534,6 +544,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
 		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(gs2);
 		if (rcu_cpu_stall_ftrace_dump)
-- 
2.30.2
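
For readers who want to see the race from the changelog in isolation, here is
a rough userspace sketch. It is not kernel code. Every name prefixed with
sim_ or SIM_ is hypothetical, the struct only loosely models
PVCLOCK_GUEST_STOPPED and rcu_state.jiffies_stall, and the timeout value is
arbitrary. It assumes the host sets the stopped flag on resume and that the
stall check runs before any other watchdog has cleared it.

```c
#include <stdbool.h>

#define SIM_STALL_TIMEOUT 21UL	/* arbitrary tick count, assumption for the sketch */

/* Hypothetical per-VCPU state; not the kernel's real data structures. */
struct sim_vcpu {
	bool guest_stopped;		/* models PVCLOCK_GUEST_STOPPED */
	unsigned long jiffies;		/* current time in ticks */
	unsigned long jiffies_stall;	/* time at which a stall gets reported */
	int stalls_reported;
};

/* Models kvm_check_and_clear_guest_paused(): test the flag, then clear it. */
static bool sim_check_and_clear_guest_paused(struct sim_vcpu *v)
{
	if (!v->guest_stopped)
		return false;
	v->guest_stopped = false;
	return true;
}

/* Stall check as described before the patch: no knowledge of the pause. */
static void sim_check_stall_unpatched(struct sim_vcpu *v)
{
	if ((long)(v->jiffies - v->jiffies_stall) < 0)
		return;			/* deadline not reached yet */
	v->jiffies_stall = v->jiffies + SIM_STALL_TIMEOUT;
	v->stalls_reported++;		/* false positive after a VM pause */
}

/* Stall check with the patch: re-arm the deadline, then bail if paused. */
static void sim_check_stall_patched(struct sim_vcpu *v)
{
	if ((long)(v->jiffies - v->jiffies_stall) < 0)
		return;
	v->jiffies_stall = v->jiffies + SIM_STALL_TIMEOUT;
	if (sim_check_and_clear_guest_paused(v))
		return;			/* host stopped the VM, not a stall */
	v->stalls_reported++;
}

/* Host stops the guest for some ticks: time jumps and the flag is set. */
static void sim_pause_and_resume(struct sim_vcpu *v, unsigned long ticks)
{
	v->jiffies += ticks;
	v->guest_stopped = true;
}

/* Run one pause/resume cycle and return how many stalls were reported. */
static int sim_stalls_after_resume(bool patched)
{
	struct sim_vcpu v = {
		.jiffies = 1000,
		.jiffies_stall = 1000 + SIM_STALL_TIMEOUT,
	};

	sim_pause_and_resume(&v, 10 * SIM_STALL_TIMEOUT);
	if (patched)
		sim_check_stall_patched(&v);
	else
		sim_check_stall_unpatched(&v);
	return v.stalls_reported;
}
```

In this model sim_stalls_after_resume(false) counts one spurious stall, while
sim_stalls_after_resume(true) counts none. A deadline that expires without the
stopped flag set is still reported by the patched check, which is the
behaviour the patch is meant to preserve.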