From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sergey Senozhatsky, Tejun Heo, Sasha Levin
Subject: [PATCH 4.9 09/42] wq: handle VM suspension in stall detection
Date: Mon, 14 Jun 2021 12:27:00 +0200
Message-Id: <20210614102643.002172619@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210614102642.700712386@linuxfoundation.org>
References: <20210614102642.700712386@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sergey Senozhatsky

[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
once this VCPU resumes it will see the new jiffies value, while it
may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
There is a small chance of misreported WQ stalls in the meantime,
because new jiffies is time_after() old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If VM gets suspended then we continue using
"old" jiffies value and old WQ touch timestamps. If IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs())
then old jiffies will always be before new 'ts + thresh'.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
---
 kernel/workqueue.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3231088afd73..a410d5827a73 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #include "workqueue_internal.h"
 
@@ -5387,6 +5388,7 @@ static void wq_watchdog_timer_fn(unsigned long data)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5401,6 +5403,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5419,12 +5427,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		}
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
-- 
2.30.2
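
Illustration only, not part of the patch: the effect of saving jiffies
on entry can be sketched in userspace. This is a rough model under
stated assumptions, not kernel code. fake_jiffies, simulate_vm_pause(),
stalled_live() and stalled_snapshot() are hypothetical names made up
for this sketch, and time_after_ul() is a stand-in that mirrors the
signed-subtraction idea behind the kernel's time_after().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tick counter standing in for jiffies (userspace only). */
static unsigned long fake_jiffies;

/* True if a is after b; signed subtraction tolerates counter wraparound. */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Model a VM pause: the counter jumps ahead while the "VCPU" is stopped. */
static void simulate_vm_pause(unsigned long ticks)
{
	fake_jiffies += ticks;
}

/* Pre-patch shape: the counter is re-read at the point of comparison. */
static bool stalled_live(unsigned long ts, unsigned long thresh,
			 unsigned long pause)
{
	simulate_vm_pause(pause);	/* pause lands inside the function */
	return time_after_ul(fake_jiffies, ts + thresh);
}

/* Post-patch shape: the counter is sampled once on entry and reused. */
static bool stalled_snapshot(unsigned long ts, unsigned long thresh,
			     unsigned long pause)
{
	unsigned long now = fake_jiffies;	/* saved before any pause */

	simulate_vm_pause(pause);
	return time_after_ul(now, ts + thresh);
}

/* Small self-check that walks through both variants. */
void wq_snapshot_demo(void)
{
	fake_jiffies = 1000;
	assert(stalled_live(900, 200, 5000));		/* false positive */

	fake_jiffies = 1000;
	assert(!stalled_snapshot(900, 200, 5000));	/* no false positive */

	fake_jiffies = 2000;
	assert(stalled_snapshot(900, 200, 0));		/* real stall still seen */

	printf("snapshot avoids the pause-induced false positive\n");
}
```

Calling wq_snapshot_demo() from a small main() exercises both variants.
The sketch only models the "save jiffies once" half of the change; the
kvm_check_and_clear_guest_paused() call added by the patch is not
modelled here.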