From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sergey Senozhatsky, Tejun Heo, Sasha Levin
Subject: [PATCH 4.14 09/49] wq: handle VM suspension in stall detection
Date: Mon, 14 Jun 2021 12:27:02 +0200
Message-Id: <20210614102642.177540897@linuxfoundation.org>
In-Reply-To: <20210614102641.857724541@linuxfoundation.org>
References: <20210614102641.857724541@linuxfoundation.org>

From: Sergey Senozhatsky

[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
once this VCPU resumes it will see the new jiffies value, while it
may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
There is a small chance of misreported WQ stalls in the meantime,
because new jiffies is time_after() old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If VM gets suspended then we continue using
"old" jiffies value and old WQ touch timestamps. If IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs())
then old jiffies will always be before new 'ts + thresh'.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
---
 kernel/workqueue.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc32ed4a4cf3..58e7eefe4dbf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #include "workqueue_internal.h"
 
@@ -5465,6 +5466,7 @@ static void wq_watchdog_timer_fn(unsigned long data)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5479,6 +5481,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5497,12 +5505,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		}
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
-- 
2.30.2
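
For readers who want to see the race in isolation, here is a rough userspace sketch of why comparing against a snapshot avoids the false report. It is not kernel code. The names after(), fake_jiffies, stalled_live(), stalled_snapshot() and simulate_suspend() are hypothetical, and so are the numbers (HZ=250, a 30 second threshold). after() only mimics the wrap-safe comparison that time_after() performs.

```c
#include <stdbool.h>

/* Wrap-safe "a is after b", modeled on the kernel's time_after(). */
static bool after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the global jiffies counter. */
static unsigned long fake_jiffies;

/* Pre-patch behaviour: the counter is re-read at comparison time. */
static bool stalled_live(unsigned long ts, unsigned long thresh)
{
	return after(fake_jiffies, ts + thresh);
}

/* Post-patch behaviour: compare against a value saved at function entry. */
static bool stalled_snapshot(unsigned long now, unsigned long ts,
			     unsigned long thresh)
{
	return after(now, ts + thresh);
}

struct outcome {
	bool live;	/* verdict when reading the live counter */
	bool snapshot;	/* verdict when using the saved value */
};

/*
 * Simulate a VM suspend that lands between function entry and the
 * stall check. All values are made up for illustration.
 */
static struct outcome simulate_suspend(unsigned long suspended_for)
{
	const unsigned long thresh = 30 * 250;	/* assumed: 30 s at HZ=250 */
	const unsigned long ts = 1000;		/* last recorded progress */
	struct outcome out;
	unsigned long now;

	fake_jiffies = 2000;
	now = fake_jiffies;		/* snapshot, as in the patch */
	fake_jiffies += suspended_for;	/* counter jumps on resume */

	out.live = stalled_live(ts, thresh);
	out.snapshot = stalled_snapshot(now, ts, thresh);
	return out;
}
```

With a long simulated suspend, say simulate_suspend(100000), the live read reports a stall while the snapshot does not. With no suspend at all, both agree that nothing stalled.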