Date: Sat, 6 Jan 2024 04:04:15 -0800
From: "Paul E. McKenney"
To: Feng Tang
Cc: Jiri Wiesner, linux-kernel@vger.kernel.org, John Stultz,
    Thomas Gleixner, Stephen Boyd, rui.zhang@intel.com
Subject: Re: [PATCH] clocksource: Skip watchdog check for large watchdog intervals
Message-ID: <14611f96-af33-456d-9a39-49970fd60ee8@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <20240103112113.GA6108@incl>
    <5b8fd9ba-1622-4ec7-b3cc-2db3a78122f1@paulmck-laptop>
    <20240104163050.GC3303@incl>

On Sat, Jan 06, 2024 at 10:55:09AM +0800, Feng Tang wrote:
> On Thu, Jan 04, 2024 at 11:19:56AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 04, 2024 at 05:30:50PM +0100, Jiri Wiesner wrote:
> > > On Wed, Jan 03, 2024 at 02:08:08PM -0800, Paul E. McKenney wrote:
> > > > I believe that there were concerns about a similar approach in the case
> > > > where the jiffies counter is the clocksource
> > >
> > > I ran a few simple tests on a 2 NUMA node Intel machine and found nothing
> > > so far. I tried booting with clocksource=jiffies and I changed the
> > > "nr_online_nodes <= 4" check in tsc_clocksource_as_watchdog() to enable
> > > the watchdog on my machine.
> > > I have a debugging module that monitors clocksource and watchdog
> > > reads in clocksource_watchdog() with kprobes. I see the cs/wd reads
> > > executed roughly every 0.5 second, as expected. When the machine is
> > > idle the average watchdog interval is 501.61 milliseconds
> > > (+-15.57 ms, with a minimum of 477.07 ms and a maximum of 517.93 ms).
> > > The result is similar when the CPUs of the machine are fully saturated
> > > with netperf processes. I also tried booting with clocksource=jiffies
> > > and tsc=watchdog. The watchdog interval was similar to the previous
> > > test.
> > >
> > > AFAIK, the jiffies clocksource does get checked by the watchdog itself.
> > > And with that, I have run out of ideas.
> >
> > If I recall correctly (ha!), the concern was that with the jiffies as
> > clocksource, we would be using jiffies (via timers) to check jiffies
> > (the clocksource), and that this could cause issues if the jiffies got
> > behind, then suddenly updated while the clocksource watchdog was running.
>
> Yes, we also met a problem when 'jiffies' was used as clocksource/watchdog,
> but don't know if it's the same problem you mentioned. Our problem
> ('jiffies' as watchdog marks clocksource TSC as unstable) only happens
> in the early boot phase with serial earlyprintk enabled: the updating
> of 'jiffies' relies on the HW timer's periodic interrupt, but early printk
> disables interrupts during printing, which causes some timer interrupts
> to be lost and hence a big lag in 'jiffies'. Rui once proposed a patch to
> prevent 'jiffies' from being a watchdog due to its unreliability [1].
>
> And I think skipping the watchdog check one time when detecting some
> abnormal condition won't hurt the overall check much.

Works for me!

							Thanx, Paul

> [1]. https://lore.kernel.org/lkml/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
>
> Thanks,
> Feng
>
> > Thoughts?
> >
> > 							Thanx, Paul
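
The idea discussed in this thread, skipping a single watchdog round when
the measured interval looks abnormal, can be sketched as a small user-space
model. This is not the kernel code or the patch itself: the function name
check_round(), the 2x interval cutoff, and the skew threshold are all
assumptions chosen for illustration. Only the roughly 0.5 s nominal interval
comes from the measurements quoted in the thread.

```python
NSEC_PER_MSEC = 1_000_000

# Nominal watchdog period: about 0.5 s, as measured in the thread.
WATCHDOG_INTERVAL_NS = 500 * NSEC_PER_MSEC

# ASSUMPTION: treat anything beyond twice the nominal period as abnormal.
MAX_INTERVAL_NS = 2 * WATCHDOG_INTERVAL_NS

# ASSUMPTION: hypothetical skew limit between clocksource and watchdog.
MAX_SKEW_NS = 100_000


def check_round(cs_delta_ns: int, wd_delta_ns: int) -> str:
    """Classify one watchdog round as 'skip', 'unstable', or 'ok'.

    cs_delta_ns: time elapsed according to the clocksource under test.
    wd_delta_ns: time elapsed according to the watchdog clocksource.
    """
    # If either side reports an implausibly long interval, the check was
    # probably delayed (or one counter lagged, e.g. lost timer interrupts),
    # so this round carries no reliable information and is skipped.
    if cs_delta_ns > MAX_INTERVAL_NS or wd_delta_ns > MAX_INTERVAL_NS:
        return "skip"

    # Within a normal interval, a large disagreement is a real skew.
    if abs(cs_delta_ns - wd_delta_ns) > MAX_SKEW_NS:
        return "unstable"

    return "ok"
```

In this model a round such as check_round(1_800_000_000, 500_000_000),
where one side saw far more time pass than the nominal period, is skipped
rather than counted against the clocksource. Because only that one round is
dropped, later rounds with normal intervals are still compared as usual.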