From: "Paul E. McKenney"
To: tglx@linutronix.de
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org,
    corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com,
    neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com,
    zhengjun.xing@intel.com, Waiman Long, John Stultz, "Paul E . McKenney"
Subject: [PATCH clocksource 5/6] clocksource: Suspend the watchdog temporarily when high read latency detected
Date: Wed, 4 Jan 2023 17:07:00 -0800
Message-Id: <20230105010701.1773895-5-paulmck@kernel.org>
In-Reply-To: <20230105010429.GA1773522@paulmck-ThinkPad-P17-Gen-1>
References: <20230105010429.GA1773522@paulmck-ThinkPad-P17-Gen-1>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Feng Tang

Bugs have been reported on 8-socket x86 machines in which the TSC was
wrongly disabled when the system was under heavy load.

[ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
[ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
[ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
[ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
[ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
[ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
[ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
[ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
[ 821.067990] clocksource: Switched to clocksource hpet

This can be reproduced by running memory-intensive 'stream' tests, or
some of the stress-ng subcases such as 'ioport'.

The reason for these issues is that when the system is under heavy load,
the read latency of the clocksources can be very high. Even lightweight
TSC reads can show high latencies, and latencies are much worse for
external clocksources such as HPET or the ACPI PM timer. These latencies
can result in false-positive clocksource-unstable determinations.

Given that the clocksource watchdog is a continual diagnostic check that
runs twice a second, there is no need to rush it when the system is under
heavy load. Therefore, when high clocksource read latencies are detected,
suspend the watchdog timer for 5 minutes.

Signed-off-by: Feng Tang
Acked-by: Waiman Long
Cc: John Stultz
Cc: Thomas Gleixner
Cc: Stephen Boyd
Cc: Feng Tang
Signed-off-by: Paul E. McKenney
---
 kernel/time/clocksource.c | 45 ++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index fc486cd972635..91836b727cef5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -387,6 +387,15 @@ void clocksource_verify_percpu(struct clocksource *cs)
 }
 EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
 
+static inline void clocksource_reset_watchdog(void)
+{
+	struct clocksource *cs;
+
+	list_for_each_entry(cs, &watchdog_list, wd_list)
+		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
+}
+
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	u64 csnow, wdnow, cslast, wdlast, delta;
@@ -394,6 +403,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	int64_t wd_nsec, cs_nsec;
 	struct clocksource *cs;
 	enum wd_read_status read_ret;
+	unsigned long extra_wait = 0;
 	u32 md;
 
 	spin_lock(&watchdog_lock);
@@ -413,13 +423,30 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
 
-		if (read_ret != WD_READ_SUCCESS) {
-			if (read_ret == WD_READ_UNSTABLE)
-				/* Clock readout unreliable, so give it up. */
-				__clocksource_unstable(cs);
+		if (read_ret == WD_READ_UNSTABLE) {
+			/* Clock readout unreliable, so give it up. */
+			__clocksource_unstable(cs);
 			continue;
 		}
 
+		/*
+		 * When WD_READ_SKIP is returned, it means the system is likely
+		 * under very heavy load, where the latency of reading
+		 * watchdog/clocksource is very big, and affect the accuracy of
+		 * watchdog check. So give system some space and suspend the
+		 * watchdog check for 5 minutes.
+		 */
+		if (read_ret == WD_READ_SKIP) {
+			/*
+			 * As the watchdog timer will be suspended, and
+			 * cs->last could keep unchanged for 5 minutes, reset
+			 * the counters.
+			 */
+			clocksource_reset_watchdog();
+			extra_wait = HZ * 300;
+			break;
+		}
+
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
 		    atomic_read(&watchdog_reset_pending)) {
@@ -523,7 +550,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	 * pair clocksource_stop_watchdog() clocksource_start_watchdog().
 	 */
 	if (!timer_pending(&watchdog_timer)) {
-		watchdog_timer.expires += WATCHDOG_INTERVAL;
+		watchdog_timer.expires += WATCHDOG_INTERVAL + extra_wait;
 		add_timer_on(&watchdog_timer, next_cpu);
 	}
 out:
@@ -548,14 +575,6 @@ static inline void clocksource_stop_watchdog(void)
 	watchdog_running = 0;
 }
 
-static inline void clocksource_reset_watchdog(void)
-{
-	struct clocksource *cs;
-
-	list_for_each_entry(cs, &watchdog_list, wd_list)
-		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
-}
-
 static void clocksource_resume_watchdog(void)
 {
 	atomic_inc(&watchdog_reset_pending);
-- 
2.31.1.189.g2e36527f23
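
For readers following the timer arithmetic, the re-arm rule in the patch can be approximated in userspace C. This is only a minimal sketch, not the kernel code: SIM_HZ is an assumed tick rate of 250 (the kernel's HZ is a build-time setting), every sim_/SIM_ name is a hypothetical stand-in, and the half-second base interval is inferred from the commit message's "twice a second". It shows that a skipped read pushes the next expiry out by the normal interval plus 300 seconds of ticks, while the other outcomes keep the normal interval.

```c
#include <assert.h>

/* Assumed tick rate for illustration only; not taken from the patch. */
#define SIM_HZ                 250UL
/* Half a second of ticks, matching "twice a second" in the commit message. */
#define SIM_WATCHDOG_INTERVAL  (SIM_HZ / 2)
/* Five minutes of ticks, mirroring "extra_wait = HZ * 300" in the patch. */
#define SIM_SUSPEND_TICKS      (SIM_HZ * 300)

/* Hypothetical stand-in for the kernel's enum wd_read_status. */
enum sim_read_status {
	SIM_READ_SUCCESS,
	SIM_READ_UNSTABLE,
	SIM_READ_SKIP,
};

/*
 * Compute the next timer expiry (in ticks) from the current one.
 * Only a skipped read adds the five-minute delay; success and
 * unstable results re-arm at the normal interval.
 */
static unsigned long sim_next_expiry(unsigned long expires,
				     enum sim_read_status status)
{
	unsigned long extra_wait = 0;

	assert(status >= SIM_READ_SUCCESS && status <= SIM_READ_SKIP);

	if (status == SIM_READ_SKIP)
		extra_wait = SIM_SUSPEND_TICKS;

	return expires + SIM_WATCHDOG_INTERVAL + extra_wait;
}
```

With the assumed SIM_HZ of 250, a skipped read moves the expiry by 125 + 75000 ticks. The sketch leaves out the flag reset done by clocksource_reset_watchdog() and the early exit from the per-clocksource loop.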