From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Chris Mason, "Paul E.
McKenney", Thomas Gleixner, Feng Tang, Sasha Levin
Subject: [PATCH 5.4 130/348] clocksource: Retry clock read if long delays detected
Date: Mon, 12 Jul 2021 08:08:34 +0200
Message-Id: <20210712060718.571489048@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210712060659.886176320@linuxfoundation.org>
References: <20210712060659.886176320@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Paul E. McKenney

[ Upstream commit db3a34e17433de2390eb80d436970edcebd0ca3e ]

When the clocksource watchdog marks a clock as unstable, this might be
due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks. Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code, ranging from SMI handlers
to vCPU preemption. It would be good to have some indication as to why
the clock was marked unstable.

Therefore, re-read the watchdog clock on either side of the read from
the clock under test. If the watchdog clock shows an excessive time
delta between its pair of reads, the reads are retried.

The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_cswd_read_retries, which defaults to three,
that is, up to four reads, one initial and up to three retries. If more
than one retry was required, a message is printed on the console (the
occasional single retry is expected behavior, especially in guest OSes).
If the maximum number of retries is exceeded, the clock under test will
be marked unstable. However, the probability of this happening due to
various sorts of delays is quite small. In addition, the reason
(clock-read delays) for the unstable marking will be apparent.

Reported-by: Chris Mason
Signed-off-by: Paul E. McKenney
Signed-off-by: Thomas Gleixner
Acked-by: Feng Tang
Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
Signed-off-by: Sasha Levin
---
 .../admin-guide/kernel-parameters.txt         |  6 +++
 kernel/time/clocksource.c                     | 53 ++++++++++++++++---
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a19ae163c058..dbb68067ba4e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -567,6 +567,12 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.max_cswd_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable. Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 428beb69426a..6863a054c970 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,13 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
+/*
+ * Maximum permissible delay between two readouts of the watchdog
+ * clocksource surrounding a read of the clocksource being validated.
+ * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
+ */
+#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
+
 static void clocksource_watchdog_work(struct work_struct *work)
 {
 	/*
@@ -184,12 +191,45 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static ulong max_cswd_read_retries = 3;
+module_param(max_cswd_read_retries, ulong, 0644);
+
+static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
+{
+	unsigned int nretries;
+	u64 wd_end, wd_delta;
+	int64_t wd_delay;
+
+	for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
+		local_irq_disable();
+		*wdnow = watchdog->read(watchdog);
+		*csnow = cs->read(cs);
+		wd_end = watchdog->read(watchdog);
+		local_irq_enable();
+
+		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
+		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult,
+					      watchdog->shift);
+		if (wd_delay <= WATCHDOG_MAX_SKEW) {
+			if (nretries > 1 || nretries >= max_cswd_read_retries) {
+				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
+					smp_processor_id(), watchdog->name, nretries);
+			}
+			return true;
+		}
+	}
+
+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
+		smp_processor_id(), watchdog->name, wd_delay, nretries);
+	return false;
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
-	struct clocksource *cs;
 	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
+	int64_t wd_nsec, cs_nsec;
+	struct clocksource *cs;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -206,10 +246,11 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
-		local_irq_disable();
-		csnow = cs->read(cs);
-		wdnow = watchdog->read(watchdog);
-		local_irq_enable();
+		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
+			/* Clock readout unreliable, so give it up. */
+			__clocksource_unstable(cs);
+			continue;
+		}
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.30.2
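
For readers who want to experiment with the retry scheme outside the
kernel, here is a minimal userspace sketch of the bracketed-read loop
described in the commit message. It is not the kernel code. The names
bracketed_read, struct read_result, struct fake_clock, fake_wd_read,
fake_cs_read and MAX_SKEW_NS are hypothetical and exist only for this
illustration. The clocks are simulated and return nanoseconds directly,
so there is no cycles-to-nanoseconds conversion or counter mask, and
interrupts are not disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors WATCHDOG_MAX_SKEW from the patch (100 us, in ns). */
#define MAX_SKEW_NS (100 * 1000LL)

/* Hypothetical clock callback that returns a timestamp in nanoseconds. */
typedef uint64_t (*read_ns_fn)(void *ctx);

/* Hypothetical result record; the kernel function returns only a bool. */
struct read_result {
	bool ok;               /* true if some attempt stayed within the limit */
	unsigned int nretries; /* retries used; max_retries + 1 on failure */
	uint64_t cs_now;       /* last reading of the clock under test */
	uint64_t wd_now;       /* last first-of-pair watchdog reading */
	int64_t wd_delay;      /* last gap between the two watchdog reads */
};

/*
 * Read the watchdog clock on either side of the clock under test and
 * retry while the gap between the two watchdog reads exceeds the limit.
 */
static struct read_result bracketed_read(read_ns_fn read_wd, read_ns_fn read_cs,
					 void *ctx, unsigned int max_retries)
{
	struct read_result r = { 0 };
	unsigned int n;

	assert(read_wd != NULL && read_cs != NULL);

	for (n = 0; n <= max_retries; n++) {
		uint64_t wd_end;

		r.wd_now = read_wd(ctx);
		r.cs_now = read_cs(ctx);
		wd_end = read_wd(ctx);

		r.wd_delay = (int64_t)(wd_end - r.wd_now);
		r.nretries = n;
		if (r.wd_delay <= MAX_SKEW_NS) {
			r.ok = true;
			return r;
		}
	}

	r.nretries = n; /* every attempt was delayed too long */
	return r;
}

/* Simulated time source shared by both fake clocks. */
struct fake_clock {
	uint64_t now_ns;          /* current simulated time */
	unsigned int stalls_left; /* upcoming reads hit by a 500 us stall */
};

/* Fake watchdog read: costs 50 ns of simulated time. */
static uint64_t fake_wd_read(void *ctx)
{
	struct fake_clock *fc = ctx;

	fc->now_ns += 50;
	return fc->now_ns;
}

/* Fake clock-under-test read: stalls for 500 us while stalls_left > 0. */
static uint64_t fake_cs_read(void *ctx)
{
	struct fake_clock *fc = ctx;

	if (fc->stalls_left > 0) {
		fc->stalls_left--;
		fc->now_ns += 500000;
	} else {
		fc->now_ns += 50;
	}
	return fc->now_ns;
}
```

With a struct fake_clock whose stalls_left is 2 and max_retries set to
3, the first two attempts see a watchdog gap above the limit and the
third succeeds, so the result has ok set and nretries equal to 2. With
more stalls than attempts, ok stays false and nretries ends up at
max_retries + 1, which corresponds to the case where the patch marks
the clock under test unstable.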