From: Harry Pan
To: LKML
Cc: gs0622@gmail.com, Harry Pan, Stephen Boyd, Thomas Gleixner, John Stultz
Subject: [PATCH v2] clocksource: Untrust the clocksource watchdog when its interval is too small
Date: Sat, 18 May 2019 22:10:05 +0800
Message-Id: <20190518141005.1132-1-harry.pan@intel.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190516090651.1396-1-harry.pan@intel.com>
References: <20190516090651.1396-1-harry.pan@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds a sanity check on the interval measured by the clocksource watchdog, aiming to reduce false alarms that incorrectly mark the current clocksource unstable when the two disagree.

If there is a discrepancy between the current clocksource and the watchdog, validate the watchdog's interval first: if it is too small compared with the expected timer interval, keep trusting the current clocksource.

The problem was identified on some Coffee Lake platforms with PC10 allowed. When the CPU enters and exits PC10 (the residency counter increases), the HPET timestamps are delayed. The resulting discrepancy makes the kernel incorrectly distrust the current clocksource (TSC in this case) and re-select the next clocksource, which is the problematic HPET; this eventually causes a user-perceptible wall clock delay.

The HPET timestamp delay should be tackled in firmware, so that the timer offload between XTAL and RTC is handled properly when the platform enters PC10. This patch is a mitigation that reduces false "clocksource unstable" alarms regardless of which clocksources are paired.

v2: fix resource leak: the locked watchdog_lock

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Signed-off-by: Harry Pan
---
 kernel/time/clocksource.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3bcc19ceb073..090d937d5ec4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -96,6 +96,7 @@ static u64 suspend_start;
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
 static void clocksource_select(void);
+static void clocksource_dequeue_watchdog(struct clocksource *cs);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -236,6 +237,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check the deviation from the watchdog clocksource. */
 		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+			if (wd_nsec < jiffies_to_nsecs(WATCHDOG_INTERVAL) - WATCHDOG_THRESHOLD) {
+				pr_err("Stop timekeeping watchdog '%s' because expected interval is too small in %lld ns only\n",
+					watchdog->name, wd_nsec);
+				clocksource_dequeue_watchdog(cs);
+				goto out;
+			}
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
 			pr_warn(" '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-- 
2.20.1
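
For illustration only, the decision this patch adds to clocksource_watchdog() can be sketched in user space as below. This is not kernel code: classify_interval() and enum wd_verdict are hypothetical names, and the two constants are assumptions that mirror what kernel/time/clocksource.c is believed to define for this kernel version (WATCHDOG_INTERVAL of HZ >> 1, i.e. 0.5 s, and WATCHDOG_THRESHOLD of NSEC_PER_SEC >> 4, i.e. 62.5 ms).

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed values, expressed directly in nanoseconds (see lead-in). */
#define NSEC_PER_SEC          1000000000LL
#define WATCHDOG_INTERVAL_NS  (NSEC_PER_SEC >> 1)  /* 500 ms between checks */
#define WATCHDOG_THRESHOLD_NS (NSEC_PER_SEC >> 4)  /* 62.5 ms allowed skew  */

/* Hypothetical verdicts for one watchdog pass over one clocksource. */
enum wd_verdict {
	WD_OK,                /* skew within threshold, nothing to do        */
	WD_DISTRUST_WATCHDOG, /* watchdog interval too small: new early exit */
	WD_MARK_UNSTABLE      /* pre-existing path: clocksource is blamed    */
};

/*
 * cs_nsec: elapsed time according to the clocksource under test.
 * wd_nsec: elapsed time according to the watchdog clocksource.
 */
enum wd_verdict classify_interval(int64_t cs_nsec, int64_t wd_nsec)
{
	if (llabs((long long)(cs_nsec - wd_nsec)) <= WATCHDOG_THRESHOLD_NS)
		return WD_OK;

	/* The watchdog reports far less time than one timer period. */
	if (wd_nsec < WATCHDOG_INTERVAL_NS - WATCHDOG_THRESHOLD_NS)
		return WD_DISTRUST_WATCHDOG;

	return WD_MARK_UNSTABLE;
}
```

Under these assumed constants the lower bound is 500 ms - 62.5 ms = 437.5 ms, so a pass where the TSC reports about 500 ms but the HPET reports 300 ms would take the new early-exit path instead of marking the TSC unstable.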