Received: by 2002:a05:6a10:a841:0:0:0:0 with SMTP id d1csp977367pxy; Thu, 22 Apr 2021 19:16:48 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwlaIzYpmav0qFzffhZK9xKIgolrzc4xTgu5NCBrRACce2yMFJeG1rVUJrMhqHwSRP26pTl X-Received: by 2002:a63:211c:: with SMTP id h28mr1559053pgh.278.1619144208318; Thu, 22 Apr 2021 19:16:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1619144208; cv=none; d=google.com; s=arc-20160816; b=tTB8QpXSiNXaG7St9ikrQrbhLJbooD+S4QviREnNdpH4L+MrgIH4VINYUpDMnhFM3C R7YQPRLyGqapvdcmaqi9R6uox6/RskhqNPRQWFgGe8W4SEmZoXZyFZJQ2oKHHFmFJSQf MhgoAYlFFj8ANHGsDveBPNs1nrO0Jc8q0ZvA7Wnhv8D9BI3ytxKcfvOvXHhjnAf5BsEh VG6JVBtNJhEhfb/avL9YyvR2G0ozqqGX1sRwCX4PxRaeJM7ejpmbeWnpx6xS/zlonfM+ zJ89p6uY5Wvk7TeRZpN/fVcrppCgyOZauYlO5J/Sf+lplDCmKggzQpTOYB+jlntd75ZT DpMQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:ironport-sdr:ironport-sdr; bh=KMO8d5+rjOEbDCAlHxgLP4rsgn6jRnwTCxaZ42GrNQ0=; b=ZSvCbWKrBAH7AeYfqYOkckwGD445+qFP/MfW6ZdnF/KZfV0HIJLUtFT+5CIXwgk9Sy GJcPcOrLvLvsNhZD/5KrmBO3TGdCOBN94CNgEHvzQ3MYjsRtuwyrSTNl2+eJzRTZSDLE isVhi7Ee1y0etr6cmGsdHnnY4+Mwzi4vRc6he5AQlc2csJdJEisnKmkgk8Pw7YwSWiEO pQGU3wEzWQ2nP6L1elOA+ViVvC80oPK5xi8reDXtqBo0xHcqcvXDg0XWa3BACSXurqZt Lq6L2bf8i0SFCQ7f68sNl4fZ3dyIjcmeYg7O05QMVS70DreP8EavB65R+FOagf0vpiB4 jHXA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m130si5159068pga.119.2021.04.22.19.16.35; Thu, 22 Apr 2021 19:16:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235901AbhDWCQi (ORCPT + 99 others); Thu, 22 Apr 2021 22:16:38 -0400 Received: from mga11.intel.com ([192.55.52.93]:57676 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230367AbhDWCQi (ORCPT ); Thu, 22 Apr 2021 22:16:38 -0400 IronPort-SDR: 9oFg7co7rUnv7fBoSuivU7pKNWEn2TaTKb3d914zp2oT/EP645GFnvX6ld07jISzOzZV4EyJuo pPjds5ki4KYQ== X-IronPort-AV: E=McAfee;i="6200,9189,9962"; a="192817646" X-IronPort-AV: E=Sophos;i="5.82,244,1613462400"; d="scan'208";a="192817646" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2021 19:16:02 -0700 IronPort-SDR: 3nIHzZZCasK7WPQ5/mr1vBaEn1/Pj/ss9BNYkuieE5VnGJP5QUZcS6P27zONsY9ISWfh3MAtbo JDaoBTkN3yPw== X-IronPort-AV: E=Sophos;i="5.82,244,1613462400"; d="scan'208";a="428205418" Received: from xingzhen-mobl.ccr.corp.intel.com (HELO [10.238.4.46]) ([10.238.4.46]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2021 19:15:58 -0700 Subject: Re: [LKP] Re: [clocksource] 6c52b5f3cf: stress-ng.opcode.ops_per_sec -14.4% regression To: paulmck@kernel.org, Feng Tang Cc: Thomas Gleixner , John Stultz , Stephen Boyd , Jonathan Corbet , Mark Rutland , Marc Zyngier , Andi Kleen , Chris Mason , LKML , lkp@lists.01.org, lkp@intel.com References: <20210420064934.GE31773@xsang-OptiPlex-9020> <20210420134331.GM975577@paulmck-ThinkPad-P17-Gen-1> <20210420140552.GA3158164@paulmck-ThinkPad-P17-Gen-1> <04f4752e-6c5a-8439-fe75-6363d212c7b2@intel.com> <20210421134224.GR975577@paulmck-ThinkPad-P17-Gen-1> <20210422074126.GA85095@shbuild999.sh.intel.com> <20210422142454.GD975577@paulmck-ThinkPad-P17-Gen-1> From: Xing Zhengjun Message-ID: Date: Fri, 23 Apr 2021 10:15:56 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: <20210422142454.GD975577@paulmck-ThinkPad-P17-Gen-1> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/22/2021 10:24 PM, Paul E. McKenney wrote: > On Thu, Apr 22, 2021 at 03:41:26PM +0800, Feng Tang wrote: >> Hi Paul, >> >> On Thu, Apr 22, 2021 at 02:58:27PM +0800, Xing Zhengjun wrote: >>> >>> >>> On 4/21/2021 9:42 PM, Paul E. McKenney wrote: >>>> On Wed, Apr 21, 2021 at 02:07:19PM +0800, Xing, Zhengjun wrote: >>>>> >>>>> On 4/20/2021 10:05 PM, Paul E. McKenney wrote: >>>>>> On Tue, Apr 20, 2021 at 06:43:31AM -0700, Paul E. McKenney wrote: >>>>>>> On Tue, Apr 20, 2021 at 02:49:34PM +0800, kernel test robot wrote: >>>>>>>> Greeting, >>>>>>>> >>>>>>>> FYI, we noticed a -14.4% regression of stress-ng.opcode.ops_per_sec due to commit: >>>>>>>> >>>>>>>> >>>>>>>> commit: 6c52b5f3cfefd6e429efc4413fd25e3c394e959f ("clocksource: Reduce WATCHDOG_THRESHOLD") >>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.04.13a >>>>>>>> >>>>>>>> >>>>>>>> in testcase: stress-ng >>>>>>>> on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory >>>>>>>> with following parameters: >>>>>>>> >>>>>>>> nr_threads: 10% >>>>>>>> disk: 1HDD >>>>>>>> testtime: 60s >>>>>>>> fs: ext4 >>>>>>>> class: os >>>>>>>> test: opcode >>>>>>>> cpufreq_governor: performance >>>>>>>> ucode: 0x5003006 >>>>>>> Hmmm... I will try a less-aggressive reduction. Thank you for testing! >>>>>> But wait... This code is only running twice per second. It is very >>>>>> hard to believe that a clock-read retry twice per second is worth 2% of >>>>>> performance, let alone 14.4%. >>>>>> >>>>>> Is something else perhaps going on here? >>>>>> >>>>>> For example, did this run enable any of the new diagnositic clocksource.* >>>>>> kernel parameters? >>>>>> >>>>>> Thanx, Paul >>>>> I attached the kernel log, the following logs are related with the >>>>> clocksource. >>>>> [    3.453206] clocksource: timekeeping watchdog on CPU1: Marking >>>>> clocksource 'tsc-early' as unstable because the skew is too large: >>>>> [    3.455197] clocksource:                       'hpet' wd_now: 288fcc0 >>>>> wd_last: 1a8b333 mask: ffffffff >>>>> [    3.455199] clocksource:                       'tsc-early' cs_now: >>>>> 1def309ebfdee cs_last: 1def2bd70d92c mask: ffffffffffffffff >>>>> [    3.455201] clocksource:                       No current clocksource. >>>>> [    3.457197] tsc: Marking TSC unstable due to clocksource watchdog >>>>> >>>>> 6c52b5f3cf reduced WATCHDOG_THRESHOLD, then in clocksource_watchdog, the >>>>> warning logs are print, the TSC is marked as unstable. >>>>> /* Check the deviation from the watchdog clocksource. */ >>>> Aha, so this system really does have an unstable TSC! Which means that >>>> the patch is operating as designed. >>>> >>>> Or are you saying that this is a false positive? >>>> >>>> Thanx, Paul >>> >>> It happened during boot and before TSC calibration >>> (tsc_refine_calibration_work()), so on some machines "abs(cs_nsec - wd_nsec) >>>> WATCHDOG_THRESHOLD", WATCHDOG_THRESHOLD is set too small at that time. >>> After TSC calibrated, abs(cs_nsec - wd_nsec) should be very small, >>> WATCHDOG_THRESHOLD for here is ok. So I suggest increasing the >>> WATCHDOG_THRESHOLD before TSC calibration, for example, the clocks be skewed >>> by more than 1% to be marked unstable. > > This is common code, so we do need an architecture-independent way to > handle this. > >> As Zhengjun measuered, this is a Cascade Lake platform, and it has 2 >> times calibration of tsc, the first one of early quick calibration gives >> 2100 MHz, while the later accurate calibration gives 2095 MHz, so there >> is about 2.5/1000 deviation for the first number, which just exceeds the >> 1/1000 threshold you set :) > > Even my 2/1000 initial try would have caused this, then. ;-) > > But even 1/1000 deviation would cause any number of applications some > severe heartburn, so I am not at all happy with the thought of globally > increasing to (say) 3/1000. > >> Following is the tsc freq info from kernel log >> >> [ 0.000000] DMI: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 >> [ 0.000000] tsc: Detected 2100.000 MHz processor >> ... >> [ 13.859982] tsc: Refined TSC clocksource calibration: 2095.077 MHz > > So what are our options? > > 1. Clear CLOCK_SOURCE_MUST_VERIFY from tsc-early. > > 2. #1, but add tsc-early into the watchdog list and set > CLOCK_SOURCE_MUST_VERIFY once it is better calibrated. > > 3. Add a field to struct clocksource that, if non-zero, gives > the maximum drift in nanoseconds per half second (AKA > WATCHDOG_INTERVAL). If zero, the WATCHDOG_MAX_SKEW value > is used. Set this to (say) 150,000ns for tsc-early. > > 4. As noted earlier, increase WATCHDOG_MAX_SKEW to 150 microseconds, > which again is not a good approach given the real-world needs > of real-world applications. > > 5. Your ideas here. How about set two watchdog thresholds, one for before calibration(1/100), the other for after calibration(1/1000)? For example, u64 watchdog_thresholds[2]. > > All in all, I am glad that I made the patch that decreases > WATCHDOG_MAX_SKEW be separate and at the end of the series. ;-) > > Thanx, Paul > -- Zhengjun Xing