Date: Wed, 19 May 2021 14:09:02 +0800
From: Feng Tang
To: "Paul E.
 McKenney"
Cc: kernel test robot, 0day robot, Thomas Gleixner, John Stultz,
 Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier, Andi Kleen,
 Xing Zhengjun, LKML, lkp@lists.01.org, ying.huang@intel.com,
 zhengjun.xing@intel.com, kernel-team@fb.com, neeraju@codeaurora.org,
 rui.zhang@intel.com
Subject: Re: [clocksource] 388450c708: netperf.Throughput_tps -65.1% regression
Message-ID: <20210519060902.GE78241@shbuild999.sh.intel.com>
References: <20210501003247.2448287-4-paulmck@kernel.org>
 <20210513155515.GB23902@xsang-OptiPlex-9020>
 <20210513170707.GA975577@paulmck-ThinkPad-P17-Gen-1>
 <20210514074314.GB5384@shbuild999.sh.intel.com>
 <20210514174908.GI975577@paulmck-ThinkPad-P17-Gen-1>
 <20210516063419.GA22111@shbuild999.sh.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210516063419.GA22111@shbuild999.sh.intel.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 16, 2021 at 02:34:19PM +0800, Feng Tang wrote:
> On Fri, May 14, 2021 at 10:49:08AM -0700, Paul E. McKenney wrote:
> > On Fri, May 14, 2021 at 03:43:14PM +0800, Feng Tang wrote:
> > > Hi Paul,
> > >
> > > On Thu, May 13, 2021 at 10:07:07AM -0700, Paul E.
McKenney wrote:
> > > > On Thu, May 13, 2021 at 11:55:15PM +0800, kernel test robot wrote:
> > > > >
> > > > >
> > > > > Greeting,
> > > > >
> > > > > FYI, we noticed a -65.1% regression of netperf.Throughput_tps due to commit:
> > > > >
> > > > >
> > > > > commit: 388450c7081ded73432e2b7148c1bb9a0b039963 ("[PATCH v12 clocksource 4/5] clocksource: Reduce clocksource-skew threshold for TSC")
> > > > > url: https://github.com/0day-ci/linux/commits/Paul-E-McKenney/Do-not-mark-clocks-unstable-due-to-delays-for-v5-13/20210501-083404
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2d036dfa5f10df9782f5278fc591d79d283c1fad
> > > > >
> > > > > in testcase: netperf
> > > > > on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> > > > > with following parameters:
> > > > >
> > > > > 	ip: ipv4
> > > > > 	runtime: 300s
> > > > > 	nr_threads: 25%
> > > > > 	cluster: cs-localhost
> > > > > 	test: UDP_RR
> > > > > 	cpufreq_governor: performance
> > > > > 	ucode: 0xb000280
> > > > >
> > > > > test-description: Netperf is a benchmark that can be use to measure various aspect of networking performance.
> > > > > test-url: http://www.netperf.org/netperf/
> > > > >
> > > > >
> > > > >
> > > > > If you fix the issue, kindly add following tag
> > > > > Reported-by: kernel test robot
> > > > >
> > > > >
> > > > > also as Feng Tang checked, this is a "unstable clocksource" case.
> > > > > attached dmesg FYI.
> > > >
> > > > Agreed, given the clock-skew event and the resulting switch to HPET,
> > > > performance regressions are expected behavior.
> > > >
> > > > That dmesg output does demonstrate the value of Feng Tang's patch!
> > > >
> > > > I don't see how to obtain the values of ->mult and ->shift that would
> > > > allow me to compute the delta.
So if you don't tell me otherwise, I
> > > > will assume that the skew itself was expected on this hardware, perhaps
> > > > somehow due to the tpm_tis_status warning immediately preceding the
> > > > clock-skew event. If my assumption is incorrect, please let me know.
> > >
> > > I run the case with the debug patch applied, the info is:
> > >
> > > [ 13.796429] clocksource: timekeeping watchdog on CPU19: Marking clocksource 'tsc' as unstable because the skew is too large:
> > > [ 13.797413] clocksource: 'hpet' wd_nesc: 505192062 wd_now: 10657158 wd_last: fac6f97 mask: ffffffff
> > > [ 13.797413] clocksource: 'tsc' cs_nsec: 504008008 cs_now: 3445570292aa5 cs_last: 344551f0cad6f mask: ffffffffffffffff
> > > [ 13.797413] clocksource: 'tsc' is current clocksource.
> > > [ 13.797413] tsc: Marking TSC unstable due to clocksource watchdog
> > > [ 13.844513] clocksource: Checking clocksource tsc synchronization from CPU 50 to CPUs 0-1,12,22,32-33,60,65.
> > > [ 13.855080] clocksource: Switched to clocksource hpet
> > >
> > > So the delta is 1184 us (505192062 - 504008008), and I agree with
> > > you that it should be related with the tpm_tis_status warning stuff.
> > >
> > > But this re-trigger my old concerns, that if the margins calculated
> > > for tsc, hpet are too small?
> >
> > If the error really did disturb either tsc or hpet, then we really
> > do not have a false positive, and nothing should change (aside from
> > perhaps documenting that TPM issues can disturb the clocks, or better
> > yet treating that perturbation as a separate bug that should be fixed).
> > But if this is yet another way to get a confused measurement, then it
> > would be better to work out a way to reject the confusion and keep the
> > tighter margins. I cannot think right off of a way that this could
> > cause measurement confusion, but you never know.
>
> I have no doubt in the correctness of the measuring method, but was
> just afraid some platforms which use to 'just work' will be caught :)
>
> > So any thoughts on exactly how the tpm_tis_status warning might have
> > resulted in the skew?
>
> The tpm error message has been reported before, and from google there
> were some similar errors, we'll do some further check.

An update on this: further debugging shows the problem is not related
to the TPM module, as the 'unstable' marking still happens even with
the TPM module disabled in the kernel.

We ran this case on another test box of the same type but with the
latest BIOS and microcode. There the TSC frequency is calculated
correctly and the 'unstable' error cannot be reproduced.

We will check how to upgrade the test box in 0day.

Thanks,
Feng
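
P.S. For anyone who wants to re-check the 1184 us figure quoted above,
here is a minimal Python sketch. It only reuses the numbers from the
debug output; the helper names (masked_delta, cyc2ns) are made up for
illustration and are not kernel functions. The cyc2ns helper follows
the usual (cycles * mult) >> shift conversion, but the real ->mult and
->shift values for this box are not in the thread, so it is left
uncalled here.

```python
def masked_delta(now: int, last: int, mask: int) -> int:
    """Counter delta modulo the counter width, so a wraparound is handled."""
    return (now - last) & mask


def cyc2ns(cycles: int, mult: int, shift: int) -> int:
    """Cycles to nanoseconds via the (cycles * mult) >> shift scheme.

    mult and shift are per-clocksource; their values for this machine
    are unknown here, so any concrete values would be an assumption.
    """
    return (cycles * mult) >> shift


# Values copied from the quoted debug output.
WD_NSEC = 505192062  # 'hpet' interval as seen by the watchdog, in ns
CS_NSEC = 504008008  # 'tsc' interval for the same window, in ns

wd_cycles = masked_delta(0x10657158, 0x0FAC6F97, 0xFFFFFFFF)
cs_cycles = masked_delta(0x3445570292AA5, 0x344551F0CAD6F, 0xFFFFFFFFFFFFFFFF)

# Skew between the two clocks over the same watchdog interval.
skew_ns = abs(WD_NSEC - CS_NSEC)

if __name__ == "__main__":
    print(f"hpet cycles: {wd_cycles}, tsc cycles: {cs_cycles}")
    print(f"skew: {skew_ns} ns (~{skew_ns // 1000} us)")
```

The masking matters mostly for the 32-bit hpet counter, which can wrap
between two watchdog readings; the 64-bit tsc mask is effectively a
no-op in this sample.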