Date: Sun, 2 Apr 2023 20:38:37 -0700
From: "Paul E. McKenney"
To: Waiman Long
Cc: Feng Tang, Thomas Gleixner, linux-kernel@vger.kernel.org
Subject: Re: A couple of TSC questions
Reply-To: paulmck@kernel.org
References: <3daa086c-b4a0-47a9-8bfc-aac4139013c4@paulmck-laptop> <293db107-a572-592f-cc27-e59ab81a4e60@redhat.com>

On Sun, Apr 02, 2023 at 10:05:51PM -0400, Waiman Long wrote:
> On 4/2/23 22:00, Paul E. McKenney wrote:
> > On Sun, Apr 02, 2023 at 09:04:04PM -0400, Waiman Long wrote:
> > > On 3/31/23 13:16, Paul E. McKenney wrote:
> > > > On Tue, Mar 28, 2023 at 02:58:54PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Mar 27, 2023 at 10:19:54AM +0800, Feng Tang wrote:
> > > > > > On Fri, Mar 24, 2023 at 05:47:33PM -0700, Paul E. McKenney wrote:
> > > > > > > On Wed, Mar 22, 2023 at 01:14:48PM +0800, Feng Tang wrote:
> > > > [ . . . ]
> > > >
> > > > > > > > > Second, we are very occasionally running into console messages like this:
> > > > > > > > >
> > > > > > > > > Measured 2 cycles TSC warp between CPUs, turning off TSC clock.
> > > > > > > > >
> > > > > > > > > This comes from check_tsc_sync_source() and indicates that one CPU's
> > > > > > > > > TSC read produced a later time than a later read from some other CPU.
> > > > > > > > > I am beginning to suspect that these can be caused by unscheduled delays
> > > > > > > > > in the TSC synchronization code, but figured I should ask you if you have
> > > > > > > > > ever seen these. And of course, if so, what the usual causes might be.
> > > > > > > > I haven't seen this error myself or got similar reports. Usually it
> > > > > > > > should be easy to detect once it happens, as falling back to HPET
> > > > > > > > will trigger obvious performance degradation.
> > > > > > > And that is exactly what happened. ;-)
> > > > > > >
> > > > > > > > Could you give more detail about when and how it happens, and the
> > > > > > > > HW info like how many sockets the platform has.
> > > > > > > We are in early days, so I am checking for other experiences.
> > > > > > >
> > > > > > > > CC Thomas, Waiman, as they discussed a similar case here:
> > > > > > > > https://lore.kernel.org/lkml/87h76ew3sb.ffs@tglx/T/#md4d0a88fb708391654e78312ffa75b481690699f
> > > > > > > Fun! ;-)
> > > > > Waiman, do you recall what fraction of the benefit was provided by the
> > > > > first patch, that is, the one that grouped the sync_lock, last_tsc,
> > > > > max_warp, nr_warps, and random_warps global variables into a single
> > > > > struct?
> > > The purpose of the first patch is just to avoid false cacheline sharing
> > > between the watchdog cpu and another cpu that happens to access nearby
> > > data in the same cacheline.
> > >
> > > Now I realize that I should have followed up with this patch series. The
> > > problem reported in that patch series happens on one system only, I believe.
> > Thus far I am seeing eight systems, but out of a large number. So this
> > is very much preliminary.
> >
> > > > And what we are seeing is unlikely to be due to cache-latency-induced
> > > > delays. We see a very precise warp, for example, one system always
> > > > has 182 cycles of TSC warp, another 273 cycles, and a third 469 cycles.
> > > > Another is at the insanely large value of about 2^64/10, and shows some
> > > > variation, but that variation is only about 0.1%.
> > > >
> > > > But any given system only sees warp on about half of its reboots.
> > > > Perhaps due to the automation sometimes power cycling?
> > > >
> > > > There are few enough affected systems that investigation will take
> > > > some time.
> > > Maybe the difference in warp is due to NUMA distance of the running cpu from
> > > the node where the data reside. It will be interesting to see if my patch
> > > helps.
> > Almost all of them are single-socket systems.
> >
> > If the problem sticks with a few systems, I should be able to test
> > patches no problem. If it is randomly distributed across the fleet, a
> > bit more prework analysis will be called for. But what is life without
> > a challenge? ;-)
>
> If it is happening on a single socket system, maybe it is caused by false
> cacheline sharing. It is hard to tell unless we find a way to reproduce it.
But multiple times on a given system with exactly the same number of clock
cycles of warp each time? It should be entertaining tracking this one
down. ;-)

I will take a few scans of the fleet over the coming week and see if there
is any consistency.

Here is hoping...

							Thanx, Paul
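
The warp check discussed above works roughly as follows: the CPUs being
compared take turns reading the TSC under a lock, and each read is compared
against the value recorded by the previous lock holder. Below is a minimal
user-space sketch of that comparison, with illustrative names and structure;
the real code lives in arch/x86/kernel/tsc_sync.c and differs in detail.

#include <stdint.h>

struct tsc_sync_check {
	uint64_t last_tsc;	/* most recent TSC value recorded under the lock */
	uint64_t max_warp;	/* largest backwards step seen, in TSC cycles */
	int nr_warps;		/* number of backwards steps seen */
};

/*
 * Called on each of the two CPUs being compared, with the structure
 * protected by a lock (omitted here); "now" is the TSC value the
 * calling CPU just read.
 */
void record_tsc_read(struct tsc_sync_check *c, uint64_t now)
{
	uint64_t prev = c->last_tsc;

	c->last_tsc = now;
	if (now < prev) {
		/* Time appeared to go backwards across CPUs: a warp. */
		if (prev - now > c->max_warp)
			c->max_warp = prev - now;
		c->nr_warps++;
	}
}

In this simplified picture, the "Measured 2 cycles TSC warp between CPUs"
message corresponds to a final max_warp of 2.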
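
The "first patch" mentioned above grouped the sync_lock, last_tsc, max_warp,
nr_warps, and random_warps globals into a single struct so that another CPU
touching unrelated nearby data does not end up sharing the cacheline used by
the warp check. A rough sketch of what such a grouping can look like,
assuming a 64-byte cacheline and an illustrative lock type (the actual patch
uses the kernel's own types and alignment macros):

#include <stdint.h>

#define CACHELINE_BYTES 64	/* assumed here; the kernel uses L1_CACHE_BYTES */

struct tsc_sync_vars {
	int sync_lock;		/* illustrative; the kernel lock is arch_spinlock_t */
	uint64_t last_tsc;
	uint64_t max_warp;
	int nr_warps;
	int random_warps;
} __attribute__((aligned(CACHELINE_BYTES)));

/* One aligned instance replaces five separate, arbitrarily placed globals. */
struct tsc_sync_vars tsc_sync;

Aligning the struct to a cacheline boundary keeps unrelated variables out of
that line, so a store by the CPU running the check no longer invalidates data
some other CPU happens to be using, and vice versa.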