Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC
To: Thomas Gleixner <tglx@linutronix.de>
References: <1481179042-968-1-git-send-email-sroland@vmware.com>
 <alpine.DEB.2.20.1612080854420.3400@nanos>
 <6194ea73-99dd-ceca-6f8d-e945ca717c75@hispeed.ch>
 <alpine.DEB.2.20.1612090953570.3794@nanos>
 <38f3ca77-ce32-d180-106a-e7242c635394@hispeed.ch>
 <alpine.DEB.2.20.1612091806000.3423@nanos>
 <50876e71-ade8-3c3b-db8b-8f5ae136157e@hispeed.ch>
 <alpine.DEB.2.20.1612092203430.3612@nanos>
 <b6134d9e-276f-032c-dfef-9c3173632ea8@hispeed.ch>
Cc: Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
        x86@kernel.org, LKML <linux-kernel@vger.kernel.org>,
        Allen Hung <allen_hung@dell.com>
From: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
Message-ID: <a078901e-8abf-88ab-796e-27d8f60b29f4@hispeed.ch>
Date: Sat, 10 Dec 2016 06:20:54 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.5.1
MIME-Version: 1.0
In-Reply-To: <b6134d9e-276f-032c-dfef-9c3173632ea8@hispeed.ch>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4851
Lines: 118

Am 10.12.2016 um 02:55 schrieb Roland Scheidegger:
> Am 09.12.2016 um 23:59 schrieb Thomas Gleixner:
>> On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>>
>> Cc'ed someone from Dell. 
>>
>>> Am 09.12.2016 um 18:33 schrieb Thomas Gleixner:
>>>> Can you add the patch below to gather more information? There is a hunk in
>>>> there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can
>>>> turn on as second step.
>>>
>>> Ok, here's the results:
>>> ...
>>> TSC ADJUST synchronize: Reference CPU0: -2820267100 CPU1: -2822498296
>>> TSC target sync skipped
>>> smpboot: Vector locked
>>> smpboot: Vector setup done
>>> smpboot: Clock setup
>>> TSC source sync skipped
>>> smpboot: Target CPU is online
>>
>> I did not expect that to happen. Now I'm puzzled and curious where the
>> machine gets lost after that. See below.
>>
>>> With the #if 0 block activated, it boots up fine, the output was:
>>
>> That does not make any sense at all, but yes, nothing in this context makes
>> sense.
>>
>>> [    1.038892] x86: Booting SMP configuration:
>>> [    1.038930] .... node  #0, CPUs:        #1
>>> [    0.171851] TSC ADJUST: CPU1: -2830353064 218577682002
>>> [    1.117495] TSC source sync 0 -> 1 runs 3
>>> [    0.171852] TSC ADJUST differs: Reference CPU0: -2828600940 CPU1:
>>> -2830353064
>>> [    0.171853] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2830353064
>>> [    1.117497] TSC target sync skip
>>
>>> (And fwiw with my quick hack the lockups disappear to when I change that
>>> back to blast a zero into TSC_ADJ for all cpus.)
>>
>> Right, That's what that hunk does as well.
>>
>> Now what's interesting is that the adjustement of CPU1 in the non write to
>> zero case results in the following:
>>
>> TSC ADJUST: CPU1: -2830353064 218577682002 <-- TSC value
>> TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: -2830353064
>>
>> We write CPU1 adjust register to -2828600940 which makes the TSC on CPU1
>> jump forwards by -2828600940 - -2830353064 = 1752124 cycles.
>>
>> In the write to zero case the jump is forward as well, but this time it's
>> huge, i.e. 2830353064 cycles.
>>
>> I tried to wreckage the TSC by writing similar values to the adjust MSR on
>> early boot, but independent of the values and independent of the write to
>> zero part the machine comes up happily all the time.
>>
>> The only difference is that my machine has a somewhat saner BIOS. So the
>> thing might just die in the value add SMM crap, but who knows.
>>
>> In the patch below is another bunch of debug prints which emit the state
>> information of CPU1 during bringup. Maybe that gives a hint where the
>> system gets stuck when you disable the 'write to zero' magic again.
>>
>> The NMI watchdog does not catch anything, right?
> Nope. (Though as mentioned earlier, with my hack when not writing zero
> it did - but the lockup there was later after all 16 cpus were online,
> and I only really tried that with the ubuntu 4.4 kernel. I never got to
> see the full output from that NMI though due to limited screen space, my
> attempts to try anything different than text mode were met with a blank
> screen, and from the parts I did see I didn't really see anything
> interesting albeit that's not saying much as I really have no idea about
> that code...)
> 
> With the new patch here's the output (albeit the typing gets a bit
> annoying...)
> ...
> Invoking state 32 CB replay_prepare_cpu+0x0/0xe0
> CB returned 0
> Invoking state 35 CB rcutree_prepare_cpu+0x0/0x50
> CB returned 0
> Invoking state 41 CB notify_prepare+0x0/0xa0
> CBreturned 0
> Invoking state 48 CB bringup_cpu+0x0/0x90
> x86: Booting SMP configuration:
> .... node  #0, CPUs:       #1
> Invoking state 51 CB sched_cpu_starting+0x0/0x60
> CB returned 0
> Invoking state 62 CB x86_pmu_starting_cpu+0x0/0x20
> CB returned 0
> TSC ADJUST: CPU1: -2846131604 175264328618
> TSC ADJUST differs: Reference CPU0: -2843967660 CPU1: -2846131604
> TSC ADJUST synchronize: Reference CPU0: -2843967660 CPU1: -2846131604
> TSC target sync skip
> smpboot: Vector locked
> TSC source sync 0 -> 1 runs 3
> smpboot: Vector setup done
> smpboot: Clock setup
> TSC source sync skipped
> smpboot: Target CPU is online


Ok I did some more digging. Since it appeared it never returned from
x86_cpuinit.setup_percpu_clockev() I followed that a bit more. This is
using the tsc deadline timer, ending up in clockevents_register_device()
finally. This executes all well except the raw_spin_unlock_irqrestore()
at the end which we never get past.

I disabled the tsc deadline timer (lapic=notscdeadline) and indeed, no
more lockups!
So could there be something be wrong with setting this up? Warping past
some event due to resynchronization or something?
Or hitting some bugs with TSC deadline interrupts?
Anyway, that's definitely out of my area of knowledge, hope it helps...

Roland