Date: Tue, 13 Dec 2016 17:46:31 +0100 (CET)
From: Thomas Gleixner
To: Roland Scheidegger
cc: LKML, x86@kernel.org, Peter Zijlstra, Borislav Petkov, Bruce Schlobohm, Kevin Stanton, Allen Hung
Subject: Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
In-Reply-To: <33d4286c-3f77-1274-34b7-bc62d2c146a4@hispeed.ch>
References: <20161213131115.764824574@linutronix.de> <33d4286c-3f77-1274-34b7-bc62d2c146a4@hispeed.ch>

On Tue, 13 Dec 2016, Roland Scheidegger wrote:
> On 13.12.2016 at 14:14, Thomas Gleixner wrote:
> > Roland reported interesting TSC ADJUST register wreckage on his DELL
> > machine, which seems to populate that MSR with a random number generator.
>
> FWIW, I thought about the actual values some more and I don't actually
> think they are all that random any more: the behavior is consistent with
> the bios trying to zero the TSC of all cpus. If I understand this right,
> writing a zero to TSC would cause somewhat small negative values in the
> TSC_ADJ register at boot time, and larger negative values at suspend
> time (at least if the TSC just stops when suspended and isn't reset) -
> exactly what I'm seeing.
> (And of course the different TSC_ADJ values would be because the bios is
> writing TSC without any thoughts of synchronization, just one cpu after
> another).

Yeah, that might be.
Still it looks like random nonsense and definitely the BIOS developers
did not follow the secrit boot protocol.

> > Deeper investigation into fixing this wreckage unearthed another special
> > feature which is designed by Intel: Negative TSC adjust values cause
> > interrupt storms on the TSC deadline timer. Further details in patch 2/2
>
> This actually looks like quite a serious hw bug to me, shouldn't there
> be an errata for such a bug?
>
> And I still don't quite understand why the lockup doesn't happen after a
> warm boot, there must be something different there...

What are the adjust values after a warm boot?

Thanks,

	tglx
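
For reference, the arithmetic Roland describes can be sketched in a few
lines of C. This is only a model, not kernel code: it assumes that a write
to the TSC shows up as (new value - current TSC) added to the adjust
register, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one CPU: not kernel code. */
struct cpu_model {
	uint64_t raw;    /* free-running counter, unaffected by software writes */
	int64_t  adjust; /* stands in for the TSC_ADJUST MSR */
};

/* The visible TSC is the raw counter plus the adjust value. */
static uint64_t model_read_tsc(const struct cpu_model *c)
{
	return c->raw + (uint64_t)c->adjust;
}

/* Writing the TSC adds (new value - current TSC) to the adjust value. */
static void model_write_tsc(struct cpu_model *c, uint64_t val)
{
	uint64_t delta = val - model_read_tsc(c);

	c->adjust = (int64_t)((uint64_t)c->adjust + delta);
}

/* Let the counter run for 'cycles', then have the "BIOS" zero the TSC. */
static int64_t model_bios_zero_tsc(struct cpu_model *c, uint64_t cycles)
{
	c->raw += cycles;
	model_write_tsc(c, 0);
	return c->adjust;
}
```

In this model, zeroing the TSC shortly after power-on leaves a small
negative adjust value. Zeroing it again after the counter has accumulated
a long uptime (and was not reset across suspend) leaves a much larger
negative value. CPUs zeroed one after another end up with different values.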