From: Roland Scheidegger
To: Thomas Gleixner, LKML
Cc: x86@kernel.org, Peter Zijlstra, Borislav Petkov, Bruce Schlobohm, Kevin Stanton, Allen Hung
Subject: Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
Date: Tue, 13 Dec 2016 17:34:11 +0100
Message-ID: <33d4286c-3f77-1274-34b7-bc62d2c146a4@hispeed.ch>
In-Reply-To: <20161213131115.764824574@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 13.12.2016 at 14:14, Thomas Gleixner wrote:
> Roland reported interesting TSC ADJUST register wreckage on his DELL
> machine, which seems to populate that MSR with a random number generator.

FWIW, I have thought about the actual values some more, and I no longer
think they are all that random: the behavior is consistent with the
BIOS trying to zero the TSC on all CPUs. If I understand this correctly,
a write to the TSC also shifts TSC_ADJ by the same amount, so writing
zero to a TSC that currently reads X leaves -X in TSC_ADJ. That would
produce somewhat small negative values in TSC_ADJ at boot time, when the
TSC has not been running for long, and larger negative values after
suspend (at least if the TSC simply stops while suspended rather than
being reset). That is exactly what I'm seeing.
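A toy Python model of this hypothesis may make the arithmetic concrete. It assumes only the architectural rule from the Intel SDM that a WRMSR which changes the TSC by some delta also adds that delta to IA32_TSC_ADJUST; the four CPUs, the cycle counts and the sequential BIOS loop (the hypothetical `bios_zero_tscs`) are made-up illustration values, not a description of what the DELL firmware actually does.

```python
"""Toy model: a BIOS zeroing each CPU's TSC in turn leaves negative
IA32_TSC_ADJUST values.

Assumed rule (Intel SDM): a WRMSR that changes the TSC by delta also
adds delta to IA32_TSC_ADJUST. The cycle counts, CPU count and the
sequential BIOS loop are hypothetical illustration values.
"""
from dataclasses import dataclass


@dataclass
class Cpu:
    tsc: int = 0         # current time-stamp counter value
    tsc_adjust: int = 0  # modeled IA32_TSC_ADJUST

    def tick(self, cycles: int) -> None:
        # The TSC counts freely; counting does not touch TSC_ADJUST.
        self.tsc += cycles

    def write_tsc(self, value: int) -> None:
        # Writing the TSC shifts TSC_ADJUST by the same delta.
        self.tsc_adjust += value - self.tsc
        self.tsc = value


def bios_zero_tscs(cpus: list[Cpu], cycles_between_writes: int) -> None:
    """Hypothetical BIOS loop: zero one CPU's TSC at a time, with every
    TSC still counting between consecutive writes (no synchronization)."""
    for cpu in cpus:
        cpu.write_tsc(0)
        for other in cpus:
            other.tick(cycles_between_writes)


cpus = [Cpu(tsc=1000) for _ in range(4)]  # small TSC values shortly after reset
bios_zero_tscs(cpus, cycles_between_writes=10)
boot_adjust = [c.tsc_adjust for c in cpus]

for c in cpus:
    c.tick(1_000_000)  # the machine runs, then suspends with the TSC stopped
bios_zero_tscs(cpus, cycles_between_writes=10)  # resume: BIOS zeroes again
resume_adjust = [c.tsc_adjust for c in cpus]

print(boot_adjust)    # [-1000, -1010, -1020, -1030]
print(resume_adjust)  # [-1001040, -1001050, -1001060, -1001070]
```

In this model the boot values come out small and negative, the post-resume values much larger and negative, and each CPU ends up with a slightly different value purely because the writes happen one after another.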
And of course the TSC_ADJ values would differ between CPUs because the
BIOS writes the TSC without any thought of synchronization, just one CPU
after another.

>
> Deeper investigation into fixing this wreckage unearthed another special
> feature which is designed by Intel: Negative TSC adjust values cause
> interrupt storms on the TSC deadline timer. Further details in patch 2/2

This actually looks like quite a serious hardware bug to me; shouldn't
there be an erratum for it?
And I still don't quite understand why the lockup doesn't happen after
a warm boot; something must be different on that path...

(I haven't had the chance to test the patch yet.)

Roland