From: "M. Vefa Bicakci"
Subject: Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
To: Sebastian Andrzej Siewior
Cc: "linux-kernel@vger.kernel.org", "Charles (Chas) Williams"
Date: Tue, 1 Nov 2016 13:15:53 +0300
Message-ID: <26425897-5229-d2c5-1e1b-a08442441f68@runbox.com>
In-Reply-To: <20161028080324.b6nnwaljmzxiyykx@linutronix.de>
References: <20161028080324.b6nnwaljmzxiyykx@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>
>> [snip]
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPU's. Given the behavior,
>> I have seen so far it makes me thing the RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
>
> I don't what vmware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.
>
>> Per your request in your next email:
>>
>> > One thing I forgot to ask: Could you please check if you get the same
>> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>> > rework).
>>
>> Our previous kernel was 4.4, and didn't use the logical package id:
>
> I see.
>
> Did the patch I sent fixed it for you and were you not able to test?

Hello Sebastian,

The patch fixes the kernel oops for me. I am using a custom 4.8.5-based
kernel on Qubes OS R3.2, which is based on Xen 4.6.3. Apparently, Xen
also has a similar bug/flaw/quirk regarding the allocation of package
identifiers for the virtual CPUs.

Prior to your patch, my Xen-based virtual machines would crash at
boot-up most of the time, but not always, with the backtrace reported
by Charles. Because the crashes were intermittent, I was under the
impression that this was a subtle race condition.

With your patch, the virtual machines boot up successfully every time.
Here are the relevant excerpts from dmesg:

=== 8< ===
[ 0.263936] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535
...
[ 2.213669] intel_rapl: Found RAPL domain package
[ 2.213689] intel_rapl: Found RAPL domain core
[ 2.216337] intel_rapl: Found RAPL domain uncore
[ 2.216370] intel_rapl: RAPL package 0 domain package locked by BIOS
=== >8 ===

Thank you,

Vefa

Please note: I am not subscribed to the Linux kernel mailing list, so I
had to manually construct the headers of this reply with the proper
In-Reply-To and References values (which were extracted from marc.info).
As a result, this e-mail may not show up as a reply to your earlier
conversation with Charles.
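
For anyone who wants to check whether their own guests hit the same
symptom, here is a minimal sketch in Python that scans saved dmesg
output for the RAPL error line quoted above. The pattern is modeled only
on that single line, so other kernel versions may word the message
differently. The names RAPL_ERROR and find_bad_package_ids are
hypothetical and are not part of any kernel tooling.

```python
import re

# Assumption: the message wording matches the one dmesg line quoted in
# this e-mail; this is not taken from the kernel source.
RAPL_ERROR = re.compile(
    r"max package: (?P<max_pkg>\d+) but CPU(?P<cpu>\d+) belongs to (?P<pkg>\d+)"
)


def find_bad_package_ids(dmesg_text):
    """Return (cpu, package_id, max_package) for each matching log line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = RAPL_ERROR.search(line)
        if match:
            hits.append(
                (
                    int(match.group("cpu")),
                    int(match.group("pkg")),
                    int(match.group("max_pkg")),
                )
            )
    return hits


# The sample is the first line of the dmesg excerpt in this e-mail.
sample = (
    "[ 0.263936] RAPL PMU: rapl pmu error: "
    "max package: 1 but CPU0 belongs to 65535"
)
print(find_bad_package_ids(sample))  # [(0, 65535, 1)]
```

An empty result only means that this particular message was not logged;
it does not by itself show that the package identifiers are valid.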