From: "M. Vefa Bicakci" <m.v.b@runbox.com>
Subject: Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
References: <20161028080324.b6nnwaljmzxiyykx@linutronix.de>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "Charles (Chas) Williams" <ciwillia@brocade.com>
Message-ID: <26425897-5229-d2c5-1e1b-a08442441f68@runbox.com>
Date: Tue, 1 Nov 2016 13:15:53 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <20161028080324.b6nnwaljmzxiyykx@linutronix.de>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2287
Lines: 61

> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>
>> [snip]
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPU's.  Given the behavior,
>> I have seen so far it makes me think the RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
> 
> I don't know what vmware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.
> 
>> Per your request in your next email:
>> 
>> > One thing I forgot to ask: Could you please check if you get the same
>> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>> > rework).
>> 
>> Our previous kernel was 4.4, and didn't use the logical package id:
>
> I see.
> 
> Did the patch I sent fix it for you, or were you not able to test?

Hello Sebastian,

The patch fixes the kernel oops for me.

I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
the allocation of package identifiers for the virtual CPUs.

Prior to your patch, my Xen-based virtual machines would crash at boot-up
most of the time, though not always, with the backtrace reported by Charles.
Because the crash was intermittent, I was under the impression that this
was a subtle race condition.

With your patch, the virtual machines boot up successfully every time.
Here are the relevant excerpts from dmesg:

=== 8< ===
[    0.263936] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535
...
[    2.213669] intel_rapl: Found RAPL domain package
[    2.213689] intel_rapl: Found RAPL domain core
[    2.216337] intel_rapl: Found RAPL domain uncore
[    2.216370] intel_rapl: RAPL package 0 domain package locked by BIOS
=== >8 ===
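
For readers of the archive: judging only from the first log line above,
the check appears to reject a package id that lies outside the number of
packages the kernel expects (65535 is 0xffff, i.e. -1 in a 16-bit field).
The following is a minimal userspace sketch of that kind of range check,
not the actual patch; the structure and function names (pkg_table,
pkgid_in_range, pkg_slot) are hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-package table; not a kernel structure. */
struct pkg_table {
    unsigned int max_packages;  /* number of entries in slots[] */
    int *slots;                 /* one entry per logical package */
};

/* True if pkgid can safely index a table with max_packages entries. */
static bool pkgid_in_range(unsigned int pkgid, unsigned int max_packages)
{
    return pkgid < max_packages;
}

/*
 * Return the slot for pkgid, or NULL when the id is out of range.
 * Without the check, an id such as 65535 with max_packages == 1 would
 * index far past the end of slots[].
 */
static int *pkg_slot(struct pkg_table *t, uint16_t pkgid)
{
    if (!pkgid_in_range(pkgid, t->max_packages))
        return NULL;
    return &t->slots[pkgid];
}
```

With max_packages == 1, pkg_slot() returns a valid pointer for id 0 and
NULL for id 65535, so a caller can log an error and bail out instead of
dereferencing an invalid address.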

Thank you,

Vefa

Please note: I am not subscribed to the Linux kernel mailing list, so
I had to manually construct the headers of this reply with the proper
In-Reply-To and References values (which were extracted from marc.info).
As a result, this e-mail may not show up as a reply to your earlier
conversation with Charles.