Subject: Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
From: "Charles (Chas) Williams"
To: Sebastian Andrzej Siewior
Date: Thu, 27 Oct 2016 15:00:32 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote:
> On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote:
>> [ 3.107126] init_rapl_pmus: maxpkg 4
>
> there! vmware bug. It probably worked by chance.

Yes, the behavior is a bit random.

> I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
> topology_max_packages().
> So it says 4 but then returns 65535 for CPU 2
> and 3. That -1 comes probably from topology_update_package_map(). Could
> you please send a complete boot log and try the following patch? This
> one should fix your boot problem and disable RAPL if the info is
> invalid.

But sometimes the topology info is correct, and if I get lucky, the
package ID could be valid for all the CPUs. Given the behavior I have
seen so far, I think RAPL isn't being emulated. So even if I did boot
onto a "valid" set of cores, could I be certain that I would always stay
on those cores?

> diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
> index 0a535cea8ff3..f5d85f2853d7 100644
> --- a/arch/x86/events/intel/rapl.c
> +++ b/arch/x86/events/intel/rapl.c
> @@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
>  {
>  	int maxpkg = topology_max_packages();
>  	size_t size;
> +	unsigned int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (topology_logical_package_id(cpu) >= maxpkg) {
> +			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
> +			       maxpkg, cpu, topology_logical_package_id(cpu));
> +			return -EINVAL;
> +		}
> +	}
>  
>  	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
>  	rapl_pmus = kzalloc(size, GFP_KERNEL);

Per your request in your next email:

> One thing I forgot to ask: Could you please check if you get the same
> pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
> rework).

Our previous kernel was 4.4, which didn't use the logical package ID:

	/* check if phys_is is already covered */
	for_each_cpu(i, &rapl_cpu_mask) {
		if (phys_id == topology_physical_package_id(i))
			return;