In-Reply-To: <20150710083523.GA11445@gmail.com>
References: <1436428080-3098-1-git-send-email-adrian.hunter@intel.com>
 <20150709085022.GB2859@worktop.programming.kicks-ass.net>
 <20150709092656.GA13336@gmail.com>
 <20150709115948.GS19282@twins.programming.kicks-ass.net>
 <20150709123205.GA9496@gmail.com>
 <20150709124257.GU19282@twins.programming.kicks-ass.net>
 <20150710083523.GA11445@gmail.com>
Date: Fri, 10 Jul 2015 11:59:51 -0700
Subject: Re: [RFC PATCH] perf: Provide status of known PMUs
From: Stephane Eranian
To: Ingo Molnar
Cc: Peter Zijlstra, Adrian Hunter, Arnaldo Carvalho de Melo,
 Andy Lutomirski, Vince Weaver, Thomas Gleixner, "H. Peter Anvin",
 LKML, Jiri Olsa, Borislav Petkov, Alexander Shishkin, Andi Kleen

Hi,

On Fri, Jul 10, 2015 at 1:35 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
>> On Thu, Jul 09, 2015 at 02:32:05PM +0200, Ingo Molnar wrote:
>> >
>> >   perf record error: The 'bts' PMU is not available, because the CPU does not support it
>>
>> This one makes sense.
>>
>> >   perf record error: The 'bts' PMU is not available, because this architecture does not support it
>> >   perf record error: The 'bts' PMU is not available, because its driver is not built into the kernel
>> >
>> > Because if it's the wrong architecture or CPU, I look for a box with the right
>> > one, if it's simply the kernel not having the necessary PMU driver then I'll boot
>> > a kernel with it enabled.
>>
>> These not so much; why won't a generic: "Unknown PMU, check arch/kernel" do?
>
> Yeah, I mean why not make the user's job harder if we can? We really don't want to
> solve this problem technically and we _really_ want tooling to be fundamentally
> unhelpful, right? ;-)
>
> I realize that the 'Error: there was a bug, aborting' style of sado-masochistic
> error messages are the current Linux tooling status quo, which opaque error
> feedback comes from an early technological mistake of Unix system calls screwing
> up error handling, and I also see that after decades of abuse people are showing
> signs of the Stockholm Syndrome related to this problem, but it _really_ does not
> have to be so ...
>
> Whenever we can we should change such bad patterns.
>
>> The thing is, I hate that hard-coded list, its pain I don't need.
>
> Absolutely! I pointed this out during review as well.
>
> It does not impact the core concept though: we should have a single numeric error,
> and free form error strings provided by the place that first triggers some
> problem. That should be both programmatically easy to handle and maximally
> informative to the users.
>
> At least half of a tool's usability comes not from how it behaves when it works,
> but how it behaves when it does not. (SystemD, I'm looking at you.)
>
This patch looks useful, but it does not address a related issue. Here
you are reporting on the status of specific PMU support, i.e., the PMU
is not supported by the hardware. But there is another problem which I
run into very often on ARM (like on Tegra), and it really annoys me.
The PMU hardware is present, but the instance of the PMU on a CPU is
not, simply because the CPU is hotpluggable and it is offline at the
time the tool (perf) starts. I am not talking about explicit
hotplugging by the user but about hotplugging done by the kernel. Then,
during the run, the CPU is plugged back in by the kernel to handle the
load.
Perf then misses that CPU completely, so it does not measure what is
really going on.

I understand that reporting that a PMU instance is supported but
offline does not solve the entire problem. Some other kernel support is
needed for that. But I think it would be good to have the tool at least
issue a warning saying: "some CPUs are offline, not monitoring all
CPUs, results may be partial".
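
To make the suggested warning concrete, here is a rough user-space sketch, not perf code. It assumes the standard Linux sysfs files /sys/devices/system/cpu/present and /sys/devices/system/cpu/online, and treats any CPU that is present but not online as "offline". The helper names (parse_cpu_list, offline_cpu_warning) are made up for illustration.

```python
from pathlib import Path

# Standard Linux sysfs files in cpulist format, e.g. "0-3" or "0,2-3".
CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a cpulist string such as "0-3,6" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def offline_cpu_warning(present_text, online_text):
    """Return a warning string if some present CPU is not online, else None."""
    offline = sorted(set(parse_cpu_list(present_text)) -
                     set(parse_cpu_list(online_text)))
    if not offline:
        return None
    return ("warning: some CPUs are offline (%s), not monitoring all CPUs, "
            "results may be partial" % ",".join(map(str, offline)))


if __name__ == "__main__":
    try:
        message = offline_cpu_warning(
            (CPU_SYSFS / "present").read_text(),
            (CPU_SYSFS / "online").read_text(),
        )
    except OSError:
        message = None  # sysfs files not available, e.g. not running on Linux
    if message:
        print(message)
```

This only reports the state at the time the tool starts. A CPU that the kernel brings online later would still go unmonitored without the additional kernel support mentioned above.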