Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752013AbaJSW6R (ORCPT ); Sun, 19 Oct 2014 18:58:17 -0400 Received: from mail-lb0-f173.google.com ([209.85.217.173]:49543 "EHLO mail-lb0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751572AbaJSW6Q (ORCPT ); Sun, 19 Oct 2014 18:58:16 -0400 MIME-Version: 1.0 In-Reply-To: <20141019222004.GI23531@worktop.programming.kicks-ass.net> References: <20141019213341.GF23531@worktop.programming.kicks-ass.net> <20141019222004.GI23531@worktop.programming.kicks-ass.net> From: Andy Lutomirski Date: Sun, 19 Oct 2014 15:57:54 -0700 Message-ID: Subject: Re: [RFC 5/5] x86,perf: Only allow rdpmc if a perf_event is mapped To: Peter Zijlstra Cc: Erik Bosman , Ingo Molnar , "linux-kernel@vger.kernel.org" , Paul Mackerras , Kees Cook , Arnaldo Carvalho de Melo , Andrea Arcangeli , Valdis Kletnieks Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Oct 19, 2014 at 3:20 PM, Peter Zijlstra wrote: > On Sun, Oct 19, 2014 at 03:05:42PM -0700, Andy Lutomirski wrote: >> On Oct 19, 2014 2:33 PM, "Peter Zijlstra" wrote: > >> > > Also, I don't understand the purpose of cap_user_time. Wouldn't it be >> > > easier to just record the last CLOCK_MONOTONIC time and let the user >> > > call __vdso_clock_gettime if they need an updated time? >> > >> > Because perf doesn't use CLOCK_MONOTONIC. Due to performance >> > considerations we used the sched_clock stuff, which tries its best to >> > make the best of the TSC without reverting to HPET and the like. >> > >> > Not to mention that CLOCK_MONOTONIC was not available from NMI context >> > until very recently. >> >> I'm only talking about the userspace access to when an event was >> enabled and how long it's been running. I think that's what the >> cap_user_time stuff is for. I don't think those parameters are >> touched from NMI, right? >> >> Point taken about sched_clock, though. > > Well, mixing two time bases, one TSC based and one CLOCK_MONOTONIC is > just asking for trouble IMO ;-) Fair enough. > >> > Also, things like c73deb6aecda ("perf/x86: Add ability to calculate TSC >> > from perf sample timestamps") seem to suggest people actually use TSC >> > for things as well. >> > >> > Now we might change to using the new NMI safe CLOCK_MONOTONIC (with a >> > fallback to use the sched_clock stuff on time challenged hardware) in >> > order to ease the correlation between other trace thingies, but even >> > then it makes sense to have this, having it here and reading the TSC >> > within the seqcount loop ensures you've got consistent data and touch >> > less cachelines for reading. >> >> True. >> >> OTOH, people (i.e. I) have optimized the crap out of >> __vdso_clock_gettime, and __vdso_perf_event_whatever could be >> similarly optimized. > > Maybe, but at that point we commit to yet another ABI... I'd rather just > put a 'sane' implementation in a library or so. This cuts both ways, though. For vdso timekeeping, the underlying data structure has changed repeatedly, sometimes to add features, and sometimes for performance, and the vdso has done a good job insulating userspace from it. (In fact, until 3.16, even the same exact kernel version couldn't be relied on to have the same data structure with different configs, and even now, no one really wants to teach user libraries how to parse the pvclock data structures.) I would certainly not suggest putting anything beyond the bare minimum into the vdso. FWIW, something should probably specify exactly when it's safe to try a userspace rdpmc. I think that the answer is that, for a perf event watching a pid, only that pid can do it (in particular, other threads must not try). For a perf event monitoring a whole cpu, the answer is less clear to me. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/