Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751541AbdIAIVH (ORCPT ); Fri, 1 Sep 2017 04:21:07 -0400 Received: from mail-io0-f181.google.com ([209.85.223.181]:36299 "EHLO mail-io0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751367AbdIAIVE (ORCPT ); Fri, 1 Sep 2017 04:21:04 -0400 X-Google-Smtp-Source: ADKCNb4v67kagnSPAw52dKzJeDLqWY3R228aNXBvOvLfp0SsjgTt3q8cgj9UXPDiEPGldfmuD2c7RV1k5LazlhXaGdo= MIME-Version: 1.0 In-Reply-To: <20170901075901.GD14815@krava> References: <1503342849-2036-1-git-send-email-eranian@google.com> <87wp5wr0cb.fsf@firstfloor.org> <20170822012545.GJ2482@two.firstfloor.org> <20170822070332.GA21829@krava> <20170828204114.GR2482@two.firstfloor.org> <20170901075901.GD14815@krava> From: Stephane Eranian Date: Fri, 1 Sep 2017 01:21:03 -0700 Message-ID: Subject: Re: [PATCH] perf record: enable multiplexing scaling via -R To: Jiri Olsa Cc: Andi Kleen , LKML , Arnaldo Carvalho de Melo , Peter Zijlstra , "mingo@elte.hu" , "Liang, Kan" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3136 Lines: 76 On Fri, Sep 1, 2017 at 12:59 AM, Jiri Olsa wrote: > On Wed, Aug 30, 2017 at 11:21:23PM -0700, Stephane Eranian wrote: >> Hi, >> >> >> On Mon, Aug 28, 2017 at 1:41 PM, Andi Kleen wrote: >> >> So I think we are good to go. to capture multiplexing scaling factor >> >> when sampling simply use the S >> >> modifier. >> >> But to my surprise, newer kernels are not happy with the cmdline: >> >> $ perf record -e cycles:S noploop 1 >> >> Error: >> >> The sys_perf_event_open() syscall returned with 22 (Invalid argument) >> >> for event (cycles:Su). >> >> /bin/dmesg may provide additional information. >> >> No CONFIG_PERF_EVENTS=y kernel support configured? >> > >> > Likely due to >> > >> > ba5213ae6b88 perf/core: Correct event creation with PERF_FORMAT_GROUP >> > >> > It's not supported with inherited events. >> > >> Yes, and other things have changed as well. I did a bit of research to >> figure out how >> to make this work out-of the-box with the latest perf (v4.13). It >> turns out you need to >> combine multiple options and an event modifier. This is quite cumbersome >> but here it is: >> >> $ perf record --no-inherit --running-time -e cycles:S ........ >> >> You need: >> - no-inherit: the kernel does not know how to deal with multiplexing >> when events are inherited >> - running-time: this used to be automatic for PERF_SAMPLE_READ with >> perf record, now it is not >> This includes TIME_ENABLED/TIME_RUNNING in the sample_read format. >> - :S : to add a PERF_SAMPLE_READ to each sample, it encapsulates the >> event value + timings. >> We do not care about the value but are only interested in the timings. >> >> The kernel cannot record the timings without a PERF_SAMPLE_READ. >> >> I am also surprised to see that perf record keep inherit=1 in >> system-wide mode. I don't think this >> is relavant in this mode. But the kernel this fails in this case, >> which I think is a bug. In system-wide >> mode, the attr-.no_inherit should be ignored. We can fix perf record >> to avoid this in system-wide. >> >> The cmdline above works for both per-thread and system-wide modes. >> >> So I think we do not need my patch or variations thereof, everything >> is there, though a bit difficult >> to combine. > > hum, how about we introduce new modifier to attach timing info, like: > $ perf record -e cycles:T .... > > modifiers might be scares resource, but we don't add them every day, > and this requirement looks generic > It is not just a matter of modifier, you need to have the kernel record just what you want. AFAIK, the only way for the the kernel to record timings on sampling events is to force PERF_SAMPLE_READ. So the T modifier would have to also set that format, at which point, I wonder how useful it is compared to S. Alternatively, we could improve the kernel to support recording timing with PERF_SAMPLE_TIMINGS as a pair of u64 to represent time_enabled, time_running. That would avoid the whole PERF_SAMPLE_READ and the extra u64 it records. Recording the value of the sampling event is not very useful because it keeps being reset for each period.