Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754108AbbGWQZI (ORCPT ); Thu, 23 Jul 2015 12:25:08 -0400 Received: from mail7.hitachi.co.jp ([133.145.228.42]:47070 "EHLO mail7.hitachi.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753752AbbGWQZB (ORCPT ); Thu, 23 Jul 2015 12:25:01 -0400 Message-ID: <55B11555.9060100@hitachi.com> Date: Fri, 24 Jul 2015 01:24:53 +0900 From: Masami Hiramatsu Organization: Hitachi, Ltd., Japan User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Arnaldo Carvalho de Melo CC: Hemant Kumar , Peter Zijlstra , linux-kernel@vger.kernel.org, Adrian Hunter , Ingo Molnar , Jiri Olsa , Namhyung Kim , Borislav Petkov Subject: Re: Re: Re: [RFC PATCH perf/core v2 00/16] perf-probe --cache and SDT support References: <20150715091352.8915.87480.stgit@localhost.localdomain> <55A7215F.40803@linux.vnet.ibm.com> <55A874C6.5030202@hitachi.com> <55AFA4E2.4040801@linux.vnet.ibm.com> <55B0E872.8030206@hitachi.com> <20150723140127.GD3152@kernel.org> In-Reply-To: <20150723140127.GD3152@kernel.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9558 Lines: 236 On 2015/07/23 23:01, Arnaldo Carvalho de Melo wrote: > Em Thu, Jul 23, 2015 at 10:13:22PM +0900, Masami Hiramatsu escreveu: >> On 2015/07/22 23:12, Hemant Kumar wrote: >>> On 07/17/2015 08:51 AM, Masami Hiramatsu wrote: >>>> On 2015/07/16 12:13, Hemant Kumar wrote: >>> The idea behind '%' was to identify the SDT events and take a different path >>> to lookup through the cache, put a probe, record and then delete the probe. >>> Or, do you want "perf record" to record any event this way (not just an sdt >>> event). > >> I see, but I think that is not good by following reasons, > >> - when we record event with "-e %provider:event", it will be shown as >> "provider:event" >> - if perf-list shows the SDT(cached) events as "%provider:event", that >> will not match the recorded result. >> - it is somewhat fragile that we temporary add the SDT event and remove it >> after record, because the event will not hide from ftrace users (this >> means that we'll fail removing the event by -EBUSY if someone use it >> via ftrace) > > We should avoid that, if we say record event "foo:bar", then we should > consistently show "foo:bar" everywhere this is referenced. > >> - if we set SDT events perf-probe, it will be shown as "provider:event" name >> because "%" will be rejected by ftrace. In that case, what the perf-list show >> those events, both of %provider:event and provider:event ? > >> thus I pushed the "%" as a "special remembering mark" only for looking >> up the event from cache by perf-probe. > >> So I'd like to suggest that the following behavior > >> 1) perf-list shows the cached-with-name and SDT events as Tracepoint events >> even if it is not yet probed. > > I can agree with 'perf list' showing what can be used as events, and > SDTs, AFAIK, match that definition, i.e. we have somewhere (in the DSOs, > right?) information about where to ask for an event to be enabled. SDTs are described in .note section, and is extracted by perf-buildid-cache and stored into perf-probe's cache file (under buildid-cache). Perf-list shows only the extracted one. > If there are details on how that needs to be obtained, then passed to > the kernel somehow to then become really, really accessible, then these > details are completely internal. agreed. > >> # perf list >> >> List of pre-defined events (to be used in -e): >> ... >> libc:memory_heap_new [Tracepoint event] > > Is it like this or is it like [ku]probes where we already have a > namespace qualifier, i.e.: > > [root@zoo ~]# perf probe icmp_rcv > Added new event: > probe:icmp_rcv (on icmp_rcv) > > You can now use it in all perf tools, such as: > > perf record -e probe:icmp_rcv -aR sleep 1 > > [root@zoo ~]# > > [root@zoo ~]# perf probe /lib64/libc-2.20.so malloc > Added new events: > probe_libc:malloc (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_1 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_2 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_3 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_4 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_5 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_6 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_7 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_8 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_9 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_10 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_11 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_12 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_13 (on malloc in /lib64/libc-2.20.so) > probe_libc:malloc_14 (on malloc in /lib64/libc-2.20.so) > > You can now use it in all perf tools, such as: > > perf record -e probe_libc:malloc_14 -aR sleep 1 > > [root@zoo ~]# > > "probe" for kernel events, "probe_%s" % DSO basename for userspace > events. > > Why not continue with that and have SDTs use the probe_%s: namespace? > Sorry if this was already discussed here... :) We are discussing about that in another thread, anyway, probe_%s can solve a little part of the clash of names. > > If there is some ambiguity, that can be resolved by explicitely setting > a new name, 'perf probe' has provision for that, right? I.e.: Yes, but that means we'll have to give new names before using that. Actually, SDT has "provider-name", "event-name" and "probe location" (also have arguments, but not supported). And provider name is not always same as the binary name. (actually, the application developers can use any name for it...) So adding something special prefix or detect clash before using will be the option. The following patterns we've discussed. - : simple, but could easily clash with others. - probe_: - sdt_: also simple and similar to current solution. but fragile against clash among SDTs. - probe_:_ also simple, but if provider or/and name has '_', it is hard to split the provider and name. and fragile against clash among SDTs too. - _/ possible, but ugly since buildid is a random long xdigits(maybe cut up to 8 or 12 bytes). > [root@zoo ~]# perf probe /lib64/libc-2.20.so another_name=malloc > Added new events: > > probe_libc:another_name_14 (on malloc in /lib64/libc-2.20.so) > > You can now use it in all perf tools, such as: > > perf record -e probe_libc:another_name_14 -aR sleep 1 > > [root@zoo ~]# > >> ... >> probes:myevent [Tracepoint event] >> ... > >> 2) perf-record -e with no-probed event should try to set up the given probe >> by using perf-probe. It is possible to remove that the probe after recording, >> but also ignore if it fails by -EBUSY. (anyway, there is no difference for >> users) > > Right, being able to add new probes _without_ calling 'perf probe', but > instead functions used by 'perf probe' is something the eBPF does (I'm > almost getting there... :) ) and that I want to do in other tools, like > 'trace' as well... Yes, it should be done internally. (of course, users also can use perf-probe for activating SDT.) > There are permission problems about how to add new probes and how to > _enable_ them, i.e. I think it is ok to allow users to ask for > probe:foo, if they are monitoring just their workloads... I see... But that's another issue I think, since the issue is already there for using uprobes. We'd better fork thread for that issue. [...] > This needs refining, i.e. I should just warn that albeit "probe:vfs_getname" is available, > it can't be used, unless we open the doors wide. > > Sorry for the digression, but these permission issues will hit us with SDT as well, no? Yes, and we have another way to avoid this issue, by using setuid as systemtap does. >> This rule will solve the contradiction between the event name on recorded >> data and listed events. However, as we discussed there are other clashes. > >> A) clash among binaries: Since the binary builders can freely use the >> provider name, it is possible to clash to other binaries' SDTs. > > Yes, one way to disambiguate is to use the buildid, i.e. content based, when/if > the need arises. We should _always_ store the build-id, together with any probe > used, so that we can bail out when trying to run those with a non matching DSO > (kernel, module, library, whatever). Store the build-id with what? SDTs and cached probes are already stored under build-id directory(same location of buildid-cache). Would you mean perf.data? And managing all probe events strictly by perftools is hard since someone can set new probes via tracefs(debugfs) directly. > > When specifying some SDT that is ambiguous, we should bail out and ask for further > namespacing, like: > > [acme@zoo linux]$ git show --oneline 25b6 > error: short SHA1 25b6 is ambiguous. > error: short SHA1 25b6 is ambiguous. > fatal: ambiguous argument '25b6': unknown revision or path not in the working tree. > Use '--' to separate paths from revisions, like this: > 'git [...] -- [...]' > [acme@zoo linux]$ git show --oneline 25b67 > error: short SHA1 25b67 is ambiguous. > error: short SHA1 25b67 is ambiguous. > fatal: ambiguous argument '25b67': unknown revision or path not in the working tree. > Use '--' to separate paths from revisions, like this: > 'git [...] -- [...]' > > It can be via build-id or by asking for the full pathname to the DSO, etc. > >> B) clash among different versions: Of course the different versions of binaries >> can be co-exist on the system. Those usually have the same SDTs and same >> basename, just different build-ids. > > Right, in that case the build-id or the full pathname needs to be specified somehow > >> These issues are not solved by using "%" because it happens among SDTs. >> So we need to find another way to distinguish the SDTs. > >>From what I've read so far (not much): agreed. Thank you, -- Masami HIRAMATSU Linux Technology Research Center, System Productivity Research Dept. Center for Technology Innovation - Systems Engineering Hitachi, Ltd., Research & Development Group E-mail: masami.hiramatsu.pt@hitachi.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/