Message-ID: <55B0E872.8030206@hitachi.com>
Date: Thu, 23 Jul 2015 22:13:22 +0900
From: Masami Hiramatsu
Organization: Hitachi, Ltd., Japan
To: Hemant Kumar, Arnaldo Carvalho de Melo
CC: Peter Zijlstra, linux-kernel@vger.kernel.org, Adrian Hunter, Ingo Molnar, Paul Mackerras, Jiri Olsa, Namhyung Kim, Borislav Petkov
Subject: Re: Re: [RFC PATCH perf/core v2 00/16] perf-probe --cache and SDT support
References: <20150715091352.8915.87480.stgit@localhost.localdomain> <55A7215F.40803@linux.vnet.ibm.com> <55A874C6.5030202@hitachi.com> <55AFA4E2.4040801@linux.vnet.ibm.com>
In-Reply-To: <55AFA4E2.4040801@linux.vnet.ibm.com>

On 2015/07/22 23:12, Hemant Kumar wrote:
> Hi Masami,
>
> Apologies for the delayed response.
>
> On 07/17/2015 08:51 AM, Masami Hiramatsu wrote:
>> Hi Hemant,
>>
>> On 2015/07/16 12:13, Hemant Kumar wrote:
>>> Hi Masami,
>>>
>>> On 07/15/2015 02:43 PM, Masami Hiramatsu wrote:
>>>> Hi,
>>>>
>>>> Here is the 2nd version of the patchset for probe-cache and
>>>> initial SDT support, which will eventually become perf-cache.
>>> Thanks for adding the SDT support.
>>>
>>>> perf-probe is useful for debugging, but it strongly depends
>>>> on debuginfo. Without debuginfo, it is just a frontend for
>>>> ftrace's dynamic events.
>>>> This usually happens in server farms or on cloud systems, since
>>>> no one wants to distribute big debuginfo packages.
>>>>
>>>> To solve this issue, I had tried to make pre-analyzed probes
>>>> (https://lkml.org/lkml/2014/10/31/207), but that approach has a
>>>> problem: we can't ensure that the probed binary is the same as the
>>>> one we analyzed. Arnaldo gave me the idea of reusing the build-id
>>>> cache for that purpose, and this series is the first prototype of it.
>>>>
>>>> At the same time, Hemant has started to support SDT probes, which
>>>> also use a cache file of SDT info, so I decided to merge this into
>>>> the same build-id cache.
>>>> In this version, SDT support is still very limited; it works
>>>> as a part of probe-cache.
>>>>
>>>> In this version, perf probe supports a --cache option, which makes
>>>> perf probe manipulate probe caches. For example,
>>>>
>>>> # perf probe --cache --add "probe-desc"
>>>>
>>>> not only adds probe events but also stores "probe-desc" and its
>>>> result in the cache. (Note that cached entries are always consulted,
>>>> even without --cache.)
>>>> The --list and --del commands also support --cache. Note that with
>>>> --cache, both only manipulate caches, not real events.
>>>>
>>>> To use SDT, we first have to scan the target binary with
>>>> perf-buildid-cache, e.g.
>>>>
>>>> # perf buildid-cache --add /lib/libc-2.17.so
>>>>
>>>> Then perf probe --cache --list shows which SDTs were scanned.
>>>>
>>>> # perf probe --cache --list
>>>> /usr/lib/libc-2.17.so (a6fb821bdf53660eb2c29f778757aef294d3d392):
>>>> libc:setjmp=setjmp
>>>> libc:longjmp=longjmp
>>>> libc:longjmp_target=longjmp_target
>>>> libc:memory_heap_new=memory_heap_new
>>>> libc:memory_sbrk_less=memory_sbrk_less
>>>> libc:memory_arena_reuse_free_list=memory_arena_reuse_free_list
>>>> libc:memory_arena_reuse=memory_arena_reuse
>>>> ...
>>>>
>>>> To use the SDT events, perf probe -x BIN %SDTEVENT allows you to
>>>> add a probe on SDTEVENT@BIN.
>>>>
>>>> # perf probe -x /lib/libc-2.17.so %memory_heap_new
>>>>
>>>> If you define a cached probe with an event name, you can also reuse
>>>> it in the same way as SDT events.
>>>>
>>>> # perf probe -x ./perf --cache -n 'myevent=dso__load $params'
>>>>
>>>> (Note that the "-n" option only updates caches.)
>>>> To use the above "myevent", you just have to add "%myevent".
>>>>
>>>> # perf probe -x ./perf %myevent
>>>>
>>>>
>>>> TODOs:
>>>> - Show available cached/SDT events in perf-list
>>>> - Allow perf-record to use cached/SDT events directly
>>> As I was already working on recording SDT events
>>> (https://lkml.org/lkml/2014/11/2/73),
>>> I can re-spin the patches on top of your patchset and make the
>>> required changes to implement the above TODOs.
>> Sounds great! :)
>> Note that you'll need to re-implement it almost from scratch, since
>> SDT is now implemented on top of the buildid-cache. I may also have to
>> do more work on the buildid-cache to filter out binaries which are gone
>> or differ in version from the currently running one (e.g. an old vmlinux).
>> That could help you get the available SDTs when showing them via perf-list.
>
> Sure. That would be great.
>
>>> What would you suggest?
>> Now I'm thinking that we should avoid the %event syntax in perf-list
>> and perf-record to avoid confusion. For example, suppose that we have
>> a "libfoo:bar" SDT event. When we have just scanned the libfoo binary and
>> use it via perf-record, we'll run perf record -e "%libfoo:bar".
>> However, after we set the probe via perf-probe, we have to run
>> perf record -e "libfoo:bar". That difference doesn't look good.
>> So I think it should accept the -e "libfoo:bar" syntax in both cases.
>
> Although I agree with having "perf record" as a higher-level tool that
> doesn't need to distinguish between kinds of events, that way we end up
> looking into kprobe_events, uprobe_events, kernel tracepoints and then the
> entire cache for any event lookup (which may or may not be an SDT event, or
> even a valid event). Right?
Yeah, right.

>
> The idea behind '%' was to identify the SDT events and take a different path
> to look them up in the cache, put a probe, record, and then delete the probe.
> Or do you want "perf record" to record any event this way (not just an SDT
> event)?

I see, but I don't think that is a good idea, for the following reasons:

- When we record an event with "-e %provider:event", it will be shown as
  "provider:event".
- If perf-list shows the SDT (cached) events as "%provider:event", that
  will not match the recorded result.
- It is somewhat fragile to temporarily add the SDT event and remove it
  after recording, because the event is not hidden from ftrace users (this
  means that removing the event will fail with -EBUSY if someone is using it
  via ftrace).
- If we set SDT events via perf-probe, they will be shown with the
  "provider:event" name, because "%" is rejected by ftrace. In that case,
  how should perf-list show those events: as %provider:event, as
  provider:event, or both?

That is why I kept "%" only as a "special remembering mark" for perf-probe
to look up the event in the cache.

So I'd like to suggest the following behavior:

1) perf-list shows the named cached events and SDT events as Tracepoint
   events, even if they are not probed yet.

# perf list
List of pre-defined events (to be used in -e):
...
  libc:memory_heap_new                               [Tracepoint event]
...
  probes:myevent                                     [Tracepoint event]
...

2) perf-record -e with a not-yet-probed event should try to set up the
   given probe by using perf-probe. It is possible to remove the probe after
   recording, but a removal failure with -EBUSY should simply be ignored.
   (Either way, there is no difference for users.)

This rule resolves the contradiction between the event names in the
recorded data and the listed events.

However, as we discussed, there are other clashes.

A) Clash among binaries:
   Since binary builders can freely choose the provider name, one binary's
   SDTs can clash with another binary's SDTs.
B) Clash among different versions:
   Of course, different versions of a binary can co-exist on the system.
   Those usually have the same SDTs and the same basename, just different
   build-ids.

These issues are not solved by using "%", because they happen among the
SDTs themselves. So we need to find another way to distinguish the SDTs.

Thank you,

>
> Please correct me if I missed something.
>
>> In this series I've introduced the %event syntax only to recall a cached
>> event setting explicitly, because perf-probe is a lower-layer tool for
>> setting up new events. IMO, perf-list and perf-record should be
>> higher-level tools which handle abstract events.
>>
>> Thanks!
>>
>>
>


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu.pt@hitachi.com