Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp664312pxj; Thu, 10 Jun 2021 09:42:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx0DfO+2rCq9sNu5NbAyWfXnDE4zWOt6lRub4Se95HKmpkEcA5mSquSsZRWTemTJjUFUNyg X-Received: by 2002:a17:906:6bd8:: with SMTP id t24mr464460ejs.501.1623343342619; Thu, 10 Jun 2021 09:42:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623343342; cv=none; d=google.com; s=arc-20160816; b=GZoHxiJBaPGwdgEU0qMe8xMRL3tV+6hvMzWN9tZQi4DcKhZnlhz6tbHVyL7u/ruMl9 tMGPvM+t2M7GSSA2bft21ZiZ7qTFyeWrsaJ/XmpshMbZxxuZkVnyQGklBHhopygmMVBG hf+WU9Buc+kV3Y8ny4fl8MmrJGlzni2YhWQmaIIxT/k3mxqZJ3sDIqtBlPkXvjjREogE 0wgGKmYf4RAFOe1/LrauEqzghiZ8tuzcwgcGDY1pwIeorTPypgmZNEEBjI1un6+ZSXV+ zXfmLdhVQ0JhDxjltiSGl5/wsfOe4sT6vvVir62tdgXSXnJOoHxqnaALtjeZUmht0uGK 9/ww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=3NrbkZ4gaNm3FYjrbDCPAtrIiFQJBVFjNcUyHj2TZc0=; b=qHmL+6MnQwWWlj1U9MQ2ye2Wdm1MEgSdumNx9+KI3cJNM9GHp3kvztPg4gR/nn9dRG sdHIBB1DLJBEVYMbyEL1F+xM9qu0BKUA62FOKxvW5MJKuyGmVhREn1QEpyDGhJOKnOmH /672IAPb+Hl4k+fUAWIVfIae4by0h+Esj2MjAFcCeJR0WOJZZ+VsrrUg6jdKUCpwEvyw vpjuevuMP7gGhC8QyBuQpA71AE5wNhiRuf0WxeWYcY+OEOieupxFguHj/0r7UIRkcx3i RIjvLmwHnQ7/4rwpTd7bJgAaM96wgyEWtSqyWPoxfWyamr6OlxkBC9O2TFinRYAxeq8j mIfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=vXk5Nc6z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ar13si2544902ejc.73.2021.06.10.09.41.58; Thu, 10 Jun 2021 09:42:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=vXk5Nc6z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230077AbhFJQlb (ORCPT + 99 others); Thu, 10 Jun 2021 12:41:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229823AbhFJQl3 (ORCPT ); Thu, 10 Jun 2021 12:41:29 -0400 Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39851C061574 for ; Thu, 10 Jun 2021 09:39:17 -0700 (PDT) Received: by mail-pf1-x433.google.com with SMTP id p13so2114244pfw.0 for ; Thu, 10 Jun 2021 09:39:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=3NrbkZ4gaNm3FYjrbDCPAtrIiFQJBVFjNcUyHj2TZc0=; b=vXk5Nc6zTRjyHXZeqYqRph0d3uohuXDQMuPmlDb453IXgCHpGm4VSAqhRI7mJpkDSR h0wry1rKPCMkHOR5rmdKpKbZ1uuOmBD4KRwOXwUdgv0P2kP6MuSCiW+M58AfyQXYBung +wGlGYF+4TJQ7SchzkDDuXWEXjjihL5yJXc7ETBAlJoYmVAZgpsNLwBL83e4H4pAuCO1 QrnnV5uuFoP3VnyhSDkf2J/+/0KIJqREz8BnBpk7Z6zzuNK51ZlvzQmI+lBkklpRxBaD z+Ga5H1y4XQVRij0r+jNdNCYZlkEwPekmwhw9ry29KEoa/+evjZ6GIu+0AANW8/Hv4wA yPQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=3NrbkZ4gaNm3FYjrbDCPAtrIiFQJBVFjNcUyHj2TZc0=; b=gizaiZNxlWUA18EeLYi3P2HgLCCkGVs48aekbB6fKajjEwJmxWS88aExLS1lycplJ9 OVs1V9min5QofCeyg1jgAudbIB5mt720OId3qrqoTusWfy9MwzVqNHaZ3syyZEZte6rU VwlFHbu5sGISAL7RKWZhSMrZne1l2v2efy6gAyjX1lIOGeEwY8M75HpM0cJZP6mb6tJr KFXVMArxCDB6b/eJ0gkSZIt2H+gPj+HL99z+nmHhmQuSOirKJOiDSxJzFiXpUUtoy0Lv qKS0tDirtXNrFRgRJ2+MRIinJ8QRszavqvEtKXO7o4TetmU2PzQru38j6nszjnEoxKUj 2uoQ== X-Gm-Message-State: AOAM531mDouS5ILIMhww1sH5WDA7zi46oS3lHS2loe3ykKTzlNSWtvt0 snsYkZbPE4um6te9KHrWQPjwTw== X-Received: by 2002:a63:5f4e:: with SMTP id t75mr5728782pgb.75.1623343156587; Thu, 10 Jun 2021 09:39:16 -0700 (PDT) Received: from p14s (S0106889e681aac74.cg.shawcable.net. [68.147.0.187]) by smtp.gmail.com with ESMTPSA id bo14sm8069376pjb.40.2021.06.10.09.39.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Jun 2021 09:39:15 -0700 (PDT) Date: Thu, 10 Jun 2021 10:39:13 -0600 From: Mathieu Poirier To: James Clark Cc: acme@kernel.org, coresight@lists.linaro.org, leo.yan@linaro.org, al.grant@arm.com, branislav.rankov@arm.com, denik@chromium.org, suzuki.poulose@arm.com, anshuman.khandual@arm.com, Mike Leach , John Garry , Will Deacon , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events() Message-ID: <20210610163913.GD34238@p14s> References: <20210609130421.13934-1-james.clark@arm.com> <20210610162453.GC34238@p14s> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210610162453.GC34238@p14s> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 10, 2021 at 10:24:53AM -0600, Mathieu Poirier wrote: > This patch doesn't apply on coresight next. I was able to apply this patch to Arnaldo's tree. > > On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote: > > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and > > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record. > > > > This can cause the "data has no samples!" error if the first > > PERF_RECORD_AUX record comes before the first (or any relevant) > > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder > > to access the binary data. > > > > This change pushes the start of non-timeless decoding to the very end of > > parsing the file. The PERF_RECORD_EXIT event can't be used because it > > might not exist in system-wide or snapshot modes. > > > > I have not been able to find the exact cause for the events to be > > intermittently in the wrong order in the basic scenario: > > > > perf record -e cs_etm/@tmc_etr0/u top > > > > But it can be made to happen every time with the --delay option. This is > > because "enable_on_exec" is disabled, which causes tracing to start > > before the process to be launched is exec'd. For example: > > > > perf record -e cs_etm/@tmc_etr0/u --delay=1 top > > perf report -D | grep 'AUX\|MAP' > > > > 0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 [] > > 0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 [] > > 0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 [] > > 4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top > > 4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so > > > > Another scenario in which decoding from the first aux record fails is a > > workload that forks. Although the aux record comes after 'bash', it > > comes before 'top', which is what we are interested in. For example: > > > > perf record -e cs_etm/@tmc_etr0/u -- bash -c top > > perf report -D | grep 'AUX\|MAP' > > > > 4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash > > 4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so > > 4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso] > > 0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 [] > > ... > > 0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 [] > > 4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top > > 4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so > > 4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso] > > > > A third scenario is when the majority of time is spent in a shared > > library that is not loaded at startup. For example a dynamically loaded > > plugin. > > > > Testing > > ======= > > > > Testing was done by checking if any samples that are present in the > > old output are missing from the new output. Timestamps must be > > stripped out with awk because now they are set to the last AUX sample, > > rather than the first: > > > > ./perf script $4 | awk '!($4="")' > new.script > > ./perf-default script $4 | awk '!($4="")' > default.script > > comm -13 <(sort -u new.script) <(sort -u default.script) > > > > Testing showed that the new output is a superset of the old. When lines > > appear in the comm output, it is not because they are missing but > > because [unknown] is now resolved to sensible locations. For example > > last putp branch here now resolves to libtinfo, so it's not missing > > from the output, but is actually improved: > > > > Old: > > top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps) > > top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps) > > top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown]) > > New: > > top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps) > > top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps) > > top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9) > > > > In the following two modes, decoding now works and the "data has no > > samples!" error is not displayed any more: > > > > perf record -e cs_etm/@tmc_etr0/u -- bash -c top > > perf record -e cs_etm/@tmc_etr0/u --delay=1 top > > > > In snapshot mode, there is also an improvement to decoding. Previously > > samples for the 'kill' process that was used to send SIGUSR2 were > > completely missing, because the process hadn't started yet. But now > > there are additional samples present: > > > > perf record -e cs_etm/@tmc_etr0/u --snapshot -a > > perf script > > > > stress 19380 [003] 161627.938153: 1000000 instructions:uH: aaaabb612fb4 [unknown] (/usr/bin/stress) > > kill 19644 [000] 161627.938153: 1000000 instructions:uH: ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so) > > stress 19380 [003] 161627.938153: 1000000 instructions:uH: ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so) > > > > Also tested was the round trip of 'perf inject' followed by 'perf > > report' which has the same differences and improvements. > > > > Signed-off-by: James Clark > > --- > > tools/perf/util/cs-etm.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > > index 57aea2c7fc77..ceed0b038796 100644 > > --- a/tools/perf/util/cs-etm.c > > +++ b/tools/perf/util/cs-etm.c > > @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session, > > return err; > > } > > > > + /* > > + * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We > > + * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because > > + * ETM_OPT_CTXTID is not enabled. > > + */ > > if (etm->timeless_decoding && > > event->header.type == PERF_RECORD_EXIT) > > return cs_etm__process_timeless_queues(etm, > > @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session, > > * onwards. > > */ > > etm->latest_kernel_timestamp = sample_kernel_timestamp; > > - return cs_etm__process_queues(etm); > > } > > > > return 0; > > -- > > 2.28.0 > >