Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp5144790pxj; Wed, 9 Jun 2021 10:07:50 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwNtYloR8dfrNmD8IXwRoyHlj7B8aBWPMpuwRS6FFdXqIMgGH3AiAlOzD0T6DENhFaNCRs1 X-Received: by 2002:a17:906:1401:: with SMTP id p1mr828711ejc.526.1623258469802; Wed, 09 Jun 2021 10:07:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623258469; cv=none; d=google.com; s=arc-20160816; b=fQQJA73quatZOwF2QFZlZbowJMiVptD0aM1FYEbBAptUUjplxaHpCSpI++v38Pa5Bb g3BmpbObLN/zrnVPob64MbxFDZnJijoJn8pOsjEHiH4IDqtUWnFHwp+szsWENqdd/y7P Z5TEMdk9TCW9jODM2kuU4czb3QBkBQC4Mfo2gi2o0Yk5rc9Mm+k/bJQNFseubfxo6+dB jwLlJwOvNsaxjVcEam0zCuPOX+YimJzLEbspYSe996aZ6XCk5a4KknG/f/gCmXUajYFE k417WVvY84in0vWMyakGTISC1O/kcpPXY/GDvSrxoQmw2WjR5h0oj2n2MiDPYZU+mvMT RsMQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=+TUXdESrfWsLAB7ujMtiaWOI2wfBx49V15E9i1S/Dz4=; b=EMSqx9FDWhOZjk5308ilUcaRW0dogKDw/eIthYlwxuzw8X3HD4xOE7kyhQlsyUnCzz iCrH1jP/bdhK2EmcFt8ATLz27xqBxhMXpSNuo5MVzyJlViDLPv4Ik+BRxRa453aYxXso 6E20r7KiD86+TDtksjpaA9pAqZ8h+pQk7G+0DqqMMF8p+FhXp3hbfVFjv62uetgWZrno lmgbDfN7DYhJRVbW9iTzMuwPDM0NhoFHsmNEPQzHiZ/LsgXzt28Ygs7qxynoOXHHPvGG lqHlCwMnP+iQnPNLzaUAQfvhzSkVgqfoOsSsDKJvXRI2ddRxrPUqj5eGunHf58xh7QRV wRlw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Box34dRX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s3si252465ejs.220.2021.06.09.10.07.25; Wed, 09 Jun 2021 10:07:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Box34dRX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232001AbhFIGSq (ORCPT + 99 others); Wed, 9 Jun 2021 02:18:46 -0400 Received: from mail-wm1-f48.google.com ([209.85.128.48]:51764 "EHLO mail-wm1-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232973AbhFIGSp (ORCPT ); Wed, 9 Jun 2021 02:18:45 -0400 Received: by mail-wm1-f48.google.com with SMTP id l9so3259770wms.1 for ; Tue, 08 Jun 2021 23:16:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=+TUXdESrfWsLAB7ujMtiaWOI2wfBx49V15E9i1S/Dz4=; b=Box34dRXLbv4PwyoR/kYUCsIQsPyADF7ThRntyYkuOlB/4vg0VOlFp1bhTa/GFU/QX GmflTLqJDPbaEpoK7COWWuQaXadBmffOUud31umyLrC/6KNCdXw2B97d6gmCT3VG4PMf WOnSmLDGYTwBgX9mH4GM4DlXqck2lQeqpdybR+CKHNuUDdGEZD7009zD0wm8+u68AS3f 21XtzESOb4M/Xcn/AfQzAq1x6NZaYzydbtgmUsh7b3B9Ibtz1LGPk3wWnFo8sMQqia72 9SHw0wSf1jcI12jUHIIY8ucWKmT5rlzkGAgYmNTKJ65/EY0BZctQcb/+ua8kl/SkVszX I5cQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=+TUXdESrfWsLAB7ujMtiaWOI2wfBx49V15E9i1S/Dz4=; b=b4iaooqJy0HekQSpXNTKp2T2aO4VLMzInxTmqJExwZn2d3zvSjVI5zhSfhdG0HfiMK iNRoCFTm15gCMxTqixZv8erL4UH7/Rt+agLuEQvFor2xmVL7dcCQl+0xtOI07NtRQ4fE uekq+r2pqiRkor6Hls6zemPZRcExGZE4+NjDc2JJN1MG1FmUgQCmKPWsZMzeQW3h0cLb XqPMJvJjkaKgC1ycQtMbri4cqvBVVOkFkhjIDKPEQ5h2jXA28eL0cR0J7/0/J0UjoC3Z 7/lFbpZnwWjXMaadE3OMLDz+avRF+v0qvff1+D8Psb1ZORgHjLivPj5XuoV6T+X9Hh0w Wucg== X-Gm-Message-State: AOAM532ELZuvRlTtikF1Bl3I53JgLfxDU8lIhCPMAWClWhjaYlzOZmKg kBbo+AkegIW83KJaw0PhkD/G5l7nx3FmQ8oxg0MjBxyBpJ1qQG5Fv3mcvKNVd6wYPUFQyhx7uhK LxVXcdJd78crv/rFPaXyv+9k/bdwz2vs= X-Received: by 2002:a1c:2015:: with SMTP id g21mr7984981wmg.87.1623219350466; Tue, 08 Jun 2021 23:15:50 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ian Rogers Date: Tue, 8 Jun 2021 23:15:38 -0700 Message-ID: Subject: Re: perf tool: Issues with metricgroups To: John Garry Cc: Arnaldo Carvalho de Melo , Jiri Olsa , "linux-kernel@vger.kernel.org" , "linux-perf-users@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" X-ccpol: medium Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 7, 2021 at 10:21 AM John Garry wrote: > > Hi guys, > > I am finding a couple of issues in metricgroup support. Firstly I have > found a segfault (which I caused with my "fix" in commit 9c880c24cb0d), > and a fix for that is at the bottom. > > Another issue is that the ordering of the metrics we supply for stat > command causes an issue. > > On my broadwell, this works ok: > > sudo ./perf stat -vv -M backend_bound,retiring sleep 1 > Using CPUID GenuineIntel-6-3D-4 > metric expr 1 - ( (idq_uops_not_delivered.core / (4 * cycles)) + (( > uops_issued.any - uops_retired.retire_slots + 4 * > int_misc.recovery_cycles ) / (4 * cycles)) + (uops_retired.retire_slots > / (4 * cycles)) ) for Backend_Bound > found event uops_issued.any > found event cycles > found event idq_uops_not_delivered.core > found event int_misc.recovery_cycles > found event uops_retired.retire_slots > metric expr uops_retired.retire_slots / (4 * cycles) for Retiring > found event cycles > found event uops_retired.retire_slots > adding > {uops_issued.any,cycles,idq_uops_not_delivered.core,int_misc.recovery_cycles,uops_retired.retire_slots}:W,{cycles,uops_retired.retire_slots}:W > uops_issued.any -> cpu/umask=0x1,(null)=0x1e8483,event=0xe/ > idq_uops_not_delivered.core -> cpu/umask=0x1,(null)=0x1e8483,event=0x9c/ > int_misc.recovery_cycles -> > cpu/umask=0x3,(null)=0x1e8483,cmask=0x1,event=0xd/ > uops_retired.retire_slots -> cpu/umask=0x2,(null)=0x1e8483,event=0xc2/ > uops_retired.retire_slots -> cpu/umask=0x2,(null)=0x1e8483,event=0xc2/ > Control descriptor is not initialized > uops_issued.any: 1655376 547221 547221 > cycles: 1665343 547221 547221 > idq_uops_not_delivered.core: 1983394 547221 547221 > int_misc.recovery_cycles: 69571 547221 547221 > uops_retired.retire_slots: 1311124 547221 547221 > > Performance counter stats for 'sleep 1': > > 1,655,376 uops_issued.any # 0.41 > Backend_Bound > 1,665,343 cycles > > # 0.20 Retiring > > 1,983,394 idq_uops_not_delivered.core > > 69,571 int_misc.recovery_cycles > > 1,311,124 uops_retired.retire_slots > > > 1.000992470 seconds time elapsed > > 0.001031000 seconds user > 0.000000000 seconds sys > ----/--- > > But when I reorder the metrics, it fails: > > sudo ./perf stat -v -M retiring,backend_bound sleep 1 > Using CPUID GenuineIntel-6-3D-4 > metric expr uops_retired.retire_slots / (4 * cycles) for Retiring > found event cycles > found event uops_retired.retire_slots > metric expr 1 - ( (idq_uops_not_delivered.core / (4 * cycles)) + (( > uops_issued.any - uops_retired.retire_slots + 4 * > int_misc.recovery_cycles ) / (4 * cycles)) + (uops_retired.retire_slots > / (4 * cycles)) ) for Backend_Bound > found event uops_issued.any > found event cycles > found event idq_uops_not_delivered.core > found event int_misc.recovery_cycles > found event uops_retired.retire_slots > adding > {cycles,uops_retired.retire_slots}:W,{uops_issued.any,cycles,idq_uops_not_delivered.core,int_misc.recovery_cycles,uops_retired.retire_slots}:W > uops_retired.retire_slots -> cpu/umask=0x2,(null)=0x1e8483,event=0xc2/ > uops_issued.any -> cpu/umask=0x1,(null)=0x1e8483,event=0xe/ > idq_uops_not_delivered.core -> cpu/umask=0x1,(null)=0x1e8483,event=0x9c/ > int_misc.recovery_cycles -> > cpu/umask=0x3,(null)=0x1e8483,cmask=0x1,event=0xd/ > uops_retired.retire_slots -> cpu/umask=0x2,(null)=0x1e8483,event=0xc2/ > Control descriptor is not initialized > cycles: 1695223 563189 563189 > uops_retired.retire_slots: 1343463 563189 563189 > uops_issued.any: 0 563189 0 > cycles: 0 563189 0 > idq_uops_not_delivered.core: 0 563189 0 > int_misc.recovery_cycles: 0 563189 0 > uops_retired.retire_slots: 0 563189 0 > > Performance counter stats for 'sleep 1': > > 1,695,223 cycles > > # 0.20 Retiring > > 1,343,463 uops_retired.retire_slots > > uops_issued.any > (0.00%) > cycles > (0.00%) > idq_uops_not_delivered.core > (0.00%) > int_misc.recovery_cycles > (0.00%) > uops_retired.retire_slots > (0.00%) > > 1.000980001 seconds time elapsed > > 0.001028000 seconds user > 0.000000000 seconds sys > > ----/--- > > I think that it may be related to changes when introducing hashmap for > evsel (that's before I started fiddling with metricgroups). > Specifically, it looks to be in metricgroup__setup_events() -> > find_evsel_group(). We seem to be enabling wrong evsels. > > I'll continue to look at this, but any help would be appreciated. > > Thanks, > John Hi John, The fix to avoid uncore_ events being deduplicated against each other added complexity to the code and means that metric-no-group doesn't really work any more. I have it on my list of things to look at. It relates to what you are looking at as the deduplication afterward is tricky given the funny invariants on evsel names. I think it would be easier to deduplicate events before doing the event parse. It may also be good to change evsels so that they own the string for their name (this would mean uncore_imc events could have unique names and not get deduplicated against each other). The invariants around cycles in your change look weird, but I can see how it might workaround an issue. My attempts to reproduce the issue weren't successful on a SkylakeX. Thanks for reporting the issues. I planned to look at this logic to fix metric-no-group, it'd be nice to land: https://lore.kernel.org/lkml/20210112230434.2631593-1-irogers@google.com/ just so that I'm not making patch sets that conflict with myself. Ian > perf metricgroup: Fix find_evsel_group() > > The following command segfaults on my x86 broadwell: > > $ ./perf stat -M frontend_bound,retiring,backend_bound,bad_speculation > sleep 1 > WARNING: grouped events cpus do not match, disabling group: > anon group { raw 0x10e } > anon group { raw 0x10e } > perf: util/evsel.c:1596: get_group_fd: Assertion `!(!leader->core.fd)' > failed. > Aborted (core dumped) > > The issue shows itself as a use-after-free in evlist__check_cpu_maps(), > whereby the leader of an event selector (evsel) has been deleted (yet we > still attempt to verify for an evsel). > > Fundamentally the problem comes from metricgroup__setup_events() -> > find_evsel_group(), and has developed from the previous fix attempt in > commit 9c880c24cb0d ("perf metricgroup: Fix for metrics containing > duration_time"). > > The problem now is that the logic in checking if an evsel is in the same > group is subtly broken for "cycles" event. For "cycles" event, the > pmu_name is NULL; however the logic in find_evsel_group() may set an > event matched against "cycles" as used, when it should not be. > > This leads to a condition where an evsel is set, yet its leader is not. > > Fix the check for evsel pmu_name by not matching evsels when either has > a NULL pmu_name. > > Fixes: 9c880c24cb0d ("perf metricgroup: Fix for metrics containing > duration_time") > Signed-off-by: John Garry > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index 8336dd8e8098..c456fdeae06a 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -162,10 +162,10 @@ static bool contains_event(struct evsel > **metric_events, int num_events, > return false; > } > > -static bool evsel_same_pmu(struct evsel *ev1, struct evsel *ev2) > +static bool evsel_same_pmu_or_none(struct evsel *ev1, struct evsel *ev2) > { > if (!ev1->pmu_name || !ev2->pmu_name) > - return false; > + return true; > > return !strcmp(ev1->pmu_name, ev2->pmu_name); > } > @@ -288,7 +288,7 @@ static struct evsel *find_evsel_group(struct evlist > *perf_evlist, > */ > if (!has_constraint && > ev->leader != metric_events[i]->leader && > - evsel_same_pmu(ev->leader, > metric_events[i]->leader)) > + evsel_same_pmu_or_none(ev->leader, > metric_events[i]->leader)) > break; > if (!strcmp(metric_events[i]->name, ev->name)) { > set_bit(ev->idx, evlist_used); > lines 25-63/63 (END) > >