From: Ian Rogers
Date: Mon, 1 May 2023 08:29:55 -0700
Subject: Re: [PATCH v3 03/46] perf stat: Introduce skippable evsels
To: "Liang, Kan", Weilin Wang
Cc: Arnaldo Carvalho de Melo, Ahmad Yasin, Peter Zijlstra, Ingo Molnar,
    Stephane Eranian, Andi Kleen, Perry Taylor, Samantha Alt,
    Caleb Biggers, Edward Baker, Mark Rutland, Alexander Shishkin,
    Jiri Olsa, Namhyung Kim, Adrian Hunter, Florian Fischer, Rob Herring,
    Zhengjun Xing, John Garry, Kajol Jain, Sumanth Korikkar,
    Thomas Richter, Tiezhu Yang, Ravi Bangoria, Leo Yan, Yang Jihong,
    James Clark, Suzuki Poulouse, Kang Minchul, Athira Rajeev,
    linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230429053506.1962559-1-irogers@google.com>
    <20230429053506.1962559-4-irogers@google.com>

On Mon, May 1, 2023 at 7:56 AM Liang, Kan wrote:
>
> On 2023-04-29 1:34 a.m., Ian Rogers wrote:
> > Perf stat with no arguments will use default events and metrics. These
> > events may fail to open even with kernel and hypervisor disabled. When
> > these fail then the permissions error appears even though they were
> > implicitly selected. This is particularly a problem with the automatic
> > selection of the TopdownL1 metric group on certain architectures like
> > Skylake:
> >
> > '''
> > $ perf stat true
> > Error:
> > Access to performance monitoring and observability operations is limited.
> > Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> > access to performance monitoring and observability operations for processes
> > without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> > More information can be found at 'Perf events and tool security' document:
> > https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> > perf_event_paranoid setting is 2:
> >   -1: Allow use of (almost) all events by all users
> >       Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> > >= 0: Disallow raw and ftrace function tracepoint access
> > >= 1: Disallow CPU event access
> > >= 2: Disallow kernel profiling
> > To make the adjusted perf_event_paranoid setting permanent preserve it
> > in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = )
> > '''
> >
> > This patch adds skippable evsels that when they fail to open won't
> > cause termination and will appear as "" in output. The
> > TopdownL1 events, from the metric group, are marked as skippable. This
> > turns the failure above to:
> >
> > '''
> > $ perf stat perf bench internals synthesize
> > Computing performance of single threaded perf event synthesis by
> > synthesizing events on the perf process itself:
> >   Average synthesis took: 49.287 usec (+- 0.083 usec)
> >   Average num. events: 3.000 (+- 0.000)
> >   Average time per event 16.429 usec
> >   Average data synthesis took: 49.641 usec (+- 0.085 usec)
> >   Average num. events: 11.000 (+- 0.000)
> >   Average time per event 4.513 usec
> >
> >  Performance counter stats for 'perf bench internals synthesize':
> >
> >       1,222.38 msec task-clock:u                         #    0.993 CPUs utilized
> >              0      context-switches:u                   #    0.000 /sec
> >              0      cpu-migrations:u                     #    0.000 /sec
> >            162      page-faults:u                        #  132.529 /sec
> >    774,445,184      cycles:u                             #    0.634 GHz  (49.61%)
> >  1,640,969,811      instructions:u                       #     2.12 insn per cycle  (59.67%)
> >    302,052,148      branches:u                           #  247.102 M/sec  (59.69%)
> >      1,807,718      branch-misses:u                      #    0.60% of all branches  (59.68%)
> >      5,218,927      CPU_CLK_UNHALTED.REF_XCLK:u          #    4.269 M/sec
> >                                                          #     17.3 % tma_frontend_bound
> >                                                          #     56.4 % tma_retiring
> >                                                          #      nan % tma_backend_bound
> >                                                          #      nan % tma_bad_speculation  (60.01%)
> >    536,580,469      IDQ_UOPS_NOT_DELIVERED.CORE:u        #  438.965 M/sec  (60.33%)
> >                     INT_MISC.RECOVERY_CYCLES_ANY:u
> >      5,223,936      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u #    4.274 M/sec  (40.31%)
> >    774,127,250      CPU_CLK_UNHALTED.THREAD:u            #  633.297 M/sec  (50.34%)
> >  1,746,579,518      UOPS_RETIRED.RETIRE_SLOTS:u          #    1.429 G/sec  (50.12%)
> >  1,940,625,702      UOPS_ISSUED.ANY:u                    #    1.588 G/sec  (49.70%)
> >
> >        1.231055525 seconds time elapsed
> >
> >        0.258327000 seconds user
> >        0.965749000 seconds sys
> >
> Which branch is this patch series based on?
>
> I still cannot get the same output as the examples.
>
> I'm using the latest perf-tools-next (The latest commit ID is
> 5d27a645f609 ("perf tracepoint: Fix memory leak in is_valid_tracepoint()")).
> I only applied patch 2 and patch 3, since the patch 1 is already merged.
>
> It's a single socket Cascade Lake. with kernel 5.19-8.
> $ uname -r
> 5.19.8-100.fc35.x86_64
>
> As you can see, all the topdown related events are displayed twice.
>
> With root permission,
>
> $ sudo ./perf stat perf bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
>   Average synthesis took: 91.487 usec (+- 0.050 usec)
>   Average num. events: 47.000 (+- 0.000)
>   Average time per event 1.947 usec
>   Average data synthesis took: 97.720 usec (+- 0.059 usec)
>   Average num. events: 245.000 (+- 0.000)
>   Average time per event 0.399 usec
>
>  Performance counter stats for 'perf bench internals synthesize':
>
>       2,077.81 msec task-clock                           #    0.998 CPUs utilized
>            466      context-switches                     #  224.274 /sec
>              4      cpu-migrations                       #    1.925 /sec
>            775      page-faults                          #  372.988 /sec
>  9,561,957,326      cycles                               #    4.602 GHz  (31.17%)
> 24,466,854,021      instructions                         #     2.56 insn per cycle  (37.42%)
>  5,547,892,196      branches                             #    2.670 G/sec  (37.48%)
>     37,880,526      branch-misses                        #    0.68% of all branches  (37.52%)
>     49,576,109      CPU_CLK_UNHALTED.REF_XCLK            #   23.860 M/sec
>                                                          #     59.9 % tma_retiring
>                                                          #      4.6 % tma_bad_speculation  (37.47%)
>    228,406,003      INT_MISC.RECOVERY_CYCLES_ANY         #  109.926 M/sec  (37.52%)
>     49,591,815      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE   #   23.867 M/sec  (24.99%)
>  9,553,472,893      CPU_CLK_UNHALTED.THREAD              #    4.598 G/sec  (31.25%)
> 22,893,372,651      UOPS_RETIRED.RETIRE_SLOTS            #   11.018 G/sec  (31.23%)
> 24,180,375,299      UOPS_ISSUED.ANY                      #   11.637 G/sec  (31.25%)
>     49,562,300      CPU_CLK_UNHALTED.REF_XCLK            #   23.853 M/sec
>                                                          #     28.1 % tma_frontend_bound
>                                                          #      7.2 % tma_backend_bound  (31.24%)
> 10,735,205,084      IDQ_UOPS_NOT_DELIVERED.CORE          #    5.167 G/sec  (31.30%)
>    228,798,426      INT_MISC.RECOVERY_CYCLES_ANY         #  110.115 M/sec  (25.04%)
>     49,559,962      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE   #   23.852 M/sec  (25.00%)
>  9,538,354,333      CPU_CLK_UNHALTED.THREAD              #    4.591 G/sec  (31.29%)
> 24,207,967,071      UOPS_ISSUED.ANY                      #   11.651 G/sec  (31.24%)
>
>        2.082670856 seconds time elapsed
>
>        0.812763000 seconds user
>        1.252387000 seconds sys

The events are displayed twice because there are 2 groups of events.
This is changed by:
https://lore.kernel.org/lkml/20230429053506.1962559-5-irogers@google.com/
where the events are no longer grouped.

> With non-root, nothing is counted for the topdownL1 events.
>
> $ ./perf stat perf bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
>   Average synthesis took: 91.852 usec (+- 0.139 usec)
>   Average num. events: 47.000 (+- 0.000)
>   Average time per event 1.954 usec
>   Average data synthesis took: 96.230 usec (+- 0.046 usec)
>   Average num. events: 245.000 (+- 0.000)
>   Average time per event 0.393 usec
>
>  Performance counter stats for 'perf bench internals synthesize':
>
>       2,051.95 msec task-clock:u                         #    0.997 CPUs utilized
>              0      context-switches:u                   #    0.000 /sec
>              0      cpu-migrations:u                     #    0.000 /sec
>            765      page-faults:u                        #  372.816 /sec
>  3,601,662,523      cycles:u                             #    1.755 GHz  (16.72%)
>  9,241,811,003      instructions:u                       #     2.57 insn per cycle  (33.43%)
>  2,238,848,485      branches:u                           #    1.091 G/sec  (50.06%)
>     19,966,181      branch-misses:u                      #    0.89% of all branches  (66.77%)
>                     CPU_CLK_UNHALTED.REF_XCLK:u
>                     INT_MISC.RECOVERY_CYCLES_ANY:u
>                     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
>                     CPU_CLK_UNHALTED.THREAD:u
>                     UOPS_RETIRED.RETIRE_SLOTS:u
>                     UOPS_ISSUED.ANY:u
>                     CPU_CLK_UNHALTED.REF_XCLK:u
>                     IDQ_UOPS_NOT_DELIVERED.CORE:u
>                     INT_MISC.RECOVERY_CYCLES_ANY:u
>                     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
>                     CPU_CLK_UNHALTED.THREAD:u
>                     UOPS_ISSUED.ANY:u
>
>        2.057691297 seconds time elapsed
>
>        0.766640000 seconds user
>        1.275170000 seconds sys

Nothing is counted because all the metrics are trying to share groups
that contain the event INT_MISC.RECOVERY_CYCLES_ANY, which means each
whole group ends up being not supported. Again, the patch above, which
removes the groups for TopdownL1 and TopdownL2 as requested by Weilin,
will address the issue.

Thanks,
Ian

> Thanks,
> Kan
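
For readers following the grouping discussion above, here is a toy Python model of the behaviour described in this thread. It is only a sketch under stated assumptions, not perf code: `Event`, `OpenError`, `open_groups` and the `denied` set are hypothetical names invented for illustration, and the rule "one unopenable member makes the whole group not supported" is taken from the explanation above rather than from the perf sources. It shows why an event listed in two groups is reported twice, why one failing event can blank out a whole group, and why ungrouped events fail individually.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Event:
    name: str
    skippable: bool = False  # hypothetical flag, modelled on "skippable evsels"


class OpenError(Exception):
    """Raised when a non-skippable event cannot be opened."""


def open_groups(groups: List[List[Event]], denied: Set[str]) -> List[Tuple[str, str]]:
    """Return one (event name, status) row per event per group."""
    rows = []
    for group in groups:
        failed = [e for e in group if e.name in denied]
        for e in failed:
            if not e.skippable:
                # A failing non-skippable event terminates the run.
                raise OpenError(f"cannot open {e.name}")
        # Toy rule: a group is counted as a unit, so one failure spoils it all.
        status = "not supported" if failed else "counted"
        rows.extend((e.name, status) for e in group)
    return rows


RECOVERY = Event("INT_MISC.RECOVERY_CYCLES_ANY", skippable=True)
ISSUED = Event("UOPS_ISSUED.ANY", skippable=True)
RETIRED = Event("UOPS_RETIRED.RETIRE_SLOTS", skippable=True)

# Assumption for illustration: only this event is refused for the current user.
denied = {"INT_MISC.RECOVERY_CYCLES_ANY"}

# Two groups sharing events: every event in them is listed once per group.
grouped = open_groups([[RECOVERY, ISSUED, RETIRED], [RECOVERY, ISSUED]], denied)

# No grouping: only the refused event is reported as not supported.
ungrouped = open_groups([[RECOVERY], [ISSUED], [RETIRED]], denied)

for name, status in grouped + ungrouped:
    print(f"{name:32s} {status}")
```

In this model the grouped run reports five rows, all not supported, while the ungrouped run still counts UOPS_ISSUED.ANY and UOPS_RETIRED.RETIRE_SLOTS. That mirrors the motivation given above for dropping the TopdownL1 and TopdownL2 groups.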