Message-ID: <74d26daa-69cb-41bf-5a33-229c95521536@linux.alibaba.com>
Date: Tue, 22 Nov 2022 15:11:35 +0800
Subject: Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
To: James Clark, nick Forrington, Jumana MP, John Garry, Ian Rogers
Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
References: <1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com> <107dda1a-6053-ea35-1e29-96ee6d049eb1@arm.com>
From: Jing Zhang
In-Reply-To: <107dda1a-6053-ea35-1e29-96ee6d049eb1@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/11/21 7:51 PM, James Clark wrote:
>
>
> On 16/11/2022 15:26, Jing Zhang wrote:
>>
>>
>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>
>>>
>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>> This series adds six metricgroups for neoverse-n2, among which the
>>>> formula of topdown L1 is from the document:
>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>
>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>>> help further analysis of performance bottlenecks.
>>>>
>>>
>>> Hi Jing,
>>>
>>> Thanks for working on this, these metrics look ok to me in general,
>>> although we're currently working on publishing standardised metrics
>>> across all new cores as part of a new project in Arm. This will include
>>> N2, and our ones are very similar (or almost identical) to yours,
>>> barring slightly different group names, metric names, and differences in
>>> things like outputting topdown metrics as percentages.
>>>
>>> We plan to publish our standard metrics some time in the next 2 months.
>>> Would you consider holding off on merging this change so that we have
>>> consistent group names and units going forward? Otherwise N2 would be
>>> the odd one out. I will send you the metrics when they are ready, and we
>>> will have a script to generate perf jsons from them, so you can review.
>>>
>>
>> Do you mean that after you release the new standard metrics, I remake my
>> patch referring to them, such as consistent group names and unit?
>
> Hi Jing,
>
> I was planning to submit the patch myself, but there will be a script to
> generate perf json files, so no manual work would be needed. Although
> this is complicated by the fact that we won't be publishing the fixed
> TopdownL1 metrics that you have for the existing N2 silicon so there
> would be a one time copy paste to fix that part.
>
>>
>>
>>> We also have a slightly different formula for one of the top down
>>> metrics which I think would be slightly more accurate. We don't have
>>
>>
>> The v2 version of the patchset updated the formula of topdown L1.
>> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>>
>> The formula of the v2 version is more accurate than v1, and it has been
>> verified in our test environment. Can you share your formula first and we
>> can discuss it together? :)
>
> I was looking at v2 but replied to the root of the thread by mistake. I
> also had it the wrong way round. So your version corrects for the errata
> on the current version of N2 (as you mentioned in the commit message).
> Our version would be if there is a future new silicon revision with that
> fixed, but it does have an extra improvement by subtracting the branch
> mispredicts.
>
> Perf doesn't currently match the jsons based on silicon revision, so
> we'd have to add something in for that if a fixed silicon version is
> released. But this is another problem for another time.
>

Hi James,

Let's do what Ian said, and you can improve it later with the standard
metrics, after the fixed silicon version is released.

> This is the frontend bound metric we have for future revisions:
>
> "100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED *
> 4)/CPU_CYCLES) )"
>
> Other changes are, for example, your 'wasted' metric, we have
> 'bad_speculation', and without the cycles subtraction:
>
> 100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES *
> 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )
>

Thanks for sharing your version of the metrics. But I still wonder: is
BR_MIS_PRED not classified as frontend bound? How do you judge the extra
improvement from subtracting branch mispredicts?

> And some more details filled in around the units, for example:
>
> {
>     "MetricName": "bad_speculation",
>     "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 -
> (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
>     "BriefDescription": "Bad Speculation",
>     "PublicDescription": "This metric is the percentage of total
> slots that executed operations and didn't retire due to a pipeline
> flush.\nThis indicates cycles that were utilized but inefficiently.",
>     "MetricGroup": "TopdownL1",
>     "ScaleUnit": "1percent of slots"
> },
>

My "wasted" metric was renamed to follow the description in the Arm
documentation; it was originally "bad_speculation". I will change
"wasted" back to "bad_speculation" if you wish.

Thanks,
Jing

> So ignoring the errata issue, the main reason to hold off is for
> consistency and churn because these metrics in this format will be
> released for all cores going forwards.
>
> Thanks
> James
>
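
The two expressions quoted above can be checked by hand with a small
Python sketch. It only transcribes the quoted formulas; the function
names and the counter values are made up for illustration and do not
come from real N2 hardware. The constants 5 and 4 are taken as-is from
the quoted expressions.

```python
def frontend_bound(stall_slot_frontend, br_mis_pred, cpu_cycles):
    """Quoted frontend bound expression, as a percentage of slots."""
    return 100 * (
        (stall_slot_frontend / (cpu_cycles * 5))
        - ((br_mis_pred * 4) / cpu_cycles)
    )


def bad_speculation(op_retired, op_spec, stall_slot, br_mis_pred, cpu_cycles):
    """Quoted bad_speculation expression, as a percentage of slots."""
    return 100 * (
        ((1 - (op_retired / op_spec)) * (1 - (stall_slot / (cpu_cycles * 5))))
        + ((br_mis_pred * 4) / cpu_cycles)
    )


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to make the arithmetic easy.
    counters = {
        "CPU_CYCLES": 1000,
        "STALL_SLOT_FRONTEND": 1000,
        "STALL_SLOT": 2000,
        "BR_MIS_PRED": 10,
        "OP_RETIRED": 900,
        "OP_SPEC": 1000,
    }
    fe = frontend_bound(
        counters["STALL_SLOT_FRONTEND"],
        counters["BR_MIS_PRED"],
        counters["CPU_CYCLES"],
    )
    bs = bad_speculation(
        counters["OP_RETIRED"],
        counters["OP_SPEC"],
        counters["STALL_SLOT"],
        counters["BR_MIS_PRED"],
        counters["CPU_CYCLES"],
    )
    print(f"frontend_bound:  {fe:.2f}% of slots")
    print(f"bad_speculation: {bs:.2f}% of slots")
```

With these made-up counts, the same (BR_MIS_PRED * 4)/CPU_CYCLES term
(4 percentage points here) is subtracted from frontend bound and added
to bad_speculation, giving roughly 16% and 10%.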