Date: Mon, 4 Apr 2022 23:12:18 +0800
From: Leo Yan
To: Ali Saidi
Cc: Nick.Forrington@arm.com, acme@kernel.org,
 alexander.shishkin@linux.intel.com, andrew.kilroy@arm.com,
 benh@kernel.crashing.org, german.gomez@arm.com, james.clark@arm.com,
 john.garry@huawei.com, jolsa@kernel.org, kjain@linux.ibm.com,
 lihuafei1@huawei.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 mark.rutland@arm.com, mathieu.poirier@linaro.org, mingo@redhat.com,
 namhyung@kernel.org, peterz@infradead.org, will@kernel.org
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores
Message-ID: <20220404151218.GA898573@leoy-ThinkPad-X240s>
References:
 <20220331124425.GB1704284@leoy-ThinkPad-X240s>
 <20220403203337.18927-1-alisaidi@amazon.com>
In-Reply-To: <20220403203337.18927-1-alisaidi@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Apr 03, 2022 at 08:33:37PM +0000, Ali Saidi wrote:

[...]

> > Let me just bring up another thought of mine (sorry that my
> > suggestion keeps floating): another choice is that we set
> > ANY_CACHE as the cache level if we are not certain of the cache
> > level, and extend the snoop field to indicate the snooping logic,
> > like:
> >
> >   PERF_MEM_SNOOP_PEER_CORE
> >   PERF_MEM_SNOOP_LCL_CLSTR
> >   PERF_MEM_SNOOP_PEER_CLSTR
> >
> > It seems to me that we are doing this not only for the cache level;
> > it's more important for users to know the varying cost of the
> > different snooping logic involved.
>
> I think we've come full circle :).

Not too bad, and I learned a lot :)

> Going back to what we want to indicate to a user about the source of
> the cache line, I believe there are three things with an eye toward
> helping a user of the data improve the performance of their
> application:

Thanks a lot for the summary!

> 1. The level below them in the hierarchy it hit (L1, L2, LLC, local
> DRAM). Depending on the level this directly indicates the expense of
> the operation.
>
> 2. If it came from a peer of theirs on the same socket. I'm really of
> the opinion still that exactly which peer doesn't matter much, as it's
> a 2nd or 3rd order concern compared to: it couldn't be sourced from a
> cache level below the originating core, had to come from a local peer,
> and the request went to the lower levels and was eventually sourced
> from a peer. Why it was sourced from the peer is still almost
> irrelevant to me. If it was truly modified or the core it was sourced
> from only had permission to modify it, the snoop filter doesn't
> necessarily need to know the difference and the outcome is the same.

I think the key information delivered here is: for peer snooping, you
think there is a big cost difference between L2 cache snooping and L3
cache snooping; for L3 cache snooping, we don't care whether it's an
internal-cluster or an external-cluster snoop, and we don't have enough
info to reason about the snooping type (HIT vs HITM).

> 3. For multi-socket systems that it came from a different socket and
> there it is probably most interesting if it came from DRAM on the
> remote socket or a cache.
>
> I'm putting 3 aside for now since we've really been focusing on 1 and
> 2 in this discussion and I think the biggest hangup has been the
> definition of HIT vs HITM.

Agreed on item 3.

> If someone has a precise definition, that would be great, but AFAIK it
> goes back to the P6 bus where HIT was asserted by another core if it
> had a line (in any state) and HITM was additionally asserted if a core
> needed to inhibit another device (e.g. DDR controller) from providing
> that line to the requestor.

Thanks for sharing the info on how the bus implements HIT/HITM.

> The latter logic is why I think it's perfectly acceptable to use HITM
> to indicate a peer cache-to-cache transfer, however since others don't
> feel that way let me propose a single additional snooping type
> PERF_MEM_SNOOP_PEER that indicates some peer of the hierarchy below
> the originating core sourced the data. This clears up the definition
> that the line came from a peer and may or may not have been modified,
> but it doesn't add a lot of implementation-dependent functionality
> into the SNOOP API.
>
> We could use the mem-level to indicate the level of the cache
> hierarchy we had to get to before the snoop traveled upward, which
> seems like what x86 is doing here.

It makes sense to me to use the highest cache level as mem-level.
Please add comments in the code for this; they would be useful for
understanding the code.

> PEER_CORE -> MEM_SNOOP_PEER + L2
> PEER_CLSTR -> MEM_SNOOP_PEER + L3
> PEER_LCL_CLSTR -> MEM_SNOOP_PEER + L3 (since newer neoverse cores
> don't support the clusters and the existing commercial implementations
> don't have them).

Generally, this idea is fine with me. Following your suggestion, if we
connect it with the PoC and PoU concepts in the Arm reference manual,
we can extend the snooping mode with MEM_SNOOP_POU (for PoU) and
MEM_SNOOP_POC (for PoC), so:

  PEER_CORE      -> MEM_SNOOP_POU + L2
  PEER_LCL_CLSTR -> MEM_SNOOP_POU + L3
  PEER_CLSTR     -> MEM_SNOOP_POC + L3

It seems to me we could consider this. If this is overly complex, or
if I have gotten any of the concepts wrong, please use your method.

Thanks,
Leo
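
For reference, the two mappings discussed above can be put side by
side. The following is only a minimal sketch in Python, not kernel
code: the Snoop and Level enums, the DataSrc tuple and the
distinguishable() helper are hypothetical names made up for
illustration, and the enum values are plain strings rather than real
perf ABI values. It just records which (snoop, mem-level) pair each
SPE peer source would get under the single MEM_SNOOP_PEER type and
under the PoU/PoC variant.

```python
from enum import Enum
from typing import NamedTuple


class Snoop(Enum):
    # Illustrative labels only; these are NOT the kernel's ABI values.
    PEER = "MEM_SNOOP_PEER"  # single peer type from the quoted proposal
    POU = "MEM_SNOOP_POU"    # alternative: Point of Unification
    POC = "MEM_SNOOP_POC"    # alternative: Point of Coherency


class Level(Enum):
    L2 = "L2"
    L3 = "L3"


class DataSrc(NamedTuple):
    snoop: Snoop
    level: Level


# Single snoop type; the mem-level carries how far the request went.
SINGLE_PEER = {
    "PEER_CORE": DataSrc(Snoop.PEER, Level.L2),
    "PEER_LCL_CLSTR": DataSrc(Snoop.PEER, Level.L3),
    "PEER_CLSTR": DataSrc(Snoop.PEER, Level.L3),
}

# PoU/PoC variant; the snoop type also separates the two cluster cases.
POU_POC = {
    "PEER_CORE": DataSrc(Snoop.POU, Level.L2),
    "PEER_LCL_CLSTR": DataSrc(Snoop.POU, Level.L3),
    "PEER_CLSTR": DataSrc(Snoop.POC, Level.L3),
}


def distinguishable(mapping):
    """Return True if every source maps to a distinct (snoop, level) pair."""
    return len(set(mapping.values())) == len(mapping)


if __name__ == "__main__":
    # Print both encodings next to each other for each peer source.
    for name in SINGLE_PEER:
        a, b = SINGLE_PEER[name], POU_POC[name]
        print(f"{name:15} {a.snoop.value} + {a.level.value:3} | "
              f"{b.snoop.value} + {b.level.value}")
```

With the single peer type, the local-cluster and peer-cluster cases
collapse into the same pair, which matches the view that the exact
peer is a second-order concern; the PoU/PoC variant keeps them apart
at the cost of two new snoop types.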