Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp457489ybx; Mon, 4 Nov 2019 23:53:58 -0800 (PST) X-Google-Smtp-Source: APXvYqzM8trPLlVorOG3Pa9yE9LN5kktyL+pA2B9y/hlmwFRiy/DcoCPQcbLvAiyNN0TG/NNKfcI X-Received: by 2002:a17:906:780e:: with SMTP id u14mr27363370ejm.97.1572940438195; Mon, 04 Nov 2019 23:53:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1572940438; cv=none; d=google.com; s=arc-20160816; b=mnf08dpgWNrGngl0hT0XhTUtfwGJm628c4cjAuBReYPFXDIqeG6qvIJFB5us3nNi1t VGos9oRPmUyJPjbGot0a+2N4f0JCAmVGWdUULkLz0JjdvzsIWkWhp5U0u6muPvIypMBa 1Px0vrkcCUROgZnMSrs1QdznQqONXt0n5hbKqJ4I5l/e2Jg8mGQAhQNCWRh4BbemJS6b 9zB/n6soEvduXiIDp23sa3N/e2IfzZHnMxTZ58eVM72iIk/H6N72U6f2Oe1QVjS6ni0D 7wtaDnhi26xiv/ROlU5CNQu3tBbxwYuni4Ids1LdpTsTBWa2U3KC/en4b84Nm4l4EKM7 3TAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=Shlk83dGXmQmbc3JeFMg3ubBotUIqA4znV7O9D03zkU=; b=DHTIo4jcXgd/uuMXesyKoDqg8y+t1+Li13f5O3zELLOUMZ8xEQ+2jLOquT9mtBM0p4 cMsLZyr2HAbFRDB75+ZxWRVcBp9FjI1YnKJj+KcGSvckEku8iRVy1wX7KnGvK89EwVjI kpLkGM5Z7OLqHmFeEFCBmKVucHSko8JWxaOY79iCbY4Za5JIlCF+K1WGba4v6GYHG48O gq6vvlUmJVk0kpyw057Cl3SwqzZOzbBK4ZggJlI9iyqNzwjgmWe6I/n89ACrckuFvDYZ Eg+M/hAMc/JprR+C+HjbY8oFUDOZq7Z6waflbm8nSN/8ubOopFf+E3HiIk6XMmNAJWOK bjOA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w26si2611359eju.48.2019.11.04.23.53.34; Mon, 04 Nov 2019 23:53:58 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387903AbfKEHvn (ORCPT + 99 others); Tue, 5 Nov 2019 02:51:43 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:6152 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2387711AbfKEHvm (ORCPT ); Tue, 5 Nov 2019 02:51:42 -0500 Received: from DGGEMS406-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 2FCA26BAC3E25437A584; Tue, 5 Nov 2019 15:51:38 +0800 (CST) Received: from [127.0.0.1] (10.74.221.148) by DGGEMS406-HUB.china.huawei.com (10.3.19.206) with Microsoft SMTP Server id 14.3.439.0; Tue, 5 Nov 2019 15:51:29 +0800 Subject: Re: [RFC] About perf-mem command support on arm64 platform To: Will Deacon References: <74f8ddb5-13cc-5dce-82a6-ca8bd02f8175@hisilicon.com> <20191104142654.GA24609@willie-the-truck> CC: , , Jiri Olsa , Arnaldo Carvalho de Melo , Mark Rutland , , , , Jonathan Cameron From: Shaokun Zhang Message-ID: <76664d1a-cc8f-cd27-bc04-ddc687880b1f@hisilicon.com> Date: Tue, 5 Nov 2019 15:51:29 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <20191104142654.GA24609@willie-the-truck> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.74.221.148] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Will, Thanks your reply firstly. On 2019/11/4 22:26, Will Deacon wrote: > On Mon, Nov 04, 2019 at 05:18:00PM +0800, Shaokun Zhang wrote: >> perf-mem is used to profile memory access which has been implemented on x86 >> platform. It needs mem-stores events and mem-loads/load-latency. >> For mem-stores events, it is MEM_INST_RETIRED_ALL_STORES whose raw number >> is r82d0, and mem-loads/load-latency is from PEBS if I follow its code. >> >> Now, for some arm64 cores, like HiSilicon's tsv110 and ARM's Neoverse N1, >> has supported the SPE(Statistical Profiling Extensions), so is it a >> possibility that perf-mem is supported on arm64? >> https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1 > > I don't understand the relationship you're trying to draw between mem-stores There may be some misunderstanding if I don't describe it correctly. From the implementation of perf-mem on x86, it needs: a. mem-stores PMU events; b. mem-loads/load-latency from PEBS; If arm64 plans to support perf-mem, we need to support mem-stores and mem-loads/load-latency, and we can derive the latter from SPE. > and SPE. How does perf-mem work and what does it actually require from the > CPU? An excellent question, I don't check the perf-mem carefully. Just from my understanding, it needs the mentioned events and PEBS sampled data that is filtered by desired latency for loads event. > > One thing that may be worth noting is that SPE isn't generally able to > capture information about all instructions being executed by the CPU: Got it and I have used SPE on Huawei Kunpeng 920 SoC. > instead, it instructions (most likely micro-ops) are sampled based on > some user-specified period. The CPU advertises a minimum recommended Ok, If I follow it right, perf record -c XXX to define the period for SPE. > period which we expose under /sys and enforce when programming events. > >> For arm64 PMU, it has 'st_retired' event that the event number is 0x0007 >> which is equal to mem-stores on x86, if we want support perf-mem, it seems >> that 'st_retired' shall be replaced by 'mem-stores' >> in arch/arm64/kernel/perf_event.c file. Of course, the cpu core should >> support st_retired event. I'm not sure Will/Mark are happy on this.;-) >> >> For mem-loads/load-latency, we can derive them from SPE sampled data which >> supports by load_filter and min_latency in SPE driver. and we may do some >> work on tools/perf/builtin-mem.c. > > I don't see how you could reconcile the sampling nature of SPE with a > CPU PMU counter, particularly as filtering in SPE happens /after/ sampling. > Jiri, can you give some implementations of perf-mem on mem-stores and PEBS please? >> From the above conditions, it seems that we may have the opportunity to >> support the perf-mem command on arm64. >> I'm not very sure about it, so I send this RFC and any comments are welcome. > > I don't think there's enough information here to comment meaningfully more > than SPE != PEBS. We can get load-latency from SPE now and want to throw the thoughts whether we should do perf-mem on arm64. Thanks, Shaokun > > Will > > . >