Date: Fri, 5 Jan 2024 12:31:38 +0000
From: Mark Rutland
To: Arnaldo Carvalho de Melo
Cc: "Liang, Kan", Ian Rogers, Namhyung Kim, maz@kernel.org, marcan@marcan.st, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH] perf top: Use evsel's cpus to replace user_requested_cpus
References: <20231208210855.407580-1-kan.liang@linux.intel.com> <07677ab2-c29b-499b-b473-f7535fb27a8c@linux.intel.com>

On Fri, Dec 15, 2023 at 02:49:14PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Dec 15, 2023 at 04:51:49PM +0000, Mark Rutland wrote:
> > On Fri, Dec 15, 2023 at 12:36:10PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland wrote:
> > > > On ARM it'll be essentially the same as on x86: if you open an event with
> > > > type==PERF_TYPE_HARDWARE (without the extended HW type pointing to a
> > > > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever
> > > > happens to be found by perf_init_event() when iterating over the 'pmus' list.
> > > >
> > > > If you open an event with type==PERF_TYPE_HARDWARE and cpu!=-1, the event
> > > > will be opened on the appropriate CPU PMU, by virtue of being rejected by others
> > > > when perf_init_event() iterates over the 'pmus' list.
> > >
> > > The way that it is working now on my Intel hybrid system, with Kan's
> > > patch, is equivalent to using this on the RK3399pc board I have:
> > >
> > > root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P
> > >
> > > Wouldn't it be better to make 'perf top' on ARM work the way it is being done
> > > on x86 now, i.e. default to opening the two events, one per PMU, and
> > > allow the user to switch back and forth using the TUI/stdio?
> >
> > TBH, for perf top I don't know *which* behaviour is preferable, but I agree
> > that it'd be good for x86 and arm to work in the same way.
>
> Right, reducing the difference in the user experience across arches.
>
> > For design-cleanliness and consistency with other commands I can see that
> > opening those separately is probably for the best, but for typical usage of
> > perf top it's really nice to have those presented together without having to
> > tab back-and-forth between the distinct PMUs, and so the existing behaviour of
>
> Humm, so you would want two histogram viewers, one for each PMU, side by
> side?

I had meant an aggregated view (the same as what you'd get if you opened one
plain PERF_TYPE_HARDWARE event per cpu); I hadn't considered side-by-side views.

To be clear, I'm personally happy to tab between per-pmu views, and if that's
how x86 works today for heterogeneous PMUs, I'm fine with the same for
arm/arm64. I was trying to say that I didn't have a strong preference.

> > using CPU-bound PERF_TYPE_HARDWARE events is arguably nicer for the user.
>
> So, on ARM64, start the following 'perf trace' session, then run the
> stock 'perf top':
>
> root@roc-rk3399-pc:~# perf trace -e perf_event_open
>
> 535.764 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 19
> 535.783 ( 0.067 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 28
> 535.854 ( 0.063 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 29
> 535.920 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 30
> 535.939 ( 0.016 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 4, group_fd: -1, flags: FD_CLOEXEC) = 31
> 535.959 ( 0.011 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 5, group_fd: -1, flags: FD_CLOEXEC) = 32
>
> root@roc-rk3399-pc:~# grep "CPU part" /proc/cpuinfo | uniq -c
>       4 CPU part : 0xd03
>       2 CPU part : 0xd08
> root@roc-rk3399-pc:~#
>
> It is already doing what you suggest, right? PERF_TYPE_HARDWARE, one
> counter per CPU, maps to armv8_cortex_a72/cycles/P and
> armv8_cortex_a53/cycles/P.

Sounds like it; as above I'm happy for that to change to per-PMU views.

> One thing I'm thinking is that we should split this per PMU at the
> hist_entry, so that we could show how many samples/events came from each
> of them...

That sounds sensible to me.

Mark.

>
> - Arnaldo
>
> > I don't have a strong feeling either way; I'm personally happy so long as
> > explicit pmu_name/event/ events don't get silently converted into
> > PERF_TYPE_HARDWARE events, and as long as we correctly set the extended
> > HW type when we decide to use that.
> >
> > Thanks,
> > Mark.
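
For reference, a minimal sketch of what "setting the extended HW type" amounts
to, written in Python purely for illustration. The constant names and values
mirror include/uapi/linux/perf_event.h; the helper names are hypothetical, and
the PMU type id used in the usage note is made up (on a real system it would be
read from /sys/bus/event_source/devices/<pmu>/type).

```python
# Constants as defined in include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def extended_hw_config(pmu_type: int, hw_event: int) -> int:
    """Hypothetical helper: build attr.config for a PERF_TYPE_HARDWARE event
    that targets one specific PMU. The PMU type id goes in the upper 32 bits
    and the generic hardware event id stays in the lower 32 bits."""
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type id must fit in 32 bits")
    if hw_event < 0 or hw_event & ~PERF_HW_EVENT_MASK:
        raise ValueError("hardware event id must fit in the low 32 bits")
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | hw_event


def split_hw_config(config: int) -> tuple[int, int]:
    """Hypothetical helper: recover (pmu_type, hw_event) from attr.config.
    A pmu_type of 0 means no extended type was requested."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK
```

With an assumed PMU type id of 8, extended_hw_config(8, PERF_COUNT_HW_CPU_CYCLES)
gives 0x800000000, whereas the config: 0 seen in the trace above carries no PMU
type at all, which is why those events rely on the cpu argument to land on the
right PMU.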
> >
> > > Kan, I also noticed that the name of the event is:
> > >
> > >      1K cpu_atom/cycles:P/                                   ◆
> > >     11K cpu_core/cycles:P/
> > >
> > > If I try to use that on the command line:
> > >
> > > root@number:~# perf top -e cpu_atom/cycles:P/
> > > event syntax error: 'cpu_atom/cycles:P/'
> > >                      \___ Bad event or PMU
> > >
> > > Unable to find PMU or event on a PMU of 'cpu_atom'
> > >
> > > Initial error:
> > > event syntax error: 'cpu_atom/cycles:P/'
> > >                      \___ unknown term 'cycles:P' for pmu 'cpu_atom'
> > >
> > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware
> > > Run 'perf list' for a list of valid events
> > >
> > > Usage: perf top []
> > >
> > >     -e, --event   event selector. use 'perf list' to list available events
> > > root@number:~#
> > >
> > > It should be:
> > >
> > > "cpu_atom/cycles/P"
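
To make the naming problem quoted above concrete, here is a small sketch in
Python of the string transformation being asked for: moving the modifier out of
the term list so that the displayed name is something the event parser accepts.
This is only illustrative string handling under the assumption that the
displayed name has the shape 'pmu/event:modifiers/'; the function name is
hypothetical and this is not how the perf tool itself builds event names.

```python
def to_parseable_event_name(displayed: str) -> str:
    """Hypothetical helper: rewrite 'pmu/event:mods/' as 'pmu/event/mods'.
    Anything that does not match that shape is returned unchanged."""
    # Expect exactly two slashes, the second one terminating the string.
    if not displayed.endswith("/") or displayed.count("/") != 2:
        return displayed
    pmu, term = displayed[:-1].split("/")
    # Split off the modifiers after the last ':' in the term.
    event, sep, mods = term.rpartition(":")
    if not sep or not event or not mods:
        return displayed
    return f"{pmu}/{event}/{mods}"
```

For the two names shown in the TUI, this turns 'cpu_atom/cycles:P/' into
'cpu_atom/cycles/P' and 'cpu_core/cycles:P/' into 'cpu_core/cycles/P', and it
leaves an already well-formed name such as 'armv8_cortex_a72/cycles/P'
untouched.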