From: Namhyung Kim
Date: Thu, 12 Oct 2023 09:41:57 -0700
Subject: Re: [RFC 00/48] perf tools: Introduce data type profiling (v1)
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ian Rogers, Adrian Hunter,
    Ingo Molnar, LKML, linux-perf-users@vger.kernel.org, Linus Torvalds,
    Stephane Eranian, Masami Hiramatsu, linux-toolchains@vger.kernel.org,
    linux-trace-devel@vger.kernel.org, Ben Woodard, Joe Mario, Kees Cook,
    David Blaikie, Xu Liu, Kan Liang, Ravi Bangoria
References: <20231012035111.676789-1-namhyung@kernel.org>
    <20231012091128.GL6307@noisy.programming.kicks-ass.net>
In-Reply-To: <20231012091128.GL6307@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On Thu, Oct 12, 2023 at 2:13 AM Peter Zijlstra wrote:
>
> W00t!! Finally! :-)

Yay!

>
> On Wed, Oct 11, 2023 at 08:50:23PM -0700, Namhyung Kim wrote:
>
> > * How to use it
> >
> > To get precise memory access samples, users can use `perf mem record`
> > command to utilize those events supported by their architecture. Intel
> > machines would work best as they have dedicated memory access events but
> > they would have a filter to ignore low latency loads like less than 30
> > cycles (use --ldlat option to change the default value).
> >
> >   # To get memory access samples in kernel for 1 second (on Intel)
> >   $ sudo perf mem record -a -K --ldlat=4 -- sleep 1
>
> Fundamentally this should work with anything PEBS from MEM_ as
> well, no? No real reason to rely on perf mem for this.

Correct, experienced users can choose any supported event. Right now
it doesn't even use any MEM_ (data_src) fields, but that should be
added later. BTW I think it'd be better to have an option to enable
the data src sample collection without gathering data MMAPs.

>
> > In perf report, it's just a matter of selecting new sort keys: 'type'
> > and 'typeoff'. The 'type' shows name of the data type as a whole while
> > 'typeoff' shows name of the field in the data type. I found it useful
> > to use it with --hierarchy option to group relevant entries in the same
> > level.
> >
> >   $ sudo perf report -s type,typeoff --hierarchy --stdio
> >   ...
> >   #
> >   #    Overhead  Data Type / Data Type Offset
> >   # ...........  ............................
> >   #
> >       23.95%     (stack operation)
> >          23.95%     (stack operation) +0 (no field)
> >       23.43%     (unknown)
> >          23.43%     (unknown) +0 (no field)
> >       10.30%     struct pcpu_hot
> >           4.80%     struct pcpu_hot +0 (current_task)
> >           3.53%     struct pcpu_hot +8 (preempt_count)
> >           1.88%     struct pcpu_hot +12 (cpu_number)
> >           0.07%     struct pcpu_hot +24 (top_of_stack)
> >           0.01%     struct pcpu_hot +40 (softirq_pending)
> >        4.25%     struct task_struct
> >           1.48%     struct task_struct +2036 (rcu_read_lock_nesting)
> >           0.53%     struct task_struct +2040 (rcu_read_unlock_special.b.blocked)
> >           0.49%     struct task_struct +2936 (cred)
> >           0.35%     struct task_struct +3144 (audit_context)
> >           0.19%     struct task_struct +46 (flags)
> >           0.17%     struct task_struct +972 (policy)
> >           0.15%     struct task_struct +32 (stack)
> >           0.15%     struct task_struct +8 (thread_info.syscall_work)
> >           0.10%     struct task_struct +976 (nr_cpus_allowed)
> >           0.09%     struct task_struct +2272 (mm)
> >   ...
> >
> > The (stack operation) and (unknown) have no type and field info. FYI,
> > the stack operations are samples in PUSH, POP or RET instructions which
> > save or restore registers from/to the stack. They are usually parts of
> > function prologue and epilogue and have no type info. The next is the
> > struct pcpu_hot and you can see the first field (current_task) at offset
> > 0 was accessed mostly. It's listed in order of access frequency (not in
> > offset) as you can see it in the task_struct.
> >
> > In perf annotate, new --data-type option was added to enable data
> > field level annotation. Now it only shows number of samples for each
> > field but we can improve it.
> >
> >   $ sudo perf annotate --data-type
> >   Annotate type: 'struct pcpu_hot' in [kernel.kallsyms] (223 samples):
> >   ============================================================================
> >       samples     offset       size  field
> >           223          0         64  struct pcpu_hot {
> >           223          0         64      union {
> >           223          0         48          struct {
> >            78          0          8              struct task_struct* current_task;
> >            98          8          4              int preempt_count;
> >            45         12          4              int cpu_number;
> >             0         16          8              u64 call_depth;
> >             1         24          8              long unsigned int top_of_stack;
> >             0         32          8              void* hardirq_stack_ptr;
> >             1         40          2              u16 softirq_pending;
> >             0         42          1              bool hardirq_stack_inuse;
> >                                                };
> >           223          0         64          u8* pad;
> >                                            };
> >                                        };
> >   ...
> >
> > This shows each struct one by one and field-level access info in C-like
> > style. The number of samples for the outer struct is a sum of number of
> > samples in every field in the struct. In unions, each field is placed
> > in the same offset so they will have the same number of samples.
>
> This is excellent -- and pretty much what I've been asking for forever.

Glad you like it.

>
> Would it be possible to have multiple sample columns, for eg.
> MEM_LOADS_UOPS_RETIRED.L1_HIT and MEM_LOADS_UOPS_RETIRED.L1_MISS
> or even more (adding LLC hit and miss as well etc.) ?

Yep, that should be supported. Ideally it would display samples (or
overhead) for each event in an event group. You can also force
individual events into a group at report/annotate time, but that
doesn't work well with data type profiling for now. Will fix.

>
> (for bonus points: --data-type=typename, would be awesome)

Right, will do that in the next spin.

>
> Additionally, annotating the regular perf-annotate output with data-type
> information (where we have it) might also be very useful. That way, even
> when profiling with PEBS-cycles, an expensive memop immediately gives a
> clue as to what data-type to look at.
>
> > No TUI support yet.
>
> Yeah, nobody needs that anyway :-)

I need that ;-) At least, the interactive transition between perf
report and perf annotate is really useful for me.
You should try that someday. Note that perf report TUI works well
with data types.

>
> > This can generate instructions like below.
> >
> >   ...
> >   0x123456:  mov    0x18(%rdi), %rcx
> >   0x12345a:  mov    0x10(%rcx), %rax     <=== sample
> >   0x12345e:  test   %rax, %rax
> >   0x123461:  je     <...>
> >   ...
> >
> > And imagine we have a sample at 0x12345a. Then it cannot find a
> > variable for %rcx since DWARF didn't generate one (it only knows about
> > 'bar'). Without compiler support, all it can do is to track the code
> > execution in each instruction and propagate the type info in each
> > register and stack location by following the memory access.
>
> Right, this has more or less been the 'excuse' for why doing this has
> been 'difficult' for the past 10+ years :/

I'm sure I missed some cases, but I managed to make it work for the
usual cases. We can improve it by handling more cases and
instructions, but it'd be great to have better support from the
toolchains.

>
> > Actually I found a discussion in the DWARF mailing list to support
> > "inverted location lists" and it seems a perfect fit for this project.
> > It'd be great if new DWARF would provide a way to lookup variable and
> > type info using a concrete location info (like a register number).
> >
> >   https://lists.dwarfstd.org/pipermail/dwarf-discuss/2023-June/002278.html
>
> Stephane was going to talk to tools people about this over 10 years ago
> :-)

I hope they make some progress.

>
> Thanks for *finally* getting this started!!

Yep, let's make it better!

Thanks,
Namhyung
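
To illustrate the register tracking described above (following each
instruction and propagating type info from register to register), here
is a minimal Python sketch. It is not the code from the patch series.
The struct layouts, the field names 'foo_ptr' and 'value', and the
assumption that %rdi holds a 'struct bar *' at entry are all made up
for the example, and only simple loads of the form
'mov offset(%base), %dst' are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    offset: int
    name: str
    type: str  # type name; a trailing '*' marks a pointer


# Hypothetical layouts standing in for what DWARF type info would provide.
STRUCTS = {
    "struct bar": [Field(0x18, "foo_ptr", "struct foo*")],
    "struct foo": [Field(0x10, "value", "long")],
}


def find_field(struct_name, offset):
    """Return the field at the exact offset in the struct, or None."""
    for field in STRUCTS.get(struct_name, []):
        if field.offset == offset:
            return field
    return None


def resolve(insns, initial_reg_types, sample_addr):
    """Track register types instruction by instruction.

    Each instruction is (addr, base_reg, offset, dst_reg) and models
    'mov offset(%base_reg), %dst_reg'. Returns (struct name, field) for
    the access at sample_addr, or None if the type is unknown.
    """
    regs = dict(initial_reg_types)
    for addr, base, offset, dst in insns:
        base_type = regs.get(base)
        field = None
        if base_type is not None and base_type.endswith("*"):
            field = find_field(base_type[:-1], offset)
        if addr == sample_addr:
            return (base_type[:-1], field) if field else None
        if field is not None:
            regs[dst] = field.type  # dst now holds the loaded field's type
        else:
            regs.pop(dst, None)     # dst is overwritten with an unknown type
    return None


# The two loads from the example; only %rdi has a known type at entry.
INSNS = [
    (0x123456, "rdi", 0x18, "rcx"),
    (0x12345A, "rcx", 0x10, "rax"),
]

hit = resolve(INSNS, {"rdi": "struct bar*"}, 0x12345A)
print(hit)
```

If the first load is dropped from INSNS, %rcx never receives a type and
the same sample resolves to nothing, which is the situation when only
the DWARF variable info is available.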