References: <20220422053401.208207-1-namhyung@kernel.org> <35121321.B44TWeBT9p@milian-workstation> <5616892.dGzqbEiDyy@milian-workstation>
In-Reply-To: <5616892.dGzqbEiDyy@milian-workstation>
From: Ian Rogers
Date: Mon, 25 Apr 2022 09:49:31 -0700
Subject: Re: [RFC 0/4] perf record: Implement off-cpu profiling with BPF (v1)
To: Milian Wolff
Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Song Liu, Hao Luo, bpf, linux-perf-users, Blake Jones, "michael@michaellarabel.com"
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 25, 2022 at 5:42 AM Milian Wolff wrote:
>
> On Friday, 22 April 2022 17:01:15 CEST Namhyung Kim wrote:
> > Hi Milian,
> >
> > On Fri, Apr 22, 2022 at 3:21 AM Milian Wolff wrote:
> > > On Friday, 22 April 2022 07:33:57 CEST Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > This is the first version of off-cpu profiling support. Together with
> > > > (PMU-based) cpu profiling, it can show holistic view of the performance
> > > > characteristics of your application or system.
> > >
> > > Hey Namhyung,
> > >
> > > this is awesome news! In hotspot, I've long done off-cpu profiling
> > > manually by looking at the time between --switch-events. The downside is
> > > that we also need to track the sched:sched_switch event to get a call
> > > stack. But this approach also works with dwarf based unwinding, and also
> > > includes kernel stacks.
> >
> > Thanks, I've also briefly thought about the switch event based off-cpu
> > profiling as it doesn't require root. But collecting call stacks is hard
> > and I'd like to do it in kernel/bpf to reduce the overhead.
>
> I'm all for reducing the overhead, I just wonder about the practicality. At
> the very least, please make sure to note this limitation explicitly to end
> users. As a preacher for perf, I have come across lots of people stumbling
> over `perf record -g` not producing any sensible output because they are
> simply not aware that this requires frame pointers which are basically non
> existing on most "normal" distributions. Nowadays `man perf record` tries to
> educate people, please do the same for the new `--off-cpu` switch.

I think documenting that off-cpu has a dependency on frame pointers
makes sense. There has been work to make LBR work:
https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
DWARF unwinding is problematic and is probably something best kept in
user land. There is also Intel's CET, which may provide an alternative
source of backtraces.
More recent Intel and AMD CPUs have techniques to turn memory
locations into registers, an approach generally called memory
renaming. There is some description here:
https://www.agner.org/forum/viewtopic.php?t=41
In LLVM there is a pass to promote memory locations into registers
called mem2reg. Having the frame pointer as an extra register will
help this pass, as there will be one more register to replace
something from memory. The memory renaming optimization is similar to
mem2reg except that it is done in the CPU's front-end. It would be
interesting to see benchmark results on modern CPUs with and without
omit-frame-pointer. My expectation is that the performance wins aren't
as great, if any, as they used to be (cc-ed Michael Larabel as I love
Phoronix and it'd be awesome if someone could do an omit-frame-pointer
shoot-out).

> > > > With BPF, it can aggregate scheduling stats for interested tasks
> > > > and/or states and convert the data into a form of perf sample records.
> > > > I chose the bpf-output event which is a software event supposed to be
> > > > consumed by BPF programs and renamed it as "offcpu-time". So it
> > > > requires no change on the perf report side except for setting sample
> > > > types of bpf-output event.
> > > >
> > > > Basically it collects userspace callstack for tasks as it's what users
> > > > want mostly. Maybe we can add support for the kernel stacks but I'm
> > > > afraid that it'd cause more overhead. So the offcpu-time event will
> > > > always have callchains regardless of the command line option, and it
> > > > enables the children mode in perf report by default.
> > >
> > > Has anything changed wrt perf/bpf and user applications not compiled with
> > > `-fno-omit-frame-pointer`? I.e. does this new utility only work for
> > > specially compiled applications, or do we also get backtraces for
> > > "normal" binaries that we can install through package managers?
> >
> > I am not aware of such changes, it still needs a frame pointer to get
> > backtraces.
>
> May I ask what kind of setup you are using this on? Do you use something like
> Gentoo or yocto where you compile your whole system with
> `-fno-omit-frame-pointer`? Because otherwise, any kind of off-cpu time in
> system libraries will not be resolved properly, no?

I agree with your point. Often in cloud environments binaries are
static blobs linking in all their dependencies. This can aid
deployment, bug compatibility, etc. Fwiw, all backtraces gathered in
Google's profiling are frame pointer based. A large motivation for
this is the security aspect of having a privileged application able to
snapshot other threads' stacks, which is what happens with DWARF-based
unwinding.

In summary, your point is that frame pointer based unwinding is
largely broken on all major distributions today, limiting the utility
of off-CPU as it is here. I agree. Memory renaming in hardware could
hopefully mean that this isn't the case in distributions in the
future. Even if it isn't, there are alternative backtrace sources like
LBR and CET that mean we can fix this in other ways.

Thanks,
Ian

> Thanks
> --
> Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt, C++ and OpenGL Experts