From: Namhyung Kim
Date: Sat, 13 Mar 2021 11:47:51 +0900
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF
To: Song Liu
Cc: linux-kernel, Kernel Team, Arnaldo Carvalho de Melo, Jiri Olsa
In-Reply-To: <4B3CF1B3-5EED-4882-BC99-AD676D4E3429@fb.com>
References: <20210312020257.197137-1-songliubraving@fb.com> <4B3CF1B3-5EED-4882-BC99-AD676D4E3429@fb.com>

On Sat, Mar 13, 2021 at 12:38 AM Song Liu wrote:
>
>
> > On Mar 12, 2021, at 12:36 AM, Namhyung Kim wrote:
> >
> > Hi,
> >
> > On Fri, Mar 12, 2021 at 11:03 AM Song Liu wrote:
> >>
> >> perf uses performance monitoring counters (PMCs) to monitor system
> >> performance. The PMCs are limited hardware resources. For example,
> >> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> >>
> >> Modern data center systems use these PMCs in many different ways:
> >> system-level monitoring, (maybe nested) container-level monitoring,
> >> per-process monitoring, profiling (in sample mode), etc. In some cases,
> >> there are more active perf_events than available hardware PMCs. To allow
> >> all perf_events to have a chance to run, it is necessary to do expensive
> >> time multiplexing of events.
> >>
> >> On the other hand, many monitoring tools count the common metrics
> >> (cycles, instructions). It is a waste to have multiple tools create
> >> multiple perf_events of "cycles" and occupy multiple PMCs.
> >>
> >> bperf tries to reduce such waste by allowing multiple perf_events of
> >> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> >> of having each perf-stat session read its own perf_events, bperf uses
> >> BPF programs to read the perf_events and aggregate the readings into
> >> BPF maps. The perf-stat session(s) then read the values from these
> >> BPF maps.
> >>
> >> Please refer to the comment before the definition of bperf_ops for a
> >> description of the bperf architecture.
> >
> > Interesting! Actually I thought about something similar before,
> > but my BPF knowledge is outdated. So I need to catch up but
> > failed to have some time for it so far. ;-)
> >
> >>
> >> bperf is off by default. To enable it, pass the --use-bpf option to
> >> perf-stat. bperf uses a BPF hashmap to share information about the BPF
> >> programs and maps used by bperf. This map is pinned to bpffs. The
> >> default path is /sys/fs/bpf/bperf_attr_map. The user can change the
> >> path with the --attr-map option.
> >>
> >> ---
> >> Known limitations:
> >> 1. Does not support per-cgroup events;
> >> 2. Does not support monitoring of BPF programs (perf-stat -b);
> >> 3. Does not support event groups.
> >
> > In my case, per-cgroup event counting is very important.
> > And I'd like to do that with lots of cpus and cgroups.
>
> We can easily extend this approach to support cgroup events. I didn't
> implement it to keep the first version simple.

OK.

> > So I'm working on an in-kernel solution (without BPF),
> > I hope to share it soon.
>
> This is interesting! I cannot wait to see what it looks like. I spent
> quite some time trying to enable in-kernel sharing (not just cgroup
> events), but finally decided to try the BPF approach.

Well, I found it hard to support generic event sharing that works
for all use cases. So I'm focusing on the per-cgroup case only.
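To make the "leader program" idea described above a bit more concrete, here is a
minimal sketch of a leader-style BPF program: it reads the shared hardware event
through a perf event array and publishes the per-cpu reading in a map that the
perf-stat sessions (the followers) can consume. This is only an illustration; the
section name, trigger point, and map layout are my assumptions, not necessarily
what the patch itself does.

// leader_sketch.bpf.c -- illustrative only, not the patch's bperf code
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* one shared hardware event per cpu; user space sizes and fills this */
struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(int));
} events SEC(".maps");

/* latest reading of the shared event, one slot per cpu */
struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, struct bpf_perf_event_value);
} readings SEC(".maps");

SEC("raw_tp/sched_switch")
int on_switch(void *ctx)
{
        __u32 zero = 0;
        struct bpf_perf_event_value *val;

        val = bpf_map_lookup_elem(&readings, &zero);
        if (!val)
                return 0;

        /* counter value plus enabled/running time for this cpu */
        bpf_perf_event_read_value(&events, BPF_F_CURRENT_CPU,
                                  val, sizeof(*val));
        return 0;
}

char LICENSE[] SEC("license") = "GPL";

The follower side (not shown) would then fold such readings into per-target sums
(system-wide, a cpu list, a pid, ...); see the patch itself for how the real
leader and follower programs divide that work.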
> >
> > And for event groups, it seems the current implementation
> > cannot handle more than one event (not even in a group).
> > That could be a serious limitation..
>
> It supports multiple events. Multiple events are independent, i.e.,
> "cycles" and "instructions" would use two independent leader programs.

OK, then do you need multiple bperf_attr_maps? Does it work for an
arbitrary number of events?

> >
> >>
> >> The following commands have been tested:
> >>
> >>   perf stat --use-bpf -e cycles -a
> >>   perf stat --use-bpf -e cycles -C 1,3,4
> >>   perf stat --use-bpf -e cycles -p 123
> >>   perf stat --use-bpf -e cycles -t 100,101
> >
> > Hmm... so it loads both leader and follower programs if needed, right?
> > Does it support multiple followers with different targets at the same time?
>
> Yes, the whole idea is to have one leader program and multiple follower
> programs. If we only run one of these commands at a time, it will load
> one leader and one follower. If we run multiple of them in parallel,
> they will share the same leader program and load multiple follower
> programs.
>
> I actually tested more than the commands above. The list really means
> we support -a, -C, -p, and -t.
>
> Currently, this works for multiple events and for different parallel
> perf-stat sessions. The two commands below will work well in parallel:
>
>   perf stat --use-bpf -e ref-cycles,instructions -a
>   perf stat --use-bpf -e ref-cycles,cycles -C 1,3,5
>
> Note the use of ref-cycles, which can only use one counter on Intel CPUs.
> With this approach, the above two commands will not do time multiplexing
> on ref-cycles.

Awesome!

Thanks,
Namhyung
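P.S. To picture the consumer side of the sharing, a perf-stat-like session
essentially just sums per-cpu values out of a BPF map and scales them. A rough
user-space sketch follows; the pin path "/sys/fs/bpf/bperf_readings" and the map
layout are invented here to match the earlier leader sketch, not taken from the
patch, which coordinates everything through the bperf_attr_map instead.

// reader_sketch.c -- illustrative only; build with -lbpf
#include <stdio.h>
#include <stdlib.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <linux/bpf.h>

int main(void)
{
        int ncpus = libbpf_num_possible_cpus();
        struct bpf_perf_event_value *vals;
        unsigned long long counter = 0, enabled = 0, running = 0;
        __u32 key = 0;
        int fd, cpu;

        if (ncpus < 0)
                return 1;

        vals = calloc(ncpus, sizeof(*vals));
        if (!vals)
                return 1;

        /* hypothetical pin path for the per-cpu readings map */
        fd = bpf_obj_get("/sys/fs/bpf/bperf_readings");
        if (fd < 0) {
                perror("bpf_obj_get");
                return 1;
        }

        /* a per-cpu array lookup returns one value per possible cpu */
        if (bpf_map_lookup_elem(fd, &key, vals)) {
                perror("bpf_map_lookup_elem");
                return 1;
        }

        for (cpu = 0; cpu < ncpus; cpu++) {
                counter += vals[cpu].counter;
                enabled += vals[cpu].enabled;
                running += vals[cpu].running;
        }

        /* scale the count the way perf does for multiplexed events */
        if (running)
                counter = counter * enabled / running;

        printf("count: %llu\n", counter);
        free(vals);
        return 0;
}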