Date: Mon, 15 Mar 2021 10:10:14 -0300
From: Arnaldo Carvalho de Melo
To: Song Liu, Andi Kleen, Jiri Olsa
Cc: linux-kernel, Kernel Team, "namhyung@kernel.org", "linux-perf-users@ghostprotocols.net"
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF
References: <20210312020257.197137-1-songliubraving@fb.com>

Em Fri, Mar 12, 2021 at 06:52:39PM +0000, Song Liu escreveu:
> > On Mar 12, 2021, at 6:24 AM, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Mar 11, 2021 at 06:02:57PM -0800, Song Liu escreveu:
> >> perf uses performance monitoring counters (PMCs) to monitor system
> >> performance. The PMCs are limited hardware resources. For example,
> >> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> >> 
> >> Modern data center systems use these PMCs in many different ways:
> >> system level monitoring, (maybe nested) container level monitoring, per
> >> process monitoring, profiling (in sample mode), etc. In some cases,
> >> there are more active perf_events than available hardware PMCs. To allow
> >> all perf_events to have a chance to run, it is necessary to do expensive
> >> time multiplexing of events.
> >> 
> >> On the other hand, many monitoring tools count the common metrics (cycles,
> >> instructions). It is a waste to have multiple tools create multiple
> >> perf_events of "cycles" and occupy multiple PMCs.
> >> 
> >> bperf tries to reduce such wastes by allowing multiple perf_events of
> >> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> >> of having each perf-stat session to read its own perf_events, bperf uses
> >> BPF programs to read the perf_events and aggregate readings to BPF maps.
> >> Then, the perf-stat session(s) reads the values from these BPF maps.
> >> 
> >> Please refer to the comment before the definition of bperf_ops for the
> >> description of bperf architecture.
> >> 
> >> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
> >> bperf uses a BPF hashmap to share information about BPF programs and maps
> >> used by bperf. This map is pinned to bpffs. The default address is
> >> /sys/fs/bpf/bperf_attr_map. The user could change the address with option
> >> --attr-map.
> >> 
> >> ---
> >> Known limitations:
> >> 1. Do not support per cgroup events;
> >> 2. Do not support monitoring of BPF program (perf-stat -b);
> >> 3. Do not support event groups.
> >> The following commands have been tested:
> >> 
> >>    perf stat --use-bpf -e cycles -a
> >>    perf stat --use-bpf -e cycles -C 1,3,4
> >>    perf stat --use-bpf -e cycles -p 123
> >>    perf stat --use-bpf -e cycles -t 100,101
> >> @@ -925,15 +931,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >>  	/*
> >>  	 * Enable counters and exec the command:
> >>  	 */
> >> -	t0 = rdclock();
> >> -	clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >> -
> >>  	if (forks) {
> >>  		evlist__start_workload(evsel_list);
> >>  		err = enable_counters();
> >>  		if (err)
> >>  			return -1;
> >>  
> >> +		t0 = rdclock();
> >> +		clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >> +
> >>  		if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
> >>  			status = dispatch_events(forks, timeout, interval, &times);
> >>  		if (child_pid != -1) {
> >> @@ -954,6 +960,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >>  		err = enable_counters();
> >>  		if (err)
> >>  			return -1;
> >> +
> >> +		t0 = rdclock();
> >> +		clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >> +
> >>  		status = dispatch_events(forks, timeout, interval, &times);
> >>  	}
> >> 
> > 
> > The above two hunks seem out of place, i.e. can they go to a different
> > patch and with an explanation about why this is needed?
> 
> Actually, I am still debating whether we want the above change in a separate
> patch. It is related to the following change.
> 
> [...]
> 
> >> +	/*
> >> +	 * Attaching the skeleton takes non-trivial time (0.2s+ on a kernel
> >> +	 * with some debug options enabled). This shows as a longer first
> >> +	 * interval:
> >> +	 *
> >> +	 * # perf stat -e cycles -a -I 1000
> >> +	 * #           time             counts unit events
> >> +	 *      1.267634674     26,259,166,523      cycles
> >> +	 *      2.271637827     22,550,822,286      cycles
> >> +	 *      3.275406553     22,852,583,744      cycles
> >> +	 *
> >> +	 * Fix this by zeroing accum_readings after attaching the program.
> >> +	 */
> >> +	bperf_sync_counters(evsel);
> >> +	entry_cnt = bpf_map__max_entries(skel->maps.accum_readings);
> >> +	memset(values, 0, sizeof(struct bpf_perf_event_value) * num_cpu_bpf);
> >> +
> >> +	for (i = 0; i < entry_cnt; i++) {
> >> +		bpf_map_update_elem(bpf_map__fd(skel->maps.accum_readings),
> >> +				    &i, values, BPF_ANY);
> >> +	}
> >> +	return 0;
> >> +}
> 
> Attaching the skeleton takes non-trivial time, so that we get a bigger first
> interval (1.26s in the example above). To fix this, in __run_perf_stat(), we
> get t0 and ref_time after enable_counters().
> 
> Maybe a comment in __run_perf_stat() is better than a separate patch?
I still think that there is value in taking those measurements after we
enable the counters: in interval mode, for instance, we want measurements
taken with the counters enabled; whatever happens before the counters are
enabled is just startup time, etc.

Jiri, Andi?

- Arnaldo
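
A minimal sketch of the consumer side described in the changelog above: user
space reading one entry of a per-CPU BPF map of struct bpf_perf_event_value,
which is roughly what a perf-stat session does with the readings the BPF
program accumulated. The pin path and entry index below are assumptions made
up for illustration; the actual patch reaches accum_readings through the
generated skeleton and the pinned bperf_attr_map, not through a fixed path.

/*
 * Illustration only, not part of the patch.
 * Build with: cc sketch.c -lbpf
 */
#include <stdio.h>
#include <stdlib.h>
#include <linux/bpf.h>          /* struct bpf_perf_event_value */
#include <bpf/bpf.h>            /* bpf_obj_get(), bpf_map_lookup_elem() */
#include <bpf/libbpf.h>         /* libbpf_num_possible_cpus() */

int main(void)
{
	/* hypothetical pin path, chosen only for this example */
	const char *path = "/sys/fs/bpf/bperf_accum_readings_example";
	struct bpf_perf_event_value *values;
	__u32 key = 0;          /* first accumulated-readings entry (assumed) */
	int fd, ncpus, cpu;

	ncpus = libbpf_num_possible_cpus();
	if (ncpus < 0)
		return 1;

	/* A lookup on a per-CPU map fills one value per possible CPU. */
	values = calloc(ncpus, sizeof(*values));
	if (!values)
		return 1;

	fd = bpf_obj_get(path);
	if (fd < 0) {
		perror("bpf_obj_get");
		free(values);
		return 1;
	}

	if (bpf_map_lookup_elem(fd, &key, values)) {
		perror("bpf_map_lookup_elem");
		free(values);
		return 1;
	}

	for (cpu = 0; cpu < ncpus; cpu++)
		printf("cpu%d: counter=%llu enabled=%llu running=%llu\n", cpu,
		       (unsigned long long)values[cpu].counter,
		       (unsigned long long)values[cpu].enabled,
		       (unsigned long long)values[cpu].running);

	free(values);
	return 0;
}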