2022-07-29 16:33:29

by Adrián Herrera Arcila

[permalink] [raw]
Subject: [PATCH 1/2] perf stat: refactor __run_perf_stat common code

This extracts common code from the branches of the forks if-then-else.
enable_counters(), which was at the beginning of both branches of the
conditional, is now unconditional; evlist__start_workload is extracted
to a different if, which enables making the common clocking code
unconditional.

Signed-off-by: Adrián Herrera Arcila <[email protected]>
---
tools/perf/builtin-stat.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d2ecd4d29624..318ffd119dad 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -966,18 +966,18 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
return err;
}

- /*
- * Enable counters and exec the command:
- */
- if (forks) {
- err = enable_counters();
- if (err)
- return -1;
+ err = enable_counters();
+ if (err)
+ return -1;
+
+ /* Exec the command, if any */
+ if (forks)
evlist__start_workload(evsel_list);

- t0 = rdclock();
- clock_gettime(CLOCK_MONOTONIC, &ref_time);
+ t0 = rdclock();
+ clock_gettime(CLOCK_MONOTONIC, &ref_time);

+ if (forks) {
if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
status = dispatch_events(forks, timeout, interval, &times);
if (child_pid != -1) {
@@ -995,13 +995,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (WIFSIGNALED(status))
psignal(WTERMSIG(status), argv[0]);
} else {
- err = enable_counters();
- if (err)
- return -1;
-
- t0 = rdclock();
- clock_gettime(CLOCK_MONOTONIC, &ref_time);
-
status = dispatch_events(forks, timeout, interval, &times);
}

--
2.36.1


2022-07-29 16:38:59

by Adrián Herrera Arcila

[permalink] [raw]
Subject: [PATCH 2/2] perf stat: fix unexpected delay behaviour

The described --delay behaviour is to delay the enablement of events, but
not the execution of the command, if one is passed, which is incorrectly
the current behaviour.

This patch decouples the enablement from the delay, and enables events
before or after launching the workload dependent on the options passed
by the user. This code structure is inspired by that in perf-record, and
tries to be consistent with it.

Link: https://lore.kernel.org/linux-perf-users/[email protected]
Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
Signed-off-by: Adrián Herrera Arcila <[email protected]>
---
tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 318ffd119dad..f98c823b16dd 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -559,7 +559,7 @@ static bool handle_interval(unsigned int interval, int *times)
return false;
}

-static int enable_counters(void)
+static int enable_bpf_counters(void)
{
struct evsel *evsel;
int err;
@@ -572,28 +572,6 @@ static int enable_counters(void)
if (err)
return err;
}
-
- if (stat_config.initial_delay < 0) {
- pr_info(EVLIST_DISABLED_MSG);
- return 0;
- }
-
- if (stat_config.initial_delay > 0) {
- pr_info(EVLIST_DISABLED_MSG);
- usleep(stat_config.initial_delay * USEC_PER_MSEC);
- }
-
- /*
- * We need to enable counters only if:
- * - we don't have tracee (attaching to task or cpu)
- * - we have initial delay configured
- */
- if (!target__none(&target) || stat_config.initial_delay) {
- if (!all_counters_use_bpf)
- evlist__enable(evsel_list);
- if (stat_config.initial_delay > 0)
- pr_info(EVLIST_ENABLED_MSG);
- }
return 0;
}

@@ -966,10 +944,24 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
return err;
}

- err = enable_counters();
+ err = enable_bpf_counters();
if (err)
return -1;

+ /*
+ * Enable events manually here if perf-stat is run:
+ * 1. with a target (any of --all-cpus, --cpu, --pid or --tid)
+ * 2. without measurement delay (no --delay)
+ * 3. without all events associated to BPF
+ *
+ * This is because if run with a target, events are not enabled
+ * on exec if a workload is passed, and because there is no delay
+ * we ensure to enable them before the workload starts
+ */
+ if (!target__none(&target) && !stat_config.initial_delay &&
+ !all_counters_use_bpf)
+ evlist__enable(evsel_list);
+
/* Exec the command, if any */
if (forks)
evlist__start_workload(evsel_list);
@@ -977,6 +969,22 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
t0 = rdclock();
clock_gettime(CLOCK_MONOTONIC, &ref_time);

+ /*
+ * If a measurement delay was specified, start it, and if positive,
+ * enable events manually after. We respect the delay even if all
+ * events are associated to BPF
+ */
+ if (stat_config.initial_delay) {
+ /* At this point, events are guaranteed disabled */
+ pr_info(EVLIST_DISABLED_MSG);
+ if (stat_config.initial_delay > 0) {
+ usleep(stat_config.initial_delay * USEC_PER_MSEC);
+ if (!all_counters_use_bpf)
+ evlist__enable(evsel_list);
+ pr_info(EVLIST_ENABLED_MSG);
+ }
+ }
+
if (forks) {
if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
status = dispatch_events(forks, timeout, interval, &times);
--
2.36.1

2022-08-01 08:40:58

by James Clark

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour



On 29/07/2022 17:12, Adrián Herrera Arcila wrote:
> The described --delay behaviour is to delay the enablement of events, but
> not the execution of the command, if one is passed, which is incorrectly
> the current behaviour.
>
> This patch decouples the enablement from the delay, and enables events
> before or after launching the workload dependent on the options passed
> by the user. This code structure is inspired by that in perf-record, and
> tries to be consistent with it.
>
> Link: https://lore.kernel.org/linux-perf-users/[email protected]
> Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> Signed-off-by: Adrián Herrera Arcila <[email protected]>
> ---
> tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
> 1 file changed, 32 insertions(+), 24 deletions(-)

Looks good to me. Fixes the counter delay issue and the code is pretty
similar to perf record now. Although I would wait for Leo's or Song's
comment as well because they were involved.

Reviewed-by: James Clark <[email protected]>

>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 318ffd119dad..f98c823b16dd 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -559,7 +559,7 @@ static bool handle_interval(unsigned int interval, int *times)
> return false;
> }
>
> -static int enable_counters(void)
> +static int enable_bpf_counters(void)
> {
> struct evsel *evsel;
> int err;
> @@ -572,28 +572,6 @@ static int enable_counters(void)
> if (err)
> return err;
> }
> -
> - if (stat_config.initial_delay < 0) {
> - pr_info(EVLIST_DISABLED_MSG);
> - return 0;
> - }
> -
> - if (stat_config.initial_delay > 0) {
> - pr_info(EVLIST_DISABLED_MSG);
> - usleep(stat_config.initial_delay * USEC_PER_MSEC);
> - }
> -
> - /*
> - * We need to enable counters only if:
> - * - we don't have tracee (attaching to task or cpu)
> - * - we have initial delay configured
> - */
> - if (!target__none(&target) || stat_config.initial_delay) {
> - if (!all_counters_use_bpf)
> - evlist__enable(evsel_list);
> - if (stat_config.initial_delay > 0)
> - pr_info(EVLIST_ENABLED_MSG);
> - }
> return 0;
> }
>
> @@ -966,10 +944,24 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> return err;
> }
>
> - err = enable_counters();
> + err = enable_bpf_counters();
> if (err)
> return -1;
>
> + /*
> + * Enable events manually here if perf-stat is run:
> + * 1. with a target (any of --all-cpus, --cpu, --pid or --tid)
> + * 2. without measurement delay (no --delay)
> + * 3. without all events associated to BPF
> + *
> + * This is because if run with a target, events are not enabled
> + * on exec if a workload is passed, and because there is no delay
> + * we ensure to enable them before the workload starts
> + */
> + if (!target__none(&target) && !stat_config.initial_delay &&
> + !all_counters_use_bpf)
> + evlist__enable(evsel_list);
> +
> /* Exec the command, if any */
> if (forks)
> evlist__start_workload(evsel_list);
> @@ -977,6 +969,22 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> t0 = rdclock();
> clock_gettime(CLOCK_MONOTONIC, &ref_time);
>
> + /*
> + * If a measurement delay was specified, start it, and if positive,
> + * enable events manually after. We respect the delay even if all
> + * events are associated to BPF
> + */
> + if (stat_config.initial_delay) {
> + /* At this point, events are guaranteed disabled */
> + pr_info(EVLIST_DISABLED_MSG);
> + if (stat_config.initial_delay > 0) {
> + usleep(stat_config.initial_delay * USEC_PER_MSEC);
> + if (!all_counters_use_bpf)
> + evlist__enable(evsel_list);
> + pr_info(EVLIST_ENABLED_MSG);
> + }
> + }
> +
> if (forks) {
> if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
> status = dispatch_events(forks, timeout, interval, &times);

2022-08-01 11:50:04

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf stat: refactor __run_perf_stat common code

On Fri, Jul 29, 2022 at 04:12:43PM +0000, Adri?n Herrera Arcila wrote:
> This extracts common code from the branches of the forks if-then-else.
> enable_counters(), which was at the beginning of both branches of the
> conditional, is now unconditional; evlist__start_workload is extracted
> to a different if, which enables making the common clocking code
> unconditional.
>
> Signed-off-by: Adri?n Herrera Arcila <[email protected]>

Reviewed-by: Leo Yan <[email protected]>

2022-08-01 11:50:37

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

On Fri, Jul 29, 2022 at 04:12:44PM +0000, Adri?n Herrera Arcila wrote:
> The described --delay behaviour is to delay the enablement of events, but
> not the execution of the command, if one is passed, which is incorrectly
> the current behaviour.
>
> This patch decouples the enablement from the delay, and enables events
> before or after launching the workload dependent on the options passed
> by the user. This code structure is inspired by that in perf-record, and
> tries to be consistent with it.
>
> Link: https://lore.kernel.org/linux-perf-users/[email protected]
> Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> Signed-off-by: Adri?n Herrera Arcila <[email protected]>

bpf_counter() is not enabled for delay in this patch, but I think this
is purposed to keep the same behaviour with before. I would leave it
to Song for a call.

The patch LGTM and I tested with commands:

$ time ./perf stat --delay 2000 --quiet sleep 2
Events disabled
Events enabled

real 0m2.039s
user 0m0.007s
sys 0m0.016s

$ ./perf stat --delay 2000 --quiet echo "marking"
Events disabled
marking
Events enabled

Reviewed-by: Leo Yan <[email protected]>
Tested-by: Leo Yan <[email protected]>

P.s. I took a bit time to get clear how 'perf stat' set enable_on_exec
flag, which is set in the function create_perf_stat_counter(), so this
can enable PMU event for the case when delay is zero, and this can
avoid losing PMU tracing for workload.

2022-12-13 15:29:43

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

Em Mon, Aug 01, 2022 at 09:20:37AM +0100, James Clark escreveu:
>
>
> On 29/07/2022 17:12, Adri?n Herrera Arcila wrote:
> > The described --delay behaviour is to delay the enablement of events, but
> > not the execution of the command, if one is passed, which is incorrectly
> > the current behaviour.
> >
> > This patch decouples the enablement from the delay, and enables events
> > before or after launching the workload dependent on the options passed
> > by the user. This code structure is inspired by that in perf-record, and
> > tries to be consistent with it.
> >
> > Link: https://lore.kernel.org/linux-perf-users/[email protected]
> > Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> > Signed-off-by: Adri?n Herrera Arcila <[email protected]>
> > ---
> > tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
> > 1 file changed, 32 insertions(+), 24 deletions(-)
>
> Looks good to me. Fixes the counter delay issue and the code is pretty
> similar to perf record now. Although I would wait for Leo's or Song's
> comment as well because they were involved.

I think I didn't notice Leo's ack, it still applies, so I'm doing it
now.

- Arnaldo

> Reviewed-by: James Clark <[email protected]>
>
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 318ffd119dad..f98c823b16dd 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -559,7 +559,7 @@ static bool handle_interval(unsigned int interval, int *times)
> > return false;
> > }
> >
> > -static int enable_counters(void)
> > +static int enable_bpf_counters(void)
> > {
> > struct evsel *evsel;
> > int err;
> > @@ -572,28 +572,6 @@ static int enable_counters(void)
> > if (err)
> > return err;
> > }
> > -
> > - if (stat_config.initial_delay < 0) {
> > - pr_info(EVLIST_DISABLED_MSG);
> > - return 0;
> > - }
> > -
> > - if (stat_config.initial_delay > 0) {
> > - pr_info(EVLIST_DISABLED_MSG);
> > - usleep(stat_config.initial_delay * USEC_PER_MSEC);
> > - }
> > -
> > - /*
> > - * We need to enable counters only if:
> > - * - we don't have tracee (attaching to task or cpu)
> > - * - we have initial delay configured
> > - */
> > - if (!target__none(&target) || stat_config.initial_delay) {
> > - if (!all_counters_use_bpf)
> > - evlist__enable(evsel_list);
> > - if (stat_config.initial_delay > 0)
> > - pr_info(EVLIST_ENABLED_MSG);
> > - }
> > return 0;
> > }
> >
> > @@ -966,10 +944,24 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > return err;
> > }
> >
> > - err = enable_counters();
> > + err = enable_bpf_counters();
> > if (err)
> > return -1;
> >
> > + /*
> > + * Enable events manually here if perf-stat is run:
> > + * 1. with a target (any of --all-cpus, --cpu, --pid or --tid)
> > + * 2. without measurement delay (no --delay)
> > + * 3. without all events associated to BPF
> > + *
> > + * This is because if run with a target, events are not enabled
> > + * on exec if a workload is passed, and because there is no delay
> > + * we ensure to enable them before the workload starts
> > + */
> > + if (!target__none(&target) && !stat_config.initial_delay &&
> > + !all_counters_use_bpf)
> > + evlist__enable(evsel_list);
> > +
> > /* Exec the command, if any */
> > if (forks)
> > evlist__start_workload(evsel_list);
> > @@ -977,6 +969,22 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > t0 = rdclock();
> > clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >
> > + /*
> > + * If a measurement delay was specified, start it, and if positive,
> > + * enable events manually after. We respect the delay even if all
> > + * events are associated to BPF
> > + */
> > + if (stat_config.initial_delay) {
> > + /* At this point, events are guaranteed disabled */
> > + pr_info(EVLIST_DISABLED_MSG);
> > + if (stat_config.initial_delay > 0) {
> > + usleep(stat_config.initial_delay * USEC_PER_MSEC);
> > + if (!all_counters_use_bpf)
> > + evlist__enable(evsel_list);
> > + pr_info(EVLIST_ENABLED_MSG);
> > + }
> > + }
> > +
> > if (forks) {
> > if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
> > status = dispatch_events(forks, timeout, interval, &times);

--

- Arnaldo

2022-12-13 16:55:33

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

Hi,

On Tue, Dec 13, 2022 at 6:44 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Mon, Aug 01, 2022 at 09:20:37AM +0100, James Clark escreveu:
> >
> >
> > On 29/07/2022 17:12, Adrián Herrera Arcila wrote:
> > > The described --delay behaviour is to delay the enablement of events, but
> > > not the execution of the command, if one is passed, which is incorrectly
> > > the current behaviour.
> > >
> > > This patch decouples the enablement from the delay, and enables events
> > > before or after launching the workload dependent on the options passed
> > > by the user. This code structure is inspired by that in perf-record, and
> > > tries to be consistent with it.
> > >
> > > Link: https://lore.kernel.org/linux-perf-users/[email protected]
> > > Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> > > Signed-off-by: Adrián Herrera Arcila <[email protected]>
> > > ---
> > > tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
> > > 1 file changed, 32 insertions(+), 24 deletions(-)
> >
> > Looks good to me. Fixes the counter delay issue and the code is pretty
> > similar to perf record now. Although I would wait for Leo's or Song's
> > comment as well because they were involved.
>
> I think I didn't notice Leo's ack, it still applies, so I'm doing it
> now.

I think the BPF counters should be enabled/disabled together.

Thanks,
Namhyung

2022-12-14 15:32:48

by James Clark

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour



On 13/12/2022 16:40, Namhyung Kim wrote:
> Hi,
>
> On Tue, Dec 13, 2022 at 6:44 AM Arnaldo Carvalho de Melo
> <[email protected]> wrote:
>>
>> Em Mon, Aug 01, 2022 at 09:20:37AM +0100, James Clark escreveu:
>>>
>>>
>>> On 29/07/2022 17:12, Adrián Herrera Arcila wrote:
>>>> The described --delay behaviour is to delay the enablement of events, but
>>>> not the execution of the command, if one is passed, which is incorrectly
>>>> the current behaviour.
>>>>
>>>> This patch decouples the enablement from the delay, and enables events
>>>> before or after launching the workload dependent on the options passed
>>>> by the user. This code structure is inspired by that in perf-record, and
>>>> tries to be consistent with it.
>>>>
>>>> Link: https://lore.kernel.org/linux-perf-users/[email protected]
>>>> Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
>>>> Signed-off-by: Adrián Herrera Arcila <[email protected]>
>>>> ---
>>>> tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
>>>> 1 file changed, 32 insertions(+), 24 deletions(-)
>>>
>>> Looks good to me. Fixes the counter delay issue and the code is pretty
>>> similar to perf record now. Although I would wait for Leo's or Song's
>>> comment as well because they were involved.
>>
>> I think I didn't notice Leo's ack, it still applies, so I'm doing it
>> now.
>
> I think the BPF counters should be enabled/disabled together.

I did notice that difference between the two, but I wasn't sure of the
exact reason that it was done that way on Adrián's version. It seems
like it's not separated in perf record so maybe you are right.

>
> Thanks,
> Namhyung

2022-12-14 15:48:46

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

Em Tue, Dec 13, 2022 at 08:40:31AM -0800, Namhyung Kim escreveu:
> Hi,
>
> On Tue, Dec 13, 2022 at 6:44 AM Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> >
> > Em Mon, Aug 01, 2022 at 09:20:37AM +0100, James Clark escreveu:
> > >
> > >
> > > On 29/07/2022 17:12, Adri?n Herrera Arcila wrote:
> > > > The described --delay behaviour is to delay the enablement of events, but
> > > > not the execution of the command, if one is passed, which is incorrectly
> > > > the current behaviour.
> > > >
> > > > This patch decouples the enablement from the delay, and enables events
> > > > before or after launching the workload dependent on the options passed
> > > > by the user. This code structure is inspired by that in perf-record, and
> > > > tries to be consistent with it.
> > > >
> > > > Link: https://lore.kernel.org/linux-perf-users/[email protected]
> > > > Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> > > > Signed-off-by: Adri?n Herrera Arcila <[email protected]>
> > > > ---
> > > > tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
> > > > 1 file changed, 32 insertions(+), 24 deletions(-)
> > >
> > > Looks good to me. Fixes the counter delay issue and the code is pretty
> > > similar to perf record now. Although I would wait for Leo's or Song's
> > > comment as well because they were involved.
> >
> > I think I didn't notice Leo's ack, it still applies, so I'm doing it
> > now.
>
> I think the BPF counters should be enabled/disabled together.

Ok, so I removed this one and applied Namhyung's.

- Arnaldo

2022-12-14 16:37:42

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

On Wed, Dec 14, 2022 at 11:41:55AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 13, 2022 at 08:40:31AM -0800, Namhyung Kim escreveu:
> > Hi,
> >
> > On Tue, Dec 13, 2022 at 6:44 AM Arnaldo Carvalho de Melo
> > <[email protected]> wrote:
> > >
> > > Em Mon, Aug 01, 2022 at 09:20:37AM +0100, James Clark escreveu:
> > > >
> > > >
> > > > On 29/07/2022 17:12, Adri?n Herrera Arcila wrote:
> > > > > The described --delay behaviour is to delay the enablement of events, but
> > > > > not the execution of the command, if one is passed, which is incorrectly
> > > > > the current behaviour.
> > > > >
> > > > > This patch decouples the enablement from the delay, and enables events
> > > > > before or after launching the workload dependent on the options passed
> > > > > by the user. This code structure is inspired by that in perf-record, and
> > > > > tries to be consistent with it.
> > > > >
> > > > > Link: https://lore.kernel.org/linux-perf-users/[email protected]
> > > > > Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> > > > > Signed-off-by: Adri?n Herrera Arcila <[email protected]>
> > > > > ---
> > > > > tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
> > > > > 1 file changed, 32 insertions(+), 24 deletions(-)
> > > >
> > > > Looks good to me. Fixes the counter delay issue and the code is pretty
> > > > similar to perf record now. Although I would wait for Leo's or Song's
> > > > comment as well because they were involved.
> > >
> > > I think I didn't notice Leo's ack, it still applies, so I'm doing it
> > > now.
> >
> > I think the BPF counters should be enabled/disabled together.
>
> Ok, so I removed this one and applied Namhyung's.

I can guess why Adri?n doesn't enable/disable BPF counters together :)

Since 'perf stat' doesn't enable BPF counters with other normal PMU
events in the first place, I believe this is deliberately by Song's
patch fa853c4b839e ("perf stat: Enable counting events for BPF
programs"), it says:

"'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps".

IIUC, when loading eBPF (counter) program, perf tool needs to handle
eBPF program map specially (so that perf tool can know the latest eBPF
program's map in kernel).

I don't know anything for eBPF counter, so this is why I am still a bit
puzzle which way is right to do (bind vs separate eBPF counters). But
I personally prefer to let eBPF counter to respect delay, so it's fine
for me to apply Namhyung's patch.

Thanks,
Leo

2022-12-14 17:42:56

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

Em Wed, Dec 14, 2022 at 11:57:21PM +0800, Leo Yan escreveu:
> On Wed, Dec 14, 2022 at 11:41:55AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Dec 13, 2022 at 08:40:31AM -0800, Namhyung Kim escreveu:
> > > Hi,
> > >
> > > On Tue, Dec 13, 2022 at 6:44 AM Arnaldo Carvalho de Melo
> > > <[email protected]> wrote:
> > > >
> > > > Em Mon, Aug 01, 2022 at 09:20:37AM +0100, James Clark escreveu:
> > > > >
> > > > >
> > > > > On 29/07/2022 17:12, Adri?n Herrera Arcila wrote:
> > > > > > The described --delay behaviour is to delay the enablement of events, but
> > > > > > not the execution of the command, if one is passed, which is incorrectly
> > > > > > the current behaviour.
> > > > > >
> > > > > > This patch decouples the enablement from the delay, and enables events
> > > > > > before or after launching the workload dependent on the options passed
> > > > > > by the user. This code structure is inspired by that in perf-record, and
> > > > > > tries to be consistent with it.
> > > > > >
> > > > > > Link: https://lore.kernel.org/linux-perf-users/[email protected]
> > > > > > Fixes: d0a0a511493d ("perf stat: Fix forked applications enablement of counters")
> > > > > > Signed-off-by: Adri?n Herrera Arcila <[email protected]>
> > > > > > ---
> > > > > > tools/perf/builtin-stat.c | 56 ++++++++++++++++++++++-----------------
> > > > > > 1 file changed, 32 insertions(+), 24 deletions(-)
> > > > >
> > > > > Looks good to me. Fixes the counter delay issue and the code is pretty
> > > > > similar to perf record now. Although I would wait for Leo's or Song's
> > > > > comment as well because they were involved.
> > > >
> > > > I think I didn't notice Leo's ack, it still applies, so I'm doing it
> > > > now.
> > >
> > > I think the BPF counters should be enabled/disabled together.
> >
> > Ok, so I removed this one and applied Namhyung's.
>
> I can guess why Adri?n doesn't enable/disable BPF counters together :)
>
> Since 'perf stat' doesn't enable BPF counters with other normal PMU
> events in the first place, I believe this is deliberately by Song's
> patch fa853c4b839e ("perf stat: Enable counting events for BPF
> programs"), it says:
>
> "'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
> programs (monitor-progs) to the target BPF program (target-prog). The
> monitor-progs read perf_event before and after the target-prog, and
> aggregate the difference in a BPF map. Then the user space reads data
> from these maps".
>
> IIUC, when loading eBPF (counter) program, perf tool needs to handle
> eBPF program map specially (so that perf tool can know the latest eBPF
> program's map in kernel).
>
> I don't know anything for eBPF counter, so this is why I am still a bit
> puzzle which way is right to do (bind vs separate eBPF counters). But
> I personally prefer to let eBPF counter to respect delay, so it's fine
> for me to apply Namhyung's patch.

"I'm fine" can be read as an Acked-by, right? :-)

- Arnaldo

2022-12-15 02:03:51

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf stat: fix unexpected delay behaviour

On Wed, Dec 14, 2022 at 02:35:01PM -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > I don't know anything for eBPF counter, so this is why I am still a bit
> > puzzle which way is right to do (bind vs separate eBPF counters). But
> > I personally prefer to let eBPF counter to respect delay, so it's fine
> > for me to apply Namhyung's patch.
>
> "I'm fine" can be read as an Acked-by, right? :-)

Yes, have sent my Reviewed tag on Namhyung's patch.

Thanks,
Leo