2021-06-21 11:28:52

by John Garry

Subject: perf tool: About tests debug level

Hi guys,

When it was discovered recently that the new icelake JSON did not pass
the "PMU events" test, I noticed that running the test with -v makes
pr_err() and pr_debug() come out at the same level, so it's hard to
distinguish the important logs.

Here is a link:
https://lore.kernel.org/lkml/YLdq%[email protected]/

And here is an extract:

parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@
/ 1000000000 / duration_time'
parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single +
fp_arith_inst_retired.scalar_double ) + 2 *
fp_arith_inst_retired.128b_packed_double + 4 * (
fp_arith_inst_retired.128b_packed_single +
fp_arith_inst_retired.256b_packed_double ) + 8 * (
fp_arith_inst_retired.256b_packed_single +
fp_arith_inst_retired.512b_packed_double ) + 16 *
fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
parsing '1 - cpu_clk_unhalted.one_thread_active /
cpu_clk_unhalted.ref_distributed'
parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
parsing '( 64 * ( uncore_imc@cas_count_read@ +
uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ /
duration_time )'
parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
parsing '( 1000000000 * (
cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm /
cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@
)' [pr_debug]
check_parse_fake failed [pr_err]
test child finished with -1
---- end ----

I annotated the message levels in [] above.
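
For reference, if I read tools/perf/util/debug.{h,c} correctly, this is
roughly why they are indistinguishable (a paraphrased sketch, not the
exact source): pr_err() logs at level 0 and pr_debug() at level 1, both
to the same stream with no prefix, and eprintf() prints whenever
verbose >= level, so a single -v (verbose == 1) passes the check for
both:

#include <stdarg.h>
#include <stdio.h>

int verbose;        /* bumped once per -v on the command line */

/* Print to stderr when the verbosity is at or above the message level. */
static int eprintf(int level, int var, const char *fmt, ...)
{
        va_list args;
        int ret = 0;

        if (var >= level) {
                va_start(args, fmt);
                ret = vfprintf(stderr, fmt, args);
                va_end(args);
        }
        return ret;
}

/* pr_err() is level 0, pr_debug() is level 1: with -v both pass the
 * check and come out on the same stream with no distinguishing mark. */
#define pr_err(fmt, ...)        eprintf(0, verbose, fmt, ##__VA_ARGS__)
#define pr_debug(fmt, ...)      eprintf(1, verbose, fmt, ##__VA_ARGS__)

int main(void)
{
        verbose = 1;        /* as if run with a single -v */
        pr_debug("parsing '...'\n");
        pr_err("check_parse_fake failed\n");
        return 0;
}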

As for another issue: if you consider "Parse and process metrics", debug
output from core code comes out at the same level as that from test
code, e.g. with -v we see pr_debug() from both test code and core code.
Again, this makes it hard to distinguish the various levels. As an
illustration, here I hacked the code to fail a test:

sudo ./perf test -v 68
68: Parse and process metrics :
--- start ---
test child forked, pid 9747
metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
found event inst_retired.any verbose=1 pr_debug
found event inst_retired.any verbose=1 pr_err
found event cpu_clk_unhalted.thread verbose=1 pr_debug
found event cpu_clk_unhalted.thread verbose=1 pr_err
adding {inst_retired.any,cpu_clk_unhalted.thread}:W
FAILED tests/parse-metric.c:223 IPC failed, wrong ratio
FAILED tests/parse-metric.c:374 IPC failed
test child finished with -1
---- end ----
Parse and process metrics: FAILED!

Note that the "FAILED" messages from the test code come from pr_debug().
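
They come via TEST_ASSERT_VAL(), which (roughly, paraphrasing
tools/perf/tests/tests.h from memory) looks like this:

#define TEST_ASSERT_VAL(text, cond)                                      \
do {                                                                     \
        if (!(cond)) {                                                   \
                pr_debug("FAILED %s:%d %s\n", __FILE__, __LINE__, text); \
                return -1;                                               \
        }                                                                \
} while (0)

i.e. a hard test failure is reported through the same pr_debug() as the
chattiest debug output.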

In a way, I feel that pr_debug()/pr_err() from the test is more
important than pr_debug() from the core code (when running a test).

Any opinion on this, or on how to improve it (if anyone agrees with me)?
Or am I missing something? Or is it not so important?

Thanks,
John


2021-06-22 05:06:28

by Ian Rogers

Subject: Re: perf tool: About tests debug level

On Mon, Jun 21, 2021 at 4:26 AM John Garry <[email protected]> wrote:
>
> Hi guys,
>
> When it was discovered recently that the new icelake JSON did not pass
> the "PMU events" test, I noticed that running the test with -v makes
> pr_err() and pr_debug() come out at the same level, so it's hard to
> distinguish the important logs.
>
> Here is a link:
> https://lore.kernel.org/lkml/YLdq%[email protected]/
>
> And here is an extract:
>
> parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@
> / 1000000000 / duration_time'
> parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single +
> fp_arith_inst_retired.scalar_double ) + 2 *
> fp_arith_inst_retired.128b_packed_double + 4 * (
> fp_arith_inst_retired.128b_packed_single +
> fp_arith_inst_retired.256b_packed_double ) + 8 * (
> fp_arith_inst_retired.256b_packed_single +
> fp_arith_inst_retired.512b_packed_double ) + 16 *
> fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
> parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
> parsing '1 - cpu_clk_unhalted.one_thread_active /
> cpu_clk_unhalted.ref_distributed'
> parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
> parsing '( 64 * ( uncore_imc@cas_count_read@ +
> uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
> parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
> cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ /
> duration_time )'
> parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
> cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
> parsing '( 1000000000 * (
> cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm /
> cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@
> )' [pr_debug]
> check_parse_fake failed [pr_err]
> test child finished with -1
> ---- end ----
>
> I annotated the message levels in [] above.
>
> As for another issue: if you consider "Parse and process metrics", debug
> output from core code comes out at the same level as that from test
> code, e.g. with -v we see pr_debug() from both test code and core code.
> Again, this makes it hard to distinguish the various levels. As an
> illustration, here I hacked the code to fail a test:
>
> sudo ./perf test -v 68
> 68: Parse and process metrics :
> --- start ---
> test child forked, pid 9747
> metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
> found event inst_retired.any verbose=1 pr_debug
> found event inst_retired.any verbose=1 pr_err
> found event cpu_clk_unhalted.thread verbose=1 pr_debug
> found event cpu_clk_unhalted.thread verbose=1 pr_err
> adding {inst_retired.any,cpu_clk_unhalted.thread}:W
> FAILED tests/parse-metric.c:223 IPC failed, wrong ratio
> FAILED tests/parse-metric.c:374 IPC failed
> test child finished with -1
> ---- end ----
> Parse and process metrics: FAILED!
>
> Note that the "FAILED" messages from the test code come from pr_debug().
>
> In a way, I feel that pr_debug()/pr_err() from the test is more
> important than pr_debug() from the core code (when running a test).
>
> Any opinion on this, or on how to improve it (if anyone agrees with
> me)? Or am I missing something? Or is it not so important?

Hi John,

I think the issue is that during parsing you don't know it's broken
until something goes wrong. Putting everything on pr_err would cause
spam in the non-broken case. Improving the parsing error handling is a
big task, with lex and yacc to some extent getting in the way. Perhaps
a middle way is to have a parameter to the parser that logs more, and
to recursively call the parser with it when parsing fails. I guess
there is also a danger of a performance hit.
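
Roughly this shape, say (an entirely hypothetical sketch; do_parse()
just stands in for the real lex/yacc entry point, and the loudness
parameter does not exist today): parse quietly first, and rerun the
same input with logging turned up only once it has already failed, so
the good case stays quiet:

#include <stdio.h>

enum parse_loudness { PARSE_QUIET, PARSE_LOUD };

/* Stand-in for the real parser entry point; 'loud' is the hypothetical
 * extra logging knob. */
static int do_parse(const char *expr, enum parse_loudness loud)
{
        /* ... lex/yacc work would go here ... */
        if (loud == PARSE_LOUD)
                fprintf(stderr, "parse failed in '%s'\n", expr);
        return -1;        /* always "fails" in this sketch */
}

/* Quiet first pass; only on failure rerun the same input loudly. */
static int parse_with_retry(const char *expr)
{
        int ret = do_parse(expr, PARSE_QUIET);

        if (ret)
                ret = do_parse(expr, PARSE_LOUD);
        return ret;
}

int main(void)
{
        return parse_with_retry("cha@event\\=0x36@ / cha@event\\=0x35@") ? 1 : 0;
}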

Thanks,
Ian

> Thanks,
> John

2021-06-22 12:00:01

by John Garry

Subject: Re: perf tool: About tests debug level

On 22/06/2021 06:04, Ian Rogers wrote:
>> ---- end ----
>> Parse and process metrics: FAILED!
>>
>> Note that the "FAILED" messages from the test code come from pr_debug().
>>
>> In a way, I feel that pr_debug()/pr_err() from the test is more
>> important than pr_debug() from the core code (when running a test).
>>
>> Any opinion on this, or on how to improve it (if anyone agrees with
>> me)? Or am I missing something? Or is it not so important?
> Hi John,
>

Hi Ian,

> I think the issue is that during parsing you don't know it's broken
> until something goes wrong. Putting everything on pr_err would cause
> spam in the non-broken case.

Right, I would not suggest using pr_err everywhere.

> Improving the parsing error handling is a big task, with lex and yacc
> to some extent getting in the way. Perhaps a middle way is to have a
> parameter to the parser that logs more, and to recursively call the
> parser with it when parsing fails. I guess there is also a danger of a
> performance hit.

So I am thinking that, when running a test, -v means different log
levels for test code and for core (non-test) code. For example, -v
prints pr_warn() and higher for test logs, but nothing for core logs.
And then -vv for running a test gives pr_debug and above for test logs,
and pr_warn and above for core logs. Or something like that.
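
Very roughly, the sort of split I mean (all hypothetical; there is no
test_verbose or pr_test_debug() today, and the exact level mapping
above is left out of the sketch):

#include <stdarg.h>
#include <stdio.h>

int verbose;        /* existing knob, would gate core (non-test) code only */
int test_verbose;   /* hypothetical separate knob for test code */

static int eprintf(int level, int var, const char *fmt, ...)
{
        va_list args;
        int ret = 0;

        if (var >= level) {
                va_start(args, fmt);
                ret = vfprintf(stderr, fmt, args);
                va_end(args);
        }
        return ret;
}

/* Core code keeps pr_debug(); test code gets its own gate. */
#define pr_debug(fmt, ...)      eprintf(1, verbose, fmt, ##__VA_ARGS__)
#define pr_test_debug(fmt, ...) eprintf(1, test_verbose, fmt, ##__VA_ARGS__)

/* Each -v bumps the test level; only the second and later -v raise
 * core verbosity, so -v shows test logs and -vv adds core logs too. */
static void bump_verbosity(void)
{
        test_verbose++;
        if (test_verbose > 1)
                verbose++;
}

int main(void)
{
        bump_verbosity();        /* as if run with a single -v */
        pr_test_debug("FAILED tests/parse-metric.c:223 IPC failed\n"); /* shown */
        pr_debug("found event inst_retired.any\n");                    /* hidden */
        return 0;
}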

Maybe that is not a good idea. But I'm just saying that it's currently
hard to debug tests at -v.

Thanks,
John

2021-06-22 16:01:38

by Ian Rogers

Subject: Re: perf tool: About tests debug level

On Tue, Jun 22, 2021 at 4:58 AM John Garry <[email protected]> wrote:
>
> On 22/06/2021 06:04, Ian Rogers wrote:
> >> ---- end ----
> >> Parse and process metrics: FAILED!
> >>
> >> Note that the "FAILED" messages from the test code come from pr_debug().
> >>
> >> In a way, I feel that pr_debug()/pr_err() from the test is more
> >> important than pr_debug() from the core code (when running a test).
> >>
> >> Any opinion on this, or on how to improve it (if anyone agrees with
> >> me)? Or am I missing something? Or is it not so important?
> > Hi John,
> >
>
> Hi Ian,
>
> > I think the issue is that during parsing you don't know it's broken
> > until something goes wrong. Putting everything on pr_err would cause
> > spam in the non-broken case.
>
> Right, I would not suggest using pr_err everywhere.
>
> > Improving the parsing error handling is a big task, with lex and yacc
> > to some extent getting in the way. Perhaps a middle way is to have a
> > parameter to the parser that logs more, and to recursively call the
> > parser with it when parsing fails. I guess there is also a danger of a
> > performance hit.
>
> So I am thinking that, when running a test, -v means different log
> levels for test code and for core (non-test) code. For example, -v
> prints pr_warn() and higher for test logs, but nothing for core logs.
> And then -vv for running a test gives pr_debug and above for test logs,
> and pr_warn and above for core logs. Or something like that.
>
> Maybe that is not a good idea. But I'm just saying that it's currently
> hard to debug tests at -v.
>
> Thanks,
> John

I think this sounds good. It'd also be nice to have verbose output in
the shell tests following the same convention. There's currently no
verbose logging in shell tests, but I propose it here:
https://lore.kernel.org/lkml/[email protected]/
By their nature some of the shell tests launch perf; perhaps there
could be some convention for passing the verbose flag through in those
cases.

Thanks,
Ian

2021-06-22 17:44:15

by Arnaldo Carvalho de Melo

Subject: Re: perf tool: About tests debug level

On Tue, Jun 22, 2021 at 09:00:31AM -0700, Ian Rogers wrote:
> On Tue, Jun 22, 2021 at 4:58 AM John Garry <[email protected]> wrote:
> >
> > On 22/06/2021 06:04, Ian Rogers wrote:
> > >> ---- end ----
> > >> Parse and process metrics: FAILED!
> > >>
> > >> Note that the "FAILED" messages from the test code come from pr_debug().
> > >>
> > >> In a way, I feel that pr_debug()/pr_err() from the test is more
> > >> important than pr_debug() from the core code (when running a test).
> > >>
> > >> Any opinion on this, or on how to improve it (if anyone agrees with
> > >> me)? Or am I missing something? Or is it not so important?
> > > Hi John,
> > >
> >
> > Hi Ian,
> >
> > > I think the issue is that during parsing you don't know it's broken
> > > until something goes wrong. Putting everything on pr_err would cause
> > > spam in the non-broken case.
> >
> > Right, I would not suggest using pr_err everywhere.
> >
> > > Improving the parsing error handling is a big task, with lex and yacc
> > > to some extent getting in the way. Perhaps a middle way is to have a
> > > parameter to the parser that logs more, and to recursively call the
> > > parser with it when parsing fails. I guess there is also a danger of a
> > > performance hit.
> >
> > So I am thinking that, when running a test, -v means different log
> > levels for test code and for core (non-test) code. For example, -v
> > prints pr_warn() and higher for test logs, but nothing for core logs.
> > And then -vv for running a test gives pr_debug and above for test logs,
> > and pr_warn and above for core logs. Or something like that.
> >
> > Maybe that is not a good idea. But I'm just saying that it's currently
> > hard to debug tests at -v.
> >
> > Thanks,
> > John
>
> I think this sounds good. It'd also be nice to have verbose output in
> the shell tests following the same convention. There's currently no
> verbose logging in shell tests, but I propose it here:
> https://lore.kernel.org/lkml/[email protected]/
> By their nature some of the shell tests launch perf; perhaps there
> could be some convention for passing the verbose flag through in those
> cases.

Hey, there is even a v2 for that one, lemme process it :-)

- Arnaldo