Hi guys,
I noticed recently, when the new icelake JSON failed the "PMU events"
test, that running the test with -v makes pr_err() and pr_debug()
output come out at the same level, so it's hard to distinguish
the important logs.
Here is a link:
https://lore.kernel.org/lkml/YLdq%[email protected]/
And here is an extract:
parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@
/ 1000000000 / duration_time'
parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single +
fp_arith_inst_retired.scalar_double ) + 2 *
fp_arith_inst_retired.128b_packed_double + 4 * (
fp_arith_inst_retired.128b_packed_single +
fp_arith_inst_retired.256b_packed_double ) + 8 * (
fp_arith_inst_retired.256b_packed_single +
fp_arith_inst_retired.512b_packed_double ) + 16 *
fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
parsing '1 - cpu_clk_unhalted.one_thread_active /
cpu_clk_unhalted.ref_distributed'
parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
parsing '( 64 * ( uncore_imc@cas_count_read@ +
uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ /
duration_time )'
parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
parsing '( 1000000000 * (
cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm /
cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@
)' [pr_debug]
check_parse_fake failed [pr_err]
test child finished with -1
---- end ----
I annotated the message sources in [], above.
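To illustrate what I mean, here is a minimal sketch of the gating as I understand it (this just models the behaviour, it is not the actual util/debug.c macros): pr_err() output always appears, pr_debug() output appears once verbose >= 1, and neither carries a level prefix, so at -v the two are indistinguishable.

```c
#include <stdbool.h>

/* Sketch only, not the real perf util/debug.c: pr_err() messages
 * always appear, pr_debug() messages appear once verbose >= 1, and
 * neither is prefixed with its level, so at -v (verbose == 1) both
 * kinds land on stderr looking identical. */
enum msg_level { MSG_ERR, MSG_DEBUG };

static bool msg_emitted(enum msg_level level, int verbose)
{
	/* errors always print; debug prints only when verbose */
	return level == MSG_ERR || verbose >= 1;
}
```

With verbose == 0 only MSG_ERR is emitted, but with -v (verbose == 1) both are, and nothing in the output tells them apart.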
As for another issue, consider "Parse and process metrics": debug
output from core code comes out at the same level as that from test
code, i.e. with -v we see pr_debug() from both the test code and the
core code. Again, this makes it hard to distinguish the various
levels. As an illustration, here I hacked the code to fail a test:
sudo ./perf test -v 68
68: Parse and process metrics :
--- start ---
test child forked, pid 9747
metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
found event inst_retired.any verbose=1 pr_debug
found event inst_retired.any verbose=1 pr_err
found event cpu_clk_unhalted.thread verbose=1 pr_debug
found event cpu_clk_unhalted.thread verbose=1 pr_err
adding {inst_retired.any,cpu_clk_unhalted.thread}:W
FAILED tests/parse-metric.c:223 IPC failed, wrong ratio
FAILED tests/parse-metric.c:374 IPC failed
test child finished with -1
---- end ----
Parse and process metrics: FAILED!
Note that the "FAILED" messages from the test code come from pr_debug().
In a way, I feel that pr_debug()/pr_err() from the test is more
important than pr_debug() from the core code (when running a test).
Any opinion on this, or on how to improve it (if anyone agrees with
me)? Or am I missing something? Or is it just not that important?
Thanks,
John
On Mon, Jun 21, 2021 at 4:26 AM John Garry <[email protected]> wrote:
>
> Hi guys,
>
> I noticed that when it was discovered recently that the new icelake JSON
> did not pass "PMU events" test, running the test with -v makes pr_err()
> and pr_debug() come out at the same level, so it's hard to distinguish
> the important logs.
>
> Here is a link:
> https://lore.kernel.org/lkml/YLdq%[email protected]/
>
> And here is an extract:
>
> parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@
> / 1000000000 / duration_time'
> parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single +
> fp_arith_inst_retired.scalar_double ) + 2 *
> fp_arith_inst_retired.128b_packed_double + 4 * (
> fp_arith_inst_retired.128b_packed_single +
> fp_arith_inst_retired.256b_packed_double ) + 8 * (
> fp_arith_inst_retired.256b_packed_single +
> fp_arith_inst_retired.512b_packed_double ) + 16 *
> fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
> parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
> parsing '1 - cpu_clk_unhalted.one_thread_active /
> cpu_clk_unhalted.ref_distributed'
> parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
> parsing '( 64 * ( uncore_imc@cas_count_read@ +
> uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
> parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
> cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ /
> duration_time )'
> parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
> cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
> parsing '( 1000000000 * (
> cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm /
> cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@
> )' [pr_debug]
> check_parse_fake failed [pr_err]
> test child finished with -1
> ---- end ----
>
> I annotated in [], above
>
> As for another issue, if you consider "Parse and process metrics", debug
> from core code comes out at same level as test code, e.g. with -v, we
> see pr_debug() from test code and core code. Again, this makes it hard
> to distinguish various levels. As an illustration, here I hack the code
> to fail a test:
>
> sudo ./perf test -v 68
> 68: Parse and process metrics :
> --- start ---
> test child forked, pid 9747
> metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
> found event inst_retired.any verbose=1 pr_debug
> found event inst_retired.any verbose=1 pr_err
> found event cpu_clk_unhalted.thread verbose=1 pr_debug
> found event cpu_clk_unhalted.thread verbose=1 pr_err
> adding {inst_retired.any,cpu_clk_unhalted.thread}:W
> FAILED tests/parse-metric.c:223 IPC failed, wrong ratio
> FAILED tests/parse-metric.c:374 IPC failed
> test child finished with -1
> ---- end ----
> Parse and process metrics: FAILED!
>
> Note that the "FAILED" messages from the test code come from pr_debug().
>
> In a way, I feel that pr_debug()/err from the test is more important
> than pr_debug() from the core code (when running a test).
>
> Any opinion on this or how to improve (if anyone agrees with me)? Or am
> I missing something? Or is it not so important?
Hi John,
I think the issue is that in the parsing you don't know it's broken
until something goes wrong. Putting everything on pr_err() would cause
spam in the non-broken case. Improving the parsing error handling is a
big task, with lex and yacc to some extent getting in the way. Perhaps
a middle way is to have a parameter to the parser that logs more, and
to recursively call the parser with it when parsing fails. I guess
there is also a danger of a performance hit.
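That middle way might look something like the sketch below. parse_expr() and its verbose_errors flag are hypothetical here (the real lex/yacc entry point in util/expr.y has a different interface), and the stub body just stands in for the parser:

```c
#include <stdbool.h>
#include <string.h>

static int log_calls;	/* counts how often the verbose pass ran */

/* Stub standing in for the real lex/yacc parser: succeeds unless the
 * expression is "broken". In the real thing, verbose_errors would
 * enable extra logging inside the parser. */
static bool parse_expr(const char *expr, bool verbose_errors)
{
	if (verbose_errors)
		log_calls++;
	return strcmp(expr, "broken") != 0;
}

static bool parse_with_retry(const char *expr)
{
	if (parse_expr(expr, false))	/* quiet first pass */
		return true;
	/* Only after a failure: re-parse with logging enabled, so the
	 * passing case stays spam-free. */
	return parse_expr(expr, true);
}
```

The performance hit is then confined to the failing case, since the verbose pass only runs after a quiet pass has already failed.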
Thanks,
Ian
> Thanks,
> John
On 22/06/2021 06:04, Ian Rogers wrote:
>> ---- end ----
>> Parse and process metrics: FAILED!
>>
>> Note that the "FAILED" messages from the test code come from pr_debug().
>>
>> In a way, I feel that pr_debug()/err from the test is more important
>> than pr_debug() from the core code (when running a test).
>>
>> Any opinion on this or how to improve (if anyone agrees with me)? Or am
>> I missing something? Or is it not so important?
> Hi John,
>
Hi Ian,
> I think the issue is that in the parsing you don't know it's broken
> until something goes wrong. Putting everything on pr_err would cause
> spam in the not broken case.
Right, I would not suggest using pr_err everywhere.
> Improving the parsing error handling is a
> big task with lex and yacc to some extent getting in the way. Perhaps
> a middle way is to have a parameter to the parser that logs more, and
> recursively call this in the parser when parsing fails. I guess there
> is also a danger of a performance hit.
So I am thinking that for running a test, -v means different log
levels for test code and for core (non-test) code. For example, -v
prints pr_warn() and higher for test logs, but nothing for core logs.
And then -vv for running a test gives pr_debug() and above for test
logs, and pr_warn() and above for core logs. Or something like that.
Maybe that is not a good idea. But I'm just saying that it's currently
hard to debug tests at -v.
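As a rough sketch of what I mean (the mapping, the struct, and the names are just my suggestion, nothing like this exists in perf today):

```c
/* Hypothetical mapping from the -v count to two verbosity levels:
 * test code always gets one level more than core code, so -v shows
 * test warnings but no core logs, -vv shows test debug and core
 * warnings, and so on. Purely illustrative, not existing perf code. */
struct test_verbosity {
	int test_level;	/* level applied to pr_*() in tests/ */
	int core_level;	/* level applied to pr_*() in util/ */
};

static struct test_verbosity map_verbose(int v_count)
{
	struct test_verbosity tv;

	tv.test_level = v_count;			/* -v: 1, -vv: 2 */
	tv.core_level = v_count > 0 ? v_count - 1 : 0;	/* one step behind */
	return tv;
}
```

So with -v the test logs are a level ahead of the core logs, which is the separation I'm after.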
Thanks,
John
On Tue, Jun 22, 2021 at 4:58 AM John Garry <[email protected]> wrote:
>
> On 22/06/2021 06:04, Ian Rogers wrote:
> >> ---- end ----
> >> Parse and process metrics: FAILED!
> >>
> >> Note that the "FAILED" messages from the test code come from pr_debug().
> >>
> >> In a way, I feel that pr_debug()/err from the test is more important
> >> than pr_debug() from the core code (when running a test).
> >>
> >> Any opinion on this or how to improve (if anyone agrees with me)? Or am
> >> I missing something? Or is it not so important?
> > Hi John,
> >
>
> Hi Ian,
>
> > I think the issue is that in the parsing you don't know it's broken
> > until something goes wrong. Putting everything on pr_err would cause
> > spam in the not broken case.
>
> Right, I would not suggest using pr_err everywhere.
>
> > Improving the parsing error handling is a
> > big task with lex and yacc to some extent getting in the way. Perhaps
> > a middle way is to have a parameter to the parser that logs more, and
> > recursively call this in the parser when parsing fails. I guess there
> > is also a danger of a performance hit.
>
> So I am thinking that for running a test, -v means different levels logs
> for test code and for core (non-test code). For example, -v prints
> pr_warn() and higher for test logs, but nothing for core logs. And then
> -vv for running a test gives pr_debug and above for test logs, and
> pr_warn and above for core logs. Or something like that.
>
> Maybe that is not a good idea. But I'm just saying that it's hard to
> debug currently at -v for tests.
>
> Thanks,
> John
I think this sounds good. It'd also be nice to have verbose output in
the shell tests following the same convention. There's currently no
verbose logging in shell tests, but I propose it here:
https://lore.kernel.org/lkml/[email protected]/
By their nature, some of the shell tests launch perf; perhaps there
can be some convention for passing the verbose flag through in those
cases.
Thanks,
Ian
Em Tue, Jun 22, 2021 at 09:00:31AM -0700, Ian Rogers escreveu:
> On Tue, Jun 22, 2021 at 4:58 AM John Garry <[email protected]> wrote:
> >
> > On 22/06/2021 06:04, Ian Rogers wrote:
> > >> ---- end ----
> > >> Parse and process metrics: FAILED!
> > >>
> > >> Note that the "FAILED" messages from the test code come from pr_debug().
> > >>
> > >> In a way, I feel that pr_debug()/err from the test is more important
> > >> than pr_debug() from the core code (when running a test).
> > >>
> > >> Any opinion on this or how to improve (if anyone agrees with me)? Or am
> > >> I missing something? Or is it not so important?
> > > Hi John,
> > >
> >
> > Hi Ian,
> >
> > > I think the issue is that in the parsing you don't know it's broken
> > > until something goes wrong. Putting everything on pr_err would cause
> > > spam in the not broken case.
> >
> > Right, I would not suggest using pr_err everywhere.
> >
> > > Improving the parsing error handling is a
> > > big task with lex and yacc to some extent getting in the way. Perhaps
> > > a middle way is to have a parameter to the parser that logs more, and
> > > recursively call this in the parser when parsing fails. I guess there
> > > is also a danger of a performance hit.
> >
> > So I am thinking that for running a test, -v means different levels logs
> > for test code and for core (non-test code). For example, -v prints
> > pr_warn() and higher for test logs, but nothing for core logs. And then
> > -vv for running a test gives pr_debug and above for test logs, and
> > pr_warn and above for core logs. Or something like that.
> >
> > Maybe that is not a good idea. But I'm just saying that it's hard to
> > debug currently at -v for tests.
> >
> > Thanks,
> > John
>
> I think this sounds good. It'd be nice also to have verbose output in
> the shell tests following the same convention. There's currently no
> verbose logging in shell tests but I propose it here:
> https://lore.kernel.org/lkml/[email protected]/
> By their nature some of the shell tests launch perf, perhaps there can
> be some convention on passing the verbose flag through in those cases.
Hey, there is even a v2 for that one, lemme process it :-)
- Arnaldo