2020-04-22 07:50:24

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 0/8] perf metric fixes and test

Add a test that all perf metrics (for your architecture) are
parsable. Fix bugs in the expr parser and in x86 metrics. Untested on
architectures other than x86.

Ian Rogers (8):
perf expr: unlimited escaped characters in a symbol
perf metrics: fix parse errors in cascade lake metrics
perf metrics: fix parse errors in skylake metrics
perf expr: allow ',' to be an other token
perf expr: increase max other
perf expr: parse numbers as doubles
perf expr: debug lex if debugging yaxx
perf test: add expr test for pmu metrics

.../arch/x86/cascadelakex/clx-metrics.json | 10 +-
.../arch/x86/skylakex/skx-metrics.json | 4 +-
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/expr.c | 91 ++++++++++++++++++-
tools/perf/tests/tests.h | 2 +
tools/perf/util/expr.c | 1 +
tools/perf/util/expr.h | 2 +-
tools/perf/util/expr.l | 16 ++--
tools/perf/util/expr.y | 2 +-
9 files changed, 115 insertions(+), 18 deletions(-)

--
2.26.2.303.gf8c07b1a785-goog


2020-04-22 07:50:42

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 8/8] perf test: add expr test for pmu metrics

Add basic floating point number test.
Verify that all pmu metrics parse.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/builtin-test.c | 5 ++
tools/perf/tests/expr.c | 91 ++++++++++++++++++++++++++++++++-
tools/perf/tests/tests.h | 2 +
3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b6322eb0f423..28d547951f6b 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -63,6 +63,11 @@ static struct test generic_tests[] = {
{
.desc = "Simple expression parser",
.func = test__expr,
+ .subtest = {
+ .get_nr = test__expr_subtest_get_nr,
+ .get_desc = test__expr_subtest_get_desc,
+ },
+
},
{
.desc = "PERF_RECORD_* events & perf_sample fields",
diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index ea10fc4412c4..125f9b040e20 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -1,9 +1,11 @@
// SPDX-License-Identifier: GPL-2.0
+#include "pmu-events/pmu-events.h"
#include "util/debug.h"
#include "util/expr.h"
#include "tests.h"
#include <stdlib.h>
#include <string.h>
+#include <linux/kernel.h>
#include <linux/zalloc.h>

static int test(struct expr_parse_ctx *ctx, const char *e, double val2)
@@ -16,7 +18,7 @@ static int test(struct expr_parse_ctx *ctx, const char *e, double val2)
return 0;
}

-int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
+static int parse_tests(void)
{
const char *p;
const char **other;
@@ -39,6 +41,7 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
ret |= test(&ctx, "min(1,2) + 1", 2);
ret |= test(&ctx, "max(1,2) + 1", 3);
ret |= test(&ctx, "1+1 if 3*4 else 0", 2);
+ ret |= test(&ctx, "1.1 + 2.1", 3.2);

if (ret)
return ret;
@@ -65,3 +68,89 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)

return 0;
}
+
+static int pmu_tests(void)
+{
+ struct pmu_events_map *map;
+ struct pmu_event *pe;
+ int i, j, k;
+ const char **ids;
+ int idnum;
+ int ret = 0;
+ struct expr_parse_ctx ctx;
+ double result;
+
+ i = 0;
+ for (;;) {
+ map = &pmu_events_map[i++];
+ if (!map->table) {
+ map = NULL;
+ break;
+ }
+ j = 0;
+ for (;;) {
+ pe = &map->table[j++];
+ if (!pe->name && !pe->metric_group && !pe->metric_name)
+ break;
+ if (!pe->metric_expr)
+ continue;
+ if (expr__find_other(pe->metric_expr, NULL,
+ &ids, &idnum) < 0) {
+ pr_debug("Parse other failed for map %s %s %s\n",
+ map->cpuid, map->version, map->type);
+ pr_debug("On metric %s\n", pe->metric_name);
+ pr_debug("On expression %s\n", pe->metric_expr);
+ ret++;
+ continue;
+ }
+ expr__ctx_init(&ctx);
+ for (k = 0; k < idnum; k++)
+ expr__add_id(&ctx, ids[k], 1);
+
+ if (expr__parse(&result, &ctx, pe->metric_expr)) {
+ pr_debug("Parse failed for map %s %s %s\n",
+ map->cpuid, map->version, map->type);
+ pr_debug("On metric %s\n", pe->metric_name);
+ pr_debug("On expression %s\n", pe->metric_expr);
+ ret++;
+ }
+ for (k = 0; k < idnum; k++)
+ zfree(&ids[k]);
+ free(ids);
+ }
+ }
+ return ret;
+}
+
+static const struct {
+ int (*func)(void);
+ const char *desc;
+} expr_testcase_table[] = {
+ {
+ .func = parse_tests,
+ .desc = "Basic expressions",
+ },
+ {
+ .func = pmu_tests,
+ .desc = "Parsing of pmu metrics",
+ },
+};
+
+const char *test__expr_subtest_get_desc(int i)
+{
+ if (i < 0 || i >= (int)ARRAY_SIZE(expr_testcase_table))
+ return NULL;
+ return expr_testcase_table[i].desc;
+}
+
+int test__expr_subtest_get_nr(void)
+{
+ return (int)ARRAY_SIZE(expr_testcase_table);
+}
+
+int test__expr(struct test *test __maybe_unused, int i __maybe_unused)
+{
+ if (i < 0 || i >= (int)ARRAY_SIZE(expr_testcase_table))
+ return TEST_FAIL;
+ return expr_testcase_table[i].func();
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 61a1ab032080..315d64ffd14c 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -72,6 +72,8 @@ int test__keep_tracking(struct test *test, int subtest);
int test__parse_no_sample_id_all(struct test *test, int subtest);
int test__dwarf_unwind(struct test *test, int subtest);
int test__expr(struct test *test, int subtest);
+const char *test__expr_subtest_get_desc(int subtest);
+int test__expr_subtest_get_nr(void);
int test__hists_filter(struct test *test, int subtest);
int test__mmap_thread_lookup(struct test *test, int subtest);
int test__thread_maps_share(struct test *test, int subtest);
--
2.26.2.303.gf8c07b1a785-goog

2020-04-22 07:50:47

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 7/8] perf expr: debug lex if debugging yaxx

Only effects parser debugging (disabled by default). Enables displaying
'--accepting rule at line .. ("...").

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/expr.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index c3382d58cf40..8694bec86f1a 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -44,6 +44,7 @@ __expr__parse(double *val, struct expr_parse_ctx *ctx, const char *expr,

#ifdef PARSER_DEBUG
expr_debug = 1;
+ expr_set_debug(1, scanner);
#endif

ret = expr_parse(val, ctx, scanner);
--
2.26.2.303.gf8c07b1a785-goog

2020-04-22 07:50:47

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 6/8] perf expr: parse numbers as doubles

This is expected in expr.y and metrics use floating point values such as
x86 broadwell IFetch_Line_Utilization.

Fixes: 26226a97724d (perf expr: Move expr lexer to flex)
Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/expr.l | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
index 95bcf3629edf..0efda2ce2766 100644
--- a/tools/perf/util/expr.l
+++ b/tools/perf/util/expr.l
@@ -10,12 +10,12 @@
char *expr_get_text(yyscan_t yyscanner);
YYSTYPE *expr_get_lval(yyscan_t yyscanner);

-static int __value(YYSTYPE *yylval, char *str, int base, int token)
+static double __value(YYSTYPE *yylval, char *str, int token)
{
- u64 num;
+ double num;

errno = 0;
- num = strtoull(str, NULL, base);
+ num = strtod(str, NULL);
if (errno)
return EXPR_ERROR;

@@ -23,12 +23,12 @@ static int __value(YYSTYPE *yylval, char *str, int base, int token)
return token;
}

-static int value(yyscan_t scanner, int base)
+static int value(yyscan_t scanner)
{
YYSTYPE *yylval = expr_get_lval(scanner);
char *text = expr_get_text(scanner);

- return __value(yylval, text, base, NUMBER);
+ return __value(yylval, text, NUMBER);
}

/*
@@ -68,7 +68,7 @@ static int str(yyscan_t scanner, int token)
}
%}

-number [0-9]+
+number [0-9]*\.?[0-9]+

sch [-,=]
spec \\{sch}
@@ -92,7 +92,7 @@ min { return MIN; }
if { return IF; }
else { return ELSE; }
#smt_on { return SMT_ON; }
-{number} { return value(yyscanner, 10); }
+{number} { return value(yyscanner); }
{symbol} { return str(yyscanner, ID); }
"|" { return '|'; }
"^" { return '^'; }
--
2.26.2.303.gf8c07b1a785-goog

2020-04-22 07:51:06

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 5/8] perf expr: increase max other

Large metrics such as Branch_Misprediction_Cost_SMT on x86 broadwell
need more space.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/expr.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
index 0938ad166ece..4938bfc608b7 100644
--- a/tools/perf/util/expr.h
+++ b/tools/perf/util/expr.h
@@ -2,7 +2,7 @@
#ifndef PARSE_CTX_H
#define PARSE_CTX_H 1

-#define EXPR_MAX_OTHER 20
+#define EXPR_MAX_OTHER 64
#define MAX_PARSE_ID EXPR_MAX_OTHER

struct expr_parse_id {
--
2.26.2.303.gf8c07b1a785-goog

2020-04-22 07:51:10

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 4/8] perf expr: allow ',' to be an other token

Corrects parse errors in expr__find_other of expressions with min.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/expr.y | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/expr.y b/tools/perf/util/expr.y
index cd17486c1c5d..54260094b947 100644
--- a/tools/perf/util/expr.y
+++ b/tools/perf/util/expr.y
@@ -80,7 +80,7 @@ other: ID
ctx->ids[ctx->num_ids++].name = $1;
}
|
-MIN | MAX | IF | ELSE | SMT_ON | NUMBER | '|' | '^' | '&' | '-' | '+' | '*' | '/' | '%' | '(' | ')'
+MIN | MAX | IF | ELSE | SMT_ON | NUMBER | '|' | '^' | '&' | '-' | '+' | '*' | '/' | '%' | '(' | ')' | ','


all_expr: if_expr { *final_val = $1; }
--
2.26.2.303.gf8c07b1a785-goog

2020-04-22 07:53:26

by Ian Rogers

[permalink] [raw]
Subject: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

Remove over escaping with \\.
Remove extraneous if 1 if 0 == 1 else 0 else 0.

Signed-off-by: Ian Rogers <[email protected]>
---
.../pmu-events/arch/x86/cascadelakex/clx-metrics.json | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
index 7fde0d2943cd..d25eebce34c9 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
@@ -328,31 +328,31 @@
},
{
"BriefDescription": "Average latency of data read request to external memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetches",
- "MetricExpr": "1000000000 * ( cha@event\\=0x36\\\\\\,umask\\=0x21@ / cha@event\\=0x35\\\\\\,umask\\=0x21@ ) / ( cha_0@event\\=0x0@ / duration_time )",
+ "MetricExpr": "1000000000 * ( cha@event\\=0x36\\,umask\\=0x21@ / cha@event\\=0x35\\,umask\\=0x21@ ) / ( cha_0@event\\=0x0@ / duration_time )",
"MetricGroup": "Memory_Lat",
"MetricName": "DRAM_Read_Latency"
},
{
"BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches",
- "MetricExpr": "cha@event\\=0x36\\\\\\,umask\\=0x21@ / cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1@",
+ "MetricExpr": "cha@event\\=0x36\\,umask\\=0x21@ / cha@event\\=0x36\\,umask\\=0x21\\,thresh\\=1@",
"MetricGroup": "Memory_BW",
"MetricName": "DRAM_Parallel_Reads"
},
{
"BriefDescription": "Average latency of data read request to external 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-read prefetches",
- "MetricExpr": "( 1000000000 * ( imc@event\\=0xe0\\\\\\,umask\\=0x1@ / imc@event\\=0xe3@ ) / imc_0@event\\=0x0@ ) if 1 if 0 == 1 else 0 else 0",
+ "MetricExpr": "( 1000000000 * ( imc@event\\=0xe0\\,umask\\=0x1@ / imc@event\\=0xe3@ ) / imc_0@event\\=0x0@ )",
"MetricGroup": "Memory_Lat",
"MetricName": "MEM_PMM_Read_Latency"
},
{
"BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [GB / sec]",
- "MetricExpr": "( ( 64 * imc@event\\=0xe3@ / 1000000000 ) / duration_time ) if 1 if 0 == 1 else 0 else 0",
+ "MetricExpr": "( ( 64 * imc@event\\=0xe3@ / 1000000000 ) / duration_time )",
"MetricGroup": "Memory_BW",
"MetricName": "PMM_Read_BW"
},
{
"BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes [GB / sec]",
- "MetricExpr": "( ( 64 * imc@event\\=0xe7@ / 1000000000 ) / duration_time ) if 1 if 0 == 1 else 0 else 0",
+ "MetricExpr": "( ( 64 * imc@event\\=0xe7@ / 1000000000 ) / duration_time )",
"MetricGroup": "Memory_BW",
"MetricName": "PMM_Write_BW"
},
--
2.26.2.303.gf8c07b1a785-goog

2020-04-22 14:42:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
> Remove over escaping with \\.
> Remove extraneous if 1 if 0 == 1 else 0 else 0.

So where do these parse errors happen exactly? Some earlier
patches introduced them as regressions?

The original metrics worked without parse errors as far as I know.

If it fixes something earlier it would need Fixes: tags.

-Andi

2020-04-22 15:35:53

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <[email protected]> wrote:
>
> On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
> > Remove over escaping with \\.
> > Remove extraneous if 1 if 0 == 1 else 0 else 0.
>
> So where do these parse errors happen exactly? Some earlier
> patches introduced them as regressions?

I'll work to track down a Fixes tag. I can repro the Skylakex errors
without the test in this series, by doing:

$ perf stat -M DRAM_Read_Latency sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cha/event=0x36\,uma
sk=0x21/).
/bin/dmesg | grep -i perf may provide additional information.

This was just the escaping issue. I'm less clear on the other cascade
lake issue, and it is a bit more work for me to test on cascade lake.
What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
for the Fixes will let me know, but it looks like a copy-paste error.

> The original metrics worked without parse errors as far as I know.

The skylake issue above repros on 5.2.17 and so it seems like it is
broken for a while. The test in this series will prevent this in the
future, but without this patch that test fails.

> If it fixes something earlier it would need Fixes: tags.

Working on it. Thanks for the input!

Ian

> -Andi

2020-04-22 16:20:47

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Wed, Apr 22, 2020 at 8:34 AM Ian Rogers <[email protected]> wrote:
>
> On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <[email protected]> wrote:
> >
> > On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
> > > Remove over escaping with \\.
> > > Remove extraneous if 1 if 0 == 1 else 0 else 0.
> >
> > So where do these parse errors happen exactly? Some earlier
> > patches introduced them as regressions?
>
> I'll work to track down a Fixes tag. I can repro the Skylakex errors
> without the test in this series, by doing:
>
> $ perf stat -M DRAM_Read_Latency sleep 1
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (cha/event=0x36\,uma
> sk=0x21/).
> /bin/dmesg | grep -i perf may provide additional information.
>
> This was just the escaping issue. I'm less clear on the other cascade
> lake issue, and it is a bit more work for me to test on cascade lake.
> What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
> for the Fixes will let me know, but it looks like a copy-paste error.
>
> > The original metrics worked without parse errors as far as I know.
>
> The skylake issue above repros on 5.2.17 and so it seems like it is
> broken for a while. The test in this series will prevent this in the
> future, but without this patch that test fails.

The parse errors were introduced with the metrics, so they've never worked:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=fd5500989c8f3c3944ac0a144be04bae2506f7ba

I will send out a v2 with Fixes in the commit message but wanted to
wait in case there was any more feedback. In particular the fixes to
the new test and expr parser lex code. The lex code wasn't broken at
the time the metrics were added and should be working again after this
patch set.

Thanks,
Ian

> > If it fixes something earlier it would need Fixes: tags.
>
> Working on it. Thanks for the input!
>
> Ian
>
> > -Andi

2020-04-23 01:13:42

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics



On 4/23/2020 12:18 AM, Ian Rogers wrote:
> On Wed, Apr 22, 2020 at 8:34 AM Ian Rogers <[email protected]> wrote:
>>
>> On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <[email protected]> wrote:
>>>
>>> On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
>>>> Remove over escaping with \\.
>>>> Remove extraneous if 1 if 0 == 1 else 0 else 0.
>>>
>>> So where do these parse errors happen exactly? Some earlier
>>> patches introduced them as regressions?
>>
>> I'll work to track down a Fixes tag. I can repro the Skylakex errors
>> without the test in this series, by doing:
>>
>> $ perf stat -M DRAM_Read_Latency sleep 1
>> Error:
>> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
>> for event (cha/event=0x36\,uma
>> sk=0x21/).
>> /bin/dmesg | grep -i perf may provide additional information.
>>

I also think some patches introduced this regression. When we rollback
to commit 61ec07f5917e (perf vendor events intel: Update all the Intel
JSON metrics from TMAM 3.6.), there is no this error on CLX.

Thanks
Jin Yao

>> This was just the escaping issue. I'm less clear on the other cascade
>> lake issue, and it is a bit more work for me to test on cascade lake.
>> What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
>> for the Fixes will let me know, but it looks like a copy-paste error.
>>
>>> The original metrics worked without parse errors as far as I know.
>>
>> The skylake issue above repros on 5.2.17 and so it seems like it is
>> broken for a while. The test in this series will prevent this in the
>> future, but without this patch that test fails.
>
> The parse errors were introduced with the metrics, so they've never worked:
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=fd5500989c8f3c3944ac0a144be04bae2506f7ba
>
> I will send out a v2 with Fixes in the commit message but wanted to
> wait in case there was any more feedback. In particular the fixes to
> the new test and expr parser lex code. The lex code wasn't broken at
> the time the metrics were added and should be working again after this
> patch set.
>
> Thanks,
> Ian
>
>>> If it fixes something earlier it would need Fixes: tags.
>>
>> Working on it. Thanks for the input!
>>
>> Ian
>>
>>> -Andi

2020-04-23 05:56:06

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

Hi Jiri,

Bisected to this commit which introduced the regression.

26226a97724d ("perf expr: Move expr lexer to flex")

Would you like to look at that?

Thanks
Jin Yao

On 4/23/2020 9:08 AM, Jin, Yao wrote:
>
>
> On 4/23/2020 12:18 AM, Ian Rogers wrote:
>> On Wed, Apr 22, 2020 at 8:34 AM Ian Rogers <[email protected]> wrote:
>>>
>>> On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <[email protected]> wrote:
>>>>
>>>> On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
>>>>> Remove over escaping with \\.
>>>>> Remove extraneous if 1 if 0 == 1 else 0 else 0.
>>>>
>>>> So where do these parse errors happen exactly? Some earlier
>>>> patches introduced them as regressions?
>>>
>>> I'll work to track down a Fixes tag. I can repro the Skylakex errors
>>> without the test in this series, by doing:
>>>
>>> $ perf stat -M DRAM_Read_Latency sleep 1
>>> Error:
>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
>>> for event (cha/event=0x36\,uma
>>> sk=0x21/).
>>> /bin/dmesg | grep -i perf may provide additional information.
>>>
>
> I also think some patches introduced this regression. When we rollback
> to commit 61ec07f5917e (perf vendor events intel: Update all the Intel
> JSON metrics from TMAM 3.6.), there is no this error on CLX.
>
> Thanks
> Jin Yao
>
>>> This was just the escaping issue. I'm less clear on the other cascade
>>> lake issue, and it is a bit more work for me to test on cascade lake.
>>> What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
>>> for the Fixes will let me know, but it looks like a copy-paste error.
>>>
>>>> The original metrics worked without parse errors as far as I know.
>>>
>>> The skylake issue above repros on 5.2.17 and so it seems like it is
>>> broken for a while. The test in this series will prevent this in the
>>> future, but without this patch that test fails.
>>
>> The parse errors were introduced with the metrics, so they've never
>> worked:
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=fd5500989c8f3c3944ac0a144be04bae2506f7ba
>>
>>
>> I will send out a v2 with Fixes in the commit message but wanted to
>> wait in case there was any more feedback. In particular the fixes to
>> the new test and expr parser lex code. The lex code wasn't broken at
>> the time the metrics were added and should be working again after this
>> patch set.
>>
>> Thanks,
>> Ian
>>
>>>> If it fixes something earlier it would need Fixes: tags.
>>>
>>> Working on it. Thanks for the input!
>>>
>>> Ian
>>>
>>>> -Andi

2020-04-23 06:11:14

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Wed, Apr 22, 2020 at 10:54 PM Jin, Yao <[email protected]> wrote:
>
> Hi Jiri,
>
> Bisected to this commit which introduced the regression.
>
> 26226a97724d ("perf expr: Move expr lexer to flex")
>
> Would you like to look at that?

Hi Jin,

that commit breaks parsing of things like ','. See fixes in this patch
set such as:
https://lore.kernel.org/lkml/[email protected]/
Fixing the lex issues then exposes other bugs that need to be
corrected in the json. I've added Fixes to the commit message of:
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
and would be glad of a review. If we can land:
https://lore.kernel.org/lkml/[email protected]/
then expr as the source of parse errors can go away :-) The next
problem is the parse events code, but some of that logic is dependent
on the machine it is running on. It'd be good to add a test that
parsed events code can handle the events in metrics too, filtering out
things like duration_time that are special to metrics.

Thanks,
Ian

> Thanks
> Jin Yao
>
> On 4/23/2020 9:08 AM, Jin, Yao wrote:
> >
> >
> > On 4/23/2020 12:18 AM, Ian Rogers wrote:
> >> On Wed, Apr 22, 2020 at 8:34 AM Ian Rogers <[email protected]> wrote:
> >>>
> >>> On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <[email protected]> wrote:
> >>>>
> >>>> On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
> >>>>> Remove over escaping with \\.
> >>>>> Remove extraneous if 1 if 0 == 1 else 0 else 0.
> >>>>
> >>>> So where do these parse errors happen exactly? Some earlier
> >>>> patches introduced them as regressions?
> >>>
> >>> I'll work to track down a Fixes tag. I can repro the Skylakex errors
> >>> without the test in this series, by doing:
> >>>
> >>> $ perf stat -M DRAM_Read_Latency sleep 1
> >>> Error:
> >>> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> >>> for event (cha/event=0x36\,uma
> >>> sk=0x21/).
> >>> /bin/dmesg | grep -i perf may provide additional information.
> >>>
> >
> > I also think some patches introduced this regression. When we rollback
> > to commit 61ec07f5917e (perf vendor events intel: Update all the Intel
> > JSON metrics from TMAM 3.6.), there is no this error on CLX.
> >
> > Thanks
> > Jin Yao
> >
> >>> This was just the escaping issue. I'm less clear on the other cascade
> >>> lake issue, and it is a bit more work for me to test on cascade lake.
> >>> What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
> >>> for the Fixes will let me know, but it looks like a copy-paste error.
> >>>
> >>>> The original metrics worked without parse errors as far as I know.
> >>>
> >>> The skylake issue above repros on 5.2.17 and so it seems like it is
> >>> broken for a while. The test in this series will prevent this in the
> >>> future, but without this patch that test fails.
> >>
> >> The parse errors were introduced with the metrics, so they've never
> >> worked:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=fd5500989c8f3c3944ac0a144be04bae2506f7ba
> >>
> >>
> >> I will send out a v2 with Fixes in the commit message but wanted to
> >> wait in case there was any more feedback. In particular the fixes to
> >> the new test and expr parser lex code. The lex code wasn't broken at
> >> the time the metrics were added and should be working again after this
> >> patch set.
> >>
> >> Thanks,
> >> Ian
> >>
> >>>> If it fixes something earlier it would need Fixes: tags.
> >>>
> >>> Working on it. Thanks for the input!
> >>>
> >>> Ian
> >>>
> >>>> -Andi

2020-04-23 07:54:16

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

Hi Ian,

On 4/23/2020 2:09 PM, Ian Rogers wrote:
> On Wed, Apr 22, 2020 at 10:54 PM Jin, Yao <[email protected]> wrote:
>>
>> Hi Jiri,
>>
>> Bisected to this commit which introduced the regression.
>>
>> 26226a97724d ("perf expr: Move expr lexer to flex")
>>
>> Would you like to look at that?
>
> Hi Jin,
>
> that commit breaks parsing of things like ','. See fixes in this patch
> set such as:
> https://lore.kernel.org/lkml/[email protected]/
> Fixing the lex issues then exposes other bugs that need to be
> corrected in the json. I've added Fixes to the commit message of:
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
> and would be glad of a review. If we can land:
> https://lore.kernel.org/lkml/[email protected]/
> then expr as the source of parse errors can go away :-) The next
> problem is the parse events code, but some of that logic is dependent
> on the machine it is running on. It'd be good to add a test that
> parsed events code can handle the events in metrics too, filtering out
> things like duration_time that are special to metrics.
>
> Thanks,
> Ian
>

Only with the fix
"https://lore.kernel.org/lkml/[email protected]/"
(without other json modifications), the issue was still there.

localhost:~ # perf stat -M DRAM_Read_Latency
event syntax error:
'../event=0x36,,umask=0x21/,cha/event=0x35,cha_0/event=0x0/}:W,duration_time'
\___ parser error

Usage: perf stat [<options>] [<command>]

-M, --metrics <metric/metric group list>
monitor specified metrics or metric groups
(separated by ,)

So you added other commits which changed the json to let the parse work.
But I don't know if we have to do with this way because it should be a
regression issue.

In my opinion, we'd better fix the issue in 26226a97724d ("perf expr:
Move expr lexer to flex") and try not to change the json if possible.

Thanks
Jin Yao

>> Thanks
>> Jin Yao
>>
>> On 4/23/2020 9:08 AM, Jin, Yao wrote:
>>>
>>>
>>> On 4/23/2020 12:18 AM, Ian Rogers wrote:
>>>> On Wed, Apr 22, 2020 at 8:34 AM Ian Rogers <[email protected]> wrote:
>>>>>
>>>>> On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <[email protected]> wrote:
>>>>>>
>>>>>> On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
>>>>>>> Remove over escaping with \\.
>>>>>>> Remove extraneous if 1 if 0 == 1 else 0 else 0.
>>>>>>
>>>>>> So where do these parse errors happen exactly? Some earlier
>>>>>> patches introduced them as regressions?
>>>>>
>>>>> I'll work to track down a Fixes tag. I can repro the Skylakex errors
>>>>> without the test in this series, by doing:
>>>>>
>>>>> $ perf stat -M DRAM_Read_Latency sleep 1
>>>>> Error:
>>>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
>>>>> for event (cha/event=0x36\,uma
>>>>> sk=0x21/).
>>>>> /bin/dmesg | grep -i perf may provide additional information.
>>>>>
>>>
>>> I also think some patches introduced this regression. When we rollback
>>> to commit 61ec07f5917e (perf vendor events intel: Update all the Intel
>>> JSON metrics from TMAM 3.6.), there is no this error on CLX.
>>>
>>> Thanks
>>> Jin Yao
>>>
>>>>> This was just the escaping issue. I'm less clear on the other cascade
>>>>> lake issue, and it is a bit more work for me to test on cascade lake.
>>>>> What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
>>>>> for the Fixes will let me know, but it looks like a copy-paste error.
>>>>>
>>>>>> The original metrics worked without parse errors as far as I know.
>>>>>
>>>>> The skylake issue above repros on 5.2.17 and so it seems like it is
>>>>> broken for a while. The test in this series will prevent this in the
>>>>> future, but without this patch that test fails.
>>>>
>>>> The parse errors were introduced with the metrics, so they've never
>>>> worked:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=fd5500989c8f3c3944ac0a144be04bae2506f7ba
>>>>
>>>>
>>>> I will send out a v2 with Fixes in the commit message but wanted to
>>>> wait in case there was any more feedback. In particular the fixes to
>>>> the new test and expr parser lex code. The lex code wasn't broken at
>>>> the time the metrics were added and should be working again after this
>>>> patch set.
>>>>
>>>> Thanks,
>>>> Ian
>>>>
>>>>>> If it fixes something earlier it would need Fixes: tags.
>>>>>
>>>>> Working on it. Thanks for the input!
>>>>>
>>>>> Ian
>>>>>
>>>>>> -Andi

2020-04-23 10:12:56

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Thu, Apr 23, 2020 at 03:51:18PM +0800, Jin, Yao wrote:
> Hi Ian,
>
> On 4/23/2020 2:09 PM, Ian Rogers wrote:
> > On Wed, Apr 22, 2020 at 10:54 PM Jin, Yao <[email protected]> wrote:
> > >
> > > Hi Jiri,
> > >
> > > Bisected to this commit which introduced the regression.
> > >
> > > 26226a97724d ("perf expr: Move expr lexer to flex")
> > >
> > > Would you like to look at that?
> >
> > Hi Jin,
> >
> > that commit breaks parsing of things like ','. See fixes in this patch
> > set such as:
> > https://lore.kernel.org/lkml/[email protected]/
> > Fixing the lex issues then exposes other bugs that need to be
> > corrected in the json. I've added Fixes to the commit message of:
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> > and would be glad of a review. If we can land:
> > https://lore.kernel.org/lkml/[email protected]/
> > then expr as the source of parse errors can go away :-) The next
> > problem is the parse events code, but some of that logic is dependent
> > on the machine it is running on. It'd be good to add a test that
> > parsed events code can handle the events in metrics too, filtering out
> > things like duration_time that are special to metrics.
> >
> > Thanks,
> > Ian
> >
>
> Only with the fix
> "https://lore.kernel.org/lkml/[email protected]/"
> (without other json modifications), the issue was still there.
>
> localhost:~ # perf stat -M DRAM_Read_Latency
> event syntax error:
> '../event=0x36,,umask=0x21/,cha/event=0x35,cha_0/event=0x0/}:W,duration_time'
> \___ parser error
>
> Usage: perf stat [<options>] [<command>]
>
> -M, --metrics <metric/metric group list>
> monitor specified metrics or metric groups
> (separated by ,)

hum, I don't have that metric, is there another example of broken metric?

[jolsa@krava perf]$ sudo ./perf stat -M DRAM_Read_Latency
Cannot find metric or group `DRAM_Read_Latency'

>
> So you added other commits which changed the json to let the parse work. But
> I don't know if we have to do with this way because it should be a
> regression issue.
>
> In my opinion, we'd better fix the issue in 26226a97724d ("perf expr: Move
> expr lexer to flex") and try not to change the json if possible.

yea, that change definitely had a potential of breaking things ;-)
but it should be easy to fix them

I'll go through the v3 of the patchset

thanks,
jirka

2020-04-23 10:14:40

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Thu, Apr 23, 2020 at 12:10:36PM +0200, Jiri Olsa wrote:
> On Thu, Apr 23, 2020 at 03:51:18PM +0800, Jin, Yao wrote:
> > Hi Ian,
> >
> > On 4/23/2020 2:09 PM, Ian Rogers wrote:
> > > On Wed, Apr 22, 2020 at 10:54 PM Jin, Yao <[email protected]> wrote:
> > > >
> > > > Hi Jiri,
> > > >
> > > > Bisected to this commit which introduced the regression.
> > > >
> > > > 26226a97724d ("perf expr: Move expr lexer to flex")
> > > >
> > > > Would you like to look at that?
> > >
> > > Hi Jin,
> > >
> > > that commit breaks parsing of things like ','. See fixes in this patch
> > > set such as:
> > > https://lore.kernel.org/lkml/[email protected]/
> > > Fixing the lex issues then exposes other bugs that need to be
> > > corrected in the json. I've added Fixes to the commit message of:
> > > https://lore.kernel.org/lkml/[email protected]/
> > > https://lore.kernel.org/lkml/[email protected]/
> > > and would be glad of a review. If we can land:
> > > https://lore.kernel.org/lkml/[email protected]/
> > > then expr as the source of parse errors can go away :-) The next
> > > problem is the parse events code, but some of that logic is dependent
> > > on the machine it is running on. It'd be good to add a test that
> > > parsed events code can handle the events in metrics too, filtering out
> > > things like duration_time that are special to metrics.
> > >
> > > Thanks,
> > > Ian
> > >
> >
> > Only with the fix
> > "https://lore.kernel.org/lkml/[email protected]/"
> > (without other json modifications), the issue was still there.
> >
> > localhost:~ # perf stat -M DRAM_Read_Latency
> > event syntax error:
> > '../event=0x36,,umask=0x21/,cha/event=0x35,cha_0/event=0x0/}:W,duration_time'
> > \___ parser error
> >
> > Usage: perf stat [<options>] [<command>]
> >
> > -M, --metrics <metric/metric group list>
> > monitor specified metrics or metric groups
> > (separated by ,)
>
> hum, I don't have that metric, is there another example of broken metric?
>
> [jolsa@krava perf]$ sudo ./perf stat -M DRAM_Read_Latency
> Cannot find metric or group `DRAM_Read_Latency'
>
> >
> > So you added other commits which changed the json to let the parse work. But
> > I don't know if we have to do with this way because it should be a
> > regression issue.
> >
> > In my opinion, we'd better fix the issue in 26226a97724d ("perf expr: Move
> > expr lexer to flex") and try not to change the json if possible.
>
> yea, that change definitely had a potential of breaking things ;-)
> but it should be easy to fix them
>
> I'll go through the v3 of the patchset

ok, there's v2 now ;-)

jirka

2020-04-23 19:15:13

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

On Thu, Apr 23, 2020 at 3:11 AM Jiri Olsa <[email protected]> wrote:
>
> On Thu, Apr 23, 2020 at 12:10:36PM +0200, Jiri Olsa wrote:
> > On Thu, Apr 23, 2020 at 03:51:18PM +0800, Jin, Yao wrote:
> > > Hi Ian,
> > >
> > > On 4/23/2020 2:09 PM, Ian Rogers wrote:
> > > > On Wed, Apr 22, 2020 at 10:54 PM Jin, Yao <[email protected]> wrote:
> > > > >
> > > > > Hi Jiri,
> > > > >
> > > > > Bisected to this commit which introduced the regression.
> > > > >
> > > > > 26226a97724d ("perf expr: Move expr lexer to flex")
> > > > >
> > > > > Would you like to look at that?
> > > >
> > > > Hi Jin,
> > > >
> > > > that commit breaks parsing of things like ','. See fixes in this patch
> > > > set such as:
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > Fixing the lex issues then exposes other bugs that need to be
> > > > corrected in the json. I've added Fixes to the commit message of:
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > and would be glad of a review. If we can land:
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > then expr as the source of parse errors can go away :-) The next
> > > > problem is the parse events code, but some of that logic is dependent
> > > > on the machine it is running on. It'd be good to add a test that
> > > > parsed events code can handle the events in metrics too, filtering out
> > > > things like duration_time that are special to metrics.
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > >
> > > Only with the fix
> > > "https://lore.kernel.org/lkml/[email protected]/"
> > > (without other json modifications), the issue was still there.
> > >
> > > localhost:~ # perf stat -M DRAM_Read_Latency
> > > event syntax error:
> > > '../event=0x36,,umask=0x21/,cha/event=0x35,cha_0/event=0x0/}:W,duration_time'
> > > \___ parser error
> > >
> > > Usage: perf stat [<options>] [<command>]
> > >
> > > -M, --metrics <metric/metric group list>
> > > monitor specified metrics or metric groups
> > > (separated by ,)
> >
> > hum, I don't have that metric, is there another example of broken metric?
> >
> > [jolsa@krava perf]$ sudo ./perf stat -M DRAM_Read_Latency
> > Cannot find metric or group `DRAM_Read_Latency'
> >
> > >
> > > So you added other commits which changed the json to let the parse work. But
> > > I don't know if we have to do with this way because it should be a
> > > regression issue.
> > >
> > > In my opinion, we'd better fix the issue in 26226a97724d ("perf expr: Move
> > > expr lexer to flex") and try not to change the json if possible.
> >
> > yea, that change definitely had a potential of breaking things ;-)
> > but it should be easy to fix them
> >
> > I'll go through the v3 of the patchset
>
> ok, there's v2 now ;-)

Fwiw, you can use the test CL without the others and then it should
reproduce the problems. I placed it at the end of the series as
otherwise the test is broken until all the fixes land.

Thanks,
Ian

> jirka
>