2021-01-18 05:14:58

by Jin Yao

Subject: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

Uncore events have become die-scoped on Xeon Cascade Lake-AP, and perf
now supports --per-die aggregation.

An issue was found in check_per_pkg() for uncore events running on an
AP system. On Cascade Lake-AP, we have:

S0-D0
S0-D1
S1-D0
S1-D1

But in check_per_pkg(), S0-D1 and S1-D1 are skipped because the
mask bits for S0 and S1 have already been set for S0-D0 and S1-D0;
the die_id is not checked. So the counts for S0-D1 and S1-D1 are
zeroed, which is incorrect.

root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
1.001460963 S0-D0 1 1317376 Bytes llc_misses.mem_read
1.001460963 S0-D1 1 998016 Bytes llc_misses.mem_read
1.001460963 S1-D0 1 970496 Bytes llc_misses.mem_read
1.001460963 S1-D1 1 1291264 Bytes llc_misses.mem_read
2.003488021 S0-D0 1 1082048 Bytes llc_misses.mem_read
2.003488021 S0-D1 1 1919040 Bytes llc_misses.mem_read
2.003488021 S1-D0 1 890752 Bytes llc_misses.mem_read
2.003488021 S1-D1 1 2380800 Bytes llc_misses.mem_read
3.005613270 S0-D0 1 1126080 Bytes llc_misses.mem_read
3.005613270 S0-D1 1 2898176 Bytes llc_misses.mem_read
3.005613270 S1-D0 1 870912 Bytes llc_misses.mem_read
3.005613270 S1-D1 1 3388608 Bytes llc_misses.mem_read
4.007627598 S0-D0 1 1124608 Bytes llc_misses.mem_read
4.007627598 S0-D1 1 3884416 Bytes llc_misses.mem_read
4.007627598 S1-D0 1 921088 Bytes llc_misses.mem_read
4.007627598 S1-D1 1 4451840 Bytes llc_misses.mem_read
5.001479927 S0-D0 1 963328 Bytes llc_misses.mem_read
5.001479927 S0-D1 1 4831936 Bytes llc_misses.mem_read
5.001479927 S1-D0 1 895104 Bytes llc_misses.mem_read
5.001479927 S1-D1 1 5496640 Bytes llc_misses.mem_read

From the above output, we can see S0-D1 and S1-D1 don't report the
interval values; they keep growing. That's because check_per_pkg()
wrongly decides to use zero counts for S0-D1 and S1-D1.

So in check_per_pkg(), we should use a hashmap keyed on (socket, die) to
decide whether the cpu counts need to be skipped. Considering only the
socket is not enough.

Now with this patch:

root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
1.001586691 S0-D0 1 1229440 Bytes llc_misses.mem_read
1.001586691 S0-D1 1 976832 Bytes llc_misses.mem_read
1.001586691 S1-D0 1 938304 Bytes llc_misses.mem_read
1.001586691 S1-D1 1 1227328 Bytes llc_misses.mem_read
2.003776312 S0-D0 1 1586752 Bytes llc_misses.mem_read
2.003776312 S0-D1 1 875392 Bytes llc_misses.mem_read
2.003776312 S1-D0 1 855616 Bytes llc_misses.mem_read
2.003776312 S1-D1 1 949376 Bytes llc_misses.mem_read
3.006512788 S0-D0 1 1338880 Bytes llc_misses.mem_read
3.006512788 S0-D1 1 920064 Bytes llc_misses.mem_read
3.006512788 S1-D0 1 877184 Bytes llc_misses.mem_read
3.006512788 S1-D1 1 1020736 Bytes llc_misses.mem_read
4.008895291 S0-D0 1 926592 Bytes llc_misses.mem_read
4.008895291 S0-D1 1 906368 Bytes llc_misses.mem_read
4.008895291 S1-D0 1 892224 Bytes llc_misses.mem_read
4.008895291 S1-D1 1 987712 Bytes llc_misses.mem_read
5.001590993 S0-D0 1 962624 Bytes llc_misses.mem_read
5.001590993 S0-D1 1 912512 Bytes llc_misses.mem_read
5.001590993 S1-D0 1 891200 Bytes llc_misses.mem_read
5.001590993 S1-D1 1 978432 Bytes llc_misses.mem_read

On a no-die system, die_id is 0, so the key is effectively (socket, 0)
and the original behavior is unchanged.

Reported-by: Huang Ying <[email protected]>
Signed-off-by: Jin Yao <[email protected]>
---
v7:
The previous version hit a build error on 32-bit systems (such as a cross build with mipsel-linux-gnu-gcc).

In v7,
1. Use size_t to replace uint64_t.
2. The hash key is changed from 'die_id << 32 | socket_id' to 'die_id << 16 | socket_id';
16 bits should be enough for the socket id, right?

v6:
Fix the perf test python failure by adding hashmap.c to python-ext-sources.

root@kbl-ppc:~# ./perf test python
19: 'import perf' in python : Ok

v5:
The hash key is changed to die_id << 32 | socket.
In pkg_id_hash, 'return (int64_t)key & 0xffffffff' actually returns the socket.

v4:
v3 used an unnecessary bool allocation. v4 just uses the hash value '(void *)1'.

v4 is compiled ok with tmp.perf/core.

v3:
Since the return type of some cpumap functions changed from 'int' to
'struct aggr_cpu_id', the patch needed to be updated as well.

before:
d = cpu_map__get_die()

after:
d = cpu_map__get_die().die

v3 is compiled ok with tmp.perf/core.

v2:
Use hashmap to check the used socket+die pair.

tools/perf/util/evsel.c | 4 +++-
tools/perf/util/evsel.h | 3 ++-
tools/perf/util/python-ext-sources | 1 +
tools/perf/util/stat.c | 38 +++++++++++++++++++++++++-----
4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c26ea82220bd..9715ed9b03f6 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -45,6 +45,7 @@
#include "string2.h"
#include "memswap.h"
#include "util.h"
+#include "hashmap.h"
#include "../perf-sys.h"
#include "util/parse-branch-options.h"
#include <internal/xyarray.h>
@@ -1377,7 +1378,8 @@ void evsel__exit(struct evsel *evsel)
zfree(&evsel->group_name);
zfree(&evsel->name);
zfree(&evsel->pmu_name);
- zfree(&evsel->per_pkg_mask);
+ hashmap__free(evsel->per_pkg_mask);
+ evsel->per_pkg_mask = NULL;
zfree(&evsel->metric_events);
perf_evsel__object.fini(evsel);
}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cd1d8dd43199..951628943fd0 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -17,6 +17,7 @@ struct cgroup;
struct perf_counts;
struct perf_stat_evsel;
union perf_event;
+struct hashmap;

typedef int (evsel__sb_cb_t)(union perf_event *event, void *data);

@@ -110,7 +111,7 @@ struct evsel {
bool merged_stat;
bool reset_group;
bool errored;
- unsigned long *per_pkg_mask;
+ struct hashmap *per_pkg_mask;
struct evsel *leader;
struct list_head config_terms;
int err;
diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index a9d9c142eb7c..266874913dbb 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -35,3 +35,4 @@ util/symbol_fprintf.c
util/units.c
util/affinity.c
util/rwsem.c
+util/hashmap.c
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ce1479c98f0..5aba8fa92386 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -13,6 +13,7 @@
#include "evlist.h"
#include "evsel.h"
#include "thread_map.h"
+#include "hashmap.h"
#include <linux/zalloc.h>

void update_stats(struct stats *stats, u64 val)
@@ -276,15 +277,27 @@ void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
static void zero_per_pkg(struct evsel *counter)
{
if (counter->per_pkg_mask)
- memset(counter->per_pkg_mask, 0, cpu__max_cpu());
+ hashmap__clear(counter->per_pkg_mask);
+}
+
+static size_t pkg_id_hash(const void *key, void *ctx __maybe_unused)
+{
+ return (size_t)key & 0xffff;
+}
+
+static bool pkg_id_equal(const void *key1, const void *key2,
+ void *ctx __maybe_unused)
+{
+ return (size_t)key1 == (size_t)key2;
}

static int check_per_pkg(struct evsel *counter,
struct perf_counts_values *vals, int cpu, bool *skip)
{
- unsigned long *mask = counter->per_pkg_mask;
+ struct hashmap *mask = counter->per_pkg_mask;
struct perf_cpu_map *cpus = evsel__cpus(counter);
- int s;
+ int s, d, ret = 0;
+ size_t key;

*skip = false;

@@ -295,7 +308,7 @@ static int check_per_pkg(struct evsel *counter,
return 0;

if (!mask) {
- mask = zalloc(cpu__max_cpu());
+ mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
if (!mask)
return -ENOMEM;

@@ -317,8 +330,21 @@ static int check_per_pkg(struct evsel *counter,
if (s < 0)
return -1;

- *skip = test_and_set_bit(s, mask) == 1;
- return 0;
+ /*
+ * On multi-die system, die_id > 0. On no-die system, die_id = 0.
+ * We use hashmap(socket, die) to check the used socket+die pair.
+ */
+ d = cpu_map__get_die(cpus, cpu, NULL).die;
+ if (d < 0)
+ return -1;
+
+ key = (size_t)d << 16 | s;
+ if (hashmap__find(mask, (void *)key, NULL))
+ *skip = true;
+ else
+ ret = hashmap__add(mask, (void *)key, (void *)1);
+
+ return ret;
}

static int
--
2.17.1


2021-01-21 01:48:15

by Jiri Olsa

Subject: Re: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

On Mon, Jan 18, 2021 at 12:05:21PM +0800, Jin Yao wrote:

SNIP

> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 8ce1479c98f0..5aba8fa92386 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -13,6 +13,7 @@
> #include "evlist.h"
> #include "evsel.h"
> #include "thread_map.h"
> +#include "hashmap.h"
> #include <linux/zalloc.h>
>
> void update_stats(struct stats *stats, u64 val)
> @@ -276,15 +277,27 @@ void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> static void zero_per_pkg(struct evsel *counter)
> {
> if (counter->per_pkg_mask)
> - memset(counter->per_pkg_mask, 0, cpu__max_cpu());
> + hashmap__clear(counter->per_pkg_mask);
> +}
> +
> +static size_t pkg_id_hash(const void *key, void *ctx __maybe_unused)
> +{
> + return (size_t)key & 0xffff;
> +}
> +
> +static bool pkg_id_equal(const void *key1, const void *key2,
> + void *ctx __maybe_unused)
> +{
> + return (size_t)key1 == (size_t)key2;
> }
>
> static int check_per_pkg(struct evsel *counter,
> struct perf_counts_values *vals, int cpu, bool *skip)
> {
> - unsigned long *mask = counter->per_pkg_mask;
> + struct hashmap *mask = counter->per_pkg_mask;
> struct perf_cpu_map *cpus = evsel__cpus(counter);
> - int s;
> + int s, d, ret = 0;
> + size_t key;

please use uint32_t to make it obvious

>
> *skip = false;
>
> @@ -295,7 +308,7 @@ static int check_per_pkg(struct evsel *counter,
> return 0;
>
> if (!mask) {
> - mask = zalloc(cpu__max_cpu());
> + mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
> if (!mask)
> return -ENOMEM;
>
> @@ -317,8 +330,21 @@ static int check_per_pkg(struct evsel *counter,
> if (s < 0)
> return -1;
>
> - *skip = test_and_set_bit(s, mask) == 1;
> - return 0;
> + /*
> + * On multi-die system, die_id > 0. On no-die system, die_id = 0.
> + * We use hashmap(socket, die) to check the used socket+die pair.
> + */
> + d = cpu_map__get_die(cpus, cpu, NULL).die;
> + if (d < 0)
> + return -1;
> +
> + key = (size_t)d << 16 | s;

I'm not sure about the socket number bounds, but I guess we should at
least check that it's not over the limit

how hard would it be to allocate key values and keep the uint64_t?

thanks,
jirka

2021-01-21 04:26:21

by Jin Yao

Subject: Re: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

Hi Jiri,

On 1/21/2021 6:07 AM, Jiri Olsa wrote:
> On Mon, Jan 18, 2021 at 12:05:21PM +0800, Jin Yao wrote:
>
> SNIP
>
>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>> index 8ce1479c98f0..5aba8fa92386 100644
>> --- a/tools/perf/util/stat.c
>> +++ b/tools/perf/util/stat.c
>> @@ -13,6 +13,7 @@
>> #include "evlist.h"
>> #include "evsel.h"
>> #include "thread_map.h"
>> +#include "hashmap.h"
>> #include <linux/zalloc.h>
>>
>> void update_stats(struct stats *stats, u64 val)
>> @@ -276,15 +277,27 @@ void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
>> static void zero_per_pkg(struct evsel *counter)
>> {
>> if (counter->per_pkg_mask)
>> - memset(counter->per_pkg_mask, 0, cpu__max_cpu());
>> + hashmap__clear(counter->per_pkg_mask);
>> +}
>> +
>> +static size_t pkg_id_hash(const void *key, void *ctx __maybe_unused)
>> +{
>> + return (size_t)key & 0xffff;
>> +}
>> +
>> +static bool pkg_id_equal(const void *key1, const void *key2,
>> + void *ctx __maybe_unused)
>> +{
>> + return (size_t)key1 == (size_t)key2;
>> }
>>
>> static int check_per_pkg(struct evsel *counter,
>> struct perf_counts_values *vals, int cpu, bool *skip)
>> {
>> - unsigned long *mask = counter->per_pkg_mask;
>> + struct hashmap *mask = counter->per_pkg_mask;
>> struct perf_cpu_map *cpus = evsel__cpus(counter);
>> - int s;
>> + int s, d, ret = 0;
>> + size_t key;
>
> please use uint32_t to make it obvious
>

Maybe we have to use size_t to support both 32-bit and 64-bit platforms.

typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);

The return type of hashmap_hash_fn is defined as size_t.

I will explain more at below.

>>
>> *skip = false;
>>
>> @@ -295,7 +308,7 @@ static int check_per_pkg(struct evsel *counter,
>> return 0;
>>
>> if (!mask) {
>> - mask = zalloc(cpu__max_cpu());
>> + mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
>> if (!mask)
>> return -ENOMEM;
>>
>> @@ -317,8 +330,21 @@ static int check_per_pkg(struct evsel *counter,
>> if (s < 0)
>> return -1;
>>
>> - *skip = test_and_set_bit(s, mask) == 1;
>> - return 0;
>> + /*
>> + * On multi-die system, die_id > 0. On no-die system, die_id = 0.
>> + * We use hashmap(socket, die) to check the used socket+die pair.
>> + */
>> + d = cpu_map__get_die(cpus, cpu, NULL).die;
>> + if (d < 0)
>> + return -1;
>> +
>> + key = (size_t)d << 16 | s;
>
> I'm not sure about the socket number bounds, but I guess we should at
> least check that it's not over the limit
>
> how hard would it be to allocate key values and keep the uint64_t?
>

I'm considering using conditional compilation to support both 32-bit and 64-bit platforms.

#if BITS_PER_LONG == 64
#define KEY_SHIFT 32
#else
#define KEY_SHIFT 16
#endif

static size_t pkg_id_hash(const void *key, void *ctx __maybe_unused)
{
return (size_t)key & (~0ul >> KEY_SHIFT);
}

static bool pkg_id_equal(const void *key1, const void *key2,
void *ctx __maybe_unused)
{
return (size_t)key1 == (size_t)key2;
}

mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
d = cpu_map__get_die(cpus, cpu, NULL).die;
key = (size_t)d << KEY_SHIFT | s; /* s is socket id */
if (hashmap__find(mask, (void *)key, NULL))
*skip = true;
else
ret = hashmap__add(mask, (void *)key, (void *)1);

If we use 'unsigned long' instead of 'size_t', it reports a build error on 32-bit:

stat.c:320:23: warning: passing argument 1 of ‘hashmap__new’ from incompatible pointer type
[-Wincompatible-pointer-types]
mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
^~~~~~~~~~~
In file included from stat.c:16:
hashmap.h:75:17: note: expected ‘hashmap_hash_fn’ {aka ‘unsigned int (*)(const void *, void *)’} but
argument is of type ‘long unsigned int (*)(const void *, void *)’

If we use 'unsigned int', it's not good for 64-bit. So I still use 'size_t' in this patch.

Any comments for this idea (using conditional compilation)?

Thanks
Jin Yao

> thanks,
> jirka
>

2021-01-23 23:00:34

by Jiri Olsa

Subject: Re: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

On Thu, Jan 21, 2021 at 12:21:36PM +0800, Jin, Yao wrote:

SNIP

> mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
> d = cpu_map__get_die(cpus, cpu, NULL).die;
> key = (size_t)d << KEY_SHIFT | s; /* s is socket id */
> if (hashmap__find(mask, (void *)key, NULL))
> *skip = true;
> else
> ret = hashmap__add(mask, (void *)key, (void *)1);
>
> If we use 'unsigned long' to replace 'size_t', it reports the build error for 32 bits:
>
> stat.c:320:23: warning: passing argument 1 of ‘hashmap__new’ from
> incompatible pointer type [-Wincompatible-pointer-types]
> mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
> ^~~~~~~~~~~
> In file included from stat.c:16:
> hashmap.h:75:17: note: expected ‘hashmap_hash_fn’ {aka ‘unsigned int
> (*)(const void *, void *)’} but argument is of type ‘long unsigned int
> (*)(const void *, void *)’
>
> If we use "unsigned int", it's not good for 64 bits. So I still use 'size_t' in this patch.
>
> Any comments for this idea (using conditional compilation)?

isn't it simpler to allocate the key then? like below
(just compile tested)

jirka


---
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 5aba8fa92386..195fda142c98 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -276,19 +276,31 @@ void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)

static void zero_per_pkg(struct evsel *counter)
{
- if (counter->per_pkg_mask)
+ struct hashmap_entry *cur;
+ size_t bkt;
+
+ if (counter->per_pkg_mask) {
+ hashmap__for_each_entry(counter->per_pkg_mask, cur, bkt)
+ free((char *)cur->key);
+
hashmap__clear(counter->per_pkg_mask);
+ }
}

-static size_t pkg_id_hash(const void *key, void *ctx __maybe_unused)
+static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
{
- return (size_t)key & 0xffff;
+ uint64_t *key = (uint64_t*) __key;
+
+ return *key & 0xffffffff;
}

-static bool pkg_id_equal(const void *key1, const void *key2,
+static bool pkg_id_equal(const void *__key1, const void *__key2,
void *ctx __maybe_unused)
{
- return (size_t)key1 == (size_t)key2;
+ uint64_t *key1 = (uint64_t*) __key1;
+ uint64_t *key2 = (uint64_t*) __key2;
+
+ return *key1 == *key2;
}

static int check_per_pkg(struct evsel *counter,
@@ -297,7 +309,7 @@ static int check_per_pkg(struct evsel *counter,
struct hashmap *mask = counter->per_pkg_mask;
struct perf_cpu_map *cpus = evsel__cpus(counter);
int s, d, ret = 0;
- size_t key;
+ uint64_t *key;

*skip = false;

@@ -338,7 +350,11 @@ static int check_per_pkg(struct evsel *counter,
if (d < 0)
return -1;

- key = (size_t)d << 16 | s;
+ key = malloc(sizeof(*key));
+ if (!key)
+ return -ENOMEM;
+
+ *key = (size_t)d << 32 | s;
if (hashmap__find(mask, (void *)key, NULL))
*skip = true;
else

2021-01-25 05:56:15

by Jin Yao

Subject: Re: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

Hi Jiri,

On 1/24/2021 6:57 AM, Jiri Olsa wrote:
> On Thu, Jan 21, 2021 at 12:21:36PM +0800, Jin, Yao wrote:
>
> SNIP
>
>> mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
>> d = cpu_map__get_die(cpus, cpu, NULL).die;
>> key = (size_t)d << KEY_SHIFT | s; /* s is socket id */
>> if (hashmap__find(mask, (void *)key, NULL))
>> *skip = true;
>> else
>> ret = hashmap__add(mask, (void *)key, (void *)1);
>>
>> If we use 'unsigned long' to replace 'size_t', it reports the build error for 32 bits:
>>
>> stat.c:320:23: warning: passing argument 1 of ‘hashmap__new’ from
>> incompatible pointer type [-Wincompatible-pointer-types]
>> mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
>> ^~~~~~~~~~~
>> In file included from stat.c:16:
>> hashmap.h:75:17: note: expected ‘hashmap_hash_fn’ {aka ‘unsigned int
>> (*)(const void *, void *)’} but argument is of type ‘long unsigned int
>> (*)(const void *, void *)’
>>
>> If we use "unsigned int", it's not good for 64 bits. So I still use 'size_t' in this patch.
>>
>> Any comments for this idea (using conditional compilation)?
>
> isn't it simpler to allocate the key then? like below
> (just compile tested)
>
> jirka
>

Hmm, each method has advantages and disadvantages.

My method uses conditional compilation, which looks a bit complicated, but the
advantage is that it doesn't need to allocate memory for the key.

If you need me to post v8, I'd love to.

Anyway, either method is fine for me. :)

>
> ---
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 5aba8fa92386..195fda142c98 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -276,19 +276,31 @@ void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
>
> static void zero_per_pkg(struct evsel *counter)
> {
> - if (counter->per_pkg_mask)
> + struct hashmap_entry *cur;
> + size_t bkt;
> +
> + if (counter->per_pkg_mask) {
> + hashmap__for_each_entry(counter->per_pkg_mask, cur, bkt)
> + free((char *)cur->key);
> +
> hashmap__clear(counter->per_pkg_mask);
> + }
> }
>
> -static size_t pkg_id_hash(const void *key, void *ctx __maybe_unused)
> +static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
> {
> - return (size_t)key & 0xffff;
> + uint64_t *key = (uint64_t*) __key;
> +
> + return *key & 0xffffffff;
> }
>
> -static bool pkg_id_equal(const void *key1, const void *key2,
> +static bool pkg_id_equal(const void *__key1, const void *__key2,
> void *ctx __maybe_unused)
> {
> - return (size_t)key1 == (size_t)key2;
> + uint64_t *key1 = (uint64_t*) __key1;
> + uint64_t *key2 = (uint64_t*) __key2;
> +
> + return *key1 == *key2;
> }
>
> static int check_per_pkg(struct evsel *counter,
> @@ -297,7 +309,7 @@ static int check_per_pkg(struct evsel *counter,
> struct hashmap *mask = counter->per_pkg_mask;
> struct perf_cpu_map *cpus = evsel__cpus(counter);
> int s, d, ret = 0;
> - size_t key;
> + uint64_t *key;
>
> *skip = false;
>
> @@ -338,7 +350,11 @@ static int check_per_pkg(struct evsel *counter,
> if (d < 0)
> return -1;
>
> - key = (size_t)d << 16 | s;
> + key = malloc(sizeof(*key));
> + if (!key)
> + return -ENOMEM;
> +
> + *key = (size_t)d << 32 | s;

Should be "*key = (uint64_t)d << 32 | s;"?

Otherwise, on a 32-bit system, a build warning is reported:

stat.c: In function ‘check_per_pkg’:
stat.c:357:19: warning: left shift count >= width of type [-Wshift-count-overflow]
*key = (size_t)d << 32 | s;

Thanks
Jin Yao

> if (hashmap__find(mask, (void *)key, NULL))
> *skip = true;
> else
>

2021-01-26 05:19:33

by Jiri Olsa

Subject: Re: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

On Mon, Jan 25, 2021 at 01:47:54PM +0800, Jin, Yao wrote:
> Hi Jiri,
>
> On 1/24/2021 6:57 AM, Jiri Olsa wrote:
> > On Thu, Jan 21, 2021 at 12:21:36PM +0800, Jin, Yao wrote:
> >
> > SNIP
> >
> > > mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
> > > d = cpu_map__get_die(cpus, cpu, NULL).die;
> > > key = (size_t)d << KEY_SHIFT | s; /* s is socket id */
> > > if (hashmap__find(mask, (void *)key, NULL))
> > > *skip = true;
> > > else
> > > ret = hashmap__add(mask, (void *)key, (void *)1);
> > >
> > > If we use 'unsigned long' to replace 'size_t', it reports the build error for 32 bits:
> > >
> > > stat.c:320:23: warning: passing argument 1 of ‘hashmap__new’ from
> > > incompatible pointer type [-Wincompatible-pointer-types]
> > > mask = hashmap__new(pkg_id_hash, pkg_id_equal, NULL);
> > > ^~~~~~~~~~~
> > > In file included from stat.c:16:
> > > hashmap.h:75:17: note: expected ‘hashmap_hash_fn’ {aka ‘unsigned int
> > > (*)(const void *, void *)’} but argument is of type ‘long unsigned int
> > > (*)(const void *, void *)’
> > >
> > > If we use "unsigned int", it's not good for 64 bits. So I still use 'size_t' in this patch.
> > >
> > > Any comments for this idea (using conditional compilation)?
> >
> > isn't it simpler to allocate the key then? like below
> > (just compile tested)
> >
> > jirka
> >
>
> Hmm, Each method has advantages and disadvantages.
>
> My method uses conditional compilation but it looks a bit complicated. The
> advantage is it doesn't need to allocate the memory for key.
>
> If you need me to post v8, I'd love to.
>
> Anyway, either method is fine for me. :)

I believe that the fewer ifdefs the better. If you could squash this
change into your patch and send it, that'd be great.

SNIP

> > + return *key & 0xffffffff;
> > }
> > -static bool pkg_id_equal(const void *key1, const void *key2,
> > +static bool pkg_id_equal(const void *__key1, const void *__key2,
> > void *ctx __maybe_unused)
> > {
> > - return (size_t)key1 == (size_t)key2;
> > + uint64_t *key1 = (uint64_t*) __key1;
> > + uint64_t *key2 = (uint64_t*) __key2;
> > +
> > + return *key1 == *key2;
> > }
> > static int check_per_pkg(struct evsel *counter,
> > @@ -297,7 +309,7 @@ static int check_per_pkg(struct evsel *counter,
> > struct hashmap *mask = counter->per_pkg_mask;
> > struct perf_cpu_map *cpus = evsel__cpus(counter);
> > int s, d, ret = 0;
> > - size_t key;
> > + uint64_t *key;
> > *skip = false;
> > @@ -338,7 +350,11 @@ static int check_per_pkg(struct evsel *counter,
> > if (d < 0)
> > return -1;
> > - key = (size_t)d << 16 | s;
> > + key = malloc(sizeof(*key));
> > + if (!key)
> > + return -ENOMEM;
> > +
> > + *key = (size_t)d << 32 | s;
>
> Should be "*key = (uint64_t)d << 32 | s;"?

yes, I missed this bit

thanks,
jirka

2021-01-26 10:27:44

by Jin Yao

Subject: Re: [PATCH v7] perf stat: Fix wrong skipping for per-die aggregation

Hi Jiri,

On 1/25/2021 5:45 PM, Jiri Olsa wrote:
> I believe that the fewer ifdefs the better. If you could squash this
> change into your patch and send it, that'd be great.
>
> SNIP

I will add your change to the patch and send v8. Thanks so much for your help!

Thanks
Jin Yao