The current glibc implementation of the cpu_set_t type has an internal
cpu mask size limit of 1024 CPUs. This limitation confines the NUMA
awareness of the perf tool in record mode, via the --affinity option,
to the first 1024 CPUs on machines with more CPUs.
This patch set enables the perf tool to overcome the 1024 CPU limitation
by using a dedicated struct mmap_cpu_mask type and applying the tool's
bitmap API operations to manipulate the affinity masks of the tool's
thread and of the mmaped data buffers.
The tools bitmap API has been extended with a bitmap_free() function and
a bitmap_equal() operation whose implementation is derived from the
kernel one.
---
Alexey Budankov (3):
tools bitmap: implement bitmap_equal() operation in bitmap API
perf mmap: declare type for cpu mask of arbitrary length
perf record: adapt affinity to machines with #CPUs > 1K
tools/include/linux/bitmap.h | 30 ++++++++++++++++++++++++++++++
tools/lib/bitmap.c | 15 +++++++++++++++
tools/perf/builtin-record.c | 30 ++++++++++++++++++++++++------
tools/perf/util/mmap.c | 31 +++++++++++++++++++++++++------
tools/perf/util/mmap.h | 11 ++++++++++-
5 files changed, 104 insertions(+), 13 deletions(-)
---
Changes in v2:
- implemented bitmap_free() for symmetry with bitmap_alloc()
- capitalized MMAP_CPU_MASK_BYTES() macro
- returned -1 from perf_mmap__setup_affinity_mask()
- implemented releasing of masks using bitmap_free()
- moved debug printing under -vv option
---
Testing:
tools/perf/perf record -vv --affinity=cpu -- ls
thread mask[8]: empty
Using CPUID GenuineIntel-6-5E-3
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
nr_cblocks: 0
affinity: CPU
mmap flush: 1
comp level: 0
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 28649 cpu 0 group_fd -1 flags 0x8 = 4
sys_perf_event_open: pid 28649 cpu 1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 28649 cpu 2 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 28649 cpu 3 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid 28649 cpu 4 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid 28649 cpu 5 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid 28649 cpu 6 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid 28649 cpu 7 group_fd -1 flags 0x8 = 13
mmap size 528384B
0x7f1898200010: mmap mask[8]: 0
0x7f18982100d8: mmap mask[8]: 1
0x7f18982201a0: mmap mask[8]: 2
0x7f1898230268: mmap mask[8]: 3
0x7f1898240330: mmap mask[8]: 4
0x7f18982503f8: mmap mask[8]: 5
0x7f18982604c0: mmap mask[8]: 6
0x7f1898270588: mmap mask[8]: 7
------------------------------------------------------------
perf_event_attr:
type 1
size 112
config 0x9
watermark 1
sample_id_all 1
bpf_event 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 16
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 18
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 19
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 20
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 21
...
Synthesizing TSC conversion information
thread mask[8]: 0
thread mask[8]: 1
thread mask[8]: 2
thread mask[8]: 3
thread mask[8]: 4
arch copy Documentation init kernel MAINTAINERS modules.builtin.modinfo perf.data scripts System.map vmlinux
block COPYING drivers ipc lbuild Makefile modules.order perf.data.old security tools vmlinux.o
certs CREDITS fs Kbuild lib mm Module.symvers README sound usr
config-5.2.7-100.fc29.x86_64 crypto include Kconfig LICENSES modules.builtin net samples stdio virt
thread mask[8]: 5
thread mask[8]: 6
thread mask[8]: 7
thread mask[8]: 0
thread mask[8]: 1
thread mask[8]: 2
thread mask[8]: 3
thread mask[8]: 4
thread mask[8]: 5
thread mask[8]: 6
thread mask[8]: 7
[ perf record: Woken up 0 times to write data ]
thread mask[8]: 0
thread mask[8]: 1
thread mask[8]: 2
thread mask[8]: 3
thread mask[8]: 4
thread mask[8]: 5
thread mask[8]: 6
thread mask[8]: 7
Looking at the vmlinux_path (8 entries long)
Using vmlinux for symbols
[ perf record: Captured and wrote 0.014 MB perf.data (8 samples) ]
--
2.20.1
Extend the tools bitmap API with a bitmap_equal() implementation
derived from the kernel one, and with a bitmap_free() implementation
for symmetry with the bitmap_alloc() function.
Signed-off-by: Alexey Budankov <[email protected]>
---
Changes in v2:
- implemented bitmap_free() for symmetry with bitmap_alloc()
---
tools/include/linux/bitmap.h | 30 ++++++++++++++++++++++++++++++
tools/lib/bitmap.c | 15 +++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 05dca5c203f3..477a1cae513f 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -15,6 +15,8 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits);
int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int bits);
+int __bitmap_equal(const unsigned long *bitmap1,
+ const unsigned long *bitmap2, unsigned int bits);
void bitmap_clear(unsigned long *map, unsigned int start, int len);
#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
@@ -123,6 +125,15 @@ static inline unsigned long *bitmap_alloc(int nbits)
return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}
+/*
+ * bitmap_free - Free bitmap
+ * @bitmap: pointer to bitmap
+ */
+static inline void bitmap_free(unsigned long *bitmap)
+{
+ free(bitmap);
+}
+
/*
* bitmap_scnprintf - print bitmap list into buffer
* @bitmap: bitmap
@@ -148,4 +159,23 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
return __bitmap_and(dst, src1, src2, nbits);
}
+#ifdef __LITTLE_ENDIAN
+#define BITMAP_MEM_ALIGNMENT 8
+#else
+#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
+#endif
+#define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
+#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
+
+static inline int bitmap_equal(const unsigned long *src1,
+ const unsigned long *src2, unsigned int nbits)
+{
+ if (small_const_nbits(nbits))
+ return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+ if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
+ IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
+ return !memcmp(src1, src2, nbits / 8);
+ return __bitmap_equal(src1, src2, nbits);
+}
+
#endif /* _PERF_BITOPS_H */
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index 38494782be06..5043747ef6c5 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -71,3 +71,18 @@ int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
BITMAP_LAST_WORD_MASK(bits));
return result != 0;
}
+
+int __bitmap_equal(const unsigned long *bitmap1,
+ const unsigned long *bitmap2, unsigned int bits)
+{
+ unsigned int k, lim = bits/BITS_PER_LONG;
+ for (k = 0; k < lim; ++k)
+ if (bitmap1[k] != bitmap2[k])
+ return 0;
+
+ if (bits % BITS_PER_LONG)
+ if ((bitmap1[k] ^ bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+ return 0;
+
+ return 1;
+}
--
2.20.1
Declare a dedicated struct mmap_cpu_mask type for cpu masks of
arbitrary length. The mask is available via the bits pointer and
its length in bits is kept in the nbits field. The MMAP_CPU_MASK_BYTES()
macro returns the mask storage size in bytes.
Signed-off-by: Alexey Budankov <[email protected]>
---
Changes in v2:
- capitalized MMAP_CPU_MASK_BYTES() macro
---
tools/perf/util/mmap.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index bee4e83f7109..786c235d6062 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -15,6 +15,15 @@
#include "event.h"
struct aiocb;
+
+struct mmap_cpu_mask {
+ unsigned long *bits;
+ size_t nbits;
+};
+
+#define MMAP_CPU_MASK_BYTES(m) \
+ (BITS_TO_LONGS(((struct mmap_cpu_mask *)m)->nbits) * sizeof(unsigned long))
+
/**
* struct mmap - perf's ring buffer mmap details
*
--
2.20.1
Use the struct mmap_cpu_mask type for the tool's thread and mmap data
buffer affinity masks to overcome the current 1024 CPU mask size
limitation of the cpu_set_t type.
Currently the glibc cpu_set_t type has an internal mask size limit
of 1024 CPUs. Moving to the struct mmap_cpu_mask type allows
overcoming that limit. The tools bitmap API is used to manipulate
objects of struct mmap_cpu_mask type.
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Alexey Budankov <[email protected]>
---
Changes in v2:
- returned -1 from perf_mmap__setup_affinity_mask()
- moved debug printing under -vv option
- implemented releasing of masks using bitmap_free()
---
tools/perf/builtin-record.c | 30 ++++++++++++++++++++++++------
tools/perf/util/mmap.c | 31 +++++++++++++++++++++++++------
tools/perf/util/mmap.h | 2 +-
3 files changed, 50 insertions(+), 13 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b95c000c1ed9..f38d9de8183f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -62,6 +62,7 @@
#include <linux/string.h>
#include <linux/time64.h>
#include <linux/zalloc.h>
+#include <linux/bitmap.h>
struct switch_output {
bool enabled;
@@ -93,7 +94,7 @@ struct record {
bool timestamp_boundary;
struct switch_output switch_output;
unsigned long long samples;
- cpu_set_t affinity_mask;
+ struct mmap_cpu_mask affinity_mask;
unsigned long output_max_size; /* = 0: unlimited */
};
@@ -951,13 +952,21 @@ static struct perf_event_header finished_round_event = {
.type = PERF_RECORD_FINISHED_ROUND,
};
+#define MASK_SIZE 1023
static void record__adjust_affinity(struct record *rec, struct mmap *map)
{
+ char mask[MASK_SIZE + 1] = {0};
+
if (rec->opts.affinity != PERF_AFFINITY_SYS &&
- !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
- CPU_ZERO(&rec->affinity_mask);
- CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
- sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
+ !bitmap_equal(rec->affinity_mask.bits, map->affinity_mask.bits,
+ rec->affinity_mask.nbits)) {
+ bitmap_zero(rec->affinity_mask.bits, rec->affinity_mask.nbits);
+ bitmap_or(rec->affinity_mask.bits, rec->affinity_mask.bits,
+ map->affinity_mask.bits, rec->affinity_mask.nbits);
+ sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&rec->affinity_mask),
+ (cpu_set_t *)rec->affinity_mask.bits);
+ bitmap_scnprintf(rec->affinity_mask.bits, rec->affinity_mask.nbits, mask, MASK_SIZE);
+ pr_debug2("thread mask[%ld]: %s\n", rec->affinity_mask.nbits, mask);
}
}
@@ -2389,7 +2398,6 @@ int cmd_record(int argc, const char **argv)
# undef REASON
#endif
- CPU_ZERO(&rec->affinity_mask);
rec->opts.affinity = PERF_AFFINITY_SYS;
rec->evlist = evlist__new();
@@ -2455,6 +2463,14 @@ int cmd_record(int argc, const char **argv)
symbol__init(NULL);
+ rec->affinity_mask.nbits = cpu__max_cpu();
+ rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
+ if (!rec->affinity_mask.bits) {
+ pr_err("Failed to allocate thread mask for %ld cpus\n", rec->affinity_mask.nbits);
+ return -ENOMEM;
+ }
+ pr_debug2("thread mask[%ld]: empty\n", rec->affinity_mask.nbits);
+
err = record__auxtrace_init(rec);
if (err)
goto out;
@@ -2569,6 +2585,8 @@ int cmd_record(int argc, const char **argv)
err = __cmd_record(&record, argc, argv);
out:
+ if (rec->affinity_mask.bits)
+ bitmap_free(rec->affinity_mask.bits);
evlist__delete(rec->evlist);
symbol__exit();
auxtrace_record__free(rec->itr);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 063d1b93c53d..3b097ae7f5fd 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -23,6 +23,7 @@
#include "mmap.h"
#include "../perf.h"
#include <internal/lib.h> /* page_size */
+#include <linux/bitmap.h>
size_t mmap__mmap_len(struct mmap *map)
{
@@ -207,6 +208,9 @@ static void perf_mmap__aio_munmap(struct mmap *map __maybe_unused)
void mmap__munmap(struct mmap *map)
{
+ if (map->affinity_mask.bits)
+ bitmap_free(map->affinity_mask.bits);
+
perf_mmap__aio_munmap(map);
if (map->data != NULL) {
munmap(map->data, mmap__mmap_len(map));
@@ -215,7 +219,7 @@ void mmap__munmap(struct mmap *map)
auxtrace_mmap__munmap(&map->auxtrace_mmap);
}
-static void build_node_mask(int node, cpu_set_t *mask)
+static void build_node_mask(int node, struct mmap_cpu_mask *mask)
{
int c, cpu, nr_cpus;
const struct perf_cpu_map *cpu_map = NULL;
@@ -228,28 +232,43 @@ static void build_node_mask(int node, cpu_set_t *mask)
for (c = 0; c < nr_cpus; c++) {
cpu = cpu_map->map[c]; /* map c index to online cpu index */
if (cpu__get_node(cpu) == node)
- CPU_SET(cpu, mask);
+ set_bit(cpu, mask->bits);
}
}
-static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
+static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
{
- CPU_ZERO(&map->affinity_mask);
+ map->affinity_mask.nbits = cpu__max_cpu();
+ map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
+ if (!map->affinity_mask.bits)
+ return -1;
+
if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
else if (mp->affinity == PERF_AFFINITY_CPU)
- CPU_SET(map->core.cpu, &map->affinity_mask);
+ set_bit(map->core.cpu, map->affinity_mask.bits);
+
+ return 0;
}
+#define MASK_SIZE 1023
int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
{
+ char mask[MASK_SIZE + 1] = {0};
+
if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) {
pr_debug2("failed to mmap perf event ring buffer, error %d\n",
errno);
return -1;
}
- perf_mmap__setup_affinity_mask(map, mp);
+ if (perf_mmap__setup_affinity_mask(map, mp)) {
+ pr_debug2("failed to alloc mmap affinity mask, error %d\n",
+ errno);
+ return -1;
+ }
+ bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
+ pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
map->core.flush = mp->flush;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 786c235d6062..65f9b04c45e4 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -40,7 +40,7 @@ struct mmap {
int nr_cblocks;
} aio;
#endif
- cpu_set_t affinity_mask;
+ struct mmap_cpu_mask affinity_mask;
void *data;
int comp_level;
};
--
2.20.1
On Mon, Nov 25, 2019 at 09:08:57AM +0300, Alexey Budankov wrote:
SNIP
> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
> {
> - CPU_ZERO(&map->affinity_mask);
> + map->affinity_mask.nbits = cpu__max_cpu();
> + map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
> + if (!map->affinity_mask.bits)
> + return -1;
> +
> if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
> build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
> else if (mp->affinity == PERF_AFFINITY_CPU)
> - CPU_SET(map->core.cpu, &map->affinity_mask);
> + set_bit(map->core.cpu, map->affinity_mask.bits);
> +
> + return 0;
> }
>
> +#define MASK_SIZE 1023
> int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
> {
> + char mask[MASK_SIZE + 1] = {0};
does this need to be initialized?
> +
> if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) {
> pr_debug2("failed to mmap perf event ring buffer, error %d\n",
> errno);
> return -1;
> }
>
> - perf_mmap__setup_affinity_mask(map, mp);
> + if (perf_mmap__setup_affinity_mask(map, mp)) {
> + pr_debug2("failed to alloc mmap affinity mask, error %d\n",
> + errno);
> + return -1;
> + }
> + bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
> + pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
the bitmap_scnprintf could be called only for debug case, right?
if (version >= 2) {
bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
}
ditto in the record__adjust_affinity function
jirka
On Mon, Nov 25, 2019 at 02:13:20PM +0300, Alexey Budankov wrote:
> On 25.11.2019 12:42, Jiri Olsa wrote:
> > On Mon, Nov 25, 2019 at 09:08:57AM +0300, Alexey Budankov wrote:
> >
> > SNIP
> >
> >> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
> >> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
> >> {
> >> - CPU_ZERO(&map->affinity_mask);
> >> + map->affinity_mask.nbits = cpu__max_cpu();
> >> + map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
> >> + if (!map->affinity_mask.bits)
> >> + return -1;
> >> +
> >> if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
> >> build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
> >> else if (mp->affinity == PERF_AFFINITY_CPU)
> >> - CPU_SET(map->core.cpu, &map->affinity_mask);
> >> + set_bit(map->core.cpu, map->affinity_mask.bits);
> >> +
> >> + return 0;
> >> }
> >>
> >> +#define MASK_SIZE 1023
> >> int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
> >> {
> >> + char mask[MASK_SIZE + 1] = {0};
> >
> > does this need to be initialized?
>
> This is to make sure the message is zero terminated for vfprintf call()
hum AFAICS it's used only in bitmap_scnprintf, which should
terminate the string properly
jirka
>
> >
> >> +
> >> if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) {
> >> pr_debug2("failed to mmap perf event ring buffer, error %d\n",
> >> errno);
> >> return -1;
> >> }
> >>
> >> - perf_mmap__setup_affinity_mask(map, mp);
> >> + if (perf_mmap__setup_affinity_mask(map, mp)) {
> >> + pr_debug2("failed to alloc mmap affinity mask, error %d\n",
> >> + errno);
> >> + return -1;
> >> + }
> >> + bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
> >> + pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
> >
> > the bitmap_scnprintf could be called only for debug case, right?
>
> Right. It is required to prepare debug message.
>
> >
> > if (version >= 2) {
> > bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
> > pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
> > }
> >
> > ditto int the record__adjust_affinity function
> >
> > jirka
> >
> >
>
> ~Alexey
>
On 25.11.2019 12:42, Jiri Olsa wrote:
> On Mon, Nov 25, 2019 at 09:08:57AM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>> {
>> - CPU_ZERO(&map->affinity_mask);
>> + map->affinity_mask.nbits = cpu__max_cpu();
>> + map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
>> + if (!map->affinity_mask.bits)
>> + return -1;
>> +
>> if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
>> build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
>> else if (mp->affinity == PERF_AFFINITY_CPU)
>> - CPU_SET(map->core.cpu, &map->affinity_mask);
>> + set_bit(map->core.cpu, map->affinity_mask.bits);
>> +
>> + return 0;
>> }
>>
>> +#define MASK_SIZE 1023
>> int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
>> {
>> + char mask[MASK_SIZE + 1] = {0};
>
> does this need to be initialized?
This is to make sure the message is zero terminated for the vfprintf() call
>
>> +
>> if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) {
>> pr_debug2("failed to mmap perf event ring buffer, error %d\n",
>> errno);
>> return -1;
>> }
>>
>> - perf_mmap__setup_affinity_mask(map, mp);
>> + if (perf_mmap__setup_affinity_mask(map, mp)) {
>> + pr_debug2("failed to alloc mmap affinity mask, error %d\n",
>> + errno);
>> + return -1;
>> + }
>> + bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
>> + pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
>
> the bitmap_scnprintf could be called only for debug case, right?
Right. It is required to prepare debug message.
>
> if (version >= 2) {
> bitmap_scnprintf(map->affinity_mask.bits, map->affinity_mask.nbits, mask, MASK_SIZE);
> pr_debug2("%p: mmap mask[%ld]: %s\n", map, map->affinity_mask.nbits, mask);
> }
>
> ditto int the record__adjust_affinity function
>
> jirka
>
>
~Alexey
On 25.11.2019 14:21, Jiri Olsa wrote:
> On Mon, Nov 25, 2019 at 02:13:20PM +0300, Alexey Budankov wrote:
>> On 25.11.2019 12:42, Jiri Olsa wrote:
>>> On Mon, Nov 25, 2019 at 09:08:57AM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>>>> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>>>> {
>>>> - CPU_ZERO(&map->affinity_mask);
>>>> + map->affinity_mask.nbits = cpu__max_cpu();
>>>> + map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
>>>> + if (!map->affinity_mask.bits)
>>>> + return -1;
>>>> +
>>>> if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
>>>> build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
>>>> else if (mp->affinity == PERF_AFFINITY_CPU)
>>>> - CPU_SET(map->core.cpu, &map->affinity_mask);
>>>> + set_bit(map->core.cpu, map->affinity_mask.bits);
>>>> +
>>>> + return 0;
>>>> }
>>>>
>>>> +#define MASK_SIZE 1023
>>>> int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
>>>> {
>>>> + char mask[MASK_SIZE + 1] = {0};
>>>
>>> does this need to be initialized?
>>
>> This is to make sure the message is zero terminated for vfprintf call()
>
> hum AFAICS it's used only in bitmap_scnprintf, which should
> terminate the string properly
If vfprintf() explicitly zero-terminates the output buffer then
the initialization above can be avoided.
>
> jirka
>
~Alexey
On 25.11.2019 14:27, Alexey Budankov wrote:
>
> On 25.11.2019 14:21, Jiri Olsa wrote:
>> On Mon, Nov 25, 2019 at 02:13:20PM +0300, Alexey Budankov wrote:
>>> On 25.11.2019 12:42, Jiri Olsa wrote:
>>>> On Mon, Nov 25, 2019 at 09:08:57AM +0300, Alexey Budankov wrote:
>>>>
>>>> SNIP
>>>>
>>>>> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>>>>> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>>>>> {
>>>>> - CPU_ZERO(&map->affinity_mask);
>>>>> + map->affinity_mask.nbits = cpu__max_cpu();
>>>>> + map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
>>>>> + if (!map->affinity_mask.bits)
>>>>> + return -1;
>>>>> +
>>>>> if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
>>>>> build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
>>>>> else if (mp->affinity == PERF_AFFINITY_CPU)
>>>>> - CPU_SET(map->core.cpu, &map->affinity_mask);
>>>>> + set_bit(map->core.cpu, map->affinity_mask.bits);
>>>>> +
>>>>> + return 0;
>>>>> }
>>>>>
>>>>> +#define MASK_SIZE 1023
>>>>> int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
>>>>> {
>>>>> + char mask[MASK_SIZE + 1] = {0};
>>>>
>>>> does this need to be initialized?
>>>
>>> This is to make sure the message is zero terminated for vfprintf call()
>>
>> hum AFAICS it's used only in bitmap_scnprintf, which should
>> terminate the string properly
>
> If vfprintf() explicitly terminates output buffer with zero then
> the initialization above can be avoided.
Well, bitmap_scnprintf() returns the length of the resulting string,
and that length can be used to zero-terminate the buffer, avoiding
the mask initialization at the beginning.
~Alexey
>
>>
>> jirka
>>
>
> ~Alexey
>