2024-05-20 09:07:11

by Leo Yan

[permalink] [raw]
Subject: [PATCH v3 0/3] perf maps: Improve the kcore maps merging

This patch series follows up on the patch [1] to improve the sorting and
merging of kcore maps.

Since the kcore maps are not sorted, merging them into the kernel maps
causes a mess, e.g. some kcore maps might be ignored. Patch 01 sorts the
kcore maps and ensures that a subset region is placed ahead of its
superset region in the list.
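
For illustration only, below is a standalone sketch of that ordering rule
outside of the perf code (plain C, with a hypothetical 'struct range'
standing in for 'struct map'):

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct range { uint64_t start, end; };

    /* Sort by end address; on equal ends the subset (larger start) comes
     * first, so it gets merged before its superset. */
    static int cmp_range(const void *pa, const void *pb)
    {
            const struct range *a = pa, *b = pb;

            if (a->end != b->end)
                    return a->end < b->end ? -1 : 1;
            if (a->start != b->start)
                    return a->start > b->start ? -1 : 1;
            return 0;
    }

    int main(void)
    {
            struct range r[] = {
                    { 0xffff800080000000ULL, 0xfffffdffbf800000ULL }, /* vmalloc    */
                    { 0xffff800080000000ULL, 0xffff8000822f0000ULL }, /* kcore_text */
            };

            qsort(r, 2, sizeof(r[0]), cmp_range);
            /* kcore_text (the subset) now precedes vmalloc. */
            for (int i = 0; i < 2; i++)
                    printf("%#" PRIx64 "-%#" PRIx64 "\n", r[i].start, r[i].end);
            return 0;
    }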

Patch 02 uses maps__remove_maps() to remove the identical kernel map
(generated by reading symbols from '/proc/kallsyms') from the map list.
This gives us a clean list that is not polluted by old map data and
prepares for relying purely on kcore maps later.

Patch 03 removes the replacement of the kernel text map. Instead, it
searches the kcore maps for the map covering the kernel text section and
updates the 'machine->vmlinux_map' pointer accordingly.

This patch series is based on acme's latest perf-tools-next branch and
was tested on an Arm64 HiKey960 board.

[1] https://lore.kernel.org/linux-perf-users/[email protected]/T/#m7c86a69d43103cd0cb446b0993e47c36df0f40f2


Leo Yan (3):
perf maps: Sort kcore maps
perf maps: Remove the kernel text map with maps__remove_maps()
perf maps: Remove the replacement of kernel map

tools/perf/util/maps.c | 4 +-
tools/perf/util/maps.h | 2 +-
tools/perf/util/symbol.c | 136 ++++++++++++++++++++-------------------
3 files changed, 74 insertions(+), 68 deletions(-)

--
2.34.1



2024-05-20 09:07:22

by Leo Yan

[permalink] [raw]
Subject: [PATCH v3 1/3] perf maps: Sort kcore maps

Merging kcore maps into the kernel maps has an implicit requirement on
the ordering of the kcore maps; otherwise, some sections delivered by
the kcore maps will be ignored.

Let's look at the example below:

Ordering 1:
Kcore maps | Start address | End address
-----------------+--------------------+--------------------
kcore_text | 0xffff800080000000 | 0xffff8000822f0000
vmalloc | 0xffff800080000000 | 0xfffffdffbf800000

Ordering 2:
Kcore maps | Start address | End address
-----------------+--------------------+--------------------
vmalloc | 0xffff800080000000 | 0xfffffdffbf800000
kcore_text | 0xffff800080000000 | 0xffff8000822f0000

The 'kcore_text' map is a subset of the 'vmalloc' map. When merging
these two maps into the kernel maps with the maps__merge_in() function,
ordering 1 inserts the 'kcore_text' map prior to the 'vmalloc' map, so
the 'kcore_text' map is respected. On the other hand, if the maps are
inserted with ordering 2, 'vmalloc' is inserted first, and as a result
its subset map is ignored afterwards.

To merge the maps in a reliable way, this commit sorts the kcore maps
before merging them. Besides sorting the maps by end address, it also
gives priority to a subset map, inserting it before its superset map in
the list.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/symbol.c | 50 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9e5940b5bc59..c1513976ab6e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1256,6 +1256,7 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
{
struct kcore_mapfn_data *md = data;
struct map_list_node *list_node = map_list_node__new();
+ struct map_list_node *node;

if (!list_node)
return -ENOMEM;
@@ -1269,8 +1270,55 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
map__set_end(list_node->map, map__start(list_node->map) + len);
map__set_pgoff(list_node->map, pgoff);

- list_add(&list_node->node, &md->maps);
+ list_for_each_entry(node, &md->maps, node) {
+ /*
+ * When the new map (list_node)'s end address is less than
+ * current node, it can be divided into three cases.
+ *
+ * Case 1: the new map does not overlap with the current node,
+ * as the new map's end address is less than the current node's
+ * start address.
+ * [*******node********]
+ * [***list_node***] `start `end
+ * `start `end
+ *
+ * Case 2: the new map overlaps with the current node.
+ *
+ * ,start ,end
+ * [*******node********]
+ * [***list_node***]
+ * `start `end
+ *
+ * Case 3: the new map is subset of the current node.
+ *
+ * ,start ,end
+ * [*******node********]
+ * [***list_node***]
+ * `start `end
+ *
+ * For above three cases, insert the new map node before the
+ * current node.
+ */
+ if (map__end(node->map) > map__end(list_node->map))
+ break;
+
+ /*
+ * When the new map is subset of the current node and both nodes
+ * have the same end address, insert the new map node before the
+ * current node.
+ *
+ * ,start ,end
+ * [*******node********]
+ * [***list_node***]
+ * `start `end
+ */
+ if ((map__end(node->map) == map__end(list_node->map)) &&
+ (map__start(node->map) <= map__start(list_node->map)))
+ break;
+ }

+ /* Insert the new node (list_node) ahead */
+ list_add_tail(&list_node->node, &node->node);
return 0;
}

--
2.34.1


2024-05-20 09:07:39

by Leo Yan

[permalink] [raw]
Subject: [PATCH v3 2/3] perf maps: Remove the kernel text map with maps__remove_maps()

maps__remove_maps() currently removes all kernel maps except the kernel
text map and eBPF maps. Afterwards, the kernel text map is removed from
the list and added back with updated information.

This commit refactors maps__remove_maps() by dropping the 'map'
parameter, so that all kernel maps (except eBPF maps) are removed from
the list. Thus, dso__load_kcore() no longer needs to remove the kernel
text map itself.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/maps.c | 4 ++--
tools/perf/util/maps.h | 2 +-
tools/perf/util/symbol.c | 9 +++------
3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 16b39db594f4..4ddd0d50ac2c 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -589,7 +589,7 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
return ret;
}

-void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data)
+void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map))
{
struct map **maps_by_address;

@@ -597,7 +597,7 @@ void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data

maps_by_address = maps__maps_by_address(maps);
for (unsigned int i = 0; i < maps__nr_maps(maps);) {
- if (cb(maps_by_address[i], data))
+ if (cb(maps_by_address[i]))
__maps__remove(maps, maps_by_address[i]);
else
i++;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index d9aa62ed968a..90a1ff8b39c5 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -40,7 +40,7 @@ bool maps__equal(struct maps *a, struct maps *b);
/* Iterate over map calling cb for each entry. */
int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
/* Iterate over map removing an entry if cb returns true. */
-void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);
+void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map));

struct machine *maps__machine(const struct maps *maps);
unsigned int maps__nr_maps(const struct maps *maps); /* Test only. */
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index c1513976ab6e..915435d55498 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1322,15 +1322,13 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
return 0;
}

-static bool remove_old_maps(struct map *map, void *data)
+static bool remove_old_maps(struct map *map)
{
- const struct map *map_to_save = data;
-
/*
* We need to preserve eBPF maps even if they are covered by kcore,
* because we need to access eBPF dso for source data.
*/
- return !RC_CHK_EQUAL(map, map_to_save) && !__map__is_bpf_prog(map);
+ return !__map__is_bpf_prog(map);
}

static int dso__load_kcore(struct dso *dso, struct map *map,
@@ -1385,7 +1383,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
}

/* Remove old maps */
- maps__remove_maps(kmaps, remove_old_maps, map);
+ maps__remove_maps(kmaps, remove_old_maps);
machine->trampolines_mapped = false;

/* Find the kernel map using the '_stext' symbol */
@@ -1422,7 +1420,6 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
* remaining maps so vmlinux gets split if necessary.
*/
map_ref = map__get(map);
- maps__remove(kmaps, map_ref);

map__set_start(map_ref, map__start(replacement_map));
map__set_end(map_ref, map__end(replacement_map));
--
2.34.1


2024-05-20 09:07:47

by Leo Yan

[permalink] [raw]
Subject: [PATCH v3 3/3] perf maps: Remove the replacement of kernel map

The kernel text map has already been removed from the kernel maps by
maps__remove_maps(), and the kcore maps are kept in order, which gives
us a clean set of kernel maps.

As a result, it is no longer necessary to replace the old kernel text
map. Instead, this commit finds the new text map among the kcore maps,
assigns it to 'machine->vmlinux_map', and releases the old map.

One concern is a platform failing to find a kernel text map after
updating the maps list from kcore. This should not happen; if it does,
the failure is reported.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/symbol.c | 77 ++++++++++------------------------------
1 file changed, 19 insertions(+), 58 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 915435d55498..a4b65d868fc9 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1335,13 +1335,12 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
const char *kallsyms_filename)
{
struct maps *kmaps = map__kmaps(map);
+ struct map *vmlinux_map;
struct kcore_mapfn_data md;
- struct map *map_ref, *replacement_map = NULL;
struct machine *machine;
bool is_64_bit;
int err, fd;
char kcore_filename[PATH_MAX];
- u64 stext;

if (!kmaps)
return -EINVAL;
@@ -1386,51 +1385,6 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
maps__remove_maps(kmaps, remove_old_maps);
machine->trampolines_mapped = false;

- /* Find the kernel map using the '_stext' symbol */
- if (!kallsyms__get_function_start(kallsyms_filename, "_stext", &stext)) {
- u64 replacement_size = 0;
- struct map_list_node *new_node;
-
- list_for_each_entry(new_node, &md.maps, node) {
- struct map *new_map = new_node->map;
- u64 new_size = map__size(new_map);
-
- if (!(stext >= map__start(new_map) && stext < map__end(new_map)))
- continue;
-
- /*
- * On some architectures, ARM64 for example, the kernel
- * text can get allocated inside of the vmalloc segment.
- * Select the smallest matching segment, in case stext
- * falls within more than one in the list.
- */
- if (!replacement_map || new_size < replacement_size) {
- replacement_map = new_map;
- replacement_size = new_size;
- }
- }
- }
-
- if (!replacement_map)
- replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;
-
- /*
- * Update addresses of vmlinux map. Re-insert it to ensure maps are
- * correctly ordered. Do this before using maps__merge_in() for the
- * remaining maps so vmlinux gets split if necessary.
- */
- map_ref = map__get(map);
-
- map__set_start(map_ref, map__start(replacement_map));
- map__set_end(map_ref, map__end(replacement_map));
- map__set_pgoff(map_ref, map__pgoff(replacement_map));
- map__set_mapping_type(map_ref, map__mapping_type(replacement_map));
-
- err = maps__insert(kmaps, map_ref);
- map__put(map_ref);
- if (err)
- goto out_err;
-
/* Add new maps */
while (!list_empty(&md.maps)) {
struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
@@ -1438,21 +1392,28 @@ static int dso__load_kcore(struct dso *dso, struct map *map,

list_del_init(&new_node->node);

- /* skip if replacement_map, already inserted above */
- if (!RC_CHK_EQUAL(new_map, replacement_map)) {
- /*
- * Merge kcore map into existing maps,
- * and ensure that current maps (eBPF)
- * stay intact.
- */
- if (maps__merge_in(kmaps, new_map)) {
- err = -EINVAL;
- goto out_err;
- }
+ /*
+ * Merge kcore map into existing maps,
+ * and ensure that current maps (eBPF)
+ * stay intact.
+ */
+ if (maps__merge_in(kmaps, new_map)) {
+ err = -EINVAL;
+ goto out_err;
}
free(new_node);
}

+ /* Update vmlinux_map */
+ vmlinux_map = maps__find(kmaps, map__start(map));
+ if (vmlinux_map) {
+ free(machine->vmlinux_map);
+ machine->vmlinux_map = vmlinux_map;
+ } else {
+ pr_err("Failed to find vmlinux map from kcore\n");
+ goto out_err;
+ }
+
if (machine__is(machine, "x86_64")) {
u64 addr;

--
2.34.1


2024-05-22 08:58:22

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] perf maps: Remove the replacement of kernel map

On 20/05/24 12:06, Leo Yan wrote:
> The kernel text map has been removed from the kernel maps by
> maps__remove_maps(), and the kcore maps are organized in order, allowing
> us to achieve neat kernel maps.
>
> As a result, it is not necessary to replace the old kernel text map.
> Instead, the commit finds the latest text map, assigns it to
> 'machine->vmlinux_map', and releases the old map.

There is also the parameter 'map' that the caller can continue
to use. It was originally the vmlinux_map, which was checked:

/* This function requires that the map is the kernel map */
if (!__map__is_kernel(map))
return -EINVAL;

Although that now looks broken and should probably be:

/* This function requires that the map is the kernel map */
if (map != machine__kernel_map(machine))
return -EINVAL;

>
> One concern is if a platform fails to find a kernel text map after
> updating maps list with kcore, in this case, it should not happen and
> reports the failure.
>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/symbol.c | 77 ++++++++++------------------------------
> 1 file changed, 19 insertions(+), 58 deletions(-)
>
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 915435d55498..a4b65d868fc9 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1335,13 +1335,12 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
> const char *kallsyms_filename)
> {
> struct maps *kmaps = map__kmaps(map);
> + struct map *vmlinux_map;
> struct kcore_mapfn_data md;
> - struct map *map_ref, *replacement_map = NULL;
> struct machine *machine;
> bool is_64_bit;
> int err, fd;
> char kcore_filename[PATH_MAX];
> - u64 stext;
>
> if (!kmaps)
> return -EINVAL;
> @@ -1386,51 +1385,6 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
> maps__remove_maps(kmaps, remove_old_maps);
> machine->trampolines_mapped = false;
>
> - /* Find the kernel map using the '_stext' symbol */
> - if (!kallsyms__get_function_start(kallsyms_filename, "_stext", &stext)) {
> - u64 replacement_size = 0;
> - struct map_list_node *new_node;
> -
> - list_for_each_entry(new_node, &md.maps, node) {
> - struct map *new_map = new_node->map;
> - u64 new_size = map__size(new_map);
> -
> - if (!(stext >= map__start(new_map) && stext < map__end(new_map)))
> - continue;
> -
> - /*
> - * On some architectures, ARM64 for example, the kernel
> - * text can get allocated inside of the vmalloc segment.
> - * Select the smallest matching segment, in case stext
> - * falls within more than one in the list.
> - */
> - if (!replacement_map || new_size < replacement_size) {
> - replacement_map = new_map;
> - replacement_size = new_size;
> - }
> - }
> - }
> -
> - if (!replacement_map)
> - replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;
> -
> - /*
> - * Update addresses of vmlinux map. Re-insert it to ensure maps are
> - * correctly ordered. Do this before using maps__merge_in() for the
> - * remaining maps so vmlinux gets split if necessary.
> - */
> - map_ref = map__get(map);
> -
> - map__set_start(map_ref, map__start(replacement_map));
> - map__set_end(map_ref, map__end(replacement_map));
> - map__set_pgoff(map_ref, map__pgoff(replacement_map));
> - map__set_mapping_type(map_ref, map__mapping_type(replacement_map));
> -
> - err = maps__insert(kmaps, map_ref);
> - map__put(map_ref);
> - if (err)
> - goto out_err;
> -
> /* Add new maps */
> while (!list_empty(&md.maps)) {
> struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
> @@ -1438,21 +1392,28 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
>
> list_del_init(&new_node->node);
>
> - /* skip if replacement_map, already inserted above */
> - if (!RC_CHK_EQUAL(new_map, replacement_map)) {
> - /*
> - * Merge kcore map into existing maps,
> - * and ensure that current maps (eBPF)
> - * stay intact.
> - */
> - if (maps__merge_in(kmaps, new_map)) {
> - err = -EINVAL;
> - goto out_err;
> - }
> + /*
> + * Merge kcore map into existing maps,
> + * and ensure that current maps (eBPF)
> + * stay intact.
> + */
> + if (maps__merge_in(kmaps, new_map)) {
> + err = -EINVAL;
> + goto out_err;
> }
> free(new_node);
> }
>
> + /* Update vmlinux_map */
> + vmlinux_map = maps__find(kmaps, map__start(map));
> + if (vmlinux_map) {
> + free(machine->vmlinux_map);

Reference counting?

> + machine->vmlinux_map = vmlinux_map;
> + } else {
> + pr_err("Failed to find vmlinux map from kcore\n");
> + goto out_err;
> + }
> +
> if (machine__is(machine, "x86_64")) {
> u64 addr;
>
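
For illustration, a reference-counted form of that update might look like
the sketch below (an assumption, not the actual patch; whether maps__find()
already hands back a new reference would need checking):

	/* Sketch only: swap machine->vmlinux_map using the map refcount
	 * helpers instead of free(). */
	vmlinux_map = maps__find(kmaps, map__start(map));
	if (!vmlinux_map) {
		pr_err("Failed to find vmlinux map from kcore\n");
		goto out_err;
	}
	map__put(machine->vmlinux_map);	/* drop the old kernel text map */
	machine->vmlinux_map = map__get(vmlinux_map);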


2024-05-22 10:31:47

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] perf maps: Sort kcore maps

On 20/05/24 12:06, Leo Yan wrote:
> When merging kcore maps into the kernel maps, it has an implicit
> requirement for the kcore maps ordering, otherwise, some sections
> delivered by the kcore maps will be ignored.

perf treats the kernel text map as a special case. The problem
is that the kcore loading logic did not cater for there being 2
maps that covered the kernel mapping.

The workaround was to choose the smaller mapping, but then that
still only worked if that was the first.

James essentially fixed that by ensuring the kernel "replacement"
map is inserted first.

In other respects, the ordering of the maps does not matter, so
I am not sure it is worth this extra processing.

>
> Let's see below example:
>
> Ordering 1:
> Kcore maps | Start address | End address
> -----------------+--------------------+--------------------
> kcore_text | 0xffff800080000000 | 0xffff8000822f0000
> vmalloc | 0xffff800080000000 | 0xfffffdffbf800000
>
> Ordering 2:
> Kcore maps | Start address | End address
> -----------------+--------------------+--------------------
> vmalloc | 0xffff800080000000 | 0xfffffdffbf800000
> kcore_text | 0xffff800080000000 | 0xffff8000822f0000
>
> The 'kcore_text' map is a subset of the 'vmalloc' map. When merging
> these two maps into the kernel maps with the maps__merge_in() function,
> the ordering 1 inserts the 'kcore_text' map prior to the 'vmalloc' map,
> thus the 'kcore_text' map will be respected. On the other hand, if maps
> are inserted with the ordering 2, the 'vmalloc' is inserted ahead, as a
> result, its subset map will be ignored afterwards.
>
> To merge the maps in a reliable way, this commit sorts the kcore maps
> before merging them. Besides sorting the maps based on the end address,
> it also gives the priority to a subset map to insert it before its
> superset map in the list.
>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/symbol.c | 50 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 9e5940b5bc59..c1513976ab6e 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1256,6 +1256,7 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
> {
> struct kcore_mapfn_data *md = data;
> struct map_list_node *list_node = map_list_node__new();
> + struct map_list_node *node;
>
> if (!list_node)
> return -ENOMEM;
> @@ -1269,8 +1270,55 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
> map__set_end(list_node->map, map__start(list_node->map) + len);
> map__set_pgoff(list_node->map, pgoff);
>
> - list_add(&list_node->node, &md->maps);
> + list_for_each_entry(node, &md->maps, node) {
> + /*
> + * When the new map (list_node)'s end address is less than
> + * current node, it can be divided into three cases.
> + *
> + * Case 1: the new map does not overlap with the current node,
> + * as the new map's end address is less than the current node's
> + * start address.
> + * [*******node********]
> + * [***list_node***] `start `end
> + * `start `end
> + *
> + * Case 2: the new map overlaps with the current node.
> + *
> + * ,start ,end
> + * [*******node********]
> + * [***list_node***]
> + * `start `end
> + *
> + * Case 3: the new map is subset of the current node.
> + *
> + * ,start ,end
> + * [*******node********]
> + * [***list_node***]
> + * `start `end
> + *
> + * For above three cases, insert the new map node before the
> + * current node.
> + */
> + if (map__end(node->map) > map__end(list_node->map))
> + break;
> +
> + /*
> + * When the new map is subset of the current node and both nodes
> + * have the same end address, insert the new map node before the
> + * current node.
> + *
> + * ,start ,end
> + * [*******node********]
> + * [***list_node***]
> + * `start `end
> + */
> + if ((map__end(node->map) == map__end(list_node->map)) &&
> + (map__start(node->map) <= map__start(list_node->map)))
> + break;
> + }
>
> + /* Insert the new node (list_node) ahead */
> + list_add_tail(&list_node->node, &node->node);
> return 0;
> }
>


2024-05-22 11:28:05

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] perf maps: Remove the kernel text map with maps__remove_maps()

On 20/05/24 12:06, Leo Yan wrote:
> maps__remove_maps() removes all kernel maps except the text map and eBPF
> maps. Afterwards, the kernel text map is removed from the list and
> added again with updated information to the maps list.
>
> This commit refactors maps__remove_maps() for deleting the 'map'
> parameter, resulting in the removal of all kernel maps from the list.
> Thus, the dso__load_kcore() function no longer needs to remove the kernel
> text map.
>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/maps.c | 4 ++--
> tools/perf/util/maps.h | 2 +-
> tools/perf/util/symbol.c | 9 +++------
> 3 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
> index 16b39db594f4..4ddd0d50ac2c 100644
> --- a/tools/perf/util/maps.c
> +++ b/tools/perf/util/maps.c
> @@ -589,7 +589,7 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
> return ret;
> }
>
> -void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data)
> +void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map))
> {
> struct map **maps_by_address;
>
> @@ -597,7 +597,7 @@ void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data
>
> maps_by_address = maps__maps_by_address(maps);
> for (unsigned int i = 0; i < maps__nr_maps(maps);) {
> - if (cb(maps_by_address[i], data))
> + if (cb(maps_by_address[i]))
> __maps__remove(maps, maps_by_address[i]);
> else
> i++;
> diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
> index d9aa62ed968a..90a1ff8b39c5 100644
> --- a/tools/perf/util/maps.h
> +++ b/tools/perf/util/maps.h
> @@ -40,7 +40,7 @@ bool maps__equal(struct maps *a, struct maps *b);
> /* Iterate over map calling cb for each entry. */
> int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
> /* Iterate over map removing an entry if cb returns true. */
> -void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);
> +void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map));
>
> struct machine *maps__machine(const struct maps *maps);
> unsigned int maps__nr_maps(const struct maps *maps); /* Test only. */
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index c1513976ab6e..915435d55498 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1322,15 +1322,13 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
> return 0;
> }
>
> -static bool remove_old_maps(struct map *map, void *data)
> +static bool remove_old_maps(struct map *map)
> {
> - const struct map *map_to_save = data;
> -
> /*
> * We need to preserve eBPF maps even if they are covered by kcore,
> * because we need to access eBPF dso for source data.
> */
> - return !RC_CHK_EQUAL(map, map_to_save) && !__map__is_bpf_prog(map);
> + return !__map__is_bpf_prog(map);
> }
>
> static int dso__load_kcore(struct dso *dso, struct map *map,
> @@ -1385,7 +1383,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
> }
>
> /* Remove old maps */
> - maps__remove_maps(kmaps, remove_old_maps, map);
> + maps__remove_maps(kmaps, remove_old_maps);
> machine->trampolines_mapped = false;
>
> /* Find the kernel map using the '_stext' symbol */
> @@ -1422,7 +1420,6 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
> * remaining maps so vmlinux gets split if necessary.
> */
> map_ref = map__get(map);

If the ref is needed, then it should be taken before
the call to maps__remove_maps().

> - maps__remove(kmaps, map_ref);
>
> map__set_start(map_ref, map__start(replacement_map));
> map__set_end(map_ref, map__end(replacement_map));


2024-05-25 09:30:09

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] perf maps: Sort kcore maps

Hi Adrian,

On 5/22/2024 11:31 AM, Adrian Hunter wrote:
> On 20/05/24 12:06, Leo Yan wrote:
>> When merging kcore maps into the kernel maps, it has an implicit
>> requirement for the kcore maps ordering, otherwise, some sections
>> delivered by the kcore maps will be ignored.
>
> perf treats the kernel text map as a special case. The problem
> is that the kcore loading logic did not cater for there being 2
> maps that covered the kernel mapping.
>
> The workaround was to choose the smaller mapping, but then that
> still only worked if that was the first.

Below are the kcore maps dumped on Arm64:

kore map start: ffff000000000000 end: ffff00001ac00000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff00001ad88000 end: ffff000032000000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff000032101000 end: ffff00003e000000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff000040000000 end: ffff000089b80000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff000089cc0000 end: ffff0000b9ab0000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff0000b9ad0000 end: ffff0000b9bb0000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff0000b9c50000 end: ffff0000b9d50000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff0000ba114000 end: ffff0000bf130000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff0000bf180000 end: ffff0000e0000000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff000200000000 end: ffff000220000000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff800000000000 end: ffff800080000000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff800080000000 end: ffff8000822f0000 name: [kernel.kallsyms]
refcnt: 1
kore map start: ffff800080000000 end: fffffdffbf800000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc0000000 end: fffffdffc06b0000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc06b6000 end: fffffdffc0c80000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc0c84000 end: fffffdffc0f80000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc1000000 end: fffffdffc226e000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc2273000 end: fffffdffc2e6b000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc2e6b000 end: fffffdffc2e6f000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc2e71000 end: fffffdffc2e76000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc2e84000 end: fffffdffc2fc5000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc2fc6000 end: fffffdffc3800000 name: [kernel.kallsyms]
refcnt: 1
kore map start: fffffdffc8000000 end: fffffdffc8800000 name: [kernel.kallsyms]
refcnt: 1

As you can see, the layout is much more complex than just the kernel text
section and the vmalloc region. We cannot handle only the overlap between
the text section and the vmalloc region; other maps may also overlap with
each other.

And different arches have their own layout for the kcore maps. This is why
I want to sort the maps in this patch: it gives us a reliable way to
append the kcore maps.

> James essentially fixed that by ensuring the kernel "replacement"
> map is inserted first.

Yeah, I agree James' patch has fixed the kernel "replacement" map. But as I
elaborated above, there are still other maps that might overlap with each
other; my understanding is that we don't handle all of those cases.

> In other respects, the ordering of the maps does not matter, so
> I am not sure it is worth this extra processing.

To make the case for my patch, I have another reason why we need to sort the kcore maps.

Perf now verifies the maps in check_invariants(); this function reports
that the maps are broken, but we still have no clue how the breakage happened.

If we sort the kcore maps in kcore_mapfn(), then we have a chance to detect
overlapping maps and report the overlap at the source, which would be
helpful for debugging and locating failures.
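
For illustration, the kind of check I have in mind is sketched below. It is
not the posted patch: the placement (right after the insertion-point loop in
kcore_mapfn()) and the message text are assumptions; only map__start(),
map__end() and pr_warning() are existing helpers.

	/* Sketch: warn when the new entry partially overlaps the node it is
	 * inserted before (a plain subset, as kcore_text/vmalloc, is fine). */
	if (&node->node != &md->maps &&
	    map__end(list_node->map) > map__start(node->map) &&
	    map__start(list_node->map) < map__start(node->map))
		pr_warning("kcore: overlapping maps %#" PRIx64 "-%#" PRIx64
			   " and %#" PRIx64 "-%#" PRIx64 "\n",
			   map__start(list_node->map), map__end(list_node->map),
			   map__start(node->map), map__end(node->map));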

I have another patch that prints overlapping maps in kcore_mapfn(). I did
not send it out, as I found that other logging in maps.c and symbol.c
should also be improved a bit, so I am planning to send a separate patch
series.

Please let me know whether you still think this patch is useful or not.

Thanks,
Leo

2024-05-28 15:01:14

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] perf maps: Sort kcore maps



On 25/05/2024 10:29, Leo Yan wrote:
> Hi Adrian,
>
> On 5/22/2024 11:31 AM, Adrian Hunter wrote:
>> On 20/05/24 12:06, Leo Yan wrote:
>>> When merging kcore maps into the kernel maps, it has an implicit
>>> requirement for the kcore maps ordering, otherwise, some sections
>>> delivered by the kcore maps will be ignored.
>>
>> perf treats the kernel text map as a special case. The problem
>> is that the kcore loading logic did not cater for there being 2
>> maps that covered the kernel mapping.
>>
>> The workaround was to choose the smaller mapping, but then that
>> still only worked if that was the first.
>
> You could see below are Kcore maps dumped on Arm64:
>
> kore map start: ffff000000000000 end: ffff00001ac00000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff00001ad88000 end: ffff000032000000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff000032101000 end: ffff00003e000000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff000040000000 end: ffff000089b80000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff000089cc0000 end: ffff0000b9ab0000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff0000b9ad0000 end: ffff0000b9bb0000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff0000b9c50000 end: ffff0000b9d50000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff0000ba114000 end: ffff0000bf130000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff0000bf180000 end: ffff0000e0000000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff000200000000 end: ffff000220000000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff800000000000 end: ffff800080000000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff800080000000 end: ffff8000822f0000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: ffff800080000000 end: fffffdffbf800000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc0000000 end: fffffdffc06b0000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc06b6000 end: fffffdffc0c80000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc0c84000 end: fffffdffc0f80000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc1000000 end: fffffdffc226e000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc2273000 end: fffffdffc2e6b000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc2e6b000 end: fffffdffc2e6f000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc2e71000 end: fffffdffc2e76000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc2e84000 end: fffffdffc2fc5000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc2fc6000 end: fffffdffc3800000 name: [kernel.kallsyms]
> refcnt: 1
> kore map start: fffffdffc8000000 end: fffffdffc8800000 name: [kernel.kallsyms]
> refcnt: 1
>
> You could see it's much more complex rather than only for kernel text section
> and vmalloc region. We cannot only handle the case for the overlapping between
> the text section and vmalloc region, it is possible for other maps to be
> overlapping with each other.
>
> And different arches have their own definition for the Kcore maps. This is why
> I want to sort maps in this patch, it can allow us to find a reliable way to
> append the kcore maps.
>
>> James essentially fixed that by ensuring the kernel "replacement"
>> map is inserted first.
>
> Yeah, I agreed James' patch has fixed the kernel "replacement" map. But as I
> elaborated above, there still have other maps might overlap with each other,
> my understanding is we don't handle all cases.
>> In other respects, the ordering of the maps does not matter, so
>> I am not sure it is worth this extra processing.
>
> To sell my patch, I have another point for why we need sorting Kcore maps.
>
> Now Perf verifies the map in the function check_invariants(), this function
> reports the broken issue, but we still have no clue how the broken issue happens.
>
> If we can sort the Kcore maps in kcore_mapfn(), then we have chance to detect
> the overlapping within maps, and then it can reports the overlapping in the
> first place and this would be helpful for debugging and locating the failures.
>

+1 for this point. If check_invariants() insists on sorted maps, then
keeping them sorted as early as possible is better.

Debugging future issues will be easier if the assert is closer to the
source of the problem.

James

2024-05-28 15:33:26

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] perf maps: Remove the replacement of kernel map



On 20/05/2024 10:06, Leo Yan wrote:
> The kernel text map has been removed from the kernel maps by
> maps__remove_maps(), and the kcore maps are organized in order, allowing
> us to achieve neat kernel maps.
>
> As a result, it is not necessary to replace the old kernel text map.
> Instead, the commit finds the latest text map, assigns it to
> 'machine->vmlinux_map', and releases the old map.
>
> One concern is if a platform fails to find a kernel text map after
> updating maps list with kcore, in this case, it should not happen and
> reports the failure.
>

Hi Leo,

For some reason this commit causes the symbol to not be resolved for
one kernel sample in a recording I have. It's a bit weird because the
same address is resolved in other samples.

$ perf script

Before this commit:

perf-exec ffff80008030cfd4 perf_event_exec+0x22c ([kernel.kallsyms])
perf-exec ffff80008030cfd4 perf_event_exec+0x22c ([kernel.kallsyms])
perf-exec ffff80008030cfd4 perf_event_exec+0x22c ([kernel.kallsyms])
perf-exec ffff80008129a0f8 _raw_spin_unlock_irqrestore+0x48
([kernel.kallsyms])
ls ffff80008012f5ec lock_acquire+0x74 ([kernel.kallsyms])

After:

perf-exec ffff80008030cfd4 [unknown] ([kernel.kallsyms])
perf-exec ffff80008030cfd4 perf_event_exec+0x22c ([kernel.kallsyms])
perf-exec ffff80008030cfd4 perf_event_exec+0x22c ([kernel.kallsyms])
perf-exec ffff80008129a0f8 _raw_spin_unlock_irqrestore+0x48
([kernel.kallsyms])
ls ffff80008012f5ec lock_acquire+0x74 ([kernel.kallsyms])

James

2024-05-29 08:51:15

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] perf maps: Sort kcore maps

On 28/05/24 18:00, James Clark wrote:
>
>
> On 25/05/2024 10:29, Leo Yan wrote:
>> Hi Adrian,
>>
>> On 5/22/2024 11:31 AM, Adrian Hunter wrote:
>>> On 20/05/24 12:06, Leo Yan wrote:
>>>> When merging kcore maps into the kernel maps, it has an implicit
>>>> requirement for the kcore maps ordering, otherwise, some sections
>>>> delivered by the kcore maps will be ignored.
>>>
>>> perf treats the kernel text map as a special case. The problem
>>> is that the kcore loading logic did not cater for there being 2
>>> maps that covered the kernel mapping.
>>>
>>> The workaround was to choose the smaller mapping, but then that
>>> still only worked if that was the first.
>>
>> You could see below are Kcore maps dumped on Arm64:
>>
>> kore map start: ffff000000000000 end: ffff00001ac00000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff00001ad88000 end: ffff000032000000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff000032101000 end: ffff00003e000000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff000040000000 end: ffff000089b80000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff000089cc0000 end: ffff0000b9ab0000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff0000b9ad0000 end: ffff0000b9bb0000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff0000b9c50000 end: ffff0000b9d50000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff0000ba114000 end: ffff0000bf130000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff0000bf180000 end: ffff0000e0000000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff000200000000 end: ffff000220000000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff800000000000 end: ffff800080000000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff800080000000 end: ffff8000822f0000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: ffff800080000000 end: fffffdffbf800000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc0000000 end: fffffdffc06b0000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc06b6000 end: fffffdffc0c80000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc0c84000 end: fffffdffc0f80000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc1000000 end: fffffdffc226e000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc2273000 end: fffffdffc2e6b000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc2e6b000 end: fffffdffc2e6f000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc2e71000 end: fffffdffc2e76000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc2e84000 end: fffffdffc2fc5000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc2fc6000 end: fffffdffc3800000 name: [kernel.kallsyms]
>> refcnt: 1
>> kore map start: fffffdffc8000000 end: fffffdffc8800000 name: [kernel.kallsyms]
>> refcnt: 1
>>
>> You could see it's much more complex rather than only for kernel text section
>> and vmalloc region. We cannot only handle the case for the overlapping between
>> the text section and vmalloc region, it is possible for other maps to be
>> overlapping with each other.

That should not matter because maps__merge_in() deals with overlaps. Just
need the replacement map to cover the kernel text.

There would be a problem if the mappings were inconsistent i.e. the overlapped
part pointed to a different part of the file. Not sure how much sense there is
in general in overlapping program headers in an ELF file, but if they exist,
they *must* have:

vaddr_1 - offset_1 == vaddr_2 - offset_2

We could add a check for that, but that would be fatal i.e. fail to load kcore
at all.
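
For illustration, a minimal sketch of such a check (standalone C with plain
elf.h types; where it would hook into the kcore loading path is left open):

	#include <elf.h>
	#include <stdbool.h>

	/* Overlapping PT_LOAD headers are acceptable only if they map the
	 * shared virtual range to the same file offsets. */
	bool phdrs_consistent(const Elf64_Phdr *a, const Elf64_Phdr *b)
	{
		Elf64_Addr a_end = a->p_vaddr + a->p_memsz;
		Elf64_Addr b_end = b->p_vaddr + b->p_memsz;

		/* Disjoint: nothing to check. */
		if (a_end <= b->p_vaddr || b_end <= a->p_vaddr)
			return true;

		/* Overlap: vaddr - offset must match on both sides. */
		return (a->p_vaddr - a->p_offset) == (b->p_vaddr - b->p_offset);
	}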

>>
>> And different arches have their own definition for the Kcore maps. This is why
>> I want to sort maps in this patch, it can allow us to find a reliable way to
>> append the kcore maps.
>>
>>> James essentially fixed that by ensuring the kernel "replacement"
>>> map is inserted first.
>>
>> Yeah, I agreed James' patch has fixed the kernel "replacement" map. But as I
>> elaborated above, there still have other maps might overlap with each other,
>> my understanding is we don't handle all cases.
>>> In other respects, the ordering of the maps does not matter, so
>>> I am not sure it is worth this extra processing.
>>
>> To sell my patch, I have another point for why we need sorting Kcore maps.
>>
>> Now Perf verifies the map in the function check_invariants(), this function
>> reports the broken issue, but we still have no clue how the broken issue happens.
>>
>> If we can sort the Kcore maps in kcore_mapfn(), then we have chance to detect
>> the overlapping within maps, and then it can reports the overlapping in the
>> first place and this would be helpful for debugging and locating the failures.
>>
>
> +1 for this point. If check_invariants() insists on sorted maps, then
> keeping them sorted as early as possible is better.
>
> Debugging future issues will be easier if the assert is closer to the
> source of the problem.

The issue there is with using maps__insert(). It should be
maps__fixup_overlap_and_insert() instead. Probably most callers
of maps__insert() should be using maps__fixup_overlap_and_insert().
Probably they should be renamed:

maps__insert -> maps__insert_unsafe
maps__fixup_overlap_and_insert -> maps__insert
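
For illustration, the suggested renaming would read roughly as below in
tools/perf/util/maps.h (signatures assumed from the current header, not
verified here):

	/* Insert without overlap fixup; the caller guarantees no overlap. */
	int maps__insert_unsafe(struct maps *maps, struct map *map);

	/* Fix up any overlap with existing entries, then insert. */
	int maps__insert(struct maps *maps, struct map *map);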