This patch series follows up on the patch [1] to improve the sorting and
merging of kcore maps.
Since the kcore maps are not sorted, merging them into the kernel maps
causes difficulty, e.g. some kcore maps might be ignored. This is why
the dso__load_kcore() function handles the kernel text section
particularly for replacement a complete kernel section.
This patch sorts the kcore maps and ensures the subset region is placed
ahead of the superset region in the list. With this change, merging
these maps becomes easier - no need the special handling for the kernel
text section.
This patch series is based on the latest acme's perf-tool-next branch
and tested on Arm64 Hikey960 board.
[1] https://lore.kernel.org/linux-perf-users/[email protected]/T/#m7c86a69d43103cd0cb446b0993e47c36df0f40f2
Leo Yan (2):
perf maps: Sort kcore maps
perf maps: Remove the replacement of kernel map
tools/perf/util/symbol.c | 117 +++++++++++++++++++--------------------
1 file changed, 57 insertions(+), 60 deletions(-)
--
2.34.1
When merging kcore maps into the kernel maps, it has an implicit
requirement for the kcore maps ordering, otherwise, some sections
delivered by the kcore maps will be ignored.
Let's see below example:
Ordering 1:
Kcore maps | Start address | End address
-----------------+--------------------+--------------------
kcore_text | 0xffff800080000000 | 0xffff8000822f0000
vmalloc | 0xffff800080000000 | 0xfffffdffbf800000
Ordering 2:
Kcore maps | Start address | End address
-----------------+--------------------+--------------------
vmalloc | 0xffff800080000000 | 0xfffffdffbf800000
kcore_text | 0xffff800080000000 | 0xffff8000822f0000
The 'kcore_text' map is a subset of the 'vmalloc' map. When merging
these two maps into the kernal maps with the maps__merge_in() function,
the ordering 1 inserts the 'kcore_text' map prior to the 'vmalloc' map,
thus the 'kcore_text' map will be respected. On the other hand, if maps
are inserted with the ordering 2, the 'vmalloc' is inserted ahead, as a
result, its subset map will be ignored afterwards.
To merge the maps in a reliable way, this commit sorts the kcore maps
before merging them. Besides sorting the maps based on the end address,
it also gives the priority to a subset map to insert it before its
superset map in the list.
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/symbol.c | 50 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 49 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9e5940b5bc59..c1513976ab6e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1256,6 +1256,7 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
{
struct kcore_mapfn_data *md = data;
struct map_list_node *list_node = map_list_node__new();
+ struct map_list_node *node;
if (!list_node)
return -ENOMEM;
@@ -1269,8 +1270,55 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
map__set_end(list_node->map, map__start(list_node->map) + len);
map__set_pgoff(list_node->map, pgoff);
- list_add(&list_node->node, &md->maps);
+ list_for_each_entry(node, &md->maps, node) {
+ /*
+ * When the new map (list_node)'s end address is less than
+ * current node, it can be divided into three cases.
+ *
+ * Case 1: the new map does not overlap with the current node,
+ * as the new map's end address is less than the current node's
+ * start address.
+ * [*******node********]
+ * [***list_node***] `start `end
+ * `start `end
+ *
+ * Case 2: the new map overlaps with the current node.
+ *
+ * ,start ,end
+ * [*******node********]
+ * [***list_node***]
+ * `start `end
+ *
+ * Case 3: the new map is subset of the current node.
+ *
+ * ,start ,end
+ * [*******node********]
+ * [***list_node***]
+ * `start `end
+ *
+ * For above three cases, insert the new map node before the
+ * current node.
+ */
+ if (map__end(node->map) > map__end(list_node->map))
+ break;
+
+ /*
+ * When the new map is subset of the current node and both nodes
+ * have the same end address, insert the new map node before the
+ * current node.
+ *
+ * ,start ,end
+ * [*******node********]
+ * [***list_node***]
+ * `start `end
+ */
+ if ((map__end(node->map) == map__end(list_node->map)) &&
+ (map__start(node->map) <= map__start(list_node->map)))
+ break;
+ }
+ /* Insert the new node (list_node) ahead */
+ list_add_tail(&list_node->node, &node->node);
return 0;
}
--
2.34.1
Now the kcore maps are organized in ordering, the kernel text section
will be respected when merging into the kernel maps. As a result, it is
no need to replace the kernel text section specially.
This commit removes the code for replacement of the kernel text section.
With this change, the final maps has a minor difference - it separates
the kernel map [_stext.._end] into two sections:
[_stext.._edata] and [_edata.._end]
As result, it still keeps the intact info for the kernel regions.
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/symbol.c | 67 +++++-----------------------------------
1 file changed, 8 insertions(+), 59 deletions(-)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index c1513976ab6e..0712d3a6893f 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1338,12 +1338,10 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
{
struct maps *kmaps = map__kmaps(map);
struct kcore_mapfn_data md;
- struct map *map_ref, *replacement_map = NULL;
struct machine *machine;
bool is_64_bit;
int err, fd;
char kcore_filename[PATH_MAX];
- u64 stext;
if (!kmaps)
return -EINVAL;
@@ -1388,52 +1386,6 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
maps__remove_maps(kmaps, remove_old_maps, map);
machine->trampolines_mapped = false;
- /* Find the kernel map using the '_stext' symbol */
- if (!kallsyms__get_function_start(kallsyms_filename, "_stext", &stext)) {
- u64 replacement_size = 0;
- struct map_list_node *new_node;
-
- list_for_each_entry(new_node, &md.maps, node) {
- struct map *new_map = new_node->map;
- u64 new_size = map__size(new_map);
-
- if (!(stext >= map__start(new_map) && stext < map__end(new_map)))
- continue;
-
- /*
- * On some architectures, ARM64 for example, the kernel
- * text can get allocated inside of the vmalloc segment.
- * Select the smallest matching segment, in case stext
- * falls within more than one in the list.
- */
- if (!replacement_map || new_size < replacement_size) {
- replacement_map = new_map;
- replacement_size = new_size;
- }
- }
- }
-
- if (!replacement_map)
- replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;
-
- /*
- * Update addresses of vmlinux map. Re-insert it to ensure maps are
- * correctly ordered. Do this before using maps__merge_in() for the
- * remaining maps so vmlinux gets split if necessary.
- */
- map_ref = map__get(map);
- maps__remove(kmaps, map_ref);
-
- map__set_start(map_ref, map__start(replacement_map));
- map__set_end(map_ref, map__end(replacement_map));
- map__set_pgoff(map_ref, map__pgoff(replacement_map));
- map__set_mapping_type(map_ref, map__mapping_type(replacement_map));
-
- err = maps__insert(kmaps, map_ref);
- map__put(map_ref);
- if (err)
- goto out_err;
-
/* Add new maps */
while (!list_empty(&md.maps)) {
struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
@@ -1441,17 +1393,14 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
list_del_init(&new_node->node);
- /* skip if replacement_map, already inserted above */
- if (!RC_CHK_EQUAL(new_map, replacement_map)) {
- /*
- * Merge kcore map into existing maps,
- * and ensure that current maps (eBPF)
- * stay intact.
- */
- if (maps__merge_in(kmaps, new_map)) {
- err = -EINVAL;
- goto out_err;
- }
+ /*
+ * Merge kcore map into existing maps,
+ * and ensure that current maps (eBPF)
+ * stay intact.
+ */
+ if (maps__merge_in(kmaps, new_map)) {
+ err = -EINVAL;
+ goto out_err;
}
free(new_node);
}
--
2.34.1
On Wed, May 15, 2024 at 2:11 PM Leo Yan <[email protected]> wrote:
>
> This patch series follows up on the patch [1] to improve the sorting and
> merging of kcore maps.
>
> Since the kcore maps are not sorted, merging them into the kernel maps
> causes difficulty, e.g. some kcore maps might be ignored. This is why
> the dso__load_kcore() function handles the kernel text section
> particularly for replacement a complete kernel section.
>
> This patch sorts the kcore maps and ensures the subset region is placed
> ahead of the superset region in the list. With this change, merging
> these maps becomes easier - no need the special handling for the kernel
> text section.
>
> This patch series is based on the latest acme's perf-tool-next branch
> and tested on Arm64 Hikey960 board.
>
> [1] https://lore.kernel.org/linux-perf-users/[email protected]/T/#m7c86a69d43103cd0cb446b0993e47c36df0f40f2
Thanks Leo, testing this change on perf-tools-next with an x86 debian
laptop I see:
```
$ perf test 24 -v
24: Object code reading:
--- start ---
test child forked, pid 3407499
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
symsrc__init: cannot get elf header.
overlapping maps in [kernel.kallsyms] (disable tui for more info)
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
Parsing event 'cycles'
Using CPUID GenuineIntel-6-8D-1
mmap size 528384B
Reading object code for memory address: 0xfffffffface8d64a
File is: /proc/kcore
On file address is: 0xfffffffface8d64a
dso__data_read_offset failed
---- end(-1) ----
24: Object code reading : FAILED!
```
The test passes without the changes. Let me know if you need me to dig deeper.
Thanks,
Ian
> Leo Yan (2):
> perf maps: Sort kcore maps
> perf maps: Remove the replacement of kernel map
>
> tools/perf/util/symbol.c | 117 +++++++++++++++++++--------------------
> 1 file changed, 57 insertions(+), 60 deletions(-)
>
> --
> 2.34.1
>
Hi Ian,
On 5/15/24 22:39, Ian Rogers wrote:
[...]
> Thanks Leo, testing this change on perf-tools-next with an x86 debian
> laptop I see:
> ```
> $ perf test 24 -v
> 24: Object code reading:
> --- start ---
> test child forked, pid 3407499
> Looking at the vmlinux_path (8 entries long)
> symsrc__init: build id mismatch for vmlinux.
> symsrc__init: cannot get elf header.
> overlapping maps in [kernel.kallsyms] (disable tui for more info)
> Using /proc/kcore for kernel data
> Using /proc/kallsyms for symbols
> Parsing event 'cycles'
> Using CPUID GenuineIntel-6-8D-1
> mmap size 528384B
> Reading object code for memory address: 0xfffffffface8d64a
> File is: /proc/kcore
> On file address is: 0xfffffffface8d64a
> dso__data_read_offset failed
> ---- end(-1) ----
> 24: Object code reading : FAILED!
> ```
> The test passes without the changes. Let me know if you need me to dig deeper.
I can reproduce this test failure at my side. I will look into it.
Thanks for testing!
Leo
On 5/15/24 22:39, Ian Rogers wrote:
[...]
> Thanks Leo, testing this change on perf-tools-next with an x86 debian
> laptop I see:
> ```
> $ perf test 24 -v
> 24: Object code reading:
> --- start ---
> test child forked, pid 3407499
> Looking at the vmlinux_path (8 entries long)
> symsrc__init: build id mismatch for vmlinux.
> symsrc__init: cannot get elf header.
> overlapping maps in [kernel.kallsyms] (disable tui for more info)
> Using /proc/kcore for kernel data
> Using /proc/kallsyms for symbols
> Parsing event 'cycles'
> Using CPUID GenuineIntel-6-8D-1
> mmap size 528384B
> Reading object code for memory address: 0xfffffffface8d64a
> File is: /proc/kcore
> On file address is: 0xfffffffface8d64a
> dso__data_read_offset failed
> ---- end(-1) ----
> 24: Object code reading : FAILED!
> ```
> The test passes without the changes. Let me know if you need me to dig deeper.
Just update for the test failure.
The failure is caused by the memory map's attribute is not updated after
I removed the map replacement related code in patch 02. This means the
old (identical) kernel map (generated by '/proc/kallsyms') still keeps
some stale info for 'pgoff', 'mapping_type', etc.
To fix this issue, in the new patch, it removes the old kernel map (to
avoid any stale info) and update 'machine->vmlinux_map'.
Just one concern. I saw the kcore exposes kernel text section which is
dependent on the CONFIG_ARCH_PROC_KCORE_TEXT config, but some arches
(e.g. PowerPC, etc) are absent for this config. But you can see even the
current code doesn't handle this case, if the replacement map is not
found, it is arbitrarily assigned to the first map in kcore [1]. An
option is to rollback to use the old identical kernel map, this is not
addressed in my latest patch, which is simply report failure in this
case. Please review it and let me know how you think.
Thanks,
Leo
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/symbol.c?h=v6.9#n1366