Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
Improve __maps__fixup_overlap_and_insert's performance 21x in the case
of overlapping mmaps. [email protected] reported slowness opening
perf.data files from chromium where the files contained a large number
of overlapping mappings. Improve this case primarily by avoiding
unnecessary sorting.
Unscientific timing data processing a perf.data file with overlapping
mmap events from chromium:
Before:
real 0m9.856s
user 0m9.637s
sys 0m0.204s
After:
real 0m0.675s
user 0m0.454s
sys 0m0.196s
Tested with address/leak sanitizer, invariant checks and validating
the before and after output are identical.
Ian Rogers (3):
perf maps: Fix use after free in __maps__fixup_overlap_and_insert
perf maps: Reduce sorting for overlapping mappings
perf maps: Add/use a sorted insert for fixup overlap and insert
tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
1 file changed, 92 insertions(+), 21 deletions(-)
--
2.45.0.rc1.225.g2a3ae87e7f-goog
In the case 'before' and 'after' are broken out from pos,
maps_by_address may be changed by __maps__insert, as such it needs
re-reading.
Don't ignore the return value from __maps_insert.
Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 16b39db594f4..eaada3e0f5b4 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -741,7 +741,6 @@ static unsigned int first_ending_after(struct maps *maps, const struct map *map)
*/
static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
{
- struct map **maps_by_address;
int err = 0;
FILE *fp = debug_file();
@@ -749,12 +748,12 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
if (!maps__maps_by_address_sorted(maps))
__maps__sort_by_address(maps);
- maps_by_address = maps__maps_by_address(maps);
/*
* Iterate through entries where the end of the existing entry is
* greater-than the new map's start.
*/
for (unsigned int i = first_ending_after(maps, new); i < maps__nr_maps(maps); ) {
+ struct map **maps_by_address = maps__maps_by_address(maps);
struct map *pos = maps_by_address[i];
struct map *before = NULL, *after = NULL;
@@ -821,8 +820,10 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
/* Maps are still ordered, go to next one. */
i++;
if (after) {
- __maps__insert(maps, after);
+ err = __maps__insert(maps, after);
map__put(after);
+ if (err)
+ goto out_err;
if (!maps__maps_by_address_sorted(maps)) {
/*
* Sorting broken so invariants don't
@@ -851,7 +852,7 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
check_invariants(maps);
}
/* Add the map. */
- __maps__insert(maps, new);
+ err = __maps__insert(maps, new);
out_err:
return err;
}
--
2.45.0.rc1.225.g2a3ae87e7f-goog
When an 'after' map is generated the 'new' map must be before it so
terminate iterating and don't resort. If the entry 'pos' is entirely
overlapped by the 'new' mapping then don't remove and insert the
mapping, just replace - again to remove sorting.
For a perf report on a perf.data file containing overlapping mappings
the time numbers are:
Before:
real 0m9.856s
user 0m9.637s
sys 0m0.204s
After:
real 0m5.894s
user 0m5.650s
sys 0m0.231s
Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 55 +++++++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index eaada3e0f5b4..f6b6df82f4cf 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -744,7 +744,6 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
int err = 0;
FILE *fp = debug_file();
-sort_again:
if (!maps__maps_by_address_sorted(maps))
__maps__sort_by_address(maps);
@@ -820,36 +819,54 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
/* Maps are still ordered, go to next one. */
i++;
if (after) {
- err = __maps__insert(maps, after);
- map__put(after);
- if (err)
- goto out_err;
- if (!maps__maps_by_address_sorted(maps)) {
- /*
- * Sorting broken so invariants don't
- * hold, sort and go again.
- */
- goto sort_again;
- }
/*
- * Maps are still ordered, skip after and go to
- * next one (terminate loop).
+ * 'before' and 'after' mean 'new' split the
+ * 'pos' mapping and therefore there are no
+ * later mappings.
*/
- i++;
+ err = __maps__insert(maps, new);
+ if (!err)
+ err = __maps__insert(maps, after);
+ map__put(after);
+ check_invariants(maps);
+ return err;
}
+ check_invariants(maps);
} else if (after) {
+ /*
+ * 'after' means 'new' split 'pos' and there are no
+ * later mappings.
+ */
map__put(maps_by_address[i]);
- maps_by_address[i] = after;
- /* Maps are ordered, go to next one. */
- i++;
+ maps_by_address[i] = map__get(new);
+ err = __maps__insert(maps, after);
+ map__put(after);
+ check_invariants(maps);
+ return err;
} else {
+ struct map *next = NULL;
+
+ if (i + 1 < maps__nr_maps(maps))
+ next = maps_by_address[i + 1];
+
+ if (!next || map__start(next) >= map__end(new)) {
+ /*
+ * Replace existing mapping and end knowing
+ * there aren't later overlapping or any
+ * mappings.
+ */
+ map__put(maps_by_address[i]);
+ maps_by_address[i] = map__get(new);
+ check_invariants(maps);
+ return err;
+ }
__maps__remove(maps, pos);
+ check_invariants(maps);
/*
* Maps are ordered but no need to increase `i` as the
* later maps were moved down.
*/
}
- check_invariants(maps);
}
/* Add the map. */
err = __maps__insert(maps, new);
--
2.45.0.rc1.225.g2a3ae87e7f-goog
On Tue, May 21, 2024 at 9:51 AM Ian Rogers <[email protected]> wrote:
>
> Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
>
> Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> of overlapping mmaps. [email protected] reported slowness opening
> perf.data files from chromium where the files contained a large number
> of overlapping mappings. Improve this case primarily by avoiding
> unnecessary sorting.
>
> Unscientific timing data processing a perf.data file with overlapping
> mmap events from chromium:
>
> Before:
> real 0m9.856s
> user 0m9.637s
> sys 0m0.204s
>
> After:
> real 0m0.675s
> user 0m0.454s
> sys 0m0.196s
>
> Tested with address/leak sanitizer, invariant checks and validating
> the before and after output are identical.
>
> Ian Rogers (3):
> perf maps: Fix use after free in __maps__fixup_overlap_and_insert
> perf maps: Reduce sorting for overlapping mappings
> perf maps: Add/use a sorted insert for fixup overlap and insert
Ping. Thanks,
Ian
> tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 92 insertions(+), 21 deletions(-)
>
> --
> 2.45.0.rc1.225.g2a3ae87e7f-goog
>
On 21/05/2024 17:51, Ian Rogers wrote:
> Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
>
> Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> of overlapping mmaps. [email protected] reported slowness opening
> perf.data files from chromium where the files contained a large number
> of overlapping mappings. Improve this case primarily by avoiding
> unnecessary sorting.
>
> Unscientific timing data processing a perf.data file with overlapping
> mmap events from chromium:
>
> Before:
> real 0m9.856s
> user 0m9.637s
> sys 0m0.204s
>
> After:
> real 0m0.675s
> user 0m0.454s
> sys 0m0.196s
>
> Tested with address/leak sanitizer, invariant checks and validating
> the before and after output are identical.
>
> Ian Rogers (3):
> perf maps: Fix use after free in __maps__fixup_overlap_and_insert
> perf maps: Reduce sorting for overlapping mappings
> perf maps: Add/use a sorted insert for fixup overlap and insert
>
> tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 92 insertions(+), 21 deletions(-)
>
Reviewed-by: James Clark <[email protected]>
I'm wondering if there is any point in the non sorted insert any more?
Maps could be always either sorted by name or sorted by address and
insert() is always a sorted/fixup-overlaps insert depending on the sort
style of the list.
On Thu, Jun 6, 2024 at 3:56 AM James Clark <[email protected]> wrote:
>
>
>
> On 21/05/2024 17:51, Ian Rogers wrote:
> > Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
> >
> > Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> > of overlapping mmaps. [email protected] reported slowness opening
> > perf.data files from chromium where the files contained a large number
> > of overlapping mappings. Improve this case primarily by avoiding
> > unnecessary sorting.
> >
> > Unscientific timing data processing a perf.data file with overlapping
> > mmap events from chromium:
> >
> > Before:
> > real 0m9.856s
> > user 0m9.637s
> > sys 0m0.204s
> >
> > After:
> > real 0m0.675s
> > user 0m0.454s
> > sys 0m0.196s
> >
> > Tested with address/leak sanitizer, invariant checks and validating
> > the before and after output are identical.
> >
> > Ian Rogers (3):
> > perf maps: Fix use after free in __maps__fixup_overlap_and_insert
> > perf maps: Reduce sorting for overlapping mappings
> > perf maps: Add/use a sorted insert for fixup overlap and insert
> >
> > tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
> > 1 file changed, 92 insertions(+), 21 deletions(-)
> >
>
> Reviewed-by: James Clark <[email protected]>
>
> I'm wondering if there is any point in the non sorted insert any more?
>
> Maps could be always either sorted by name or sorted by address and
> insert() is always a sorted/fixup-overlaps insert depending on the sort
> style of the list.
One of the motivations for the sorted array, instead of an rbtree, was
to simplify reference count checking. We're in much better shape with
that these days, I think the worst "memory leak" is the dsos holding
onto a dso and its symbols indefinitely, instead of removing older
unused dsos. It'd be hard to go back to an rb tree with reference
counting.
For the rbtree insert it was O(log N), the sorted insert these changes
add is O(N) and the regular insert without sorting is O(1). A common
case from scanning /proc/pid/maps is that when map entries are
inserted they are inserted in order. The O(1) insert checks that and
maintains that the maps are sorted.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/maps.c?h=perf-tools-next#n469
For the synthesized maps we should be able to insert N map entries in
O(N). The sorted insert would be similar as the memmoves would always
copy 0 elements. So I can see an argument for keeping the array always
sorted. For the perf.data files we commonly process synthesized mmaps
dominate. In the case for this patch, JIT data dominated, with
frequent overlapping mapping changes. I guess the biggest hurdle to
just always keeping things sorted is the mental hurdle of ignoring the
worst insert complexity, which should never apply given how the maps
are commonly built.
Thanks,
Ian
On Tue, 21 May 2024 09:51:06 -0700, Ian Rogers wrote:
> Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
>
> Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> of overlapping mmaps. [email protected] reported slowness opening
> perf.data files from chromium where the files contained a large number
> of overlapping mappings. Improve this case primarily by avoiding
> unnecessary sorting.
>
> [...]
Applied to perf-tools-next, thanks!
Best regards,
Namhyung