2023-11-22 20:03:45

by Chun Ng

Subject: [REGRESSION]: mmap performance regression starting with k-6.1

Hi,

Recently I observed a performance regression in the mmap(..) system call. I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box, and the results are pretty consistent among them.

Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.

The test program calls mmap/munmap for a 4K page with the MAP_ANON and MAP_PRIVATE flags, and ftrace is used to measure the time spent in the do_mmap(..) call. Measured times of a sample run with different vanilla kernel versions are:
k-5.10 and k-6.0: ~157us
k-6.1: ~194us
k-6.7: ~214us
Results are pretty consistent across multiple runs, with a small percentage of variance. Ftrace shows that the latency of mmap_region(...) has increased since k-6.1. For an application that makes frequent mmap(..) calls, the accumulated extra latency is very noticeable.
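Expressed relative to the k-5.10/k-6.0 baseline of ~157us (a derived calculation from the numbers above, not part of the original measurements):

```latex
\frac{194 - 157}{157} \approx 0.236 \quad (+23.6\%,\ \text{k-6.1}), \qquad
\frac{214 - 157}{157} \approx 0.363 \quad (+36.3\%,\ \text{k-6.7})
```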

Please find the ftrace results and kernel config files in this folder:
https://drive.google.com/drive/folders/1qy8YTBqxu8Gdbs7IigYbSd4FXldId5sd?usp=drive_link

The test program can be found here:
https://drive.google.com/file/d/1tG6_BbQMCHwfKebvAIAg_xqbM_lpPcuM/view?usp=sharing
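For readers without access to the linked program, a minimal userspace approximation might look like the sketch below. It is hypothetical (the name mean_mmap_munmap_ns is made up here), and it times the whole mmap()+munmap() round trip with clock_gettime(CLOCK_MONOTONIC) instead of tracing do_mmap(..) with ftrace, so its numbers include syscall entry/exit and the munmap cost and are not directly comparable to the figures above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/* Hypothetical helper, not the reporter's program: times `iters`
 * mmap()+munmap() pairs of one 4 KiB MAP_PRIVATE|MAP_ANONYMOUS mapping
 * and returns the mean cost of one pair in nanoseconds, or -1.0 if
 * iters is not positive or a call fails. */
double mean_mmap_munmap_ns(long iters)
{
    const size_t len = 4096;
    struct timespec t0, t1;

    if (iters <= 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1.0;
        if (munmap(p, len) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling it as mean_mmap_munmap_ns(1000000) from a small main() under each kernel would give a rough per-pair cost; running it under the SCHED_FIFO and performance-governor setup described below should reduce noise.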

Info on the testing environment:
cpufreq_governor: performance
Test machine: Raspberry Pi 4, 8GB DDR
SCHED_FIFO with priority 99 for running the test program
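The environment above can also be checked and set up from the test program itself. A sketch, assuming the standard cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor) and that the process has CAP_SYS_NICE or a suitable RLIMIT_RTPRIO; the helper names read_governor and go_fifo_max are hypothetical:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read cpu<cpu>'s cpufreq governor (e.g. "performance")
 * into buf with the trailing newline stripped. Returns 0 on success, or -1
 * if the sysfs file is missing or unreadable (e.g. no cpufreq driver). */
int read_governor(int cpu, char *buf, size_t len)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: switch the calling process to SCHED_FIFO at the
 * maximum real-time priority. Returns 0 on success, or -1 with errno set
 * (typically EPERM when the caller lacks the needed privilege). */
int go_fifo_max(void)
{
    struct sched_param sp = {0};

    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sp.sched_priority < 0)
        return -1;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

On Linux, sched_get_priority_max(SCHED_FIFO) returns 99, so go_fifo_max() corresponds to the "SCHED_FIFO with priority 99" setting; launching the binary with chrt -f 99 is an equivalent alternative from the shell.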

Vanilla kernels are not tainted. However, on k-6.0 and k-6.7, I had to patch drivers/clk/bcm/clk-raspberrypi.c with the version from the Raspberry Pi kernel tree for the CPU frequency governor to work.

Best,
Chun
[nvpublic]


2023-11-23 01:18:55

by Bagas Sanjaya

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Wed, Nov 22, 2023 at 08:03:19PM +0000, Chun Ng wrote:
> Hi,
>
> Recently I observed there is performance regression on system call mmap(..). I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box and the results are pretty consistent among them.
>
> Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.
>
> The test program calls mmap/munmap for a 4K page with MAP_ANON and MAP_PRIVATE flags, and ftrace is used to measure the time spent on the do_mmap(..) call.  Measured time of a sample run with different vanilla kernel versions are:
> k-5.10 and k-6.0: ~157us
> k-6.1: ~194us
> k-6.7: ~214us
> Results are pretty consistent across multiple runs with a small percentage variance.  Ftrace shows that latency of mmap_region(...) has increased since k-6.1.  An application that makes frequent mmap(..) calls the accumulated extra latency is very noticeable.

Did you mean that v6.0 doesn't have this regression?

Confused...

--
An old man doll... just what I always wanted! - Clara


2023-11-23 03:06:58

by Chun Ng

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

>> Did you mean that v6.0 doesn't have this regression?

No, k-6.0 does NOT have this regression. The regression starts from k-6.1.

Best,
Chun

From: Bagas Sanjaya
Sent: Wednesday, November 22, 2023 5:18 PM
To: Chun Ng; Linux Kernel Mailing List
Cc: Linux Regressions; Andrew Morton; Linux Memory Management List; Liam R. Howlett; Ankita Garg
Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1


2023-11-23 05:04:32

by Bagas Sanjaya

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On 11/23/23 10:06, Chun Ng wrote:
>>> Did you mean that v6.0 doesn't have this regression?
>
> No, k-6.0 does NOT have this regression. The regression starts from k-6.1.
>

Thanks for the confirmation.

--
An old man doll... just what I always wanted! - Clara

2023-11-23 05:07:57

by Bagas Sanjaya

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Wed, Nov 22, 2023 at 08:03:19PM +0000, Chun Ng wrote:
> Hi,
>
> Recently I observed there is performance regression on system call mmap(..). I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box and the results are pretty consistent among them.
>
> Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.
>
> The test program calls mmap/munmap for a 4K page with MAP_ANON and MAP_PRIVATE flags, and ftrace is used to measure the time spent on the do_mmap(..) call.  Measured time of a sample run with different vanilla kernel versions are:
> k-5.10 and k-6.0: ~157us
> k-6.1: ~194us
> k-6.7: ~214us
> Results are pretty consistent across multiple runs with a small percentage variance.  Ftrace shows that latency of mmap_region(...) has increased since k-6.1.  An application that makes frequent mmap(..) calls the accumulated extra latency is very noticeable.
>
> Please find the ftrace results and kernel config files in this folder:
> https://drive.google.com/drive/folders/1qy8YTBqxu8Gdbs7IigYbSd4FXldId5sd?usp=drive_link
>
> The test program can be found in here:
> https://drive.google.com/file/d/1tG6_BbQMCHwfKebvAIAg_xqbM_lpPcuM/view?usp=sharing
>
> Info on the testing environment:
> cpufreq_governor: performance
> Test machine: Raspberry Pi 4, 8GB DDR
> SCHED_FIFO with priority 99 for running the test program
>
> Vanilla kernels are not tainted. However on k-6.0 and k-6.7, I have to patch the drivers/clk/bcm/clk-raspberrypi.c file with the version in Raspberry Pi kernel tree for the CPU frequency governor to work.
>

The next step is to find the commit that introduces your regression with
`git bisect`. If you haven't done so, see
Documentation/admin-guide/bug-bisect.rst for instructions.

Anyway, I'm adding this regression to regzbot:

#regzbot ^introduced: v6.0..v6.1

Thanks.

--
An old man doll... just what I always wanted! - Clara


2023-11-23 08:33:16

by David Wang

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

Hi,

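Just contributing some information I recently collected for this thread:

I ran a profiler, and it shows a fundamental difference between 6.0 and 6.1 in the munmap path (each entry reads function(share, samples/parent samples); the top entry is relative to all samples):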


v6.0:
```
__x64_sys_munmap(60.544% 6474/10693)
  __vm_munmap(98.749% 6393/6474)
    __do_munmap(97.982% 6264/6393)
      __split_vma(53.975% 3381/6264)
        vm_area_dup(59.036% 1996/3381)
        __vma_adjust(32.121% 1086/3381)
        anon_vma_clone(5.915% 200/3381)
        vma_dup_policy(0.769% 26/3381)
      unmap_region(16.699% 1046/6264)
      find_vma(10.361% 649/6264)
      remove_vma(7.822% 490/6264)
      percpu_counter_add_batch(2.011% 126/6264)
      __vma_rb_erase(1.405% 88/6264)
      userfaultfd_unmap_prep(0.798% 50/6264)
      downgrade_write(0.511% 32/6264)
```

v6.1:
```
__x64_sys_munmap(68.024% 24741/36371)
  __vm_munmap(99.681% 24662/24741)
    do_mas_munmap(99.015% 24419/24662)
      do_mas_align_munmap(98.243% 23990/24419)
        __split_vma(58.966% 14146/23990)
          __vma_adjust(83.755% 11848/14146)
          vm_area_dup(13.191% 1866/14146)
          anon_vma_clone(2.050% 290/14146)
          vma_dup_policy(0.254% 36/14146)
        mas_store_prealloc(11.709% 2809/23990)
        mas_preallocate(9.579% 2298/23990)
        unmap_region(5.523% 1325/23990)
```

v6.1 introduced the maple tree data structure, and mmap/munmap performance has degraded since then.
Based on this observation, I tested two commits:
9832fb87834e2bd925d30020962c81b05948fa7b GOOD (same as v6.0, about 20 seconds; this is before the patch series "Introducing the Maple Tree")
11f9a21ab65542189372b7d64bb2d2937dfdc9dc BAD (about 51 seconds; this one is somewhere in the middle of the maple tree patch series)
With v6.1, the test runs in about 56 seconds.

For v6.7, the profiler shows further fundamental changes (the vma iterator, "vmi", functions), and performance is worse (~70 seconds).
```
__x64_sys_munmap(63.873% 30725/48103)
  __vm_munmap(99.456% 30558/30725)
    do_vmi_munmap(97.670% 29846/30558)
      do_vmi_align_munmap(97.196% 29009/29846)
        __split_vma(63.701% 18479/29009)
          vma_complete(34.417% 6360/18479)
          vm_area_dup(33.681% 6224/18479)
          mas_preallocate(11.835% 2187/18479)
          down_write(5.173% 956/18479)
          up_write(3.815% 705/18479)
          asm_sysvec_apic_timer_interrupt(1.153% 213/18479)
          anon_vma_clone(0.974% 180/18479)
          vma_adjust_trans_huge(0.622% 115/18479)
          mas_next_slot(0.498% 92/18479)
          vma_dup_policy(0.465% 86/18479)
          vma_prepare(0.357% 66/18479)
          srso_return_thunk(0.336% 62/18479)
          mas_find(0.114% 21/18479)
        unmap_region.constprop.0(12.196% 3538/29009)
        mas_store_gfp(10.548% 3060/29009)
        __call_rcu_common.constprop.0(1.992% 578/29009)
```

I used the following test code and timed it:
```
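#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAXN 1024

/* Slot table of live mappings: start address and requested length. */
struct { void *addr; size_t n; } maps[MAXN];

int main(void) {
    int i, n, k, r;
    void *p;

    /* Create MAXN anonymous private mappings of 1 KiB..32 KiB each;
     * the kernel rounds each length up to whole pages. */
    for (i = 0; i < MAXN; i++) {
        n = 1024 * ((rand() % 32) + 1);
        p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("fail to mmap");
            return 1;
        }
        maps[i].addr = p;
        maps[i].n = n;
    }

    /* 10,000,000 rounds: unmap a random slot, then refill it with a new
     * mapping of random size. The mappings are never touched. */
    for (i = 0; i < 10000000; i++) {
        k = rand() % MAXN;
        r = munmap(maps[k].addr, maps[k].n);
        if (r) {
            perror("fail to munmap");
            return 1;
        }
        n = 1024 * ((rand() % 32) + 1);
        p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("fail to mmap");
            return 1;
        }
        maps[k].addr = p;
        maps[k].n = n;
    }

    for (i = 0; i < MAXN; i++)
        munmap(maps[i].addr, maps[i].n);
    return 0;
}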
```

Thanks
David Wang

2023-11-23 14:36:47

by Liam R. Howlett

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

* Bagas Sanjaya <[email protected]> [231123 00:07]:
> On Wed, Nov 22, 2023 at 08:03:19PM +0000, Chun Ng wrote:
> > Hi,
> >
> > Recently I observed there is performance regression on system call mmap(..). I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box and the results are pretty consistent among them.
> >
> > Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.

This is almost certainly the maple tree. The tree is slower on writes
than the rbtree, so if the benchmark mmaps/munmaps in a tight loop
you will see this slowdown. What you are measuring with this benchmark
is the speed of inserting and removing a VMA, so it's not really
something that happens in practice: we usually use the mapping between
adding and removing it.

What this gains us is the ability to remove contention on the mmap lock
during page faults. If you were to test contention around that lock,
you will see a slowdown until you reach v6.4, where per-vma locking
started to show up. More benchmarking will show different types of
fault handling outside of the mmap lock until (I believe) 6.6, where
most (or all?) types are supported.
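A minimal sketch of the kind of contention test described here, assuming an x86-64 or arm64 kernel with CONFIG_PER_VMA_LOCK (v6.4+) for comparison against older kernels: several threads fault in fresh anonymous memory while one thread runs mmap()/munmap() in a loop, which takes the mmap lock for writing. Per the description above, faults on kernels with per-VMA locking should mostly avoid that lock, while on older kernels they contend with the churn thread. The function run_fault_contention and its parameters are hypothetical, not a benchmark used in this thread; with transparent huge pages enabled, large regions may be faulted 2 MiB at a time, so "touched" counts page writes rather than faults. Build with -pthread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define MAX_FAULTERS 16

struct faulter {
    pthread_t tid;
    size_t len;     /* bytes of fresh anonymous memory this thread touches */
    long touched;   /* pages written (output) */
};

static atomic_int churn_stop;
static atomic_long churn_count;

/* Map a private anonymous region and write one byte per page; on a fresh
 * mapping the first write to each page normally triggers a page fault. */
static void *fault_pages(void *arg)
{
    struct faulter *f = arg;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, f->len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    for (size_t off = 0; off < f->len; off += page) {
        buf[off] = 1;
        f->touched++;
    }
    munmap(buf, f->len);
    return NULL;
}

/* Background churn: mmap() and munmap() both take the mmap lock for writing. */
static void *churn(void *unused)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    (void)unused;
    while (!atomic_load(&churn_stop)) {
        void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED)
            munmap(p, page);
        atomic_fetch_add(&churn_count, 1);
    }
    return NULL;
}

/* Run nthreads faulting threads next to one churn thread. Returns the
 * wall-clock seconds the faulting threads needed, or -1.0 on error. */
double run_fault_contention(int nthreads, size_t len_per_thread,
                            long *touched, long *churns)
{
    struct faulter f[MAX_FAULTERS] = {0};
    pthread_t churner;
    struct timespec t0, t1;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_FAULTERS)
        return -1.0;
    atomic_store(&churn_stop, 0);
    atomic_store(&churn_count, 0);
    if (pthread_create(&churner, NULL, churn, NULL) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++) {
        f[started].len = len_per_thread;
        if (pthread_create(&f[started].tid, NULL, fault_pages, &f[started]) != 0)
            break;
    }
    *touched = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(f[i].tid, NULL);
        *touched += f[i].touched;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    atomic_store(&churn_stop, 1);
    pthread_join(churner, NULL);
    *churns = atomic_load(&churn_count);
    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing the returned seconds for the same nthreads and len_per_thread (for example 4 threads with 256 MiB each) across v6.0, v6.1 and v6.4+ kernels would show whether fault throughput under mmap/munmap churn changes in the way described.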

Although this is expected, I am still looking to reduce any real
workloads that may suffer. I've been reducing the allocations, for
example.

> >
> > The test program calls mmap/munmap for a 4K page with MAP_ANON and MAP_PRIVATE flags, and ftrace is used to measure the time spent on the do_mmap(..) call.  Measured time of a sample run with different vanilla kernel versions are:
> > k-5.10 and k-6.0: ~157us
> > k-6.1: ~194us
> > k-6.7: ~214us

I would have expected v6.7 to remain closer to v6.1, but that may depend
on the minor versions you have been testing and what fixes have landed
there.


> > Results are pretty consistent across multiple runs with a small percentage variance.  Ftrace shows that latency of mmap_region(...) has increased since k-6.1.  An application that makes frequent mmap(..) calls the accumulated extra latency is very noticeable.
> >
> > Please find the ftrace results and kernel config files in this folder:
> > https://drive.google.com/drive/folders/1qy8YTBqxu8Gdbs7IigYbSd4FXldId5sd?usp=drive_link
> >
> > The test program can be found in here:
> > https://drive.google.com/file/d/1tG6_BbQMCHwfKebvAIAg_xqbM_lpPcuM/view?usp=sharing
> >
> > Info on the testing environment:
> > cpufreq_governor: performance
> > Test machine: Raspberry Pi 4, 8GB DDR
> > SCHED_FIFO with priority 99 for running the test program
> >
> > Vanilla kernels are not tainted. However on k-6.0 and k-6.7, I have to patch the drivers/clk/bcm/clk-raspberrypi.c file with the version in Raspberry Pi kernel tree for the CPU frequency governor to work.
> >
>
> The next step is to find the commit that introduces your regression with
> `git bisect`. If you haven't done so, see
> Documentation/admin-guide/bug-bisect.rst for instructions.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot ^introduced: v6.0..v6.1
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara


2023-11-23 15:38:54

by Matthew Wilcox

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Thu, Nov 23, 2023 at 12:07:40PM +0700, Bagas Sanjaya wrote:
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot ^introduced: v6.0..v6.1

this is not a regression. close it, you idiot.


2023-11-24 00:19:54

by Bagas Sanjaya

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Thu, Nov 23, 2023 at 03:38:18PM +0000, Matthew Wilcox wrote:
> On Thu, Nov 23, 2023 at 12:07:40PM +0700, Bagas Sanjaya wrote:
> > Anyway, I'm adding this regression to regzbot:
> >
> > #regzbot ^introduced: v6.0..v6.1
>
> this is not a regression. close it, you idiot.
>
>

why?

Confused...

--
An old man doll... just what I always wanted! - Clara


2023-11-24 01:04:45

by Matthew Wilcox

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Fri, Nov 24, 2023 at 07:15:34AM +0700, Bagas Sanjaya wrote:
> On Thu, Nov 23, 2023 at 03:38:18PM +0000, Matthew Wilcox wrote:
> > On Thu, Nov 23, 2023 at 12:07:40PM +0700, Bagas Sanjaya wrote:
> > > Anyway, I'm adding this regression to regzbot:
> > >
> > > #regzbot ^introduced: v6.0..v6.1
> >
> > this is not a regression. close it, you idiot.
> >
> >
>
> why?
>
> Confused...

yes. you're perpetually confused. stop trying to help, you only make
things worse. learn about the things you work on, or give up.

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 23.11.23 15:34, Liam R. Howlett wrote:
> * Bagas Sanjaya <[email protected]> [231123 00:07]:
>> On Wed, Nov 22, 2023 at 08:03:19PM +0000, Chun Ng wrote:
>>>
>>> Recently I observed there is performance regression on system call mmap(..). I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box and the results are pretty consistent among them.
>>>
>>> Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.
>
> This is almost certainly the maple tree. The tree is slower on writes
> than the rbtree and so if the benchmark mmaps/munmaps in a tight loop
> you will see this slow down. [...]
>
>> Anyway, I'm adding this regression to regzbot:
>> #regzbot ^introduced: v6.0..v6.1

Liam, many thanks for your reply. I know that you are still working on
optimizing things in this area, so I don't think it is worth
tracking this as a regression: that doesn't buy us much afaics. And it
might not be a regression at all anyway (not totally sure; I didn't look
into the details for the reason above, and it sounded a bit like the
problem can only be seen in a microbenchmark; whatever).

#regzbot resolve: not worth tracking

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

2023-11-24 11:52:34

by Greg KH

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Fri, Nov 24, 2023 at 01:04:28AM +0000, Matthew Wilcox wrote:
> On Fri, Nov 24, 2023 at 07:15:34AM +0700, Bagas Sanjaya wrote:
> > On Thu, Nov 23, 2023 at 03:38:18PM +0000, Matthew Wilcox wrote:
> > > On Thu, Nov 23, 2023 at 12:07:40PM +0700, Bagas Sanjaya wrote:
> > > > Anyway, I'm adding this regression to regzbot:
> > > >
> > > > #regzbot ^introduced: v6.0..v6.1
> > >
> > > this is not a regression. close it, you idiot.
> > >
> > >
> >
> > why?
> >
> > Confused...
>
> yes. you're perpetually confused. stop trying to help, you only make
> things worse. learn about the things you work on, or give up.

Um, is this really called for? Bagas is trying to help track
regressions, and if something isn't a regression like you say here, a
simple "This is not a regression, it's already resolved in newer
kernels" is fine.

Resorting to insults on the reporter is not ok.

greg k-h

2023-11-24 15:07:08

by Matthew Wilcox

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Fri, Nov 24, 2023 at 11:52:06AM +0000, Greg KH wrote:
> On Fri, Nov 24, 2023 at 01:04:28AM +0000, Matthew Wilcox wrote:
> > On Fri, Nov 24, 2023 at 07:15:34AM +0700, Bagas Sanjaya wrote:
> > > On Thu, Nov 23, 2023 at 03:38:18PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Nov 23, 2023 at 12:07:40PM +0700, Bagas Sanjaya wrote:
> > > > > Anyway, I'm adding this regression to regzbot:
> > > > >
> > > > > #regzbot ^introduced: v6.0..v6.1
> > > >
> > > > this is not a regression. close it, you idiot.
> > > >
> > > >
> > >
> > > why?
> > >
> > > Confused...
> >
> > yes. you're perpetually confused. stop trying to help, you only make
> > things worse. learn about the things you work on, or give up.
>
> Um, is this really called for? Bagas is trying to help track
> regressions, and if something isn't a regression like you say here, a
> simple "This is not a regression, it's already resolved in newer
> kernels" is fine.

Bagas has a long history of inappropriately attempting to "help" and, due
to a lack of understanding, wasting people's time. He's not too dissimilar
to the various wrongbots we've had over the years, including Richard B
Johnson, Markus Elfring, etc. I've tried to help guide him towards being
a more productive contributor, but have failed. Mostly I ignore him now,
but when he's instructing a bot to harass me, that crosses a line.

2023-11-24 15:14:52

by Greg KH

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Fri, Nov 24, 2023 at 03:06:44PM +0000, Matthew Wilcox wrote:
> On Fri, Nov 24, 2023 at 11:52:06AM +0000, Greg KH wrote:
> > On Fri, Nov 24, 2023 at 01:04:28AM +0000, Matthew Wilcox wrote:
> > > On Fri, Nov 24, 2023 at 07:15:34AM +0700, Bagas Sanjaya wrote:
> > > > On Thu, Nov 23, 2023 at 03:38:18PM +0000, Matthew Wilcox wrote:
> > > > > On Thu, Nov 23, 2023 at 12:07:40PM +0700, Bagas Sanjaya wrote:
> > > > > > Anyway, I'm adding this regression to regzbot:
> > > > > >
> > > > > > #regzbot ^introduced: v6.0..v6.1
> > > > >
> > > > > this is not a regression. close it, you idiot.
> > > > >
> > > > >
> > > >
> > > > why?
> > > >
> > > > Confused...
> > >
> > > yes. you're perpetually confused. stop trying to help, you only make
> > > things worse. learn about the things you work on, or give up.
> >
> > Um, is this really called for? Bagas is trying to help track
> > regressions, and if something isn't a regression like you say here, a
> > simple "This is not a regression, it's already resolved in newer
> > kernels" is fine.
>
> Bagas has a long history of inappropriately attempting to "help" and, due
> to a lack of understanding, wasting people's time. He's not too dissimilar
> to the various wrongbots we've had over the years, including Richard B
> Johnson, Markus Elfring, etc. I've tried to help guide him towards being
> a more productive contributor, but have failed. Mostly I ignore him now,
> but when he's instructing a bot to harass me, that crosses a line.

Nope, still not justification to lash out at an individual, sorry.

Please be more careful, and kind, in the future.

greg k-h

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On 24.11.23 16:06, Matthew Wilcox wrote:
> Mostly I ignore him now, but when he's instructing a bot
> to harass me, that crosses a line.

I'm curious: how is regzbot able to harass you? As of now, and likely
for at least another year, it is not able to send mails on its own; that
is by design, as I wanted to ensure it doesn't harass anyone.

Sure, I might manually send a mail if something looks stalled in
regzbot. But before I do that I always do a sanity check to avoid
annoying people. Do I sometimes make mistakes or miss something in that
process? Sure. But that happens to all of us.

Ciao, Thorsten

2023-11-26 07:20:29

by David Wang

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1


> What this gains us is the ability to remove contention on the mmap lock
> during page faults. If you were to test contention around that lock,
> you will see a slowdown until you reach v6.4, where per-vma locking
> started to show up. More benchmarking will show different types of
> fault handling outside of the mmap lock until (I believe) 6.6, where
> most (or all?) types are supported.

I added memory accesses between mmap and munmap to the simple stress test and timed it.
Below are the total system times for 10,000,000 rounds; the results do not look very good for 6.7.0-rc2.
(The delta column is simply "page fault" - "no page fault", which I guess roughly approximates
the extra kernel time needed to handle the page faults.)

+-----------+------------+---------------+-------------------------------------+
| Kernel    | page fault | no page fault | delta (kernel time for page fault?) |
+-----------+------------+---------------+-------------------------------------+
| 6.0.0     | 64 s       | 13 s          | 51 s                                |
| 6.1.0     | 104 s      | 49 s          | 55 s                                |
| 6.7.0-rc2 | ~210 s     | 67 s          | 143 s                               |
+-----------+------------+---------------+-------------------------------------+

Maybe there is something here that needs to be tracked.
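As a rough worked reading of the table (derived here, not stated in the thread), assuming 4 KiB pages and that rand() % 32 is close to uniform: each mapping in the test below is touched exactly once, just before it is unmapped, so a round touches ceil(n / 4096) fresh pages, 4.5 on average. The delta also includes zapping and freeing those pages in munmap(), so the per-page figure is not a pure page-fault latency.

```latex
\begin{aligned}
\text{extra kernel time per round} &= \frac{\Delta}{10^{7}}:\quad
  \frac{51\,\mathrm{s}}{10^{7}} = 5.1\,\mu\mathrm{s}\ (\text{6.0.0}),\quad
  \frac{55\,\mathrm{s}}{10^{7}} = 5.5\,\mu\mathrm{s}\ (\text{6.1.0}),\quad
  \frac{143\,\mathrm{s}}{10^{7}} = 14.3\,\mu\mathrm{s}\ (\text{6.7.0-rc2}) \\
\text{mean pages touched per round} &= \frac{1}{32}\sum_{k=1}^{32}\left\lceil\frac{1024\,k}{4096}\right\rceil
  = \frac{4\,(1+2+\dots+8)}{32} = 4.5 \\
\text{extra kernel time per touched page} &\approx
  \frac{5.1}{4.5} \approx 1.1\,\mu\mathrm{s},\quad
  \frac{5.5}{4.5} \approx 1.2\,\mu\mathrm{s},\quad
  \frac{14.3}{4.5} \approx 3.2\,\mu\mathrm{s}
\end{aligned}
```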

My test code now is:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAXN 1024
struct { void *addr; size_t n; } maps[MAXN];

/* Write one byte every 128 bytes, so every page of the mapping is faulted in. */
void accessit(char *addr, size_t n) {
    for (size_t i = 0; i < n; i += 128)
        addr[i] = (char)i;
}

int main(void) {
    int i, n, k, r;
    void *p;

    /* Create MAXN anonymous private mappings of 1 KiB..32 KiB each. */
    for (i = 0; i < MAXN; i++) {
        n = 1024 * ((rand() % 32) + 1);
        p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("fail to mmap");
            return 1;
        }
        maps[i].addr = p;
        maps[i].n = n;
    }
    for (i = 0; i < 10000000; i++) {
        k = rand() % MAXN;
#ifdef PAGE_FAULT
        /* Built with -DPAGE_FAULT: fault in the mapping before unmapping it. */
        accessit((char *)maps[k].addr, maps[k].n);
#endif
        r = munmap(maps[k].addr, maps[k].n);
        if (r) {
            perror("fail to munmap");
            return 1;
        }
        n = 1024 * ((rand() % 32) + 1);
        p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("fail to mmap");
            return 1;
        }
        maps[k].addr = p;
        maps[k].n = n;
    }
    for (i = 0; i < MAXN; i++)
        munmap(maps[i].addr, maps[i].n);
    return 0;
}

Thanks
David Wang

2023-11-26 13:58:20

by Matthew Wilcox

Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1

On Sun, Nov 26, 2023 at 03:18:54PM +0800, David Wang wrote:
> I add memory access between mmap and munmap to the simple stress, and timeit.

It's still not a very good benchmark ...

> My test code now is:
>
> #define MAXN 1024
> struct { void* addr; size_t n; } maps[MAXN];
> void accessit(char *addr, size_t n) {
> for (int i=0; i<n; i+=128) addr[i]=i;
> }
> int main() {
> int i, n, k, r;
> void *p;
> for (i=0; i<MAXN; i++) {
> n = 1024*((rand()%32)+1);
> p = mmap(NULL, n, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

So 'n' is now a number between 1kB and 32kB. That's not terribly
realistic; I'd say you want to be more like

n = 4096 * ((rand() % 512) + 1);

> for (i=0; i<10000000; i++) {
> k = rand()%MAXN;
> #ifdef PAGE_FAULT
> accessit((char*)maps[k].addr, maps[k].n);
> #endif
> r = munmap(maps[k].addr, maps[k].n);
> if (r) {
> perror("fail to munmap");
> return -1;
> }
> n = 1024*((rand()%32)+1);
> p = mmap(NULL, n, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Are you simulating something a real application actually does?
Because this all seems very weird and micro-benchmark to me. The real
applications we've benchmarked see a speedup so I'm not thrilled about
chasing down something that no real application does.

In terms of what's going on in the kernel, for each loop, you're calling
munmap(), taking between 1 and 8 page faults, then calling mmap().
That may just be too few page faults to see the benefit of the maple tree.
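A sketch of the loop with the size distribution suggested above (4 KiB to 2 MiB), writing one byte per 4 KiB of each mapping just before it is unmapped. churn_touch and SLOTS_MAX are hypothetical names; it assumes 4 KiB pages, and because mappings of up to 2 MiB may be backed by transparent huge pages depending on alignment and THP settings, the returned value counts 4 KiB writes rather than measured page faults.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define SLOTS_MAX 1024

/* Hypothetical variant of the benchmark: `slots` live mappings, each of
 * n = 4096 * ((rand() % 512) + 1) bytes. Each round writes every 4 KiB of
 * a random slot, unmaps it, and maps a new random size in its place.
 * Returns the number of 4 KiB writes, or -1 on bad arguments or failure. */
long churn_touch(int slots, long iters)
{
    static struct { char *addr; size_t n; } maps[SLOTS_MAX];
    long touches = 0;

    if (slots < 1 || slots > SLOTS_MAX)
        return -1;
    for (int i = 0; i < slots; i++) {
        maps[i].n = 4096 * (size_t)((rand() % 512) + 1);
        maps[i].addr = mmap(NULL, maps[i].n, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (maps[i].addr == MAP_FAILED)
            return -1;
    }
    for (long it = 0; it < iters; it++) {
        int k = rand() % slots;

        /* First write to each 4 KiB page of a fresh mapping normally faults. */
        for (size_t off = 0; off < maps[k].n; off += 4096) {
            maps[k].addr[off] = 1;
            touches++;
        }
        if (munmap(maps[k].addr, maps[k].n) != 0)
            return -1;
        maps[k].n = 4096 * (size_t)((rand() % 512) + 1);
        maps[k].addr = mmap(NULL, maps[k].n, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (maps[k].addr == MAP_FAILED)
            return -1;
    }
    for (int i = 0; i < slots; i++)
        munmap(maps[i].addr, maps[i].n);
    return touches;
}
```

Timing churn_touch(1024, iters) under the different kernels would exercise many more page faults per mmap()/munmap() pair than the 1 KiB to 32 KiB version; an iteration count well below 10,000,000 is advisable, since each round now writes up to 512 pages.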