2004-04-13 07:40:03

by Martin J. Bligh

[permalink] [raw]
Subject: Benchmarking objrmap under memory pressure

UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
then did "make clean; make vmlinux; make clean". Then I timed a
"make -j 256 vmlinux" to get some testing under mem pressure.

I was trying to test the overhead of objrmap under memory pressure,
but it seems it's actually distinctly negative overhead - rather pleasing
really ;-)

2.6.5
225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps

2.6.5-anon_mm
224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps

2.6.5-aa5
224.35user 27.47system 5:35.09elapsed 75%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (28675major+2589179minor)pagefaults 0swaps


2004-04-13 07:51:33

by Andrew Morton

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

"Martin J. Bligh" <[email protected]> wrote:
>
> UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
> then did "make clean; make vmlinux; make clean". Then I timed a
> "make -j 256 vmlinux" to get some testing under mem pressure.
>
> I was trying to test the overhead of objrmap under memory pressure,
> but it seems it's actually distinctly negative overhead - rather pleasing
> really ;-)
>
> 2.6.5
> 225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps
>
> 2.6.5-anon_mm
> 224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps

A four second reduction in system time caused a one minute reduction in
runtime? Pull the other one ;)

Average of five runs, please...

2004-04-13 07:55:57

by Martin J. Bligh

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

--Andrew Morton <[email protected]> wrote (on Tuesday, April 13, 2004 00:51:11 -0700):

> "Martin J. Bligh" <[email protected]> wrote:
>>
>> UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
>> then did "make clean; make vmlinux; make clean". Then I timed a
>> "make -j 256 vmlinux" to get some testing under mem pressure.
>>
>> I was trying to test the overhead of objrmap under memory pressure,
>> but it seems it's actually distinctly negative overhead - rather pleasing
>> really ;-)
>>
>> 2.6.5
>> 225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps
>>
>> 2.6.5-anon_mm
>> 224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps
>
> A four second reduction in system time caused a one minute reduction in
> runtime? Pull the other one ;)

Look at the cpu percentage though. I presume it's blocked itself on disk IO.
Possibly because the space overhead of pte_chains causes more mem pressure.
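Doing the math on the runs above: 2.6.5 used 225.18 + 30.05 = ~255s of
CPU over 393.7s elapsed, so ~139s spent blocked; anon_mm used ~250.5s
of CPU over 329.1s elapsed, so ~79s blocked. The missing minute is
wait time, not system time.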

> Average of five runs, please...

Maybe in the morning ;-). I mean ... a sensible time later this morning ;-)

M.

2004-04-13 21:59:42

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Tue, Apr 13, 2004 at 12:51:11AM -0700, Andrew Morton wrote:
> "Martin J. Bligh" <[email protected]> wrote:
> >
> > UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
> > then did "make clean; make vmlinux; make clean". Then I timed a
> > "make -j 256 vmlinux" to get some testing under mem pressure.
> >
> > I was trying to test the overhead of objrmap under memory pressure,
> > but it seems it's actually distinctly negative overhead - rather pleasing
> > really ;-)
> >
> > 2.6.5
> > 225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps
> >
> > 2.6.5-anon_mm
> > 224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps
>
> A four second reduction in system time caused a one minute reduction in
> runtime? Pull the other one ;)
>
> Average of five runs, please...

At the very least, the 6-second difference on a ~6-minute load between
anon-vma and anonmm sounds smaller than the measurement error generated
by disk seeks in a swapping workload, so yes, I'd like to see all 5
runs (not just the average).

As for the difference between 2.6.5 and 2.6.5-anonvma, that might be the
memory saved by the removal of rmap, which in turn reduces the swap I/O
required to complete the load, or something like that; so that one may
not be a measurement error but just the benefit of anon-vma or anonmm.
Still, for a 6-second difference during a swap load I'd definitely like
to see the 5 runs.
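(The fault counts quoted above point the same way: 37590 major faults
for 2.6.5 against 29127 and 28675 for the two objrmap kernels, i.e.
roughly 8-9 thousand fewer majors; at a few milliseconds of disk seek
each, that alone is worth tens of seconds.)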

Thanks!

2004-04-14 00:26:42

by Martin J. Bligh

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

>> UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
>> then did "make clean; make vmlinux; make clean". Then I timed a
>> "make -j 256 vmlinux" to get some testing under mem pressure.
>>
>> I was trying to test the overhead of objrmap under memory pressure,
>> but it seems it's actually distinctly negative overhead - rather pleasing
>> really ;-)
>>
>> 2.6.5
>> 225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps
>>
>> 2.6.5-anon_mm
>> 224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps
>
> A four second reduction in system time caused a one minute reduction in
> runtime? Pull the other one ;)
>
> Average of five runs, please...

You're right - it's rather variable. Still doesn't look bad though.

2.6.5
Average elapsed = 6:11
224.92user 30.15system 5:44.19elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
225.04user 30.23system 6:02.49elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
225.28user 29.60system 5:48.22elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
225.81user 31.75system 6:42.38elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
225.23user 30.20system 6:40.48elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k

2.6.5-anon_mm
Average elapsed = 5:43
224.34user 25.43system 4:51.23elapsed 85%CPU (0avgtext+0avgdata 0maxresident)k
224.23user 25.93system 5:00.79elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
224.39user 26.36system 5:37.71elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
225.65user 27.13system 6:28.00elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k
225.14user 27.26system 6:39.61elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k

I've kicked off the -aa tree tests - will post them later tonight.

2004-04-14 16:28:00

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Tue, Apr 13, 2004 at 05:38:02PM -0700, Martin J. Bligh wrote:
> >> UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
> >> then did "make clean; make vmlinux; make clean". Then I timed a
> >> "make -j 256 vmlinux" to get some testing under mem pressure.
> >>
> >> I was trying to test the overhead of objrmap under memory pressure,
> >> but it seems it's actually distinctly negative overhead - rather pleasing
> >> really ;-)
> >>
> >> 2.6.5
> >> 225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
> >> 0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps
> >>
> >> 2.6.5-anon_mm
> >> 224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
> >> 0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps
> >
> > A four second reduction in system time caused a one minute reduction in
> > runtime? Pull the other one ;)
> >
> > Average of five runs, please...
>
> You're right - it's rather variable. Still doesn't look bad though.
>
> 2.6.5
> Average elapsed = 6:11
> 224.92user 30.15system 5:44.19elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
> 225.04user 30.23system 6:02.49elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
> 225.28user 29.60system 5:48.22elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
> 225.81user 31.75system 6:42.38elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
> 225.23user 30.20system 6:40.48elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
>
> 2.6.5-anon_mm
> Average elapsed = 5:43
> 224.34user 25.43system 4:51.23elapsed 85%CPU (0avgtext+0avgdata 0maxresident)k
> 224.23user 25.93system 5:00.79elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
> 224.39user 26.36system 5:37.71elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
> 225.65user 27.13system 6:28.00elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k
> 225.14user 27.26system 6:39.61elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
>
> I've kicked off the -aa tree tests - will post them later tonight.

As expected the 6-second difference was nothing compared to the noise,
though I'd be curious to see an average number.

The degradation of runtimes is interesting; runtimes should go down, not
up, after more unused stuff is pushed into swap and so more ram is free
at every new start of the workload.

BTW, I've no idea why you used an UP machine for this (plus, if you can
load kde on it, it'd be better, because kde is extremely smart at
optimizing ram usage with cow anonymous memory, the thing anon-vma can
optimize and anonmm cannot; kde may even use mremap on this anonymous
ram, and the very single reason it was impossible for me to take anonmm
in production is that there's no way I can predict which critical app is
using mremap on anonymous COW memory to save ram). You definitely should
use your 32-way booted with mem=512m to run this test, or there's no way
you'll ever notice the additional boost in scalability that anon-vma
provides compared to anonmm, and that anonmm will never be able to
reach.

2004-04-14 16:42:38

by Martin J. Bligh

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

> As expected the 6-second difference was nothing compared to the noise,
> though I'd be curious to see an average number.

Yeah, I don't think either is worse or better - I really want a more stable
test though, if I can find one.

> The degradation of runtimes is interesting; runtimes should go down, not
> up, after more unused stuff is pushed into swap and so more ram is free
> at every new start of the workload.

Yeah, that's odd.

> BTW, I've no idea why you used an UP machine for this (plus, if you can

Because it's frigging hard to make a 16GB machine swap ;-) 'twas just my
desktop.

> critical app is using mremap on anonymous COW memory to save ram). You
> definitely should use your 32-way booted with mem=512m to run this test,
> or there's no way you'll ever notice the additional boost in scalability
> that anon-vma provides compared to anonmm, and that anonmm will never be
> able to reach.

Yeah, it's hard to do mem= on NUMA, but I have a patch from someone
somewhere. Those machines don't tend to swap heavily anyway, but I suppose
page reclaim in general will happen.

M.

2004-04-14 17:11:29

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Wed, Apr 14, 2004 at 09:42:24AM -0700, Martin J. Bligh wrote:
> > As expected the 6-second difference was nothing compared to the noise,
> > though I'd be curious to see an average number.
>
> Yeah, I don't think either is worse or better - I really want a more stable
> test though, if I can find one.

A test involving fewer tasks, and one that cannot take any advantage of
the cache and the page-age information, should be more stable, though I
don't have obvious suggestions.

> Yeah, that's odd.

I just wonder whether the VM needs a bit of fixing besides the rmap
removal, or if it was just pure coincidence. If it happens again in the
-aa pass too, then it may not be a coincidence.

> Because it's frigging hard to make a 16GB machine swap ;-) 'twas just my
> desktop.

mem= should fix the problem for the benchmarking ;)

swapping in general is important for 16GB 32-way too (and that's the
thing 2.4 mainline cannot do efficiently, and that's why I had to
add objrmap in 2.4-aa too).

> Yeah, it's hard to do mem= on NUMA, but I have a patch from someone
> somewhere. Those machines don't tend to swap heavily anyway, but I suppose
> page reclaim in general will happen.

I see what you mean about mem= being troublesome, I forgot you're numa=y.
You can either disable numa temporarily, or use the more complex syntax
that you should find in arch/i386/kernel/setup.c; that should work w/o
kernel changes and w/o patches, since it simply trims the e820 map and
everything else numa is built on top of that map. You've just to give a
hundred megs from the start of every node, and hopefully it'll boot ;).
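I mean something like "mem=exactmap mem=100M@0 mem=100M@0x40000000",
with one mem=size@start entry per node; the start addresses here are
made-up placeholders, you've to use your real node boundaries.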

2004-04-14 17:48:49

by Hugh Dickins

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Wed, 14 Apr 2004, Andrea Arcangeli wrote:
>
> BTW, I've no idea why you used an UP machine for this (plus, if you can
> load kde on it, it'd be better, because kde is extremely smart at
> optimizing ram usage with cow anonymous memory, the thing anon-vma can
> optimize and anonmm cannot; kde may even use mremap on this anonymous
> ram, and the very single reason it was impossible for me to take anonmm
> in production is that there's no way I can predict which critical app is
> using mremap on anonymous COW memory to save ram). You definitely should
> use your 32-way booted with mem=512m to run this test, or there's no way
> you'll ever notice the additional boost in scalability that anon-vma
> provides compared to anonmm, and that anonmm will never be able to
> reach.

This is just your guess at present, isn't it, Andrea? Any evidence?

Our current intention is to merge anonmm into mainline in a day or two.
The current consensus (in your absence!) seemed to be that anonmm is
likely to be good enough, no obvious need to go beyond it.

We'll happily replace it with anon_vma once we see the practical
problems which anon_vma solves and anonmm cannot, so long as the
greater cost of anon_vma (in complexity, memory usage, and vma
merge limitations) is worth it. Can happen just days later,
but would need evidence.

Hugh

2004-04-14 18:10:57

by Bill Davidsen

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

Andrew Morton wrote:
> "Martin J. Bligh" <[email protected]> wrote:
>
>>UP Athlon 2100+ with 512Mb of RAM. Rebooted clean before each test
>>then did "make clean; make vmlinux; make clean". Then I timed a
>>"make -j 256 vmlinux" to get some testing under mem pressure.
>>
>>I was trying to test the overhead of objrmap under memory pressure,
>>but it seems it's actually distinctly negative overhead - rather pleasing
>>really ;-)
>>
>>2.6.5
>>225.18user 30.05system 6:33.72elapsed 64%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (37590major+2604444minor)pagefaults 0swaps
>>
>>2.6.5-anon_mm
>>224.53user 26.00system 5:29.08elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (29127major+2577211minor)pagefaults 0swaps
>
>
> A four second reduction in system time caused a one minute reduction in
> runtime? Pull the other one ;)

I was looking at the pagefault counts, myself. I'd like to see disk io
counts for each run; that sometimes brings enlightenment. Maybe do 20-sec
counts with diorate or some such
(http://pages.prodigy.net/davidsen/ if you don't have your own favorite
tool).
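A plain "vmstat 20" and its bi/bo columns would do in a pinch, too.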
>
> Average of five runs, please...

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2004-04-14 23:43:52

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Wed, Apr 14, 2004 at 06:48:40PM +0100, Hugh Dickins wrote:
> This is just your guess at present, isn't it, Andrea? Any evidence?

The evidence is pretty obvious: the single fact that it's painful to
remove the page_table_lock with anonmm around the vma manipulations, and
the little benefit that the vma->page_table_lock provides with anonmm,
is quite a tangible measurement. I'm talking about the 256-ways here;
any UP measurement is pretty useless.

Last but not least, you cannot know if any important app is going to be
hurt by mremap doing copies and invalidating important optimizations for
any application doing the kind of things kde does to save memory and
speed up startup times (we don't even know yet if kde itself is going to
be hurt). You can take these risks with mainline; I cannot risk it with
-aa. And anon-vma provides other minor benefits too that we already
discussed, plus the IMHO important scalability point above.

So I don't see why mainline should go with an inferior solution when
I've already sorted out a better one.

2004-04-15 10:21:39

by Hugh Dickins

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Thu, 15 Apr 2004, Andrea Arcangeli wrote:
> On Wed, Apr 14, 2004 at 06:48:40PM +0100, Hugh Dickins wrote:
> > This is just your guess at present, isn't it, Andrea? Any evidence?
>
> The evidence is pretty obvious: the single fact that it's painful to
> remove the page_table_lock with anonmm around the vma manipulations, and
> the little benefit that the vma->page_table_lock provides with anonmm,
> is quite a tangible measurement. I'm talking about the 256-ways here;
> any UP measurement is pretty useless.

Quite possibly. Quite possibly not (anonmm can perfectly well use a
different lock than the page_table_lock to make find_vma safe against
try_to_unmap, if the page_table_lock is too contended, or split the
lock per-vma).

A good hypothesis for you to base your design on. But the evidence
is yet to come. My own bet is that it will make very little difference
to 256-way performance, whether anonmm or anon_vma: that their issues
will be with the file-based.
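To spell out what's being compared, a rough sketch of the two object
graphs (approximate shapes from memory, not the actual patch
declarations):

/*
 * anonmm: one object per mm, linked to its relatives across fork.
 * An anonymous page's page->mapping points at an anonmm; to unmap,
 * rmap visits each related mm and must find_vma(mm, address) to
 * locate the mapping -- hence the locking question above.
 */
struct anonmm {
	spinlock_t		lock;
	struct mm_struct	*mm;
	struct list_head	list;	/* anonmms related by fork */
};

/*
 * anon_vma: one object per set of related vmas.  An anonymous
 * page's page->mapping points at an anon_vma; to unmap, rmap walks
 * the vma list directly, with no find_vma() step.
 */
struct anon_vma {
	spinlock_t		lock;
	struct list_head	head;	/* vmas that may map its pages */
};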

> Last but not least, you cannot know if any important app is going to be
> hurt by mremap doing copies and invalidating important optimizations for
> any application doing the kind of things kde does to save memory and
> speed up startup times (we don't even know yet if kde itself is going to
> be hurt). You can take these risks with mainline; I cannot risk it with
> -aa. And anon-vma provides other minor benefits too that we already
> discussed, plus the IMHO important scalability point above.

You're on shakier ground there.

The worst that will happen with anonmm's mremap move, is that some
app might go slower and need more swap. Unlikely, but agreed possible.

In your case, some app may actually break (I was going to say
mysteriously, but that's unfair, ptrace should quickly identify it):
because of your limitations on anon vma merging, and the way mremap
is only allowed on a single vma. Again, unlikely, but possible.
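For concreteness, the sort of hypothetical program that could hit it
(not a known app, just an illustration):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t sz = 1 << 20;

	/* One 2MB anonymous mapping... */
	char *a = mmap(NULL, 2 * sz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (a == MAP_FAILED)
		return 1;

	/* ...punch a hole and refill it: whether the refill merges
	 * back into a single vma is up to the kernel's merging rules. */
	munmap(a + sz, sz);
	if (mmap(a + sz, sz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		return 1;

	/* This spans one vma if the pieces merged, two if not --
	 * and mremap fails with EFAULT in the two-vma case. */
	if (mremap(a, 2 * sz, 4 * sz, MREMAP_MAYMOVE) == MAP_FAILED)
		perror("mremap");
	return 0;
}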

I'm sure the right answer to that is to fix mremap to work across vmas:
it's unique and wrong to be letting the kernel implementation detail of
vma percolate up to userspace semantics. But, on the other hand,
I'm glad of any excuse not to have to go in there and fix it!

(Of course, your file vma merging will make some mremaps
possible which were ruled out before: that's nice.)

> So I don't see why mainline should go with an inferior solution when
> I've already sorted out a better one.

That's your opinion, fine, but we've not yet seen the evidence.

(I was, of course, quite mistaken to say "mainline in a day or two":
2.6.6-rc1 does now have some of the preparatory work, e.g. PageAnon
and swp_entry_t in private, applicable to either of our solutions.
But the real changes are going into -mm, and whether and when and
which proceed from there to mainline is up to Linus and Andrew.)

Hugh

2004-04-15 13:22:57

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Thu, Apr 15, 2004 at 11:21:30AM +0100, Hugh Dickins wrote:
> On Thu, 15 Apr 2004, Andrea Arcangeli wrote:
> > On Wed, Apr 14, 2004 at 06:48:40PM +0100, Hugh Dickins wrote:
> > > This is just your guess at present, isn't it, Andrea? Any evidence?
> >
> > The evidence is pretty obvious: the single fact that it's painful to
> > remove the page_table_lock with anonmm around the vma manipulations, and
> > the little benefit that the vma->page_table_lock provides with anonmm,
> > is quite a tangible measurement. I'm talking about the 256-ways here;
> > any UP measurement is pretty useless.
>
> Quite possibly. Quite possibly not (anonmm can perfectly well use a
> different lock than the page_table_lock to make find_vma safe against
> try_to_unmap, if the page_table_lock is too contended, or split the
> lock per-vma).

"use another global mm wide lock" == less scalable

> A good hypothesis for you to base your design on. But the evidence
> is yet to come. My own bet is that it will make very little difference
> to 256-way performance, whether anonmm or anon_vma: that their issues
> will be with the file-based.

it depends on the workload.

> The worst that will happen with anonmm's mremap move, is that some
> app might go slower and need more swap. Unlikely, but agreed possible.

This is exactly the point, and I cannot take that risk in -aa, period.
Taking a dozen more mbytes of ram with some dozen kde tasks on a desktop
machine would be a huge penalty; the kde people worked hard to make that
library design possible and save ram with anonymous COW memory whenever
possible, and I cannot risk screwing up their effort. Plus I've
absolutely no idea if any big proprietary app may be hurt too, and I
cannot take that risk either. The 12 bytes per vma are nothing compared
to a potential invalidation of cow through mremap, plus anon-vma
provides other advantages as well.

> In your case, some app may actually break (I was going to say
> mysteriously, but that's unfair, ptrace should quickly identify it):
> because of your limitations on anon vma merging, and the way mremap
> is only allowed on a single vma. Again, unlikely, but possible.

Any application relying on the vma merging to make mremap work is
definitely totally broken, period; no need to argue further about this
point. The vma merging is a "best effort" provided by the VM; it's
something userspace must not be aware of, since it can change over time,
and it's not available at all with mlock and with older 2.4 kernels. So
you're definitely wrong claiming I take any risk with mremap; in fact
I'm _fixing_ mremap by disabling the vma merging there, and that's the
primary reason I disabled it: the vma merging of mremap has never been
able to be retired correctly, so if an oom failure happened during a
pagetable allocation in 2.4, the extension generated by the vma merging
wouldn't be retired correctly. I didn't fix that in 2.4 because it's not
possible to reproduce an oom failure exactly in move_page_tables, and
secondly because it's not exploitable anyway even if the oom failure
triggers there, unless you can also join this kernel bug with a
userspace bug; so the probability is near zero, and even in the
extremely unlikely case it's not exploitable remotely. But with anon-vma
I made the probability zero by disabling the merging in 2.6 until it
gets implemented properly (maybe we should disable it in 2.4 too and go
100% safe, but I didn't bother).

As for mprotect, you're right that in some extremely unlikely case it
can merge less, but that's the feature: the swapping will be more
efficient, and the locking scalable too during paging. It's a trade-off
and I think it makes lots of sense. If an application wants merging it
should create adjacent mappings in the first place. Secondly, if an
application creates a hole inside an anonymous mapping, I still
guarantee that the gap-fill will be merged 100%. So I think this is an
unrealistic problem, something I don't need to worry about, unlike the
mremap cow-break that anonmm might introduce in some app.

> I'm sure the right answer to that is to fix mremap to work across vmas:

The limitation of mremap to a single vma is stupid indeed; it's just
that the implementation was lazy. That needs fixing eventually anyway,
but it's not a problem at all as far as anon-vma is concerned: in no way
can an application rely on the vma merging to do mremap or mlock, and
older 2.4 kernels would break too.

> it's unique and wrong to be letting the kernel implementation detail of
> vma percolate up to userspace semantics. But, on the other hand,
> I'm glad of any excuse not to have to go in there and fix it!

I definitely don't need to go in there and fix it either.

> (Of course, your file vma merging will make some mremaps
> possible which were ruled out before: that's nice.)

Yes, that's a major benefit: doing vma merging with file mappings is a
lot more important than for anonymous ram. Most people only use mprotect
to switch the write bit on the vma before/after using some MAP_SHARED
segment, so if a bug occurs while they're not using the mapping they
won't risk corrupting the data. That's a very common behaviour for big
apps, and it has never been possible to merge those vmas back until now
with anon-vma.
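I mean the kind of pattern below (a made-up example, not from any
particular app):

#include <string.h>
#include <sys/mman.h>

/*
 * Made-up example of the pattern: a MAP_SHARED segment kept
 * read-only except around each update, so a stray write while the
 * record is "sealed" faults instead of corrupting the file.  Each
 * mprotect() of a sub-range can split the vma; merging file-backed
 * vmas lets the pieces coalesce once the protections match again.
 */
void update_record(char *seg, size_t pagesz, size_t recno,
		   const void *buf, size_t n)	/* n <= pagesz */
{
	char *rec = seg + recno * pagesz;	/* seg page-aligned (from mmap) */

	mprotect(rec, pagesz, PROT_READ | PROT_WRITE);	/* open the window */
	memcpy(rec, buf, n);				/* the actual update */
	mprotect(rec, pagesz, PROT_READ);		/* seal it again */
}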

But this is pretty orthogonal to the anon-vma vs anonmm comparison: if
you're ok with dealing with the anon-vma complexity, you can merge this
bit on top of anonmm too; the complexity of doing anon-vma merging is
the same as that of doing the inode-vma merging.

2004-04-15 13:45:28

by Hugh Dickins

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Thu, 15 Apr 2004, Andrea Arcangeli wrote:
>
> yes, that's a major benefit, doing vma merging with file mappings is a
> lot more important than for anonymous ram, most people only uses
> mprotect to switch the write bit on the vma before/after using some
> MAP_SHARED segment, if a bug accours while they don't use the mapping
> they won't risk to corrupt the data. That's a very common behaviour for
> big apps and it has never been possible to merge until now in anon-vma.

I like file vma merging, but I am puzzled why we (you) bothered
to implement anon vma merging before and not file vma merging,
if the file vma merging is so much more important. I suppose
it's something you learnt later, or the apps evolved.

> But this is pretty orthogonal to the anon-vma vs anonmm comparison: if
> you're ok with dealing with the anon-vma complexity, you can merge this
> bit on top of anonmm too; the complexity of doing anon-vma merging is
> the same as that of doing the inode-vma merging.

Indeed. If anonmm does live on, I would want to add the file
vma merging; but when things (mpol, prio_tree, i_shared locking)
have settled down rather than now - we've lived without it for
some years, can live without it for a few weeks more.

Hugh

2004-04-15 14:08:50

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Thu, Apr 15, 2004 at 02:45:21PM +0100, Hugh Dickins wrote:
> I like file vma merging, but I am puzzled why we (you) bothered
> to implement anon vma merging before and not file vma merging,
> if the file vma merging is so much more important. I suppose
> it's something you learnt later, or the apps evolved.

Both things: I learnt it later, but it's also because anon-vma pretty
much "forced" me not to be lazy about the inodes ;). It's something I
planned already when I changed the mmap merging to handle inodes too and
submitted it to Andrew, who merged it into 2.5 mainline; the only reason
I didn't do it at that time is that it was originally developed for 2.4,
and the fewer changes the better, so at that time I only fixed the
showstopper inode-merging for mmap and not mprotect.

In 2.4 times I also planned eventually to add the vma merging to mlock,
but that hasn't happened yet ;).

> Indeed. If anonmm does live on, I would want to add the file
> vma merging; but when things (mpol, prio_tree, i_shared locking)
> have settled down rather than now - we've lived without it for
> some years, can live without it for a few weeks more.

Sure.

2004-04-15 16:22:54

by Bill Davidsen

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

Hugh Dickins wrote:

>>Last but not least, you cannot know if any important app is going to be
>>hurt by mremap doing copies and invalidating important optimizations for
>>any application doing the kind of things kde does to save memory and
>>speed up startup times (we don't even know yet if kde itself is going to
>>be hurt). You can take these risks with mainline; I cannot risk it with
>>-aa. And anon-vma provides other minor benefits too that we already
>>discussed, plus the IMHO important scalability point above.
>
>
> You're on shakier ground there.
>
> The worst that will happen with anonmm's mremap move, is that some
> app might go slower and need more swap. Unlikely, but agreed possible.

It appears that users on small memory machines running kde are not of
concern to you. Unfortunately that describes a fair number of people;
not everyone has a big-memory, fast system. I will try to get some
reproducible numbers, but "consistently feels faster" is a reason to
keep running -aa even if I can't quantify it.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2004-04-15 16:49:05

by Hugh Dickins

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Thu, 15 Apr 2004, Bill Davidsen wrote:
> Hugh Dickins wrote:
> >
> > The worst that will happen with anonmm's mremap move, is that some
> > app might go slower and need more swap. Unlikely, but agreed possible.
>
> It appears that users on small memory machines running kde are not of
> concern to you. Unfortunately that describes a fair number of people;
> not everyone has a big-memory, fast system. I will try to get some
> reproducible numbers, but "consistently feels faster" is a reason to
> keep running -aa even if I can't quantify it.

Appearances can be deceptive. Of course I care about users,
of small or large memory machines, running kde or not.

It appears that you do not understand that we're talking about a
case so rare that we've never seen it in practice, only by testing.

But perhaps we haven't looked out for it enough (no printk), I'd better
put something in to tell us when it does occur, thanks for the reminder.

If -aa consistently feels faster to you, great, go with it:
but I doubt it's because of this issue we're discussing!

Hugh

2004-04-22 19:58:38

by Bill Davidsen

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

Hugh Dickins wrote:
> On Thu, 15 Apr 2004, Bill Davidsen wrote:
>
>>Hugh Dickins wrote:
>>
>>>The worst that will happen with anonmm's mremap move, is that some
>>>app might go slower and need more swap. Unlikely, but agreed possible.
>>
>>It appears that users on small memory machines running kde are not of
>>concern to you. Unfortunately that describes a fair number of people;
>>not everyone has a big-memory, fast system. I will try to get some
>>reproducible numbers, but "consistently feels faster" is a reason to
>>keep running -aa even if I can't quantify it.
>
>
> Appearances can be deceptive. Of course I care about users,
> of small or large memory machines, running kde or not.
>
> It appears that you do not understand that we're talking about a
> case so rare that we've never seen it in practice, only by testing.
>
> But perhaps we haven't looked out for it enough (no printk), I'd better
> put something in to tell us when it does occur, thanks for the reminder.
>
> If -aa consistently feels faster to you, great, go with it:
> but I doubt it's because of this issue we're discussing!

I don't disagree on that, but it seems that KDE developers have put some
serious effort into making the software well-behaved, and unless there
is some measurable benefit from the code which negates the benefits of
that effort, it seems desirable to appreciate good code by letting it work.

I was more commenting on the good performance at the bottom end than
addressing the large machines. All the big machines I have are in the
overkill range, and finding small benefits doesn't do much with
production loads, so I can't contribute any useful info at that end.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2004-04-22 21:26:20

by Hugh Dickins

[permalink] [raw]
Subject: Re: Benchmarking objrmap under memory pressure

On Thu, 22 Apr 2004, Bill Davidsen wrote:
>
> I don't disagree on that, but it seems that KDE developers have put some
> serious effort into making the software well-behaved, and unless there
> is some measurable benefit from the code which negates the benefits of
> that effort, it seems desirable to appreciate good code by letting it work.

2.6.6-rc2-mm1 does now have a "cmd: mremap moved N cows" kernel warning
of this inefficiency. Please let us know if you see it in your log/dmesg,
when running KDE or whatever. One sighting of 49 cows in xterm so far.

Thanks,
Hugh