2002-08-09 21:36:53

by Andrew Morton

Subject: Re: large page patch (fwd) (fwd)

Linus Torvalds wrote:
>
> ...
> Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> it.

The most glaring problem has been the fork/exec/exit overhead.

Anton had a program which did 10,000 forks and we were looking at
the time it took for them all to exit. Initial rmap slowed the exiting
by 400%, and we now have that down to 70%.

I've been treating a gcc configure script as the most forky workload
which we're likely to care about. rmap slowed configure down by 7%
and the work Daniel and I have done has reduced that to 2.8%.

(Not that rmap is the biggest problem for configure:

c013c07c   176   1.93046  __page_add_rmap
c013c194   225   2.46792  __page_remove_rmap
c012a274   236   2.58857  free_one_pgd
c012a7f8   405   4.44225  __constant_c_and_count_memset
c01055fc   917  10.0581   poll_idle
c012a6cc  1253  13.7436   __constant_memcpy

It's that i387 struct copy.)

There don't seem to be any catastrophic failure modes here, and
I expect tests could be concocted against the virtual scan which
_do_ have gross performance problems.

So. Not great, but OK if the reverse map gives us something back.
And I don't agree that the quality of page replacement is all too
hard to measure. It's just that nobody has got off their butt
and tried to measure it.

The other worry is the ZONE_NORMAL space consumption of pte_chains.
We've halved that, but it will still make high sharing levels
unfeasible on the big ia32 machines. We are dependent upon large
pages to solve that problem. (Resurrection of pte_highmem is in
progress, but it doesn't work yet).

I don't see a sufficient case for reverting rmap at present, and
it's time to move on with other work. There is nothing in the
queue at present which _requires_ rmap, so if we do hit a
showstopper then going back to a virtual scan will be feasible
for at least the next month.

Two points:

1) It would be most useful to have *some* damn test on the table
which works better with 2.4-rmap, along with a believable
description of why it's better.

2) It would be most irritating to reach 2.6.5 before discovering
that there is some terrible resource consumption problem
arising from the reverse map. Now is a good time for people
with large machines to be testing 2.5, please. This is
happening, and I expect we'll be in better shape in a month
or so.


2002-08-10 18:29:11

by Eric W. Biederman

Subject: Re: large page patch (fwd) (fwd)

Andrew Morton <[email protected]> writes:
>
> The other worry is the ZONE_NORMAL space consumption of pte_chains.
> We've halved that, but it will still make high sharing levels
> unfeasible on the big ia32 machines. We are dependent upon large
> pages to solve that problem. (Resurrection of pte_highmem is in
> progress, but it doesn't work yet).

There is a second method to address this. Pages can be swapped out
of the page tables and still remain in the page cache, the virtual
scan does this all of the time. This should allow for arbitrary
amounts of sharing. There is some overhead in faulting the pages
back in, but it is much better than cases that do not work. A simple
implementation would have a maximum pte_chain length.

For any page that is not backed by anonymous memory we do not need to
keep the pte entries after the page has been swapped out of the page
table, which should show a reduction in page table size. In a highly
shared setting with anonymous pages it is likely worth it to promote
those pages to being posix shared memory.

All of the above should allow us to keep a limit on the amount of
resources that go towards sharing, reducing the need for something
like pte_highmem, and keeping memory pressure down in general.

For the cases you describe I have trouble seeing pte_highmem as
anything other than a performance optimization. Only placing shmem
direct and indirect entries in high memory or in swap can I see as
a limit to feasibility.

Eric

2002-08-10 18:58:06

by Daniel Phillips

Subject: Re: large page patch (fwd) (fwd)

On Saturday 10 August 2002 20:20, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
> > The other worry is the ZONE_NORMAL space consumption of pte_chains.
> > We've halved that, but it will still make high sharing levels
> > unfeasible on the big ia32 machines. We are dependent upon large
> > pages to solve that problem. (Resurrection of pte_highmem is in
> > progress, but it doesn't work yet).
>
> There is a second method to address this. Pages can be swapped out
> of the page tables and still remain in the page cache, the virtual
> scan does this all of the time. This should allow for arbitrary
> amounts of sharing. There is some overhead, in faulting the pages
> back in but it is much better than cases that do not work. A simple
> implementation would have a maximum pte_chain length.

Oh gosh, nice point. We could put together a lovely cooked benchmark where
copy_page_range just fails to copy all the mmap pages, which are most of them
in the bash test.

--
Daniel

2002-08-10 19:52:22

by Rik van Riel

Subject: Re: large page patch (fwd) (fwd)

On 10 Aug 2002, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
> >
> > The other worry is the ZONE_NORMAL space consumption of pte_chains.
> > We've halved that, but it will still make high sharing levels
> > unfeasible on the big ia32 machines.

> There is a second method to address this. Pages can be swapped out
> of the page tables and still remain in the page cache, the virtual
> scan does this all of the time. This should allow for arbitrary
> amounts of sharing. There is some overhead, in faulting the pages
> back in but it is much better than cases that do not work. A simple
> implementation would have a maximum pte_chain length.

Indeed. We need this same thing for page tables too, otherwise
a high sharing situation can easily "require" more page table
memory than the total amount of physical memory in the system ;)

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-10 20:04:09

by Eric W. Biederman

Subject: Re: large page patch (fwd) (fwd)

Rik van Riel <[email protected]> writes:

> On 10 Aug 2002, Eric W. Biederman wrote:
> > Andrew Morton <[email protected]> writes:
> > >
> > > The other worry is the ZONE_NORMAL space consumption of pte_chains.
> > > We've halved that, but it will still make high sharing levels
> > > unfeasible on the big ia32 machines.
>
> > There is a second method to address this. Pages can be swapped out
> > of the page tables and still remain in the page cache, the virtual
> > scan does this all of the time. This should allow for arbitrary
> > amounts of sharing. There is some overhead, in faulting the pages
> > back in but it is much better than cases that do not work. A simple
> > implementation would have a maximum pte_chain length.
>
> Indeed. We need this same thing for page tables too, otherwise
> a high sharing situation can easily "require" more page table
> memory than the total amount of physical memory in the system ;)

It's exactly the same situation. To remove a pte from the chain you must
remove it from the page table as well. Then we just need to free
pages with no interesting pte entries.

Eric