... Why does the kernel page fault on text pages, present in the page cache,
when a program starts? Couldn't the pte's for text present in the page cache
be resolved when they're mapped to memory?
On Sat, 15 Mar 2003, Paul Albrecht wrote:
> ... Why does the kernel page fault on text pages, present in the page
> cache, when a program starts? Couldn't the pte's for text present in the
> page cache be resolved when they're mapped to memory?
The mmap() syscall only sets up the VMA info, it doesn't
fill in the page tables. That only happens when the process
page faults.
Note that filling in a bunch of page table entries mapping
already present pagecache pages at exec() time might be a
good idea. It's just that nobody has gotten around to that
yet...
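
The behaviour described above can be observed from user space. The following is a minimal sketch, not kernel code: it counts minor faults with getrusage() before and after mmap(), and again after touching one byte per page. The names scratch_fd() and map_and_touch() are hypothetical helpers made up for this illustration. Note that newer kernels may map several neighbouring pagecache pages per fault, so the count seen while touching can be well below the number of pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: an unnamed temp file holding `length` bytes of 'x'. */
static int scratch_fd(size_t length)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;
    for (size_t i = 0; i < length; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    int fd = dup(fileno(f));    /* keep the file alive after fclose() */
    fclose(f);
    return fd;
}

/* Minor (no disk I/O) page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Map `length` bytes of fd read-only and read one byte from every page.
 * Reports the minor faults taken inside mmap() and during the touching
 * loop. Returns the sum of the bytes read, or -1 if mmap() failed.
 */
static long map_and_touch(int fd, size_t length,
                          long *faults_in_mmap, long *faults_in_touch)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    long before = minor_faults();
    unsigned char *map = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    long after_mmap = minor_faults();
    if (map == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t off = 0; off < length; off += (size_t)page_size)
        sum += map[off];        /* first access to each page may fault */
    long after_touch = minor_faults();

    munmap(map, length);
    *faults_in_mmap = after_mmap - before;
    *faults_in_touch = after_touch - after_mmap;
    return sum;
}
```

Calling map_and_touch() on a file that was just written (so its pages sit in the pagecache) and printing the two counters shows where the faults are taken: the file data is already in memory, yet the page table entries are only filled in when the pages are touched.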
On Sat, 15 Mar 2003, Paul Albrecht wrote:
>> ... Why does the kernel page fault on text pages, present in the page
>> cache, when a program starts? Couldn't the pte's for text present in the
>> page cache be resolved when they're mapped to memory?
On Mon, Mar 17, 2003 at 10:02:21AM -0500, Rik van Riel wrote:
> The mmap() syscall only sets up the VMA info, it doesn't
> fill in the page tables. That only happens when the process
> page faults.
> Note that filling in a bunch of page table entries mapping
> already present pagecache pages at exec() time might be a
> good idea. It's just that nobody has gotten around to that
> yet...
SVR4 did and saw an improvement wrt. page fault rate, according to
Vahalia.
I'd like to see whether this is useful for Linux.
-- wli
On Mon, 17 Mar 2003, William Lee Irwin III wrote:
> On Sat, 15 Mar 2003, Paul Albrecht wrote:
> >> ... Why does the kernel page fault on text pages, present in the page
> >> cache, when a program starts? Couldn't the pte's for text present in the
> >> page cache be resolved when they're mapped to memory?
>
> SVR4 did and saw an improvement wrt. page fault rate, according to
> Vahalia.
An improvement in the _page fault rate_, well DUH.
> I'd like to see whether this is useful for Linux.
The question is, does it result in an improvement in the
run speed of processes...
cheers,
Rik
>>>>> William Lee Irwin (WLI) writes:
WLI> On Mon, Mar 17, 2003 at 10:02:21AM -0500, Rik van Riel wrote:
>> The mmap() syscall only sets up the VMA info, it doesn't fill in
>> the page tables. That only happens when the process page faults.
>> Note that filling in a bunch of page table entries mapping already
>> present pagecache pages at exec() time might be a good idea. It's
>> just that nobody has gotten around to that yet...
WLI> SVR4 did and saw an improvement wrt. page fault rate, according
WLI> to Vahalia.
WLI> I'd like to see whether this is useful for Linux.
I tried this on a dual P3 a year and a half ago and didn't see any improvement.
On Mon, Mar 17, 2003 at 11:01:31AM -0500, Rik van Riel wrote:
> On Mon, 17 Mar 2003, William Lee Irwin III wrote:
> > On Sat, 15 Mar 2003, Paul Albrecht wrote:
> > >> ... Why does the kernel page fault on text pages, present in the page
> > >> cache, when a program starts? Couldn't the pte's for text present in the
> > >> page cache be resolved when they're mapped to memory?
> >
> > SVR4 did and saw an improvement wrt. page fault rate, according to
> > Vahalia.
>
> An improvement in the _page fault rate_, well DUH.
>
> > I'd like to see whether this is useful for Linux.
>
> The question is, does it result in an improvement in the
> run speed of processes...
>
> cheers,
>
> Rik
You should ask Andrew about his patch to do exactly that: he
forced all PROT_EXEC mmaps to be nonlinear-mapped and this
forced all programs to suck entire binaries into memory...
I recall he saw at least 25% improvement at launching gnome.
Andrew?
--
Antonio Vargas
>>>>> wind (w) writes:
w> On Mon, Mar 17, 2003 at 11:01:31AM -0500, Rik van Riel wrote:
>> On Mon, 17 Mar 2003, William Lee Irwin III wrote:
>> > On Sat, 15 Mar 2003, Paul Albrecht wrote:
>> > >> ... Why does the kernel page fault on text pages, present in
>> the page > >> cache, when a program starts? Couldn't the pte's for
>> text present in the > >> page cache be resolved when they're
>> mapped to memory?
>> >
w> You should ask Andrew about his patch to do exactly that: he
w> forced all PROT_EXEC mmaps to be nonlinear-mapped and this forced
w> all programs to suck entire binaries into memory... I recall he
w> saw at least 25% improvement at launching gnome.
they talked about pages _already present_ in pagecache.
On Mon, Mar 17, 2003 at 07:50:04PM +0300, Alex Tomas wrote:
> >>>>> wind (w) writes:
>
> w> On Mon, Mar 17, 2003 at 11:01:31AM -0500, Rik van Riel wrote:
> >> On Mon, 17 Mar 2003, William Lee Irwin III wrote:
> >> > On Sat, 15 Mar 2003, Paul Albrecht wrote:
> >> > >> ... Why does the kernel page fault on text pages, present in
> >> the page > >> cache, when a program starts? Couldn't the pte's for
> >> text present in the > >> page cache be resolved when they're
> >> mapped to memory?
> >> >
>
> w> You should ask Andrew about his patch to do exactly that: he
> w> forced all PROT_EXEC mmaps to be nonlinear-mapped and this forced
> w> all programs to suck entire binaries into memory... I recall he
> w> saw at least 25% improvement at launching gnome.
>
> they talked about pages _already present_ in pagecache.
I wonder if this could be done by walking and faulting
all pages at fs/binfmt_elf.c::elf_map just after do_mmap...
will try it just now :)
On Mon, Mar 17, 2003 at 06:12:46PM +0100, [email protected] wrote:
> On Mon, Mar 17, 2003 at 07:50:04PM +0300, Alex Tomas wrote:
> > >>>>> wind (w) writes:
> >
> > w> On Mon, Mar 17, 2003 at 11:01:31AM -0500, Rik van Riel wrote:
> > >> On Mon, 17 Mar 2003, William Lee Irwin III wrote:
> > >> > On Sat, 15 Mar 2003, Paul Albrecht wrote:
> > >> > >> ... Why does the kernel page fault on text pages, present in
> > >> the page > >> cache, when a program starts? Couldn't the pte's for
> > >> text present in the > >> page cache be resolved when they're
> > >> mapped to memory?
> > >> >
> >
> > w> You should ask Andrew about his patch to do exactly that: he
> > w> forced all PROT_EXEC mmaps to be nonlinear-mapped and this forced
> > w> all programs to suck entire binaries into memory... I recall he
> > w> saw at least 25% improvement at launching gnome.
> >
> > they talked about pages _already present_ in pagecache.
>
> I wonder if this could be done by walking and faulting
> all pages at fs/binfmt_elf.c::elf_map just after do_mmap...
> will try it just now :)
OK, this is not tested, since I'm compiling it now... feel free
to correct :)
On Mon, Mar 17, 2003 at 07:57:49PM +0100, Marc-Christian Petersen wrote:
> On Monday 17 March 2003 18:38, [email protected] wrote:
>
> Hi Wind,
>
> > > I wonder if this could be done by walking and faulting
> > > all pages at fs/binfmt_elf.c::elf_map just after do_mmap...
> > > will try it just now :)
> >
> > OK, this is not tested, since I'm compiling it now... feel free
> > to correct :)
>
> mm/mmap.c:
>
> unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
>         unsigned long len, unsigned long prot,
>         unsigned long flags, unsigned long pgoff)
> {
>
> your "do_mmap_pgoff" calls 7 arguments. Obviously it cannot compile 8-)
>
My first patch, I'm just becoming intimate with printk ;)
On Monday 17 March 2003 18:38, [email protected] wrote:
Hi Wind,
> > I wonder if this could be done by walking and faulting
> > all pages at fs/binfmt_elf.c::elf_map just after do_mmap...
> > will try it just now :)
>
> OK, this is not tested, since I'm compiling it now... feel free
> to correct :)
mm/mmap.c:
unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
        unsigned long len, unsigned long prot,
        unsigned long flags, unsigned long pgoff)
{
Your call to "do_mmap_pgoff" passes 7 arguments, but the function takes 6. Obviously it cannot compile 8-)
ciao, Marc
On Mon, Mar 17, 2003 at 08:06:36PM +0100, [email protected] wrote:
> On Mon, Mar 17, 2003 at 07:57:49PM +0100, Marc-Christian Petersen wrote:
> > On Monday 17 March 2003 18:38, [email protected] wrote:
> >
> > Hi Wind,
> >
> > > > I wonder if this could be done by walking and faulting
> > > > all pages at fs/binfmt_elf.c::elf_map just after do_mmap...
> > > > will try it just now :)
> > >
> > > OK, this is not tested, since I'm compiling it now... feel free
> > > to correct :)
> >
> > mm/mmap.c:
> >
> > unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
> >         unsigned long len, unsigned long prot,
> >         unsigned long flags, unsigned long pgoff)
> > {
> >
> > your "do_mmap_pgoff" calls 7 arguments. Obviously it cannot compile 8-)
> >
>
> My first patch, I'm just becoming intimate with printk ;)
OK, so I took a different approach and just called handle_mm_fault, as
if there had been user-level accesses to the file.
Applied on 2.5.63-uml1 and booted debian woody with it.
Can any of you try it on a real machine? (I don't have a test machine :(
Greets, Antonio.
Alex Tomas wrote:
>
> w> You should ask Andrew about his patch to do exactly that: he
> w> forced all PROT_EXEC mmaps to be nonlinear-mapped and this forced
> w> all programs to suck entire binaries into memory... I recall he
> w> saw at least 25% improvement at launching gnome.
>
> they talked about pages _already present_ in pagecache.
2.5.64-mm8 does that too. At mmap-time it will, for a PROT_EXEC mapping,
pull every affected page off disk and it will instantiate pte's against
them all via install_page().
So there should be zero major and minor faults against that mmap region
during application startup.
The improved IO layout appears to halve startup time for big things. I
haven't attempted to instrument the effects of the reduced minor fault rate.
If indeed the rate _has_ decreased. If it hasn't, it's a bug...
This is all a bit dubious for several reasons. Most particularly, the
up-front instantiation of the pages in pagetables makes unneeded pages harder
to reclaim. It would be really neat if someone could try putting the
madvise(MADV_WILLNEED) into glibc and test that. Maybe on a 2.4 kernel.
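
For comparison, user space can ask for similar up-front behaviour on a single mapping with the MAP_POPULATE flag of mmap(). The sketch below is only an illustration of that flag, not the -mm patch discussed above; map_populated() and scratch_fd() are hypothetical names, and PROT_READ is used in the usage because PROT_EXEC mappings can be refused on filesystems mounted noexec. The same reclaim concern applies: every page is mapped whether or not it is ever used.

```c
#define _GNU_SOURCE             /* for MAP_POPULATE */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: an unnamed temp file holding `length` bytes of 'x'. */
static int scratch_fd(size_t length)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;
    for (size_t i = 0; i < length; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    int fd = dup(fileno(f));    /* keep the file alive after fclose() */
    fclose(f);
    return fd;
}

/*
 * Map `length` bytes of fd and ask the kernel to populate the page
 * tables for the whole range at mmap() time instead of on first touch.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_populated(int fd, size_t length, int prot)
{
    assert(length > 0);
    return mmap(NULL, length, prot, MAP_PRIVATE | MAP_POPULATE, fd, 0);
}
```

A caller would use map_populated(fd, length, PROT_READ) exactly like a plain mmap() call and release the range with munmap() when done.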
On Mon, Mar 17, 2003 at 02:05:06PM -0800, Andrew Morton wrote:
> Alex Tomas wrote:
> >
> > w> You should ask Andrew about his patch to do exactly that: he
> > w> forced all PROT_EXEC mmaps to be nonlinear-mapped and this forced
> > w> all programs to suck entire binaries into memory... I recall he
> > w> saw at least 25% improvement at launching gnome.
> >
> > they talked about pages _already present_ in pagecache.
>
> 2.5.64-mm8 does that too. At mmap-time it will, for a PROT_EXEC mapping,
> pull every affected page off disk and it will instantiate pte's against
> them all via install_page().
>
> So there should be zero major and minor faults against that mmap region
> during application startup.
>
> The improved IO layout appears to halve startup time for big things. I
> haven't attempted to instrument the effects of the reduced minor fault rate.
> If indeed the rate _has_ decreased. If it hasn't, it's a bug...
>
>
>
> This is all a bit dubious for several reasons. Most particularly, the
> up-front instantiation of the pages in pagetables makes unneeded pages harder
> to reclaim. It would be really neat if someone could try putting the
> madvise(MADV_WILLNEED) into glibc and test that. Maybe on a 2.4 kernel.
something like this one?
[email protected] wrote:
>
> > This is all a bit dubious for several reasons. Most particularly, the
> > up-front instantiation of the pages in pagetables makes unneeded pages harder
> > to reclaim. It would be really neat if someone could try putting the
> > madvise(MADV_WILLNEED) into glibc and test that. Maybe on a 2.4 kernel.
>
>
> something like this one?
>
No, not at all. I meant a patch against glibc, not against the kernel!
Like this:
map = mmap(..., PROT_EXEC, ...);
+ if (getenv("MAP_PREFAULT"))
+ madvise(map, length, MADV_WILLNEED);
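
The fragment above leaves the mmap() arguments elided. A self-contained sketch of the same idea might look like the following; it is an illustration under assumptions, not a glibc patch. The MAP_PREFAULT environment variable comes from the fragment; map_maybe_prefault() and scratch_fd() are hypothetical names. MADV_WILLNEED is advice that the range will be accessed soon, so the kernel may read pages ahead; it does not by itself promise that page table entries are instantiated.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: an unnamed temp file holding `length` bytes of 'x'. */
static int scratch_fd(size_t length)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;
    for (size_t i = 0; i < length; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    int fd = dup(fileno(f));    /* keep the file alive after fclose() */
    fclose(f);
    return fd;
}

/*
 * Map `length` bytes of fd. If MAP_PREFAULT is set in the environment,
 * also tell the kernel the whole range is expected to be used soon.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_maybe_prefault(int fd, size_t length, int prot)
{
    assert(length > 0);
    void *map = mmap(NULL, length, prot, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return MAP_FAILED;

    /* Advice only: a failing madvise() leaves the mapping usable. */
    if (getenv("MAP_PREFAULT") != NULL)
        (void)madvise(map, length, MADV_WILLNEED);
    return map;
}
```

Gating the call on an environment variable makes it easy to run the same binary with and without the hint and compare startup times.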
On Mon, Mar 17, 2003 at 03:28:55PM -0800, Andrew Morton wrote:
> [email protected] wrote:
> >
> > > This is all a bit dubious for several reasons. Most particularly, the
> > > up-front instantiation of the pages in pagetables makes unneeded pages harder
> > > to reclaim. It would be really neat if someone could try putting the
> > > madvise(MADV_WILLNEED) into glibc and test that. Maybe on a 2.4 kernel.
> >
> >
> > something like this one?
> >
>
> No, not at all. I meant a patch against glibc, not against the kernel!
>
> Like this:
>
> map = mmap(..., PROT_EXEC, ...);
> + if (getenv("MAP_PREFAULT"))
> + madvise(map, length, MADV_WILLNEED);
I know what you mean, but right now it's far easier hacking the
kernel than libc, at least if running a uml-kernel ;)
Anyways, I booted my patch but I don't know if it's working, because
I've got no test machine to try it on... but, it didn't freak out
so I think it works :)))
As for the libc patch, I think it can be easier to make an
exec-prefault.so library and LD_PRELOAD it, at least for testing
purposes.
If you could tell me the locking is right, I might try patching my
physical machine 2.4.19-ck4 with the kernel patch just to see if it
works.
On Monday, March 17, 2003 Rik van Riel wrote:
>
> The mmap() syscall only sets up the VMA info, it doesn't fill in the page
> tables. That only happens when the process page faults.
>
> Note that filling in a bunch of page table entries mapping already present
> pagecache pages at exec() time might be a good idea. It's just that nobody
> has gotten around to that yet...
>
What doesn't make sense to me is that a program's working set isn't loaded
before it starts execution. Can the working set be approximated using the
address_space object? Then the kernel would know what pages should be
allocated when the text and data segments are memory-mapped at binary load time.
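
From user space, the closest available view of "which pages of this file are already in memory" is mincore(). The following is a rough sketch under assumptions: it reports residency of a mapped range, which is only a crude stand-in for a working set, and it says nothing about how the kernel's address_space object would be used for this. count_resident() and scratch_fd() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: an unnamed temp file holding `length` bytes of 'x'. */
static int scratch_fd(size_t length)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;
    for (size_t i = 0; i < length; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    int fd = dup(fileno(f));    /* keep the file alive after fclose() */
    fclose(f);
    return fd;
}

/*
 * Map `length` bytes of fd and ask mincore() which pages are resident.
 * Stores the page count of the range in *total_pages and returns the
 * number of resident pages, or -1 on failure.
 */
static long count_resident(int fd, size_t length, size_t *total_pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    size_t npages = (length + (size_t)page_size - 1) / (size_t)page_size;
    *total_pages = npages;

    void *map = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    long resident = -1;
    unsigned char *vec = malloc(npages);    /* one status byte per page */
    if (vec != NULL && mincore(map, length, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;         /* low bit set: page resident */
    }
    free(vec);
    munmap(map, length);
    return resident;
}
```

Run against a binary right before starting it, the resident count gives a rough idea of how many of its pages could be mapped at load time without any disk I/O.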