2010-08-10 19:59:13

by Micha Nelissen

Subject: Why is get_user_pages so slow?

Hi all,

Why is get_user_pages much slower than taking the faults? (I would
expect it to be faster).

The attached example program first mallocs a piece of memory (64MB in this
case) then reads it "to take the faults". Afterwards, it uses mmap with
MAP_POPULATE to "speed up" and not to have to take the faults, but have
everything mapped in one go. I think mmap is using get_user_pages in
this case.

$ ./memspeed
malloc took 0 msecs
read took 14 msecs
write took 0 msecs
free took 1 msecs
mmap took 45 msecs
munmap took 5 msecs

Using MAP_POPULATE is 3 times as slow as the 'stupid' implementation!
I'm running a Core 2 Duo E6300 system with Linux 2.6.28.4.

Am I doing something wrong? MAP_POPULATE seems a bit of a joke to me.

Thanks,

Micha


Attachments:
memspeed.c (1.55 kB)

2010-08-12 03:50:22

by Kevin Easton

Subject: Re: Why is get_user_pages so slow?

Quoting Micha Nelissen:

> Hi all,
>
> Why is get_user_pages much slower than taking the faults? (I would
> expect it to be faster).
>
> Attached example program first mallocs a piece of memory (64MB in
> this case) then reads it "to take the faults". Afterwards, it uses
> mmap with MAP_POPULATE to "speed up" and not to have to take the
> faults, but have everything mapped in one go. I think mmap is using
> get_user_pages in this case.
>
> $ ./memspeed
> malloc took 0 msecs
> read took 14 msecs
> write took 0 msecs
> free took 1 msecs
> mmap took 45 msecs
> munmap took 5 msecs
>
> Using MAP_POPULATE is 3 times as slow as the 'stupid'
> implementation! I'm running a Core 2 duo e6300 system with linux
> 2.6.28.4.
>
> Am I doing something wrong? MAP_POPULATE seems a bit of a joke to me.

Hi Micha,

Yep, you are. Because your pointer 'p' is a pointer to int, when you
increment it by 0x1000 in your loops you are actually incrementing it
by 0x1000 * sizeof(int) - so you're only actually touching one page in
four.

If you change the types of 'buf', 'p' and 'e' to 'char *' then it
touches every page - and (at least on my test box) the MAP_POPULATE
case pulls ahead.

- Kevin




2010-08-12 10:08:12

by Micha Nelissen

Subject: Re: Why is get_user_pages so slow?

Hi Kevin,

Kevin Easton wrote:
> Yep, you are. Because your pointer 'p' is a pointer to int, when you
> increment it by 0x1000 in your loops you are actually incrementing it by
> 0x1000 * sizeof(int) - so you're only actually touching one page in four.

Oops sorry, thanks for catching my mistake.

I also discovered the following: if I read from all pages, then call
get_user_pages, it is still quite slow (did I get a read-only page?).
However, if I touch all pages by writing to them, then get_user_pages
becomes about 40 times faster.

All is clear now, I think. Thanks.

Micha