2008-01-15 01:46:00

by KOSAKI Motohiro

Subject: [RFC] mmaped copy too slow?

Hi

At one point I found that large file copy speed differs considerably depending on
the copy method.

I compared the following methods:
- read(2) and write(2)
- mmap(2) x2 and memcpy
- mmap(2) and write(2)

In addition, the effect of fadvise(2) and madvise(2) was checked.
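
The attached mmap-mmap.c is not reproduced here; the following is only a minimal
sketch of the mmap x2 + memcpy idea (error handling omitted, names made up for
illustration):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* copy "from" to "to" by mapping both files and doing one big memcpy */
static int mmap_copy(const char *from, const char *to)
{
	int sfd = open(from, O_RDONLY);
	int dfd = open(to, O_RDWR | O_CREAT | O_TRUNC, 0644);
	struct stat st;

	fstat(sfd, &st);
	ftruncate(dfd, st.st_size);	/* destination needs its final size */

	void *src = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, sfd, 0);
	void *dst = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, dfd, 0);

	/* the madvise variant would add madvise(src, st.st_size, MADV_SEQUENTIAL) here */
	memcpy(dst, src, st.st_size);	/* every page is faulted in here */

	munmap(src, st.st_size);
	munmap(dst, st.st_size);
	close(sfd);
	close(dfd);
	return 0;
}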

Strangely,
- the fastest method is read + write + fadvise.
- the worst method is mmap + memcpy.

Some famous books (e.g. Advanced Programming in the UNIX Environment
by W. Richard Stevens) say that an mmap copy is about twice as fast as
read/write, but on Linux it is not.
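
For reference, a minimal sketch of the read/write + fadvise variant. Which
POSIX_FADV_* hints the attached read-write.c really uses is not shown here;
SEQUENTIAL up front and DONTNEED on already-copied ranges is just one plausible
combination:

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (1 << 20)		/* 1MB copy buffer */

static int rw_fadv_copy(int sfd, int dfd)
{
	static char buf[CHUNK];
	off_t done = 0;
	ssize_t n;

	posix_fadvise(sfd, 0, 0, POSIX_FADV_SEQUENTIAL);

	while ((n = read(sfd, buf, CHUNK)) > 0) {
		write(dfd, buf, n);
		/* hint that the just-copied range will not be touched again */
		posix_fadvise(sfd, done, n, POSIX_FADV_DONTNEED);
		posix_fadvise(dfd, done, n, POSIX_FADV_DONTNEED);
		done += n;
	}
	return 0;
}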

I found that the bottleneck is page reclaim.
For comparison, I changed the page reclaim function a bit and tested again.


test machine:
CPU: Pentium4 with HT 2.8GHz
memory: 512MB
Disk I/O: about 20MB/s transfer rate
(in other words, a 1GB transfer needs about 50s in the ideal case)


Time spent on a 1GB file copy (unit: seconds):

                    2.6.24-rc6   2.6.24-rc6    ratio
                                 + my patch    (smaller is faster)
-------------------------------------------------------------------
rw_cp                  59.32        58.60        98.79%
rw_fadv_cp             57.96        57.96       100.0%
mm_sync_cp             69.97        61.68        88.15%
mm_sync_madv_cp        69.41        62.54        90.10%
mw_cp                  61.69        63.11       102.30%
mw_madv_cp             61.35        61.31        99.93%

This patch is still premature and ugly, but I think there is enough
information here to discuss improving page reclaim.

The problem is that when almost all pages are mapped and have the PTE
accessed bit set, page reclaim goes through the steps below:

1) the page moves from the inactive list to the active list
2) the page moves from the active list back to the inactive list
3) the page is really paged out

This is too roundabout and causes unnecessary memory pressure.
If you don't mind, please discuss.




Signed-off-by: KOSAKI Motohiro <[email protected]>

---
mm/vmscan.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)

Index: linux-2.6.24-rc6-cp3/mm/vmscan.c
===================================================================
--- linux-2.6.24-rc6-cp3.orig/mm/vmscan.c 2008-01-13 21:58:03.000000000 +0900
+++ linux-2.6.24-rc6-cp3/mm/vmscan.c 2008-01-13 22:30:27.000000000 +0900
@@ -446,13 +446,18 @@ static unsigned long shrink_page_list(st
 	struct pagevec freed_pvec;
 	int pgactivate = 0;
 	unsigned long nr_reclaimed = 0;
+	unsigned long nr_scanned = 0;
+	LIST_HEAD(l_mapped_pages);
+	unsigned long nr_mapped_page_activate = 0;
+	struct page *page;
+	int reference_checked = 0;
 
 	cond_resched();
 
 	pagevec_init(&freed_pvec, 1);
+retry:
 	while (!list_empty(page_list)) {
 		struct address_space *mapping;
-		struct page *page;
 		int may_enter_fs;
 		int referenced;
 
@@ -466,6 +471,7 @@ static unsigned long shrink_page_list(st
 
 		VM_BUG_ON(PageActive(page));
 
+		nr_scanned++;
 		sc->nr_scanned++;
 
 		if (!sc->may_swap && page_mapped(page))
@@ -493,11 +499,17 @@ static unsigned long shrink_page_list(st
 				goto keep_locked;
 		}
 
-		referenced = page_referenced(page, 1);
-		/* In active use or really unfreeable? Activate it. */
-		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
-					referenced && page_mapping_inuse(page))
-			goto activate_locked;
+		if (!reference_checked) {
+			referenced = page_referenced(page, 1);
+			/* In active use or really unfreeable? Activate it. */
+			if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
+			    referenced && page_mapping_inuse(page)) {
+				nr_mapped_page_activate++;
+				unlock_page(page);
+				list_add(&page->lru, &l_mapped_pages);
+				continue;
+			}
+		}
 
 #ifdef CONFIG_SWAP
 		/*
@@ -604,7 +616,31 @@ keep:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page));
 	}
+
+	if (nr_scanned == nr_mapped_page_activate) {
+		/* may be under copy by mmap.
+		   ignore reference flag. */
+		reference_checked = 1;
+		list_splice(&l_mapped_pages, page_list);
+		goto retry;
+	} else {
+		/* move active list just now */
+		while (!list_empty(&l_mapped_pages)) {
+			page = lru_to_page(&l_mapped_pages);
+			list_del(&page->lru);
+			prefetchw_prev_lru_page(page, &l_mapped_pages, flags);
+
+			if (!TestSetPageLocked(page)) {
+				SetPageActive(page);
+				pgactivate++;
+				unlock_page(page);
+			}
+			list_add(&page->lru, &ret_pages);
+		}
+	}
+
 	list_splice(&ret_pages, page_list);
+
 	if (pagevec_count(&freed_pvec))
 		__pagevec_release_nonlru(&freed_pvec);
 	count_vm_events(PGACTIVATE, pgactivate);


Attachments:
mmap-write.c (1.04 kB)
read-write.c (1.20 kB)
test.sh (568.00 B)
Makefile (1.02 kB)
mmap-mmap.c (1.42 kB)

2008-01-15 02:16:01

by Rik van Riel

Subject: Re: [RFC] mmaped copy too slow?

On Tue, 15 Jan 2008 10:45:47 +0900
KOSAKI Motohiro <[email protected]> wrote:

> The problem is that when almost all pages are mapped and have the PTE
> accessed bit set, page reclaim goes through the steps below:
>
> 1) the page moves from the inactive list to the active list
> 2) the page moves from the active list back to the inactive list
> 3) the page is really paged out
>
> This is too roundabout and causes unnecessary memory pressure.
> If you don't mind, please discuss.

While being able to deal with used-once mappings in page reclaim
could be a good idea, this would require us to be able to determine
the difference between a page that was accessed once since it was
faulted in and a page that got accessed several times.

That kind of infrastructure could end up adding more overhead than
an immediate reclaim of these streaming mmap pages would save.

Given that page faults have overhead too, it does not surprise me
that read+write is faster than mmap+memcpy.

In threaded applications, page fault overhead will be worse still,
since the TLBs need to be synchronized between CPUs (at least at
reclaim time).

Maybe we should just advise people to use read+write, since it is
faster than mmap+memcpy?

--
All rights reversed.

2008-01-15 03:20:27

by KOSAKI Motohiro

Subject: Re: [RFC] mmaped copy too slow?

Hi Rik

> While being able to deal with used-once mappings in page reclaim
> could be a good idea, this would require us to be able to determine
> the difference between a page that was accessed once since it was
> faulted in and a page that got accessed several times.

It may make sense to assume that a readahead hit is a used-once mapping.
I will try it.

(Maybe I can repost soon.)

> Given that page faults have overhead too, it does not surprise me
> that read+write is faster than mmap+memcpy.
>
> In threaded applications, page fault overhead will be worse still,
> since the TLBs need to be synchronized between CPUs (at least at
> reclaim time).

Sure, but the current performance difference is unnecessarily large.
I hope to improve it, because copying through mmap is a very common operation.

> Maybe we should just advise people to use read+write, since it is
> faster than mmap+memcpy?

Time will solve that :)
thanks!


- kosaki


2008-01-15 08:58:19

by Peter Zijlstra

Subject: Re: [RFC] mmaped copy too slow?


On Tue, 2008-01-15 at 12:20 +0900, KOSAKI Motohiro wrote:
> Hi Rik
>
> > While being able to deal with used-once mappings in page reclaim
> > could be a good idea, this would require us to be able to determine
> > the difference between a page that was accessed once since it was
> > faulted in and a page that got accessed several times.
>
> It may make sense to assume that a readahead hit is a used-once mapping.
> I will try it.

I once had a patch that made read-ahead give feedback into page reclaim,
but people didn't like it.

2008-01-15 09:04:45

by KOSAKI Motohiro

Subject: Re: [RFC] mmaped copy too slow?

Hi Peter,

> > > While being able to deal with used-once mappings in page reclaim
> > > could be a good idea, this would require us to be able to determine
> > > the difference between a page that was accessed once since it was
> > > faulted in and a page that got accessed several times.
> >
> > It may make sense to assume that a readahead hit is a used-once mapping.
> > I will try it.
>
> I once had a patch that made read-ahead give feedback into page reclaim,
> but people didn't like it.

Could you please tell me the mail subject or a URL?
I would like to know why people didn't like it.

thanks

- kosaki


2008-01-15 09:08:59

by Peter Zijlstra

Subject: Re: [RFC] mmaped copy too slow?


On Tue, 2008-01-15 at 18:03 +0900, KOSAKI Motohiro wrote:
> Hi Peter,
>
> > > > While being able to deal with used-once mappings in page reclaim
> > > > could be a good idea, this would require us to be able to determine
> > > > the difference between a page that was accessed once since it was
> > > > faulted in and a page that got accessed several times.
> > >
> > > It may make sense to assume that a readahead hit is a used-once mapping.
> > > I will try it.
> >
> > I once had a patch that made read-ahead give feedback into page reclaim,
> > but people didn't like it.
>
> Could you please tell me the mail subject or a URL?
> I would like to know why people didn't like it.

I think this is the last thread on the subject:

http://lkml.org/lkml/2007/7/21/219


2008-01-15 12:46:42

by Paulo Marques

Subject: Re: [RFC] mmaped copy too slow?

KOSAKI Motohiro wrote:
> Hi
>
> At one point I found that large file copy speed differs considerably
> depending on the copy method.
>
> I compared the following methods:
> - read(2) and write(2)
> - mmap(2) x2 and memcpy
> - mmap(2) and write(2)
>
> In addition, the effect of fadvise(2) and madvise(2) was checked.
>
> Strangely,
> - the fastest method is read + write + fadvise.
> - the worst method is mmap + memcpy.

One thing you could also try is to pass MAP_POPULATE to mmap so that the
page tables are filled in at the time of the mmap, avoiding a lot of
page faults later.
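
For illustration, a minimal sketch of what that would look like (MAP_POPULATE
is Linux-specific; sfd and len are placeholders for the source fd and file size):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/* map the whole source file and prefault it, so the later copy loop
 * should take no major faults on the source side */
static void *map_source_populated(int sfd, size_t len)
{
	return mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_POPULATE, sfd, 0);
}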

Just my 2 cents,

--
Paulo Marques - http://www.grupopie.com

"All I ask is a chance to prove that money can't make me happy."

2008-01-16 02:05:57

by KOSAKI Motohiro

Subject: Re: [RFC] mmaped copy too slow?

Hi Paulo

> One thing you could also try is to pass MAP_POPULATE to mmap so that the
> page tables are filled in at the time of the mmap, avoiding a lot of
> page faults later.
>
> Just my 2 cents,

OK, I will test your idea and report back tomorrow,
but I don't think page faults are the major performance impact.

Maybe the two things below matter more:
- stupid page reclaim
- large cache pollution by memcpy


- kosaki

2008-01-17 03:24:21

by KOSAKI Motohiro

Subject: Re: [RFC] mmaped copy too slow?

Hi

> > One thing you could also try is to pass MAP_POPULATE to mmap so that the
> > page tables are filled in at the time of the mmap, avoiding a lot of
> > page faults later.
> >
>
> OK, I will test your idea and report back tomorrow,
> but I don't think page faults are the major performance impact.

I got a more interesting result :)
MAP_POPULATE is harmful for a large copy.


1GB copy
                               elapsed (sec)
--------------------------------------------
mmap                               71.54
mmap + madvise                     69.63
mmap + populate                   100.87
mmap + populate + madvise         101.16


More detail:
time(1) output of the mmap copy:
0.50user 3.59system 1:11.54elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
2101192inputs+2097160outputs (32776major+491573minor)pagefaults 0swaps

time(1) output of the mmap + populate copy:
0.53user 5.13system 1:40.87elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
4200808inputs+2097160outputs (49164major+737340minor)pagefaults 0swaps


The number of input blocks increases by about 2x.
In fact, mmap(MAP_POPULATE) reads the data from disk into memory, the pages are
dropped just afterwards, and so they have to be read again.

Of course, when the copied file is small enough, MAP_POPULATE is effective.


100MB copy
                               elapsed (sec)
--------------------------------------------
mmap                                7.38
mmap + madvise                      7.29
mmap + populate                     7.13
mmap + populate + madvise           6.65


- kosaki