2009-04-15 07:32:20

by Brice Goglin

Subject: [PATCH] migration: only migrate_prep() once per move_pages()

migrate_prep() is fairly expensive (72us on a 16-core 1.9GHz Barcelona).
Commit 3140a2273009c01c27d316f35ab76a37e105fdd8 improved move_pages()
throughput by breaking it into chunks, but it also caused migrate_prep()
to be called once per chunk (every 128 pages or so) instead of once per
move_pages() call.

This patch reverts to calling migrate_prep() only once per move_pages()
call, as we did before 2.6.29.
It is also a followup to commit 0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d
("mm: move migrate_prep out from under mmap_sem").

This improves migration throughput on the above machine from 600MB/s
to 750MB/s.

Signed-off-by: Brice Goglin <[email protected]>

diff --git a/mm/migrate.c b/mm/migrate.c
index 068655d..a2d3e83 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -820,7 +820,6 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 	struct page_to_node *pp;
 	LIST_HEAD(pagelist);
 
-	migrate_prep();
 	down_read(&mm->mmap_sem);
 
 	/*
@@ -907,6 +906,9 @@ static int do_pages_move(struct mm_struct *mm, struct task_struct *task,
 	pm = (struct page_to_node *)__get_free_page(GFP_KERNEL);
 	if (!pm)
 		goto out;
+
+	migrate_prep();
+
 	/*
 	 * Store a chunk of page_to_node array in a page,
 	 * but keep the last one as a marker


2009-04-15 07:51:41

by Kamezawa Hiroyuki

Subject: Re: [PATCH] migration: only migrate_prep() once per move_pages()

On Wed, 15 Apr 2009 09:32:10 +0200
Brice Goglin <[email protected]> wrote:

> migrate_prep() is fairly expensive (72us on a 16-core 1.9GHz Barcelona).
> Commit 3140a2273009c01c27d316f35ab76a37e105fdd8 improved move_pages()
> throughput by breaking it into chunks, but it also caused migrate_prep()
> to be called once per chunk (every 128 pages or so) instead of once per
> move_pages() call.
>
> This patch reverts to calling migrate_prep() only once per move_pages()
> call, as we did before 2.6.29.
> It is also a followup to commit 0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d
> ("mm: move migrate_prep out from under mmap_sem").
>
> This improves migration throughput on the above machine from 600MB/s
> to 750MB/s.
>
> Signed-off-by: Brice Goglin <[email protected]>
>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

I think this patch is good. Page migration is a best-effort syscall ;)

BTW, do current users of sys_move_pages() retry when they get -EBUSY?

Thanks,
-Kame


2009-04-15 09:33:21

by Brice Goglin

Subject: Re: [PATCH] migration: only migrate_prep() once per move_pages()

KAMEZAWA Hiroyuki wrote:
> On Wed, 15 Apr 2009 09:32:10 +0200
> Brice Goglin <[email protected]> wrote:
>
>
>> migrate_prep() is fairly expensive (72us on a 16-core 1.9GHz Barcelona).
>> Commit 3140a2273009c01c27d316f35ab76a37e105fdd8 improved move_pages()
>> throughput by breaking it into chunks, but it also caused migrate_prep()
>> to be called once per chunk (every 128 pages or so) instead of once per
>> move_pages() call.
>>
>> This patch reverts to calling migrate_prep() only once per move_pages()
>> call, as we did before 2.6.29.
>> It is also a followup to commit 0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d
>> ("mm: move migrate_prep out from under mmap_sem").
>>
>> This improves migration throughput on the above machine from 600MB/s
>> to 750MB/s.
>>
>> Signed-off-by: Brice Goglin <[email protected]>
>>
>>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>
> I think this patch is good. Page migration is a best-effort syscall ;)
>

My next thought is about improving migrate_prep() itself, since it makes
the move_pages() startup overhead very high.

But lru_add_drain_all() touches some code that I am far from
understanding :/ Could we use an IPI instead of a deferred
work_struct for this kind of thing? Or maybe, for each processor, check
whether drain_cpu_pagevecs() would have something to do before actually
scheduling the local work_struct? It's racy, but migrate_prep() doesn't
guarantee anyway that pages won't be moved out of the LRU before the
actual migration, so...

Also, I don't see why the cost of lru_add_drain_all() seems to increase
linearly with the number of cores in the machine. There may be some lock
contention, but it should scale better when there's pretty much nothing
in the per-CPU lists...

> BTW, do current users of sys_move_pages() retry when they get -EBUSY?
>

I'd say they ignore it since it doesn't happen often :)

Brice

Subject: Re: [PATCH] migration: only migrate_prep() once per move_pages()

On Wed, 15 Apr 2009, Brice Goglin wrote:

> But lru_add_drain_all() touches some code that I am far from
> understanding :/ Could we use an IPI instead of a deferred
> work_struct for this kind of thing? Or maybe, for each processor, check
> whether drain_cpu_pagevecs() would have something to do before actually
> scheduling the local work_struct? It's racy, but migrate_prep() doesn't
> guarantee anyway that pages won't be moved out of the LRU before the
> actual migration, so...

Using an IPI means that the code must run with interrupts disabled.

> > BTW, do current users of sys_move_pages() retry when they get -EBUSY?
> >
>
> I'd say they ignore it since it doesn't happen often :)

Right.