2009-10-06 02:41:24

by KOSAKI Motohiro

Subject: [PATCH 1/2] Implement lru_add_drain_all_async()


===================================================================
Implement asynchronous lru_add_drain_all()

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
 include/linux/swap.h |    1 +
 mm/swap.c            |   24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4ec9001..1f5772a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -204,6 +204,7 @@ extern void activate_page(struct page *);
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern int lru_add_drain_all(void);
+extern int lru_add_drain_all_async(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void swap_setup(void);
 
diff --git a/mm/swap.c b/mm/swap.c
index 308e57d..e16cd40 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -38,6 +38,7 @@ int page_cluster;
 
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+static DEFINE_PER_CPU(struct work_struct, lru_drain_work);
 
 /*
  * This path almost never happens for VM activity - pages are normally
@@ -312,6 +313,24 @@ int lru_add_drain_all(void)
 }
 
 /*
+ * Returns 0 for success
+ */
+int lru_add_drain_all_async(void)
+{
+	int cpu;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		struct work_struct *work = &per_cpu(lru_drain_work, cpu);
+		schedule_work_on(cpu, work);
+	}
+	put_online_cpus();
+
+	return 0;
+}
+
+
+/*
  * Batched page_cache_release(). Decrement the reference count on all the
  * passed pages. If it fell to zero then remove the page from the LRU and
  * free it.
@@ -497,6 +516,7 @@ EXPORT_SYMBOL(pagevec_lookup_tag);
 void __init swap_setup(void)
 {
 	unsigned long megs = totalram_pages >> (20 - PAGE_SHIFT);
+	int cpu;
 
 #ifdef CONFIG_SWAP
 	bdi_init(swapper_space.backing_dev_info);
@@ -511,4 +531,8 @@ void __init swap_setup(void)
 	 * Right now other parts of the system means that we
 	 * _really_ don't want to cluster much more
 	 */
+
+	for_each_possible_cpu(cpu) {
+		INIT_WORK(&per_cpu(lru_drain_work, cpu), lru_add_drain_per_cpu);
+	}
 }
--
1.6.2.5



2009-10-06 02:42:08

by KOSAKI Motohiro

Subject: [PATCH 2/2] mlock use lru_add_drain_all_async()


Recently, Peter Zijlstra reported that an RT task can prevent mlock()
from completing for a very long time.

Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
cpu0 does mlock()->lru_add_drain_all(), which does
schedule_on_each_cpu(), which then waits for all cpus to complete the
work. Except that cpu1, which is busy with the RT task, will never run
keventd until the RT load goes away.

This is not so much an actual deadlock as a serious starvation case.

Actually, mlock() doesn't need to wait for lru_add_drain_all() to finish.
Thus, this patch replaces it with lru_add_drain_all_async().

Cc: Oleg Nesterov <[email protected]>
Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
 mm/mlock.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 22041aa..46a016f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -458,7 +458,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
 	if (!can_do_mlock())
 		return -EPERM;
 
-	lru_add_drain_all();	/* flush pagevec */
+	lru_add_drain_all_async();	/* flush pagevec */
 
 	down_write(&current->mm->mmap_sem);
 	len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
@@ -526,7 +526,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	if (!can_do_mlock())
 		goto out;
 
-	lru_add_drain_all();	/* flush pagevec */
+	lru_add_drain_all_async();	/* flush pagevec */
 
 	down_write(&current->mm->mmap_sem);
 
--
1.6.2.5


2009-10-06 09:09:26

by Peter Zijlstra

Subject: Re: [PATCH 2/2] mlock use lru_add_drain_all_async()

On Tue, 2009-10-06 at 11:41 +0900, KOSAKI Motohiro wrote:
> Recently, Peter Zijlstra reported that an RT task can prevent mlock()
> from completing for a very long time.
>
> Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
> cpu0 does mlock()->lru_add_drain_all(), which does
> schedule_on_each_cpu(), which then waits for all cpus to complete the
> work. Except that cpu1, which is busy with the RT task, will never run
> keventd until the RT load goes away.
>
> This is not so much an actual deadlock as a serious starvation case.
>
> Actually, mlock() doesn't need to wait for lru_add_drain_all() to finish.
> Thus, this patch replaces it with lru_add_drain_all_async().
>
> Cc: Oleg Nesterov <[email protected]>
> Reported-by: Peter Zijlstra <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

It was actually Mike Galbraith who brought it to my attention.

Patch looks sane enough, although I'm not sure I'd have split it in two
like you did (leaves the first without a real changelog, too).


2009-10-06 10:12:09

by KOSAKI Motohiro

Subject: Re: [PATCH 2/2] mlock use lru_add_drain_all_async()

> On Tue, 2009-10-06 at 11:41 +0900, KOSAKI Motohiro wrote:
> > Recently, Peter Zijlstra reported that an RT task can prevent mlock()
> > from completing for a very long time.
> >
> > Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
> > cpu0 does mlock()->lru_add_drain_all(), which does
> > schedule_on_each_cpu(), which then waits for all cpus to complete the
> > work. Except that cpu1, which is busy with the RT task, will never run
> > keventd until the RT load goes away.
> >
> > This is not so much an actual deadlock as a serious starvation case.
> >
> > Actually, mlock() doesn't need to wait for lru_add_drain_all() to finish.
> > Thus, this patch replaces it with lru_add_drain_all_async().
> >
> > Cc: Oleg Nesterov <[email protected]>
> > Reported-by: Peter Zijlstra <[email protected]>
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>
> It was actually Mike Galbraith who brought it to my attention.
>
> Patch looks sane enough, although I'm not sure I'd have split it in two
> like you did (leaves the first without a real changelog, too).

Ah, yes. They should be folded. Thanks.
In my local patch queue, this patch series has two more callers:

- lumpy reclaim: currently, the PCP often causes large-order allocation failures.
- page migration: in almost all cases, the PCP doesn't hold the migration target page.

but they are still being tested.




2009-10-06 16:33:59

by Christoph Lameter

Subject: Re: [PATCH 2/2] mlock use lru_add_drain_all_async()

On Tue, 6 Oct 2009, KOSAKI Motohiro wrote:

> Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
> cpu0 does mlock()->lru_add_drain_all(), which does
> schedule_on_each_cpu(), which then waits for all cpus to complete the
> work. Except that cpu1, which is busy with the RT task, will never run
> keventd until the RT load goes away.
>
> This is not so much an actual deadlock as a serious starvation case.
>
> Actually, mlock() doesn't need to wait for lru_add_drain_all() to finish.
> Thus, this patch replaces it with lru_add_drain_all_async().

Ok, so this will queue up lots of events on the cpu running the RT task. If
the RT task runs continuously, will they be queued there forever?

2009-10-06 23:00:22

by Ying Han

Subject: Re: [PATCH 2/2] mlock use lru_add_drain_all_async()

Hello KOSAKI-san,

A few questions on lru_add_drain_all_async(). If I understand
correctly, the reason we have lru_add_drain_all() in the mlock()
call is to isolate mlocked pages onto the separate LRU in case they
are sitting in a pagevec.

I also understand the RT use case you put in the patch description. My
question now is: do we have a race after applying the patch? For
example, if the RT task has not given up the cpu by the time mlock()
returns, there are pages left in the pagevec that have not been drained
back to the LRU list. Do we have a problem with that?

--Ying

On Mon, Oct 5, 2009 at 7:41 PM, KOSAKI Motohiro
<[email protected]> wrote:
>
> Recently, Peter Zijlstra reported that an RT task can prevent mlock()
> from completing for a very long time.
>
> Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
> cpu0 does mlock()->lru_add_drain_all(), which does
> schedule_on_each_cpu(), which then waits for all cpus to complete the
> work. Except that cpu1, which is busy with the RT task, will never run
> keventd until the RT load goes away.
>
> This is not so much an actual deadlock as a serious starvation case.
>
> Actually, mlock() doesn't need to wait for lru_add_drain_all() to finish.
> Thus, this patch replaces it with lru_add_drain_all_async().
>
> Cc: Oleg Nesterov <[email protected]>
> Reported-by: Peter Zijlstra <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
>  mm/mlock.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 22041aa..46a016f 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -458,7 +458,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
>  	if (!can_do_mlock())
>  		return -EPERM;
>
> -	lru_add_drain_all();	/* flush pagevec */
> +	lru_add_drain_all_async();	/* flush pagevec */
>
>  	down_write(&current->mm->mmap_sem);
>  	len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
> @@ -526,7 +526,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
>  	if (!can_do_mlock())
>  		goto out;
>
> -	lru_add_drain_all();	/* flush pagevec */
> +	lru_add_drain_all_async();	/* flush pagevec */
>
>  	down_write(&current->mm->mmap_sem);
>
> --
> 1.6.2.5

2009-10-07 03:38:20

by KOSAKI Motohiro

Subject: Re: [PATCH 2/2] mlock use lru_add_drain_all_async()

2009/10/7 Christoph Lameter <[email protected]>:
> On Tue, 6 Oct 2009, KOSAKI Motohiro wrote:
>
>> Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
>> cpu0 does mlock()->lru_add_drain_all(), which does
>> schedule_on_each_cpu(), which then waits for all cpus to complete the
>> work. Except that cpu1, which is busy with the RT task, will never run
>> keventd until the RT load goes away.
>>
>> This is not so much an actual deadlock as a serious starvation case.
>>
>> Actually, mlock() doesn't need to wait for lru_add_drain_all() to finish.
>> Thus, this patch replaces it with lru_add_drain_all_async().
>
> Ok so this will queue up lots of events for the cpu doing a RT task. If
> the RT task is continuous then they will be queued there forever?

Yes. This patch solves only a very specific issue.
In the original bug report, the system has two cpusets and the RT task
monopolizes one of them. Thus, the problem you worry about doesn't occur.

Perhaps we need a complete solution, but I don't think this patch has any
bad side effect, so I hope to push it into mainline.

2009-10-07 03:49:09

by KOSAKI Motohiro

Subject: Re: [PATCH 2/2] mlock use lru_add_drain_all_async()

Hi

> Hello KOSAKI-san,
>
> A few questions on lru_add_drain_all_async(). If I understand
> correctly, the reason we have lru_add_drain_all() in the mlock()
> call is to isolate mlocked pages onto the separate LRU in case they
> are sitting in a pagevec.
>
> I also understand the RT use case you put in the patch description. My
> question now is: do we have a race after applying the patch? For
> example, if the RT task has not given up the cpu by the time mlock()
> returns, there are pages left in the pagevec that have not been drained
> back to the LRU list. Do we have a problem with that?

This patch doesn't introduce a new race; the current code already has
the following race:

1. call mlock()
2. lru_add_drain_all()
3. another cpu grabs the page into its pagevec
4. actual PG_mlocked processing

I'd like to explain why this code works. Linux has VM_LOCKED in the vma
and PG_mlocked in the page. If we fail to turn on PG_mlocked, we can
recover it in the vmscan phase via VM_LOCKED.

So, the effects of this patch are:
- it increases the race possibility a bit
- it decreases the RT-task problem risk

Thanks.