Hi,
For page migration of CMA, buffer-heads of lru should be dropped.
Please refer to https://lkml.org/lkml/2014/6/23/932 for the detail.
I'm attaching a patch to drop all buffer-head on lru with invalidate_bh_lrus.
Please look into this. Thanks
------------------------------ 8< ---------------------------------
>From d90cd3d13b73f7278c60d8fe625840664adcbb6a Mon Sep 17 00:00:00 2001
From: Gioh Kim <[email protected]>
Date: Fri, 4 Jul 2014 16:53:22 +0900
Subject: [PATCH] [RFC] CMA: clear buffer-head lru before page migration
When CMA try to migrate page, some buffer-heads can exist on lru.
The bh on lru has non-zero count value so that it cannot be dropped
even-if it is not used. We can drop only buffers related to the
migrated page, but it can take long time more than dropping all
because of searching list. There all buffers in lru are dropped.
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Gioh Kim <[email protected]>
---
fs/buffer.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/buffer.c b/fs/buffer.c
index eba6e4f..4f11b7a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3233,6 +3233,19 @@ int try_to_free_buffers(struct page *page)
if (PageWriteback(page))
return 0;
+#ifdef CONFIG_CMA
+ /*
+ * When CMA try to migrate page, some buffer-heads can exist on lru.
+ * The bh on lru has non-zero count value so that it cannot
+ * be dropped even-if it is not used.
+ * We can drop only buffers related to the migrated page,
+ * but it can take long time more than dropping all
+ * because of searching list.
+ * There all buffers in lru are dropped first.
+ */
+ invalidate_bh_lrus();
+#endif
+
if (mapping == NULL) { /* can this still happen? */
ret = drop_buffers(page, &buffers_to_free);
goto out;
--
1.7.9.5
On Fri, 04 Jul 2014 17:25:09 +0900 Gioh Kim <[email protected]> wrote:
> From: Gioh Kim <[email protected]>
> Date: Fri, 4 Jul 2014 16:53:22 +0900
> Subject: [PATCH] [RFC] CMA: clear buffer-head lru before page migration
>
> When CMA try to migrate page, some buffer-heads can exist on lru.
> The bh on lru has non-zero count value so that it cannot be dropped
> even-if it is not used. We can drop only buffers related to the
> migrated page, but it can take long time more than dropping all
> because of searching list. There all buffers in lru are dropped.
>
> Signed-off-by: Laura Abbott <[email protected]>
> Signed-off-by: Gioh Kim <[email protected]>
> ---
> fs/buffer.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index eba6e4f..4f11b7a 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3233,6 +3233,19 @@ int try_to_free_buffers(struct page *page)
> if (PageWriteback(page))
> return 0;
>
> +#ifdef CONFIG_CMA
> + /*
> + * When CMA try to migrate page, some buffer-heads can exist on lru.
> + * The bh on lru has non-zero count value so that it cannot
> + * be dropped even-if it is not used.
> + * We can drop only buffers related to the migrated page,
> + * but it can take long time more than dropping all
> + * because of searching list.
> + * There all buffers in lru are dropped first.
> + */
> + invalidate_bh_lrus();
> +#endif
No, this will be tremendously expensive.
What I proposed is that CMA call invalidate_bh_lrus() right at the
outset. Something along the lines of
--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -6329,6 +6329,14 @@ int alloc_contig_range(unsigned long sta
};
INIT_LIST_HEAD(&cc.migratepages);
+#ifdef CONFIG_CMA
+ /*
+ * Comment goes here
+ */
+ if (migratetype == MIGRATE_CMA)
+ invalidate_bh_lrus();
+#endif
+
/*
* What we do here is we mark all pageblocks in range as
* MIGRATE_ISOLATE. Because pageblock and max order pages may
- I'd have thought that it would make sense to do this for huge pages
as well (MIGRATE_MOVABLE) but nobody really seems to know.
- There's a patch floating around ("Allow increasing the buffer-head
per-CPU LRU size") which will double the size of the bh lrus, so this
all becomes more important.
- alloc_contig_range() does lru_add_drain_all() and drain_all_pages()
*after* performing the allocation. I can't work out why this is the
case and of course it is undocumented. If this is indeed not a bug
then probably the invalidate_bh_lrus() should happen in the same
place.
It's my fault.
I'm going to send another patch ASAP.
2014-07-08 오전 7:52, Andrew Morton 쓴 글:
> On Fri, 04 Jul 2014 17:25:09 +0900 Gioh Kim <[email protected]> wrote:
>
>> From: Gioh Kim <[email protected]>
>> Date: Fri, 4 Jul 2014 16:53:22 +0900
>> Subject: [PATCH] [RFC] CMA: clear buffer-head lru before page migration
>>
>> When CMA try to migrate page, some buffer-heads can exist on lru.
>> The bh on lru has non-zero count value so that it cannot be dropped
>> even-if it is not used. We can drop only buffers related to the
>> migrated page, but it can take long time more than dropping all
>> because of searching list. There all buffers in lru are dropped.
>>
>> Signed-off-by: Laura Abbott <[email protected]>
>> Signed-off-by: Gioh Kim <[email protected]>
>> ---
>> fs/buffer.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index eba6e4f..4f11b7a 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -3233,6 +3233,19 @@ int try_to_free_buffers(struct page *page)
>> if (PageWriteback(page))
>> return 0;
>>
>> +#ifdef CONFIG_CMA
>> + /*
>> + * When CMA try to migrate page, some buffer-heads can exist on lru.
>> + * The bh on lru has non-zero count value so that it cannot
>> + * be dropped even-if it is not used.
>> + * We can drop only buffers related to the migrated page,
>> + * but it can take long time more than dropping all
>> + * because of searching list.
>> + * There all buffers in lru are dropped first.
>> + */
>> + invalidate_bh_lrus();
>> +#endif
>
> No, this will be tremendously expensive.
>
> What I proposed is that CMA call invalidate_bh_lrus() right at the
> outset. Something along the lines of
>
> --- a/mm/page_alloc.c~a
> +++ a/mm/page_alloc.c
> @@ -6329,6 +6329,14 @@ int alloc_contig_range(unsigned long sta
> };
> INIT_LIST_HEAD(&cc.migratepages);
>
> +#ifdef CONFIG_CMA
> + /*
> + * Comment goes here
> + */
> + if (migratetype == MIGRATE_CMA)
> + invalidate_bh_lrus();
> +#endif
> +
> /*
> * What we do here is we mark all pageblocks in range as
> * MIGRATE_ISOLATE. Because pageblock and max order pages may
>
>
> - I'd have thought that it would make sense to do this for huge pages
> as well (MIGRATE_MOVABLE) but nobody really seems to know.
>
> - There's a patch floating around ("Allow increasing the buffer-head
> per-CPU LRU size") which will double the size of the bh lrus, so this
> all becomes more important.
>
> - alloc_contig_range() does lru_add_drain_all() and drain_all_pages()
> *after* performing the allocation. I can't work out why this is the
> case and of course it is undocumented. If this is indeed not a bug
> then probably the invalidate_bh_lrus() should happen in the same
> place.
>
On Tue, 08 Jul 2014 13:44:04 +0900 Gioh Kim <[email protected]> wrote:
> 2014-07-08 ______ 7:52, Andrew Morton ___ ___:
> > On Fri, 04 Jul 2014 17:25:09 +0900 Gioh Kim <[email protected]> wrote:
> >
> >> From: Gioh Kim <[email protected]>
> >> Date: Fri, 4 Jul 2014 16:53:22 +0900
> >> Subject: [PATCH] [RFC] CMA: clear buffer-head lru before page migration
> >>
> >> When CMA try to migrate page, some buffer-heads can exist on lru.
> >> The bh on lru has non-zero count value so that it cannot be dropped
> >> even-if it is not used. We can drop only buffers related to the
> >> migrated page, but it can take long time more than dropping all
> >> because of searching list. There all buffers in lru are dropped.
> >>
> >> Signed-off-by: Laura Abbott <[email protected]>
> >> Signed-off-by: Gioh Kim <[email protected]>
> >> ---
> >> fs/buffer.c | 13 +++++++++++++
> >> 1 file changed, 13 insertions(+)
> >>
> >> diff --git a/fs/buffer.c b/fs/buffer.c
> >> index eba6e4f..4f11b7a 100644
> >> --- a/fs/buffer.c
> >> +++ b/fs/buffer.c
> >> @@ -3233,6 +3233,19 @@ int try_to_free_buffers(struct page *page)
> >> if (PageWriteback(page))
> >> return 0;
> >>
> >> +#ifdef CONFIG_CMA
> >> + /*
> >> + * When CMA try to migrate page, some buffer-heads can exist on lru.
> >> + * The bh on lru has non-zero count value so that it cannot
> >> + * be dropped even-if it is not used.
> >> + * We can drop only buffers related to the migrated page,
> >> + * but it can take long time more than dropping all
> >> + * because of searching list.
> >> + * There all buffers in lru are dropped first.
> >> + */
> >> + invalidate_bh_lrus();
> >> +#endif
> >
> > No, this will be tremendously expensive.
> >
> > What I proposed is that CMA call invalidate_bh_lrus() right at the
> > outset. Something along the lines of
Please do not top-post your email replies - it makes it very hard to
conduct a coherent discussion.
> It's my fault.
> I'm going to send another patch ASAP.
No, not "ASAP". Such a patch will require careful testing on numerous
system configurations and workloads. Take however much time it needs
to get it right.
On Mon, Jul 07 2014, Andrew Morton <[email protected]> wrote:
> What I proposed is that CMA call invalidate_bh_lrus() right at the
> outset. Something along the lines of
>
> --- a/mm/page_alloc.c~a
> +++ a/mm/page_alloc.c
> @@ -6329,6 +6329,14 @@ int alloc_contig_range(unsigned long sta
> };
> INIT_LIST_HEAD(&cc.migratepages);
>
> +#ifdef CONFIG_CMA
> + /*
> + * Comment goes here
> + */
> + if (migratetype == MIGRATE_CMA)
> + invalidate_bh_lrus();
> +#endif
> +
This seems reasonable, except I think it should go after
start_isolate_page_range call because otherwise there's no guarantee
that someone won't grab those pages back.
Also to avoid the #ifdef perhaps we want this as well:
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6cbd1b6..2640a55 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -64,10 +64,11 @@ enum {
};
#ifdef CONFIG_CMA
-# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+# define __is_migrate_cma(migratetype) ((migratetype) == MIGRATE_CMA)
#else
-# define is_migrate_cma(migratetype) false
+# define __is_migrate_cma(migratetype) false
#endif
+#define is_migrate_cma(migratetype) unlikely(__is_migrate_cma(migratetype))
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
and then use “if (__is_migrate_cma(migratetype))”.
> /*
> * What we do here is we mark all pageblocks in range as
> * MIGRATE_ISOLATE. Because pageblock and max order pages may
>
>
> - I'd have thought that it would make sense to do this for huge pages
> as well (MIGRATE_MOVABLE) but nobody really seems to know.
>
> - There's a patch floating around ("Allow increasing the buffer-head
> per-CPU LRU size") which will double the size of the bh lrus, so this
> all becomes more important.
>
> - alloc_contig_range() does lru_add_drain_all() and drain_all_pages()
> *after* performing the allocation. I can't work out why this is the
> case and of course it is undocumented. If this is indeed not a bug
> then probably the invalidate_bh_lrus() should happen in the same
> place.
The purpose is to get free non-buddy pages (so pages on PCP lists for
instance) back onto the buddy list. It's safe to move those calls above
the call to __alloc_contig_migrate_range, but I don't think it will
change anything (except of course the fact that if migration fails,
we'll do the draining for nothing).
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--
On Tue, Jul 08, 2014 at 06:46:31PM +0200, Michal Nazarewicz wrote:
> On Mon, Jul 07 2014, Andrew Morton <[email protected]> wrote:
> > What I proposed is that CMA call invalidate_bh_lrus() right at the
> > outset. Something along the lines of
> >
> > --- a/mm/page_alloc.c~a
> > +++ a/mm/page_alloc.c
> > @@ -6329,6 +6329,14 @@ int alloc_contig_range(unsigned long sta
> > };
> > INIT_LIST_HEAD(&cc.migratepages);
> >
> > +#ifdef CONFIG_CMA
> > + /*
> > + * Comment goes here
> > + */
> > + if (migratetype == MIGRATE_CMA)
> > + invalidate_bh_lrus();
> > +#endif
> > +
>
> This seems reasonable, except I think it should go after
> start_isolate_page_range call because otherwise there's no guarantee
> that someone won't grab those pages back.
>
> Also to avoid the #ifdef perhaps we want this as well:
I think that we just want to remove ifdef CONFIG_CMA on above code
snippet, because invalidate_bh_lrus() would also help user of
alloc_contig_range() with MIGRATE_MOVABLE.
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6cbd1b6..2640a55 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -64,10 +64,11 @@ enum {
> };
>
> #ifdef CONFIG_CMA
> -# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +# define __is_migrate_cma(migratetype) ((migratetype) == MIGRATE_CMA)
> #else
> -# define is_migrate_cma(migratetype) false
> +# define __is_migrate_cma(migratetype) false
> #endif
> +#define is_migrate_cma(migratetype) unlikely(__is_migrate_cma(migratetype))
>
> #define for_each_migratetype_order(order, type) \
> for (order = 0; order < MAX_ORDER; order++) \
>
> and then use “if (__is_migrate_cma(migratetype))”.
>
> > /*
> > * What we do here is we mark all pageblocks in range as
> > * MIGRATE_ISOLATE. Because pageblock and max order pages may
> >
> >
> > - I'd have thought that it would make sense to do this for huge pages
> > as well (MIGRATE_MOVABLE) but nobody really seems to know.
> >
> > - There's a patch floating around ("Allow increasing the buffer-head
> > per-CPU LRU size") which will double the size of the bh lrus, so this
> > all becomes more important.
> >
> > - alloc_contig_range() does lru_add_drain_all() and drain_all_pages()
> > *after* performing the allocation. I can't work out why this is the
> > case and of course it is undocumented. If this is indeed not a bug
> > then probably the invalidate_bh_lrus() should happen in the same
> > place.
>
> The purpose is to get free non-buddy pages (so pages on PCP lists for
> instance) back onto the buddy list. It's safe to move those calls above
> the call to __alloc_contig_migrate_range, but I don't think it will
> change anything (except of course the fact that if migration fails,
> we'll do the draining for nothing).
At a glance, we don't need that drain_all_pages(), because
drain_all_pages() is also called by set_migratetype_isolate() after changing
migratetype.
And, it is better to move up lru_add_drain_all() to ahead of
__alloc_contig_migrate_range(), because some pages could be skipped
to migrate due to this lru page caching mechanism.
Thanks.
On Mon, Jul 14 2014, Joonsoo Kim <[email protected]> wrote:
> On Tue, Jul 08, 2014 at 06:46:31PM +0200, Michal Nazarewicz wrote:
>> This seems reasonable, except I think [invalidate_bh_lrus()] should
>> go after start_isolate_page_range call because otherwise there's no
>> guarantee that someone won't grab those pages back.
>>
>> Also to avoid the #ifdef perhaps we want this as well:
>
> I think that we just want to remove ifdef CONFIG_CMA on above code
> snippet, because invalidate_bh_lrus() would also help user of
> alloc_contig_range() with MIGRATE_MOVABLE.
Sounds good to me.
>> The purpose is to get free non-buddy pages (so pages on PCP lists for
>> instance) back onto the buddy list. It's safe to move those calls above
>> the call to __alloc_contig_migrate_range, but I don't think it will
>> change anything (except of course the fact that if migration fails,
>> we'll do the draining for nothing).
> At a glance, we don't need that drain_all_pages(), because
> drain_all_pages() is also called by set_migratetype_isolate() after
> changing migratetype.
You are likely correct.
> And, it is better to move up lru_add_drain_all() to ahead of
> __alloc_contig_migrate_range(), because some pages could be skipped
> to migrate due to this lru page caching mechanism.
Again, sounds good to me.
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--
On Mon, 14 Jul 2014 16:02:25 +0900 Joonsoo Kim <[email protected]> wrote:
> On Tue, Jul 08, 2014 at 06:46:31PM +0200, Michal Nazarewicz wrote:
> > On Mon, Jul 07 2014, Andrew Morton <[email protected]> wrote:
> > > What I proposed is that CMA call invalidate_bh_lrus() right at the
> > > outset. Something along the lines of
> > >
> > > --- a/mm/page_alloc.c~a
> > > +++ a/mm/page_alloc.c
> > > @@ -6329,6 +6329,14 @@ int alloc_contig_range(unsigned long sta
> > > };
> > > INIT_LIST_HEAD(&cc.migratepages);
> > >
> > > +#ifdef CONFIG_CMA
> > > + /*
> > > + * Comment goes here
> > > + */
> > > + if (migratetype == MIGRATE_CMA)
> > > + invalidate_bh_lrus();
> > > +#endif
> > > +
> >
> > This seems reasonable, except I think it should go after
> > start_isolate_page_range call because otherwise there's no guarantee
> > that someone won't grab those pages back.
> >
> > Also to avoid the #ifdef perhaps we want this as well:
>
> I think that we just want to remove ifdef CONFIG_CMA on above code
> snippet, because invalidate_bh_lrus() would also help user of
> alloc_contig_range() with MIGRATE_MOVABLE.
That's what I believe also. I pinged Mel and Johannes off-list and Mel
said "I hit it, the invalidation cost wasn't worth it for a THP alloc".
So hm. I do think it's worth additional investigation but some careful
testing would be needed to demonstrate that it's worthwhile. If the
invalidation cost is hurting then perhaps additional logic will be
needed to perform the invalidation only as a last-resort thing.
2014-07-15 오전 5:37, Andrew Morton 쓴 글:
> On Mon, 14 Jul 2014 16:02:25 +0900 Joonsoo Kim <[email protected]> wrote:
>
>> On Tue, Jul 08, 2014 at 06:46:31PM +0200, Michal Nazarewicz wrote:
>>> On Mon, Jul 07 2014, Andrew Morton <[email protected]> wrote:
>>>> What I proposed is that CMA call invalidate_bh_lrus() right at the
>>>> outset. Something along the lines of
>>>>
>>>> --- a/mm/page_alloc.c~a
>>>> +++ a/mm/page_alloc.c
>>>> @@ -6329,6 +6329,14 @@ int alloc_contig_range(unsigned long sta
>>>> };
>>>> INIT_LIST_HEAD(&cc.migratepages);
>>>>
>>>> +#ifdef CONFIG_CMA
>>>> + /*
>>>> + * Comment goes here
>>>> + */
>>>> + if (migratetype == MIGRATE_CMA)
>>>> + invalidate_bh_lrus();
>>>> +#endif
>>>> +
>>>
>>> This seems reasonable, except I think it should go after
>>> start_isolate_page_range call because otherwise there's no guarantee
>>> that someone won't grab those pages back.
>>>
>>> Also to avoid the #ifdef perhaps we want this as well:
>>
>> I think that we just want to remove ifdef CONFIG_CMA on above code
>> snippet, because invalidate_bh_lrus() would also help user of
>> alloc_contig_range() with MIGRATE_MOVABLE.
>
> That's what I believe also. I pinged Mel and Johannes off-list and Mel
> said "I hit it, the invalidation cost wasn't worth it for a THP alloc".
>
> So hm. I do think it's worth additional investigation but some careful
> testing would be needed to demonstrate that it's worthwhile. If the
> invalidation cost is hurting then perhaps additional logic will be
> needed to perform the invalidation only as a last-resort thing.
>
>
>
Adding invalidate_bh_lrus() between start_isolate_page_range and
__alloc_contig_migrate_range is working well on my platform.
But I'd like to test performance before sending patch.
Would anybody recommend me a benchmark tool?