2012-06-11 00:18:06

by Minchan Kim

Subject: [PATCH] mm: do not use page_count without a page pin

d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
Let's fix it.

[1] http://comments.gmane.org/gmane.linux.kernel.mm/65844

I copy and paste d179e84ba's contents for description.

"It is unsafe to run page_count during the physical pfn scan because
compound_head could trip on a dangling pointer when reading
page->first_page if the compound page is being freed by another CPU."

Cc: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/page_alloc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 266f267..019c4fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 			continue;

 		page = pfn_to_page(check);
-		if (!page_count(page)) {
+		/*
+		 * We can't use page_count withou pin a page
+		 * because another CPU can free compound page.
+		 */
+		if (!atomic_read(&page->_count)) {
 			if (PageBuddy(page))
 				iter += (1 << page_order(page)) - 1;
 			continue;
--
1.7.9.5
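
The difference between the two reads in the hunk above can be modelled in userspace. This is only a sketch: struct mock_page, mock_page_count() and mock_raw_count() are hypothetical stand-ins, not kernel code. It assumes, as the quoted description says, that page_count() resolves the compound head through page->first_page before reading the count, while atomic_read(&page->_count) touches only the page being scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's struct page. */
struct mock_page {
	int is_tail;                  /* models PageTail() */
	struct mock_page *first_page; /* only valid while the compound page is alive */
	int count;                    /* models page->_count */
};

/*
 * Models page_count(): for a tail page the head is looked up first, so
 * first_page is dereferenced. If another CPU frees the compound page in
 * between, that pointer is dangling.
 */
static inline int mock_page_count(const struct mock_page *page)
{
	const struct mock_page *head;

	assert(page != NULL);
	head = page->is_tail ? page->first_page : page;
	return head->count;
}

/* Models atomic_read(&page->_count): reads only the page itself. */
static inline int mock_raw_count(const struct mock_page *page)
{
	assert(page != NULL);
	return page->count;
}
```

In this model a tail page reports its head's count through mock_page_count() but zero through mock_raw_count(), and only the first variant follows first_page.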


2012-06-11 00:23:36

by Wanpeng Li

Subject: Re: [PATCH] mm: do not use page_count without a page pin

On Mon, Jun 11, 2012 at 09:17:51AM +0900, Minchan Kim wrote:
>d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
>Let's fix it.
>
>[1] http://comments.gmane.org/gmane.linux.kernel.mm/65844
>
>I copy and paste d179e84ba's contents for description.
>
>"It is unsafe to run page_count during the physical pfn scan because
>compound_head could trip on a dangling pointer when reading
>page->first_page if the compound page is being freed by another CPU."
>
>Cc: Andrea Arcangeli <[email protected]>
>Cc: Mel Gorman <[email protected]>
>Cc: Michal Hocko <[email protected]>
>Cc: KAMEZAWA Hiroyuki <[email protected]>
>Signed-off-by: Minchan Kim <[email protected]>
>---
> mm/page_alloc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 266f267..019c4fe 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
> continue;
>
> page = pfn_to_page(check);
>- if (!page_count(page)) {
>+ /*
>+ * We can't use page_count withou pin a page
                             ^
                             without
>+ * because another CPU can free compound page.
>+ */
>+ if (!atomic_read(&page->_count)) {
> if (PageBuddy(page))
> iter += (1 << page_order(page)) - 1;
> continue;
>--
>1.7.9.5
>

2012-06-11 02:09:11

by Minchan Kim

Subject: Re: [PATCH] mm: do not use page_count without a page pin

Hi Wanpeng,

On 06/11/2012 09:23 AM, Wanpeng Li wrote:

> On Mon, Jun 11, 2012 at 09:17:51AM +0900, Minchan Kim wrote:
>> d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
>> Let's fix it.
>>
>> [1] http://comments.gmane.org/gmane.linux.kernel.mm/65844
>>
>> I copy and paste d179e84ba's contents for description.
>>
>> "It is unsafe to run page_count during the physical pfn scan because
>> compound_head could trip on a dangling pointer when reading
>> page->first_page if the compound page is being freed by another CPU."
>>
>> Cc: Andrea Arcangeli <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: KAMEZAWA Hiroyuki <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> mm/page_alloc.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 266f267..019c4fe 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
>> continue;
>>
>> page = pfn_to_page(check);
>> - if (!page_count(page)) {
>> + /*
>> + * We can't use page_count withou pin a page
>                              ^
>                              without


I will resend a fixed version once the reviewers have commented.
Thanks!

--
Kind regards,
Minchan Kim

2012-06-11 07:22:30

by Kamezawa Hiroyuki

Subject: Re: [PATCH] mm: do not use page_count without a page pin

(2012/06/11 9:17), Minchan Kim wrote:
> d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
> Let's fix it.
>
> [1] http://comments.gmane.org/gmane.linux.kernel.mm/65844
>
> I copy and paste d179e84ba's contents for description.
>
> "It is unsafe to run page_count during the physical pfn scan because
> compound_head could trip on a dangling pointer when reading
> page->first_page if the compound page is being freed by another CPU."
>
> Cc: Andrea Arcangeli<[email protected]>
> Cc: Mel Gorman<[email protected]>
> Cc: Michal Hocko<[email protected]>
> Cc: KAMEZAWA Hiroyuki<[email protected]>
> Signed-off-by: Minchan Kim<[email protected]>
> ---
> mm/page_alloc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 266f267..019c4fe 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
> continue;
>
> page = pfn_to_page(check);
> - if (!page_count(page)) {
> + /*
> + * We can't use page_count withou pin a page
> + * because another CPU can free compound page.
> + */
> + if (!atomic_read(&page->_count)) {
> if (PageBuddy(page))
> iter += (1<< page_order(page)) - 1;
> continue;
Nice Catch.

Other than the comment fix already pointed out..
Hmm...BTW, it seems this __count_xxx doesn't have any code for THP/Hugepage..
so, we need more fixes for better code, I think.
Hmm, Don't we need !PageTail() check and 'skip thp' code ?

Thanks,
-Kame

2012-06-11 07:44:51

by Andrea Arcangeli

Subject: Re: [PATCH] mm: do not use page_count without a page pin

Hi,

On Mon, Jun 11, 2012 at 04:20:17PM +0900, Kamezawa Hiroyuki wrote:
> (2012/06/11 9:17), Minchan Kim wrote:
> > d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
> > Let's fix it.
> >
> > [1] http://comments.gmane.org/gmane.linux.kernel.mm/65844
> >
> > I copy and paste d179e84ba's contents for description.
> >
> > "It is unsafe to run page_count during the physical pfn scan because
> > compound_head could trip on a dangling pointer when reading
> > page->first_page if the compound page is being freed by another CPU."
> >
> > Cc: Andrea Arcangeli<[email protected]>
> > Cc: Mel Gorman<[email protected]>
> > Cc: Michal Hocko<[email protected]>
> > Cc: KAMEZAWA Hiroyuki<[email protected]>
> > Signed-off-by: Minchan Kim<[email protected]>
> > ---
> > mm/page_alloc.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 266f267..019c4fe 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
> > continue;
> >
> > page = pfn_to_page(check);
> > - if (!page_count(page)) {
> > + /*
> > + * We can't use page_count withou pin a page
> > + * because another CPU can free compound page.
> > + */
> > + if (!atomic_read(&page->_count)) {
> > if (PageBuddy(page))
> > iter += (1<< page_order(page)) - 1;
> > continue;
> Nice Catch.

Agreed!

> Other than the comment fix already pointed out..
> Hmm...BTW, it seems this __count_xxx doesn't have any code for THP/Hugepage..
> so, we need more fixes for better code, I think.
> Hmm, Don't we need !PageTail() check and 'skip thp' code ?

So the page->_count for tail pages is guaranteed zero at all times
(tail page refcounting is done on _mapcount).

We could add a comment that "this check already skips compound tails
of THP because their page->_count is zero at all times".

Instead of a comment we could consider defining an inline function
with a special name that does atomic_read(&page->_count) and use it
when we intend to read the regular or compound head count and return 0 on
tails. It would make it easier to identify these places later if we
ever want to change the refcounting mechanism, but it may be overkill,
it's up to you.

Tail pages also can't be PageLRU.

The code after the patch should already skip thp tails fine (it won't
skip heads but I believe that's intentional, but one problem that
remains is that the heads should increase found by more than 1...).

Thanks,
Andrea
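
The helper Andrea describes could look roughly like the sketch below. It is a userspace model under stated assumptions: struct mock_page, page_count_nohead() and count_pinned_non_lru() are hypothetical names, and C11 atomics stand in for the kernel's atomic_read(). It relies on the point made above that a THP tail's _count is always zero, so a single check skips both free pages and tails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page. */
struct mock_page {
	atomic_int count; /* models page->_count; 0 for free pages and THP tails */
	int lru;          /* models PageLRU() */
};

/*
 * Hypothetical helper name. Reads the page's own count without resolving
 * the compound head, so it returns 0 for tails and needs no page pin.
 */
static inline int page_count_nohead(struct mock_page *page)
{
	assert(page != NULL);
	return atomic_load(&page->count);
}

/* Counts pages that are in use but not on the LRU, as the scan does. */
static inline size_t count_pinned_non_lru(struct mock_page *pages, size_t n)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!page_count_nohead(&pages[i]))
			continue; /* free page or compound tail */
		if (!pages[i].lru)
			found++;
	}
	return found;
}
```

Giving the raw read its own name would make these call sites easy to find later, which is the benefit Andrea mentions; whether that is worth a separate helper is left open in the thread.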

2012-06-11 08:51:01

by Kamezawa Hiroyuki

Subject: Re: [PATCH] mm: do not use page_count without a page pin

(2012/06/11 16:44), Andrea Arcangeli wrote:
> Hi,
>
> On Mon, Jun 11, 2012 at 04:20:17PM +0900, Kamezawa Hiroyuki wrote:
>> (2012/06/11 9:17), Minchan Kim wrote:
>>> d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
>>> Let's fix it.
>>>
>>> [1] http://comments.gmane.org/gmane.linux.kernel.mm/65844
>>>
>>> I copy and paste d179e84ba's contents for description.
>>>
>>> "It is unsafe to run page_count during the physical pfn scan because
>>> compound_head could trip on a dangling pointer when reading
>>> page->first_page if the compound page is being freed by another CPU."
>>>
>>> Cc: Andrea Arcangeli<[email protected]>
>>> Cc: Mel Gorman<[email protected]>
>>> Cc: Michal Hocko<[email protected]>
>>> Cc: KAMEZAWA Hiroyuki<[email protected]>
>>> Signed-off-by: Minchan Kim<[email protected]>
>>> ---
>>> mm/page_alloc.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 266f267..019c4fe 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
>>> continue;
>>>
>>> page = pfn_to_page(check);
>>> - if (!page_count(page)) {
>>> + /*
>>> + * We can't use page_count withou pin a page
>>> + * because another CPU can free compound page.
>>> + */
>>> + if (!atomic_read(&page->_count)) {
>>> if (PageBuddy(page))
>>> iter += (1<< page_order(page)) - 1;
>>> continue;
>> Nice Catch.
>
> Agreed!
>
>> Other than the comment fix already pointed out..
>> Hmm...BTW, it seems this __count_xxx doesn't have any code for THP/Hugepage..
>> so, we need more fixes for better code, I think.
>> Hmm, Don't we need !PageTail() check and 'skip thp' code ?
>
> So the page->_count for tail pages is guaranteed zero at all times
> (tail page refcounting is done on _mapcount).
>
> We could add a comment that "this check already skips compound tails
> of THP because their page->_count is zero at all times".
>

Thank you for clarification.

I'll look into this later. Fortunately, we have a memory-hotplug team
again for our next server, and they should revisit this :)
I'll pass this on to them.

Thanks,
-Kame

> Instead of a comment we could consider defining an inline function
> with a special name that does atomic_read(&page->_count) and use it
> when we intend to read the regular or compound head count and return 0 on
> tails. It would make it easier to identify these places later if we
> ever want to change the refcounting mechanism, but it may be overkill,
> it's up to you.
>
> Tail pages also can't be PageLRU.
>
> The code after the patch should already skip thp tails fine (it won't
> skip heads but I believe that's intentional, but one problem that
> remains is that the heads should increase found by more than 1...).
>
> Thanks,
> Andrea

2012-06-11 13:30:54

by Minchan Kim

Subject: Re: [PATCH] mm: do not use page_count without a page pin

Hi Andrea,

On Mon, Jun 11, 2012 at 09:44:40AM +0200, Andrea Arcangeli wrote:
> Hi,
>
> On Mon, Jun 11, 2012 at 04:20:17PM +0900, Kamezawa Hiroyuki wrote:
> > (2012/06/11 9:17), Minchan Kim wrote:
> > > d179e84ba fixed the problem[1] in vmscan.c but same problem is here.
> > > Let's fix it.
> > >
> > > [1] http://comments.gmane.org/gmane.linux.kernel.mm/65844
> > >
> > > I copy and paste d179e84ba's contents for description.
> > >
> > > "It is unsafe to run page_count during the physical pfn scan because
> > > compound_head could trip on a dangling pointer when reading
> > > page->first_page if the compound page is being freed by another CPU."
> > >
> > > Cc: Andrea Arcangeli<[email protected]>
> > > Cc: Mel Gorman<[email protected]>
> > > Cc: Michal Hocko<[email protected]>
> > > Cc: KAMEZAWA Hiroyuki<[email protected]>
> > > Signed-off-by: Minchan Kim<[email protected]>
> > > ---
> > > mm/page_alloc.c | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 266f267..019c4fe 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
> > > continue;
> > >
> > > page = pfn_to_page(check);
> > > - if (!page_count(page)) {
> > > + /*
> > > + * We can't use page_count withou pin a page
> > > + * because another CPU can free compound page.
> > > + */
> > > + if (!atomic_read(&page->_count)) {
> > > if (PageBuddy(page))
> > > iter += (1<< page_order(page)) - 1;
> > > continue;
> > Nice Catch.
>
> Agreed!
>
> > Other than the comment fix already pointed out..
> > Hmm...BTW, it seems this __count_xxx doesn't have any code for THP/Hugepage..
> > so, we need more fixes for better code, I think.
> > Hmm, Don't we need !PageTail() check and 'skip thp' code ?
>
> So the page->_count for tail pages is guaranteed zero at all times
> (tail page refcounting is done on _mapcount).

Sure.

>
> We could add a comment that "this check already skips compound tails
> of THP because their page->_count is zero at all times".

No problem.

>
> Instead of a comment we could consider defining an inline function
> with a special name that does atomic_read(&page->_count) and use it
> when we intend to read the regular or compound head count and return 0 on
> tails. It would make it easier to identify these places later if we
> ever want to change the refcounting mechanism, but it may be overkill,
> it's up to you.

That's a good idea, but now is not the right time: I don't have much time
for it, and another patch[1] is pending on this one.

I hope it could be another nice cleanup patch later. :)

[1] https://lkml.org/lkml/2012/6/11/169

>
> Tail pages also can't be PageLRU.
>
> The code after the patch should already skip thp tails fine (it won't
> skip heads but I believe that's intentional, but one problem that
> remains is that the heads should increase found by more than 1...).

I failed to parse your last sentence.
Could you elaborate on it?

AFAIUC, you mean we have to increase the reference count of the head page?
If so, that isn't needed in __count_immobile_pages, because it is already a racy function,
so it shouldn't be critical even if a race happens.

If I'm missing something, please let me know.

2012-06-11 14:41:47

by Andrea Arcangeli

Subject: Re: [PATCH] mm: do not use page_count without a page pin

Hi Minchan,

On Mon, Jun 11, 2012 at 10:30:43PM +0900, Minchan Kim wrote:
> AFAIUC, you mean we have to increase the reference count of the head page?
> If so, that isn't needed in __count_immobile_pages, because it is already a racy function,
> so it shouldn't be critical even if a race happens.

I meant, shouldn't we take into account the full size? If it's in the
lru the whole thing can be moved away.

	if (!PageLRU(page)) {
		nr_pages = hpage_nr_pages(page);
		barrier();
		found += nr_pages;
		iter += nr_pages-1;
	}
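
The accounting in that snippet can be traced with a small userspace model. This is a sketch only: struct mock_page, mock_hpage_nr_pages() and count_non_lru_full_size() are hypothetical, MOCK_HPAGE_NR assumes a 2 MiB huge page made of 4 KiB base pages, and barrier() is left out because a single-threaded model has nothing for it to order.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_HPAGE_NR 512 /* assumption: 2 MiB huge page of 4 KiB base pages */

/* Hypothetical userspace stand-in for struct page. */
struct mock_page {
	int count;      /* models page->_count */
	int lru;        /* models PageLRU() */
	int trans_huge; /* models PageTransHuge() on a head page */
};

/* Models hpage_nr_pages(): base pages covered by this page. */
static inline size_t mock_hpage_nr_pages(const struct mock_page *page)
{
	return page->trans_huge ? MOCK_HPAGE_NR : 1;
}

/* Counts non-LRU pages, charging a huge page for its full size. */
static inline size_t count_non_lru_full_size(const struct mock_page *pages,
					     size_t n)
{
	size_t found = 0;

	assert(pages != NULL);
	for (size_t iter = 0; iter < n; iter++) {
		const struct mock_page *page = &pages[iter];

		if (!page->count)
			continue; /* free page or compound tail */
		if (!page->lru) {
			size_t nr_pages = mock_hpage_nr_pages(page);

			found += nr_pages;
			iter += nr_pages - 1; /* the loop increment steps past the last tail */
		}
	}
	return found;
}
```

With one non-LRU huge page followed by one non-LRU base page, the model charges 512 + 1 pages and visits the tails zero times, which is the scan speed-up the snippet is after.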

2012-06-11 22:49:35

by Minchan Kim

Subject: Re: [PATCH] mm: do not use page_count without a page pin

On 06/11/2012 11:41 PM, Andrea Arcangeli wrote:

> Hi Minchan,
>
> On Mon, Jun 11, 2012 at 10:30:43PM +0900, Minchan Kim wrote:
>> AFAIUC, you mean we have to increase the reference count of the head page?
>> If so, that isn't needed in __count_immobile_pages, because it is already a racy function,
>> so it shouldn't be critical even if a race happens.
>
> I meant, shouldn't we take into account the full size? If it's in the
> lru the whole thing can be moved away.
>
> if (!PageLRU(page)) {
> nr_pages = hpage_nr_pages(page);
> barrier();


Could you explain why we need the barrier?

> found += nr_pages;
> iter += nr_pages-1;
> }
>


Thanks for the explanation.

For normal pages, the logic accounts them as "non-movable pages", so for consistency
it seems you're right. But let's think about it a bit.

If a THP page isn't on the LRU and is still PageTransHuge, I think that's rather rare, and
when it does happen it means migration or reclaim is about to split it or isolate/putback it,
so it ends up as movable pages.

IMHO, it would be better to account it as movable pages.
What do you think about it?

Thanks.
--
Kind regards,
Minchan Kim

2012-06-14 01:21:15

by Andrea Arcangeli

Subject: Re: [PATCH] mm: do not use page_count without a page pin

On Tue, Jun 12, 2012 at 07:49:34AM +0900, Minchan Kim wrote:
> If a THP page isn't on the LRU and is still PageTransHuge, I think that's rather rare, and
> when it does happen it means migration or reclaim is about to split it or isolate/putback it,
> so it ends up as movable pages.
>
> IMHO, it would be better to account it as movable pages.
> What do you think about it?

Agreed. Besides, THP doesn't fragment pageblocks. It was just about
speeding up the scan the same way the pagebuddy check does, but it's
probably not worth it because we're in a racy area here, not
holding locks. The pagebuddy check is safe because the zone lock is held;
otherwise it would run into the same problem.

2012-06-14 01:49:50

by Minchan Kim

Subject: Re: [PATCH] mm: do not use page_count without a page pin

On 06/14/2012 10:21 AM, Andrea Arcangeli wrote:

> On Tue, Jun 12, 2012 at 07:49:34AM +0900, Minchan Kim wrote:
>> If a THP page isn't on the LRU and is still PageTransHuge, I think that's rather rare, and
>> when it does happen it means migration or reclaim is about to split it or isolate/putback it,
>> so it ends up as movable pages.
>>
>> IMHO, it would be better to account it as movable pages.
>> What do you think about it?
>
> Agreed. Besides, THP doesn't fragment pageblocks. It was just about
> speeding up the scan the same way the pagebuddy check does, but it's
> probably not worth it because we're in a racy area here, not
> holding locks. The pagebuddy check is safe because the zone lock is held;
> otherwise it would run into the same problem.


Yep. The zone lock is already held, so the pagebuddy check is safe, but THP is still racy, so let's leave it as it is.
If you don't have any more concerns about this patch, could you add your Acked-by to my latest patch so Andrew
can pick it up? Even if you do have a concern, let's handle it in a separate patch, because it would be an optimization and
another patch is pending on this one.

Thanks, Andrea.

--
Kind regards,
Minchan Kim