2016-03-29 16:39:48

by Steve Capper

Subject: [PATCH] mm: Exclude HugeTLB pages from THP page_mapped logic

HugeTLB pages cannot be split, thus they use the compound_mapcount to
track rmaps.

Currently the page_mapped function will check the compound_mapcount, but
will also go through the constituent pages of a THP compound page and
query the individual _mapcounts too.

Unfortunately, the page_mapped function does not distinguish between
HugeTLB and THP compound pages, and assumes that a compound page always
needs to have HPAGE_PMD_NR constituent pages queried.

For most cases when dealing with HugeTLB this is just inefficient, but
for scenarios where the HugeTLB page size is less than the pmd block
size (e.g. when using the contiguous bit on ARM) this can lead to crashes.

This patch adjusts the page_mapped function such that we skip the
unnecessary THP reference checks for HugeTLB pages.

Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Steve Capper <[email protected]>
---

Hi,

This patch is my approach to fixing a problem that was unearthed with
HugeTLB pages on arm64. We ran with PAGE_SIZE=64KB and placed down 32
contiguous ptes to create 2MB HugeTLB pages. (We can provide hints to
the MMU that page table entries are contiguous, so that larger TLB
entries can be used to represent them.)

The PMD_SIZE was 512MB, thus the old version of page_mapped would read
through too many struct pages and trigger BUGs.

Original problem reported here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-March/414657.html

Having examined the HugeTLB code, I understand that only the
compound_mapcount_ptr is used to track rmap presence, so going through
the individual _mapcounts for HugeTLB pages is superfluous? Or should I
instead post a patch that changes hpage_nr_pages to use the compound
order?

Also, for the sake of readability, would it be worth changing the
definition of PageTransHuge to refer to only THPs (not both HugeTLB
and THP)?

(I misinterpreted PageTransHuge in hpage_nr_pages initially, which is
one reason it took me longer than normal to pin down this issue.)

Cheers,
--
Steve

---
include/linux/mm.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed6407d..4b223dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1031,6 +1031,8 @@ static inline bool page_mapped(struct page *page)
page = compound_head(page);
if (atomic_read(compound_mapcount_ptr(page)) >= 0)
return true;
+ if (PageHuge(page))
+ return false;
for (i = 0; i < hpage_nr_pages(page); i++) {
if (atomic_read(&page[i]._mapcount) >= 0)
return true;
--
2.1.0


2016-03-29 16:51:54

by Kirill A. Shutemov

Subject: Re: [PATCH] mm: Exclude HugeTLB pages from THP page_mapped logic

On Tue, Mar 29, 2016 at 05:39:41PM +0100, Steve Capper wrote:
> HugeTLB pages cannot be split, thus use the compound_mapcount to
> track rmaps.
>
> Currently the page_mapped function will check the compound_mapcount, but
> will also go through the constituent pages of a THP compound page and
> query the individual _mapcount's too.
>
> Unfortunately, the page_mapped function does not distinguish between
> HugeTLB and THP compound pages and assumes that a compound page always
> needs to have HPAGE_PMD_NR pages querying.
>
> For most cases when dealing with HugeTLB this is just inefficient, but
> for scenarios where the HugeTLB page size is less than the pmd block
> size (e.g. when using contiguous bit on ARM) this can lead to crashes.
>
> This patch adjusts the page_mapped function such that we skip the
> unnecessary THP reference checks for HugeTLB pages.
>
> Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
> Cc: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Steve Capper <[email protected]>

Acked-by: Kirill A. Shutemov <[email protected]>

> ---
>
> Hi,
>
> This patch is my approach to fixing a problem that unearthed with
> HugeTLB pages on arm64. We ran with PAGE_SIZE=64KB and placed down 32
> contiguous ptes to create 2MB HugeTLB pages. (We can provide hints to
> the MMU that page table entries are contiguous thus larger TLB entries
> can be used to represent them).
>
> The PMD_SIZE was 512MB thus the old version of page_mapped would read
> through too many struct pages and lead to BUGs.
>
> Original problem reported here:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-March/414657.html
>
> Having examined the HugeTLB code, I understand that only the
> compound_mapcount_ptr is used to track rmap presence so going through
> the individual _mapcounts for HugeTLB pages is superfluous? Or should I
> instead post a patch that changes hpage_nr_pages to use the compound
> order?

I would not touch hpage_nr_pages().

We probably need to introduce compound_nr_pages() or something similar
to replace (1 << compound_order(page)), to be used independently of
thp/hugetlb pages.

> Also, for the sake of readability, would it be worth changing the
> definition of PageTransHuge to refer to only THPs (not both HugeTLB
> and THP)?

I don't think so.

That would have overhead, since we would need to do a function call
inside PageTransHuge(); PageHuge() is not inlinable.

hugetlb diverges from the rest of mm pretty early, so thp vs. hugetlb
confusion is not that common. We just don't share enough codepaths.

> (I misinterpreted PageTransHuge in hpage_nr_pages initially which is one
> reason this problem took me longer than normal to pin down this issue).
>
> Cheers,
> --
> Steve
>
> ---
> include/linux/mm.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ed6407d..4b223dc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1031,6 +1031,8 @@ static inline bool page_mapped(struct page *page)
> page = compound_head(page);
> if (atomic_read(compound_mapcount_ptr(page)) >= 0)
> return true;
> + if (PageHuge(page))
> + return false;
> for (i = 0; i < hpage_nr_pages(page); i++) {
> if (atomic_read(&page[i]._mapcount) >= 0)
> return true;
> --
> 2.1.0
>

--
Kirill A. Shutemov

2016-03-30 09:24:58

by Steve Capper

Subject: Re: [PATCH] mm: Exclude HugeTLB pages from THP page_mapped logic

On Tue, Mar 29, 2016 at 07:51:49PM +0300, Kirill A. Shutemov wrote:
> On Tue, Mar 29, 2016 at 05:39:41PM +0100, Steve Capper wrote:
> > HugeTLB pages cannot be split, thus use the compound_mapcount to
> > track rmaps.
> >
> > Currently the page_mapped function will check the compound_mapcount, but
> > will also go through the constituent pages of a THP compound page and
> > query the individual _mapcount's too.
> >
> > Unfortunately, the page_mapped function does not distinguish between
> > HugeTLB and THP compound pages and assumes that a compound page always
> > needs to have HPAGE_PMD_NR pages querying.
> >
> > For most cases when dealing with HugeTLB this is just inefficient, but
> > for scenarios where the HugeTLB page size is less than the pmd block
> > size (e.g. when using contiguous bit on ARM) this can lead to crashes.
> >
> > This patch adjusts the page_mapped function such that we skip the
> > unnecessary THP reference checks for HugeTLB pages.
> >
> > Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
> > Cc: Kirill A. Shutemov <[email protected]>
> > Signed-off-by: Steve Capper <[email protected]>
>
> Acked-by: Kirill A. Shutemov <[email protected]>

Thanks!

>
> > ---
> >
> > Hi,
> >
> > This patch is my approach to fixing a problem that unearthed with
> > HugeTLB pages on arm64. We ran with PAGE_SIZE=64KB and placed down 32
> > contiguous ptes to create 2MB HugeTLB pages. (We can provide hints to
> > the MMU that page table entries are contiguous thus larger TLB entries
> > can be used to represent them).
> >
> > The PMD_SIZE was 512MB thus the old version of page_mapped would read
> > through too many struct pages and lead to BUGs.
> >
> > Original problem reported here:
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2016-March/414657.html
> >
> > Having examined the HugeTLB code, I understand that only the
> > compound_mapcount_ptr is used to track rmap presence so going through
> > the individual _mapcounts for HugeTLB pages is superfluous? Or should I
> > instead post a patch that changes hpage_nr_pages to use the compound
> > order?
>
> I would not touch hpage_nr_page().
>
> We probably need to introduce compound_nr_pages() or something to replace
> (1 << compound_order(page)) to be used independetely from thp/hugetlb
> pages.

Okay, I will stick with the approach in this patch. With HugeTLB we also
have hstate information to use.

>
> > Also, for the sake of readability, would it be worth changing the
> > definition of PageTransHuge to refer to only THPs (not both HugeTLB
> > and THP)?
>
> I don't think so.
>
> That would have overhead, since we wound need to do function call inside
> PageTransHuge(). HugeTLB() is not inlinable.

Ahh, I hadn't considered that...

>
> hugetlb deverges from rest of mm pretty early, so thp vs. hugetlb
> confusion is not that ofter. We just don't share enough codepath.

Thanks Kirill, agreed.

Cheers,
--
Steve

2016-03-31 23:06:53

by Andrew Morton

Subject: Re: [PATCH] mm: Exclude HugeTLB pages from THP page_mapped logic

On Tue, 29 Mar 2016 17:39:41 +0100 Steve Capper <[email protected]> wrote:

> HugeTLB pages cannot be split, thus use the compound_mapcount to
> track rmaps.
>
> Currently the page_mapped function will check the compound_mapcount, but

s/the page_mapped function/page_mapped()/. It's so much simpler!

> will also go through the constituent pages of a THP compound page and
> query the individual _mapcount's too.
>
> Unfortunately, the page_mapped function does not distinguish between
> HugeTLB and THP compound pages and assumes that a compound page always
> needs to have HPAGE_PMD_NR pages querying.
>
> For most cases when dealing with HugeTLB this is just inefficient, but
> for scenarios where the HugeTLB page size is less than the pmd block
> size (e.g. when using contiguous bit on ARM) this can lead to crashes.
>
> This patch adjusts the page_mapped function such that we skip the
> unnecessary THP reference checks for HugeTLB pages.
>
> Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
> Cc: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Steve Capper <[email protected]>
> ---
>
> Hi,
>
> This patch is my approach to fixing a problem that unearthed with
> HugeTLB pages on arm64. We ran with PAGE_SIZE=64KB and placed down 32
> contiguous ptes to create 2MB HugeTLB pages. (We can provide hints to
> the MMU that page table entries are contiguous thus larger TLB entries
> can be used to represent them).

So which kernel version(s) need this patch? I think both 4.4 and 4.5
will crash in this manner? Should we backport the fix into 4.4.x and
4.5.x?

>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1031,6 +1031,8 @@ static inline bool page_mapped(struct page *page)
> page = compound_head(page);
> if (atomic_read(compound_mapcount_ptr(page)) >= 0)
> return true;
> + if (PageHuge(page))
> + return false;
> for (i = 0; i < hpage_nr_pages(page); i++) {
> if (atomic_read(&page[i]._mapcount) >= 0)
> return true;

page_mapped() is moronically huge. Uninlining it saves 206 bytes per
callsite. It has 40+ callsites.




btw, is anyone else seeing this `make M=' breakage?

akpm3:/usr/src/25> make M=mm
Makefile:679: Cannot use CONFIG_KCOV: -fsanitize-coverage=trace-pc is not supported by compiler

WARNING: Symbol version dump ./Module.symvers
is missing; modules will have no dependencies and modversions.

make[1]: *** No rule to make target `mm/filemap.o', needed by `mm/built-in.o'. Stop.
make: *** [_module_mm] Error 2

It's a post-4.5 thing.



From: Andrew Morton <[email protected]>
Subject: mm: uninline page_mapped()

It's huge. Uninlining it saves 206 bytes per callsite. Shaves 4924 bytes
from the x86_64 allmodconfig vmlinux.

Cc: Steve Capper <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

include/linux/mm.h | 21 +--------------------
mm/util.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 20 deletions(-)

diff -puN include/linux/mm.h~mm-uninline-page_mapped include/linux/mm.h
--- a/include/linux/mm.h~mm-uninline-page_mapped
+++ a/include/linux/mm.h
@@ -1019,26 +1019,7 @@ static inline pgoff_t page_file_index(st
return page->index;
}

-/*
- * Return true if this page is mapped into pagetables.
- * For compound page it returns true if any subpage of compound page is mapped.
- */
-static inline bool page_mapped(struct page *page)
-{
- int i;
- if (likely(!PageCompound(page)))
- return atomic_read(&page->_mapcount) >= 0;
- page = compound_head(page);
- if (atomic_read(compound_mapcount_ptr(page)) >= 0)
- return true;
- if (PageHuge(page))
- return false;
- for (i = 0; i < hpage_nr_pages(page); i++) {
- if (atomic_read(&page[i]._mapcount) >= 0)
- return true;
- }
- return false;
-}
+bool page_mapped(struct page *page);

/*
* Return true only if the page has been allocated with
diff -puN mm/util.c~mm-uninline-page_mapped mm/util.c
--- a/mm/util.c~mm-uninline-page_mapped
+++ a/mm/util.c
@@ -346,6 +346,28 @@ void *page_rmapping(struct page *page)
return __page_rmapping(page);
}

+/*
+ * Return true if this page is mapped into pagetables.
+ * For compound page it returns true if any subpage of compound page is mapped.
+ */
+bool page_mapped(struct page *page)
+{
+ int i;
+ if (likely(!PageCompound(page)))
+ return atomic_read(&page->_mapcount) >= 0;
+ page = compound_head(page);
+ if (atomic_read(compound_mapcount_ptr(page)) >= 0)
+ return true;
+ if (PageHuge(page))
+ return false;
+ for (i = 0; i < hpage_nr_pages(page); i++) {
+ if (atomic_read(&page[i]._mapcount) >= 0)
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL(page_mapped);
+
struct anon_vma *page_anon_vma(struct page *page)
{
unsigned long mapping;
_

2016-04-01 13:24:18

by Steve Capper

Subject: Re: [PATCH] mm: Exclude HugeTLB pages from THP page_mapped logic

Hi Andrew,

On Thu, Mar 31, 2016 at 04:06:50PM -0700, Andrew Morton wrote:
> On Tue, 29 Mar 2016 17:39:41 +0100 Steve Capper <[email protected]> wrote:
>
> > HugeTLB pages cannot be split, thus use the compound_mapcount to
> > track rmaps.
> >
> > Currently the page_mapped function will check the compound_mapcount, but
>
> s/the page_mapped function/page_mapped()/. It's so much simpler!

Thanks, agreed :-).

>
> > will also go through the constituent pages of a THP compound page and
> > query the individual _mapcount's too.
> >
> > Unfortunately, the page_mapped function does not distinguish between
> > HugeTLB and THP compound pages and assumes that a compound page always
> > needs to have HPAGE_PMD_NR pages querying.
> >
> > For most cases when dealing with HugeTLB this is just inefficient, but
> > for scenarios where the HugeTLB page size is less than the pmd block
> > size (e.g. when using contiguous bit on ARM) this can lead to crashes.
> >
> > This patch adjusts the page_mapped function such that we skip the
> > unnecessary THP reference checks for HugeTLB pages.
> >
> > Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
> > Cc: Kirill A. Shutemov <[email protected]>
> > Signed-off-by: Steve Capper <[email protected]>
> > ---
> >
> > Hi,
> >
> > This patch is my approach to fixing a problem that unearthed with
> > HugeTLB pages on arm64. We ran with PAGE_SIZE=64KB and placed down 32
> > contiguous ptes to create 2MB HugeTLB pages. (We can provide hints to
> > the MMU that page table entries are contiguous thus larger TLB entries
> > can be used to represent them).
>
> So which kernel version(s) need this patch? I think both 4.4 and 4.5
> will crash in this manner? Should we backport the fix into 4.4.x and
> 4.5.x?

We deactivated the contiguous hint support just before 4.5 (as we ran
into the problem too late), so no kernels are currently crashing due to
this. If this goes in, we can then re-enable the contiguous hint on ARM.

>
> >
> > ...
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1031,6 +1031,8 @@ static inline bool page_mapped(struct page *page)
> > page = compound_head(page);
> > if (atomic_read(compound_mapcount_ptr(page)) >= 0)
> > return true;
> > + if (PageHuge(page))
> > + return false;
> > for (i = 0; i < hpage_nr_pages(page); i++) {
> > if (atomic_read(&page[i]._mapcount) >= 0)
> > return true;
>
> page_mapped() is moronically huge. Uninlining it saves 206 bytes per
> callsite. It has 40+ callsites.
>
>
>
>
> btw, is anyone else seeing this `make M=' breakage?
>
> akpm3:/usr/src/25> make M=mm
> Makefile:679: Cannot use CONFIG_KCOV: -fsanitize-coverage=trace-pc is not supported by compiler
>
> WARNING: Symbol version dump ./Module.symvers
> is missing; modules will have no dependencies and modversions.
>
> make[1]: *** No rule to make target `mm/filemap.o', needed by `mm/built-in.o'. Stop.
> make: *** [_module_mm] Error 2
>
> It's a post-4.5 thing.

Sorry, I have not yet tried out KCOV.

>
>
>
> From: Andrew Morton <[email protected]>
> Subject: mm: uninline page_mapped()
>
> It's huge. Uninlining it saves 206 bytes per callsite. Shaves 4924 bytes
> from the x86_64 allmodconfig vmlinux.
>
> Cc: Steve Capper <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---

The below looks reasonable to me; I don't have any benchmarks handy to
test for a performance regression on this, though.

>
> include/linux/mm.h | 21 +--------------------
> mm/util.c | 22 ++++++++++++++++++++++
> 2 files changed, 23 insertions(+), 20 deletions(-)
>
> diff -puN include/linux/mm.h~mm-uninline-page_mapped include/linux/mm.h
> --- a/include/linux/mm.h~mm-uninline-page_mapped
> +++ a/include/linux/mm.h
> @@ -1019,26 +1019,7 @@ static inline pgoff_t page_file_index(st
> return page->index;
> }
>
> -/*
> - * Return true if this page is mapped into pagetables.
> - * For compound page it returns true if any subpage of compound page is mapped.
> - */
> -static inline bool page_mapped(struct page *page)
> -{
> - int i;
> - if (likely(!PageCompound(page)))
> - return atomic_read(&page->_mapcount) >= 0;
> - page = compound_head(page);
> - if (atomic_read(compound_mapcount_ptr(page)) >= 0)
> - return true;
> - if (PageHuge(page))
> - return false;
> - for (i = 0; i < hpage_nr_pages(page); i++) {
> - if (atomic_read(&page[i]._mapcount) >= 0)
> - return true;
> - }
> - return false;
> -}
> +bool page_mapped(struct page *page);
>
> /*
> * Return true only if the page has been allocated with
> diff -puN mm/util.c~mm-uninline-page_mapped mm/util.c
> --- a/mm/util.c~mm-uninline-page_mapped
> +++ a/mm/util.c
> @@ -346,6 +346,28 @@ void *page_rmapping(struct page *page)
> return __page_rmapping(page);
> }
>
> +/*
> + * Return true if this page is mapped into pagetables.
> + * For compound page it returns true if any subpage of compound page is mapped.
> + */
> +bool page_mapped(struct page *page)
> +{
> + int i;
> + if (likely(!PageCompound(page)))
> + return atomic_read(&page->_mapcount) >= 0;
> + page = compound_head(page);
> + if (atomic_read(compound_mapcount_ptr(page)) >= 0)
> + return true;
> + if (PageHuge(page))
> + return false;
> + for (i = 0; i < hpage_nr_pages(page); i++) {
> + if (atomic_read(&page[i]._mapcount) >= 0)
> + return true;
> + }
> + return false;
> +}
> +EXPORT_SYMBOL(page_mapped);
> +
> struct anon_vma *page_anon_vma(struct page *page)
> {
> unsigned long mapping;
> _
>

Cheers,
--
Steve