2022-05-13 21:28:58

by Yang Shi

Subject: [v2 PATCH] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

IIUC PVMW checks whether the vma is possibly huge PMD mapped by using
transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".

Actually pvmw->nr_pages is returned by compound_nr() or
folio_nr_pages(), so the page should be a THP as long as "pvmw->nr_pages
>= HPAGE_PMD_NR". And it is guaranteed that a THP is only allocated for a
valid VMA in the first place. But it may not be PMD mapped if the VMA is
a file VMA and it is not properly aligned. transhuge_vma_suitable()
performs exactly that check, so use it in place of
transparent_hugepage_active(), which is too heavy and overkill here.

Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
v2: * Fixed build error for !CONFIG_TRANSPARENT_HUGEPAGE
    * Removed fixes tag per Willy

include/linux/huge_mm.h | 8 ++++++--
mm/page_vma_mapped.c | 2 +-
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fbf36bb1be22..c2826b1f4069 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabled_attr;
extern unsigned long transparent_hugepage_flags;

static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
- unsigned long haddr)
+ unsigned long addr)
{
+ unsigned long haddr;
+
/* Don't have to check pgoff for anonymous vma */
if (!vma_is_anonymous(vma)) {
if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
@@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
return false;
}

+ haddr = addr & HPAGE_PMD_MASK;
+
if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
return false;
return true;
@@ -328,7 +332,7 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
}

static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
- unsigned long haddr)
+ unsigned long addr)
{
return false;
}
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..e971a467fcdf 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
* cleared *pmd but not decremented compound_mapcount().
*/
if ((pvmw->flags & PVMW_SYNC) &&
- transparent_hugepage_active(vma) &&
+ transhuge_vma_suitable(vma, pvmw->address) &&
(pvmw->nr_pages >= HPAGE_PMD_NR)) {
spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);

--
2.26.3



2022-05-18 04:50:24

by Andrew Morton

Subject: Re: [v2 PATCH] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

On Fri, 13 May 2022 12:17:05 -0700 Yang Shi <[email protected]> wrote:

> IIUC PVMW checks if the vma is possibly huge PMD mapped by
> transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
>
> Actually pvmw->nr_pages is returned by compound_nr() or
> folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages
> >= HPAGE_PMD_NR". And it is guaranteed THP is allocated for valid VMA
> in the first place. But it may be not PMD mapped if the VMA is file
> VMA and it is not properly aligned. The transhuge_vma_suitable()
> is used to do such check, so replace transparent_hugepage_active() to
> it, which is too heavy and overkilling.

I messed with the changelog a bit. The function is called
page_vma_mapped_walk(), so let's call it that.

This patch has been in the trees since May 12, which isn't terribly
long. Does anyone feel up to a reviewed-by?

Thanks.

From: Yang Shi <[email protected]>
Subject: mm/page_vma_mapped.c: check possible huge PMD map with transhuge_vma_suitable()
Date: Fri, 13 May 2022 12:17:05 -0700

IIUC page_vma_mapped_walk() checks whether the vma is possibly huge PMD
mapped by using transparent_hugepage_active() and
"pvmw->nr_pages >= HPAGE_PMD_NR".

Actually pvmw->nr_pages is returned by compound_nr() or folio_nr_pages(),
so the page should be a THP as long as "pvmw->nr_pages >= HPAGE_PMD_NR".
And it is guaranteed that a THP is only allocated for a valid VMA in the
first place. But it may not be PMD mapped if the VMA is a file VMA and it
is not properly aligned. transhuge_vma_suitable() performs exactly that
check, so use it in place of transparent_hugepage_active(), which is too
heavy and overkill here.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

include/linux/huge_mm.h | 8 ++++++--
mm/page_vma_mapped.c | 2 +-
2 files changed, 7 insertions(+), 3 deletions(-)

--- a/include/linux/huge_mm.h~mm-pvmw-check-possible-huge-pmd-map-by-transhuge_vma_suitable
+++ a/include/linux/huge_mm.h
@@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabl
extern unsigned long transparent_hugepage_flags;

static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
- unsigned long haddr)
+ unsigned long addr)
{
+ unsigned long haddr;
+
/* Don't have to check pgoff for anonymous vma */
if (!vma_is_anonymous(vma)) {
if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
@@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitabl
return false;
}

+ haddr = addr & HPAGE_PMD_MASK;
+
if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
return false;
return true;
@@ -342,7 +346,7 @@ static inline bool transparent_hugepage_
}

static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
- unsigned long haddr)
+ unsigned long addr)
{
return false;
}
--- a/mm/page_vma_mapped.c~mm-pvmw-check-possible-huge-pmd-map-by-transhuge_vma_suitable
+++ a/mm/page_vma_mapped.c
@@ -243,7 +243,7 @@ restart:
* cleared *pmd but not decremented compound_mapcount().
*/
if ((pvmw->flags & PVMW_SYNC) &&
- transparent_hugepage_active(vma) &&
+ transhuge_vma_suitable(vma, pvmw->address) &&
(pvmw->nr_pages >= HPAGE_PMD_NR)) {
spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);

_


2022-05-18 05:44:39

by Muchun Song

Subject: Re: [v2 PATCH] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

On Fri, May 13, 2022 at 12:17:05PM -0700, Yang Shi wrote:
> IIUC PVMW checks if the vma is possibly huge PMD mapped by
> transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
>
> Actually pvmw->nr_pages is returned by compound_nr() or
> folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages
> >= HPAGE_PMD_NR". And it is guaranteed THP is allocated for valid VMA
> in the first place. But it may be not PMD mapped if the VMA is file
> VMA and it is not properly aligned. The transhuge_vma_suitable()
> is used to do such check, so replace transparent_hugepage_active() to
> it, which is too heavy and overkilling.
>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Muchun Song <[email protected]>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> v2: * Fixed build error for !CONFIG_TRANSPARENT_HUGEPAGE
> * Removed fixes tag per Willy
>
> include/linux/huge_mm.h | 8 ++++++--
> mm/page_vma_mapped.c | 2 +-
> 2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index fbf36bb1be22..c2826b1f4069 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabled_attr;
> extern unsigned long transparent_hugepage_flags;
>
> static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> - unsigned long haddr)
> + unsigned long addr)
> {
> + unsigned long haddr;
> +
> /* Don't have to check pgoff for anonymous vma */
> if (!vma_is_anonymous(vma)) {
> if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> @@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> return false;
> }
>
> + haddr = addr & HPAGE_PMD_MASK;
> +
> if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
> return false;
> return true;
> @@ -328,7 +332,7 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> }
>
> static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> - unsigned long haddr)
> + unsigned long addr)
> {
> return false;
> }
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index c10f839fc410..e971a467fcdf 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> * cleared *pmd but not decremented compound_mapcount().
> */
> if ((pvmw->flags & PVMW_SYNC) &&
> - transparent_hugepage_active(vma) &&
> + transhuge_vma_suitable(vma, pvmw->address) &&

How about the following diff? Then we do not need to change
transhuge_vma_suitable(). All the users of transhuge_vma_suitable()
already do the alignment themselves.

Thanks.

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..0aed5ca60c67 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -243,7 +243,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
* cleared *pmd but not decremented compound_mapcount().
*/
if ((pvmw->flags & PVMW_SYNC) &&
- transparent_hugepage_active(vma) &&
+ IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+ transhuge_vma_suitable(vma, pvmw->address & HPAGE_PMD_MASK) &&
(pvmw->nr_pages >= HPAGE_PMD_NR)) {
spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);

> (pvmw->nr_pages >= HPAGE_PMD_NR)) {
> spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
>
> --
> 2.26.3
>
>

2022-05-18 18:49:00

by Yang Shi

Subject: Re: [v2 PATCH] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

On Tue, May 17, 2022 at 10:31 PM Muchun Song <[email protected]> wrote:
>
> On Fri, May 13, 2022 at 12:17:05PM -0700, Yang Shi wrote:
> > IIUC PVMW checks if the vma is possibly huge PMD mapped by
> > transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> >
> > Actually pvmw->nr_pages is returned by compound_nr() or
> > folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages
> > >= HPAGE_PMD_NR". And it is guaranteed THP is allocated for valid VMA
> > in the first place. But it may be not PMD mapped if the VMA is file
> > VMA and it is not properly aligned. The transhuge_vma_suitable()
> > is used to do such check, so replace transparent_hugepage_active() to
> > it, which is too heavy and overkilling.
> >
> > Cc: Matthew Wilcox (Oracle) <[email protected]>
> > Cc: Muchun Song <[email protected]>
> > Signed-off-by: Yang Shi <[email protected]>
> > ---
> > v2: * Fixed build error for !CONFIG_TRANSPARENT_HUGEPAGE
> > * Removed fixes tag per Willy
> >
> > include/linux/huge_mm.h | 8 ++++++--
> > mm/page_vma_mapped.c | 2 +-
> > 2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index fbf36bb1be22..c2826b1f4069 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabled_attr;
> > extern unsigned long transparent_hugepage_flags;
> >
> > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > - unsigned long haddr)
> > + unsigned long addr)
> > {
> > + unsigned long haddr;
> > +
> > /* Don't have to check pgoff for anonymous vma */
> > if (!vma_is_anonymous(vma)) {
> > if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> > @@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > return false;
> > }
> >
> > + haddr = addr & HPAGE_PMD_MASK;
> > +
> > if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
> > return false;
> > return true;
> > @@ -328,7 +332,7 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > }
> >
> > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > - unsigned long haddr)
> > + unsigned long addr)
> > {
> > return false;
> > }
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index c10f839fc410..e971a467fcdf 100644
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > * cleared *pmd but not decremented compound_mapcount().
> > */
> > if ((pvmw->flags & PVMW_SYNC) &&
> > - transparent_hugepage_active(vma) &&
> > + transhuge_vma_suitable(vma, pvmw->address) &&
>
> How about the following diff? Then we do not need to change
> transhuge_vma_suitable(). All the users of transhuge_vma_suitable()
> are already do the alignment by themselves.

Thanks for the suggestion. But TBH I don't think this is a better way.
I did think about this before proposing v2, but I would prefer not to
pollute the code with IS_ENABLED(CONFIG_xxx) since the definition of
transhuge_vma_suitable() is already protected by #ifdef. Rounding the
address in transhuge_vma_suitable() seems neater and more readable to
me IMHO.

Some callers of transhuge_vma_suitable() do round the address before
calling it, but the rounded address is also used by other code in those
callers.
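
To make the rounding argument concrete, here is a minimal userspace sketch
of the check, not the kernel code itself. It assumes x86-64 sizes (4 KiB
base pages, 2 MiB PMD-sized huge pages). struct vma_model and
suitable_model() are hypothetical stand-ins for struct vm_area_struct and
transhuge_vma_suitable(). The second argument of the IS_ALIGNED() check is
cut off by the hunk boundary in the patch, so using HPAGE_PMD_NR there is
an assumption of this sketch.

```c
#include <stdbool.h>

/* Assumed x86-64 values: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
#define PAGE_SHIFT      12
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))
#define HPAGE_PMD_NR    (1UL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct vma_model {
	unsigned long vm_start;  /* first byte of the mapping */
	unsigned long vm_end;    /* first byte after the mapping */
	unsigned long vm_pgoff;  /* file offset of vm_start, in pages */
	bool anonymous;          /* models vma_is_anonymous() */
};

/* Models the v2 behaviour: the caller may pass any address in the VMA. */
static bool suitable_model(const struct vma_model *vma, unsigned long addr)
{
	unsigned long haddr;

	/* A file VMA must map the file at a PMD-aligned offset (assumed NR). */
	if (!vma->anonymous) {
		unsigned long delta = (vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff;

		if (delta & (HPAGE_PMD_NR - 1))
			return false;
	}

	/* Round down inside the helper, as the v2 patch does. */
	haddr = addr & HPAGE_PMD_MASK;

	/* The whole PMD-sized range must lie inside the VMA. */
	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
		return false;
	return true;
}
```

In this model a caller can pass an unaligned address such as pvmw->address
directly. A file VMA whose vm_pgoff is misaligned relative to vm_start
fails the check regardless of the address passed in.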

>
> Thanks.
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index c10f839fc410..0aed5ca60c67 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -243,7 +243,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> * cleared *pmd but not decremented compound_mapcount().
> */
> if ((pvmw->flags & PVMW_SYNC) &&
> - transparent_hugepage_active(vma) &&
> + IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> + transhuge_vma_suitable(vma, pvmw->address & HPAGE_PMD_MASK) &&
> (pvmw->nr_pages >= HPAGE_PMD_NR)) {
> spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
>
> > (pvmw->nr_pages >= HPAGE_PMD_NR)) {
> > spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
> >
> > --
> > 2.26.3
> >
> >

2022-05-19 13:31:00

by Muchun Song

Subject: Re: [v2 PATCH] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

On Wed, May 18, 2022 at 11:45:14AM -0700, Yang Shi wrote:
> On Tue, May 17, 2022 at 10:31 PM Muchun Song <[email protected]> wrote:
> >
> > On Fri, May 13, 2022 at 12:17:05PM -0700, Yang Shi wrote:
> > > IIUC PVMW checks if the vma is possibly huge PMD mapped by
> > > transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> > >
> > > Actually pvmw->nr_pages is returned by compound_nr() or
> > > folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages
> > > >= HPAGE_PMD_NR". And it is guaranteed THP is allocated for valid VMA
> > > in the first place. But it may be not PMD mapped if the VMA is file
> > > VMA and it is not properly aligned. The transhuge_vma_suitable()
> > > is used to do such check, so replace transparent_hugepage_active() to
> > > it, which is too heavy and overkilling.
> > >
> > > Cc: Matthew Wilcox (Oracle) <[email protected]>
> > > Cc: Muchun Song <[email protected]>
> > > Signed-off-by: Yang Shi <[email protected]>
> > > ---
> > > v2: * Fixed build error for !CONFIG_TRANSPARENT_HUGEPAGE
> > > * Removed fixes tag per Willy
> > >
> > > include/linux/huge_mm.h | 8 ++++++--
> > > mm/page_vma_mapped.c | 2 +-
> > > 2 files changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index fbf36bb1be22..c2826b1f4069 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabled_attr;
> > > extern unsigned long transparent_hugepage_flags;
> > >
> > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > - unsigned long haddr)
> > > + unsigned long addr)
> > > {
> > > + unsigned long haddr;
> > > +
> > > /* Don't have to check pgoff for anonymous vma */
> > > if (!vma_is_anonymous(vma)) {
> > > if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> > > @@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > return false;
> > > }
> > >
> > > + haddr = addr & HPAGE_PMD_MASK;
> > > +
> > > if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
> > > return false;
> > > return true;
> > > @@ -328,7 +332,7 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > }
> > >
> > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > - unsigned long haddr)
> > > + unsigned long addr)
> > > {
> > > return false;
> > > }
> > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > index c10f839fc410..e971a467fcdf 100644
> > > --- a/mm/page_vma_mapped.c
> > > +++ b/mm/page_vma_mapped.c
> > > @@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > > * cleared *pmd but not decremented compound_mapcount().
> > > */
> > > if ((pvmw->flags & PVMW_SYNC) &&
> > > - transparent_hugepage_active(vma) &&
> > > + transhuge_vma_suitable(vma, pvmw->address) &&
> >
> > How about the following diff? Then we do not need to change
> > transhuge_vma_suitable(). All the users of transhuge_vma_suitable()
> > are already do the alignment by themselves.
>
> Thanks for the suggestion. But TBH I don't think this is a better way.
> I did think about this before proposing v2, but I don't prefer to
> pollute the code with IS_ENABLED(CONFIG_xxx) since the definition of
> transhuge_vma_suitable() is already protected by #ifdef. Rounding the
> address in transhuge_vma_suitable() seems neater and more readable to
> me IMHO.
>
> Some callers of transhuge_vma_suitable() do round the address before
> calling it, but the rounded address is used by other codes in the
> callers too.
>

All right.

Reviewed-by: Muchun Song <[email protected]>

Thanks.