2020-02-18 05:42:41

by David Rientjes

[permalink] [raw]
Subject: [patch] mm, thp: track fallbacks due to failed memcg charges separately

The thp_fault_fallback stat in either /proc/vmstat is incremented if
either the hugepage allocation fails through the page allocator or the
hugepage charge fails through mem cgroup.

This patch leaves this field untouched but adds a new field,
thp_fault_fallback_charge, which is incremented only when the mem cgroup
charge fails.

This distinguishes between faults that want to be backed by hugepages but
fail due to fragmentation (or low memory conditions) and those that fail
due to mem cgroup limits. That can be used to determine the impact of
fragmentation on the system by excluding faults that failed due to memcg
usage.

Signed-off-by: David Rientjes <[email protected]>
---
Documentation/admin-guide/mm/transhuge.rst | 5 +++++
include/linux/vm_event_item.h | 1 +
mm/huge_memory.c | 2 ++
mm/vmstat.c | 1 +
4 files changed, 9 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -310,6 +310,11 @@ thp_fault_fallback
is incremented if a page fault fails to allocate
a huge page and instead falls back to using small pages.

+thp_fault_fallback_charge
+ is incremented if a page fault fails to charge a huge page and
+ instead falls back to using small pages even through the
+ allocation was successful.
+
thp_collapse_alloc_failed
is incremented if khugepaged found a range
of pages that should be collapsed into one huge page but failed
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -73,6 +73,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
THP_FAULT_ALLOC,
THP_FAULT_FALLBACK,
+ THP_FAULT_FALLBACK_CHARGE,
THP_COLLAPSE_ALLOC,
THP_COLLAPSE_ALLOC_FAILED,
THP_FILE_ALLOC,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -597,6 +597,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
+ count_vm_event(THP_FAULT_FALLBACK_CHARGE);
return VM_FAULT_FALLBACK;
}

@@ -1406,6 +1407,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
put_page(page);
ret |= VM_FAULT_FALLBACK;
count_vm_event(THP_FAULT_FALLBACK);
+ count_vm_event(THP_FAULT_FALLBACK_CHARGE);
goto out;
}

diff --git a/mm/vmstat.c b/mm/vmstat.c
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1254,6 +1254,7 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
"thp_fault_alloc",
"thp_fault_fallback",
+ "thp_fault_fallback_charge",
"thp_collapse_alloc",
"thp_collapse_alloc_failed",
"thp_file_alloc",


2020-02-18 08:26:41

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [patch] mm, thp: track fallbacks due to failed memcg charges separately

On Mon, Feb 17, 2020 at 09:41:40PM -0800, David Rientjes wrote:
> The thp_fault_fallback stat in either /proc/vmstat is incremented if
> either the hugepage allocation fails through the page allocator or the
> hugepage charge fails through mem cgroup.
>
> This patch leaves this field untouched but adds a new field,
> thp_fault_fallback_charge, which is incremented only when the mem cgroup
> charge fails.
>
> This distinguishes between faults that want to be backed by hugepages but
> fail due to fragmentation (or low memory conditions) and those that fail
> due to mem cgroup limits. That can be used to determine the impact of
> fragmentation on the system by excluding faults that failed due to memcg
> usage.
>
> Signed-off-by: David Rientjes <[email protected]>

The patch looks good to me, but I noticed that we miss THP_FAULT_FALLBACK
(and THP_FAULT_FALLBACK_CHARGE) accounting in shmem_getpage_gfp().

Could you fix this while you are there?
--
Kirill A. Shutemov

2020-02-18 09:34:48

by Mike Rapoport

[permalink] [raw]
Subject: Re: [patch] mm, thp: track fallbacks due to failed memcg charges separately

On Mon, Feb 17, 2020 at 09:41:40PM -0800, David Rientjes wrote:
> The thp_fault_fallback stat in either /proc/vmstat is incremented if

Nit: ^ "either" here looks wrong

> either the hugepage allocation fails through the page allocator or the
> hugepage charge fails through mem cgroup.
>
> This patch leaves this field untouched but adds a new field,
> thp_fault_fallback_charge, which is incremented only when the mem cgroup
> charge fails.
>
> This distinguishes between faults that want to be backed by hugepages but
> fail due to fragmentation (or low memory conditions) and those that fail
> due to mem cgroup limits. That can be used to determine the impact of
> fragmentation on the system by excluding faults that failed due to memcg
> usage.
>
> Signed-off-by: David Rientjes <[email protected]>
> ---
> Documentation/admin-guide/mm/transhuge.rst | 5 +++++
> include/linux/vm_event_item.h | 1 +
> mm/huge_memory.c | 2 ++
> mm/vmstat.c | 1 +
> 4 files changed, 9 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -310,6 +310,11 @@ thp_fault_fallback
> is incremented if a page fault fails to allocate
> a huge page and instead falls back to using small pages.
>
> +thp_fault_fallback_charge
> + is incremented if a page fault fails to charge a huge page and
> + instead falls back to using small pages even through the

^ though

> + allocation was successful.
> +
> thp_collapse_alloc_failed
> is incremented if khugepaged found a range
> of pages that should be collapsed into one huge page but failed
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -73,6 +73,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> THP_FAULT_ALLOC,
> THP_FAULT_FALLBACK,
> + THP_FAULT_FALLBACK_CHARGE,
> THP_COLLAPSE_ALLOC,
> THP_COLLAPSE_ALLOC_FAILED,
> THP_FILE_ALLOC,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -597,6 +597,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
> put_page(page);
> count_vm_event(THP_FAULT_FALLBACK);
> + count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> return VM_FAULT_FALLBACK;
> }
>
> @@ -1406,6 +1407,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
> put_page(page);
> ret |= VM_FAULT_FALLBACK;
> count_vm_event(THP_FAULT_FALLBACK);
> + count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> goto out;
> }
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1254,6 +1254,7 @@ const char * const vmstat_text[] = {
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> "thp_fault_alloc",
> "thp_fault_fallback",
> + "thp_fault_fallback_charge",
> "thp_collapse_alloc",
> "thp_collapse_alloc_failed",
> "thp_file_alloc",

--
Sincerely yours,
Mike.

2020-02-19 02:01:13

by David Rientjes

[permalink] [raw]
Subject: Re: [patch] mm, thp: track fallbacks due to failed memcg charges separately

On Tue, 18 Feb 2020, Kirill A. Shutemov wrote:

> On Mon, Feb 17, 2020 at 09:41:40PM -0800, David Rientjes wrote:
> > The thp_fault_fallback stat in either /proc/vmstat is incremented if
> > either the hugepage allocation fails through the page allocator or the
> > hugepage charge fails through mem cgroup.
> >
> > This patch leaves this field untouched but adds a new field,
> > thp_fault_fallback_charge, which is incremented only when the mem cgroup
> > charge fails.
> >
> > This distinguishes between faults that want to be backed by hugepages but
> > fail due to fragmentation (or low memory conditions) and those that fail
> > due to mem cgroup limits. That can be used to determine the impact of
> > fragmentation on the system by excluding faults that failed due to memcg
> > usage.
> >
> > Signed-off-by: David Rientjes <[email protected]>
>
> The patch looks good to me, but I noticed that we miss THP_FAULT_FALLBACK
> (and THP_FAULT_FALLBACK_CHARGE) accounting in shmem_getpage_gfp().
>
> Could you fix this while you are there?

Sure, I'll add it as a predecessor and also count THP_FAULT_ALLOC for
consistency.

2020-02-19 02:30:36

by David Rientjes

[permalink] [raw]
Subject: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

The thp_fault_alloc and thp_fault_fallback vmstats are incremented when a
hugepage is successfully or unsuccessfully allocated, respectively, during
a page fault for anonymous memory.

Extend this to shmem as well. Note that care is taken to increment
thp_fault_alloc only when the fault succeeds; this is the same behavior as
anonymous thp.

Signed-off-by: David Rientjes <[email protected]>
---
mm/shmem.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1502,9 +1502,8 @@ static struct page *shmem_alloc_page(gfp_t gfp,
return page;
}

-static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
- struct inode *inode,
- pgoff_t index, bool huge)
+static struct page *shmem_alloc_and_acct_page(gfp_t gfp, struct inode *inode,
+ pgoff_t index, bool fault, bool huge)
{
struct shmem_inode_info *info = SHMEM_I(inode);
struct page *page;
@@ -1518,9 +1517,11 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
if (!shmem_inode_acct_block(inode, nr))
goto failed;

- if (huge)
+ if (huge) {
page = shmem_alloc_hugepage(gfp, info, index);
- else
+ if (!page && fault)
+ count_vm_event(THP_FAULT_FALLBACK);
+ } else
page = shmem_alloc_page(gfp, info, index);
if (page) {
__SetPageLocked(page);
@@ -1832,11 +1833,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
}

alloc_huge:
- page = shmem_alloc_and_acct_page(gfp, inode, index, true);
+ page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, true);
if (IS_ERR(page)) {
alloc_nohuge:
- page = shmem_alloc_and_acct_page(gfp, inode,
- index, false);
+ page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, false);
}
if (IS_ERR(page)) {
int retry = 5;
@@ -1871,8 +1871,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,

error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
PageTransHuge(page));
- if (error)
+ if (error) {
+ if (vmf && PageTransHuge(page))
+ count_vm_event(THP_FAULT_FALLBACK);
goto unacct;
+ }
error = shmem_add_to_page_cache(page, mapping, hindex,
NULL, gfp & GFP_RECLAIM_MASK);
if (error) {
@@ -1883,6 +1886,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
mem_cgroup_commit_charge(page, memcg, false,
PageTransHuge(page));
lru_cache_add_anon(page);
+ if (vmf && PageTransHuge(page))
+ count_vm_event(THP_FAULT_ALLOC);

spin_lock_irq(&info->lock);
info->alloced += compound_nr(page);

2020-02-19 02:31:36

by David Rientjes

[permalink] [raw]
Subject: [patch 2/2] mm, thp: track fallbacks due to failed memcg charges separately

The thp_fault_fallback stat in /proc/vmstat is incremented if either the
hugepage allocation fails through the page allocator or the hugepage
charge fails through mem cgroup.

This patch leaves this field untouched but adds a new field,
thp_fault_fallback_charge, which is incremented only when the mem cgroup
charge fails.

This distinguishes between faults that want to be backed by hugepages but
fail due to fragmentation (or low memory conditions) and those that fail
due to mem cgroup limits. That can be used to determine the impact of
fragmentation on the system by excluding faults that failed due to memcg
usage.

Signed-off-by: David Rientjes <[email protected]>
---
v2:
- supported for shmem faults as well per Kirill
- fixed worked in documentation and commit description per Mike

Documentation/admin-guide/mm/transhuge.rst | 5 +++++
include/linux/vm_event_item.h | 1 +
mm/huge_memory.c | 2 ++
mm/shmem.c | 4 +++-
mm/vmstat.c | 1 +
5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -310,6 +310,11 @@ thp_fault_fallback
is incremented if a page fault fails to allocate
a huge page and instead falls back to using small pages.

+thp_fault_fallback_charge
+ is incremented if a page fault fails to charge a huge page and
+ instead falls back to using small pages even though the
+ allocation was successful.
+
thp_collapse_alloc_failed
is incremented if khugepaged found a range
of pages that should be collapsed into one huge page but failed
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -73,6 +73,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
THP_FAULT_ALLOC,
THP_FAULT_FALLBACK,
+ THP_FAULT_FALLBACK_CHARGE,
THP_COLLAPSE_ALLOC,
THP_COLLAPSE_ALLOC_FAILED,
THP_FILE_ALLOC,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -597,6 +597,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
+ count_vm_event(THP_FAULT_FALLBACK_CHARGE);
return VM_FAULT_FALLBACK;
}

@@ -1406,6 +1407,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
put_page(page);
ret |= VM_FAULT_FALLBACK;
count_vm_event(THP_FAULT_FALLBACK);
+ count_vm_event(THP_FAULT_FALLBACK_CHARGE);
goto out;
}

diff --git a/mm/shmem.c b/mm/shmem.c
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1872,8 +1872,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
PageTransHuge(page));
if (error) {
- if (vmf && PageTransHuge(page))
+ if (vmf && PageTransHuge(page)) {
count_vm_event(THP_FAULT_FALLBACK);
+ count_vm_event(THP_FAULT_FALLBACK_CHARGE);
+ }
goto unacct;
}
error = shmem_add_to_page_cache(page, mapping, hindex,
diff --git a/mm/vmstat.c b/mm/vmstat.c
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1254,6 +1254,7 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
"thp_fault_alloc",
"thp_fault_fallback",
+ "thp_fault_fallback_charge",
"thp_collapse_alloc",
"thp_collapse_alloc_failed",
"thp_file_alloc",

2020-02-19 03:25:05

by Yang Shi

[permalink] [raw]
Subject: Re: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

On Tue, Feb 18, 2020 at 6:29 PM David Rientjes <[email protected]> wrote:
>
> The thp_fault_alloc and thp_fault_fallback vmstats are incremented when a
> hugepage is successfully or unsuccessfully allocated, respectively, during
> a page fault for anonymous memory.
>
> Extend this to shmem as well. Note that care is taken to increment
> thp_fault_alloc only when the fault succeeds; this is the same behavior as
> anonymous thp.
>
> Signed-off-by: David Rientjes <[email protected]>
> ---
> mm/shmem.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1502,9 +1502,8 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> return page;
> }
>
> -static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> - struct inode *inode,
> - pgoff_t index, bool huge)
> +static struct page *shmem_alloc_and_acct_page(gfp_t gfp, struct inode *inode,
> + pgoff_t index, bool fault, bool huge)
> {
> struct shmem_inode_info *info = SHMEM_I(inode);
> struct page *page;
> @@ -1518,9 +1517,11 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> if (!shmem_inode_acct_block(inode, nr))
> goto failed;
>
> - if (huge)
> + if (huge) {
> page = shmem_alloc_hugepage(gfp, info, index);
> - else
> + if (!page && fault)
> + count_vm_event(THP_FAULT_FALLBACK);
> + } else
> page = shmem_alloc_page(gfp, info, index);
> if (page) {
> __SetPageLocked(page);
> @@ -1832,11 +1833,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> }
>
> alloc_huge:
> - page = shmem_alloc_and_acct_page(gfp, inode, index, true);
> + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, true);
> if (IS_ERR(page)) {
> alloc_nohuge:
> - page = shmem_alloc_and_acct_page(gfp, inode,
> - index, false);
> + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, false);
> }
> if (IS_ERR(page)) {
> int retry = 5;
> @@ -1871,8 +1871,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>
> error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
> PageTransHuge(page));
> - if (error)
> + if (error) {
> + if (vmf && PageTransHuge(page))
> + count_vm_event(THP_FAULT_FALLBACK);
> goto unacct;
> + }
> error = shmem_add_to_page_cache(page, mapping, hindex,
> NULL, gfp & GFP_RECLAIM_MASK);
> if (error) {
> @@ -1883,6 +1886,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> mem_cgroup_commit_charge(page, memcg, false,
> PageTransHuge(page));
> lru_cache_add_anon(page);
> + if (vmf && PageTransHuge(page))
> + count_vm_event(THP_FAULT_ALLOC);

I think shmem THP alloc is accounted to THP_FILE_ALLOC. And it has
been accounted by shmem_add_to_page_cache(). So, it sounds like a
double count.

>
> spin_lock_irq(&info->lock);
> info->alloced += compound_nr(page);
>

2020-02-19 03:45:51

by David Rientjes

[permalink] [raw]
Subject: Re: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

On Tue, 18 Feb 2020, Yang Shi wrote:

> > diff --git a/mm/shmem.c b/mm/shmem.c
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1502,9 +1502,8 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> > return page;
> > }
> >
> > -static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > - struct inode *inode,
> > - pgoff_t index, bool huge)
> > +static struct page *shmem_alloc_and_acct_page(gfp_t gfp, struct inode *inode,
> > + pgoff_t index, bool fault, bool huge)
> > {
> > struct shmem_inode_info *info = SHMEM_I(inode);
> > struct page *page;
> > @@ -1518,9 +1517,11 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > if (!shmem_inode_acct_block(inode, nr))
> > goto failed;
> >
> > - if (huge)
> > + if (huge) {
> > page = shmem_alloc_hugepage(gfp, info, index);
> > - else
> > + if (!page && fault)
> > + count_vm_event(THP_FAULT_FALLBACK);
> > + } else
> > page = shmem_alloc_page(gfp, info, index);
> > if (page) {
> > __SetPageLocked(page);
> > @@ -1832,11 +1833,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > }
> >
> > alloc_huge:
> > - page = shmem_alloc_and_acct_page(gfp, inode, index, true);
> > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, true);
> > if (IS_ERR(page)) {
> > alloc_nohuge:
> > - page = shmem_alloc_and_acct_page(gfp, inode,
> > - index, false);
> > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, false);
> > }
> > if (IS_ERR(page)) {
> > int retry = 5;
> > @@ -1871,8 +1871,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> >
> > error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
> > PageTransHuge(page));
> > - if (error)
> > + if (error) {
> > + if (vmf && PageTransHuge(page))
> > + count_vm_event(THP_FAULT_FALLBACK);
> > goto unacct;
> > + }
> > error = shmem_add_to_page_cache(page, mapping, hindex,
> > NULL, gfp & GFP_RECLAIM_MASK);
> > if (error) {
> > @@ -1883,6 +1886,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > mem_cgroup_commit_charge(page, memcg, false,
> > PageTransHuge(page));
> > lru_cache_add_anon(page);
> > + if (vmf && PageTransHuge(page))
> > + count_vm_event(THP_FAULT_ALLOC);
>
> I think shmem THP alloc is accounted to THP_FILE_ALLOC. And it has
> been accounted by shmem_add_to_page_cache(). So, it sounds like a
> double count.
>

I think we can choose to either include file allocations into both
thp_fault_alloc and thp_fault_fallback or we can exclude them from both of
them. I don't think we can account for only one of them.

> >
> > spin_lock_irq(&info->lock);
> > info->alloced += compound_nr(page);
> >
>

2020-02-19 08:24:31

by Mike Rapoport

[permalink] [raw]
Subject: Re: [patch 2/2] mm, thp: track fallbacks due to failed memcg charges separately

On Tue, Feb 18, 2020 at 06:29:21PM -0800, David Rientjes wrote:
> The thp_fault_fallback stat in /proc/vmstat is incremented if either the
> hugepage allocation fails through the page allocator or the hugepage
> charge fails through mem cgroup.
>
> This patch leaves this field untouched but adds a new field,
> thp_fault_fallback_charge, which is incremented only when the mem cgroup
> charge fails.
>
> This distinguishes between faults that want to be backed by hugepages but
> fail due to fragmentation (or low memory conditions) and those that fail
> due to mem cgroup limits. That can be used to determine the impact of
> fragmentation on the system by excluding faults that failed due to memcg
> usage.
>
> Signed-off-by: David Rientjes <[email protected]>

Reviewed-by: Mike Rapoport <[email protected]> # Documentation

> ---
> v2:
> - supported for shmem faults as well per Kirill
> - fixed worked in documentation and commit description per Mike
>
> Documentation/admin-guide/mm/transhuge.rst | 5 +++++
> include/linux/vm_event_item.h | 1 +
> mm/huge_memory.c | 2 ++
> mm/shmem.c | 4 +++-
> mm/vmstat.c | 1 +
> 5 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -310,6 +310,11 @@ thp_fault_fallback
> is incremented if a page fault fails to allocate
> a huge page and instead falls back to using small pages.
>
> +thp_fault_fallback_charge
> + is incremented if a page fault fails to charge a huge page and
> + instead falls back to using small pages even though the
> + allocation was successful.
> +
> thp_collapse_alloc_failed
> is incremented if khugepaged found a range
> of pages that should be collapsed into one huge page but failed
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -73,6 +73,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> THP_FAULT_ALLOC,
> THP_FAULT_FALLBACK,
> + THP_FAULT_FALLBACK_CHARGE,
> THP_COLLAPSE_ALLOC,
> THP_COLLAPSE_ALLOC_FAILED,
> THP_FILE_ALLOC,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -597,6 +597,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
> put_page(page);
> count_vm_event(THP_FAULT_FALLBACK);
> + count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> return VM_FAULT_FALLBACK;
> }
>
> @@ -1406,6 +1407,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
> put_page(page);
> ret |= VM_FAULT_FALLBACK;
> count_vm_event(THP_FAULT_FALLBACK);
> + count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> goto out;
> }
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1872,8 +1872,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
> PageTransHuge(page));
> if (error) {
> - if (vmf && PageTransHuge(page))
> + if (vmf && PageTransHuge(page)) {
> count_vm_event(THP_FAULT_FALLBACK);
> + count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> + }
> goto unacct;
> }
> error = shmem_add_to_page_cache(page, mapping, hindex,
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1254,6 +1254,7 @@ const char * const vmstat_text[] = {
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> "thp_fault_alloc",
> "thp_fault_fallback",
> + "thp_fault_fallback_charge",
> "thp_collapse_alloc",
> "thp_collapse_alloc_failed",
> "thp_file_alloc",

--
Sincerely yours,
Mike.

2020-02-19 17:03:00

by Yang Shi

[permalink] [raw]
Subject: Re: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

On Tue, Feb 18, 2020 at 7:44 PM David Rientjes <[email protected]> wrote:
>
> On Tue, 18 Feb 2020, Yang Shi wrote:
>
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -1502,9 +1502,8 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> > > return page;
> > > }
> > >
> > > -static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > > - struct inode *inode,
> > > - pgoff_t index, bool huge)
> > > +static struct page *shmem_alloc_and_acct_page(gfp_t gfp, struct inode *inode,
> > > + pgoff_t index, bool fault, bool huge)
> > > {
> > > struct shmem_inode_info *info = SHMEM_I(inode);
> > > struct page *page;
> > > @@ -1518,9 +1517,11 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > > if (!shmem_inode_acct_block(inode, nr))
> > > goto failed;
> > >
> > > - if (huge)
> > > + if (huge) {
> > > page = shmem_alloc_hugepage(gfp, info, index);
> > > - else
> > > + if (!page && fault)
> > > + count_vm_event(THP_FAULT_FALLBACK);
> > > + } else
> > > page = shmem_alloc_page(gfp, info, index);
> > > if (page) {
> > > __SetPageLocked(page);
> > > @@ -1832,11 +1833,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > }
> > >
> > > alloc_huge:
> > > - page = shmem_alloc_and_acct_page(gfp, inode, index, true);
> > > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, true);
> > > if (IS_ERR(page)) {
> > > alloc_nohuge:
> > > - page = shmem_alloc_and_acct_page(gfp, inode,
> > > - index, false);
> > > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, false);
> > > }
> > > if (IS_ERR(page)) {
> > > int retry = 5;
> > > @@ -1871,8 +1871,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > >
> > > error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
> > > PageTransHuge(page));
> > > - if (error)
> > > + if (error) {
> > > + if (vmf && PageTransHuge(page))
> > > + count_vm_event(THP_FAULT_FALLBACK);
> > > goto unacct;
> > > + }
> > > error = shmem_add_to_page_cache(page, mapping, hindex,
> > > NULL, gfp & GFP_RECLAIM_MASK);
> > > if (error) {
> > > @@ -1883,6 +1886,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > mem_cgroup_commit_charge(page, memcg, false,
> > > PageTransHuge(page));
> > > lru_cache_add_anon(page);
> > > + if (vmf && PageTransHuge(page))
> > > + count_vm_event(THP_FAULT_ALLOC);
> >
> > I think shmem THP alloc is accounted to THP_FILE_ALLOC. And it has
> > been accounted by shmem_add_to_page_cache(). So, it sounds like a
> > double count.
> >
>
> I think we can choose to either include file allocations into both
> thp_fault_alloc and thp_fault_fallback or we can exclude them from both of
> them. I don't think we can account for only one of them.

How's about the 3rd option, adding THP_FILE_FALLBACK.

According to the past discussion with Hugh and Kirill, basically
shmem/file THP is treated differently and separately from anonymous
THP, and they have separate enabling knobs
(/sys/kernel/mm/transparent_hugepage/enabled just enables anonymous
THP). Since we already have THP_FILE_ALLOC for shmem THP allocation,
IMHO it makes more sense to have dedicated FALLBACK counter. And, this
won't change the old behavior either.

>
> > >
> > > spin_lock_irq(&info->lock);
> > > info->alloced += compound_nr(page);
> > >
> >

2020-02-20 13:12:04

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

On Wed, Feb 19, 2020 at 09:01:23AM -0800, Yang Shi wrote:
> On Tue, Feb 18, 2020 at 7:44 PM David Rientjes <[email protected]> wrote:
> >
> > On Tue, 18 Feb 2020, Yang Shi wrote:
> >
> > > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > > --- a/mm/shmem.c
> > > > +++ b/mm/shmem.c
> > > > @@ -1502,9 +1502,8 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> > > > return page;
> > > > }
> > > >
> > > > -static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > > > - struct inode *inode,
> > > > - pgoff_t index, bool huge)
> > > > +static struct page *shmem_alloc_and_acct_page(gfp_t gfp, struct inode *inode,
> > > > + pgoff_t index, bool fault, bool huge)
> > > > {
> > > > struct shmem_inode_info *info = SHMEM_I(inode);
> > > > struct page *page;
> > > > @@ -1518,9 +1517,11 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > > > if (!shmem_inode_acct_block(inode, nr))
> > > > goto failed;
> > > >
> > > > - if (huge)
> > > > + if (huge) {
> > > > page = shmem_alloc_hugepage(gfp, info, index);
> > > > - else
> > > > + if (!page && fault)
> > > > + count_vm_event(THP_FAULT_FALLBACK);
> > > > + } else
> > > > page = shmem_alloc_page(gfp, info, index);
> > > > if (page) {
> > > > __SetPageLocked(page);
> > > > @@ -1832,11 +1833,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > > }
> > > >
> > > > alloc_huge:
> > > > - page = shmem_alloc_and_acct_page(gfp, inode, index, true);
> > > > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, true);
> > > > if (IS_ERR(page)) {
> > > > alloc_nohuge:
> > > > - page = shmem_alloc_and_acct_page(gfp, inode,
> > > > - index, false);
> > > > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, false);
> > > > }
> > > > if (IS_ERR(page)) {
> > > > int retry = 5;
> > > > @@ -1871,8 +1871,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > >
> > > > error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
> > > > PageTransHuge(page));
> > > > - if (error)
> > > > + if (error) {
> > > > + if (vmf && PageTransHuge(page))
> > > > + count_vm_event(THP_FAULT_FALLBACK);
> > > > goto unacct;
> > > > + }
> > > > error = shmem_add_to_page_cache(page, mapping, hindex,
> > > > NULL, gfp & GFP_RECLAIM_MASK);
> > > > if (error) {
> > > > @@ -1883,6 +1886,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > > mem_cgroup_commit_charge(page, memcg, false,
> > > > PageTransHuge(page));
> > > > lru_cache_add_anon(page);
> > > > + if (vmf && PageTransHuge(page))
> > > > + count_vm_event(THP_FAULT_ALLOC);
> > >
> > > I think shmem THP alloc is accounted to THP_FILE_ALLOC. And it has
> > > been accounted by shmem_add_to_page_cache(). So, it sounds like a
> > > double count.
> > >
> >
> > I think we can choose to either include file allocations into both
> > thp_fault_alloc and thp_fault_fallback or we can exclude them from both of
> > them. I don't think we can account for only one of them.
>
> How's about the 3rd option, adding THP_FILE_FALLBACK.

I like this option.

Problem with THP_FAULT_* is that shmem_getpage_gfp() is called not only
from fault path, but also from syscalls.

--
Kirill A. Shutemov

2020-03-06 17:25:09

by Yang Shi

[permalink] [raw]
Subject: Re: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

On Thu, Feb 20, 2020 at 5:11 AM Kirill A. Shutemov <[email protected]> wrote:
>
> On Wed, Feb 19, 2020 at 09:01:23AM -0800, Yang Shi wrote:
> > On Tue, Feb 18, 2020 at 7:44 PM David Rientjes <[email protected]> wrote:
> > >
> > > On Tue, 18 Feb 2020, Yang Shi wrote:
> > >
> > > > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > > > --- a/mm/shmem.c
> > > > > +++ b/mm/shmem.c
> > > > > @@ -1502,9 +1502,8 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> > > > > return page;
> > > > > }
> > > > >
> > > > > -static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > > > > - struct inode *inode,
> > > > > - pgoff_t index, bool huge)
> > > > > +static struct page *shmem_alloc_and_acct_page(gfp_t gfp, struct inode *inode,
> > > > > + pgoff_t index, bool fault, bool huge)
> > > > > {
> > > > > struct shmem_inode_info *info = SHMEM_I(inode);
> > > > > struct page *page;
> > > > > @@ -1518,9 +1517,11 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> > > > > if (!shmem_inode_acct_block(inode, nr))
> > > > > goto failed;
> > > > >
> > > > > - if (huge)
> > > > > + if (huge) {
> > > > > page = shmem_alloc_hugepage(gfp, info, index);
> > > > > - else
> > > > > + if (!page && fault)
> > > > > + count_vm_event(THP_FAULT_FALLBACK);
> > > > > + } else
> > > > > page = shmem_alloc_page(gfp, info, index);
> > > > > if (page) {
> > > > > __SetPageLocked(page);
> > > > > @@ -1832,11 +1833,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > > > }
> > > > >
> > > > > alloc_huge:
> > > > > - page = shmem_alloc_and_acct_page(gfp, inode, index, true);
> > > > > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, true);
> > > > > if (IS_ERR(page)) {
> > > > > alloc_nohuge:
> > > > > - page = shmem_alloc_and_acct_page(gfp, inode,
> > > > > - index, false);
> > > > > + page = shmem_alloc_and_acct_page(gfp, inode, index, vmf, false);
> > > > > }
> > > > > if (IS_ERR(page)) {
> > > > > int retry = 5;
> > > > > @@ -1871,8 +1871,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > > >
> > > > > error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
> > > > > PageTransHuge(page));
> > > > > - if (error)
> > > > > + if (error) {
> > > > > + if (vmf && PageTransHuge(page))
> > > > > + count_vm_event(THP_FAULT_FALLBACK);
> > > > > goto unacct;
> > > > > + }
> > > > > error = shmem_add_to_page_cache(page, mapping, hindex,
> > > > > NULL, gfp & GFP_RECLAIM_MASK);
> > > > > if (error) {
> > > > > @@ -1883,6 +1886,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > > > > mem_cgroup_commit_charge(page, memcg, false,
> > > > > PageTransHuge(page));
> > > > > lru_cache_add_anon(page);
> > > > > + if (vmf && PageTransHuge(page))
> > > > > + count_vm_event(THP_FAULT_ALLOC);
> > > >
> > > > I think shmem THP alloc is accounted to THP_FILE_ALLOC. And it has
> > > > been accounted by shmem_add_to_page_cache(). So, it sounds like a
> > > > double count.
> > > >
> > >
> > > I think we can choose to either include file allocations into both
> > > thp_fault_alloc and thp_fault_fallback or we can exclude them from both of
> > > them. I don't think we can account for only one of them.
> >
> > How's about the 3rd option, adding THP_FILE_FALLBACK.
>
> I like this option.
>
> Problem with THP_FAULT_* is that shmem_getpage_gfp() is called not only
> from fault path, but also from syscalls.

I found another usecase for THP_FILE_FALLBACK. I wanted to measure
file THP allocation success rate in our uecase. It looks nr_file_alloc
/ (nr_file_alloc + nr_file_fallback) is the most simple way.

David, are you still working on this patch?

>
> --
> Kirill A. Shutemov

2020-03-06 21:28:29

by David Rientjes

[permalink] [raw]
Subject: Re: [patch 1/2] mm, shmem: add thp fault alloc and fallback stats

On Fri, 6 Mar 2020, Yang Shi wrote:

> > > > I think we can choose to either include file allocations into both
> > > > thp_fault_alloc and thp_fault_fallback or we can exclude them from both of
> > > > them. I don't think we can account for only one of them.
> > >
> > > How's about the 3rd option, adding THP_FILE_FALLBACK.
> >
> > I like this option.
> >
> > Problem with THP_FAULT_* is that shmem_getpage_gfp() is called not only
> > from fault path, but also from syscalls.
>
> I found another usecase for THP_FILE_FALLBACK. I wanted to measure
> file THP allocation success rate in our uecase. It looks nr_file_alloc
> / (nr_file_alloc + nr_file_fallback) is the most simple way.
>
> David, are you still working on this patch?
>

Yes, I have a refresh to send out. I don't enable CONFIG_FS_DAX but the
THP_FAULT_FALLBACK there seems somewhat out of place. It's not
necessarily within the scope of my patchset but thought I'd mention it if
someone had strong feelings about whether the DAX cases should be
separated out as well.