Although the huge zero page is shared across various processes, each
mapping needs to update its mm_struct's MM_ANONPAGES stat by HPAGE_PMD_NR
to be consistent. This updates the stat in set_huge_zero_page() after the
mapping is created and in zap_huge_pmd() when the mapping is destroyed.

Cc: Andrew Morton <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
This applies on v5.13-rc2.

Changes in V1:

- Updated MM_ANONPAGES stat in zap_huge_pmd()
- Updated the commit message

Changes in RFC:

https://lore.kernel.org/linux-mm/[email protected]/

 mm/huge_memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 63ed6b25deaa..306d0a41bf75 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -706,6 +706,7 @@ static void set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
 	if (pgtable)
 		pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
+	add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
 	mm_inc_nr_ptes(mm);
 }
@@ -1678,6 +1679,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
 	} else if (is_huge_zero_pmd(orig_pmd)) {
 		zap_deposited_table(tlb->mm, pmd);
+		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 		spin_unlock(ptl);
 		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
 	} else {
--
2.20.1
On 18 May 2021, at 0:48, Anshuman Khandual wrote:
> Although the huge zero page is shared across various processes, each
> mapping needs to update its mm_struct's MM_ANONPAGES stat by HPAGE_PMD_NR
> to be consistent. This updates the stat in set_huge_zero_page() after the
> mapping is created and in zap_huge_pmd() when the mapping is destroyed.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Zi Yan <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> This applies on v5.13-rc2.
>
> Changes in V1:
>
> - Updated MM_ANONPAGES stat in zap_huge_pmd()
> - Updated the commit message
>
> Changes in RFC:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> mm/huge_memory.c | 2 ++
> 1 file changed, 2 insertions(+)
>
LGTM. Thanks.

Reviewed-by: Zi Yan <[email protected]>
--
Best Regards,
Yan, Zi
On 5/21/21 7:47 AM, Hugh Dickins wrote:
> On Tue, 18 May 2021, Anshuman Khandual wrote:
>
>> Although the huge zero page is shared across various processes, each
>> mapping needs to update its mm_struct's MM_ANONPAGES stat by HPAGE_PMD_NR
>> to be consistent. This updates the stat in set_huge_zero_page() after the
>> mapping is created and in zap_huge_pmd() when the mapping is destroyed.
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Zi Yan <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>
> NAK.
>
> For consistency with what? In all the years that the huge zero page
The huge zero page is, after all, an anon mapping. Hence why must it not
be counted in the process's MM_ANONPAGES rss accounting? Is there a
specific reason why huge zero pages should not be counted in
MM_ANONPAGES? Doesn't it risk inaccurate MM_ANONPAGES accounting for
the processes using huge zero pages?
> has existed, it has been intentionally exempted from rss accounting:
> consistent with the small zero page, which itself has always been
> intentionally exempted from rss accounting. In fact, that's a good part
Why were small zero pages exempted from rss accounting in the first
place, which in turn might have led the huge zero page to take the
same approach as well?
> of the reason the huge zero page was introduced (see 4a6c1297268c).
The huge zero page reduces memory consumption on read faults, which is
a definite advantage, but would there be a problem if the rss stat went
up for each huge zero page mapping instance? The commit mentioned here
talks about the increase in actual memory utilization, as seen from the
rss accounting stat, without huge zero page usage.

I am wondering: if actual memory usage remains the same (reduced with
huge zero page usage), what would be the problem with an increased
MM_ANONPAGES accounting for a given process? Don't we update the rss
accounting for shared memory as well?
>
> To change that now will break any users depending on it.
Just to understand this correctly, are rss accounting procedures/methods
something that cannot be changed because users currently depend on
them? Or is this proposal just not a good enough reason to change them?
>
> Not to mention the
> BUG: Bad rss-counter state mm:00000000aa61ef82 type:MM_ANONPAGES val:512
> I just got from mmotm.
Okay, unfortunately I somehow did not see this problem while testing. Is
there a specific test case that can trigger this bug?
- Anshuman
On Fri, 21 May 2021, Anshuman Khandual wrote:
> On 5/21/21 7:47 AM, Hugh Dickins wrote:
> > On Tue, 18 May 2021, Anshuman Khandual wrote:
> >
> >> Although the huge zero page is shared across various processes, each
> >> mapping needs to update its mm_struct's MM_ANONPAGES stat by HPAGE_PMD_NR
> >> to be consistent. This updates the stat in set_huge_zero_page() after the
> >> mapping is created and in zap_huge_pmd() when the mapping is destroyed.
> >>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Zi Yan <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Signed-off-by: Anshuman Khandual <[email protected]>
> >
> > NAK.
> >
> > For consistency with what? In all the years that the huge zero page
>
> The huge zero page is, after all, an anon mapping.
No, it's a shared page used inside an anon mapping. It's not PageAnon.
> Hence why must it not be counted in the process's MM_ANONPAGES rss
> accounting?
Because it's being shared between (potentially) many tasks, and many
parts of their address space. It's intended to give them all a saving.
> Is there a specific reason why huge zero pages should not be counted
> in MM_ANONPAGES?
We'd have to look back to ~20-year-old mail (maybe lore has it) to see
whatever arguments were made for and against counting the ZERO_PAGE()
in rss - certainly it can be argued either way, but the way it was
decided is to leave it out of rss. And the huge zero page was
introduced to follow the same rule.
> Doesn't it risk inaccurate MM_ANONPAGES accounting for the processes
> using huge zero pages?
They are accurately and consistently accounted (until your patch);
it's only inaccurate if you're expecting it to be included in rss
(agreed, it can be a surprise to find that it is not included).
>
> > has existed, it has been intentionally exempted from rss accounting:
> > consistent with the small zero page, which itself has always been
> > intentionally exempted from rss accounting. In fact, that's a good part
>
> Why were small zero pages exempted from rss accounting in the first
> place, which in turn might have led the huge zero page to take the
> same approach as well?
A gift from the kernel, so large unCOWed zero areas come for free.
>
> > of the reason the huge zero page was introduced (see 4a6c1297268c).
>
> The huge zero page reduces memory consumption on read faults, which is
> a definite advantage, but would there be a problem if the rss stat went
> up for each huge zero page mapping instance? The commit mentioned here
> talks about the increase in actual memory utilization, as seen from the
> rss accounting stat, without huge zero page usage.
>
> I am wondering: if actual memory usage remains the same (reduced with
> huge zero page usage), what would be the problem with an increased
> MM_ANONPAGES accounting for a given process?
If a process is monitoring its usage, or its usage is being monitored,
then a sudden increase in rss like this will cause them trouble:
it's an incompatible change which would have to be reverted.
> Don't we update the rss
> accounting for shared memory as well?
Yes, in MM_SHMEMPAGES; but I don't get your point there.
I suppose we could add an MM_ZEROPAGES, but I don't know where
it would be shown, nor who would be interested to look at it.
>
> >
> > To change that now will break any users depending on it.
>
> Just to understand this correctly, are rss accounting procedures/methods
> something that cannot be changed because users currently depend on
> them? Or is this proposal just not a good enough reason to change them?
I guess we could change it if there were a security issue with it,
but I don't think you're suggesting that. You have not actually
given any reason to change it, other than "to be consistent".
Yet you're choosing inconsistency with long-standing usage.
>
> >
> > Not to mention the
> > BUG: Bad rss-counter state mm:00000000aa61ef82 type:MM_ANONPAGES val:512
> > I just got from mmotm.
>
> Okay, unfortunately I somehow did not see this problem while testing. Is
> there a specific test case that can trigger this bug?
I did a read fault into a large private anonymous mapping, to map in
the huge zero page; I then wrote a byte to that page, to COW it; and
exited. Checked dmesg and found that "Bad rss-counter" as expected.
(I lie: actually, I thought I mapped in the huge zero page in two
places, and COWed them both; so was a bit surprised to see val:512
when I was expecting val:1024. I guess I screwed up my addresses.)
There are probably one or more LTP tests which would show it, but I
don't care to chase down implementation details in a NAKed patch.
Hugh
On Tue, 18 May 2021, Anshuman Khandual wrote:
> Although the huge zero page is shared across various processes, each
> mapping needs to update its mm_struct's MM_ANONPAGES stat by HPAGE_PMD_NR
> to be consistent. This updates the stat in set_huge_zero_page() after the
> mapping is created and in zap_huge_pmd() when the mapping is destroyed.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Zi Yan <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
NAK.
For consistency with what? In all the years that the huge zero page
has existed, it has been intentionally exempted from rss accounting:
consistent with the small zero page, which itself has always been
intentionally exempted from rss accounting. In fact, that's a good part
of the reason the huge zero page was introduced (see 4a6c1297268c).
To change that now will break any users depending on it.
Not to mention the
BUG: Bad rss-counter state mm:00000000aa61ef82 type:MM_ANONPAGES val:512
I just got from mmotm.
Hugh
> ---
> This applies on v5.13-rc2.
>
> Changes in V1:
>
> - Updated MM_ANONPAGES stat in zap_huge_pmd()
> - Updated the commit message
>
> Changes in RFC:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> mm/huge_memory.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 63ed6b25deaa..306d0a41bf75 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -706,6 +706,7 @@ static void set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
>  	if (pgtable)
>  		pgtable_trans_huge_deposit(mm, pmd, pgtable);
>  	set_pmd_at(mm, haddr, pmd, entry);
> +	add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
>  	mm_inc_nr_ptes(mm);
>  }
>
> @@ -1678,6 +1679,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>  	} else if (is_huge_zero_pmd(orig_pmd)) {
>  		zap_deposited_table(tlb->mm, pmd);
> +		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>  		spin_unlock(ptl);
>  		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>  	} else {
> --
> 2.20.1