2020-01-17 23:24:06

by Wei Yang

Subject: [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little

move_page_tables() tries to move page table entries either a PMD at a
time or a PTE at a time.

The root issue is that, to move a whole PMD, both the old and the new
range must be PMD aligned, but the current code calculates the old range
and the new range separately. This leads to some redundant checks and
calculations.

This cleanup consolidates the range check in one place to reduce the
extra range handling.

Wei Yang (5):
mm/mremap: format the check in move_normal_pmd() same as
move_huge_pmd()
mm/mremap: it is sure to have enough space when extent meets
requirement
mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
mm/mremap: calculate extent in one place
mm/mremap: start addresses are properly aligned

include/linux/huge_mm.h | 2 +-
mm/huge_memory.c | 8 +-------
mm/mremap.c | 24 +++++++-----------------
3 files changed, 9 insertions(+), 25 deletions(-)

--
2.17.1


2020-01-17 23:24:13

by Wei Yang

Subject: [PATCH 1/5] mm/mremap: format the check in move_normal_pmd() same as move_huge_pmd()

No functional change; this just improves readability and prepares for
the following cleanups.

Signed-off-by: Wei Yang <[email protected]>
---
mm/mremap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bcc7aa62f2d9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -200,8 +200,9 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
struct mm_struct *mm = vma->vm_mm;
pmd_t pmd;

- if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
- || old_end - old_addr < PMD_SIZE)
+ if ((old_addr & ~PMD_MASK) ||
+ (new_addr & ~PMD_MASK) ||
+ old_end - old_addr < PMD_SIZE)
return false;

/*
--
2.17.1

2020-01-17 23:24:17

by Wei Yang

Subject: [PATCH 2/5] mm/mremap: it is sure to have enough space when extent meets requirement

old_end is passed to these two functions to check whether there is
enough space to do the move, but this check has already been done before
invoking them.

These two functions are only invoked when extent meets the size
requirement, and the caller clamps extent beforehand:

	if (extent > old_end - old_addr)
		extent = old_end - old_addr;

This implies (old_end - old_addr) cannot fail the size check inside
these two functions, so the check and the old_end argument can go.

Signed-off-by: Wei Yang <[email protected]>
---
include/linux/huge_mm.h | 2 +-
mm/huge_memory.c | 7 ++-----
mm/mremap.c | 11 ++++-------
3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0b84e13e88e2..2a5281ca46c8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -42,7 +42,7 @@ extern int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
unsigned char *vec);
extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
- unsigned long new_addr, unsigned long old_end,
+ unsigned long new_addr,
pmd_t *old_pmd, pmd_t *new_pmd);
extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, pgprot_t newprot,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5b2876942639..8f1bbbf01f5b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1871,17 +1871,14 @@ static pmd_t move_soft_dirty_pmd(pmd_t pmd)
}

bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
- unsigned long new_addr, unsigned long old_end,
- pmd_t *old_pmd, pmd_t *new_pmd)
+ unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
{
spinlock_t *old_ptl, *new_ptl;
pmd_t pmd;
struct mm_struct *mm = vma->vm_mm;
bool force_flush = false;

- if ((old_addr & ~HPAGE_PMD_MASK) ||
- (new_addr & ~HPAGE_PMD_MASK) ||
- old_end - old_addr < HPAGE_PMD_SIZE)
+ if ((old_addr & ~HPAGE_PMD_MASK) || (new_addr & ~HPAGE_PMD_MASK))
return false;

/*
diff --git a/mm/mremap.c b/mm/mremap.c
index bcc7aa62f2d9..c2af8ba4ba43 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -193,16 +193,13 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,

#ifdef CONFIG_HAVE_MOVE_PMD
static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
- unsigned long new_addr, unsigned long old_end,
- pmd_t *old_pmd, pmd_t *new_pmd)
+ unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
pmd_t pmd;

- if ((old_addr & ~PMD_MASK) ||
- (new_addr & ~PMD_MASK) ||
- old_end - old_addr < PMD_SIZE)
+ if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK))
return false;

/*
@@ -274,7 +271,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
if (need_rmap_locks)
take_rmap_locks(vma);
moved = move_huge_pmd(vma, old_addr, new_addr,
- old_end, old_pmd, new_pmd);
+ old_pmd, new_pmd);
if (need_rmap_locks)
drop_rmap_locks(vma);
if (moved)
@@ -294,7 +291,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
if (need_rmap_locks)
take_rmap_locks(vma);
moved = move_normal_pmd(vma, old_addr, new_addr,
- old_end, old_pmd, new_pmd);
+ old_pmd, new_pmd);
if (need_rmap_locks)
drop_rmap_locks(vma);
if (moved)
--
2.17.1

2020-01-17 23:24:21

by Wei Yang

Subject: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

Use the generic helper instead of doing it by hand.

Signed-off-by: Wei Yang <[email protected]>
---
mm/mremap.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index c2af8ba4ba43..a258914f3ee1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,

for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
cond_resched();
- next = (old_addr + PMD_SIZE) & PMD_MASK;
- /* even if next overflowed, extent below will be ok */
+ next = pmd_addr_end(old_addr, old_end);
extent = next - old_addr;
- if (extent > old_end - old_addr)
- extent = old_end - old_addr;
old_pmd = get_old_pmd(vma->vm_mm, old_addr);
if (!old_pmd)
continue;
@@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,

if (pte_alloc(new_vma->vm_mm, new_pmd))
break;
- next = (new_addr + PMD_SIZE) & PMD_MASK;
+ next = pmd_addr_end(new_addr, new_addr + len);
if (extent > next - new_addr)
extent = next - new_addr;
move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
--
2.17.1

2020-01-17 23:24:28

by Wei Yang

Subject: [PATCH 4/5] mm/mremap: calculate extent in one place

Page tables are moved at PMD granularity, which requires both the source
and the destination range to meet the alignment requirement.

The current code works correctly since move_huge_pmd() and
move_normal_pmd() check old_addr and new_addr again, and we fall back to
move_ptes() if either of them is not aligned.

Instead of calculating the extent separately for each range, it is
better to calculate it in one place, so we know up front when it is not
necessary to try moving a whole pmd. This makes the logic a little
clearer.

Signed-off-by: Wei Yang <[email protected]>
---
mm/mremap.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index a258914f3ee1..37335a10779d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -240,7 +240,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long new_addr, unsigned long len,
bool need_rmap_locks)
{
- unsigned long extent, next, old_end;
+ unsigned long extent, old_next, new_next, old_end;
struct mmu_notifier_range range;
pmd_t *old_pmd, *new_pmd;

@@ -253,8 +253,9 @@ unsigned long move_page_tables(struct vm_area_struct *vma,

for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
cond_resched();
- next = pmd_addr_end(old_addr, old_end);
- extent = next - old_addr;
+ old_next = pmd_addr_end(old_addr, old_end);
+ new_next = pmd_addr_end(new_addr, new_addr + len);
+ extent = min((old_next - old_addr), (new_next - new_addr));
old_pmd = get_old_pmd(vma->vm_mm, old_addr);
if (!old_pmd)
continue;
@@ -298,9 +299,6 @@ unsigned long move_page_tables(struct vm_area_struct *vma,

if (pte_alloc(new_vma->vm_mm, new_pmd))
break;
- next = pmd_addr_end(new_addr, new_addr + len);
- if (extent > next - new_addr)
- extent = next - new_addr;
move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
new_pmd, new_addr, need_rmap_locks);
}
--
2.17.1

2020-01-17 23:24:35

by Wei Yang

Subject: [PATCH 5/5] mm/mremap: start addresses are properly aligned

After the previous cleanups, extent is the minimal step for both source
and destination. This means that when extent equals HPAGE_PMD_SIZE or
PMD_SIZE, old_addr and new_addr are properly aligned as well.

Since these two functions are only invoked from move_page_tables(), it
is safe to remove the check now.

Signed-off-by: Wei Yang <[email protected]>
---
mm/huge_memory.c | 3 ---
mm/mremap.c | 3 ---
2 files changed, 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f1bbbf01f5b..cc98d0f07d0a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1878,9 +1878,6 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
struct mm_struct *mm = vma->vm_mm;
bool force_flush = false;

- if ((old_addr & ~HPAGE_PMD_MASK) || (new_addr & ~HPAGE_PMD_MASK))
- return false;
-
/*
* The destination pmd shouldn't be established, free_pgtables()
* should have release it.
diff --git a/mm/mremap.c b/mm/mremap.c
index 37335a10779d..5d7597fb3023 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -199,9 +199,6 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
struct mm_struct *mm = vma->vm_mm;
pmd_t pmd;

- if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK))
- return false;
-
/*
* The destination pmd shouldn't be established, free_pgtables()
* should have release it.
--
2.17.1

2020-01-19 00:10:48

by Andrew Morton

Subject: Re: [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little

On Sat, 18 Jan 2020 07:22:49 +0800 Wei Yang <[email protected]> wrote:

> move_page_tables() tries to move page table by PMD or PTE.
>
> The root reason is if it tries to move PMD, both old and new range should
> be PMD aligned. But current code calculate old range and new range
> separately. This leads to some redundant check and calculation.
>
> This cleanup tries to consolidate the range check in one place to reduce
> some extra range handling.

Thanks, I grabbed these, aimed at 5.7-rc1.

2020-01-19 02:13:05

by Wei Yang

Subject: Re: [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little

On Sat, Jan 18, 2020 at 04:07:02PM -0800, Andrew Morton wrote:
>On Sat, 18 Jan 2020 07:22:49 +0800 Wei Yang <[email protected]> wrote:
>
>> move_page_tables() tries to move page table by PMD or PTE.
>>
>> The root reason is if it tries to move PMD, both old and new range should
>> be PMD aligned. But current code calculate old range and new range
>> separately. This leads to some redundant check and calculation.
>>
>> This cleanup tries to consolidate the range check in one place to reduce
>> some extra range handling.
>
>Thanks, I grabbed these, aimed at 5.7-rc1.

Thanks :)

--
Wei Yang
Help you, Help me

2020-01-26 14:49:31

by Dmitry Osipenko

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

18.01.2020 02:22, Wei Yang wrote:
> Use the general helper instead of do it by hand.
>
> Signed-off-by: Wei Yang <[email protected]>
> ---
> mm/mremap.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index c2af8ba4ba43..a258914f3ee1 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>
> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> cond_resched();
> - next = (old_addr + PMD_SIZE) & PMD_MASK;
> - /* even if next overflowed, extent below will be ok */
> + next = pmd_addr_end(old_addr, old_end);
> extent = next - old_addr;
> - if (extent > old_end - old_addr)
> - extent = old_end - old_addr;
> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> if (!old_pmd)
> continue;
> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>
> if (pte_alloc(new_vma->vm_mm, new_pmd))
> break;
> - next = (new_addr + PMD_SIZE) & PMD_MASK;
> + next = pmd_addr_end(new_addr, new_addr + len);
> if (extent > next - new_addr)
> extent = next - new_addr;
> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>

Hello Wei,

Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
Tegra (ARM32):

BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190

and eventually kernel hangs.

Git's bisection points to this patch and reverting it helps. Please fix,
thanks in advance.

2020-01-27 03:01:57

by Andrew Morton

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <[email protected]> wrote:

> 18.01.2020 02:22, Wei Yang wrote:
> > Use the general helper instead of do it by hand.
> >
> > Signed-off-by: Wei Yang <[email protected]>
> > ---
> > mm/mremap.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c2af8ba4ba43..a258914f3ee1 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >
> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> > cond_resched();
> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
> > - /* even if next overflowed, extent below will be ok */
> > + next = pmd_addr_end(old_addr, old_end);
> > extent = next - old_addr;
> > - if (extent > old_end - old_addr)
> > - extent = old_end - old_addr;
> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> > if (!old_pmd)
> > continue;
> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >
> > if (pte_alloc(new_vma->vm_mm, new_pmd))
> > break;
> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
> > + next = pmd_addr_end(new_addr, new_addr + len);
> > if (extent > next - new_addr)
> > extent = next - new_addr;
> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >
>
> Hello Wei,
>
> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> Tegra (ARM32):
>
> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>
> and eventually kernel hangs.
>
> Git's bisection points to this patch and reverting it helps. Please fix,
> thanks in advance.

Thanks. I had these tagged for 5.7-rc1 anyway, so I'll drop all five
patches.

2020-01-28 00:44:55

by Wei Yang

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>18.01.2020 02:22, Wei Yang wrote:
>> Use the general helper instead of do it by hand.
>>
>> Signed-off-by: Wei Yang <[email protected]>
>> ---
>> mm/mremap.c | 7 ++-----
>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index c2af8ba4ba43..a258914f3ee1 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>
>> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> cond_resched();
>> - next = (old_addr + PMD_SIZE) & PMD_MASK;
>> - /* even if next overflowed, extent below will be ok */
>> + next = pmd_addr_end(old_addr, old_end);
>> extent = next - old_addr;
>> - if (extent > old_end - old_addr)
>> - extent = old_end - old_addr;
>> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> if (!old_pmd)
>> continue;
>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>
>> if (pte_alloc(new_vma->vm_mm, new_pmd))
>> break;
>> - next = (new_addr + PMD_SIZE) & PMD_MASK;
>> + next = pmd_addr_end(new_addr, new_addr + len);
>> if (extent > next - new_addr)
>> extent = next - new_addr;
>> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>
>
>Hello Wei,
>
>Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>Tegra (ARM32):
>
> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>

Thanks.

Would you mind letting me know which test case you are running? Or is
the special thing the 32-bit platform?

>and eventually kernel hangs.
>
>Git's bisection points to this patch and reverting it helps. Please fix,
>thanks in advance.

--
Wei Yang
Help you, Help me

2020-01-28 16:02:27

by Dmitry Osipenko

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

28.01.2020 03:43, Wei Yang wrote:
> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang wrote:
>>> Use the general helper instead of do it by hand.
>>>
>>> Signed-off-by: Wei Yang <[email protected]>
>>> ---
>>> mm/mremap.c | 7 ++-----
>>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index c2af8ba4ba43..a258914f3ee1 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>
>>> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>> cond_resched();
>>> - next = (old_addr + PMD_SIZE) & PMD_MASK;
>>> - /* even if next overflowed, extent below will be ok */
>>> + next = pmd_addr_end(old_addr, old_end);
>>> extent = next - old_addr;
>>> - if (extent > old_end - old_addr)
>>> - extent = old_end - old_addr;
>>> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>> if (!old_pmd)
>>> continue;
>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>
>>> if (pte_alloc(new_vma->vm_mm, new_pmd))
>>> break;
>>> - next = (new_addr + PMD_SIZE) & PMD_MASK;
>>> + next = pmd_addr_end(new_addr, new_addr + len);
>>> if (extent > next - new_addr)
>>> extent = next - new_addr;
>>> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>
> Thanks.
>
> Would you mind letting me know which case you are testing?

Nothing special, systemd starts to fall apart during boot.

> Or the special thing is 32-bit platform?
I have limited knowledge about mm/, so I can't provide a detailed
explanation.

Please take a look at this:

[1]
https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210

[2]
https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549

[3]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b

2020-01-28 23:30:04

by Wei Yang

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>28.01.2020 03:43, Wei Yang wrote:
>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>> 18.01.2020 02:22, Wei Yang wrote:
>>>> Use the general helper instead of do it by hand.
>>>>
>>>> Signed-off-by: Wei Yang <[email protected]>
>>>> ---
>>>> mm/mremap.c | 7 ++-----
>>>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>> --- a/mm/mremap.c
>>>> +++ b/mm/mremap.c
>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>
>>>> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>> cond_resched();
>>>> - next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>> - /* even if next overflowed, extent below will be ok */
>>>> + next = pmd_addr_end(old_addr, old_end);
>>>> extent = next - old_addr;
>>>> - if (extent > old_end - old_addr)
>>>> - extent = old_end - old_addr;
>>>> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>> if (!old_pmd)
>>>> continue;
>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>
>>>> if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>> break;
>>>> - next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>> + next = pmd_addr_end(new_addr, new_addr + len);
>>>> if (extent > next - new_addr)
>>>> extent = next - new_addr;
>>>> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>
>>>
>>> Hello Wei,
>>>
>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>> Tegra (ARM32):
>>>
>>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>
>>
>> Thanks.
>>
>> Would you mind letting me know which case you are testing?
>
>Nothing special, systemd starts to fall apart during boot.
>
>> Or the special thing is 32-bit platform?
>I have a limited knowledge about mm/, so can't provide detailed explanation.
>
>Please take a look at this:
>
>[1]
>https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>
>[2]
>https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>
>[3]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b

Thanks, I see the difference now.

In that case, we can't use pmd_addr_end() to simplify the calculation,
since doing so changes the behavior on architectures with folded levels.

I will prepare another patch set to fix this. Would you mind helping me
verify it on your platform?

--
Wei Yang
Help you, Help me

2020-01-28 23:36:40

by Dmitry Osipenko

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

29.01.2020 02:29, Wei Yang wrote:
> On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>> 28.01.2020 03:43, Wei Yang wrote:
>>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>>> 18.01.2020 02:22, Wei Yang wrote:
>>>>> Use the general helper instead of do it by hand.
>>>>>
>>>>> Signed-off-by: Wei Yang <[email protected]>
>>>>> ---
>>>>> mm/mremap.c | 7 ++-----
>>>>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>>> --- a/mm/mremap.c
>>>>> +++ b/mm/mremap.c
>>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>
>>>>> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>> cond_resched();
>>>>> - next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>>> - /* even if next overflowed, extent below will be ok */
>>>>> + next = pmd_addr_end(old_addr, old_end);
>>>>> extent = next - old_addr;
>>>>> - if (extent > old_end - old_addr)
>>>>> - extent = old_end - old_addr;
>>>>> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>> if (!old_pmd)
>>>>> continue;
>>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>
>>>>> if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>> break;
>>>>> - next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>>> + next = pmd_addr_end(new_addr, new_addr + len);
>>>>> if (extent > next - new_addr)
>>>>> extent = next - new_addr;
>>>>> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>>
>>>>
>>>> Hello Wei,
>>>>
>>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>>> Tegra (ARM32):
>>>>
>>>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>>
>>>
>>> Thanks.
>>>
>>> Would you mind letting me know which case you are testing?
>>
>> Nothing special, systemd starts to fall apart during boot.
>>
>>> Or the special thing is 32-bit platform?
>> I have a limited knowledge about mm/, so can't provide detailed explanation.
>>
>> Please take a look at this:
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>>
>> [2]
>> https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>>
>> [3]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
>
> Thanks, I see the difference here.
>
> If this is the case, we can't use pmd_addr_end() to simplify the calculation.
> This changes the behavior.
>
> I would prepare another patch set to fix this. Would you mind helping me
> verify on your platform?

Sure, please feel free to CC me on that patch.

2020-01-29 00:30:09

by Wei Yang

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Wed, Jan 29, 2020 at 02:35:25AM +0300, Dmitry Osipenko wrote:
>29.01.2020 02:29, Wei Yang wrote:
>> On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>>> 28.01.2020 03:43, Wei Yang wrote:
>>>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>>>> 18.01.2020 02:22, Wei Yang wrote:
>>>>>> Use the general helper instead of do it by hand.
>>>>>>
>>>>>> Signed-off-by: Wei Yang <[email protected]>
>>>>>> ---
>>>>>> mm/mremap.c | 7 ++-----
>>>>>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>>
>>>>>> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>>> cond_resched();
>>>>>> - next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>>>> - /* even if next overflowed, extent below will be ok */
>>>>>> + next = pmd_addr_end(old_addr, old_end);
>>>>>> extent = next - old_addr;
>>>>>> - if (extent > old_end - old_addr)
>>>>>> - extent = old_end - old_addr;
>>>>>> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>>> if (!old_pmd)
>>>>>> continue;
>>>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>>
>>>>>> if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>>> break;
>>>>>> - next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>>>> + next = pmd_addr_end(new_addr, new_addr + len);
>>>>>> if (extent > next - new_addr)
>>>>>> extent = next - new_addr;
>>>>>> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>>>
>>>>>
>>>>> Hello Wei,
>>>>>
>>>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>>>> Tegra (ARM32):
>>>>>
>>>>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>> Would you mind letting me know which case you are testing?
>>>
>>> Nothing special, systemd starts to fall apart during boot.
>>>
>>>> Or the special thing is 32-bit platform?
>>> I have a limited knowledge about mm/, so can't provide detailed explanation.
>>>
>>> Please take a look at this:
>>>
>>> [1]
>>> https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>>>
>>> [2]
>>> https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>>>
>>> [3]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
>>
>> Thanks, I see the difference here.
>>
>> If this is the case, we can't use pmd_addr_end() to simplify the calculation.
>> This changes the behavior.
>>
>> I would prepare another patch set to fix this. Would you mind helping me
>> verify on your platform?
>
>Sure, please feel free to CC me on that patch.

Thanks, you are on the cc list of v2.

I hope this one works fine on ARM.

--
Wei Yang
Help you, Help me

2020-01-29 09:49:28

by Russell King (Oracle)

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> 18.01.2020 02:22, Wei Yang wrote:
> > Use the general helper instead of do it by hand.
> >
> > Signed-off-by: Wei Yang <[email protected]>
> > ---
> > mm/mremap.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c2af8ba4ba43..a258914f3ee1 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >
> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> > cond_resched();
> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
> > - /* even if next overflowed, extent below will be ok */
> > + next = pmd_addr_end(old_addr, old_end);
> > extent = next - old_addr;
> > - if (extent > old_end - old_addr)
> > - extent = old_end - old_addr;
> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> > if (!old_pmd)
> > continue;
> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >
> > if (pte_alloc(new_vma->vm_mm, new_pmd))
> > break;
> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
> > + next = pmd_addr_end(new_addr, new_addr + len);
> > if (extent > next - new_addr)
> > extent = next - new_addr;
> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >
>
> Hello Wei,
>
> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> Tegra (ARM32):
>
> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>
> and eventually kernel hangs.
>
> Git's bisection points to this patch and reverting it helps. Please fix,
> thanks in advance.

The above is definitely wrong - the pXX_addr_end() helpers are designed
to be used with an address index within the pXX table and the address
index of either the last entry in the same pXX table or the beginning of
the _next_ pXX table. Arbitrary end address indices are not allowed.

When page tables are "rolled up" because levels don't exist, it is
common practice for these macros to just return their end address index.
Hence, if they are used with arbitrary end address indices, the
iteration will fail.

The only way to do this is:

next = pmd_addr_end(old_addr,
pud_addr_end(old_addr,
p4d_addr_end(old_addr,
pgd_addr_end(old_addr, old_end))));

which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
the correct end argument. However, that's more complex and verbose, and
likely less efficient, than the current code.

I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
and trying to "clean it up" will just result in less efficient or
broken code.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2020-01-29 14:23:33

by Dmitry Osipenko

Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

29.01.2020 12:47, Russell King - ARM Linux admin wrote:
> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang wrote:
>>> Use the general helper instead of do it by hand.
>>>
>>> Signed-off-by: Wei Yang <[email protected]>
>>> ---
>>> mm/mremap.c | 7 ++-----
>>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index c2af8ba4ba43..a258914f3ee1 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>
>>> for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>> cond_resched();
>>> - next = (old_addr + PMD_SIZE) & PMD_MASK;
>>> - /* even if next overflowed, extent below will be ok */
>>> + next = pmd_addr_end(old_addr, old_end);
>>> extent = next - old_addr;
>>> - if (extent > old_end - old_addr)
>>> - extent = old_end - old_addr;
>>> old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>> if (!old_pmd)
>>> continue;
>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>
>>> if (pte_alloc(new_vma->vm_mm, new_pmd))
>>> break;
>>> - next = (new_addr + PMD_SIZE) & PMD_MASK;
>>> + next = pmd_addr_end(new_addr, new_addr + len);
>>> if (extent > next - new_addr)
>>> extent = next - new_addr;
>>> move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
>
> The above is definitely wrong - pXX_addr_end() are designed to be used
> with an address index within the pXX table table and the address index
> of either the last entry in the same pXX table or the beginning of the
> _next_ pXX table. Arbitary end address indicies are not allowed.
>
> When page tables are "rolled up" when levels don't exist, it is common
> practice for these macros to just return their end address index.
> Hence, if they are used with arbitary end address indicies, then the
> iteration will fail.
>
> The only way to do this is:
>
> next = pmd_addr_end(old_addr,
> pud_addr_end(old_addr,
> p4d_addr_end(old_addr,
> pgd_addr_end(old_addr, old_end))));
>
> which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
> the correct end argument. However, that's a more complex and verbose,
> and likely less efficient than the current code.
>
> I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
> and trying to "clean it up" will just result in less efficient or
> broken code.
>

Hello Russell,

Thank you very much for the extra clarification!

2020-01-29 17:20:23

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

27.01.2020 05:59, Andrew Morton wrote:
> On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <[email protected]> wrote:
...
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
>
> Thanks. I had these tagged for 5.7-rc1 anyway, so I'll drop all five
> patches.
>

Hello Andrew,

FYI, I'm still seeing the offending patches in today's next-20200129.

2020-01-29 18:59:42

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

29.01.2020 03:28, Wei Yang wrote:
...
>>> I would prepare another patch set to fix this. Would you mind helping me
>>> verify on your platform?
>>
>> Sure, please feel free to CC me on that patch.
>
> Thanks, you are in the cc list of v2.
>
> Hope this one works fine on ARM.

Okay, I'll reply to the v2 after some more extensive testing (tomorrow).

2020-01-29 21:58:47

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang wrote:
>> > Use the general helper instead of do it by hand.
>> >
>> > Signed-off-by: Wei Yang <[email protected]>
>> > ---
>> > mm/mremap.c | 7 ++-----
>> > 1 file changed, 2 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/mm/mremap.c b/mm/mremap.c
>> > index c2af8ba4ba43..a258914f3ee1 100644
>> > --- a/mm/mremap.c
>> > +++ b/mm/mremap.c
>> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >
>> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> > cond_resched();
>> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
>> > - /* even if next overflowed, extent below will be ok */
>> > + next = pmd_addr_end(old_addr, old_end);
>> > extent = next - old_addr;
>> > - if (extent > old_end - old_addr)
>> > - extent = old_end - old_addr;
>> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> > if (!old_pmd)
>> > continue;
>> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >
>> > if (pte_alloc(new_vma->vm_mm, new_pmd))
>> > break;
>> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
>> > + next = pmd_addr_end(new_addr, new_addr + len);
>> > if (extent > next - new_addr)
>> > extent = next - new_addr;
>> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
>
>The above is definitely wrong - pXX_addr_end() are designed to be used
>with an address index within the pXX table table and the address index
>of either the last entry in the same pXX table or the beginning of the
>_next_ pXX table. Arbitary end address indicies are not allowed.
>

#define pmd_addr_end(addr, end) \
({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK; \
(__boundary - 1 < (end) - 1)? __boundary: (end); \
})

If my understanding is correct, the definition here aligns the addr to the next
PMD boundary, or clamps it to end, whichever comes first.

I don't see how it could cross into another PMD. Am I missing something?

>When page tables are "rolled up" when levels don't exist, it is common
>practice for these macros to just return their end address index.
>Hence, if they are used with arbitary end address indicies, then the
>iteration will fail.
>
>The only way to do this is:
>
> next = pmd_addr_end(old_addr,
> pud_addr_end(old_addr,
> p4d_addr_end(old_addr,
> pgd_addr_end(old_addr, old_end))));
>
>which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
>the correct end argument. However, that's a more complex and verbose,
>and likely less efficient than the current code.
>
>I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
>and trying to "clean it up" will just result in less efficient or
>broken code.
>
>--
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up

--
Wei Yang
Help you, Help me

2020-01-29 23:26:26

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> >> 18.01.2020 02:22, Wei Yang wrote:
> >> > Use the general helper instead of do it by hand.
> >> >
> >> > Signed-off-by: Wei Yang <[email protected]>
> >> > ---
> >> > mm/mremap.c | 7 ++-----
> >> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/mm/mremap.c b/mm/mremap.c
> >> > index c2af8ba4ba43..a258914f3ee1 100644
> >> > --- a/mm/mremap.c
> >> > +++ b/mm/mremap.c
> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >
> >> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >> > cond_resched();
> >> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
> >> > - /* even if next overflowed, extent below will be ok */
> >> > + next = pmd_addr_end(old_addr, old_end);
> >> > extent = next - old_addr;
> >> > - if (extent > old_end - old_addr)
> >> > - extent = old_end - old_addr;
> >> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >> > if (!old_pmd)
> >> > continue;
> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >
> >> > if (pte_alloc(new_vma->vm_mm, new_pmd))
> >> > break;
> >> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
> >> > + next = pmd_addr_end(new_addr, new_addr + len);
> >> > if (extent > next - new_addr)
> >> > extent = next - new_addr;
> >> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >> >
> >>
> >> Hello Wei,
> >>
> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> Tegra (ARM32):
> >>
> >> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >>
> >> and eventually kernel hangs.
> >>
> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> thanks in advance.
> >
> >The above is definitely wrong - pXX_addr_end() are designed to be used
> >with an address index within the pXX table table and the address index
> >of either the last entry in the same pXX table or the beginning of the
> >_next_ pXX table. Arbitary end address indicies are not allowed.
> >
>
> #define pmd_addr_end(addr, end) \
> ({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK; \
> (__boundary - 1 < (end) - 1)? __boundary: (end); \
> })
>
> If my understanding is correct, the definition here align the addr to next PMD
> boundary or end.
>
> I don't see the possibility to across another PMD. Do I miss something?

Look at the definitions of p*_addr_end() that are used when page tables
are rolled up.

> >When page tables are "rolled up" when levels don't exist, it is common
> >practice for these macros to just return their end address index.
> >Hence, if they are used with arbitary end address indicies, then the
> >iteration will fail.
> >
> >The only way to do this is:
> >
> > next = pmd_addr_end(old_addr,
> > pud_addr_end(old_addr,
> > p4d_addr_end(old_addr,
> > pgd_addr_end(old_addr, old_end))));
> >
> >which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
> >the correct end argument. However, that's a more complex and verbose,
> >and likely less efficient than the current code.
> >
> >I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
> >and trying to "clean it up" will just result in less efficient or
> >broken code.
> >
> >--
> >RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> >FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> >According to speedtest.net: 11.9Mbps down 500kbps up
>
> --
> Wei Yang
> Help you, Help me
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2020-01-30 00:00:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Wed, 29 Jan 2020 20:18:56 +0300 Dmitry Osipenko <[email protected]> wrote:

> 27.01.2020 05:59, Andrew Morton wrote:
> > On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <[email protected]> wrote:
> ...
> >> Hello Wei,
> >>
> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> Tegra (ARM32):
> >>
> >> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >>
> >> and eventually kernel hangs.
> >>
> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> thanks in advance.
> >
> > Thanks. I had these tagged for 5.7-rc1 anyway, so I'll drop all five
> > patches.
> >
>
> Hello Andrew,
>
> FYI, I'm still seeing the offending patches in the today's next-20200129.

hm, me too. Stephen, it's unexpected that 9ff4452912d63f ("mm/mremap:
use pmd_addr_end to calculate next in move_page_tables()") is still in
the -next lineup? It was dropped from -mm on Jan 26?

2020-01-30 00:30:26

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

Hi Andrew,

On Wed, 29 Jan 2020 15:59:07 -0800 Andrew Morton <[email protected]> wrote:
>
> hm, me too. Stephen, it's unexpected that 9ff4452912d63f ("mm/mremap:
> use pmd_addr_end to calculate next in move_page_tables()") is still in
> the -next lineup? It was dropped from -mm on Jan 26?

The mmotm 2020-01-28-20-05 arrived just too late for yesterday's
linux-next (it will be in today's linux-next). The mmotm before that
was 2020-01-23-21-12. I attempt to fetch mmotm (along with all the
remaining unmerged trees) about every 30 minutes (sometimes more often)
during the day.

--
Cheers,
Stephen Rothwell



2020-01-30 01:30:52

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
>On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
>> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> >> 18.01.2020 02:22, Wei Yang wrote:
>> >> > Use the general helper instead of do it by hand.
>> >> >
>> >> > Signed-off-by: Wei Yang <[email protected]>
>> >> > ---
>> >> > mm/mremap.c | 7 ++-----
>> >> > 1 file changed, 2 insertions(+), 5 deletions(-)
>> >> >
>> >> > diff --git a/mm/mremap.c b/mm/mremap.c
>> >> > index c2af8ba4ba43..a258914f3ee1 100644
>> >> > --- a/mm/mremap.c
>> >> > +++ b/mm/mremap.c
>> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >
>> >> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >> > cond_resched();
>> >> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
>> >> > - /* even if next overflowed, extent below will be ok */
>> >> > + next = pmd_addr_end(old_addr, old_end);
>> >> > extent = next - old_addr;
>> >> > - if (extent > old_end - old_addr)
>> >> > - extent = old_end - old_addr;
>> >> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >> > if (!old_pmd)
>> >> > continue;
>> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >
>> >> > if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >> > break;
>> >> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
>> >> > + next = pmd_addr_end(new_addr, new_addr + len);
>> >> > if (extent > next - new_addr)
>> >> > extent = next - new_addr;
>> >> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >> >
>> >>
>> >> Hello Wei,
>> >>
>> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> >> Tegra (ARM32):
>> >>
>> >> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> >>
>> >> and eventually kernel hangs.
>> >>
>> >> Git's bisection points to this patch and reverting it helps. Please fix,
>> >> thanks in advance.
>> >
>> >The above is definitely wrong - pXX_addr_end() are designed to be used
>> >with an address index within the pXX table table and the address index
>> >of either the last entry in the same pXX table or the beginning of the
>> >_next_ pXX table. Arbitary end address indicies are not allowed.
>> >
>>
>> #define pmd_addr_end(addr, end) \
>> ({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK; \
>> (__boundary - 1 < (end) - 1)? __boundary: (end); \
>> })
>>
>> If my understanding is correct, the definition here align the addr to next PMD
>> boundary or end.
>>
>> I don't see the possibility to across another PMD. Do I miss something?
>
>Look at the definition of p*_addr_end() that are used when page tables
>are rolled up.
>

Sorry, I don't get your point.

What's the meaning of "roll up" here?

Would you mind giving me an example? I see pmd_addr_end() is not used in many
places in the core kernel. Glancing through those usages, they all call it as
pmd_addr_end(addr, end), with no special handling of the end address.

Or you mean the case when pmd_addr_end() is defined to return "end" directly?

>> >When page tables are "rolled up" when levels don't exist, it is common
>> >practice for these macros to just return their end address index.
>> >Hence, if they are used with arbitary end address indicies, then the
>> >iteration will fail.
>> >
>> >The only way to do this is:
>> >
>> > next = pmd_addr_end(old_addr,
>> > pud_addr_end(old_addr,
>> > p4d_addr_end(old_addr,
>> > pgd_addr_end(old_addr, old_end))));
>> >
>> >which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
>> >the correct end argument. However, that's a more complex and verbose,
>> >and likely less efficient than the current code.
>> >
>> >I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
>> >and trying to "clean it up" will just result in less efficient or
>> >broken code.
>> >
>> >--
>> >RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>> >FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>> >According to speedtest.net: 11.9Mbps down 500kbps up
>>
>> --
>> Wei Yang
>> Help you, Help me
>>
>
>--
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up

--
Wei Yang
Help you, Help me

2020-01-30 14:17:04

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Thu, Jan 30, 2020 at 09:30:00AM +0800, Wei Yang wrote:
> On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
> >On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
> >> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
> >> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> >> >> 18.01.2020 02:22, Wei Yang wrote:
> >> >> > Use the general helper instead of do it by hand.
> >> >> >
> >> >> > Signed-off-by: Wei Yang <[email protected]>
> >> >> > ---
> >> >> > mm/mremap.c | 7 ++-----
> >> >> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >> >> >
> >> >> > diff --git a/mm/mremap.c b/mm/mremap.c
> >> >> > index c2af8ba4ba43..a258914f3ee1 100644
> >> >> > --- a/mm/mremap.c
> >> >> > +++ b/mm/mremap.c
> >> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >> >
> >> >> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >> >> > cond_resched();
> >> >> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
> >> >> > - /* even if next overflowed, extent below will be ok */
> >> >> > + next = pmd_addr_end(old_addr, old_end);
> >> >> > extent = next - old_addr;
> >> >> > - if (extent > old_end - old_addr)
> >> >> > - extent = old_end - old_addr;
> >> >> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >> >> > if (!old_pmd)
> >> >> > continue;
> >> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >> >
> >> >> > if (pte_alloc(new_vma->vm_mm, new_pmd))
> >> >> > break;
> >> >> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
> >> >> > + next = pmd_addr_end(new_addr, new_addr + len);
> >> >> > if (extent > next - new_addr)
> >> >> > extent = next - new_addr;
> >> >> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >> >> >
> >> >>
> >> >> Hello Wei,
> >> >>
> >> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> >> Tegra (ARM32):
> >> >>
> >> >> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >> >>
> >> >> and eventually kernel hangs.
> >> >>
> >> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> >> thanks in advance.
> >> >
> >> >The above is definitely wrong - pXX_addr_end() are designed to be used
> >> >with an address index within the pXX table table and the address index
> >> >of either the last entry in the same pXX table or the beginning of the
> >> >_next_ pXX table. Arbitary end address indicies are not allowed.
> >> >
> >>
> >> #define pmd_addr_end(addr, end) \
> >> ({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK; \
> >> (__boundary - 1 < (end) - 1)? __boundary: (end); \
> >> })
> >>
> >> If my understanding is correct, the definition here align the addr to next PMD
> >> boundary or end.
> >>
> >> I don't see the possibility to across another PMD. Do I miss something?
> >
> >Look at the definition of p*_addr_end() that are used when page tables
> >are rolled up.
> >
>
> Sorry, I don't get your point.
>
> What's the meaning of "roll up" here?
>
> Would you mind giving me an example? I see pmd_addr_end() is not used in many
> places in core kernel. By glancing those usages, all the places use it like
> pmd_addr_end(addr, end). Seems no specially handing on the end address.
>
> Or you mean the case when pmd_addr_end() is defined to return "end" directly?

Not all hardware has five levels of page tables. When hardware does not
have five levels, it is common to "roll up" some of the page tables into
others.

There are generic ways to implement this, which include using:

include/asm-generic/pgtable-nop4d.h
include/asm-generic/pgtable-nopud.h
include/asm-generic/pgtable-nopmd.h

and then there are architecture-specific ways to implement this. 32-bit
ARM takes its implementation for PMD not from the generic version, which
post-dates 32-bit ARM, but from how page table roll-up was implemented
back when the current ARM scheme was devised. The generic scheme is
unsuitable for 32-bit ARM since we do more than just roll up page
tables, but this is irrelevant for this discussion.

All three of the generic implementations, and 32-bit ARM, define the
pXX_addr_end() macros thusly:

include/asm-generic/pgtable-nop4d.h:#define p4d_addr_end(addr, end) (end)
include/asm-generic/pgtable-nopmd.h:#define pmd_addr_end(addr, end) (end)
include/asm-generic/pgtable-nopud.h:#define pud_addr_end(addr, end) (end)
arch/arm/include/asm/pgtable-2level.h:#define pmd_addr_end(addr,end) (end)

since, as I stated, pXX_addr_end() expects its "end" argument to be
the address index of the next entry in the immediately upper page
table level, or the address index of the last entry we wish to
process, whichever is smaller.

If it's larger than the address index of the next entry in the
immediately upper page table level, then the effect of all these
macros will be to walk off the end of the current level of page
table.

To see how they _should_ be used, see the loops in free_pgd_range()
and the free_pXX_range() functions called from there and below.

In all cases when the pXX_addr_end() macro was introduced, what I state
above holds true - and I believe still holds true today, until this
patch that has reportedly caused issues.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2020-01-30 21:59:35

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()

On Thu, Jan 30, 2020 at 02:15:05PM +0000, Russell King - ARM Linux admin wrote:
>On Thu, Jan 30, 2020 at 09:30:00AM +0800, Wei Yang wrote:
>> On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
>> >On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
>> >> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>> >> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> >> >> 18.01.2020 02:22, Wei Yang wrote:
>> >> >> > Use the general helper instead of do it by hand.
>> >> >> >
>> >> >> > Signed-off-by: Wei Yang <[email protected]>
>> >> >> > ---
>> >> >> > mm/mremap.c | 7 ++-----
>> >> >> > 1 file changed, 2 insertions(+), 5 deletions(-)
>> >> >> >
>> >> >> > diff --git a/mm/mremap.c b/mm/mremap.c
>> >> >> > index c2af8ba4ba43..a258914f3ee1 100644
>> >> >> > --- a/mm/mremap.c
>> >> >> > +++ b/mm/mremap.c
>> >> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >> >
>> >> >> > for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >> >> > cond_resched();
>> >> >> > - next = (old_addr + PMD_SIZE) & PMD_MASK;
>> >> >> > - /* even if next overflowed, extent below will be ok */
>> >> >> > + next = pmd_addr_end(old_addr, old_end);
>> >> >> > extent = next - old_addr;
>> >> >> > - if (extent > old_end - old_addr)
>> >> >> > - extent = old_end - old_addr;
>> >> >> > old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >> >> > if (!old_pmd)
>> >> >> > continue;
>> >> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >> >
>> >> >> > if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >> >> > break;
>> >> >> > - next = (new_addr + PMD_SIZE) & PMD_MASK;
>> >> >> > + next = pmd_addr_end(new_addr, new_addr + len);
>> >> >> > if (extent > next - new_addr)
>> >> >> > extent = next - new_addr;
>> >> >> > move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >> >> >
>> >> >>
>> >> >> Hello Wei,
>> >> >>
>> >> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> >> >> Tegra (ARM32):
>> >> >>
>> >> >> BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> >> >>
>> >> >> and eventually kernel hangs.
>> >> >>
>> >> >> Git's bisection points to this patch and reverting it helps. Please fix,
>> >> >> thanks in advance.
>> >> >
>> >> >The above is definitely wrong - pXX_addr_end() are designed to be used
>> >> >with an address index within the pXX table table and the address index
>> >> >of either the last entry in the same pXX table or the beginning of the
>> >> >_next_ pXX table. Arbitary end address indicies are not allowed.
>> >> >
>> >>
>> >> #define pmd_addr_end(addr, end) \
>> >> ({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK; \
>> >> (__boundary - 1 < (end) - 1)? __boundary: (end); \
>> >> })
>> >>
>> >> If my understanding is correct, the definition here align the addr to next PMD
>> >> boundary or end.
>> >>
>> >> I don't see the possibility to across another PMD. Do I miss something?
>> >
>> >Look at the definition of p*_addr_end() that are used when page tables
>> >are rolled up.
>> >
>>
>> Sorry, I don't get your point.
>>
>> What's the meaning of "roll up" here?
>>
>> Would you mind giving me an example? I see pmd_addr_end() is not used in many
>> places in core kernel. By glancing those usages, all the places use it like
>> pmd_addr_end(addr, end). Seems no specially handing on the end address.
>>
>> Or you mean the case when pmd_addr_end() is defined to return "end" directly?
>
>Not all hardware has five levels of page tables. When hardware does not
>have five levels, it is common to "roll up" some of the page tables into
>others.
>
>There are generic ways to implement this, which include using:
>
>include/asm-generic/pgtable-nop4d.h
>include/asm-generic/pgtable-nopud.h
>include/asm-generic/pgtable-nopmd.h
>
>and then there's architecture ways to implement this. 32-bit ARM takes
>its implementation for PMD not from the generic version, which
>post-dates 32-bit ARM, but from how page table roll-up was implemented
>back at the time when the current ARM scheme was devised. The generic
>scheme is unsuitable for 32-bit ARM since we do more than just roll-up
>page tables, but this is irrelevent for this discussion.
>
>All three of the generic implementations, and 32-bit ARM, define the
>pXX_addr_end() macros thusly:
>
>include/asm-generic/pgtable-nop4d.h:#define p4d_addr_end(addr, end) (end)
>include/asm-generic/pgtable-nopmd.h:#define pmd_addr_end(addr, end) (end)
>include/asm-generic/pgtable-nopud.h:#define pud_addr_end(addr, end) (end)
>arch/arm/include/asm/pgtable-2level.h:#define pmd_addr_end(addr,end) (end)
>
>since, as I stated, pXX_addr_end() expects its "end" argument to be
>the address index of the next entry in the immediately upper page
>table level, or the address index of the last entry we wish to
>process, which ever is smaller.
>
>If it's larger than the address index of the next entry in the
>immediately upper page table level, then the effect of all these
>macros will be to walk off the end of the current level of page
>table.
>
>To see how they _should_ be used, see the loops in free_pgd_range()
>and the free_pXX_range() functions called from there and below.
>
>In all cases when the pXX_addr_end() macro was introduced, what I state
>above holds true - and I believe still holds true today, until this
>patch that has reportedly caused issues.
>

Thanks for your patience in explaining this to me.

I got your point. The mistake in understanding the code was mine.

>--
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up

--
Wei Yang
Help you, Help me