2018-10-15 16:43:42

by Martin Schwidefsky

Subject: [RFC][PATCH 0/3] pgtable bytes mis-accounting v2

Greetings,

the first test patch to fix the pgtable_bytes mis-accounting on s390
still had a few problems. For one, it did not work on x86.

Changes v1 -> v2:

- Split the patch into three parts: one patch to add the mm_pxd_folded
helpers, one patch to use the helpers in mm_[dec|inc]_nr_[pmds|puds],
and finally the fix for s390.

- Drop the use of __is_defined; it does not work with the valueless
__PAGETABLE_PxD_FOLDED defines.

- Do not change the basic #ifdef'ery in mm.h, just add the calls
to mm_pxd_folded to the pgtable_bytes accounting functions. This
fixes the compile error on alpha (and potentially on other archs).

Martin Schwidefsky (3):
mm: introduce mm_[p4d|pud|pmd]_folded
mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
s390/mm: fix mis-accounting of pgtable_bytes

arch/s390/include/asm/mmu_context.h | 5 ----
arch/s390/include/asm/pgalloc.h | 6 ++---
arch/s390/include/asm/pgtable.h | 18 ++++++++++++++
arch/s390/include/asm/tlb.h | 6 ++---
include/linux/mm.h | 48 +++++++++++++++++++++++++++++++++++++
5 files changed, 72 insertions(+), 11 deletions(-)

--
2.16.4



2018-10-15 16:43:44

by Martin Schwidefsky

Subject: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

If a fork or clone system call fails in copy_process() and the error
handling does the mmput() at the bad_fork_cleanup_mm label, the
following warning message appears on the console:

BUG: non-zero pgtables_bytes on freeing mm: 16384

This is caused by the tricks we play with mm_inc_nr_puds() and
mm_inc_nr_pmds() in init_new_context().

A normal 64-bit process has 3 levels of page table; the p4d level and
the pud level are folded. On process termination the free_pud_range()
function in mm/memory.c subtracts 16KB from pgtable_bytes with a
mm_dec_nr_puds() call, even though there is no real pud table.

One issue with this is that pgtable_bytes is usually off by a few
kilobytes, but the more severe problem is that for a failed fork or
clone the free_pgtables() function is not called. In this case no
mm_dec_nr_puds() or mm_dec_nr_pmds() call balances the mm_inc_nr_puds()
and mm_inc_nr_pmds() calls in init_new_context(). The pgtable_bytes
value will be off by 16384 bytes (3-level process) or 32768 bytes
(2-level compat process) and we get the BUG message. The message
itself is purely cosmetic, but annoying.

To fix this, override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
functions to check the actual size of the address space.
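A minimal user-space model of the imbalance described above may help. It assumes s390-like table sizes (2048 entries of 8 bytes, so one pud or pmd table is 16384 bytes) and tracks only the counter, not real page tables; `struct sim_mm`, `old_init_new_context()`, `new_init_new_context()` and `leftover_after_failed_fork()` are hypothetical names used for illustration only. On a failed fork, init_new_context() runs but free_pgtables() does not, so anything init_new_context() charges is left over.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed s390-like table size: 2048 entries of 8 bytes = 16 KB. */
#define SIM_TABLE_BYTES (2048L * 8L)

/* Hypothetical stand-in for the counter in struct mm_struct. */
struct sim_mm {
	int levels;          /* hardware page table levels, 2..5 */
	long pgtables_bytes; /* a fresh mm starts at zero */
};

/* Pre-patch: init_new_context() charged tables that do not exist,
 * expecting free_pgtables() to subtract them again at exit. */
static void old_init_new_context(struct sim_mm *mm)
{
	if (mm->levels == 3)
		mm->pgtables_bytes += SIM_TABLE_BYTES;     /* pud */
	else if (mm->levels == 2)
		mm->pgtables_bytes += 2 * SIM_TABLE_BYTES; /* pmd + pud */
}

/* Post-patch: folded levels are never charged. */
static void new_init_new_context(struct sim_mm *mm)
{
	(void)mm;
}

/* Failed fork: init_new_context() runs, free_pgtables() does not.
 * Returns what check_mm() would find in the counter. */
static long leftover_after_failed_fork(int levels, bool patched)
{
	struct sim_mm mm = { .levels = levels, .pgtables_bytes = 0 };

	assert(levels >= 2 && levels <= 5);
	if (patched)
		new_init_new_context(&mm);
	else
		old_init_new_context(&mm);
	/* No mm_dec_nr_puds()/mm_dec_nr_pmds() on this path. */
	return mm.pgtables_bytes;
}
```

With the pre-patch behaviour, `leftover_after_failed_fork(3, false)` yields 16384 and `leftover_after_failed_fork(2, false)` yields 32768, matching the values above; with `patched` set, both yield 0.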

Reported-by: Li Wang <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/include/asm/mmu_context.h | 5 -----
arch/s390/include/asm/pgalloc.h | 6 +++---
arch/s390/include/asm/pgtable.h | 18 ++++++++++++++++++
arch/s390/include/asm/tlb.h | 6 +++---
4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 0717ee76885d..f1ab9420ccfb 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -45,8 +45,6 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.asce_limit = STACK_TOP_MAX;
mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
_ASCE_USER_BITS | _ASCE_TYPE_REGION3;
- /* pgd_alloc() did not account this pud */
- mm_inc_nr_puds(mm);
break;
case -PAGE_SIZE:
/* forked 5-level task, set new asce with new_mm->pgd */
@@ -62,9 +60,6 @@ static inline int init_new_context(struct task_struct *tsk,
/* forked 2-level compat task, set new asce with new mm->pgd */
mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
_ASCE_USER_BITS | _ASCE_TYPE_SEGMENT;
- /* pgd_alloc() did not account this pmd */
- mm_inc_nr_pmds(mm);
- mm_inc_nr_puds(mm);
}
crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
return 0;
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index f0f9bcf94c03..5ee733720a57 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -36,11 +36,11 @@ static inline void crst_table_init(unsigned long *crst, unsigned long entry)

static inline unsigned long pgd_entry_type(struct mm_struct *mm)
{
- if (mm->context.asce_limit <= _REGION3_SIZE)
+ if (mm_pmd_folded(mm))
return _SEGMENT_ENTRY_EMPTY;
- if (mm->context.asce_limit <= _REGION2_SIZE)
+ if (mm_pud_folded(mm))
return _REGION3_ENTRY_EMPTY;
- if (mm->context.asce_limit <= _REGION1_SIZE)
+ if (mm_p4d_folded(mm))
return _REGION2_ENTRY_EMPTY;
return _REGION1_ENTRY_EMPTY;
}
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0e7cb0dc9c33..de05466ce50c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -485,6 +485,24 @@ static inline int is_module_addr(void *addr)
_REGION_ENTRY_PROTECT | \
_REGION_ENTRY_NOEXEC)

+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+ return mm->context.asce_limit <= _REGION1_SIZE;
+}
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+ return mm->context.asce_limit <= _REGION2_SIZE;
+}
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+ return mm->context.asce_limit <= _REGION3_SIZE;
+}
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+
static inline int mm_has_pgste(struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 457b7ba0fbb6..b31c779cf581 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -136,7 +136,7 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
unsigned long address)
{
- if (tlb->mm->context.asce_limit <= _REGION3_SIZE)
+ if (mm_pmd_folded(tlb->mm))
return;
pgtable_pmd_page_dtor(virt_to_page(pmd));
tlb_remove_table(tlb, pmd);
@@ -152,7 +152,7 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
unsigned long address)
{
- if (tlb->mm->context.asce_limit <= _REGION1_SIZE)
+ if (mm_p4d_folded(tlb->mm))
return;
tlb_remove_table(tlb, p4d);
}
@@ -167,7 +167,7 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
unsigned long address)
{
- if (tlb->mm->context.asce_limit <= _REGION2_SIZE)
+ if (mm_pud_folded(tlb->mm))
return;
tlb_remove_table(tlb, pud);
}
--
2.16.4
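The three s390 overrides in the patch above reduce to threshold comparisons on asce_limit. The sketch below tabulates which Linux levels they report as folded for each address-space size. It assumes the s390 region sizes _REGION3_SIZE = 2^31 (2 GB), _REGION2_SIZE = 2^42 (4 TB) and _REGION1_SIZE = 2^53 (8 PB) and a 64-bit `unsigned long`; the `SIM_*` macros and `sim_*()` functions are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* The shifts below need a 64-bit unsigned long, as on s390x. */
static_assert(sizeof(unsigned long) == 8, "64-bit unsigned long assumed");

/* Assumed s390 region sizes (shifts 31, 42 and 53). */
#define SIM_REGION3_SIZE (1UL << 31) /* 2 GB: 2-level compat limit */
#define SIM_REGION2_SIZE (1UL << 42) /* 4 TB: 3-level limit */
#define SIM_REGION1_SIZE (1UL << 53) /* 8 PB: 4-level limit */

/* Same comparisons as the patched s390 mm_*_folded() helpers. */
static bool sim_p4d_folded(unsigned long asce_limit)
{
	return asce_limit <= SIM_REGION1_SIZE;
}

static bool sim_pud_folded(unsigned long asce_limit)
{
	return asce_limit <= SIM_REGION2_SIZE;
}

static bool sim_pmd_folded(unsigned long asce_limit)
{
	return asce_limit <= SIM_REGION3_SIZE;
}

/* A page table and one top-level table always exist; every level
 * that is not folded adds one more table to the walk. */
static int sim_levels(unsigned long asce_limit)
{
	return 2 + !sim_pmd_folded(asce_limit) +
	       !sim_pud_folded(asce_limit) + !sim_p4d_folded(asce_limit);
}
```

For example, `sim_levels(SIM_REGION2_SIZE)` is 3 (only p4d and pud folded), and a limit of -4096 (the `-PAGE_SIZE` case in init_new_context(), assuming 4 KB pages) gives 5.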


2018-10-15 16:43:53

by Martin Schwidefsky

Subject: [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions

The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds()
in free_pgtables() if the address range spans a full pud or pmd.
If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration
settings, they blindly subtract the size of the pud or pmd table from
pgtable_bytes even if the pud or pmd page table layer is folded.

Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes
accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds
and mm_dec_nr_pmds. As the check for folded page tables can be
overridden by the architecture, this allows platforms that use a
dynamic number of page table levels to keep a correct pgtable_bytes
value.
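The effect of the four guarded accounting functions can be sketched in user space. This is an illustration, not the kernel code: `struct sim_mm`, the `sim_*` functions and the boolean fields (standing in for whatever mm_pud_folded()/mm_pmd_folded() return for a given mm) are hypothetical, and the geometry of 2048 eight-byte entries per table is an s390-like assumption. C11 atomics replace the kernel's atomic_long_add()/atomic_long_sub().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed s390-like geometry: 2048 entries of 8 bytes per table. */
#define SIM_PTRS_PER_PUD 2048L
#define SIM_PTRS_PER_PMD 2048L
typedef uint64_t sim_pud_t;
typedef uint64_t sim_pmd_t;

/* Hypothetical mm: the flags stand in for mm_pud_folded(mm) and
 * mm_pmd_folded(mm), whose results may differ per mm. */
struct sim_mm {
	bool pud_folded;
	bool pmd_folded;
	atomic_long pgtables_bytes;
};

static void sim_mm_init(struct sim_mm *mm, bool pud_folded, bool pmd_folded)
{
	mm->pud_folded = pud_folded;
	mm->pmd_folded = pmd_folded;
	atomic_init(&mm->pgtables_bytes, 0);
}

static long sim_mm_bytes(struct sim_mm *mm)
{
	return atomic_load(&mm->pgtables_bytes);
}

static void sim_mm_inc_nr_puds(struct sim_mm *mm)
{
	if (mm->pud_folded)
		return; /* no real pud table: nothing to charge */
	atomic_fetch_add(&mm->pgtables_bytes,
			 SIM_PTRS_PER_PUD * (long)sizeof(sim_pud_t));
}

static void sim_mm_dec_nr_puds(struct sim_mm *mm)
{
	if (mm->pud_folded)
		return; /* nothing was charged, so nothing to give back */
	atomic_fetch_sub(&mm->pgtables_bytes,
			 SIM_PTRS_PER_PUD * (long)sizeof(sim_pud_t));
}

static void sim_mm_inc_nr_pmds(struct sim_mm *mm)
{
	if (mm->pmd_folded)
		return;
	atomic_fetch_add(&mm->pgtables_bytes,
			 SIM_PTRS_PER_PMD * (long)sizeof(sim_pmd_t));
}

static void sim_mm_dec_nr_pmds(struct sim_mm *mm)
{
	if (mm->pmd_folded)
		return;
	atomic_fetch_sub(&mm->pgtables_bytes,
			 SIM_PTRS_PER_PMD * (long)sizeof(sim_pmd_t));
}
```

Because the guard sits inside the accounting functions, callers such as free_pgtables() need no changes: for an mm whose pud level is folded, pud tables are simply never charged or credited.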

Signed-off-by: Martin Schwidefsky <[email protected]>
---
include/linux/mm.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d1029972541c..67f55c71e59a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1764,11 +1764,15 @@ int __pud_alloc(struct mm_struct *mm, p4d_t *p4d, unsigned long address);

static inline void mm_inc_nr_puds(struct mm_struct *mm)
{
+ if (mm_pud_folded(mm))
+ return;
atomic_long_add(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
}

static inline void mm_dec_nr_puds(struct mm_struct *mm)
{
+ if (mm_pud_folded(mm))
+ return;
atomic_long_sub(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
}
#endif
@@ -1788,11 +1792,15 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);

static inline void mm_inc_nr_pmds(struct mm_struct *mm)
{
+ if (mm_pmd_folded(mm))
+ return;
atomic_long_add(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
}

static inline void mm_dec_nr_pmds(struct mm_struct *mm)
{
+ if (mm_pmd_folded(mm))
+ return;
atomic_long_sub(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
}
#endif
--
2.16.4


2018-10-15 16:45:58

by Martin Schwidefsky

Subject: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded

Add three architecture-overridable functions to test if the
p4d, pud, or pmd layer of a page table is folded or not.
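The override pattern used by these helpers can be sketched in a single user-space file: an "arch" section defines its own mm_pmd_folded() plus a same-named macro, and the "generic" section only supplies a fallback when that macro is absent. The stripped-down `struct mm_struct` (holding only `asce_limit`), the 2 GB threshold and the pretend `__PAGETABLE_PUD_FOLDED` define are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down mm: only the field the override needs. */
struct mm_struct {
	unsigned long asce_limit;
};

/* ---- "arch" header: per-mm answer for the pmd level ---- */
static inline bool mm_pmd_folded(struct mm_struct *mm)
{
	return mm->asce_limit <= (1UL << 31); /* assumed 2 GB limit */
}
#define mm_pmd_folded(mm) mm_pmd_folded(mm) /* announce the override */

/* ---- "generic" header, included after the arch header ---- */
#define __PAGETABLE_PUD_FOLDED /* pretend this configuration folds puds */

#ifndef mm_pud_folded
/* No arch override: fall back to the compile-time answer. */
static inline bool mm_pud_folded(struct mm_struct *mm)
{
	(void)mm;
#ifdef __PAGETABLE_PUD_FOLDED
	return true;
#else
	return false;
#endif
}
#endif

#ifndef mm_pmd_folded
/* Never compiled here: the arch section already defined the macro. */
static inline bool mm_pmd_folded(struct mm_struct *mm)
{
	(void)mm;
	return false;
}
#endif
```

Unlike the patch, the generic fallback in this sketch does not also define a `mm_pud_folded(mm)` macro; that extra define only matters if later code tests for the macro, since the `#ifndef` check only needs to see the architecture's definition.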

Signed-off-by: Martin Schwidefsky <[email protected]>
---
include/linux/mm.h | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0416a7204be3..d1029972541c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -105,6 +105,46 @@ extern int mmap_rnd_compat_bits __read_mostly;
#define mm_zero_struct_page(pp) ((void)memset((pp), 0, sizeof(struct page)))
#endif

+/*
+ * On some architectures it depends on the mm if the p4d/pud or pmd
+ * layer of the page table hierarchy is folded or not.
+ */
+#ifndef mm_p4d_folded
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+#ifdef __PAGETABLE_P4D_FOLDED
+ return 1;
+#else
+ return 0;
+#endif
+}
+#endif
+
+#ifndef mm_pud_folded
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+#ifdef __PAGETABLE_PUD_FOLDED
+ return 1;
+#else
+ return 0;
+#endif
+}
+#endif
+
+#ifndef mm_pmd_folded
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+#ifdef __PAGETABLE_PMD_FOLDED
+ return 1;
+#else
+ return 0;
+#endif
+}
+#endif
+
/*
* Default maximum number of active map areas, this limits the number of vmas
* per mm struct. Users can overwrite this number by sysctl but there is a
--
2.16.4


2018-10-31 06:38:35

by Martin Schwidefsky

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Wed, 31 Oct 2018 14:18:33 +0800
Li Wang <[email protected]> wrote:

> On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky <[email protected]
> > wrote:
>
> > If a fork or clone system call fails in copy_process() and the error
> > handling does the mmput() at the bad_fork_cleanup_mm label, the
> > following warning message appears on the console:
> >
> > BUG: non-zero pgtables_bytes on freeing mm: 16384
> >
> > This is caused by the tricks we play with mm_inc_nr_puds() and
> > mm_inc_nr_pmds() in init_new_context().
> >
> > A normal 64-bit process has 3 levels of page table; the p4d level and
> > the pud level are folded. On process termination the free_pud_range()
> > function in mm/memory.c subtracts 16KB from pgtable_bytes with a
> > mm_dec_nr_puds() call, even though there is no real pud table.
> >
> > One issue with this is that pgtable_bytes is usually off by a few
> > kilobytes, but the more severe problem is that for a failed fork or
> > clone the free_pgtables() function is not called. In this case no
> > mm_dec_nr_puds() or mm_dec_nr_pmds() call balances the mm_inc_nr_puds()
> > and mm_inc_nr_pmds() calls in init_new_context(). The pgtable_bytes
> > value will be off by 16384 bytes (3-level process) or 32768 bytes
> > (2-level compat process) and we get the BUG message. The message
> > itself is purely cosmetic, but annoying.
> >
> > To fix this, override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
> > functions to check the actual size of the address space.
> >
>
> I can confirm that it fixes the problem; the warning message is gone
> after applying this patch on s390x. I also ran the LTP syscalls/cve tests
> for the patch set on x86_64, and there is no new regression.
>
> Tested-by: Li Wang <[email protected]>

Thanks for testing. Unfortunately Heiko reported another issue yesterday
with the patch applied. This time the other way around:

BUG: non-zero pgtables_bytes on freeing mm: -16384

I am trying to understand how this can happen. For now I would like to
keep the patch on hold in case they need another change.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.


2018-10-31 06:48:02

by Martin Schwidefsky

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Wed, 31 Oct 2018 14:43:38 +0800
Li Wang <[email protected]> wrote:

> On Wed, Oct 31, 2018 at 2:31 PM, Martin Schwidefsky <[email protected]>
> wrote:
>
> > On Wed, 31 Oct 2018 14:18:33 +0800
> > Li Wang <[email protected]> wrote:
> >
> > > On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky <
> > [email protected]
> > > > wrote:
> > >
> > > > If a fork or clone system call fails in copy_process() and the error
> > > > handling does the mmput() at the bad_fork_cleanup_mm label, the
> > > > following warning message appears on the console:
> > > >
> > > > BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > >
> > > > This is caused by the tricks we play with mm_inc_nr_puds() and
> > > > mm_inc_nr_pmds() in init_new_context().
> > > >
> > > > A normal 64-bit process has 3 levels of page table; the p4d level and
> > > > the pud level are folded. On process termination the free_pud_range()
> > > > function in mm/memory.c subtracts 16KB from pgtable_bytes with a
> > > > mm_dec_nr_puds() call, even though there is no real pud table.
> > > >
> > > > One issue with this is that pgtable_bytes is usually off by a few
> > > > kilobytes, but the more severe problem is that for a failed fork or
> > > > clone the free_pgtables() function is not called. In this case no
> > > > mm_dec_nr_puds() or mm_dec_nr_pmds() call balances the mm_inc_nr_puds()
> > > > and mm_inc_nr_pmds() calls in init_new_context(). The pgtable_bytes
> > > > value will be off by 16384 bytes (3-level process) or 32768 bytes
> > > > (2-level compat process) and we get the BUG message. The message
> > > > itself is purely cosmetic, but annoying.
> > > >
> > > > To fix this, override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
> > > > functions to check the actual size of the address space.
> > > >
> > >
> > > I can confirm that it fixes the problem; the warning message is gone
> > > after applying this patch on s390x. I also ran the LTP syscalls/cve tests
> > > for the patch set on x86_64, and there is no new regression.
> > >
> > > Tested-by: Li Wang <[email protected]>
> >
> > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > with the patch applied. This time the other way around:
> >
> > BUG: non-zero pgtables_bytes on freeing mm: -16384
> >
>
> Okay, so is the problem still triggered by LTP/cve-2017-17052.c?

No, unfortunately we do not have a simple testcase to trigger this new bug.
It happened once with one of our test kernels, the path that leads to this
is completely unclear.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.


2018-10-31 09:04:14

by Kirill A. Shutemov

Subject: Re: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded

On Mon, Oct 15, 2018 at 06:42:37PM +0200, Martin Schwidefsky wrote:
> Add three architecture-overridable functions to test if the
> p4d, pud, or pmd layer of a page table is folded or not.
>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> include/linux/mm.h | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0416a7204be3..d1029972541c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h

Shouldn't it be somewhere in asm-generic/pgtable*?

> @@ -105,6 +105,46 @@ extern int mmap_rnd_compat_bits __read_mostly;
> #define mm_zero_struct_page(pp) ((void)memset((pp), 0, sizeof(struct page)))
> #endif
>
> +/*
> + * On some architectures it depends on the mm if the p4d/pud or pmd
> + * layer of the page table hierarchy is folded or not.
> + */
> +#ifndef mm_p4d_folded
> +#define mm_p4d_folded(mm) mm_p4d_folded(mm)

Do we need to define it in generic header?

> +static inline bool mm_p4d_folded(struct mm_struct *mm)
> +{
> +#ifdef __PAGETABLE_P4D_FOLDED
> + return 1;
> +#else
> + return 0;
> +#endif

Maybe
return __is_defined(__PAGETABLE_P4D_FOLDED);

?

--
Kirill A. Shutemov

2018-10-31 09:05:33

by Kirill A. Shutemov

Subject: Re: [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions

On Mon, Oct 15, 2018 at 06:42:38PM +0200, Martin Schwidefsky wrote:
> The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds()
> in free_pgtables() if the address range spans a full pud or pmd.
> If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration
> settings, they blindly subtract the size of the pud or pmd table from
> pgtable_bytes even if the pud or pmd page table layer is folded.
>
> Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes
> accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds
> and mm_dec_nr_pmds. As the check for folded page tables can be
> overridden by the architecture, this allows platforms that use a
> dynamic number of page table levels to keep a correct pgtable_bytes
> value.
>
> Signed-off-by: Martin Schwidefsky <[email protected]>

Looks fine to me.

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kirill A. Shutemov

2018-10-31 09:36:22

by Martin Schwidefsky

Subject: Re: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded

On Wed, 31 Oct 2018 12:02:55 +0300
"Kirill A. Shutemov" <[email protected]> wrote:

> On Mon, Oct 15, 2018 at 06:42:37PM +0200, Martin Schwidefsky wrote:
> > Add three architecture-overridable functions to test if the
> > p4d, pud, or pmd layer of a page table is folded or not.
> >
> > Signed-off-by: Martin Schwidefsky <[email protected]>
> > ---
> > include/linux/mm.h | 40 ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 40 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 0416a7204be3..d1029972541c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
>
> Shouldn't it be somewhere in asm-generic/pgtable*?

If you prefer the definitions in asm-generic that is fine with me.
I'll give it a try to see if it still compiles.

> > @@ -105,6 +105,46 @@ extern int mmap_rnd_compat_bits __read_mostly;
> > #define mm_zero_struct_page(pp) ((void)memset((pp), 0, sizeof(struct page)))
> > #endif
> >
> > +/*
> > + * On some architectures it depends on the mm if the p4d/pud or pmd
> > + * layer of the page table hierarchy is folded or not.
> > + */
> > +#ifndef mm_p4d_folded
> > +#define mm_p4d_folded(mm) mm_p4d_folded(mm)
>
> Do we need to define it in generic header?

That is true, it should work without the #define in the generic header.

> > +static inline bool mm_p4d_folded(struct mm_struct *mm)
> > +{
> > +#ifdef __PAGETABLE_P4D_FOLDED
> > + return 1;
> > +#else
> > + return 0;
> > +#endif
>
> Maybe
> return __is_defined(__PAGETABLE_P4D_FOLDED);
>
> ?

I have tried that; it doesn't work. The reason is that the
__PAGETABLE_xxx_FOLDED defines do not have a value.

#define __PAGETABLE_P4D_FOLDED
#define __PAGETABLE_PMD_FOLDED
#define __PAGETABLE_PUD_FOLDED

While the definition of CONFIG_xxx symbols looks like this

#define CONFIG_xxx 1

The __is_defined needs the value for the __take_second_arg trick.
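The failure mode can be reproduced with a copy of the __is_defined() machinery modelled on the kernel's include/linux/kconfig.h (reproduced here from memory, so treat it as an approximation). The `DEMO_*` symbols are hypothetical: one is defined with a value like a CONFIG_xxx symbol, one is defined empty like the __PAGETABLE_xxx_FOLDED symbols, and one is never defined.

```c
#include <assert.h>

/* Modelled on the kernel's include/linux/kconfig.h. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x)              ___is_defined(x)
#define ___is_defined(val)           ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)

/* Hypothetical symbols for the three cases. */
#define DEMO_VALUED 1 /* like a CONFIG_xxx symbol */
#define DEMO_EMPTY    /* like __PAGETABLE_xxx_FOLDED in v2 */
/* DEMO_UNDEFINED is deliberately never defined. */

/* DEMO_VALUED -> 1 -> __ARG_PLACEHOLDER_1 -> "0," -> second arg is 1.
 * DEMO_EMPTY  -> (nothing) -> __ARG_PLACEHOLDER_ -> no such macro,
 * so the junk token stays glued to 1 and the second arg is 0.
 * (That expansion passes nothing for "...", which GCC and Clang
 * accept, as the kernel relies on.) */
enum {
	valued_result = __is_defined(DEMO_VALUED),
	empty_result = __is_defined(DEMO_EMPTY),
	undefined_result = __is_defined(DEMO_UNDEFINED),
};

/* A plain #ifdef, by contrast, does see the empty define. */
#ifdef DEMO_EMPTY
enum { empty_ifdef = 1 };
#else
enum { empty_ifdef = 0 };
#endif

static_assert(valued_result == 1, "a valued define is detected");
static_assert(empty_result == 0, "an empty define is not detected");
```

Giving such defines the value 1 would make __is_defined() usable for them, which is presumably the easy fix Kirill has in mind below.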

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.


2018-10-31 09:40:16

by Martin Schwidefsky

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Wed, 31 Oct 2018 07:46:47 +0100
Martin Schwidefsky <[email protected]> wrote:

> On Wed, 31 Oct 2018 14:43:38 +0800
> Li Wang <[email protected]> wrote:
>
> > On Wed, Oct 31, 2018 at 2:31 PM, Martin Schwidefsky <[email protected]>
> > wrote:
> >
> > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > >
> >
> > Okay, so is the problem still triggered by LTP/cve-2017-17052.c?
>
> No, unfortunately we do not have a simple testcase to trigger this new bug.
> It happened once with one of our test kernels, the path that leads to this
> is completely unclear.

Ok, got it. There is a mm_inc_nr_puds(mm) missing in the s390 code:

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 76d89ee8b428..814f26520aa2 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -101,6 +101,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
mm->context.asce_limit = _REGION1_SIZE;
mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
_ASCE_USER_BITS | _ASCE_TYPE_REGION2;
+ mm_inc_nr_puds(mm);
} else {
crst_table_init(table, _REGION1_ENTRY_EMPTY);
pgd_populate(mm, (pgd_t *) table, (p4d_t *) pgd);

One of our test cases upgraded a 3-level page table to 4 levels.
I'll update the patch and send a v3.
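The -16384 can be modelled the same way. In this sketch (hypothetical names, s390-like sizes assumed, only the pud level modelled), a 3-level mm starts with its pud level folded, so its top-level table was never charged. After the upgrade to 4 levels that table is a real pud table, and the exit path's mm_dec_nr_puds() is no longer skipped, so without the added mm_inc_nr_puds() the counter ends at -16384.

```c
#include <assert.h>
#include <stdbool.h>

static_assert(sizeof(unsigned long) == 8, "64-bit unsigned long assumed");

/* Assumed s390 limits for 3 and 4 levels, and a 16 KB table size. */
#define SIM_REGION2_SIZE (1UL << 42)
#define SIM_REGION1_SIZE (1UL << 53)
#define SIM_TABLE_BYTES 16384L

/* Hypothetical mm with just the two fields that matter here. */
struct sim_mm {
	unsigned long asce_limit;
	long pgtables_bytes;
};

/* Same test as the s390 mm_pud_folded() override. */
static bool sim_pud_folded(const struct sim_mm *mm)
{
	return mm->asce_limit <= SIM_REGION2_SIZE;
}

static void sim_inc_nr_puds(struct sim_mm *mm)
{
	if (!sim_pud_folded(mm))
		mm->pgtables_bytes += SIM_TABLE_BYTES;
}

static void sim_dec_nr_puds(struct sim_mm *mm)
{
	if (!sim_pud_folded(mm))
		mm->pgtables_bytes -= SIM_TABLE_BYTES;
}

/* 3 -> 4 levels: the old top-level table becomes a real pud table. */
static void sim_upgrade_to_4_levels(struct sim_mm *mm, bool with_fix)
{
	mm->asce_limit = SIM_REGION1_SIZE;
	if (with_fix)
		sim_inc_nr_puds(mm); /* models the mm_inc_nr_puds() fix */
}

/* Runs a 3-level mm through the upgrade and the exit path. */
static long sim_upgrade_then_exit(bool with_fix)
{
	struct sim_mm mm = { .asce_limit = SIM_REGION2_SIZE,
			     .pgtables_bytes = 0 };

	sim_upgrade_to_4_levels(&mm, with_fix);
	sim_dec_nr_puds(&mm); /* free_pud_range() at exit */
	return mm.pgtables_bytes;
}
```

As in the diff, the increment is placed after asce_limit is raised, so the folded check already reports a real pud level and the charge takes effect.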

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.


2018-10-31 09:49:37

by Kirill A. Shutemov

Subject: Re: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded

On Wed, Oct 31, 2018 at 10:35:36AM +0100, Martin Schwidefsky wrote:
> > Maybe
> > return __is_defined(__PAGETABLE_P4D_FOLDED);
> >
> > ?
>
> I have tried that; it doesn't work. The reason is that the
> __PAGETABLE_xxx_FOLDED defines do not have a value.
>
> #define __PAGETABLE_P4D_FOLDED
> #define __PAGETABLE_PMD_FOLDED
> #define __PAGETABLE_PUD_FOLDED
>
> While the definition of CONFIG_xxx symbols looks like this
>
> #define CONFIG_xxx 1
>
> The __is_defined needs the value for the __take_second_arg trick.

I guess this is easily fixable :)

--
Kirill A. Shutemov

2018-10-31 10:10:31

by Heiko Carstens

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> Thanks for testing. Unfortunately Heiko reported another issue yesterday
> with the patch applied. This time the other way around:
>
> BUG: non-zero pgtables_bytes on freeing mm: -16384
>
> I am trying to understand how this can happen. For now I would like to
> keep the patch on hold in case they need another change.

FWIW, Kirill: is there a reason why this "BUG:" output is done with
pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?

That would make it possible to get more information with DEBUG_VM and / or
panic_on_warn=1 set. At least for automated testing it would be nice
to have such triggers.


2018-10-31 10:37:09

by Kirill A. Shutemov

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > with the patch applied. This time the other way around:
> >
> > BUG: non-zero pgtables_bytes on freeing mm: -16384
> >
> > I am trying to understand how this can happen. For now I would like to
> > keep the patch on hold in case they need another change.
>
> FWIW, Kirill: is there a reason why this "BUG:" output is done with
> pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
>
> That would make it possible to get more information with DEBUG_VM and / or
> panic_on_warn=1 set. At least for automated testing it would be nice
> to have such triggers.

Stack trace is not helpful there. It will always show the exit path which
is useless.

--
Kirill A. Shutemov

2018-11-27 08:12:56

by Heiko Carstens

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> > On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > > with the patch applied. This time the other way around:
> > >
> > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > >
> > > I am trying to understand how this can happen. For now I would like to
> > > keep the patch on hold in case they need another change.
> >
> > FWIW, Kirill: is there a reason why this "BUG:" output is done with
> > pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> >
> > That would make it possible to get more information with DEBUG_VM and / or
> > panic_on_warn=1 set. At least for automated testing it would be nice
> > to have such triggers.
>
> Stack trace is not helpful there. It will always show the exit path which
> is useless.

So, even with the updated version of these patches I can flood dmesg
and the console with

BUG: non-zero pgtables_bytes on freeing mm: 16384

messages with this complex reproducer on s390:

echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat

Besides that this needs to be fixed, I'd really like to see this
changed to either a printk_once() or a WARN_ON_ONCE() within
check_mm() so that an arbitrary user cannot flood the console.

E.g. something like the below. If there aren't any objections, I will
provide a proper patch with changelog, etc.

diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff89c7b..d7aeec03c57f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
}

if (mm_pgtables_bytes(mm))
- pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
- mm_pgtables_bytes(mm));
+ printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
+ mm_pgtables_bytes(mm));

#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
VM_BUG_ON_MM(mm->pmd_huge_pte, mm);


2018-11-27 10:58:05

by Kirill A. Shutemov

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Tue, Nov 27, 2018 at 08:34:12AM +0100, Heiko Carstens wrote:
> On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> > > On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > > > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > > > with the patch applied. This time the other way around:
> > > >
> > > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > > >
> > > > I am trying to understand how this can happen. For now I would like to
> > > > keep the patch on hold in case they need another change.
> > >
> > > FWIW, Kirill: is there a reason why this "BUG:" output is done with
> > > pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> > >
> > > That would make it possible to get more information with DEBUG_VM and / or
> > > panic_on_warn=1 set. At least for automated testing it would be nice
> > > to have such triggers.
> >
> > Stack trace is not helpful there. It will always show the exit path which
> > is useless.
>
> So, even with the updated version of these patches I can flood dmesg
> and the console with
>
> BUG: non-zero pgtables_bytes on freeing mm: 16384
>
> messages with this complex reproducer on s390:
>
> echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat
>
> Besides that this needs to be fixed, I'd really like to see this
> changed to either a printk_once() or a WARN_ON_ONCE() within
> check_mm() so that an arbitrary user cannot flood the console.
>
> E.g. something like the below. If there aren't any objections, I will
> provide a proper patch with changelog, etc.
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 07cddff89c7b..d7aeec03c57f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
> }
>
> if (mm_pgtables_bytes(mm))
> - pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> - mm_pgtables_bytes(mm));
> + printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> + mm_pgtables_bytes(mm));

You can be the first user of pr_alert_once(). Don't miss a chance! ;)

--
Kirill A. Shutemov

2018-11-27 10:58:55

by Heiko Carstens

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Tue, Nov 27, 2018 at 11:05:15AM +0300, Kirill A. Shutemov wrote:
> > E.g. something like the below. If there aren't any objections, I will
> > provide a proper patch with changelog, etc.
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 07cddff89c7b..d7aeec03c57f 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
> > }
> >
> > if (mm_pgtables_bytes(mm))
> > - pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> > - mm_pgtables_bytes(mm));
> > + printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> > + mm_pgtables_bytes(mm));
>
> You can be the first user of pr_alert_once(). Don't miss a chance! ;)

I didn't know that one existed. ;) Will do.
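The _once variants differ from plain pr_alert() only in suppressing repeats; roughly, each call site keeps a static flag. A user-space analogue is sketched below; the function name is hypothetical, and this sketch makes no attempt at thread safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space analogue of pr_alert_once(): a static flag
 * suppresses every report after the first one.
 * Not thread-safe; this only illustrates the "once" semantics. */
static bool report_pgtables_bytes_once(long bytes)
{
	static bool printed;

	if (bytes == 0 || printed)
		return false;
	printed = true;
	fprintf(stderr, "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
		bytes);
	return true;
}
```

The trade-off accepted in this thread is visible here: after the first report, later leaks (including ones with a different byte count, such as -16384) are no longer reported.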


2018-11-27 12:23:34

by Guenter Roeck

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On 11/26/18 11:34 PM, Heiko Carstens wrote:
> On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
>> On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
>>> On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
>>>> Thanks for testing. Unfortunately Heiko reported another issue yesterday
>>>> with the patch applied. This time the other way around:
>>>>
>>>> BUG: non-zero pgtables_bytes on freeing mm: -16384
>>>>
>>>> I am trying to understand how this can happen. For now I would like to
>>>> keep the patch on hold in case they need another change.
>>>
>>> FWIW, Kirill: is there a reason why this "BUG:" output is done with
>>> pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
>>>
>>> That would make it possible to get more information with DEBUG_VM and / or
>>> panic_on_warn=1 set. At least for automated testing it would be nice
>>> to have such triggers.
>>
>> Stack trace is not helpful there. It will always show the exit path which
>> is useless.
>
> So, even with the updated version of these patches I can flood dmesg
> and the console with
>
> BUG: non-zero pgtables_bytes on freeing mm: 16384
>
> messages with this complex reproducer on s390:
>
> echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat
>
> Besides that this needs to be fixed, I'd really like to see this
> changed to either a printk_once() or a WARN_ON_ONCE() within
> check_mm() so that an arbitrary user cannot flood the console.
>
> E.g. something like the below. If there aren't any objections, I will
> provide a proper patch with changelog, etc.
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 07cddff89c7b..d7aeec03c57f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
> }
>
> if (mm_pgtables_bytes(mm))
> - pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> - mm_pgtables_bytes(mm));
> + printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> + mm_pgtables_bytes(mm));
>

pr_alert_once ?

Guenter

> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
> VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
>
>


2018-11-27 12:24:45

by Heiko Carstens

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Tue, Nov 27, 2018 at 03:47:13AM -0800, Guenter Roeck wrote:
> >E.g. something like the below. If there aren't any objections, I will
> >provide a proper patch with changelog, etc.
> >
> >diff --git a/kernel/fork.c b/kernel/fork.c
> >index 07cddff89c7b..d7aeec03c57f 100644
> >--- a/kernel/fork.c
> >+++ b/kernel/fork.c
> >@@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
> > }
> > if (mm_pgtables_bytes(mm))
> >- pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> >- mm_pgtables_bytes(mm));
> >+ printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> >+ mm_pgtables_bytes(mm));
>
> pr_alert_once ?

Already changed and posted:

https://lore.kernel.org/lkml/[email protected]/


2018-11-27 14:38:09

by Martin Schwidefsky

Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes

On Tue, 27 Nov 2018 08:34:12 +0100
Heiko Carstens <[email protected]> wrote:

> On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> > > On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > > > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > > > with the patch applied. This time the other way around:
> > > >
> > > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > > >
> > > > I am trying to understand how this can happen. For now I would like to
> > > > keep the patch on hold in case they need another change.
> > >
> > > FWIW, Kirill: is there a reason why this "BUG:" output is done with
> > > pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> > >
> > > That would make it possible to get more information with DEBUG_VM and / or
> > > panic_on_warn=1 set. At least for automated testing it would be nice
> > > to have such triggers.
> >
> > Stack trace is not helpful there. It will always show the exit path which
> > is useless.
>
> So, even with the updated version of these patches I can flood dmesg
> and the console with
>
> BUG: non-zero pgtables_bytes on freeing mm: 16384
>
> messages with this complex reproducer on s390:
>
> echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat

I forgot a hunk in the fix. I claim not enough coffee :-/
Patch is queued and I will send a please pull by the end of the week.
--
From c0499f2aa853939984ecaf0d393012486e56c7ce Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky <[email protected]>
Date: Tue, 27 Nov 2018 14:04:04 +0100
Subject: [PATCH] s390/mm: correct pgtable_bytes on page table downgrade

The downgrade of a page table from 3 levels to 2 levels for a 31-bit compat
process removes a pmd table which has to be counted against pgtable_bytes.
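A small model suggests why the placement of the new call matters. It assumes (hypothetical names, s390-like sizes) that the 3-level mm charged its first segment table as a pmd before the downgrade turns that table into the top level. Because mm_pmd_folded() depends on asce_limit, mm_dec_nr_pmds() only takes effect if it runs before asce_limit is lowered to _REGION3_SIZE, which is where the patch places it.

```c
#include <assert.h>
#include <stdbool.h>

static_assert(sizeof(unsigned long) == 8, "64-bit unsigned long assumed");

/* Assumed s390 limits for 2 and 3 levels, and a 16 KB table size. */
#define SIM_REGION3_SIZE (1UL << 31)
#define SIM_REGION2_SIZE (1UL << 42)
#define SIM_TABLE_BYTES 16384L

struct sim_mm {
	unsigned long asce_limit;
	long pgtables_bytes;
};

enum sim_dec_placement {
	SIM_NO_DEC,           /* state before this patch */
	SIM_DEC_BEFORE_LIMIT, /* where the patch puts mm_dec_nr_pmds() */
	SIM_DEC_AFTER_LIMIT,  /* hypothetical alternative placement */
};

/* Same test as the s390 mm_pmd_folded() override. */
static bool sim_pmd_folded(const struct sim_mm *mm)
{
	return mm->asce_limit <= SIM_REGION3_SIZE;
}

static void sim_inc_nr_pmds(struct sim_mm *mm)
{
	if (!sim_pmd_folded(mm))
		mm->pgtables_bytes += SIM_TABLE_BYTES;
}

static void sim_dec_nr_pmds(struct sim_mm *mm)
{
	if (!sim_pmd_folded(mm))
		mm->pgtables_bytes -= SIM_TABLE_BYTES;
}

/* 3 -> 2 levels: the first segment (pmd) table becomes the top level. */
static void sim_downgrade(struct sim_mm *mm, enum sim_dec_placement where)
{
	if (where == SIM_DEC_BEFORE_LIMIT)
		sim_dec_nr_pmds(mm); /* pmd not folded yet: takes effect */
	mm->asce_limit = SIM_REGION3_SIZE;
	if (where == SIM_DEC_AFTER_LIMIT)
		sim_dec_nr_pmds(mm); /* pmd now folded: silently skipped */
}

/* A 3-level mm that charged one pmd table, downgraded, then exits.
 * At exit the pmd level is folded, so nothing else is subtracted. */
static long sim_compat_exit_bytes(enum sim_dec_placement where)
{
	struct sim_mm mm = { .asce_limit = SIM_REGION2_SIZE,
			     .pgtables_bytes = 0 };

	sim_inc_nr_pmds(&mm);
	sim_downgrade(&mm, where);
	return mm.pgtables_bytes;
}
```

In this model, omitting the call leaves 16384 bytes, the value in Heiko's report, and so does calling mm_dec_nr_pmds() after asce_limit has been lowered, because the folded check then skips it.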

Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/mm/pgalloc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 814f26520aa2..6791562779ee 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -131,6 +131,7 @@ void crst_table_downgrade(struct mm_struct *mm)
}

pgd = mm->pgd;
+ mm_dec_nr_pmds(mm);
mm->pgd = (pgd_t *) (pgd_val(*pgd) & _REGION_ENTRY_ORIGIN);
mm->context.asce_limit = _REGION3_SIZE;
mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
--
2.16.4
--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.