2020-08-07 08:42:35

by Joerg Roedel

Subject: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

From: Joerg Roedel <[email protected]>

The code for preallocate_vmalloc_pages() was written under the
assumption that the p4d_offset() and pud_offset() functions will perform
present checks before dereferencing the parent entries.

This assumption is wrong and leads to a bug in the code which causes the
physical address found in the PGD to be used as a page-table page, even if
the PGD is not present.

So the code flow currently is:

	pgd = pgd_offset_k(addr);
	p4d = p4d_offset(pgd, addr);
	if (p4d_none(*p4d))
		p4d = p4d_alloc(&init_mm, pgd, addr);

This lacks at least a check for pgd_none(). The correct flow would be:

	pgd = pgd_offset_k(addr);
	if (pgd_none(*pgd))
		p4d = p4d_alloc(&init_mm, pgd, addr);
	else
		p4d = p4d_offset(pgd, addr);

But this is the same flow that the p4d_alloc() and the pud_alloc()
functions use internally, so there is no need to duplicate them.

Remove the p?d_none() checks from the function and just call into
p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
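
To illustrate the difference described above, here is a minimal userspace
sketch, not kernel code. All names in it (entry_t, phys_mem, child_offset(),
child_alloc()) are hypothetical stand-ins: child_offset() plays the role of
p4d_offset() and blindly follows the parent entry, while child_alloc() plays
the role of p4d_alloc() and populates an empty parent first. Frame 0 of the
toy memory stands in for whatever lives at the address found in a
non-present entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES     8      /* entries per toy table (real tables have 512) */
#define NFRAMES     16     /* size of the toy "physical memory" in frames */
#define PRESENT     1ull   /* bit 0 marks an entry as present */
#define FRAME_SHIFT 12     /* frame number is stored above the flag bits */

typedef uint64_t entry_t;

/* Toy physical memory: every frame can hold one table. */
static entry_t phys_mem[NFRAMES][ENTRIES];
/* Frame 0 is deliberately never handed out; it models unrelated memory. */
static unsigned next_free = 1;

static int entry_none(entry_t e)
{
	return e == 0;
}

/* Translate an entry to the table it points at. No validity check at all. */
static entry_t *entry_table(entry_t e)
{
	return phys_mem[e >> FRAME_SHIFT];
}

/*
 * Stand-in for p4d_offset()/pud_offset(): follows the parent entry without
 * checking whether it is present. An empty parent therefore "points" at
 * frame 0.
 */
static entry_t *child_offset(entry_t *parent, unsigned idx)
{
	assert(idx < ENTRIES);
	return &entry_table(*parent)[idx];
}

/*
 * Stand-in for p4d_alloc()/pud_alloc(): populates an empty parent with a
 * fresh table first and only then computes the offset. Returns NULL when
 * the toy memory is exhausted.
 */
static entry_t *child_alloc(entry_t *parent, unsigned idx)
{
	if (entry_none(*parent)) {
		if (next_free >= NFRAMES)
			return NULL;
		*parent = ((entry_t)next_free++ << FRAME_SHIFT) | PRESENT;
	}
	return child_offset(parent, idx);
}
```

With an empty parent entry, child_offset() returns a pointer into frame 0,
i.e. into memory that was never a page table, whereas child_alloc() installs
a fresh table before it computes the offset. Calling child_alloc() again on
the same parent allocates nothing.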

Reported-by: Jason A. Donenfeld <[email protected]>
Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/mm/init_64.c | 31 +++++++++++++------------------
1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3f4e29a78f2b..449e071240e1 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1253,28 +1253,23 @@ static void __init preallocate_vmalloc_pages(void)
 		p4d_t *p4d;
 		pud_t *pud;

-		p4d = p4d_offset(pgd, addr);
-		if (p4d_none(*p4d)) {
-			/* Can only happen with 5-level paging */
-			p4d = p4d_alloc(&init_mm, pgd, addr);
-			if (!p4d) {
-				lvl = "p4d";
-				goto failed;
-			}
-		}
+		lvl = "p4d";
+		p4d = p4d_alloc(&init_mm, pgd, addr);
+		if (!p4d)
+			goto failed;

+		/*
+		 * With 5-level paging the P4D level is not folded. So the PGDs
+		 * are now populated and there is no need to walk down to the
+		 * PUD level.
+		 */
 		if (pgtable_l5_enabled())
 			continue;

-		pud = pud_offset(p4d, addr);
-		if (pud_none(*pud)) {
-			/* Ends up here only with 4-level paging */
-			pud = pud_alloc(&init_mm, p4d, addr);
-			if (!pud) {
-				lvl = "pud";
-				goto failed;
-			}
-		}
+		lvl = "pud";
+		pud = pud_alloc(&init_mm, p4d, addr);
+		if (!pud)
+			goto failed;
 	}

 	return;
--
2.26.2


2020-08-07 09:53:52

by Jason A. Donenfeld

Subject: Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

On Fri, Aug 7, 2020 at 10:40 AM Joerg Roedel <[email protected]> wrote:
>
> From: Joerg Roedel <[email protected]>
>
> The code for preallocate_vmalloc_pages() was written under the
> assumption that the p4d_offset() and pud_offset() functions will perform
> present checks before dereferencing the parent entries.
>
> This assumption is wrong and leads to a bug in the code which causes the
> physical address found in the PGD to be used as a page-table page, even if
> the PGD is not present.
>
> So the code flow currently is:
>
> pgd = pgd_offset_k(addr);
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d))
> p4d = p4d_alloc(&init_mm, pgd, addr);
>
> This lacks a check for pgd_none() at least, the correct flow would be:
>
> pgd = pgd_offset_k(addr);
> if (pgd_none(*pgd))
> p4d = p4d_alloc(&init_mm, pgd, addr);
> else
> p4d = p4d_offset(pgd, addr);
>
> But this is the same flow that the p4d_alloc() and the pud_alloc()
> functions use internally, so there is no need to duplicate them.
>
> Remove the p?d_none() checks from the function and just call into
> p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
>
> Reported-by: Jason A. Donenfeld <[email protected]>
> Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/mm/init_64.c | 31 +++++++++++++------------------
> 1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3f4e29a78f2b..449e071240e1 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1253,28 +1253,23 @@ static void __init preallocate_vmalloc_pages(void)
> p4d_t *p4d;
> pud_t *pud;
>
> - p4d = p4d_offset(pgd, addr);
> - if (p4d_none(*p4d)) {
> - /* Can only happen with 5-level paging */
> - p4d = p4d_alloc(&init_mm, pgd, addr);
> - if (!p4d) {
> - lvl = "p4d";
> - goto failed;
> - }
> - }
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So the PGDs
> + * are now populated and there is no need to walk down to the
> + * PUD level.
> + */
> if (pgtable_l5_enabled())
> continue;
>
> - pud = pud_offset(p4d, addr);
> - if (pud_none(*pud)) {
> - /* Ends up here only with 4-level paging */
> - pud = pud_alloc(&init_mm, p4d, addr);
> - if (!pud) {
> - lvl = "pud";
> - goto failed;
> - }
> - }
> + lvl = "pud";
> + pud = pud_alloc(&init_mm, p4d, addr);
> + if (!pud)
> + goto failed;
> }
>
> return;
> --
> 2.26.2


This appears to fix the issue, so:

Tested-by: Jason A. Donenfeld <[email protected]>

2020-08-07 10:11:08

by Mike Rapoport

Subject: Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

On Fri, Aug 07, 2020 at 10:40:13AM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> The code for preallocate_vmalloc_pages() was written under the
> assumption that the p4d_offset() and pud_offset() functions will perform
> present checks before dereferencing the parent entries.
>
> This assumption is wrong and leads to a bug in the code which causes the
> physical address found in the PGD to be used as a page-table page, even if
> the PGD is not present.
>
> So the code flow currently is:
>
> pgd = pgd_offset_k(addr);
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d))
> p4d = p4d_alloc(&init_mm, pgd, addr);
>
> This lacks a check for pgd_none() at least, the correct flow would be:
>
> pgd = pgd_offset_k(addr);
> if (pgd_none(*pgd))
> p4d = p4d_alloc(&init_mm, pgd, addr);
> else
> p4d = p4d_offset(pgd, addr);
>
> But this is the same flow that the p4d_alloc() and the pud_alloc()
> functions use internally, so there is no need to duplicate them.
>
> Remove the p?d_none() checks from the function and just call into
> p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
>
> Reported-by: Jason A. Donenfeld <[email protected]>
> Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
> Signed-off-by: Joerg Roedel <[email protected]>

LGTM,

Reviewed-by: Mike Rapoport <[email protected]>

> ---
> arch/x86/mm/init_64.c | 31 +++++++++++++------------------
> 1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3f4e29a78f2b..449e071240e1 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1253,28 +1253,23 @@ static void __init preallocate_vmalloc_pages(void)
> p4d_t *p4d;
> pud_t *pud;
>
> - p4d = p4d_offset(pgd, addr);
> - if (p4d_none(*p4d)) {
> - /* Can only happen with 5-level paging */
> - p4d = p4d_alloc(&init_mm, pgd, addr);
> - if (!p4d) {
> - lvl = "p4d";
> - goto failed;
> - }
> - }
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So the PGDs
> + * are now populated and there is no need to walk down to the
> + * PUD level.
> + */
> if (pgtable_l5_enabled())
> continue;
>
> - pud = pud_offset(p4d, addr);
> - if (pud_none(*pud)) {
> - /* Ends up here only with 4-level paging */
> - pud = pud_alloc(&init_mm, p4d, addr);
> - if (!pud) {
> - lvl = "pud";
> - goto failed;
> - }
> - }
> + lvl = "pud";
> + pud = pud_alloc(&init_mm, p4d, addr);
> + if (!pud)
> + goto failed;
> }
>
> return;
> --
> 2.26.2
>

--
Sincerely yours,
Mike.

2020-08-10 14:28:27

by Dave Hansen

Subject: Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

... adding Kirill

On 8/7/20 1:40 AM, Joerg Roedel wrote:
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So the PGDs
> + * are now populated and there is no need to walk down to the
> + * PUD level.
> + */
> if (pgtable_l5_enabled())
> continue;

It's early and I'm a coffee or two short of awake, but I had to stare at
the comment for a bit to make sense of it.

It feels wrong, I think, because the 5-level code usually ends up doing
*more* allocations and in this case, it is _appearing_ to do fewer.
Would something like this make sense?

	/*
	 * The goal here is to allocate all possibly required
	 * hardware page tables pointed to by the top hardware
	 * level.
	 *
	 * On 4-level systems, the p4d layer is folded away and
	 * the above code does no preallocation. Below, go down
	 * to the pud _software_ level to ensure the second
	 * hardware level is allocated.
	 */


> - pud = pud_offset(p4d, addr);
> - if (pud_none(*pud)) {
> - /* Ends up here only with 4-level paging */
> - pud = pud_alloc(&init_mm, p4d, addr);
> - if (!pud) {
> - lvl = "pud";
> - goto failed;
> - }
> - }
> + lvl = "pud";
> + pud = pud_alloc(&init_mm, p4d, addr);
> + if (!pud)
> + goto failed;
> }

2020-08-10 15:56:14

by Mike Rapoport

Subject: Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

On Mon, Aug 10, 2020 at 07:27:33AM -0700, Dave Hansen wrote:
> ... adding Kirill
>
> On 8/7/20 1:40 AM, Joerg Roedel wrote:
> > + lvl = "p4d";
> > + p4d = p4d_alloc(&init_mm, pgd, addr);
> > + if (!p4d)
> > + goto failed;
> >
> > + /*
> > + * With 5-level paging the P4D level is not folded. So the PGDs
> > + * are now populated and there is no need to walk down to the
> > + * PUD level.
> > + */
> > if (pgtable_l5_enabled())
> > continue;
>
> It's early and I'm a coffee or two short of awake, but I had to stare at
> the comment for a bit to make sense of it.
>
> It feels wrong, I think, because the 5-level code usually ends up doing
> *more* allocations and in this case, it is _appearing_ to do fewer.
> Would something like this make sense?

Unless I miss something, with 5 levels vmalloc mappings are shared at
p4d level, so allocating a p4d page would be enough. With 4 levels,
p4d_alloc() is a nop and pud is the first actually populated level below
pgd.
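
The point above can be sketched with a small userspace model. This is an
illustration under simplifying assumptions, not the kernel implementation:
frames, populate(), model_p4d_alloc(), model_pud_alloc() and
model_preallocate() are hypothetical names, and the l5_enabled flag stands
in for pgtable_l5_enabled(). The model only captures that a folded P4D
level makes the p4d "allocation" hand back the PGD entry itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES     8      /* entries per toy table (real tables have 512) */
#define NFRAMES     16     /* size of the toy "physical memory" in frames */
#define PRESENT     1ull   /* bit 0 marks an entry as present */
#define FRAME_SHIFT 12     /* frame number is stored above the flag bits */

typedef uint64_t entry_t;

static entry_t frames[NFRAMES][ENTRIES];
static unsigned next_frame = 1;   /* frame 0 is never handed out */
static bool l5_enabled;           /* stand-in for pgtable_l5_enabled() */

static entry_t *table_of(entry_t e)
{
	return frames[e >> FRAME_SHIFT];
}

/* Install a fresh table in *parent if it is empty; false if out of frames. */
static bool populate(entry_t *parent)
{
	if (*parent != 0)
		return true;
	if (next_frame >= NFRAMES)
		return false;
	*parent = ((entry_t)next_frame++ << FRAME_SHIFT) | PRESENT;
	return true;
}

/* With a folded P4D level this is a no-op that returns the PGD entry. */
static entry_t *model_p4d_alloc(entry_t *pgd, unsigned idx)
{
	assert(idx < ENTRIES);
	if (!l5_enabled)
		return pgd;
	return populate(pgd) ? &table_of(*pgd)[idx] : NULL;
}

/* Populates the entry it is given, which is the PGD entry when P4D is folded. */
static entry_t *model_pud_alloc(entry_t *p4d, unsigned idx)
{
	assert(idx < ENTRIES);
	return populate(p4d) ? &table_of(*p4d)[idx] : NULL;
}

/* Mirrors the patched loop body; returns the failing level or NULL. */
static const char *model_preallocate(entry_t *pgd)
{
	entry_t *p4d = model_p4d_alloc(pgd, 0);

	if (!p4d)
		return "p4d";
	if (l5_enabled)
		return NULL;    /* top-level entry is already populated */
	if (!model_pud_alloc(p4d, 0))
		return "pud";
	return NULL;
}
```

In this model both modes end up allocating exactly one table per top-level
entry: the P4D table with 5 levels, the PUD table with 4 levels. Either way
the top-level entry is populated afterwards.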

> /*
> * The goal here is to allocate all possibly required
> * hardware page tables pointed to by the top hardware
> * level.
> *
> * On 4-level systems, the p4d layer is folded away and
> * the above code does no preallocation. Below, go down
> * to the pud _software_ level to ensure the second
> * hardware level is allocated.
> */
>
>
> > - pud = pud_offset(p4d, addr);
> > - if (pud_none(*pud)) {
> > - /* Ends up here only with 4-level paging */
> > - pud = pud_alloc(&init_mm, p4d, addr);
> > - if (!pud) {
> > - lvl = "pud";
> > - goto failed;
> > - }
> > - }
> > + lvl = "pud";
> > + pud = pud_alloc(&init_mm, p4d, addr);
> > + if (!pud)
> > + goto failed;
> > }

--
Sincerely yours,
Mike.

2020-08-13 19:24:39

by Ingo Molnar

Subject: Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries


* Mike Rapoport <[email protected]> wrote:

> On Mon, Aug 10, 2020 at 07:27:33AM -0700, Dave Hansen wrote:
> > ... adding Kirill
> >
> > On 8/7/20 1:40 AM, Joerg Roedel wrote:
> > > + lvl = "p4d";
> > > + p4d = p4d_alloc(&init_mm, pgd, addr);
> > > + if (!p4d)
> > > + goto failed;
> > >
> > > + /*
> > > + * With 5-level paging the P4D level is not folded. So the PGDs
> > > + * are now populated and there is no need to walk down to the
> > > + * PUD level.
> > > + */
> > > if (pgtable_l5_enabled())
> > > continue;
> >
> > It's early and I'm a coffee or two short of awake, but I had to stare at
> > the comment for a bit to make sense of it.
> >
> > It feels wrong, I think, because the 5-level code usually ends up doing
> > *more* allocations and in this case, it is _appearing_ to do fewer.
> > Would something like this make sense?
>
> Unless I miss something, with 5 levels vmalloc mappings are shared at
> p4d level, so allocating a p4d page would be enough. With 4 levels,
> p4d_alloc() is a nop and pud is the first actually populated level below
> pgd.
>
> > /*
> > * The goal here is to allocate all possibly required
> > * hardware page tables pointed to by the top hardware
> > * level.
> > *
> > * On 4-level systems, the p4d layer is folded away and
> > * the above code does no preallocation. Below, go down
> > * to the pud _software_ level to ensure the second
> > * hardware level is allocated.
> > */

Would be nice to integrate all these explanations into the comment itself?

Thanks,

Ingo