2018-03-07 17:48:57

by Kani, Toshimitsu

[permalink] [raw]
Subject: [PATCH 0/2] fix memory leak / panic in ioremap huge pages

On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings. Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo. [1]

1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
which will lead to kernel panic.

This panic is not reproducible on x86. INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86.
x86 still has memory leak.

Patch 01 adds new interfaces as stubs, which work as workaround of
this issue. This patch 01 was leveraged from Hanjun's patch. [1]
Patch 02 fixes the issue on x86 by implementing the interfaces.

[1] https://patchwork.kernel.org/patch/10134581/

---
Toshi Kani (2):
1/2 mm/vmalloc: Add interfaces to free unused page table
2/2 x86/mm: implement free pmd/pte page interfaces

---
arch/arm64/mm/mmu.c | 10 ++++++++++
arch/x86/mm/pgtable.c | 44 +++++++++++++++++++++++++++++++++++++++++++
include/asm-generic/pgtable.h | 10 ++++++++++
lib/ioremap.c | 6 ++++--
4 files changed, 68 insertions(+), 2 deletions(-)


2018-03-07 17:48:53

by Kani, Toshimitsu

[permalink] [raw]
Subject: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings. Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo.

1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
which will lead to kernel panic.

This panic is not reproducible on x86. INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86.
x86 still has memory leak.

Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
which clear a given pud/pmd entry and free up a page for the lower
level entries.

This patch implements their stub functions on x86 and arm64, which
work as workaround.

Reported-by: Lei Li <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Wang Xuefeng <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/arm64/mm/mmu.c | 10 ++++++++++
arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++
include/asm-generic/pgtable.h | 10 ++++++++++
lib/ioremap.c | 6 ++++--
4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 84a019f55022..84a37b4bc28e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -972,3 +972,13 @@ int pmd_clear_huge(pmd_t *pmdp)
pmd_clear(pmdp);
return 1;
}
+
+int pud_free_pmd_page(pud_t *pud)
+{
+ return pud_none(*pud);
+}
+
+int pmd_free_pte_page(pmd_t *pmd)
+{
+ return pmd_none(*pmd);
+}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9ebf12..942f4fa341f1 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)

return 0;
}
+
+/**
+ * pud_free_pmd_page - clear pud entry and free pmd page
+ *
+ * Returns 1 on success and 0 on failure (pud not cleared).
+ */
+int pud_free_pmd_page(pud_t *pud)
+{
+ return pud_none(*pud);
+}
+
+/**
+ * pmd_free_pte_page - clear pmd entry and free pte page
+ *
+ * Returns 1 on success and 0 on failure (pmd not cleared).
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+ return pmd_none(*pmd);
+}
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 2cfa3075d148..2490800f7c5a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -983,6 +983,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
int pud_clear_huge(pud_t *pud);
int pmd_clear_huge(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud);
+int pmd_free_pte_page(pmd_t *pmd);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
{
@@ -1008,6 +1010,14 @@ static inline int pmd_clear_huge(pmd_t *pmd)
{
return 0;
}
+static inline int pud_free_pmd_page(pud_t *pud)
+{
+ return 0;
+}
+static inline int pmd_free_pte_page(pud_t *pmd)
+{
+ return 0;
+}
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */

#ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/lib/ioremap.c b/lib/ioremap.c
index b808a390e4c3..54e5bbaa3200 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,

if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
- IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+ IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
+ pmd_free_pte_page(pmd)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -117,7 +118,8 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,

if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
- IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
+ IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
+ pud_free_pmd_page(pud)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}

2018-03-07 17:50:32

by Kani, Toshimitsu

[permalink] [raw]
Subject: [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces

Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
clear a given pud/pmd entry and free up lower level page table(s).
Address range associated with the pud/pmd entry must have been purged
by INVLPG.

Signed-off-by: Toshi Kani <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/mm/pgtable.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 942f4fa341f1..121c0114439e 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -710,7 +710,22 @@ int pmd_clear_huge(pmd_t *pmd)
*/
int pud_free_pmd_page(pud_t *pud)
{
- return pud_none(*pud);
+ pmd_t *pmd;
+ int i;
+
+ if (pud_none(*pud))
+ return 1;
+
+ pmd = (pmd_t *)pud_page_vaddr(*pud);
+
+ for (i = 0; i < PTRS_PER_PMD; i++)
+ if (!pmd_free_pte_page(&pmd[i]))
+ return 0;
+
+ pud_clear(pud);
+ free_page((unsigned long)pmd);
+
+ return 1;
}

/**
@@ -720,6 +735,15 @@ int pud_free_pmd_page(pud_t *pud)
*/
int pmd_free_pte_page(pmd_t *pmd)
{
- return pmd_none(*pmd);
+ pte_t *pte;
+
+ if (pmd_none(*pmd))
+ return 1;
+
+ pte = (pte_t *)pmd_page_vaddr(*pmd);
+ pmd_clear(pmd);
+ free_page((unsigned long)pte);
+
+ return 1;
}
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */

2018-03-07 22:56:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Wed, 7 Mar 2018 11:32:26 -0700 Toshi Kani <[email protected]> wrote:

> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
>
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
>
> index 004abf9ebf12..942f4fa341f1 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
>
> return 0;
> }
> +
> +/**
> + * pud_free_pmd_page - clear pud entry and free pmd page
> + *
> + * Returns 1 on success and 0 on failure (pud not cleared).
> + */
> +int pud_free_pmd_page(pud_t *pud)
> +{
> + return pud_none(*pud);
> +}
> +
> +/**
> + * pmd_free_pte_page - clear pmd entry and free pte page
> + *
> + * Returns 1 on success and 0 on failure (pmd not cleared).
> + */
> +int pmd_free_pte_page(pmd_t *pmd)
> +{
> + return pmd_none(*pmd);
> +}

Are these functions well named? I mean, the comment says "clear pud
entry and free pmd page" but the implementatin does neither of those
things. The name implies that the function frees a pmd_page but the
callsites use the function as a way of querying state.



2018-03-07 22:57:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Wed, 7 Mar 2018 11:32:26 -0700 Toshi Kani <[email protected]> wrote:

> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
>
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.

Also, as this fixes an arm64 kernel panic, should we be using cc:stable?

2018-03-07 23:02:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces

On Wed, 7 Mar 2018 11:32:27 -0700 Toshi Kani <[email protected]> wrote:

> Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
> clear a given pud/pmd entry and free up lower level page table(s).
> Address range associated with the pud/pmd entry must have been purged
> by INVLPG.

OK, now we have implementations which match the naming ;) Again, is a
cc:stable warranted?

Do you have any preferences/suggestions as to which tree these should
be merged through? You're hitting core, arm and x86.

2018-03-07 23:04:23

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Wed, 2018-03-07 at 14:54 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2018 11:32:26 -0700 Toshi Kani <[email protected]> wrote:
>
> > On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> > may create pud/pmd mappings. Kernel panic was observed on arm64
> > systems with Cortex-A75 in the following steps as described by
> > Hanjun Guo.
> >
> > 1. ioremap a 4K size, valid page table will build,
> > 2. iounmap it, pte0 will set to 0;
> > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> > then set the a new value for pmd;
> > 4. pte0 is leaked;
> > 5. CPU may meet exception because the old pmd is still in TLB,
> > which will lead to kernel panic.
> >
> > This panic is not reproducible on x86. INVLPG, called from iounmap,
> > purges all levels of entries associated with purged address on x86.
> > x86 still has memory leak.
> >
> > Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> > which clear a given pud/pmd entry and free up a page for the lower
> > level entries.
> >
> > This patch implements their stub functions on x86 and arm64, which
> > work as workaround.
> >
> > index 004abf9ebf12..942f4fa341f1 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
> >
> > return 0;
> > }
> > +
> > +/**
> > + * pud_free_pmd_page - clear pud entry and free pmd page
> > + *
> > + * Returns 1 on success and 0 on failure (pud not cleared).
> > + */
> > +int pud_free_pmd_page(pud_t *pud)
> > +{
> > + return pud_none(*pud);
> > +}
> > +
> > +/**
> > + * pmd_free_pte_page - clear pmd entry and free pte page
> > + *
> > + * Returns 1 on success and 0 on failure (pmd not cleared).
> > + */
> > +int pmd_free_pte_page(pmd_t *pmd)
> > +{
> > + return pmd_none(*pmd);
> > +}
>
> Are these functions well named? I mean, the comment says "clear pud
> entry and free pmd page" but the implementatin does neither of those
> things. The name implies that the function frees a pmd_page but the
> callsites use the function as a way of querying state.

This patch 1/2 only implements stubs. Patch 2/2 implements what is
described here.

> Also, as this fixes an arm64 kernel panic, should we be using
> cc:stable?

Right.

Thanks,
-Toshi

2018-03-07 23:23:38

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces

On Wed, 2018-03-07 at 15:01 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2018 11:32:27 -0700 Toshi Kani <[email protected]> wrote:
>
> > Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
> > clear a given pud/pmd entry and free up lower level page table(s).
> > Address range associated with the pud/pmd entry must have been purged
> > by INVLPG.
>
> OK, now we have implementations which match the naming ;) Again, is a
> cc:stable warranted?

Right. This patch 2/2 fixes the memory leak on x86.

Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")

Patch 1/2 fixes the panic on arm64.

Fixes: 324420bf91f60 ("arm64: add support for ioremap() block mappings")

> Do you have any preferences/suggestions as to which tree these should
> be merged through? You're hitting core, arm and x86.

No, I do not have any preference.

Thanks,
-Toshi

2018-03-08 04:03:31

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> +/**
> + * pud_free_pmd_page - clear pud entry and free pmd page
> + *
> + * Returns 1 on success and 0 on failure (pud not cleared).
> + */
> +int pud_free_pmd_page(pud_t *pud)
> +{
> + return pud_none(*pud);
> +}

Wouldn't it be clearer if you returned 'bool' instead of 'int' here?

Also you didn't document the pud parameter, nor use the approved form
for documenting the return type, nor the calling context. So I would
have written it out like this:

/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
*
* Context: Caller should hold mmap_sem write-locked.
* Return: %true if clearing the entry succeeded.
*/


2018-03-08 08:10:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table


* Toshi Kani <[email protected]> wrote:

> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.

Where does x86 iounmap() do that?

> x86 still has memory leak.
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.

At minimum the ordering of the patches is very confusing: why don't you introduce
the new methods in patch #1, and then use them in patch #2?

Also please double check the coding style of your patches, there's a number of
obvious problems of outright bad patterns and also cases where you clearly don't
try to follow the (correct) style of existing code.

Thanks,

Ingo

2018-03-08 15:57:55

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > +/**
> > + * pud_free_pmd_page - clear pud entry and free pmd page
> > + *
> > + * Returns 1 on success and 0 on failure (pud not cleared).
> > + */
> > +int pud_free_pmd_page(pud_t *pud)
> > +{
> > + return pud_none(*pud);
> > +}
>
> Wouldn't it be clearer if you returned 'bool' instead of 'int' here?

I thought about it and decided to use 'int' since all other pud/pmd/pte
interfaces, such as pud_none() above, use 'int'.

> Also you didn't document the pud parameter, nor use the approved form
> for documenting the return type, nor the calling context. So I would
> have written it out like this:
>
> /**
> * pud_free_pmd_page - Clear pud entry and free pmd page.
> * @pud: Pointer to a PUD.
> *
> * Context: Caller should hold mmap_sem write-locked.
> * Return: %true if clearing the entry succeeded.
> */

Will do.

Thanks!
-Toshi

2018-03-08 18:05:59

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

Hi Toshi,

Thanks for the patches!

On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
>
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
>
> Reported-by: Lei Li <[email protected]>
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Wang Xuefeng <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Hanjun Guo <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> ---
> arch/arm64/mm/mmu.c | 10 ++++++++++
> arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++
> include/asm-generic/pgtable.h | 10 ++++++++++
> lib/ioremap.c | 6 ++++--
> 4 files changed, 44 insertions(+), 2 deletions(-)

[...]

> diff --git a/lib/ioremap.c b/lib/ioremap.c
> index b808a390e4c3..54e5bbaa3200 100644
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
>
> if (ioremap_pmd_enabled() &&
> ((next - addr) == PMD_SIZE) &&
> - IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
> + IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
> + pmd_free_pte_page(pmd)) {

I find it a bit weird that we're postponing this to the subsequent map. If
we want to address the break-before-make issue that was causing a panic on
arm64, then I think it would be better to do this on the unmap path to avoid
duplicating TLB invalidation.

Will

2018-03-08 19:32:03

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Thu, 2018-03-08 at 18:04 +0000, Will Deacon wrote:
:
> > diff --git a/lib/ioremap.c b/lib/ioremap.c
> > index b808a390e4c3..54e5bbaa3200 100644
> > --- a/lib/ioremap.c
> > +++ b/lib/ioremap.c
> > @@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> >
> > if (ioremap_pmd_enabled() &&
> > ((next - addr) == PMD_SIZE) &&
> > - IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
> > + IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
> > + pmd_free_pte_page(pmd)) {
>
> I find it a bit weird that we're postponing this to the subsequent map. If
> we want to address the break-before-make issue that was causing a panic on
> arm64, then I think it would be better to do this on the unmap path to avoid
> duplicating TLB invalidation.

Hi Will,

Yes, I started looking into doing it the unmap path, but found the
following issues:

- The iounmap() path is shared with vunmap(). Since vmap() only
supports pte mappings, making vunmap() to free pte pages is an overhead
for regular vmap users as they do not need pte pages freed up.
- Checking to see if all entries in a pte page are cleared in the unmap
path is racy, and serializing this check is expensive.
- The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
purge.

Hence, I decided to postpone and do it in the ioremap path when a
pud/pmd mapping is set. The "break" on arm64 happens when you update a
pmd entry without purging it. So, the unmap path is not broken. I
understand that arm64 may need extra TLB purge in pmd_free_pte_page(),
but it limits this overhead only when it sets up a pud/pmd mapping.

Thanks,
-Toshi

2018-03-08 22:08:35

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Thu, Mar 08, 2018 at 03:56:30PM +0000, Kani, Toshi wrote:
> On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> > On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > > +/**
> > > + * pud_free_pmd_page - clear pud entry and free pmd page
> > > + *
> > > + * Returns 1 on success and 0 on failure (pud not cleared).
> > > + */
> > > +int pud_free_pmd_page(pud_t *pud)
> > > +{
> > > + return pud_none(*pud);
> > > +}
> >
> > Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
>
> I thought about it and decided to use 'int' since all other pud/pmd/pte
> interfaces, such as pud_none() above, use 'int'.

These interfaces were introduced before we had bool ... I suspect nobody's
taken the time to go through and convert them all.


2018-03-08 23:28:50

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

On Thu, 2018-03-08 at 14:07 -0800, Matthew Wilcox wrote:
> On Thu, Mar 08, 2018 at 03:56:30PM +0000, Kani, Toshi wrote:
> > On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> > > On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > > > +/**
> > > > + * pud_free_pmd_page - clear pud entry and free pmd page
> > > > + *
> > > > + * Returns 1 on success and 0 on failure (pud not cleared).
> > > > + */
> > > > +int pud_free_pmd_page(pud_t *pud)
> > > > +{
> > > > + return pud_none(*pud);
> > > > +}
> > >
> > > Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
> >
> > I thought about it and decided to use 'int' since all other pud/pmd/pte
> > interfaces, such as pud_none() above, use 'int'.
>
> These interfaces were introduced before we had bool ... I suspect nobody's
> taken the time to go through and convert them all.

I see. Since this patchset already changes core, arm and x86, and will
be back ported to stables. So, if you do not mind, I'd like to leave it
consistent with others with 'int', and make the footstep minimum.

Thanks,
-Toshi