This series fixes x86 ioremap free page handlings when setting up
pud/pmd maps.
Patch 01 is from Chintan's v9 01/04 patch [1], which adds a new arg 'addr'.
This avoids merge conflicts with his series.
Patch 02 adds a TLB purge (INVLPG) to purge page-structure caches that
may be cached by speculation. See patch 2/2 for the detals.
Patch 03 disables free page handling on x86-PAE to address BUG_ON reported
by Joerg.
[1] https://patchwork.kernel.org/patch/10371015/
---
Chintan Pandya (1):
1/3 ioremap: Update pgtable free interfaces with addr
Toshi Kani (2):
2/3 x86/mm: add TLB purge to free pmd/pte page interfaces
3/3 x86/mm: disable ioremap free page handling on x86-PAE
---
arch/arm64/mm/mmu.c | 4 +--
arch/x86/mm/pgtable.c | 57 +++++++++++++++++++++++++++++++++++++------
include/asm-generic/pgtable.h | 8 +++---
lib/ioremap.c | 4 +--
4 files changed, 57 insertions(+), 16 deletions(-)
ioremap() supports pmd mappings on x86-PAE. However, kernel's pmd
tables are not shared among processes on x86-PAE. Therefore, any
update to sync'd pmd entries need re-syncing. Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().
Disable free page handling on x86-PAE. pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: <[email protected]>
---
arch/x86/mm/pgtable.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 816fd41ee854..809115150d8b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -715,6 +715,7 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
}
+#ifdef CONFIG_X86_64
/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
@@ -784,4 +785,22 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
return 1;
}
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+ return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+ return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
From: Chintan Pandya <[email protected]>
This patch ("mm/vmalloc: Add interfaces to free unmapped
page table") adds following 2 interfaces to free the page
table in case we implement huge mapping.
pud_free_pmd_page() and pmd_free_pte_page()
Some architectures (like arm64) needs to do proper TLB
maintanance after updating pagetable entry even in map.
Why ? Read this,
https://patchwork.kernel.org/patch/10134581/
Pass 'addr' in these interfaces so that proper TLB ops
can be performed.
Fixes: b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table")
Signed-off-by: Chintan Pandya <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: <[email protected]>
---
arch/arm64/mm/mmu.c | 4 ++--
arch/x86/mm/pgtable.c | 8 +++++---
include/asm-generic/pgtable.h | 8 ++++----
lib/ioremap.c | 4 ++--
4 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
}
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return pud_none(*pud);
}
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return pmd_none(*pmd);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..37e3cbac59b9 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -718,11 +718,12 @@ int pmd_clear_huge(pmd_t *pmd)
/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
*
* Context: The pud range has been unmaped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
pmd_t *pmd;
int i;
@@ -733,7 +734,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i]))
+ if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
pud_clear(pud);
@@ -745,11 +746,12 @@ int pud_free_pmd_page(pud_t *pud)
/**
* pmd_free_pte_page - Clear pmd entry and free pte page.
* @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
*
* Context: The pmd range has been unmaped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
pte_t *pte;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
int pud_clear_huge(pud_t *pud);
int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
{
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
{
return 0;
}
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return 0;
}
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return 0;
}
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
- pmd_free_pte_page(pmd)) {
+ pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
- pud_free_pmd_page(pud)) {
+ pud_free_pmd_page(pud, addr)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: <[email protected]>
---
arch/x86/mm/pgtable.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 37e3cbac59b9..816fd41ee854 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -720,24 +720,40 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -748,7 +764,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -760,6 +776,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
On Mon, 2018-04-30 at 11:59 -0600, Toshi Kani wrote:
> ioremap() supports pmd mappings on x86-PAE. However, kernel's pmd
> tables are not shared among processes on x86-PAE. Therefore, any
> update to sync'd pmd entries need re-syncing. Freeing a pte page
> also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().
>
> Disable free page handling on x86-PAE. pud_free_pmd_page() and
> pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
> This assures that ioremap() does not update sync'd pmd entries at the
> cost of falling back to pte mappings.
>
> Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> Reported-by: Joerg Roedel <[email protected]>
Hi Joerg,
Does it solve your problem? Let me know if you have any issue with the
series.
Thanks,
-Toshi
On Mon, Apr 30, 2018 at 11:59:24AM -0600, Toshi Kani wrote:
> int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> {
> - pmd_t *pmd;
> + pmd_t *pmd, *pmd_sv;
> + pte_t *pte;
> int i;
>
> if (pud_none(*pud))
> return 1;
>
> pmd = (pmd_t *)pud_page_vaddr(*pud);
> + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
So you need to allocate a page to free a page? It is better to put the
pages into a list with a list_head on the stack.
I am still on favour of just reverting the broken commit and do a
correct and working fix for the/a merge window.
Joerg
On Tue, 2018-05-15 at 16:05 +0200, Joerg Roedel wrote:
> On Mon, Apr 30, 2018 at 11:59:24AM -0600, Toshi Kani wrote:
> > int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> > {
> > - pmd_t *pmd;
> > + pmd_t *pmd, *pmd_sv;
> > + pte_t *pte;
> > int i;
> >
> > if (pud_none(*pud))
> > return 1;
> >
> > pmd = (pmd_t *)pud_page_vaddr(*pud);
> > + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
>
> So you need to allocate a page to free a page? It is better to put the
> pages into a list with a list_head on the stack.
The code should have checked if pmd_sv is NULL... I will update the
patch.
For performance, I do not think this page alloc is a problem. Unlike
pmd_free_pte_page(), pud_free_pmd_page() covers an extremely rare case.
Since pud requires 1GB-alignment, pud and pmd/pte mappings do not
share the same ranges within the vmalloc space. I had to instrument the
kernel to force them share the same ranges in order to test this patch.
> I am still on favour of just reverting the broken commit and do a
> correct and working fix for the/a merge window.
I will reorder the patch series, and change patch 3/3 to 1/3 so that we
can take it first to fix the BUG_ON on PAE. This revert will disable
2MB ioremap on PAE in some cases, but I do not think it's important on
PAE anyway.
I do not think revert on x86/64 is necessary and I am more worried about
disabling 2MB ioremap in some cases, which can be seen as degradation.
Patch 2/3 fixes a possible page-directory cache issue that I cannot hit
even though I put ioremap/iounmap with various sizes into a tight loop
for a day.
Thanks,
-Toshi