2021-12-16 17:47:38

by Christophe Leroy

Subject: [PATCH v2 1/2] powerpc/set_memory: Avoid spinlock recursion in change_page_attr()

Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
added a spin_lock() to change_page_attr() in order to safely
perform the three-step operation. Then commit 9f7853d7609d
("powerpc/mm: Fix set_memory_*() against concurrent accesses")
modified it to use pte_update() and do the operation atomically.

In the meantime, Maxime reported some spinlock recursion.

[ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217
[ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0
[ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523
[ 15.373350] Workqueue: events do_free_init
[ 15.377615] Call Trace:
[ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable)
[ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4
[ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310
[ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0
[ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8
[ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94
[ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310
[ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134
[ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8
[ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c
[ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8
[ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94
[ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8
[ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8
[ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210
[ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c

Remove the spin_lock() in change_page_attr().

Reported-by: Maxime Bizon <[email protected]>
Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/
Cc: Russell Currey <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/pageattr.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index edea388e9d3f..308adc51da9d 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -30,8 +30,6 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
long action = (long)data;
pte_t pte;

- spin_lock(&init_mm.page_table_lock);
-
pte = ptep_get(ptep);

/* modify the PTE bits as desired, then apply */
@@ -61,8 +59,6 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)

flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

- spin_unlock(&init_mm.page_table_lock);
-
return 0;
}

--
2.33.1


2021-12-16 17:47:46

by Christophe Leroy

Subject: [PATCH v2 2/2] powerpc: Add set_memory_{p/np}() and remove set_memory_attr()

set_memory_attr() was implemented by commit 4d1755b6a762 ("powerpc/mm:
implement set_memory_attr()") because the set_memory_xx() functions
couldn't be used at that time to modify memory "on the fly", as
explained in that commit.

But set_memory_attr() uses set_pte_at(), which leads to warnings when
CONFIG_DEBUG_VM is selected, because set_pte_at() is not meant to be
used for updating existing page table entries.

The check could be bypassed by using __set_pte_at() instead, as was
the case before commit c988cfd38e48 ("powerpc/32: use
set_memory_attr()"), but since commit 9f7853d7609d ("powerpc/mm:
Fix set_memory_*() against concurrent accesses") it is now possible
to use the set_memory_xx() functions to update page table entries
"on the fly", because the update is now done atomically.

For DEBUG_PAGEALLOC we need to clear and set back _PAGE_PRESENT.
Add set_memory_np() and set_memory_p() for that.

Replace all uses of set_memory_attr() by the relevant set_memory_xx()
and remove set_memory_attr().

Reported-by: Maxime Bizon <[email protected]>
Fixes: c988cfd38e48 ("powerpc/32: use set_memory_attr()")
Cc: [email protected]
Depends-on: 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses")
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Russell Currey <[email protected]>
Tested-by: Maxime Bizon <[email protected]>
---
v2: Add comment to SET_MEMORY_P and SET_MEMORY_NP
---
arch/powerpc/include/asm/book3s/32/pgtable.h | 10 +++++
arch/powerpc/include/asm/book3s/64/pgtable.h | 10 +++++
arch/powerpc/include/asm/nohash/pgtable.h | 10 +++++
arch/powerpc/include/asm/set_memory.h | 12 +++++-
arch/powerpc/mm/pageattr.c | 39 +++-----------------
arch/powerpc/mm/pgtable_32.c | 24 ++++++------
6 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 609c80f67194..4ceebb291896 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -518,6 +518,16 @@ static inline pte_t pte_mkuser(pte_t pte)
return __pte(pte_val(pte) | _PAGE_USER);
}

+static inline pte_t pte_mkpresent(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_PRESENT);
+}
+
+static inline pte_t pte_mkabsent(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~_PAGE_PRESENT);
+}
+
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 33e073d6b0c4..2bbc8b69b7f4 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -724,6 +724,16 @@ static inline pte_t pte_mkuser(pte_t pte)
return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_PRIVILEGED));
}

+static inline pte_t pte_mkpresent(pte_t pte)
+{
+ return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PRESENT));
+}
+
+static inline pte_t pte_mkabsent(pte_t pte)
+{
+ return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_PRESENT));
+}
+
/*
* This is potentially called with a pmd as the argument, in which case it's not
* safe to check _PAGE_DEVMAP unless we also confirm that _PAGE_PTE is set.
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index ac75f4ab0dba..3d4169969900 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -166,6 +166,16 @@ static inline pte_t pte_mkuser(pte_t pte)
}
#endif

+static inline pte_t pte_mkpresent(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_PRESENT);
+}
+
+static inline pte_t pte_mkabsent(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~_PAGE_PRESENT);
+}
+
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
diff --git a/arch/powerpc/include/asm/set_memory.h b/arch/powerpc/include/asm/set_memory.h
index b040094f7920..7ebc807aa8cc 100644
--- a/arch/powerpc/include/asm/set_memory.h
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -6,6 +6,8 @@
#define SET_MEMORY_RW 1
#define SET_MEMORY_NX 2
#define SET_MEMORY_X 3
+#define SET_MEMORY_NP 4 /* Set memory non present */
+#define SET_MEMORY_P 5 /* Set memory present */

int change_memory_attr(unsigned long addr, int numpages, long action);

@@ -29,6 +31,14 @@ static inline int set_memory_x(unsigned long addr, int numpages)
return change_memory_attr(addr, numpages, SET_MEMORY_X);
}

-int set_memory_attr(unsigned long addr, int numpages, pgprot_t prot);
+static inline int set_memory_np(unsigned long addr, int numpages)
+{
+ return change_memory_attr(addr, numpages, SET_MEMORY_NP);
+}
+
+static inline int set_memory_p(unsigned long addr, int numpages)
+{
+ return change_memory_attr(addr, numpages, SET_MEMORY_P);
+}

#endif
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 308adc51da9d..eb5405d410f1 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -46,6 +46,12 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
case SET_MEMORY_X:
pte = pte_mkexec(pte);
break;
+ case SET_MEMORY_NP:
+ pte = pte_mkabsent(pte);
+ break;
+ case SET_MEMORY_P:
+ pte = pte_mkpresent(pte);
+ break;
default:
WARN_ON_ONCE(1);
break;
@@ -92,36 +98,3 @@ int change_memory_attr(unsigned long addr, int numpages, long action)
return apply_to_existing_page_range(&init_mm, start, size,
change_page_attr, (void *)action);
}
-
-/*
- * Set the attributes of a page:
- *
- * This function is used by PPC32 at the end of init to set final kernel memory
- * protection. It includes changing the maping of the page it is executing from
- * and data pages it is using.
- */
-static int set_page_attr(pte_t *ptep, unsigned long addr, void *data)
-{
- pgprot_t prot = __pgprot((unsigned long)data);
-
- spin_lock(&init_mm.page_table_lock);
-
- set_pte_at(&init_mm, addr, ptep, pte_modify(*ptep, prot));
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-
- spin_unlock(&init_mm.page_table_lock);
-
- return 0;
-}
-
-int set_memory_attr(unsigned long addr, int numpages, pgprot_t prot)
-{
- unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
- unsigned long sz = numpages * PAGE_SIZE;
-
- if (numpages <= 0)
- return 0;
-
- return apply_to_existing_page_range(&init_mm, start, sz, set_page_attr,
- (void *)pgprot_val(prot));
-}
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 906e4e4328b2..f71ededdc02a 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -135,10 +135,12 @@ void mark_initmem_nx(void)
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
PFN_DOWN((unsigned long)_sinittext);

- if (v_block_mapped((unsigned long)_sinittext))
+ if (v_block_mapped((unsigned long)_sinittext)) {
mmu_mark_initmem_nx();
- else
- set_memory_attr((unsigned long)_sinittext, numpages, PAGE_KERNEL);
+ } else {
+ set_memory_nx((unsigned long)_sinittext, numpages);
+ set_memory_rw((unsigned long)_sinittext, numpages);
+ }
}

#ifdef CONFIG_STRICT_KERNEL_RWX
@@ -152,18 +154,14 @@ void mark_rodata_ro(void)
return;
}

- numpages = PFN_UP((unsigned long)_etext) -
- PFN_DOWN((unsigned long)_stext);
-
- set_memory_attr((unsigned long)_stext, numpages, PAGE_KERNEL_ROX);
/*
- * mark .rodata as read only. Use __init_begin rather than __end_rodata
- * to cover NOTES and EXCEPTION_TABLE.
+ * mark .text and .rodata as read only. Use __init_begin rather than
+ * __end_rodata to cover NOTES and EXCEPTION_TABLE.
*/
numpages = PFN_UP((unsigned long)__init_begin) -
- PFN_DOWN((unsigned long)__start_rodata);
+ PFN_DOWN((unsigned long)_stext);

- set_memory_attr((unsigned long)__start_rodata, numpages, PAGE_KERNEL_RO);
+ set_memory_ro((unsigned long)_stext, numpages);

// mark_initmem_nx() should have already run by now
ptdump_check_wx();
@@ -179,8 +177,8 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
return;

if (enable)
- set_memory_attr(addr, numpages, PAGE_KERNEL);
+ set_memory_p(addr, numpages);
else
- set_memory_attr(addr, numpages, __pgprot(0));
+ set_memory_np(addr, numpages);
}
#endif /* CONFIG_DEBUG_PAGEALLOC */
--
2.33.1

2021-12-17 09:35:12

by Maxime Bizon

Subject: Re: [PATCH v2 1/2] powerpc/set_memory: Avoid spinlock recursion in change_page_attr()


On Thu, 2021-12-16 at 17:47 +0000, Christophe Leroy wrote:

Tested-by: Maxime Bizon <[email protected]>

Now running fine with every CONFIG_DEBUG_xxx enabled, thanks!

--
Maxime




2021-12-23 12:09:50

by Michael Ellerman

Subject: Re: [PATCH v2 1/2] powerpc/set_memory: Avoid spinlock recursion in change_page_attr()

Christophe Leroy <[email protected]> writes:
> Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
> added a spin_lock() to change_page_attr() in order to safely
> perform the three-step operation. Then commit 9f7853d7609d
> ("powerpc/mm: Fix set_memory_*() against concurrent accesses")
> modified it to use pte_update() and do the operation atomically.

It's not really atomic, it's just safe against concurrent access.

We still do a read / modify / write of the pte value.

Which isn't safe against concurrent calls to change_page_attr() for the
same address.

But maybe that's OK? AFAICS other architectures (eg. arm64) have no
protection against concurrent callers. I think the assumption is higher
level code is ensuring there's only a single caller at a time.

On the other hand x86 and s390 do have locking (cpa_lock / cpa_mutex).
But it seems that's mostly to protect against splitting of page tables,
which we aren't doing.

We'd be a bit safer if we used pte_update() "properly", like I did in:

https://lore.kernel.org/linuxppc-dev/[email protected]/
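
For illustration, a minimal sketch of that idea might look like the
following (assuming the 32-bit pte_update() prototype and flag names;
the X/NX cases are left out because the exec bits are platform
specific, and this is only a sketch, not the actual patch):

/*
 * Sketch only: express each action as clear/set masks and let
 * pte_update() do the read-modify-write in one go, instead of
 * ptep_get() + pte_mkXXX() + pte_update() as separate steps.
 */
static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
{
	long action = (long)data;
	unsigned long clr = 0, set = 0;

	switch (action) {
	case SET_MEMORY_RO:
		clr = _PAGE_RW;
		break;
	case SET_MEMORY_RW:
		set = _PAGE_RW;
		break;
	case SET_MEMORY_NP:
		clr = _PAGE_PRESENT;
		break;
	case SET_MEMORY_P:
		set = _PAGE_PRESENT;
		break;
	default:
		WARN_ON_ONCE(1);
		return 0;
	}

	pte_update(&init_mm, addr, ptep, clr, set, 0);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

	return 0;
}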


cheers

> In the meantime, Maxime reported some spinlock recursion.
>
> [ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217
> [ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0
> [ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523
> [ 15.373350] Workqueue: events do_free_init
> [ 15.377615] Call Trace:
> [ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable)
> [ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4
> [ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310
> [ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0
> [ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8
> [ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94
> [ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310
> [ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134
> [ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8
> [ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c
> [ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8
> [ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94
> [ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8
> [ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8
> [ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210
> [ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c
>
> Remove the spin_lock() in change_page_attr().
>
> Reported-by: Maxime Bizon <[email protected]>
> Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/
> Cc: Russell Currey <[email protected]>
> Signed-off-by: Christophe Leroy <[email protected]>
> ---
> arch/powerpc/mm/pageattr.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> index edea388e9d3f..308adc51da9d 100644
> --- a/arch/powerpc/mm/pageattr.c
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -30,8 +30,6 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
> long action = (long)data;
> pte_t pte;
>
> - spin_lock(&init_mm.page_table_lock);
> -
> pte = ptep_get(ptep);
>
> /* modify the PTE bits as desired, then apply */
> @@ -61,8 +59,6 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>
> flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>
> - spin_unlock(&init_mm.page_table_lock);
> -
> return 0;
> }
>
> --
> 2.33.1

2021-12-23 15:14:26

by Christophe Leroy

Subject: Re: [PATCH v2 1/2] powerpc/set_memory: Avoid spinlock recursion in change_page_attr()



On 23/12/2021 at 13:09, Michael Ellerman wrote:
> Christophe Leroy <[email protected]> writes:
>> Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
>> added a spin_lock() to change_page_attr() in order to safely
>> perform the three-step operation. Then commit 9f7853d7609d
>> ("powerpc/mm: Fix set_memory_*() against concurrent accesses")
>> modified it to use pte_update() and do the operation atomically.
>
> It's not really atomic, it's just safe against concurrent access.
>
> We still do a read / modify / write of the pte value.
>
> Which isn't safe against concurrent calls to change_page_attr() for the
> same address.
>
> But maybe that's OK? AFAICS other architectures (eg. arm64) have no
> protection against concurrent callers. I think the assumption is higher
> level code is ensuring there's only a single caller at a time.
>
> On the other hand x86 and s390 do have locking (cpa_lock / cpa_mutex).
> But it seems that's mostly to protect against splitting of page tables,
> which we aren't doing.
>
> We'd be a bit safer if we used pte_update() "properly", like I did in:
>
> https://lore.kernel.org/linuxppc-dev/[email protected]/
>
>

Probably not as simple as that patch, but I get the idea.

See b6cb20fdc273 ("powerpc/book3e: Fix set_memory_x() and set_memory_nx()")

I think we then need to define platform-specific helpers to do it,
similar to ptep_set_wrprotect(), to avoid #ifdefery in
change_page_attr().
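
Purely as an illustration (helper name and generic fallback invented
here, not taken from an actual patch), such a helper could look like:

/*
 * Hypothetical per-platform helper in the spirit of ptep_set_wrprotect(),
 * so that change_page_attr() needs no #ifdef.  Platforms where exec
 * permission is not a simple PTE bit would provide their own version.
 */
#ifndef ptep_set_exec
static inline void ptep_set_exec(struct mm_struct *mm, unsigned long addr,
				 pte_t *ptep)
{
	pte_update(mm, addr, ptep, 0, _PAGE_EXEC, 0);
}
#endif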

Christophe