2023-10-12 19:54:36

by Matthew Wilcox

Subject: [PATCH 0/2] Allow nesting of lazy MMU mode

Dave Woodhouse reported that we now nest calls to
arch_enter_lazy_mmu_mode(). That was inadvertent, but in principle we
should allow it. On further investigation, Juergen already fixed it
for Xen, but didn't tell anyone. Fix it for Sparc & PowerPC too.
This may or may not help fix the problem that Erhard reported.
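
To illustrate (a sketch only, not code from the tree; outer_example() is a
made-up stand-in for any mm path that batches PTE updates under lazy mode):

	/*
	 * Sketch: the default set_ptes() blamed in the Fixes: tags enters
	 * and leaves lazy MMU mode itself, so a caller already inside a
	 * lazy region nests the calls.  With ->active as a plain flag, the
	 * inner leave drains the batch and clears the flag while the outer
	 * region still expects batching to be on; with a counter, only the
	 * outermost leave drains.
	 */
	static void outer_example(struct mm_struct *mm, unsigned long addr,
				  pte_t *ptep, pte_t pte, unsigned int nr)
	{
		arch_enter_lazy_mmu_mode();		/* outer enter */
		set_ptes(mm, addr, ptep, pte, nr);	/* inner enter + leave */
		arch_leave_lazy_mmu_mode();		/* outer leave */
	}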

Matthew Wilcox (Oracle) (2):
  powerpc: Allow nesting of lazy MMU mode
  sparc: Allow nesting of lazy MMU mode

 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 ++---
 arch/sparc/mm/tlb.c                                | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

--
2.40.1


2023-10-12 19:54:43

by Matthew Wilcox

Subject: [PATCH 2/2] sparc: Allow nesting of lazy MMU mode

As noted in commit 49147beb0ccb ("x86/xen: allow nesting of same lazy
mode"), we can now nest calls to arch_enter_lazy_mmu_mode(). Use ->active
as a counter instead of a flag and only drain the batch when the counter
hits 0.
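
As an illustrative trace of the intended nested behaviour (counter values
shown as comments; a sketch, not kernel output):

	arch_enter_lazy_mmu_mode();		/* tb->active: 0 -> 1 */
		arch_enter_lazy_mmu_mode();	/* tb->active: 1 -> 2 (nested) */
		/* ... PTE updates accumulate in the per-cpu batch ... */
		arch_leave_lazy_mmu_mode();	/* tb->active: 2 -> 1, no flush */
	arch_leave_lazy_mmu_mode();		/* tb->active: 1 -> 0, flush if tlb_nr */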

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
---
arch/sparc/mm/tlb.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index b44d79d778c7..a82c7c32e47d 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -54,16 +54,15 @@ void arch_enter_lazy_mmu_mode(void)
 {
 	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
 
-	tb->active = 1;
+	tb->active++;
 }
 
 void arch_leave_lazy_mmu_mode(void)
 {
 	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
 
-	if (tb->tlb_nr)
+	if ((--tb->active == 0) && tb->tlb_nr)
 		flush_tlb_pending();
-	tb->active = 0;
 }
 
 static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
--
2.40.1

2023-10-12 19:54:52

by Matthew Wilcox

Subject: [PATCH 1/2] powerpc: Allow nesting of lazy MMU mode

As noted in commit 49147beb0ccb ("x86/xen: allow nesting of same lazy
mode"), we can now nest calls to arch_enter_lazy_mmu_mode(). Use ->active
as a counter instead of a flag and only drain the batch when the counter
hits 0.
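
Illustrative trace for the nested hash-MMU case (a sketch only; note that
preempt_disable()/preempt_enable() already nest by counting, so pairing them
with each enter/leave stays balanced):

	arch_enter_lazy_mmu_mode();		/* preempt_disable(); active: 0 -> 1 */
		arch_enter_lazy_mmu_mode();	/* preempt_disable(); active: 1 -> 2 */
		arch_leave_lazy_mmu_mode();	/* active: 2 -> 1, no flush; preempt_enable() */
	arch_leave_lazy_mmu_mode();		/* active: 1 -> 0, __flush_tlb_pending() if index; preempt_enable() */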

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
---
arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 146287d9580f..bc845d876ed2 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -38,7 +38,7 @@ static inline void arch_enter_lazy_mmu_mode(void)
 	 */
 	preempt_disable();
 	batch = this_cpu_ptr(&ppc64_tlb_batch);
-	batch->active = 1;
+	batch->active++;
 }
 
 static inline void arch_leave_lazy_mmu_mode(void)
@@ -49,9 +49,8 @@ static inline void arch_leave_lazy_mmu_mode(void)
 		return;
 	batch = this_cpu_ptr(&ppc64_tlb_batch);
 
-	if (batch->index)
+	if ((--batch->active == 0) && batch->index)
 		__flush_tlb_pending(batch);
-	batch->active = 0;
 	preempt_enable();
 }

--
2.40.1

2023-10-13 13:42:47

by Erhard Furtner

Subject: Re: [PATCH 0/2] Allow nesting of lazy MMU mode

On Thu, 12 Oct 2023 20:54:13 +0100
"Matthew Wilcox (Oracle)" <[email protected]> wrote:

> Dave Woodhouse reported that we now nest calls to
> arch_enter_lazy_mmu_mode(). That was inadvertent, but in principle we
> should allow it. On further investigation, Juergen already fixed it
> for Xen, but didn't tell anyone. Fix it for Sparc & PowerPC too.
> This may or may not help fix the problem that Erhard reported.
>
> Matthew Wilcox (Oracle) (2):
> powerpc: Allow nesting of lazy MMU mode
> sparc: Allow nesting of lazy MMU mode
>
> arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 ++---
> arch/sparc/mm/tlb.c | 5 ++---
> 2 files changed, 4 insertions(+), 6 deletions(-)
>
> --
> 2.40.1

Applied the patch on top of v6.6-rc5, but unfortunately it did not fix my reported issue.

Regards,
Erhard

2023-10-17 06:05:29

by Aneesh Kumar K.V

Subject: Re: [PATCH 0/2] Allow nesting of lazy MMU mode

Erhard Furtner <[email protected]> writes:

> On Thu, 12 Oct 2023 20:54:13 +0100
> "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
>> Dave Woodhouse reported that we now nest calls to
>> arch_enter_lazy_mmu_mode(). That was inadvertent, but in principle we
>> should allow it. On further investigation, Juergen already fixed it
>> for Xen, but didn't tell anyone. Fix it for Sparc & PowerPC too.
>> This may or may not help fix the problem that Erhard reported.
>>
>> Matthew Wilcox (Oracle) (2):
>> powerpc: Allow nesting of lazy MMU mode
>> sparc: Allow nesting of lazy MMU mode
>>
>> arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 ++---
>> arch/sparc/mm/tlb.c | 5 ++---
>> 2 files changed, 4 insertions(+), 6 deletions(-)
>>
>> --
>> 2.40.1
>
> Applied the patch on top of v6.6-rc5, but unfortunately it did not fix my reported issue.
>
> Regards,
> Erhard
>

With the problem reported, I guess we are finding page->compound_head
wrong and hence the folio->flags PG_dcache_clean check is crashing. I
still don't know why page->compound_head ends up wrong. Michael noted we
are using FLATMEM. That implies we are supposed to initialize the struct
page correctly via init_unavailable_range(), because we are hitting this
on an ioremap address. We need to instrument the kernel to track the
initialization of the struct pages backing the pfns we know are crashing.
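As a sketch of the kind of instrumentation meant here (hypothetical and
debug-only, not a posted patch; in recent kernels init_unavailable_range()
lives in mm/mm_init.c, and the print is purely illustrative):

	/* Hypothetical debug print at the top of init_unavailable_range():
	 * record which pfn ranges get their struct pages initialized as
	 * unavailable, so the faulting ioremap pfn can be matched against
	 * the logged ranges.
	 */
	static void __init init_unavailable_range(unsigned long spfn,
						  unsigned long epfn,
						  int zone, int node)
	{
		pr_info("init_unavailable: pfn [%#lx-%#lx) zone %d node %d\n",
			spfn, epfn, zone, node);
		/* ... existing initialization of the struct pages ... */
	}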

W.r.t. arch_enter_lazy_mmu_mode(), we can skip that completely on powerpc
because we don't allow the use of set_pte on valid pte entries. pte
updates are not done via the set_pte interface, and hence there is no TLB
invalidation required when using set_pte().

i.e., we can do something like below. The change also makes sure we call
set_pte_filter() on all the ptes we are setting via set_ptes(). I haven't
sent this as a proper patch because we are still not able to fix the
issue Erhard reported.

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 3ba9fe411604..95ab20cca2da 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr)
 {
 	/*
-	 * Make sure hardware valid bit is not set. We don't do
-	 * tlb flush for this update.
+	 * We don't need to call arch_enter/leave_lazy_mmu_mode()
+	 * because we expect set_ptes to only be used on ptes that are
+	 * not present and not hw_valid. Hence there is no translation
+	 * cache flush involved that needs to be batched.
 	 */
-	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
+	for (;;) {
 
-	/* Note: mm->context.id might not yet have been assigned as
-	 * this context might not have been activated yet when this
-	 * is called.
-	 */
-	pte = set_pte_filter(pte);
+		/*
+		 * Make sure hardware valid bit is not set. We don't do
+		 * tlb flush for this update.
+		 */
+		VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-	/* Perform the setting of the PTE */
-	arch_enter_lazy_mmu_mode();
-	for (;;) {
+		/* Note: mm->context.id might not yet have been assigned as
+		 * this context might not have been activated yet when this
+		 * is called.
+		 */
+		pte = set_pte_filter(pte);
+
+		/* Perform the setting of the PTE */
 		__set_pte_at(mm, addr, ptep, pte, 0);
 		if (--nr == 0)
 			break;
 		ptep++;
-		pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
 		addr += PAGE_SIZE;
+		/* increment the pfn */
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+
 	}
-	arch_leave_lazy_mmu_mode();
 }
 
 void unmap_kernel_page(unsigned long va)

2023-10-17 23:15:02

by Erhard Furtner

Subject: Re: [PATCH 0/2] Allow nesting of lazy MMU mode

On Tue, 17 Oct 2023 11:34:23 +0530
"Aneesh Kumar K.V" <[email protected]> wrote:

> ie, we can do something like below. The change also make sure we call
> set_pte_filter on all the ptes we are setting via set_ptes(). I haven't
> sent this as a proper patch because we still are not able to fix the
> issue Erhard reported.
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 3ba9fe411604..95ab20cca2da 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>  		pte_t pte, unsigned int nr)
>  {
>  	/*
> -	 * Make sure hardware valid bit is not set. We don't do
> -	 * tlb flush for this update.
> +	 * We don't need to call arch_enter/leave_lazy_mmu_mode()
> +	 * because we expect set_ptes to only be used on ptes that are
> +	 * not present and not hw_valid. Hence there is no translation
> +	 * cache flush involved that needs to be batched.
>  	 */
> -	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> +	for (;;) {
>
> -	/* Note: mm->context.id might not yet have been assigned as
> -	 * this context might not have been activated yet when this
> -	 * is called.
> -	 */
> -	pte = set_pte_filter(pte);
> +		/*
> +		 * Make sure hardware valid bit is not set. We don't do
> +		 * tlb flush for this update.
> +		 */
> +		VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>
> -	/* Perform the setting of the PTE */
> -	arch_enter_lazy_mmu_mode();
> -	for (;;) {
> +		/* Note: mm->context.id might not yet have been assigned as
> +		 * this context might not have been activated yet when this
> +		 * is called.
> +		 */
> +		pte = set_pte_filter(pte);
> +
> +		/* Perform the setting of the PTE */
>  		__set_pte_at(mm, addr, ptep, pte, 0);
>  		if (--nr == 0)
>  			break;
>  		ptep++;
> -		pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>  		addr += PAGE_SIZE;
> +		/* increment the pfn */
> +		pte = __pte(pte_val(pte) + PAGE_SIZE);
> +
>  	}
> -	arch_leave_lazy_mmu_mode();
>  }
>
>  void unmap_kernel_page(unsigned long va)

Was this a new version of the patch for me to test? Sorry for asking, but this was a bit unclear to me.

In any case, I tried it on top of v6.6-rc6 and it did not help with the issue I reported.

Regards,
Erhard