2020-05-04 12:39:59

by Mark Marshall

Subject: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

Hi RT experts,

We are using the RT kernel with the PowerPC e500. Until recently we
were on the 4.19 kernel series, and are in the process of upgrading.
When we switched to the v5.4 series, we started getting a reproducible
kernel crash. The crashes all contain the "BUG: Bad rss-counter state"
line, and then after that it appears that a structure of type mm_struct
or vm_area_struct is corrupted.

The easiest way we have found to reproduce the crash is to repeatedly
insert and then remove a module. The crash then appears to be related
to either paging in the module or in exiting the mdev process. (The
crash does also happen at other times, but it is hard to reproduce
reliably then). This simple script will almost always crash:

for i in $(seq 1000) ; do echo $i ; modprobe crc7 ; rmmod crc7 ; done

(The crc7 module is chosen as it is small and simple. Any module will
work / crash).
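
A slightly more defensive variant of the loop above is sketched here. It
is not part of the original report: the function name stress_module is
hypothetical, and it assumes that dmesg is readable and that the
"Bad rss-counter state" line reaches the kernel log before the machine
goes down, which may not always hold.

```sh
#!/bin/sh
# Hypothetical helper (not from the original report): load and unload
# MODULE up to COUNT times and stop as soon as the kernel log contains
# the "Bad rss-counter state" message.
# Usage: stress_module MODULE COUNT
stress_module() {
    module=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        modprobe "$module" || return 2
        rmmod "$module" || return 2
        # Assumption: the message is visible in dmesg before any crash.
        if dmesg | grep -q "Bad rss-counter state"; then
            echo "corruption reported after $i iterations"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no corruption reported after $count iterations"
    return 0
}

# Example: stress_module crc7 1000
```

The return code distinguishes a clean run (0), a reported corruption (1)
and a failing modprobe/rmmod (2), so the helper can be driven from a
larger test script.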

We have tried kernels v5.0, v5.2 and v5.6. The v5.0 and v5.2 kernels
do not show the problem. The v5.6 kernel does show the problem.
Switching off RT fixes the problem.

I have reduced the functionality in the kernel to a bare minimum
(removing networking, USB and PCI, as we have some out-of-tree patches
in those areas) and we still get the crash.

Here are a couple of example stack traces:

000: NIP [c003f8e0] __mmdrop+0x2c8/0x3dc
000: LR [c003f8e0] __mmdrop+0x2c8/0x3dc
000: Call Trace:
000: [e953fd48] [c003f8e0] __mmdrop+0x2c8/0x3dc
000: (unreliable)
000: [e953fd88] [c00c6d28] rcu_core+0x324/0x78c
000: [e953fe58] [c00c79e0] rcu_cpu_kthread+0x1f4/0x42c
000: [e953fe98] [c00838fc] smpboot_thread_fn+0x2e8/0x488
000: [e953fef8] [c007d514] kthread+0x1b0/0x1b8
000: [e953ff38] [c001a26c] ret_from_kernel_thread+0x14/0x1c


000: NIP [c010cdd4] acct_collect+0x3a8/0x3e0
000: LR [c010cdd4] acct_collect+0x3a8/0x3e0
000: Call Trace:
000: [c6f2bbe0] [c010cdd4] acct_collect+0x3a8/0x3e0
000: (unreliable)
000: [c6f2bc10] [c0049354] do_exit+0x294/0xf9c
000: [c6f2bcf0] [c0013030] die+0x220/0x2c4
000: [c6f2bd30] [c00132cc] exception_common+0x1f8/0x238
000: [c6f2bd30] [c00132cc] exception_common+0x1f8/0x238
000: [c6f2bd70] [c0013404] _exception+0x34/0x80
000: [c6f2bd90] [c001a4a8] ret_from_except_full+0x0/0x4


I have added some debugging code where the mm_struct and
vm_area_struct have "poison" values at the start and the end, and
this seems to show that the vm_area_struct is getting corrupted, but
I'm not able to see where.

We have switched on all of the debugging that we can, including
KASAN, and this shows nothing.


Can anyone help us? What can we try next? Is anyone using the e500
with the RT kernel? Does anyone have any idea how to debug problems
related to the error message "Bad rss-counter state"?

Any help or advice would be most gratefully received.

Many thanks,
Mark Marshall and Thomas Graziadei

PS. Thomas Graziadei (my colleague) did find a bug in the start_32.S
file for the e500, and we have the fix for that included. We have
also tried removing the LAZY_PREEMPTION patch completely, and this
doesn't help.


Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-04 11:40:08 [+0200], Mark Marshall wrote:
> The easiest way we have found to reproduce the crash is to repeatedly
> insert and then remove a module. The crash then appears to be related
> to either paging in the module or in exiting the mdev process. (The
> crash does also happen at other times, but it is hard to reproduce
> reliably then). This simple script will almost always crash:
>
> for i in $(seq 1000) ; do echo $i ; modprobe crc7 ; rmmod crc7 ; done

So I tried that on 5.6.14-rt7 with the qemu version of e500 (the SMP and
UP version). No luck. I don't have anything with real hardware.
Could you share the .config in case this is related?

> (The crc7 module is chosen as it is small and simple. Any module will
> work / crash).
>
> We have tried kernels v5.0, v5.2 and v5.6. The v5.0 and v5.2 kernels
> do not show the problem. The v5.6 kernel does show the problem.
> Switching off RT fixes the problem.
>
> I have reduced the functionality in the kernel to a bare minimum
> (removing networking, USB and PCI, as we have some out-of-tree patches
> in those areas) and we still get the crash.

> I have added some debugging code where the mm_struct and
> vm_area_struct have "poison" values at the start and the end, and
> this seems to show that the vm_area_struct is getting corrupted, but
> I'm not able to see where.

oh.

> We have switched on all of the debugging that we can, including
> KASAN, and this shows nothing.
>
>
> Can anyone help us? What can we try next? Is anyone using the e500
> with the RT kernel? Does anyone have any idea how to debug problems
> related to the error message "Bad rss-counter state"?
>
> Any help or advice would be most gratefully received.

I don't have any ideas. You could try to apply only a part of the RT
patch and see if the problem is still there. If you are lucky, you will
find the patch that introduces the problem. If not, the problem appears
with the RT switch…
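
One way to apply only part of the patch queue is sketched below. This is
an illustration only: it assumes the broken-out RT patches are unpacked
next to a quilt-style series file, and the helper name first_patches and
the paths in the example are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the first N patch names from a quilt-style
# series file, skipping comment lines and blank lines.
# Usage: first_patches SERIES_FILE N
first_patches() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | head -n "$2"
}

# Example (hypothetical paths), run from the top of a vanilla tree:
#   first_patches patches/series 100 | while read -r p; do
#       patch -p1 < "patches/$p" || break
#   done
```

Halving N after each test run turns this into a manual bisection over
the series.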

> Many thanks,
> Mark Marshall and Thomas Graziadei

Sebastian

2020-05-29 15:41:23

by Mark Marshall

Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

Hi Sebastian & list,

I had assumed that my e-mail had got lost or been overlooked; I was
meaning to post a follow-up message this week...

All I could find from the debugging and tracing that we added was that
something was going wrong with the mm data structures somewhere in the
exec code. In the end I just spent a week or two poring over the diffs
of this code between the versions that I knew worked and didn't work.

I eventually found the culprit. On the working kernel versions there is
a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
This is commit f0b4a9cb253a on the v4.19.82-rt30 branch, for instance.
Although the commit message talks about ARM, it seems that we need this for
PowerPC too (I guess, any PowerPC with the "nohash" MMU?).

Could you please add this commit back to the RT branch? I'm not sure how
to find out the history of this commit. For instance, why has it been
removed from the RT patchset? How are these things tracked, generally?

Best regards,
Mark


Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> Hi Sebastian & list,
Hi,

> I had assumed that my e-mail had got lost or overlooked, I was meaning to
> post a follow up message this week...
>
> All I could find from the debugging and tracing that we added was that
> something was going wrong with the mm data structures somewhere in the
> exec code. In the end I just spent a week or two poring over the diffs
> of this code between the versions that I knew worked and didn't work.
>
> I eventually found the culprit. On the working kernel versions there is
> a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> This is commit f0b4a9cb253a on the v4.19.82-rt30 branch, for instance.
> Although the commit message talks about ARM, it seems that we need this for
> PowerPC too (I guess, any PowerPC with the "nohash" MMU?).

Could you drop me your config, please? I need to dig here a little and I
should have seen this on qemu, right?

> Could you please add this commit back to the RT branch? I'm not sure how
> to find out the history of this commit. For instance, why has it been
> removed from the RT patchset? How are these things tracked, generally?

I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
was documented in the patch, and the code that triggered the warning was
removed / reworked in commit
b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")

So it looked like it was no longer needed and then got dropped during
the rebase.
In order to get it back into the RT queue I need to understand why it is
required. What exactly is it fixing? Let me stare at it for a little…

> Best regards,
> Mark

Sebastian

Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing? Let me stare at it for a little…

it used to be local_irq_disable() which then became preempt_disable()
local_irq_disable() due to ARM's limitation.

> > Best regards,
> > Mark
>
Sebastian

2020-05-29 19:08:20

by Mark Marshall

Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

My config is attached. This is the greatly reduced config that I used
when trying to narrow down the problem. We normally have much more
enabled, but that had no effect on the bug in my testing. We do,
unfortunately, have quite a few out-of-tree patches, but they are all
in USB or Networking, which are disabled here.

I've never tried out the kernel under qemu, but I will try that next
week to see if I can reproduce the problem there. It's certainly
quite a narrow race window though, so it might behave quite
differently under qemu. In general, how reliable is qemu at showing
these kinds of problems?

Thanks,
Mark

PS.
I've also noticed that THREAD_SHIFT is set in this config. That's
because when I added lots of debug options, I got warnings about the
stack being too small. This had no impact on the bug: after I
increased the size of the stack, the stack warnings stopped, but
the bug was still the same.



Attachments:
config-5.4-rt (5.02 kB)
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote:
> On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> > In order to get it back into the RT queue I need to understand why it is
> > required. What exactly is it fixing? Let me stare at it for a little…
>
> it used to be local_irq_disable() which then became preempt_disable()
> local_irq_disable() due to ARM's limitation.

Any luck on your side?

I *think* if you swap the mm assignment in exec_mmap() then it should be
gone. Basically:
| tsk->active_mm = mm;
| tsk->mm = mm;

However, I am thinking of applying something like this:

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1035,11 +1035,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	task_unlock_mm();
+
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -176,4 +176,31 @@ static inline void task_unlock(struct task_struct *p)
 	spin_unlock(&p->alloc_lock);
 }

+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Protects ->mm and ->active_mm.
+ * Avoids scheduling so switch_mm() or enter_lazy_tlb() will not read the
+ * members while they are updated.
+ */
+static inline void task_lock_mm(void)
+{
+	preempt_disable();
+}
+
+static inline void task_unlock_mm(void)
+{
+	preempt_enable();
+}
+
+#else
+
+static inline void task_lock_mm(void)
+{
+}
+
+static inline void task_unlock_mm(void)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -25,6 +25,7 @@ void use_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;

 	task_lock(tsk);
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -32,6 +33,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -55,10 +57,12 @@ void unuse_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;

 	task_lock(tsk);
+	task_lock_mm();
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
--
2.27.0

> > > Best regards,
> > > Mark

Sebastian

2020-07-10 11:09:30

by Thomas Graziadei

Subject: RE: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

Hi Sebastian,

thanks for looking into this.

We could reproduce the issue with QEMU.
At runtime you need to set mdev as the kernel's hotplug client (/proc/sys/kernel/hotplug) and give it a dummy /etc/mdev.conf like (.* 1:1 777). Then just do a loop and insmod/rmmod crc4.ko and crc7.ko.
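
The setup described above could be scripted roughly as follows. This is
a sketch under assumptions, not the exact commands we ran: the helper
names are hypothetical, /sbin/mdev is an assumed install path for the
BusyBox mdev applet, and the module paths in the example are
placeholders.

```sh
#!/bin/sh
# Hypothetical helper: register mdev as the kernel's hotplug client and
# write a catch-all mdev.conf. All three arguments are optional; the
# defaults are the paths named in the message, except /sbin/mdev, which
# is an assumed location.
# Usage: setup_mdev_hotplug [HOTPLUG_FILE] [CONF_FILE] [MDEV_PATH]
setup_mdev_hotplug() {
    hotplug_file=${1:-/proc/sys/kernel/hotplug}
    conf_file=${2:-/etc/mdev.conf}
    mdev_path=${3:-/sbin/mdev}
    echo "$mdev_path" > "$hotplug_file" || return 1
    echo '.* 1:1 777' > "$conf_file" || return 1
}

# Hypothetical helper: insert and remove each given .ko file, COUNT times.
# Usage: cycle_modules COUNT FILE.ko...
cycle_modules() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        for ko in "$@"; do
            insmod "$ko" || return 1
            # rmmod takes the module name, so strip directory and suffix.
            rmmod "$(basename "$ko" .ko)" || return 1
        done
        i=$((i + 1))
    done
}

# Example: setup_mdev_hotplug && cycle_modules 30000 ./crc4.ko ./crc7.ko
```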

Swapping the mm assignment did not work -> exception after 1900 iterations
Your second suggestion with check.patch (attached to this email for completeness, only protecting the exec_mmap function) did not work either -> exception after 2600 iterations

Your third suggestion (a modification to the original revert) enclosed in this e-mail does seem to work. Still no problems after 30000 iterations.

By the way, as noticed in your kernel config, we would be quite interested in a gcc 9 compiler for our platform. Is there a mainline/maintained version or fork for this or another possibility to get it?

Regards,
Thomas

-----Original Message-----
From: Sebastian Andrzej Siewior [mailto:[email protected]]
Sent: Monday, July 06, 2020 6:50 PM
To: Mark Marshall <[email protected]>
Cc: linux-rt-users <[email protected]>; Mark Marshall <[email protected]>; Thomas Graziadei <[email protected]>; Thomas Gleixner <[email protected]>; [email protected]; [email protected]
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote:
> On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> > In order to get it back into the RT queue I need to understand why
> > it is required. What exactly is it fixing? Let me stare at it for a
> > little…
>
> it used to be local_irq_disable() which then became preempt_disable()
> local_irq_disable() due to ARM's limitation.

Any luck on your side?

I *think* if you swap the mm assignment in exec_mmap() then it should be gone. Basically:
| tsk->active_mm = mm;
| tsk->mm = mm;

However, I am thinking of applying something like this:

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1035,11 +1035,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	task_unlock_mm();
+
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -176,4 +176,31 @@ static inline void task_unlock(struct task_struct *p)
 	spin_unlock(&p->alloc_lock);
 }

+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Protects ->mm and ->active_mm.
+ * Avoids scheduling so switch_mm() or enter_lazy_tlb() will not read the
+ * members while they are updated.
+ */
+static inline void task_lock_mm(void)
+{
+	preempt_disable();
+}
+
+static inline void task_unlock_mm(void)
+{
+	preempt_enable();
+}
+
+#else
+
+static inline void task_lock_mm(void)
+{
+}
+
+static inline void task_unlock_mm(void)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -25,6 +25,7 @@ void use_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;

 	task_lock(tsk);
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -32,6 +33,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -55,10 +57,12 @@ void unuse_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;

 	task_lock(tsk);
+	task_lock_mm();
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
--
2.27.0

> > > Best regards,
> > > Mark

Sebastian


Attachments:
check.patch (437.00 B)

2020-08-12 13:01:12

by Thomas Graziadei

Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

Hi Sebastian,

any progress on your side?

Do you think the patch could be applied for the next versions?

Regards,
Thomas


Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote:
> Hi Sebastian,
Hi Thomas,

> any progress on your side?

due to lack of time none. But I am on it…

> Do you think the patch could be applied for the next versions?

So I had a theory about why it happens, but then you said no, so now I
need to figure out why it happens so I can write it in the changelog.

I believe you made it happen in qemu and you sent a .config and
everything so I will stare into it as soon as I can.

> Regards,
> Thomas

Sebastian

Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote:
> Hi Sebastian,
Hi Thomas,

> any progress on your side?
>
> Do you think the patch could be applied for the next versions?

Yes. The ->active_mm change needs to be protected against scheduling
regardless of the arch/mmu. Otherwise the mm will be put twice. For this
to trigger you need to exec from a kernel thread and get preempted.
This will be addressed in use_mm() by commit
38cf307c1f201 ("mm: fix kthread_use_mm() vs TLB invalidate")

which is in v5.9-rc1 and exec_mmap() is under discussion at
https://lore.kernel.org/linux-arch/[email protected]/
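
To check whether a given kernel tree already contains the use_mm() fix,
something like the following sketch could be used. The helper name
tree_has_commit and the example path are hypothetical, and the check
only matches the mainline hash: a backport into a stable or RT branch
would carry a different commit ID and would not be detected this way.

```sh
#!/bin/sh
# Hypothetical helper: succeed if COMMIT is an ancestor of HEAD in the
# git repository at REPO. Unknown commit IDs count as "not contained".
# Usage: tree_has_commit REPO COMMIT
tree_has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}

# Example (hypothetical path):
#   tree_has_commit ~/src/linux 38cf307c1f201 && echo "fix present"
```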

> Regards,
> Thomas

Sebastian