Subject: Re: [PATCH] kasan: Fix sleeping function called from invalid context in PREEMPT_RT

On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> ...........
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x60/0x8c
> dump_stack+0x10/0x12
> __might_resched.cold+0x13b/0x173
> rt_spin_lock+0x5b/0xf0
> ___cache_free+0xa5/0x180
> qlist_free_all+0x7a/0x160
> per_cpu_remove_cache+0x5f/0x70
> smp_call_function_many_cond+0x4c4/0x4f0
> on_each_cpu_cond_mask+0x49/0xc0
> kasan_quarantine_remove_cache+0x54/0xf0
> kasan_cache_shrink+0x9/0x10
> kmem_cache_shrink+0x13/0x20
> acpi_os_purge_cache+0xe/0x20
> acpi_purge_cached_objects+0x21/0x6d
> acpi_initialize_objects+0x15/0x3b
> acpi_init+0x130/0x5ba
> do_one_initcall+0xe5/0x5b0
> kernel_init_freeable+0x34f/0x3ad
> kernel_init+0x1e/0x140
> ret_from_fork+0x22/0x30
>
> When kmem_cache_shrink() is called, an IPI is triggered and
> ___cache_free() is called in IPI interrupt context, where a local
> lock or spin lock is acquired. On a PREEMPT_RT kernel, these locks
> are replaced with sleepable RT spin locks, so the above splat is
> triggered. Fix it by migrating the release action from the IPI
> interrupt context to task context on an RT kernel.

I haven't seen that while playing with kasan. Is this new?
Could we fix it in a way that doesn't involve freeing memory from in-IRQ?
This could trigger a lockdep splat if the local-lock in SLUB is acquired
from in-IRQ context on !PREEMPT_RT.

> Signed-off-by: Zqiang <[email protected]>

Sebastian
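
For reference, a minimal sketch of the pattern the splat above is
complaining about (all names here are hypothetical, not the actual
kasan code): the IPI handler runs with interrupts disabled, and on
PREEMPT_RT a spinlock_t taken there is a sleeping rtmutex-based lock,
which is what trips __might_resched():

	static DEFINE_SPINLOCK(demo_lock);	/* becomes a sleeping lock on RT */

	static void demo_ipi_func(void *info)
	{
		/* IPI handler: in_atomic() and irqs_disabled() are both true. */
		spin_lock(&demo_lock);		/* rt_spin_lock() may sleep -> splat */
		/* ... free objects, as qlist_free_all() does ... */
		spin_unlock(&demo_lock);
	}

	static void demo_shrink(void)
	{
		/* Run demo_ipi_func() on every CPU via IPI and wait for it. */
		on_each_cpu(demo_ipi_func, NULL, 1);
	}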


2022-04-01 15:27:52

by Zqiang

Subject: RE: [PATCH] kasan: Fix sleeping function called from invalid context in PREEMPT_RT

On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> [...]

> I haven't seen that while playing with kasan. Is this new?
> Could we fix it in a way that doesn't involve freeing memory from in-IRQ?
> This could trigger a lockdep splat if the local-lock in SLUB is acquired from in-IRQ context on !PREEMPT_RT.

Hi, I will move qlist_free_all() from the IPI context to task context;
that way it is handled like the subsequent release of the remaining
members of the quarantine pool.
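
Roughly, the deferral could look like this inside mm/kasan/quarantine.c
(a hypothetical helper, not an actual patch): splice the per-CPU list
into a work item and let qlist_free_all() run from a workqueue, i.e.
task context, where sleeping RT locks are allowed.

	struct qlist_free_work {		/* hypothetical helper */
		struct work_struct work;
		struct qlist_head to_free;
		struct kmem_cache *cache;
	};

	static void qlist_free_work_fn(struct work_struct *w)
	{
		struct qlist_free_work *qfw =
			container_of(w, struct qlist_free_work, work);

		/* Task context: taking sleeping RT locks is fine here. */
		qlist_free_all(&qfw->to_free, qfw->cache);
	}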

I don't know the phenomenon you described. Can you explain it in detail?

Thanks
Zqiang



Subject: Re: [PATCH] kasan: Fix sleeping function called from invalid context in PREEMPT_RT

On 2022-04-01 10:10:38 [+0000], Zhang, Qiang1 wrote:
> > Could we fix it in a way that doesn't involve freeing memory from in-IRQ?
> > This could trigger a lockdep splat if the local-lock in SLUB is acquired from in-IRQ context on !PREEMPT_RT.
>
> Hi, I will move qlist_free_all() from the IPI context to task context;
> that way it is handled like the subsequent release of the remaining
> members of the quarantine pool.
>
> I don't know the phenomenon you described. Can you explain it in detail?

If by "phenomenon" you mean my second sentence: the kernel option
CONFIG_PROVE_RAW_LOCK_NESTING will trigger on !PREEMPT_RT in a code
sequence like

	raw_spin_lock();
	spin_lock();

which is wrong on PREEMPT_RT. So we have a warning on both
configurations.
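
Spelled out as a sketch (hypothetical locks, not code from this
thread): raw_spinlock_t always spins with preemption disabled, while
spinlock_t becomes a sleeping rtmutex-based lock on PREEMPT_RT, so the
inner lock below would sleep in atomic context on RT. With
CONFIG_PROVE_RAW_LOCK_NESTING, lockdep reports this nesting even on
!PREEMPT_RT:

	static DEFINE_RAW_SPINLOCK(outer_lock);	/* always a spinning lock */
	static DEFINE_SPINLOCK(inner_lock);	/* sleeping lock on PREEMPT_RT */

	static void invalid_nesting(void)
	{
		raw_spin_lock(&outer_lock);	/* atomic context on every config */
		spin_lock(&inner_lock);		/* would sleep on RT: forbidden */
		spin_unlock(&inner_lock);
		raw_spin_unlock(&outer_lock);
	}
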
The call chain in your case will probably not lead to a warning since
there is no raw_spinlock_t involved within the IPI call. We worked on
avoiding memory allocation and freeing from in-IRQ context, so I would
prefer something that works for both and not just an ifdef around the
RT case.

> Thanks
> Zqiang

Sebastian

2022-04-05 01:45:51

by Mike Galbraith

Subject: Re: [PATCH] kasan: Fix sleeping function called from invalid context in PREEMPT_RT

On Fri, 2022-04-01 at 11:27 +0200, Sebastian Andrzej Siewior wrote:
> On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> > [...]
>
> I haven't seen that while playing with kasan. Is this new?

Don't think so; the rock below was apparently first tossed at 5.12.

---
 lib/stackdepot.c      |  3 +++
 mm/kasan/quarantine.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -375,6 +375,9 @@ depot_stack_handle_t __stack_depot_save(
 	if (found)
 		goto exit;

+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && can_alloc && !preemptible())
+		can_alloc = false;
+
 	/*
 	 * Check if the current or the next stack slab need to be initialized.
 	 * If so, allocate the memory - we won't be able to do that under the
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -19,6 +19,9 @@
 #include <linux/srcu.h>
 #include <linux/string.h>
 #include <linux/types.h>
+#include <linux/cpu.h>
+#include <linux/mutex.h>
+#include <linux/workqueue.h>
 #include <linux/cpuhotplug.h>

 #include "../slab.h"
@@ -319,6 +322,48 @@ static void per_cpu_remove_cache(void *a
 	qlist_free_all(&to_free, cache);
 }

+#ifdef CONFIG_PREEMPT_RT
+struct remove_cache_work {
+	struct work_struct work;
+	struct kmem_cache *cache;
+};
+
+static DEFINE_MUTEX(remove_caches_lock);
+static DEFINE_PER_CPU(struct remove_cache_work, remove_cache_work);
+
+static void per_cpu_remove_cache_work(struct work_struct *w)
+{
+	struct remove_cache_work *rcw;
+
+	rcw = container_of(w, struct remove_cache_work, work);
+	per_cpu_remove_cache(rcw->cache);
+}
+
+static void per_cpu_remove_caches_sync(struct kmem_cache *cache)
+{
+	struct remove_cache_work *rcw;
+	unsigned int cpu;
+
+	cpus_read_lock();
+	mutex_lock(&remove_caches_lock);
+
+	for_each_online_cpu(cpu) {
+		rcw = &per_cpu(remove_cache_work, cpu);
+		INIT_WORK(&rcw->work, per_cpu_remove_cache_work);
+		rcw->cache = cache;
+		schedule_work_on(cpu, &rcw->work);
+	}
+
+	for_each_online_cpu(cpu) {
+		rcw = &per_cpu(remove_cache_work, cpu);
+		flush_work(&rcw->work);
+	}
+
+	mutex_unlock(&remove_caches_lock);
+	cpus_read_unlock();
+}
+#endif
+
 /* Free all quarantined objects belonging to cache. */
 void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 {
@@ -332,7 +377,11 @@ void kasan_quarantine_remove_cache(struc
 	 * achieves the first goal, while synchronize_srcu() achieves the
 	 * second.
 	 */
+#ifndef CONFIG_PREEMPT_RT
 	on_each_cpu(per_cpu_remove_cache, cache, 1);
+#else
+	per_cpu_remove_caches_sync(cache);
+#endif

 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {