LinuxLists.cc - Question on KASAN calltrace record in RT

2021-04-06 14:10:52

Subject: Question on KASAN calltrace record in RT

Hello everyone

In RT system, after Andrew test, found the following calltrace ,
in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
will trigger following calltrace.

maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.

Is there a better solution ??
Thanks
Qiang

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
[ 14.522262] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 640, name: mount
[ 14.522304] Call Trace:
[ 14.522306] dump_stack+0x92/0xc1
[ 14.522313] ___might_sleep.cold.99+0x1b0/0x1ef
[ 14.522319] rt_spin_lock+0x3e/0xc0
[ 14.522329] local_lock_acquire+0x52/0x3c0
[ 14.522332] get_page_from_freelist+0x176c/0x3fd0
[ 14.522543] __alloc_pages_nodemask+0x28f/0x7f0
[ 14.522559] stack_depot_save+0x3a1/0x470
[ 14.522564] kasan_save_stack+0x2f/0x40
[ 14.523575] kasan_record_aux_stack+0xa3/0xb0
[ 14.523580] insert_work+0x48/0x340
[ 14.523589] __queue_work+0x430/0x1280
[ 14.523595] mod_delayed_work_on+0x98/0xf0
[ 14.523607] kblockd_mod_delayed_work_on+0x17/0x20
[ 14.523611] blk_mq_run_hw_queue+0x151/0x2b0
[ 14.523620] blk_mq_sched_insert_request+0x2ad/0x470
[ 14.523633] blk_mq_submit_bio+0xd2a/0x2330
[ 14.523675] submit_bio_noacct+0x8aa/0xfe0
[ 14.523693] submit_bio+0xf0/0x550
[ 14.523714] submit_bio_wait+0xfe/0x200
[ 14.523724] xfs_rw_bdev+0x370/0x480 [xfs]
[ 14.523831] xlog_do_io+0x155/0x320 [xfs]
[ 14.524032] xlog_bread+0x23/0xb0 [xfs]
[ 14.524133] xlog_find_head+0x131/0x8b0 [xfs]
[ 14.524375] xlog_find_tail+0xc8/0x7b0 [xfs]
[ 14.524828] xfs_log_mount+0x379/0x660 [xfs]
[ 14.524927] xfs_mountfs+0xc93/0x1af0 [xfs]
[ 14.525424] xfs_fs_fill_super+0x923/0x17f0 [xfs]
[ 14.525522] get_tree_bdev+0x404/0x680
[ 14.525622] vfs_get_tree+0x89/0x2d0
[ 14.525628] path_mount+0xeb2/0x19d0
[ 14.525648] do_mount+0xcb/0xf0
[ 14.525665] __x64_sys_mount+0x162/0x1b0
[ 14.525670] do_syscall_64+0x33/0x40
[ 14.525674] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 14.525677] RIP: 0033:0x7fd6c15eaade

2021-04-13 21:10:08

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: Question on KASAN calltrace record in RT

On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang <[email protected]> wrote:
>
> Hello everyone
>
> In RT system, after Andrew test, found the following calltrace ,
> in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
> rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
> will trigger following calltrace.
>
> maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.
>
> Is there a better solution ？

Hi Qiang,

Adding 2 full stacks per heap object can increase memory usage too much.
The stackdepot has a preallocation mechanism, I would start with
adding interrupts check here:
https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
and just not do preallocation in interrupt context. This will solve
the problem, right?

> Thanks
> Qiang
>
> BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
> [ 14.522262] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 640, name: mount
> [ 14.522304] Call Trace:
> [ 14.522306] dump_stack+0x92/0xc1
> [ 14.522313] ___might_sleep.cold.99+0x1b0/0x1ef
> [ 14.522319] rt_spin_lock+0x3e/0xc0
> [ 14.522329] local_lock_acquire+0x52/0x3c0
> [ 14.522332] get_page_from_freelist+0x176c/0x3fd0
> [ 14.522543] __alloc_pages_nodemask+0x28f/0x7f0
> [ 14.522559] stack_depot_save+0x3a1/0x470
> [ 14.522564] kasan_save_stack+0x2f/0x40
> [ 14.523575] kasan_record_aux_stack+0xa3/0xb0
> [ 14.523580] insert_work+0x48/0x340
> [ 14.523589] __queue_work+0x430/0x1280
> [ 14.523595] mod_delayed_work_on+0x98/0xf0
> [ 14.523607] kblockd_mod_delayed_work_on+0x17/0x20
> [ 14.523611] blk_mq_run_hw_queue+0x151/0x2b0
> [ 14.523620] blk_mq_sched_insert_request+0x2ad/0x470
> [ 14.523633] blk_mq_submit_bio+0xd2a/0x2330
> [ 14.523675] submit_bio_noacct+0x8aa/0xfe0
> [ 14.523693] submit_bio+0xf0/0x550
> [ 14.523714] submit_bio_wait+0xfe/0x200
> [ 14.523724] xfs_rw_bdev+0x370/0x480 [xfs]
> [ 14.523831] xlog_do_io+0x155/0x320 [xfs]
> [ 14.524032] xlog_bread+0x23/0xb0 [xfs]
> [ 14.524133] xlog_find_head+0x131/0x8b0 [xfs]
> [ 14.524375] xlog_find_tail+0xc8/0x7b0 [xfs]
> [ 14.524828] xfs_log_mount+0x379/0x660 [xfs]
> [ 14.524927] xfs_mountfs+0xc93/0x1af0 [xfs]
> [ 14.525424] xfs_fs_fill_super+0x923/0x17f0 [xfs]
> [ 14.525522] get_tree_bdev+0x404/0x680
> [ 14.525622] vfs_get_tree+0x89/0x2d0
> [ 14.525628] path_mount+0xeb2/0x19d0
> [ 14.525648] do_mount+0xcb/0xf0
> [ 14.525665] __x64_sys_mount+0x162/0x1b0
> [ 14.525670] do_syscall_64+0x33/0x40
> [ 14.525674] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 14.525677] RIP: 0033:0x7fd6c15eaade

2021-04-14 13:40:04

by Mike Galbraith

[permalink] [raw]

Subject: Re: Question on KASAN calltrace record in RT

On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote:
> On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang <[email protected]> wrote:
> >
> > Hello everyone
> >
> > In RT system, after Andrew test, found the following calltrace ,
> > in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
> > rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
> > will trigger following calltrace.
> >
> > maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.
> >
> > Is there a better solution ？
>
> Hi Qiang,
>
> Adding 2 full stacks per heap object can increase memory usage too much.
> The stackdepot has a preallocation mechanism, I would start with
> adding interrupts check here:
> https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> and just not do preallocation in interrupt context. This will solve
> the problem, right?

Hm, this thing might actually be (sorta?) working, modulo one startup
gripe. The CRASH_DUMP inspired gripe I get with !RT appeared (and shut
up when told I don't care given kdump has worked just fine for ages:),
but no more might_sleep() gripeage.

CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=1
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_MODULE_TEST is not set

---
lib/stackdepot.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
static int depot_index;
static int next_slab_inited;
static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);

static bool init_stack_slab(void **prealloc)
{
@@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
- u32 hash;
+ u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();

if (unlikely(nr_entries == 0) || stack_depot_disable)
goto fast_exit;
@@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
* The smp_load_acquire() here pairs with smp_store_release() to
* |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
*/
- if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+ if (unlikely(!smp_load_acquire(&next_slab_inited) && may_prealloc)) {
/*
* Zero out zone modifiers, as we don't have specific zone
* requirements. Keep the flags related to allocation in atomic
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
prealloc = page_address(page);
}

- spin_lock_irqsave(&depot_lock, flags);
+ raw_spin_lock_irqsave(&depot_lock, flags);

found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
WARN_ON(!init_stack_slab(&prealloc));
}

- spin_unlock_irqrestore(&depot_lock, flags);
+ raw_spin_unlock_irqrestore(&depot_lock, flags);
exit:
if (prealloc) {
/* Nobody used this memory, ok to free it. */

[ 0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943
[ 0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[ 0.692442] Preemption disabled at:
[ 0.692443] [<ffffffff811a1510>] on_each_cpu_cond_mask+0x30/0xb0
[ 0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5
[ 0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 0.692456] Call Trace:
[ 0.692458] ? on_each_cpu_cond_mask+0x30/0xb0
[ 0.692462] dump_stack+0x8a/0xb5
[ 0.692467] ___might_sleep.cold+0xfe/0x112
[ 0.692471] rt_spin_lock+0x1c/0x60
[ 0.692475] free_unref_page+0x117/0x3c0
[ 0.692481] qlist_free_all+0x60/0xd0
[ 0.692485] per_cpu_remove_cache+0x5b/0x70
[ 0.692488] smp_call_function_many_cond+0x185/0x3d0
[ 0.692492] ? qlist_move_cache+0xe0/0xe0
[ 0.692495] ? qlist_move_cache+0xe0/0xe0
[ 0.692497] on_each_cpu_cond_mask+0x44/0xb0
[ 0.692501] kasan_quarantine_remove_cache+0x52/0xf0
[ 0.692505] ? acpi_bus_init+0x183/0x183
[ 0.692510] kmem_cache_shrink+0xe/0x20
[ 0.692513] acpi_os_purge_cache+0xa/0x10
[ 0.692517] acpi_purge_cached_objects+0x1d/0x68
[ 0.692522] acpi_initialize_objects+0x11/0x39
[ 0.692524] ? acpi_ev_install_xrupt_handlers+0x6f/0x7c
[ 0.692529] acpi_bus_init+0x50/0x183
[ 0.692532] acpi_init+0xce/0x182
[ 0.692536] ? acpi_bus_init+0x183/0x183
[ 0.692539] ? intel_idle_init+0x36d/0x36d
[ 0.692543] ? acpi_bus_init+0x183/0x183
[ 0.692546] do_one_initcall+0x71/0x300
[ 0.692550] ? trace_event_raw_event_initcall_finish+0x120/0x120
[ 0.692553] ? parameq+0x90/0x90
[ 0.692556] ? __wake_up_common+0x1e0/0x200
[ 0.692560] ? kasan_unpoison+0x21/0x50
[ 0.692562] ? __kasan_slab_alloc+0x24/0x70
[ 0.692567] do_initcalls+0xff/0x129
[ 0.692571] kernel_init_freeable+0x19c/0x1ce
[ 0.692574] ? rest_init+0xc6/0xc6
[ 0.692577] kernel_init+0xd/0x11a
[ 0.692580] ret_from_fork+0x1f/0x30

[ 15.428008] ==================================================================
[ 15.428011] BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428018] Write of size 8 at addr ffffc90000426008 by task kexec/1187
[ 15.428022] CPU: 2 PID: 1187 Comm: kexec Tainted: G W E 5.12.0.g2afefec-tip-rt #5
[ 15.428025] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 15.428027] Call Trace:
[ 15.428029] ? crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428032] dump_stack+0x8a/0xb5
[ 15.428037] print_address_description.constprop.0+0x16/0xa0
[ 15.428044] kasan_report+0xc4/0x100
[ 15.428047] ? crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428050] crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428053] ? strcmp+0x2e/0x50
[ 15.428057] ? native_machine_crash_shutdown+0x240/0x240
[ 15.428059] ? kexec_purgatory_find_symbol.isra.0+0x145/0x1a0
[ 15.428066] setup_boot_parameters+0x181/0x5c0
[ 15.428069] bzImage64_load+0x6b5/0x740
[ 15.428072] ? bzImage64_probe+0x140/0x140
[ 15.428075] ? iov_iter_kvec+0x5f/0x70
[ 15.428080] ? rw_verify_area+0x80/0x80
[ 15.428087] ? __might_sleep+0x31/0xd0
[ 15.428091] ? __might_sleep+0x31/0xd0
[ 15.428094] ? ___might_sleep+0xc9/0xe0
[ 15.428096] ? bzImage64_probe+0x140/0x140
[ 15.428099] arch_kexec_kernel_image_load+0x102/0x130
[ 15.428102] kimage_file_alloc_init+0xda/0x290
[ 15.428107] __do_sys_kexec_file_load+0x21f/0x390
[ 15.428110] ? __x64_sys_open+0x100/0x100
[ 15.428113] ? kexec_calculate_store_digests+0x390/0x390
[ 15.428117] ? rcu_nocb_flush_deferred_wakeup+0x36/0x50
[ 15.428122] do_syscall_64+0x3d/0x80
[ 15.428127] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 15.428132] RIP: 0033:0x7f46ad026759
[ 15.428135] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0f d7 2b 00 f7 d8 64 89 01 48
[ 15.428137] RSP: 002b:00007ffcf6f96788 EFLAGS: 00000206 ORIG_RAX: 0000000000000140
[ 15.428141] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f46ad026759
[ 15.428143] RDX: 0000000000000182 RSI: 0000000000000005 RDI: 0000000000000003
[ 15.428145] RBP: 00007ffcf6f96a28 R08: 0000000000000002 R09: 0000000000000000
[ 15.428146] R10: 0000000000b0d5e0 R11: 0000000000000206 R12: 0000000000000004
[ 15.428148] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[ 15.428152] Memory state around the buggy address:
[ 15.428164] ffffc90000425f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428166] ffffc90000425f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428168] >ffffc90000426000: 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428169] ^
[ 15.428171] ffffc90000426080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428172] ffffc90000426100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428173] ==================================================================
[ 15.428174] Disabling lock debugging due to kernel taint

kasan: stop grumbling about CRASH_DUMP

Signed-off-by: Mike Galbraith <[email protected]>
---
arch/x86/kernel/Makefile | 1 +
kernel/Makefile | 1 +
2 files changed, 2 insertions(+)

--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_X86_TSC) += trace_clock.o
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
+KASAN_SANITIZE_crash.o := n
obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
obj-y += kprobes/
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
obj-$(CONFIG_KEXEC) += kexec.o
obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
+KASAN_SANITIZE_kexec_file.o := n
obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
obj-$(CONFIG_COMPAT) += compat.o

2021-04-14 13:41:51

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: Question on KASAN calltrace record in RT

On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith <[email protected]> wrote:
>
> On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang <[email protected]> wrote:
> > >
> > > Hello everyone
> > >
> > > In RT system, after Andrew test, found the following calltrace ,
> > > in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
> > > rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
> > > will trigger following calltrace.
> > >
> > > maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.
> > >
> > > Is there a better solution ？
> >
> > Hi Qiang,
> >
> > Adding 2 full stacks per heap object can increase memory usage too much.
> > The stackdepot has a preallocation mechanism, I would start with
> > adding interrupts check here:
> > https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> > and just not do preallocation in interrupt context. This will solve
> > the problem, right?
>
> Hm, this thing might actually be (sorta?) working, modulo one startup
> gripe. The CRASH_DUMP inspired gripe I get with !RT appeared (and shut
> up when told I don't care given kdump has worked just fine for ages:),
> but no more might_sleep() gripeage.
>
>
> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> CONFIG_HAVE_ARCH_KASAN=y
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> CONFIG_KASAN_OUTLINE=y
> # CONFIG_KASAN_INLINE is not set
> CONFIG_KASAN_STACK=1
> CONFIG_KASAN_VMALLOC=y
> # CONFIG_KASAN_MODULE_TEST is not set
>
> ---
> lib/stackdepot.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
> static int depot_index;
> static int next_slab_inited;
> static size_t depot_offset;
> -static DEFINE_SPINLOCK(depot_lock);
> +static DEFINE_RAW_SPINLOCK(depot_lock);
>
> static bool init_stack_slab(void **prealloc)
> {
> @@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
> struct page *page = NULL;
> void *prealloc = NULL;
> unsigned long flags;
> - u32 hash;
> + u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();
>
> if (unlikely(nr_entries == 0) || stack_depot_disable)
> goto fast_exit;
> @@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
> * The smp_load_acquire() here pairs with smp_store_release() to
> * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
> */
> - if (unlikely(!smp_load_acquire(&next_slab_inited))) {
> + if (unlikely(!smp_load_acquire(&next_slab_inited) && may_prealloc)) {
> /*
> * Zero out zone modifiers, as we don't have specific zone
> * requirements. Keep the flags related to allocation in atomic
> @@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
> prealloc = page_address(page);
> }
>
> - spin_lock_irqsave(&depot_lock, flags);
> + raw_spin_lock_irqsave(&depot_lock, flags);
>
> found = find_stack(*bucket, entries, nr_entries, hash);
> if (!found) {
> @@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
> WARN_ON(!init_stack_slab(&prealloc));
> }
>
> - spin_unlock_irqrestore(&depot_lock, flags);
> + raw_spin_unlock_irqrestore(&depot_lock, flags);
> exit:
> if (prealloc) {
> /* Nobody used this memory, ok to free it. */
>
> [ 0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943
> [ 0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> [ 0.692442] Preemption disabled at:
> [ 0.692443] [<ffffffff811a1510>] on_each_cpu_cond_mask+0x30/0xb0
> [ 0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5
> [ 0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 0.692456] Call Trace:
> [ 0.692458] ? on_each_cpu_cond_mask+0x30/0xb0
> [ 0.692462] dump_stack+0x8a/0xb5
> [ 0.692467] ___might_sleep.cold+0xfe/0x112
> [ 0.692471] rt_spin_lock+0x1c/0x60

HI Mike,

If freeing pages from smp_call_function is not OK, then perhaps we
need just to collect the objects to be freed to the task/CPU that
executes kasan_quarantine_remove_cache and it will free them (we know
it can free objects).

> [ 0.692475] free_unref_page+0x117/0x3c0
> [ 0.692481] qlist_free_all+0x60/0xd0
> [ 0.692485] per_cpu_remove_cache+0x5b/0x70
> [ 0.692488] smp_call_function_many_cond+0x185/0x3d0
> [ 0.692492] ? qlist_move_cache+0xe0/0xe0
> [ 0.692495] ? qlist_move_cache+0xe0/0xe0
> [ 0.692497] on_each_cpu_cond_mask+0x44/0xb0
> [ 0.692501] kasan_quarantine_remove_cache+0x52/0xf0
> [ 0.692505] ? acpi_bus_init+0x183/0x183
> [ 0.692510] kmem_cache_shrink+0xe/0x20
> [ 0.692513] acpi_os_purge_cache+0xa/0x10
> [ 0.692517] acpi_purge_cached_objects+0x1d/0x68
> [ 0.692522] acpi_initialize_objects+0x11/0x39
> [ 0.692524] ? acpi_ev_install_xrupt_handlers+0x6f/0x7c
> [ 0.692529] acpi_bus_init+0x50/0x183
> [ 0.692532] acpi_init+0xce/0x182
> [ 0.692536] ? acpi_bus_init+0x183/0x183
> [ 0.692539] ? intel_idle_init+0x36d/0x36d
> [ 0.692543] ? acpi_bus_init+0x183/0x183
> [ 0.692546] do_one_initcall+0x71/0x300
> [ 0.692550] ? trace_event_raw_event_initcall_finish+0x120/0x120
> [ 0.692553] ? parameq+0x90/0x90
> [ 0.692556] ? __wake_up_common+0x1e0/0x200
> [ 0.692560] ? kasan_unpoison+0x21/0x50
> [ 0.692562] ? __kasan_slab_alloc+0x24/0x70
> [ 0.692567] do_initcalls+0xff/0x129
> [ 0.692571] kernel_init_freeable+0x19c/0x1ce
> [ 0.692574] ? rest_init+0xc6/0xc6
> [ 0.692577] kernel_init+0xd/0x11a
> [ 0.692580] ret_from_fork+0x1f/0x30
>
> [ 15.428008] ==================================================================
> [ 15.428011] BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0

This looks like a genuine kernel bug on first glance. I think it needs
to be fixed rather than ignored.

> [ 15.428018] Write of size 8 at addr ffffc90000426008 by task kexec/1187
> [ 15.428022] CPU: 2 PID: 1187 Comm: kexec Tainted: G W E 5.12.0.g2afefec-tip-rt #5
> [ 15.428025] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 15.428027] Call Trace:
> [ 15.428029] ? crash_setup_memmap_entries+0x17e/0x3a0
> [ 15.428032] dump_stack+0x8a/0xb5
> [ 15.428037] print_address_description.constprop.0+0x16/0xa0
> [ 15.428044] kasan_report+0xc4/0x100
> [ 15.428047] ? crash_setup_memmap_entries+0x17e/0x3a0
> [ 15.428050] crash_setup_memmap_entries+0x17e/0x3a0
> [ 15.428053] ? strcmp+0x2e/0x50
> [ 15.428057] ? native_machine_crash_shutdown+0x240/0x240
> [ 15.428059] ? kexec_purgatory_find_symbol.isra.0+0x145/0x1a0
> [ 15.428066] setup_boot_parameters+0x181/0x5c0
> [ 15.428069] bzImage64_load+0x6b5/0x740
> [ 15.428072] ? bzImage64_probe+0x140/0x140
> [ 15.428075] ? iov_iter_kvec+0x5f/0x70
> [ 15.428080] ? rw_verify_area+0x80/0x80
> [ 15.428087] ? __might_sleep+0x31/0xd0
> [ 15.428091] ? __might_sleep+0x31/0xd0
> [ 15.428094] ? ___might_sleep+0xc9/0xe0
> [ 15.428096] ? bzImage64_probe+0x140/0x140
> [ 15.428099] arch_kexec_kernel_image_load+0x102/0x130
> [ 15.428102] kimage_file_alloc_init+0xda/0x290
> [ 15.428107] __do_sys_kexec_file_load+0x21f/0x390
> [ 15.428110] ? __x64_sys_open+0x100/0x100
> [ 15.428113] ? kexec_calculate_store_digests+0x390/0x390
> [ 15.428117] ? rcu_nocb_flush_deferred_wakeup+0x36/0x50
> [ 15.428122] do_syscall_64+0x3d/0x80
> [ 15.428127] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 15.428132] RIP: 0033:0x7f46ad026759
> [ 15.428135] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0f d7 2b 00 f7 d8 64 89 01 48
> [ 15.428137] RSP: 002b:00007ffcf6f96788 EFLAGS: 00000206 ORIG_RAX: 0000000000000140
> [ 15.428141] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f46ad026759
> [ 15.428143] RDX: 0000000000000182 RSI: 0000000000000005 RDI: 0000000000000003
> [ 15.428145] RBP: 00007ffcf6f96a28 R08: 0000000000000002 R09: 0000000000000000
> [ 15.428146] R10: 0000000000b0d5e0 R11: 0000000000000206 R12: 0000000000000004
> [ 15.428148] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
> [ 15.428152] Memory state around the buggy address:
> [ 15.428164] ffffc90000425f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 15.428166] ffffc90000425f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 15.428168] >ffffc90000426000: 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 15.428169] ^
> [ 15.428171] ffffc90000426080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 15.428172] ffffc90000426100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 15.428173] ==================================================================
> [ 15.428174] Disabling lock debugging due to kernel taint
>
> kasan: stop grumbling about CRASH_DUMP
>
> Signed-off-by: Mike Galbraith <[email protected]>
> ---
> arch/x86/kernel/Makefile | 1 +
> kernel/Makefile | 1 +
> 2 files changed, 2 insertions(+)
>
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -105,6 +105,7 @@ obj-$(CONFIG_X86_TSC) += trace_clock.o
> obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
> obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
> obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
> +KASAN_SANITIZE_crash.o := n
> obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o
> obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
> obj-y += kprobes/
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -72,6 +72,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
> obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
> obj-$(CONFIG_KEXEC) += kexec.o
> obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
> +KASAN_SANITIZE_kexec_file.o := n
> obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
> obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
> obj-$(CONFIG_COMPAT) += compat.o
>

2021-04-14 14:09:23

by Zhang, Qiang

[permalink] [raw]

Subject: 回复: Question on KASAN calltrace record in R T

________________________________________
??????: Dmitry Vyukov <[email protected]>
????ʱ??: 2021??4??13?? 23:29
?ռ???: Zhang, Qiang
????: Andrew Halaney; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
????: Re: Question on KASAN calltrace record in RT

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang <[email protected]> wrote:
>
> Hello everyone
>
> In RT system, after Andrew test, found the following calltrace ,
> in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
> rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
> will trigger following calltrace.
>
> maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.
>
> Is there a better solution ??

>Hi Qiang,
>
>Adding 2 full stacks per heap object can increase memory usage too >much.
>The stackdepot has a preallocation mechanism, I would start with
>adding interrupts check here:
>https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
>and just not do preallocation in interrupt context. This will solve
>the problem, right?

It seems to be useful, however, there are the following situations
If there is a lot of stack information that needs to be saved in interrupts, the memory which has been allocated to hold the stack information is depletion, when need to save stack again in interrupts, there will be no memory available .

Thanks
Qiang

> Thanks
> Qiang
>
> BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
> [ 14.522262] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 640, name: mount
> [ 14.522304] Call Trace:
> [ 14.522306] dump_stack+0x92/0xc1
> [ 14.522313] ___might_sleep.cold.99+0x1b0/0x1ef
> [ 14.522319] rt_spin_lock+0x3e/0xc0
> [ 14.522329] local_lock_acquire+0x52/0x3c0
> [ 14.522332] get_page_from_freelist+0x176c/0x3fd0
> [ 14.522543] __alloc_pages_nodemask+0x28f/0x7f0
> [ 14.522559] stack_depot_save+0x3a1/0x470
> [ 14.522564] kasan_save_stack+0x2f/0x40
> [ 14.523575] kasan_record_aux_stack+0xa3/0xb0
> [ 14.523580] insert_work+0x48/0x340
> [ 14.523589] __queue_work+0x430/0x1280
> [ 14.523595] mod_delayed_work_on+0x98/0xf0
> [ 14.523607] kblockd_mod_delayed_work_on+0x17/0x20
> [ 14.523611] blk_mq_run_hw_queue+0x151/0x2b0
> [ 14.523620] blk_mq_sched_insert_request+0x2ad/0x470
> [ 14.523633] blk_mq_submit_bio+0xd2a/0x2330
> [ 14.523675] submit_bio_noacct+0x8aa/0xfe0
> [ 14.523693] submit_bio+0xf0/0x550
> [ 14.523714] submit_bio_wait+0xfe/0x200
> [ 14.523724] xfs_rw_bdev+0x370/0x480 [xfs]
> [ 14.523831] xlog_do_io+0x155/0x320 [xfs]
> [ 14.524032] xlog_bread+0x23/0xb0 [xfs]
> [ 14.524133] xlog_find_head+0x131/0x8b0 [xfs]
> [ 14.524375] xlog_find_tail+0xc8/0x7b0 [xfs]
> [ 14.524828] xfs_log_mount+0x379/0x660 [xfs]
> [ 14.524927] xfs_mountfs+0xc93/0x1af0 [xfs]
> [ 14.525424] xfs_fs_fill_super+0x923/0x17f0 [xfs]
> [ 14.525522] get_tree_bdev+0x404/0x680
> [ 14.525622] vfs_get_tree+0x89/0x2d0
> [ 14.525628] path_mount+0xeb2/0x19d0
> [ 14.525648] do_mount+0xcb/0xf0
> [ 14.525665] __x64_sys_mount+0x162/0x1b0
> [ 14.525670] do_syscall_64+0x33/0x40
> [ 14.525674] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 14.525677] RIP: 0033:0x7fd6c15eaade

2021-04-14 19:20:44

by Mike Galbraith

[permalink] [raw]

Subject: Re: Question on KASAN calltrace record in RT

On Wed, 2021-04-14 at 07:26 +0200, Dmitry Vyukov wrote:
> On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith <[email protected]> wrote:
>
> > [ 0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943
> > [ 0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > [ 0.692442] Preemption disabled at:
> > [ 0.692443] [<ffffffff811a1510>] on_each_cpu_cond_mask+0x30/0xb0
> > [ 0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5
> > [ 0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> > [ 0.692456] Call Trace:
> > [ 0.692458] ? on_each_cpu_cond_mask+0x30/0xb0
> > [ 0.692462] dump_stack+0x8a/0xb5
> > [ 0.692467] ___might_sleep.cold+0xfe/0x112
> > [ 0.692471] rt_spin_lock+0x1c/0x60
>
> HI Mike,
>
> If freeing pages from smp_call_function is not OK, then perhaps we
> need just to collect the objects to be freed to the task/CPU that
> executes kasan_quarantine_remove_cache and it will free them (we know
> it can free objects).

Yeah, RT will have to shove freeing into preemptible context.

> >
> > [ 15.428008] ==================================================================
> > [ 15.428011] BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
>
> This looks like a genuine kernel bug on first glance. I think it needs
> to be fixed rather than ignored.

I figured KASAN probably knew what it was talking about, I just wanted
it to either go find something shiny or leave lockdep the heck alone.

-Mike

2021-04-14 19:55:49

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: Question on KASAN calltrace record in RT

On Wed, Apr 14, 2021 at 8:58 AM Zhang, Qiang <[email protected]> wrote:
> ________________________________________
> 发件人: Dmitry Vyukov <[email protected]>
> 发送时间: 2021年4月13日 23:29
> 收件人: Zhang, Qiang
> 抄送: Andrew Halaney; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> 主题: Re: Question on KASAN calltrace record in RT
>
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang <[email protected]> wrote:
> >
> > Hello everyone
> >
> > In RT system, after Andrew test, found the following calltrace ,
> > in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
> > rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
> > will trigger following calltrace.
> >
> > maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.
> >
> > Is there a better solution ？
>
> >Hi Qiang,
> >
> >Adding 2 full stacks per heap object can increase memory usage too >much.
> >The stackdepot has a preallocation mechanism, I would start with
> >adding interrupts check here:
> >https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> >and just not do preallocation in interrupt context. This will solve
> >the problem, right?
>
> It seems to be useful, however, there are the following situations
> If there is a lot of stack information that needs to be saved in interrupts, the memory which has been allocated to hold the stack information is depletion, when need to save stack again in interrupts, there will be no memory available .

Yes, this is true. This also true now because we allocate with
GFP_ATOMIC. This is deliberate design decision.
Note that a unique allocation stack is saved only once, so it's enough
to be lucky only once per stack. Also interrupts don't tend to
allocate thousands of objects. So I think all in all it should work
fine in practice.
If it turns out to be a problem, we could simply preallocate more
memory in RT config.

> Thanks
> Qiang
>
>
> > Thanks
> > Qiang
> >
> > BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
> > [ 14.522262] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 640, name: mount
> > [ 14.522304] Call Trace:
> > [ 14.522306] dump_stack+0x92/0xc1
> > [ 14.522313] ___might_sleep.cold.99+0x1b0/0x1ef
> > [ 14.522319] rt_spin_lock+0x3e/0xc0
> > [ 14.522329] local_lock_acquire+0x52/0x3c0
> > [ 14.522332] get_page_from_freelist+0x176c/0x3fd0
> > [ 14.522543] __alloc_pages_nodemask+0x28f/0x7f0
> > [ 14.522559] stack_depot_save+0x3a1/0x470
> > [ 14.522564] kasan_save_stack+0x2f/0x40
> > [ 14.523575] kasan_record_aux_stack+0xa3/0xb0
> > [ 14.523580] insert_work+0x48/0x340
> > [ 14.523589] __queue_work+0x430/0x1280
> > [ 14.523595] mod_delayed_work_on+0x98/0xf0
> > [ 14.523607] kblockd_mod_delayed_work_on+0x17/0x20
> > [ 14.523611] blk_mq_run_hw_queue+0x151/0x2b0
> > [ 14.523620] blk_mq_sched_insert_request+0x2ad/0x470
> > [ 14.523633] blk_mq_submit_bio+0xd2a/0x2330
> > [ 14.523675] submit_bio_noacct+0x8aa/0xfe0
> > [ 14.523693] submit_bio+0xf0/0x550
> > [ 14.523714] submit_bio_wait+0xfe/0x200
> > [ 14.523724] xfs_rw_bdev+0x370/0x480 [xfs]
> > [ 14.523831] xlog_do_io+0x155/0x320 [xfs]
> > [ 14.524032] xlog_bread+0x23/0xb0 [xfs]
> > [ 14.524133] xlog_find_head+0x131/0x8b0 [xfs]
> > [ 14.524375] xlog_find_tail+0xc8/0x7b0 [xfs]
> > [ 14.524828] xfs_log_mount+0x379/0x660 [xfs]
> > [ 14.524927] xfs_mountfs+0xc93/0x1af0 [xfs]
> > [ 14.525424] xfs_fs_fill_super+0x923/0x17f0 [xfs]
> > [ 14.525522] get_tree_bdev+0x404/0x680
> > [ 14.525622] vfs_get_tree+0x89/0x2d0
> > [ 14.525628] path_mount+0xeb2/0x19d0
> > [ 14.525648] do_mount+0xcb/0xf0
> > [ 14.525665] __x64_sys_mount+0x162/0x1b0
> > [ 14.525670] do_syscall_64+0x33/0x40
> > [ 14.525674] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 14.525677] RIP: 0033:0x7fd6c15eaade

2021-04-14 20:23:41

by Zhang, Qiang

[permalink] [raw]

Subject: 回复: Question on KASAN calltrace record in R T

________________________________________
??????: Mike Galbraith <[email protected]>
????ʱ??: 2021??4??14?? 12:00
?ռ???: Dmitry Vyukov; Zhang, Qiang
????: Andrew Halaney; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
????: Re: Question on KASAN calltrace record in RT

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote:
> On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang <[email protected]> wrote:
> >
> > Hello everyone
> >
> > In RT system, after Andrew test, found the following calltrace ,
> > in KASAN, we record callstack through stack_depot_save(), in this function, may be call alloc_pages, but in RT, the spin_lock replace with
> > rt_mutex in alloc_pages(), if before call this function, the irq is disabled,
> > will trigger following calltrace.
> >
> > maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record callstack in RT system.
> >
> > Is there a better solution ??
>
> Hi Qiang,
>
> Adding 2 full stacks per heap object can increase memory usage too much.
> The stackdepot has a preallocation mechanism, I would start with
> adding interrupts check here:
> https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> and just not do preallocation in interrupt context. This will solve
> the problem, right?

Hm, this thing might actually be (sorta?) working, modulo one startup
gripe. The CRASH_DUMP inspired gripe I get with !RT appeared (and shut
up when told I don't care given kdump has worked just fine for ages:),
but no more might_sleep() gripeage.

CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=1
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_MODULE_TEST is not set

---
lib/stackdepot.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
static int depot_index;
static int next_slab_inited;
static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);

static bool init_stack_slab(void **prealloc)
{
@@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
- u32 hash;
+ u32 hash, preemptible = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();

if CONFIG_PREEMPT_RT is enabled and but not in preemptible, the prealloc should be allowed

should be change like this:
may_prealloc = !(IS_ENABLED(CONFIG_PREEMPT_RT) && preemptible());

Thanks
Qiang

if (unlikely(nr_entries == 0) || stack_depot_disable)
goto fast_exit;
@@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
* The smp_load_acquire() here pairs with smp_store_release() to
* |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
*/
- if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+ if (unlikely(!smp_load_acquire(&next_slab_inited) && may_prealloc)) {
/*
* Zero out zone modifiers, as we don't have specific zone
* requirements. Keep the flags related to allocation in atomic
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
prealloc = page_address(page);
}

- spin_lock_irqsave(&depot_lock, flags);
+ raw_spin_lock_irqsave(&depot_lock, flags);

found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
WARN_ON(!init_stack_slab(&prealloc));
}

- spin_unlock_irqrestore(&depot_lock, flags);
+ raw_spin_unlock_irqrestore(&depot_lock, flags);
exit:
if (prealloc) {
/* Nobody used this memory, ok to free it. */

[ 0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943
[ 0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[ 0.692442] Preemption disabled at:
[ 0.692443] [<ffffffff811a1510>] on_each_cpu_cond_mask+0x30/0xb0
[ 0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5
[ 0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 0.692456] Call Trace:
[ 0.692458] ? on_each_cpu_cond_mask+0x30/0xb0
[ 0.692462] dump_stack+0x8a/0xb5
[ 0.692467] ___might_sleep.cold+0xfe/0x112
[ 0.692471] rt_spin_lock+0x1c/0x60
[ 0.692475] free_unref_page+0x117/0x3c0
[ 0.692481] qlist_free_all+0x60/0xd0
[ 0.692485] per_cpu_remove_cache+0x5b/0x70
[ 0.692488] smp_call_function_many_cond+0x185/0x3d0
[ 0.692492] ? qlist_move_cache+0xe0/0xe0
[ 0.692495] ? qlist_move_cache+0xe0/0xe0
[ 0.692497] on_each_cpu_cond_mask+0x44/0xb0
[ 0.692501] kasan_quarantine_remove_cache+0x52/0xf0
[ 0.692505] ? acpi_bus_init+0x183/0x183
[ 0.692510] kmem_cache_shrink+0xe/0x20
[ 0.692513] acpi_os_purge_cache+0xa/0x10
[ 0.692517] acpi_purge_cached_objects+0x1d/0x68
[ 0.692522] acpi_initialize_objects+0x11/0x39
[ 0.692524] ? acpi_ev_install_xrupt_handlers+0x6f/0x7c
[ 0.692529] acpi_bus_init+0x50/0x183
[ 0.692532] acpi_init+0xce/0x182
[ 0.692536] ? acpi_bus_init+0x183/0x183
[ 0.692539] ? intel_idle_init+0x36d/0x36d
[ 0.692543] ? acpi_bus_init+0x183/0x183
[ 0.692546] do_one_initcall+0x71/0x300
[ 0.692550] ? trace_event_raw_event_initcall_finish+0x120/0x120
[ 0.692553] ? parameq+0x90/0x90
[ 0.692556] ? __wake_up_common+0x1e0/0x200
[ 0.692560] ? kasan_unpoison+0x21/0x50
[ 0.692562] ? __kasan_slab_alloc+0x24/0x70
[ 0.692567] do_initcalls+0xff/0x129
[ 0.692571] kernel_init_freeable+0x19c/0x1ce
[ 0.692574] ? rest_init+0xc6/0xc6
[ 0.692577] kernel_init+0xd/0x11a
[ 0.692580] ret_from_fork+0x1f/0x30

[ 15.428008] ==================================================================
[ 15.428011] BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428018] Write of size 8 at addr ffffc90000426008 by task kexec/1187
[ 15.428022] CPU: 2 PID: 1187 Comm: kexec Tainted: G W E 5.12.0.g2afefec-tip-rt #5
[ 15.428025] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 15.428027] Call Trace:
[ 15.428029] ? crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428032] dump_stack+0x8a/0xb5
[ 15.428037] print_address_description.constprop.0+0x16/0xa0
[ 15.428044] kasan_report+0xc4/0x100
[ 15.428047] ? crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428050] crash_setup_memmap_entries+0x17e/0x3a0
[ 15.428053] ? strcmp+0x2e/0x50
[ 15.428057] ? native_machine_crash_shutdown+0x240/0x240
[ 15.428059] ? kexec_purgatory_find_symbol.isra.0+0x145/0x1a0
[ 15.428066] setup_boot_parameters+0x181/0x5c0
[ 15.428069] bzImage64_load+0x6b5/0x740
[ 15.428072] ? bzImage64_probe+0x140/0x140
[ 15.428075] ? iov_iter_kvec+0x5f/0x70
[ 15.428080] ? rw_verify_area+0x80/0x80
[ 15.428087] ? __might_sleep+0x31/0xd0
[ 15.428091] ? __might_sleep+0x31/0xd0
[ 15.428094] ? ___might_sleep+0xc9/0xe0
[ 15.428096] ? bzImage64_probe+0x140/0x140
[ 15.428099] arch_kexec_kernel_image_load+0x102/0x130
[ 15.428102] kimage_file_alloc_init+0xda/0x290
[ 15.428107] __do_sys_kexec_file_load+0x21f/0x390
[ 15.428110] ? __x64_sys_open+0x100/0x100
[ 15.428113] ? kexec_calculate_store_digests+0x390/0x390
[ 15.428117] ? rcu_nocb_flush_deferred_wakeup+0x36/0x50
[ 15.428122] do_syscall_64+0x3d/0x80
[ 15.428127] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 15.428132] RIP: 0033:0x7f46ad026759
[ 15.428135] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0f d7 2b 00 f7 d8 64 89 01 48
[ 15.428137] RSP: 002b:00007ffcf6f96788 EFLAGS: 00000206 ORIG_RAX: 0000000000000140
[ 15.428141] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f46ad026759
[ 15.428143] RDX: 0000000000000182 RSI: 0000000000000005 RDI: 0000000000000003
[ 15.428145] RBP: 00007ffcf6f96a28 R08: 0000000000000002 R09: 0000000000000000
[ 15.428146] R10: 0000000000b0d5e0 R11: 0000000000000206 R12: 0000000000000004
[ 15.428148] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[ 15.428152] Memory state around the buggy address:
[ 15.428164] ffffc90000425f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428166] ffffc90000425f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428168] >ffffc90000426000: 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428169] ^
[ 15.428171] ffffc90000426080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428172] ffffc90000426100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 15.428173] ==================================================================
[ 15.428174] Disabling lock debugging due to kernel taint

kasan: stop grumbling about CRASH_DUMP

Signed-off-by: Mike Galbraith <[email protected]>
---
arch/x86/kernel/Makefile | 1 +
kernel/Makefile | 1 +
2 files changed, 2 insertions(+)

--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_X86_TSC) += trace_clock.o
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
+KASAN_SANITIZE_crash.o := n
obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
obj-y += kprobes/
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
obj-$(CONFIG_KEXEC) += kexec.o
obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
+KASAN_SANITIZE_kexec_file.o := n
obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
obj-$(CONFIG_COMPAT) += compat.o

2021-04-15 00:31:57

by Mike Galbraith

[permalink] [raw]

Subject: [patch] kasan: make it RT aware

On Wed, 2021-04-14 at 08:15 +0200, Mike Galbraith wrote:
> On Wed, 2021-04-14 at 07:26 +0200, Dmitry Vyukov wrote:
> > On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith <[email protected]> wrote:
> >
> > > [ 0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943
> > > [ 0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > > [ 0.692442] Preemption disabled at:
> > > [ 0.692443] [<ffffffff811a1510>] on_each_cpu_cond_mask+0x30/0xb0
> > > [ 0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5
> > > [ 0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> > > [ 0.692456] Call Trace:
> > > [ 0.692458] ? on_each_cpu_cond_mask+0x30/0xb0
> > > [ 0.692462] dump_stack+0x8a/0xb5
> > > [ 0.692467] ___might_sleep.cold+0xfe/0x112
> > > [ 0.692471] rt_spin_lock+0x1c/0x60
> >
> > HI Mike,
> >
> > If freeing pages from smp_call_function is not OK, then perhaps we
> > need just to collect the objects to be freed to the task/CPU that
> > executes kasan_quarantine_remove_cache and it will free them (we know
> > it can free objects).
>
> Yeah, RT will have to shove freeing into preemptible context.

There's a very similar problem addressed in the RT patch set, so I used
the free samples on top of your *very* convenient hint that pesky
preallocation is optional, to seemingly make KASAN a happy RT camper.
Dunno if RT maintainers would prefer something like this over simply
disabling KASAN for RT configs, but what the heck, I'll show it.

kasan: make it RT aware

Skip preallocation when not possible for RT, and move cache removal
from IPI to synchronous work.

Signed-off-by: Mike Galbraith <[email protected]>
---
lib/stackdepot.c | 10 +++++-----
mm/kasan/quarantine.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+), 5 deletions(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
static int depot_index;
static int next_slab_inited;
static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);

static bool init_stack_slab(void **prealloc)
{
@@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
- u32 hash;
+ u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();

if (unlikely(nr_entries == 0) || stack_depot_disable)
goto fast_exit;
@@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
* The smp_load_acquire() here pairs with smp_store_release() to
* |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
*/
- if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+ if (unlikely(!smp_load_acquire(&next_slab_inited) && may_prealloc)) {
/*
* Zero out zone modifiers, as we don't have specific zone
* requirements. Keep the flags related to allocation in atomic
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
prealloc = page_address(page);
}

- spin_lock_irqsave(&depot_lock, flags);
+ raw_spin_lock_irqsave(&depot_lock, flags);

found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
WARN_ON(!init_stack_slab(&prealloc));
}

- spin_unlock_irqrestore(&depot_lock, flags);
+ raw_spin_unlock_irqrestore(&depot_lock, flags);
exit:
if (prealloc) {
/* Nobody used this memory, ok to free it. */
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -19,6 +19,9 @@
#include <linux/srcu.h>
#include <linux/string.h>
#include <linux/types.h>
+#include <linux/cpu.h>
+#include <linux/mutex.h>
+#include <linux/workqueue.h>
#include <linux/cpuhotplug.h>

#include "../slab.h"
@@ -308,6 +311,48 @@ static void per_cpu_remove_cache(void *a
qlist_free_all(&to_free, cache);
}

+#ifdef CONFIG_PREEMPT_RT
+struct remove_cache_work {
+ struct work_struct work;
+ struct kmem_cache *cache;
+};
+
+static DEFINE_MUTEX(remove_caches_lock);
+static DEFINE_PER_CPU(struct remove_cache_work, remove_cache_work);
+
+static void per_cpu_remove_cache_work(struct work_struct *w)
+{
+ struct remove_cache_work *rcw;
+
+ rcw = container_of(w, struct remove_cache_work, work);
+ per_cpu_remove_cache(rcw->cache);
+}
+
+static void per_cpu_remove_caches_sync(struct kmem_cache *cache)
+{
+ struct remove_cache_work *rcw;
+ unsigned int cpu;
+
+ cpus_read_lock();
+ mutex_lock(&remove_caches_lock);
+
+ for_each_online_cpu(cpu) {
+ rcw = &per_cpu(remove_cache_work, cpu);
+ INIT_WORK(&rcw->work, per_cpu_remove_cache_work);
+ rcw->cache = cache;
+ schedule_work_on(cpu, &rcw->work);
+ }
+
+ for_each_online_cpu(cpu) {
+ rcw = &per_cpu(remove_cache_work, cpu);
+ flush_work(&rcw->work);
+ }
+
+ mutex_unlock(&remove_caches_lock);
+ cpus_read_unlock();
+}
+#endif
+
/* Free all quarantined objects belonging to cache. */
void kasan_quarantine_remove_cache(struct kmem_cache *cache)
{
@@ -321,7 +366,11 @@ void kasan_quarantine_remove_cache(struc
* achieves the first goal, while synchronize_srcu() achieves the
* second.
*/
+#ifndef CONFIG_PREEMPT_RT
on_each_cpu(per_cpu_remove_cache, cache, 1);
+#else
+ per_cpu_remove_caches_sync(cache);
+#endif

raw_spin_lock_irqsave(&quarantine_lock, flags);
for (i = 0; i < QUARANTINE_BATCHES; i++) {

2021-04-15 18:15:55

by Mike Galbraith

[permalink] [raw]

Subject: Re: Question on KASAN calltrace record in RT

On Wed, 2021-04-14 at 07:26 +0200, Dmitry Vyukov wrote:
>
> > [ 15.428008] ==================================================================
> > [ 15.428011] BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
>
> This looks like a genuine kernel bug on first glance. I think it needs
> to be fixed rather than ignored.

I posted a fix, so KASAN earns another notch in its pistol grips.

-Mike