Subject: [PATCH v2 1/1] kasan: fix memory leak of kasan quarantine

When cpu is going offline, set q->offline as true
and interrupt happened. The interrupt may call the
quarantine_put. But quarantine_put do not free the
the object. The object will cause memory leak.

Add qlink_free() to free the object.

Fixes: 6c82d45c7f03 (kasan: fix object remaining in offline per-cpu quarantine)
Signed-off-by: Kuan-Ying Lee <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: <[email protected]> [5.10-]
---
mm/kasan/quarantine.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 0e3f8494628f..cac7c617df72 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -191,6 +191,7 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)

q = this_cpu_ptr(&cpu_quarantine);
if (q->offline) {
+ qlink_free(&info->quarantine_link, cache);
local_irq_restore(flags);
return;
}
--
2.18.0


2020-12-28 16:09:15

by Shinichiro Kawasaki

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] kasan: fix memory leak of kasan quarantine

On Dec 17, 2020 / 20:18, Kuan-Ying Lee wrote:
> When cpu is going offline, set q->offline as true
> and interrupt happened. The interrupt may call the
> quarantine_put. But quarantine_put do not free the
> the object. The object will cause memory leak.
>
> Add qlink_free() to free the object.
>
> Fixes: 6c82d45c7f03 (kasan: fix object remaining in offline per-cpu quarantine)
> Signed-off-by: Kuan-Ying Lee <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Alexander Potapenko <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Matthias Brugger <[email protected]>
> Cc: <[email protected]> [5.10-]
> ---
> mm/kasan/quarantine.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> index 0e3f8494628f..cac7c617df72 100644
> --- a/mm/kasan/quarantine.c
> +++ b/mm/kasan/quarantine.c
> @@ -191,6 +191,7 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
>
> q = this_cpu_ptr(&cpu_quarantine);
> if (q->offline) {
> + qlink_free(&info->quarantine_link, cache);
> local_irq_restore(flags);
> return;
> }

I ran blktests with kernels v5.10.0 and v5.10.3 enabling KASAN, and observed a
BUG message [1]. The BUG can be recreated with a test case which hotplugs CPUs
during I/O to dm-linear device. The stack trace in the message indicates memory
leak was detected when slab of dm-linear bio set was destroyed.

I bisected and found the commit 6c82d45c7f03 ("kasan: fix object remaining in
offline per-cpu quarantine") triggers the BUG message. I also tried this fix
patch by Kuan-Ying and observed that it avoids the BUG message. I suppose the
fix is required for v5.10.x. Confirmation by kasan maintainers will be
appreciated.

I took following steps to recreate the BUG message:

1. Create dm-linear device on top of HDD (dmsetup create).
2. Run blktests' test case block/008 using the dm-linear device.
3. Remove the dm-linear device (dmsetup remove)

When I repeat the steps, the BUG message is often observed at the step 3.

I repeated the steps with v5.11-rc1 also, and did not observe the BUG. With
v5.11-rc1, quarantine_put() returns bool. Then I think the fix patch is not
required for v5.11, probably.

Wish this report helps for the fix.

[1]

[ 151.201998] =============================================================================
[ 151.212580] BUG bio-3 (Not tainted): Objects remaining in bio-3 on __kmem_cache_shutdown()
[ 151.222321] -----------------------------------------------------------------------------

[ 151.234933] Disabling lock debugging due to kernel taint
[ 151.241634] INFO: Slab 0x0000000010690f30 objects=36 used=3 fp=0x00000000e7351615 flags=0x17ffffc0010200
[ 151.252558] CPU: 6 PID: 1996 Comm: dmsetup Tainted: G B 5.10.3 #21
[ 151.261520] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0 04/24/2015
[ 151.270070] Call Trace:
[ 151.274029] dump_stack+0x9a/0xcc
[ 151.278844] slab_err+0xb7/0xdc
[ 151.283505] ? do_raw_spin_lock+0x115/0x240
[ 151.289220] ? rwlock_bug.part.0+0x90/0x90
[ 151.294833] __kmem_cache_shutdown.cold+0x36/0x19f
[ 151.301182] kmem_cache_destroy+0x5d/0x110
[ 151.306822] bio_put_slab+0xd3/0x180
[ 151.311952] bioset_exit+0xa5/0x100
[ 151.316989] cleanup_mapped_device+0x5e/0x310
[ 151.322886] free_dev+0xb8/0x210
[ 151.327665] __dm_destroy+0x2e0/0x470
[ 151.332872] ? dm_blk_report_zones+0x2b0/0x2b0
[ 151.338877] ? _raw_spin_unlock+0x1f/0x30
[ 151.344481] dev_remove+0x223/0x2f0
[ 151.349565] ctl_ioctl+0x384/0x970
[ 151.354572] ? remove_all+0x90/0x90
[ 151.359677] ? find_held_lock+0x2c/0x110
[ 151.365211] ? free_params+0x30/0x30
[ 151.370401] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 151.377181] dm_ctl_ioctl+0xa/0x10
[ 151.382224] __x64_sys_ioctl+0x127/0x190
[ 151.387800] do_syscall_64+0x33/0x40
[ 151.393017] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 151.399713] RIP: 0033:0x7f6fe944338b
[ 151.404947] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ba 0c 00 f7 d8 64 89 01 48
[ 151.426499] RSP: 002b:00007ffd00632f08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 151.435871] RAX: ffffffffffffffda RBX: 00007f6fe9523f30 RCX: 00007f6fe944338b
[ 151.444829] RDX: 0000561f9b938440 RSI: 00000000c138fd04 RDI: 0000000000000003
[ 151.453812] RBP: 00007f6fe9560494 R08: 00007f6fe9562de0 R09: 00007ffd00632d70
[ 151.462808] R10: 0000000000000000 R11: 0000000000000202 R12: 0000561f9b936c40
[ 151.471820] R13: 0000561f9b938470 R14: 0000561f9b9384f0 R15: 0000561f9b938440
[ 151.480836] INFO: Object 0x00000000f6e5c796 @offset=5824
[ 151.488063] INFO: Object 0x00000000a14f36d2 @offset=6720
[ 151.495274] INFO: Object 0x000000005bfef957 @offset=9856

--
Best Regards,
Shin'ichiro Kawasaki