Hi,
I hit a reproducible BUG() when scrubbing a btrfs fs on top of
a dm-crypt device with no_read_workqueue and no_write_workqueue
flags enabled.
Steps to reproduce:
1) cryptsetup create -c aes -d /dev/urandom test /dev/vdb1
2) dmsetup table test --showkeys | sed 's/$/ 2 no_read_workqueue no_write_workqueue/' | dmsetup reload test
3) dmsetup suspend test && dmsetup resume test
4) mkfs.btrfs /dev/mapper/test
5) mkdir /tmp/test; mount -t btrfs /dev/mapper/test /tmp/test/
6) dd if=/dev/zero of=/tmp/test/test bs=1M count=900; sync
7) btrfs scrub start /tmp/test
The result is:
[ 2965.025717][ C0] BUG: sleeping function called from invalid context at mm/mempool.c:381
[ 2965.025721][ C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2848, name: kworker/u4:0
[ 2965.025726][ C0] CPU: 0 PID: 2848 Comm: kworker/u4:0 Tainted: G T 5.10.0-rc3+ #278
[ 2965.025727][ C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 2965.025735][ C0] Workqueue: btrfs-scrub btrfs_work_helper
[ 2965.025738][ C0] Call Trace:
[ 2965.025759][ C0] <IRQ>
[ 2965.025766][ C0] dump_stack+0x57/0x6a
[ 2965.025771][ C0] ___might_sleep.cold+0x8a/0x97
[ 2965.025776][ C0] mempool_alloc+0xcb/0x1b0
[ 2965.025781][ C0] ? crypto_skcipher_decrypt+0x5c/0x80
[ 2965.025786][ C0] crypt_convert+0xdf0/0xfa0
[ 2965.025793][ C0] ? update_load_avg+0x92/0x650
[ 2965.025798][ C0] kcryptd_crypt_read_convert+0xc5/0x100
[ 2965.025802][ C0] tasklet_action_common.constprop.0+0xfa/0x120
[ 2965.025807][ C0] __do_softirq+0xc4/0x2a8
[ 2965.025811][ C0] asm_call_irq_on_stack+0x12/0x20
[ 2965.025814][ C0] </IRQ>
[ 2965.025818][ C0] do_softirq_own_stack+0x36/0x40
[ 2965.025821][ C0] irq_exit_rcu+0x8d/0xc0
[ 2965.025825][ C0] sysvec_call_function_single+0x36/0x80
[ 2965.025829][ C0] asm_sysvec_call_function_single+0x12/0x20
[ 2965.025833][ C0] RIP: 0010:crc_18+0xa/0x1e
[ 2965.025837][ C0] Code: ff ff f2 4c 0f 38 f1 81 68 ff ff ff f2 4c 0f 38 f1 8a 68 ff ff ff f2 4d 0f 38 f1 93 68 ff ff ff f2 4c 0f 38 f1 81 70 ff ff ff <f2> 4c 0f 38 f1 8a 70 ff ff ff f2 4d 0f 38 f1 93 70 ff ff ff f2 4c
[ 2965.025839][ C0] RSP: 0018:ffffa2d04023bbf0 EFLAGS: 00000246
[ 2965.025842][ C0] RAX: 0000000000000080 RBX: 0000000000001000 RCX: ffff96e1851ef400
[ 2965.025844][ C0] RDX: ffff96e1851ef800 RSI: 0000000000001000 RDI: 0000000000000000
[ 2965.025845][ C0] RBP: ffff96e1851ef000 R08: 000000000ae37425 R09: 0000000000000000
[ 2965.025847][ C0] R10: 0000000000000000 R11: ffff96e1851efc00 R12: 0000000000001000
[ 2965.025848][ C0] R13: ffff96e18e9a6a80 R14: fffff68440000000 R15: ffff96e184d73c00
[ 2965.025854][ C0] ? crc32c_pcl_intel_digest+0xa0/0xc0
[ 2965.025857][ C0] ? scrub_checksum_data+0x126/0x180
[ 2965.025861][ C0] ? update_load_avg+0x92/0x650
[ 2965.025863][ C0] ? cpuacct_charge+0x35/0x80
[ 2965.025867][ C0] ? check_preempt_wakeup+0xef/0x260
[ 2965.025872][ C0] ? radix_tree_gang_lookup+0xa1/0xd0
[ 2965.025875][ C0] ? __wake_up_common_lock+0x8a/0xc0
[ 2965.025878][ C0] ? scrub_block_put+0x5c/0xa0
[ 2965.025880][ C0] ? scrub_bio_end_io_worker+0x52/0x220
[ 2965.025884][ C0] ? kfree+0x3ec/0x450
[ 2965.025887][ C0] ? scrub_bio_end_io_worker+0xd7/0x220
[ 2965.025890][ C0] ? btrfs_work_helper+0xdb/0x350
[ 2965.025893][ C0] ? __schedule+0x275/0x8c0
[ 2965.025897][ C0] ? process_one_work+0x1d6/0x3b0
[ 2965.025901][ C0] ? worker_thread+0x53/0x400
[ 2965.025904][ C0] ? rescuer_thread+0x440/0x440
[ 2965.025907][ C0] ? kthread+0x13f/0x160
[ 2965.025910][ C0] ? __kthread_bind_mask+0xa0/0xa0
[ 2965.025913][ C0] ? ret_from_fork+0x22/0x30
Additionally, here is a similar backtrace I've hit on another
system while scrubbing a btrfs fs on dm-crypt devices with these
flags enabled; this time the BUG() was actually a "scheduling
while atomic" splat:
<IRQ>
dump_stack+0x57/0x6a
__schedule_bug.cold+0x47/0x53
__schedule+0x643/0x7d0
schedule+0x62/0xd0
schedule_timeout+0x135/0x180
? cryptd_enqueue_request+0x5e/0xd0
wait_for_completion+0x84/0xe0
crypt_convert.constprop.0+0x7a7/0xf60
? crypt_endio+0x2d/0x90
? kmem_cache_free+0x416/0x470
? ktime_get+0x4a/0xc0
kcryptd_crypt_read_convert+0xbe/0x100
blk_update_request+0x229/0x3d0
scsi_end_request+0x40/0x200
scsi_io_completion+0x8e/0x5b0
blk_done_softirq+0x87/0xb0
__do_softirq+0xd3/0x2ed
asm_call_irq_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack+0x36/0x40
irq_exit_rcu+0x8d/0xc0
common_interrupt+0x87/0x160
asm_common_interrupt+0x1e/0x40
RIP: 0010:_aesni_dec4+0x2f/0x190
Code: fa 66 0f ef c2 66 0f ef e2 66 0f ef ea 66 0f ef f2 49 83 c2 30 41 83 f9 18 72 70 4d 8d 52 20 74 36 49 83 c2 20 41 0f 28 52 a0 <66> 0f 38 de c2 66 0f 38 de e2 66 0f 3
? _aesni_dec1+0xb0/0xb0
? _aesni_dec1+0xb0/0xb0
? aesni_xts_crypt8+0xe0/0x240
? __glue_xts_req_128bit+0x8f/0xc0
? glue_xts_req_128bit+0xc1/0x400
? aesni_set_key+0x1e0/0x1e0
? ttwu_queue_wakelist+0xb9/0xe0
? kcryptd_async_done+0xa6/0x210
? kcryptd_async_done+0xa6/0x210
? kfree+0x401/0x460
? kcryptd_async_done+0xa6/0x210
? crypto_skcipher_decrypt+0x4b/0x80
? cryptd_skcipher_decrypt+0xa6/0x130
? update_load_avg+0x7e/0x660
? set_next_entity+0xcd/0x240
? __switch_to_asm+0x3f/0x60
? finish_task_switch+0x87/0x280
? __schedule+0x263/0x7d0
? preempt_schedule+0x32/0x50
? preempt_schedule_thunk+0x16/0x18
? cryptd_queue_worker+0x8b/0xe0
? process_one_work+0x1d4/0x3d0
? worker_thread+0x50/0x3f0
? rescuer_thread+0x420/0x420
? kthread+0x134/0x160
? __kthread_bind_mask+0xa0/0xa0
? ret_from_fork+0x22/0x30
It looks to me like the dm-crypt code in kcryptd_queue_crypt()
checks only for hard IRQ context via in_irq() before deciding whether
to offload processing to a tasklet or proceed directly, but it
should probably check for any interrupt context via in_interrupt()
or similar and then fall back to a workqueue, as the ordinary code
flow does.
CC'ing btrfs folks in case it is actually a btrfs-specific issue.
Thanks,
Maciej
On 14.12.2020 19:11, Maciej S. Szmigiero wrote:
> Hi,
>
> I hit a reproducible BUG() when scrubbing a btrfs fs on top of
> a dm-crypt device with no_read_workqueue and no_write_workqueue
> flags enabled.
Still happens on the current torvalds/master.
Due to this bug it is not possible to use btrfs on top of
a dm-crypt device with no_read_workqueue and no_write_workqueue
flags enabled.
@Ignat:
Can you have a look at this as the person who added these flags?
It looks to me like the skcipher API might not be safe to
call from softirq context after all.
Maciej
On Wed, Dec 23, 2020 at 04:37:34PM +0100, Maciej S. Szmigiero wrote:
>
> It looks to me like the skcipher API might not be safe to
> call from softirq context after all.
skcipher is safe to use in a softirq. The problem is only in
dm-crypt where it tries to allocate memory with GFP_NOIO.
Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 23.12.2020 22:09, Ignat Korchagin wrote:
(..)
> I've been looking into this for the last couple of days because of
> other reports [1].
> Just finished testing a possible solution. Will submit soon.
Thanks for looking into it.
By the way, on bare metal I am actually hitting a different problem
(scheduling while atomic) when scrubbing a btrfs filesystem, just like
one of your users in that GitHub report [1].
I've pasted that backtrace in my original Dec 14 message.
Thanks,
Maciej
[1]: https://github.com/cloudflare/linux/issues/1#issuecomment-736734243
On Wed, Dec 23, 2020 at 9:20 PM Maciej S. Szmigiero
<[email protected]> wrote:
>
> On 23.12.2020 22:09, Ignat Korchagin wrote:
> (..)
> > I've been looking into this for the last couple of days because of
> > other reports [1].
> > Just finished testing a possible solution. Will submit soon.
>
> Thanks for looking into it.
>
> By the way, on bare metal I am actually hitting a different problem
> (scheduling while atomic) when scrubbing a btrfs filesystem, just like
> one of your users in that GitHub report [1].
That is because dm-crypt calls wait_for_completion() in rare cases
when the Crypto API (cryptd) backlogs the decryption request. I've
reproduced that one as well (even with no FS).
We never hit these problems in our original testing, probably because
we mostly used our custom xtsproxy crypto module, which is fully
synchronous.
I did test it later with the standard Crypto API, but did not
encounter these problems either, most likely because it also depends
on which storage driver is used underneath: most of them do not
submit read requests to dm-crypt in irq/softirq context in the
first place.
> I've pasted that backtrace in my original Dec 14 message.
>
> Thanks,
> Maciej
>
> [1]: https://github.com/cloudflare/linux/issues/1#issuecomment-736734243
Regards,
Ignat
On Wed, Dec 23, 2020 at 8:57 PM Herbert Xu <[email protected]> wrote:
>
> On Wed, Dec 23, 2020 at 04:37:34PM +0100, Maciej S. Szmigiero wrote:
> >
> > It looks to me like the skcipher API might not be safe to
> > call from softirq context after all.
>
> skcipher is safe to use in a softirq. The problem is only in
> dm-crypt where it tries to allocate memory with GFP_NOIO.
Hm... After eliminating GFP_NOIO (as well as some other sleeping
paths) from the dm-crypt softirq code, I still hit an occasional crash
in my extreme setup (QEMU with 1 CPU and cryptd_max_cpu_qlen set to 1);
backtrace decoded with stacktrace_decode.sh:
[ 89.324723] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 89.325713] #PF: supervisor write access in kernel mode
[ 89.326460] #PF: error_code(0x0002) - not-present page
[ 89.327211] PGD 0 P4D 0
[ 89.327589] Oops: 0002 [#1] PREEMPT SMP PTI
[ 89.328200] CPU: 0 PID: 21 Comm: kworker/0:1 Not tainted 5.10.0+ #79
[ 89.329109] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 0.0.0 02/06/2015
[ 89.330284] Workqueue: cryptd cryptd_queue_worker
[ 89.330999] RIP: 0010:crypto_dequeue_request
(/cfsetup_build/./include/linux/list.h:112
/cfsetup_build/./include/linux/list.h:135
/cfsetup_build/./include/linux/list.h:146
/cfsetup_build/crypto/algapi.c:957)
[ 89.331757] Code: e9 c9 d0 a8 48 c7 c7 f9 c9 d0 a8 e8 c2 88 fe ff 4c
8b 23 48 c7 c6 e9 c9 d0 a8 48 c7 c7 f9 c9 d0 a8 49 8b 14 24 49 8b 44
24 08 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 04 24
48
All code
========
0: e9 c9 d0 a8 48 jmpq 0x48a8d0ce
5: c7 c7 f9 c9 d0 a8 mov $0xa8d0c9f9,%edi
b: e8 c2 88 fe ff callq 0xfffffffffffe88d2
10: 4c 8b 23 mov (%rbx),%r12
13: 48 c7 c6 e9 c9 d0 a8 mov $0xffffffffa8d0c9e9,%rsi
1a: 48 c7 c7 f9 c9 d0 a8 mov $0xffffffffa8d0c9f9,%rdi
21: 49 8b 14 24 mov (%r12),%rdx
25: 49 8b 44 24 08 mov 0x8(%r12),%rax
2a:* 48 89 42 08 mov %rax,0x8(%rdx) <-- trapping instruction
2e: 48 89 10 mov %rdx,(%rax)
31: 48 b8 00 01 00 00 00 movabs $0xdead000000000100,%rax
38: 00 ad de
3b: 49 89 04 24 mov %rax,(%r12)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 89 42 08 mov %rax,0x8(%rdx)
4: 48 89 10 mov %rdx,(%rax)
7: 48 b8 00 01 00 00 00 movabs $0xdead000000000100,%rax
e: 00 ad de
11: 49 89 04 24 mov %rax,(%r12)
15: 48 rex.W
[ 89.334414] RSP: 0018:ffffba64c00bbe68 EFLAGS: 00010246
[ 89.335165] RAX: 0000000000000000 RBX: ffff9b9d6fc28d88 RCX: 0000000000000000
[ 89.336182] RDX: 0000000000000000 RSI: ffffffffa8d0c9e9 RDI: ffffffffa8d0c9f9
[ 89.337204] RBP: 0000000000000000 R08: ffffffffa906e708 R09: 0000000000000058
[ 89.338208] R10: ffffffffa9068720 R11: 00000000fffffc00 R12: ffff9b9a43797478
[ 89.339216] R13: 0000000000000020 R14: ffff9b9d6fc28e00 R15: 0000000000000000
[ 89.340231] FS: 0000000000000000(0000) GS:ffff9b9d6fc00000(0000)
knlGS:0000000000000000
[ 89.341376] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 89.342207] CR2: 0000000000000008 CR3: 000000014cd76002 CR4: 0000000000170ef0
[ 89.343238] Call Trace:
[ 89.343609] cryptd_queue_worker (/cfsetup_build/crypto/cryptd.c:172)
[ 89.344218] process_one_work
(/cfsetup_build/./arch/x86/include/asm/preempt.h:26
/cfsetup_build/kernel/workqueue.c:2284)
[ 89.344821] ? rescuer_thread (/cfsetup_build/kernel/workqueue.c:2364)
[ 89.345399] worker_thread
(/cfsetup_build/./include/linux/list.h:282
/cfsetup_build/kernel/workqueue.c:2422)
[ 89.345923] ? rescuer_thread (/cfsetup_build/kernel/workqueue.c:2364)
[ 89.346504] kthread (/cfsetup_build/kernel/kthread.c:292)
[ 89.346986] ? kthread_create_worker_on_cpu
(/cfsetup_build/kernel/kthread.c:245)
[ 89.347713] ret_from_fork (/cfsetup_build/arch/x86/entry/entry_64.S:302)
[ 89.348255] Modules linked in:
[ 89.348708] CR2: 0000000000000008
[ 89.349197] ---[ end trace b7e9618b4122ed3b ]---
[ 89.349863] RIP: 0010:crypto_dequeue_request
(/cfsetup_build/./include/linux/list.h:112
/cfsetup_build/./include/linux/list.h:135
/cfsetup_build/./include/linux/list.h:146
/cfsetup_build/crypto/algapi.c:957)
[ 89.350606] Code: e9 c9 d0 a8 48 c7 c7 f9 c9 d0 a8 e8 c2 88 fe ff 4c
8b 23 48 c7 c6 e9 c9 d0 a8 48 c7 c7 f9 c9 d0 a8 49 8b 14 24 49 8b 44
24 08 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 04 24
48
All code
========
0: e9 c9 d0 a8 48 jmpq 0x48a8d0ce
5: c7 c7 f9 c9 d0 a8 mov $0xa8d0c9f9,%edi
b: e8 c2 88 fe ff callq 0xfffffffffffe88d2
10: 4c 8b 23 mov (%rbx),%r12
13: 48 c7 c6 e9 c9 d0 a8 mov $0xffffffffa8d0c9e9,%rsi
1a: 48 c7 c7 f9 c9 d0 a8 mov $0xffffffffa8d0c9f9,%rdi
21: 49 8b 14 24 mov (%r12),%rdx
25: 49 8b 44 24 08 mov 0x8(%r12),%rax
2a:* 48 89 42 08 mov %rax,0x8(%rdx) <-- trapping instruction
2e: 48 89 10 mov %rdx,(%rax)
31: 48 b8 00 01 00 00 00 movabs $0xdead000000000100,%rax
38: 00 ad de
3b: 49 89 04 24 mov %rax,(%r12)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 89 42 08 mov %rax,0x8(%rdx)
4: 48 89 10 mov %rdx,(%rax)
7: 48 b8 00 01 00 00 00 movabs $0xdead000000000100,%rax
e: 00 ad de
11: 49 89 04 24 mov %rax,(%r12)
15: 48 rex.W
[ 89.353266] RSP: 0018:ffffba64c00bbe68 EFLAGS: 00010246
[ 89.354003] RAX: 0000000000000000 RBX: ffff9b9d6fc28d88 RCX: 0000000000000000
[ 89.355048] RDX: 0000000000000000 RSI: ffffffffa8d0c9e9 RDI: ffffffffa8d0c9f9
[ 89.356063] RBP: 0000000000000000 R08: ffffffffa906e708 R09: 0000000000000058
[ 89.357082] R10: ffffffffa9068720 R11: 00000000fffffc00 R12: ffff9b9a43797478
[ 89.358088] R13: 0000000000000020 R14: ffff9b9d6fc28e00 R15: 0000000000000000
[ 89.359127] FS: 0000000000000000(0000) GS:ffff9b9d6fc00000(0000)
knlGS:0000000000000000
[ 89.360296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 89.361129] CR2: 0000000000000008 CR3: 000000014cd76002 CR4: 0000000000170ef0
[ 89.362160] Kernel panic - not syncing: Fatal exception in interrupt
[ 89.363145] Kernel Offset: 0x26000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 89.364730] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
This happens when running dm-crypt with no_read_workqueue on top of
an emulated NVMe device in QEMU (the NVMe driver "completes" IO in IRQ
context). It seems that sending decryption requests to cryptd from
softirq context somehow corrupts the crypto queue.
Regards,
Ignat
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 24.12.2020 19:46, Ignat Korchagin wrote:
> On Wed, Dec 23, 2020 at 8:57 PM Herbert Xu <[email protected]> wrote:
>>
>> On Wed, Dec 23, 2020 at 04:37:34PM +0100, Maciej S. Szmigiero wrote:
>>>
>>> It looks to me like the skcipher API might not be safe to
>>> call from softirq context after all.
>>
>> skcipher is safe to use in a softirq. The problem is only in
>> dm-crypt where it tries to allocate memory with GFP_NOIO.
>
> Hm... After eliminating GFP_NOIO (as well as some other sleeping
> paths) from the dm-crypt softirq code, I still hit an occasional crash
> in my extreme setup (QEMU with 1 CPU and cryptd_max_cpu_qlen set to 1);
> backtrace decoded with stacktrace_decode.sh:
(..)
> This happens when running dm-crypt with no_read_workqueue on top of
> an emulated NVMe device in QEMU (the NVMe driver "completes" IO in IRQ
> context). It seems that sending decryption requests to cryptd from
> softirq context somehow corrupts the crypto queue.
You can try compiling your test kernel with KASAN, as it often
immediately points out where the memory starts to get corrupted
(if that's the bug).
Enabling other "checking" kernel debug options might help track down
the root cause of this, too.
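For example, a debug-oriented config fragment along these lines (option names are from the kernel's Kconfig; pick whatever your kernel version supports):

```
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LIST=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_OBJECTS=y
```

CONFIG_DEBUG_LIST in particular might catch the list corruption in crypto_dequeue_request() closer to where it actually happens.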
> Regards,
> Ignat
Thanks,
Maciej