2019-08-15 23:51:26

by Gary R Hook

[permalink] [raw]
Subject: Crypto driver callback problem

Context:
We've run into a possible locking issue when using the AMD CCP device
for crypto offload, with the new extra tests, and with lock debugging
enabled.

Pre-reqs:
Enable the crypto self-tests and extra tests
Build the crypto testing module (tcrypt)
Enable the kernel hacking/lock debugging option
"prove locking correctness" (CONFIG_PROVE_LOCKING=y)

Problem:
After booting this kernel, and loading our ccp-crypto module, run the
tests with "modprobe tcrypt" and we get the dmesg output shown following.

With some investigation we discovered that in the situation being
reported here, the callback is executed as a tasklet, where we're not
allowed to sleep. The called routine began its crypto operation with the
MAY_SLEEP flag set, and allocates memory using GFP_KERNEL.

The test itself runs in a typical context, and may sleep; the callback,
however, runs as a tasklet yet thinks it can sleep based on the
CRYPTO_TFM_REQ_MAY_SLEEP flag set in the request.

Drivers for crypto are required (as we understand it) to use a tasklet
approach for callbacks, which is needed (for one example) to support
IPSec. (N.B. we did some poking around in other, similar crypto
drivers, and all appear to be using tasklets.)

We're looking for the best approach to resolve this. Thoughts?

Gary

[ 58.815592] alg: skcipher: skipping comparison tests for ofb(aes)
because ofb(aes-generic) is unavailable
[ 59.105437] alg: skcipher: skipping comparison tests for cfb(aes)
because cfb(aes-generic) is unavailable
[ 64.024347] alg: hash: failed to allocate transform for streebog256: -2
[ 64.039851] alg: hash: failed to allocate transform for streebog512: -2
[ 67.698789] alg: hash: failed to allocate transform for
hmac(streebog256): -2
[ 67.720735] alg: hash: failed to allocate transform for
hmac(streebog512): -2

[ 68.803745] ================================
[ 68.808515] WARNING: inconsistent lock state
[ 68.813284] 5.3.0-rc2-oopschk+ #62 Tainted: G E
[ 68.819796] --------------------------------
[ 68.824565] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 68.831272] ksoftirqd/56/299 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 68.837496] 00000000334d2af0 (fs_reclaim){+.?.}, at:
fs_reclaim_acquire.part.101+0x5/0x30
[ 68.846637] {SOFTIRQ-ON-W} state was registered at:
[ 68.852078] lock_acquire+0xa3/0x180
[ 68.856167] fs_reclaim_acquire.part.101+0x29/0x30
[ 68.861614] fs_reclaim_acquire+0x19/0x20
[ 68.866187] slab_pre_alloc_hook+0x19/0x70
[ 68.870847] __kmalloc_node+0x51/0x1f0
[ 68.875134] alloc_cpumask_var_node+0x1f/0x30
[ 68.880096] zalloc_cpumask_var+0x14/0x20
[ 68.884673] native_smp_prepare_cpus+0xd7/0x28e
[ 68.889829] kernel_init_freeable+0x136/0x2d0
[ 68.894791] kernel_init+0xe/0x110
[ 68.898690] ret_from_fork+0x27/0x50
[ 68.902776] irq event stamp: 53764
[ 68.906577] hardirqs last enabled at (53764): [<ffffffff8e4d1b37>]
slab_free_freelist_hook+0xd7/0x1c0
[ 68.916968] hardirqs last disabled at (53763): [<ffffffff8e4d1b18>]
slab_free_freelist_hook+0xb8/0x1c0
[ 68.927363] softirqs last enabled at (53752): [<ffffffff8f00035e>]
__do_softirq+0x35e/0x3fe
[ 68.936791] softirqs last disabled at (53757): [<ffffffff8e2a0465>]
run_ksoftirqd+0x35/0x50
[ 68.946115]
other info that might help us debug this:
[ 68.953403] Possible unsafe locking scenario:

[ 68.960004] CPU0
[ 68.962732] ----
[ 68.965463] lock(fs_reclaim);
[ 68.968963] <Interrupt>
[ 68.971888] lock(fs_reclaim);
[ 68.975585]
*** DEADLOCK ***

[ 68.982197] no locks held by ksoftirqd/56/299.
[ 68.987157]
stack backtrace:
[ 68.992025] CPU: 56 PID: 299 Comm: ksoftirqd/56 Tainted: G E
5.3.0-rc2-oopschk+ #62
[ 69.002026] Hardware name: AMD Corporation Diesel/Diesel, BIOS
RDL100BB 11/14/2018
[ 69.010480] Call Trace:
[ 69.013212] dump_stack+0x8e/0xd5
[ 69.016915] print_usage_bug+0x1dd/0x1f0
[ 69.021298] mark_lock+0x454/0x4f0
[ 69.025094] ? find_held_lock+0x35/0xa0
[ 69.029376] ? check_usage_backwards+0x180/0x180
[ 69.034526] __lock_acquire+0x586/0x1470
[ 69.038909] ? trace_hardirqs_off+0x3b/0xe0
[ 69.043584] lock_acquire+0xa3/0x180
[ 69.047577] ? fs_reclaim_acquire.part.101+0x5/0x30
[ 69.053023] fs_reclaim_acquire.part.101+0x29/0x30
[ 69.058373] ? fs_reclaim_acquire.part.101+0x5/0x30
[ 69.063819] fs_reclaim_acquire+0x19/0x20
[ 69.068299] slab_pre_alloc_hook+0x19/0x70
[ 69.072875] __kmalloc+0x4e/0x270
[ 69.076578] ? skcipher_walk_next+0x1c9/0x570
[ 69.081444] skcipher_walk_next+0x1c9/0x570
[ 69.086115] skcipher_walk_first+0x46/0x110
[ 69.090788] skcipher_walk_skcipher+0xb7/0xd0
[ 69.095656] skcipher_walk_virt+0x2b/0x60
[ 69.100145] cbc_decrypt+0x4d/0xe0 [aesni_intel]
[ 69.105310] crypto_skcipher_decrypt+0x1b/0x30
[ 69.110273] ? crypto_skcipher_decrypt+0x1b/0x30
[ 69.115431] simd_skcipher_decrypt+0xbe/0xd0 [crypto_simd]
[ 69.121557] crypto_skcipher_decrypt+0x1b/0x30
[ 69.126525] crypto_authenc_decrypt_tail.isra.2+0xee/0x110
[ 69.132654] authenc_verify_ahash_done+0x2e/0x40
[ 69.137812] ccp_crypto_complete+0xde/0x190
[ 69.142488] ccp_do_cmd_complete+0x27/0x40
[ 69.147067] tasklet_action_common.isra.21+0x81/0x1a0
[ 69.152709] tasklet_action+0x22/0x30
[ 69.156796] __do_softirq+0xcd/0x3fe
[ 69.160790] ? lockdep_hardirqs_on+0xfb/0x190
[ 69.165655] run_ksoftirqd+0x35/0x50
[ 69.169649] smpboot_thread_fn+0xfc/0x170
[ 69.174130] kthread+0x121/0x140
[ 69.177735] ? sort_range+0x30/0x30
[ 69.181630] ? kthread_park+0xb0/0xb0
[ 69.185720] ret_from_fork+0x27/0x50
[ 70.850617] alg: skcipher: failed to allocate transform for ecb(sm4): -2
[ 70.866302] alg: skcipher: failed to allocate transform for cbc(sm4): -2
[ 70.881646] alg: skcipher: failed to allocate transform for ctr(sm4): -2
[ 70.889161] tcrypt: one or more tests failed!


2019-09-10 17:10:45

by Gary R Hook

[permalink] [raw]
Subject: RE: Crypto driver callback problem

*ping*

Anyone?

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Gary R Hook
Sent: Thursday, August 15, 2019 5:32 PM
To: [email protected]
Subject: Crypto driver callback problem

Context:
We've run into a possible locking issue when using the AMD CCP device for crypto offload, with the new extra tests, and with lock debugging enabled.

Pre-reqs:
Enable the crypto self-tests and extra tests
Build the crypto testing module (tcrypt)
Enable the kernel hacking/lock debugging option
"prove locking correctness" (CONFIG_PROVE_LOCKING=y)

Problem:
After booting this kernel, and loading our ccp-crypto module, run the tests with "modprobe tcrypt" and we get the dmesg output shown following.

With some investigation we discovered that in the situation being reported here, the callback is executed as a tasklet, where we're not allowed to sleep. The called routine began its crypto operation with the MAY_SLEEP flag set, and allocates memory using GFP_KERNEL.

The test itself runs in a typical context, and may sleep; the callback, however, runs as a tasklet yet thinks it can sleep based on the CRYPTO_TFM_REQ_MAY_SLEEP flag set in the request.

Drivers for crypto are required (as we understand it) to use a tasklet approach for callbacks, which is needed (for one example) to support IPSec. (N.B. we did some poking around in other, similar crypto drivers, and all appear to be using tasklets.)

We're looking for the best approach to resolve this. Thoughts?

Gary

[ 58.815592] alg: skcipher: skipping comparison tests for ofb(aes)
because ofb(aes-generic) is unavailable
[ 59.105437] alg: skcipher: skipping comparison tests for cfb(aes)
because cfb(aes-generic) is unavailable
[ 64.024347] alg: hash: failed to allocate transform for streebog256: -2
[ 64.039851] alg: hash: failed to allocate transform for streebog512: -2
[ 67.698789] alg: hash: failed to allocate transform for
hmac(streebog256): -2
[ 67.720735] alg: hash: failed to allocate transform for
hmac(streebog512): -2

[ 68.803745] ================================
[ 68.808515] WARNING: inconsistent lock state
[ 68.813284] 5.3.0-rc2-oopschk+ #62 Tainted: G E
[ 68.819796] --------------------------------
[ 68.824565] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 68.831272] ksoftirqd/56/299 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 68.837496] 00000000334d2af0 (fs_reclaim){+.?.}, at:
fs_reclaim_acquire.part.101+0x5/0x30
[ 68.846637] {SOFTIRQ-ON-W} state was registered at:
[ 68.852078] lock_acquire+0xa3/0x180
[ 68.856167] fs_reclaim_acquire.part.101+0x29/0x30
[ 68.861614] fs_reclaim_acquire+0x19/0x20
[ 68.866187] slab_pre_alloc_hook+0x19/0x70
[ 68.870847] __kmalloc_node+0x51/0x1f0
[ 68.875134] alloc_cpumask_var_node+0x1f/0x30
[ 68.880096] zalloc_cpumask_var+0x14/0x20
[ 68.884673] native_smp_prepare_cpus+0xd7/0x28e
[ 68.889829] kernel_init_freeable+0x136/0x2d0
[ 68.894791] kernel_init+0xe/0x110
[ 68.898690] ret_from_fork+0x27/0x50
[ 68.902776] irq event stamp: 53764
[ 68.906577] hardirqs last enabled at (53764): [<ffffffff8e4d1b37>]
slab_free_freelist_hook+0xd7/0x1c0
[ 68.916968] hardirqs last disabled at (53763): [<ffffffff8e4d1b18>]
slab_free_freelist_hook+0xb8/0x1c0
[ 68.927363] softirqs last enabled at (53752): [<ffffffff8f00035e>]
__do_softirq+0x35e/0x3fe
[ 68.936791] softirqs last disabled at (53757): [<ffffffff8e2a0465>]
run_ksoftirqd+0x35/0x50
[ 68.946115]
other info that might help us debug this:
[ 68.953403] Possible unsafe locking scenario:

[ 68.960004] CPU0
[ 68.962732] ----
[ 68.965463] lock(fs_reclaim);
[ 68.968963] <Interrupt>
[ 68.971888] lock(fs_reclaim);
[ 68.975585]
*** DEADLOCK ***

[ 68.982197] no locks held by ksoftirqd/56/299.
[ 68.987157]
stack backtrace:
[ 68.992025] CPU: 56 PID: 299 Comm: ksoftirqd/56 Tainted: G E
5.3.0-rc2-oopschk+ #62
[ 69.002026] Hardware name: AMD Corporation Diesel/Diesel, BIOS
RDL100BB 11/14/2018
[ 69.010480] Call Trace:
[ 69.013212] dump_stack+0x8e/0xd5
[ 69.016915] print_usage_bug+0x1dd/0x1f0
[ 69.021298] mark_lock+0x454/0x4f0
[ 69.025094] ? find_held_lock+0x35/0xa0
[ 69.029376] ? check_usage_backwards+0x180/0x180
[ 69.034526] __lock_acquire+0x586/0x1470
[ 69.038909] ? trace_hardirqs_off+0x3b/0xe0
[ 69.043584] lock_acquire+0xa3/0x180
[ 69.047577] ? fs_reclaim_acquire.part.101+0x5/0x30
[ 69.053023] fs_reclaim_acquire.part.101+0x29/0x30
[ 69.058373] ? fs_reclaim_acquire.part.101+0x5/0x30
[ 69.063819] fs_reclaim_acquire+0x19/0x20
[ 69.068299] slab_pre_alloc_hook+0x19/0x70
[ 69.072875] __kmalloc+0x4e/0x270
[ 69.076578] ? skcipher_walk_next+0x1c9/0x570
[ 69.081444] skcipher_walk_next+0x1c9/0x570
[ 69.086115] skcipher_walk_first+0x46/0x110
[ 69.090788] skcipher_walk_skcipher+0xb7/0xd0
[ 69.095656] skcipher_walk_virt+0x2b/0x60
[ 69.100145] cbc_decrypt+0x4d/0xe0 [aesni_intel]
[ 69.105310] crypto_skcipher_decrypt+0x1b/0x30
[ 69.110273] ? crypto_skcipher_decrypt+0x1b/0x30
[ 69.115431] simd_skcipher_decrypt+0xbe/0xd0 [crypto_simd]
[ 69.121557] crypto_skcipher_decrypt+0x1b/0x30
[ 69.126525] crypto_authenc_decrypt_tail.isra.2+0xee/0x110
[ 69.132654] authenc_verify_ahash_done+0x2e/0x40
[ 69.137812] ccp_crypto_complete+0xde/0x190
[ 69.142488] ccp_do_cmd_complete+0x27/0x40
[ 69.147067] tasklet_action_common.isra.21+0x81/0x1a0
[ 69.152709] tasklet_action+0x22/0x30
[ 69.156796] __do_softirq+0xcd/0x3fe
[ 69.160790] ? lockdep_hardirqs_on+0xfb/0x190
[ 69.165655] run_ksoftirqd+0x35/0x50
[ 69.169649] smpboot_thread_fn+0xfc/0x170
[ 69.174130] kthread+0x121/0x140
[ 69.177735] ? sort_range+0x30/0x30
[ 69.181630] ? kthread_park+0xb0/0xb0
[ 69.185720] ret_from_fork+0x27/0x50
[ 70.850617] alg: skcipher: failed to allocate transform for ecb(sm4): -2
[ 70.866302] alg: skcipher: failed to allocate transform for cbc(sm4): -2
[ 70.881646] alg: skcipher: failed to allocate transform for ctr(sm4): -2
[ 70.889161] tcrypt: one or more tests failed!

2019-09-10 17:14:18

by Gary R Hook

[permalink] [raw]
Subject: Re: Crypto driver callback problem

And an apology for a ping using the wrong mail handler, and the top post...

Would love to know if this has gotten any cycles.

On 8/15/19 5:31 PM, Gary R Hook wrote:
> Context:
> We've run into a possible locking issue when using the AMD CCP device
> for crypto offload, with the new extra tests, and with lock debugging
> enabled.
>
> Pre-reqs:
> Enable the crypto self-tests and extra tests
> Build the crypto testing module (tcrypt)
> Enable the kernel hacking/lock debugging option
> "prove locking correctness" (CONFIG_PROVE_LOCKING=y)
>
> Problem:
> After booting this kernel, and loading our ccp-crypto module, run the
> tests with "modprobe tcrypt" and we get the dmesg output shown following.
>
> With some investigation we discovered that in the situation being
> reported here, the callback is executed as a tasklet, where we're not
> allowed to sleep. The called routine began its crypto operation with the
> MAY_SLEEP flag set, and allocates memory using GFP_KERNEL.
>
> The test itself runs in a typical context, and may sleep; the callback,
> however, runs as a tasklet yet thinks it can sleep based on the
> CRYPTO_TFM_REQ_MAY_SLEEP flag set in the request.
>
> Drivers for crypto are required (as we understand it) to use a tasklet
> approach for callbacks, which is needed (for one example) to support
> IPSec. (N.B. we did some poking around in other, similar crypto
> drivers, and all appear to be using tasklets.)
>
> We're looking for the best approach to resolve this. Thoughts?
>
> Gary
>
> [ 58.815592] alg: skcipher: skipping comparison tests for ofb(aes)
> because ofb(aes-generic) is unavailable
> [ 59.105437] alg: skcipher: skipping comparison tests for cfb(aes)
> because cfb(aes-generic) is unavailable
> [ 64.024347] alg: hash: failed to allocate transform for streebog256: -2
> [ 64.039851] alg: hash: failed to allocate transform for streebog512: -2
> [ 67.698789] alg: hash: failed to allocate transform for
> hmac(streebog256): -2
> [ 67.720735] alg: hash: failed to allocate transform for
> hmac(streebog512): -2
>
> [ 68.803745] ================================
> [ 68.808515] WARNING: inconsistent lock state
> [ 68.813284] 5.3.0-rc2-oopschk+ #62 Tainted: G E
> [ 68.819796] --------------------------------
> [ 68.824565] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [ 68.831272] ksoftirqd/56/299 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [ 68.837496] 00000000334d2af0 (fs_reclaim){+.?.}, at:
> fs_reclaim_acquire.part.101+0x5/0x30
> [ 68.846637] {SOFTIRQ-ON-W} state was registered at:
> [ 68.852078] lock_acquire+0xa3/0x180
> [ 68.856167] fs_reclaim_acquire.part.101+0x29/0x30
> [ 68.861614] fs_reclaim_acquire+0x19/0x20
> [ 68.866187] slab_pre_alloc_hook+0x19/0x70
> [ 68.870847] __kmalloc_node+0x51/0x1f0
> [ 68.875134] alloc_cpumask_var_node+0x1f/0x30
> [ 68.880096] zalloc_cpumask_var+0x14/0x20
> [ 68.884673] native_smp_prepare_cpus+0xd7/0x28e
> [ 68.889829] kernel_init_freeable+0x136/0x2d0
> [ 68.894791] kernel_init+0xe/0x110
> [ 68.898690] ret_from_fork+0x27/0x50
> [ 68.902776] irq event stamp: 53764
> [ 68.906577] hardirqs last enabled at (53764): [<ffffffff8e4d1b37>]
> slab_free_freelist_hook+0xd7/0x1c0
> [ 68.916968] hardirqs last disabled at (53763): [<ffffffff8e4d1b18>]
> slab_free_freelist_hook+0xb8/0x1c0
> [ 68.927363] softirqs last enabled at (53752): [<ffffffff8f00035e>]
> __do_softirq+0x35e/0x3fe
> [ 68.936791] softirqs last disabled at (53757): [<ffffffff8e2a0465>]
> run_ksoftirqd+0x35/0x50
> [ 68.946115]
> other info that might help us debug this:
> [ 68.953403] Possible unsafe locking scenario:
>
> [ 68.960004] CPU0
> [ 68.962732] ----
> [ 68.965463] lock(fs_reclaim);
> [ 68.968963] <Interrupt>
> [ 68.971888] lock(fs_reclaim);
> [ 68.975585]
> *** DEADLOCK ***
>
> [ 68.982197] no locks held by ksoftirqd/56/299.
> [ 68.987157]
> stack backtrace:
> [ 68.992025] CPU: 56 PID: 299 Comm: ksoftirqd/56 Tainted: G E
> 5.3.0-rc2-oopschk+ #62
> [ 69.002026] Hardware name: AMD Corporation Diesel/Diesel, BIOS
> RDL100BB 11/14/2018
> [ 69.010480] Call Trace:
> [ 69.013212] dump_stack+0x8e/0xd5
> [ 69.016915] print_usage_bug+0x1dd/0x1f0
> [ 69.021298] mark_lock+0x454/0x4f0
> [ 69.025094] ? find_held_lock+0x35/0xa0
> [ 69.029376] ? check_usage_backwards+0x180/0x180
> [ 69.034526] __lock_acquire+0x586/0x1470
> [ 69.038909] ? trace_hardirqs_off+0x3b/0xe0
> [ 69.043584] lock_acquire+0xa3/0x180
> [ 69.047577] ? fs_reclaim_acquire.part.101+0x5/0x30
> [ 69.053023] fs_reclaim_acquire.part.101+0x29/0x30
> [ 69.058373] ? fs_reclaim_acquire.part.101+0x5/0x30
> [ 69.063819] fs_reclaim_acquire+0x19/0x20
> [ 69.068299] slab_pre_alloc_hook+0x19/0x70
> [ 69.072875] __kmalloc+0x4e/0x270
> [ 69.076578] ? skcipher_walk_next+0x1c9/0x570
> [ 69.081444] skcipher_walk_next+0x1c9/0x570
> [ 69.086115] skcipher_walk_first+0x46/0x110
> [ 69.090788] skcipher_walk_skcipher+0xb7/0xd0
> [ 69.095656] skcipher_walk_virt+0x2b/0x60
> [ 69.100145] cbc_decrypt+0x4d/0xe0 [aesni_intel]
> [ 69.105310] crypto_skcipher_decrypt+0x1b/0x30
> [ 69.110273] ? crypto_skcipher_decrypt+0x1b/0x30
> [ 69.115431] simd_skcipher_decrypt+0xbe/0xd0 [crypto_simd]
> [ 69.121557] crypto_skcipher_decrypt+0x1b/0x30
> [ 69.126525] crypto_authenc_decrypt_tail.isra.2+0xee/0x110
> [ 69.132654] authenc_verify_ahash_done+0x2e/0x40
> [ 69.137812] ccp_crypto_complete+0xde/0x190
> [ 69.142488] ccp_do_cmd_complete+0x27/0x40
> [ 69.147067] tasklet_action_common.isra.21+0x81/0x1a0
> [ 69.152709] tasklet_action+0x22/0x30
> [ 69.156796] __do_softirq+0xcd/0x3fe
> [ 69.160790] ? lockdep_hardirqs_on+0xfb/0x190
> [ 69.165655] run_ksoftirqd+0x35/0x50
> [ 69.169649] smpboot_thread_fn+0xfc/0x170
> [ 69.174130] kthread+0x121/0x140
> [ 69.177735] ? sort_range+0x30/0x30
> [ 69.181630] ? kthread_park+0xb0/0xb0
> [ 69.185720] ret_from_fork+0x27/0x50
> [ 70.850617] alg: skcipher: failed to allocate transform for ecb(sm4): -2
> [ 70.866302] alg: skcipher: failed to allocate transform for cbc(sm4): -2
> [ 70.881646] alg: skcipher: failed to allocate transform for ctr(sm4): -2
> [ 70.889161] tcrypt: one or more tests failed!