2023-03-09 14:55:35

by Toke Høiland-Jørgensen

Subject: Hitting BUG_ON in crypto_unregister_alg() on reboot with caamalg_qi2 driver

Hi folks

I'm hitting what appears to be a deliberate BUG_ON() in
crypto_unregister_alg() when rebooting my traverse ten64 device on a
6.2.2 kernel (using the Arch linux-aarch64 build, which is basically an
upstream kernel).

Any idea what might be causing this? It does not appear on an older
kernel (5.17, which is the newest kernel that works reliably, for
unrelated reasons).

-Toke

[ 188.329145] kernel BUG at crypto/algapi.c:496!
[ 188.333588] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 188.340378] Modules linked in: 8021q garp mrp tun bridge stp llc wireguard libchacha20poly1305 ip6_udp_tunnel udp_tunnel libcurve25519_generic cfg80211 rfkill nft_nat nft_chain_nat nf_nat nft_reject_inet nf_reject_ipv6 nft_reject nft_limit nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink caam_jr dpaa2_caam caamhash_desc caamalg_desc authenc libdes fsl_dpaa2_eth pcs_lynx phylink caam error xgmac_mdio sp805_wdt qoriq_cpufreq rtc_fsl_ftm_alarm dpaa2_console pci_endpoint_test loop fuse gpio_keys
[ 188.385875] CPU: 0 PID: 1 Comm: shutdown Tainted: G W 6.2.2-1-aarch64-ARCH #1
[ 188.394404] Hardware name: traverse ten64/ten64, BIOS 2020.07-rc1-g488778dc 11/22/2021
[ 188.402324] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 188.409289] pc : crypto_unregister_alg+0x104/0x110
[ 188.414087] lr : crypto_unregister_alg+0x94/0x110
[ 188.418793] sp : ffff80000aa2b740
[ 188.422104] x29: ffff80000aa2b740 x28: ffff008000252dc0 x27: 0000000000000000
[ 188.429248] x26: 0000000000000000 x25: ffff80000156a700 x24: dead000000000100
[ 188.436390] x23: dead000000000122 x22: ffff008024517800 x21: ffff80000aa2b778
[ 188.443533] x20: ffff80000a5866b0 x19: ffff008004772980 x18: ffffffffffffffff
[ 188.450676] x17: 7270642f636d2d6c x16: 73662e3030303030 x15: ffffffffffffffff
[ 188.457818] x14: 0000000000000004 x13: ffff008000165b10 x12: 0000000000000000
[ 188.464960] x11: ffff008002027700 x10: ffff0080020276c0 x9 : ffff008000165b10
[ 188.472104] x8 : ffff80000aa2b5f8 x7 : 0000000000000000 x6 : 0000000000000000
[ 188.479245] x5 : 0000000000200017 x4 : ffff008004772980 x3 : ffff80000aa2b728
[ 188.486387] x2 : 0000000000000001 x1 : ffff008000252dc0 x0 : 0000000000000002
[ 188.493530] Call trace:
[ 188.495971] crypto_unregister_alg+0x104/0x110
[ 188.500417] crypto_unregister_ahash+0x14/0x20
[ 188.504863] dpaa2_caam_remove+0xec/0x234 [dpaa2_caam]
[ 188.510015] fsl_mc_driver_remove+0x24/0x60
[ 188.514198] device_remove+0x70/0x80
[ 188.517776] device_release_driver_internal+0x1e4/0x250
[ 188.523004] device_links_unbind_consumers+0xd8/0x100
[ 188.528057] device_release_driver_internal+0xe4/0x250
[ 188.533197] device_links_unbind_consumers+0xd8/0x100
[ 188.538249] device_release_driver_internal+0x12c/0x250
[ 188.543477] device_release_driver+0x18/0x24
[ 188.547748] bus_remove_device+0xd0/0x15c
[ 188.551759] device_del+0x174/0x3a0
[ 188.555246] fsl_mc_device_remove+0x28/0x40
[ 188.559429] __fsl_mc_device_remove+0x10/0x20
[ 188.563785] device_for_each_child+0x5c/0xac
[ 188.568054] dprc_remove+0x94/0xbc
[ 188.571455] fsl_mc_driver_remove+0x24/0x60
[ 188.575637] device_remove+0x70/0x80
[ 188.579213] device_release_driver_internal+0x1e4/0x250
[ 188.584440] device_release_driver+0x18/0x24
[ 188.588711] bus_remove_device+0xd0/0x15c
[ 188.592721] device_del+0x174/0x3a0
[ 188.596210] fsl_mc_bus_remove+0x88/0x100
[ 188.600218] fsl_mc_bus_shutdown+0x10/0x20
[ 188.604314] platform_shutdown+0x24/0x34
[ 188.608235] device_shutdown+0x11c/0x220
[ 188.612158] kernel_restart+0x40/0xac
[ 188.615821] __do_sys_reboot+0x1e0/0x264
[ 188.619743] __arm64_sys_reboot+0x24/0x30
[ 188.623753] invoke_syscall+0x48/0x11c
[ 188.627505] el0_svc_common.constprop.0+0x44/0xf0
[ 188.632211] do_el0_svc+0x2c/0x40
[ 188.635526] el0_svc+0x2c/0x84
[ 188.638581] el0t_64_sync_handler+0xf4/0x120
[ 188.642852] el0t_64_sync+0x190/0x194
[ 188.646517] Code: 9129c000 942f316c d4210000 17ffffee (d4210000)
[ 188.652613] ---[ end trace 0000000000000000 ]---




2023-03-10 04:24:22

by Herbert Xu

Subject: Re: Hitting BUG_ON in crypto_unregister_alg() on reboot with caamalg_qi2 driver

On Thu, Mar 09, 2023 at 03:51:22PM +0100, Toke Høiland-Jørgensen wrote:
> Hi folks
>
> I'm hitting what appears to be a deliberate BUG_ON() in
> crypto_unregister_alg() when rebooting my traverse ten64 device on a
> 6.2.2 kernel (using the Arch linux-aarch64 build, which is basically an
> upstream kernel).
>
> Any idea what might be causing this? It does not appear on an older
> kernel (5.17, which is the newest kernel that works reliably, for
> unrelated reasons).

On the face of it this looks like a generic issue with drivers
and the Crypto API. Historically crypto modules weren't meant
to be removed/unregistered until the last user has freed the tfm.

Obviously with drivers that start unregistering the algorithms when
the hardware goes away this paradigm breaks. What should happen is
that the driver continues to hold onto the crypto algorithm registration
even when the hardware has gone away.

Some work has to be done in the driver to actually make this safe
(all the drivers I've looked at are broken in this way).

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
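
The lifetime rule Herbert describes above can be pictured with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel's code: struct model_alg and the model_* helpers are hypothetical
names made up for illustration, and the model assumes one reference held by
the registration itself plus one per live tfm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for an algorithm registration. */
struct model_alg {
    int refcnt;       /* assumed: 1 for the registration, +1 per live tfm */
    bool registered;
};

/* Registering takes the first reference on behalf of the registry. */
void model_register(struct model_alg *alg)
{
    assert(alg != NULL);
    alg->refcnt = 1;
    alg->registered = true;
}

/* A user allocating a tfm pins the algorithm with an extra reference. */
bool model_alloc_tfm(struct model_alg *alg)
{
    if (!alg->registered)
        return false;
    alg->refcnt++;
    return true;
}

/* Freeing the tfm drops the user's reference again. */
void model_free_tfm(struct model_alg *alg)
{
    assert(alg->refcnt > 1);
    alg->refcnt--;
}

/* Unregistering is only safe once every tfm has been freed. */
bool model_can_unregister(const struct model_alg *alg)
{
    return alg->registered && alg->refcnt == 1;
}
```

In this model, a driver whose remove callback runs while a tfm is still
allocated finds model_can_unregister() returning false, which corresponds to
the situation in the report where the hardware goes away before the last
user has let go.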

2023-03-10 13:38:53

by Toke Høiland-Jørgensen

Subject: Re: Hitting BUG_ON in crypto_unregister_alg() on reboot with caamalg_qi2 driver

Herbert Xu <[email protected]> writes:

> On Thu, Mar 09, 2023 at 03:51:22PM +0100, Toke Høiland-Jørgensen wrote:
>> Hi folks
>>
>> I'm hitting what appears to be a deliberate BUG_ON() in
>> crypto_unregister_alg() when rebooting my traverse ten64 device on a
>> 6.2.2 kernel (using the Arch linux-aarch64 build, which is basically an
>> upstream kernel).
>>
>> Any idea what might be causing this? It does not appear on an older
>> kernel (5.17, which is the newest kernel that works reliably, for
>> unrelated reasons).
>
> On the face of it this looks like a generic issue with drivers
> and the Crypto API. Historically crypto modules weren't meant
> to be removed/unregistered until the last user has freed the tfm.
>
> Obviously with drivers that start unregistering the algorithms when
> the hardware goes away this paradigm breaks. What should happen is
> that the driver continues to hold onto the crypto algorithm registration
> even when the hardware has gone away.
>
> Some work has to be done in the driver to actually make this safe
> (all the drivers I've looked at are broken in this way).

Hmm, okay; any idea why this started happening with the newer kernel
version? I don't see any changes to the driver that could have caused
this; so is it some core-kernel change that has changed the
order of driver removal on shutdown or something?

Also, absent a fixed driver (which doesn't sound like it's a trivial
fix?), how do I prevent the system from crashing on shutdown? The
BUG_ON() seems a bit heavy-handed, could it be replaced with a WARN_ON?

-Toke
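
The difference between the two kinds of check asked about above can be
sketched in user space. This is a minimal model under stated assumptions,
not the kernel implementation and not the eventual patch: model_warn_on(),
struct model_reg and model_unregister_warn() are hypothetical names, and
the choice to leave the registration in place when users remain is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a warn-style check: report the condition
 * and hand it back to the caller instead of stopping the program.
 */
static bool model_warn_on(bool cond, const char *what)
{
    if (cond)
        fprintf(stderr, "WARNING: %s\n", what);
    return cond;
}

/* Hypothetical registration record; refcnt == 1 means "no users left". */
struct model_reg {
    int refcnt;
    bool registered;
};

/*
 * Returns true if the registration was torn down, false if it was left
 * alone because something still holds a reference to it.
 */
bool model_unregister_warn(struct model_reg *reg)
{
    assert(reg != NULL);
    if (model_warn_on(reg->refcnt != 1, "algorithm still in use at unregister"))
        return false;
    reg->registered = false;
    return true;
}
```

With a fatal check the first branch would end execution on the spot, which
during a reboot means the shutdown path never completes. With the
warn-style check the caller gets a log line and carries on, at the cost of
skipping the teardown for that registration.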


2023-03-11 08:04:18

by Herbert Xu

Subject: Re: Hitting BUG_ON in crypto_unregister_alg() on reboot with caamalg_qi2 driver

On Fri, Mar 10, 2023 at 02:37:57PM +0100, Toke Høiland-Jørgensen wrote:
>
> Also, absent a fixed driver (which doesn't sound like it's a trivial
> fix?), how do I prevent the system from crashing on shutdown? The
> BUG_ON() seems a bit heavy-handed, could it be replaced with a WARN_ON?

Yes I think that's probably OK. Could you please send a patch?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2023-03-11 16:22:20

by Toke Høiland-Jørgensen

Subject: Re: Hitting BUG_ON in crypto_unregister_alg() on reboot with caamalg_qi2 driver

Herbert Xu <[email protected]> writes:

> On Fri, Mar 10, 2023 at 02:37:57PM +0100, Toke Høiland-Jørgensen wrote:
>>
>> Also, absent a fixed driver (which doesn't sound like it's a trivial
>> fix?), how do I prevent the system from crashing on shutdown? The
>> BUG_ON() seems a bit heavy-handed, could it be replaced with a WARN_ON?
>
> Yes I think that's probably OK. Could you please send a patch?

Sure, can do! :)

-Toke