2011-08-09 18:55:07

by Josh Boyer

Subject: Cryptomgr race vs built-in aesni

Fedora has had a bug[1] open for a while with people seeing this upon boot:

[ 0.807387] alg: skcipher: Failed to load transform for ecb-aes-aesni: -2

We're still seeing it with the 3.0 kernel, so I poked at it today.

We have the crypto manager built in to the kernel, as well as the
AES_NI_INTEL module. The tests are not disabled, as that would disable
FIPS and apparently Fedora wants that on. (I have no idea why.)

I instrumented that module and the place where the error is being spit out,
and it seems as if cryptomgr is racing against itself and trying to request
an algorithm that is still being registered. The instrumented printks are
below (the aesni printks are of the form <__func__>: <__LINE__> <whatever>).

[ 0.805053] aesni_init: 1275 registering ablk_ecb_alg
[ 0.807387] alg: skcipher: Failed to load transform for ecb-aes-aesni: -2
[ 0.807441] Pid: 36, comm: cryptomgr_test Not tainted 2.6.40-4.fc15.x86_64 #6
[ 0.807443] Call Trace:
[ 0.807450] [<ffffffff81215df6>] alg_test_skcipher+0x48/0xa3
[ 0.807453] [<ffffffff812160a9>] ? alg_find_test+0x3a/0x5d
[ 0.807456] [<ffffffff8121628c>] alg_test+0x1c0/0x277
[ 0.807459] [<ffffffff814b58c3>] ? schedule+0x690/0x6be
[ 0.807462] [<ffffffff81213d86>] ? cryptomgr_probe+0xca/0xca
[ 0.807465] [<ffffffff81213daf>] cryptomgr_test+0x29/0x44
[ 0.807468] [<ffffffff8106fd2b>] kthread+0x84/0x8c
[ 0.807471] [<ffffffff814be924>] kernel_thread_helper+0x4/0x10
[ 0.807473] [<ffffffff8106fca7>] ? kthread_worker_fn+0x148/0x148
[ 0.807475] [<ffffffff814be920>] ? gs_change+0x13/0x13
[ 0.807482] aesni_init: 1278 err: 0
[ 0.807627] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[ 0.807768] aesni_init: 1307 err: 0

So it seems that the aesni module is trying to register the ecb(aes) alg
and before that completes (or something?) the test gets scheduled and
tries to do a CRYPTO_MSG_ALG_REQUEST on something that hasn't
finished its module_init function yet. Eventually the aesni_init function
completes successfully (the last printk), so I'm assuming that the
module is still present but that particular algorithm is listed as unavailable.

My understanding of the crypto layer and its use of kthreads to schedule
the self-tests is pretty limited so I might have misinterpreted things. I'd
appreciate it if someone could look this over and give me any thoughts
that might come to mind.

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=721002


2011-08-11 17:53:14

by Josh Boyer

Subject: Re: Cryptomgr race vs built-in aesni

> Fedora has had a bug[1] open for a while with people seeing this upon boot:

> [ 0.807387] alg: skcipher: Failed to load transform for ecb-aes-aesni: -2

> We're still seeing it with the 3.0 kernel, so I poked at it today.

<snip>

> So it seems that the aesni module is trying to register the ecb(aes) alg
> and before that completes (or something?) the test gets scheduled and
> tries to do a CRYPTO_MSG_ALG_REQUEST on something that hasn't
> finished its module_init function yet. Eventually the aesni_init function
> completes successfully (the last printk), so I'm assuming that the
> module is still present but that particular algorithm is listed as unavailable.

Ok, so it's not what I thought it was. I did some more investigation today,
and it seems that the ablk_ecb_alg is missing the ivsize setting and the
crypto stack doesn't like that on a CRYPTO_ALG_TYPE_BLKCIPHER type. I added
ivsize = AES_BLOCK_SIZE, and the error message goes away entirely. This also
matches how ablk_cbc_alg works now, so I'm somewhat guessing this is correct.

My understanding of the crypto layer is at best non-existent, so if I'm wrong
please shout. I'll send a patch shortly.

josh