2012-11-17 23:19:52

by Guru Shetty

[permalink] [raw]
Subject: crypto: Kernel crashes at crypto_alg_mod_lookup

Hello All,
I see kernel crashes in some of my x86_64 machines only during the
system boot up.
Specifically, it happens when one of the strongswan daemons is started.
I have seen it quite a few times in Linux 3.2 stable branch. I do not
have a good reproduction scenario yet.

The kernel backtrace is as follows.
CPU 5
[ 69.435004] Modules linked in: openvswitch(O) seqiv algif_skcipher
md4 algif_hash af_alg aesni_intel
cryptd aes_x86_64 xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp
esp4 ah4 deflate ctr twofish_gener
ic twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent
blowfish_generic blowfish_x86_64
blowfish_common cast5 des_generic xcbc rmd160 sha512_generic
crypto_null af_key psmouse ioatdma i7core_
edac serio_raw edac_core lp dca mac_hid parport usbhid hid bnx2x btrfs
e1000e megaraid_sas mdio zlib_de
flate libcrc32c
[ 69.436890]
[ 69.436927] Pid: 1293, comm: pluto Tainted: G O 3.2.30+
#5 iXsystems iX22X4-TTH6RF/X8DTT-H
[ 69.437076] RIP: 0010:[<ffffffff812d568f>] [<ffffffff812d568f>]
crypto_larval_kill+0x2f/0x90
[ 69.437163] RSP: 0018:ffff880be4033c38 EFLAGS: 00010292
[ 69.437206] RAX: dead000000200200 RBX: ffff880be3c93e00 RCX: 0000000000000159
[ 69.437252] RDX: ffff880be3c90a00 RSI: 0000000000016660 RDI: ffffffff81c62ec0
[ 69.437298] RBP: ffff880be4033c48 R08: ffffea002f8f2400 R09: ffffffff812d5c20
[ 69.437344] R10: 0000000000000000 R11: 0000000000000001 R12: fffffffffffffffe
[ 69.437390] R13: 0000000000000004 R14: ffff880be4033cd0 R15: ffff880be461dbc0
[ 69.437437] FS: 00007f1e97beb700(0000) GS:ffff880c0fca0000(0000)
knlGS:0000000000000000
[ 69.437499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 69.437557] CR2: 00007fab55f7e120 CR3: 0000000be46d8000 CR4: 00000000000006e0
[ 69.437616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 69.437676] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 69.437736] Process pluto (pid: 1293, threadinfo ffff880be4032000,
task ffff880be461dbc0)
[ 69.437811] Stack:
[ 69.437861] ffff880be3c93e00 fffffffffffffffe ffff880be4033c68
ffffffff812d5eeb
[ 69.438067] 0000000000000000 000000000000020c ffff880be4033cb8
ffffffff812d5f81
[ 69.438271] ffff880be4033cc8 ffff880be4033c88 dead000000100100
0000000000000000
[ 69.438477] Call Trace:
[ 69.438531] [<ffffffff812d5eeb>] crypto_alg_mod_lookup+0x6b/0x90
[ 69.438590] [<ffffffff812d5f81>] crypto_alloc_base+0x41/0xb0
[ 69.438651] [<ffffffffa02a8a58>] cryptd_alloc_ablkcipher+0x88/0xc0 [cryptd]
[ 69.438712] [<ffffffff812d5802>] ? __crypto_alloc_tfm+0x52/0x160
[ 69.438773] [<ffffffffa02b84c1>] ablk_rfc3686_ctr_init+0x21/0x40
[aesni_intel]
[ 69.438848] [<ffffffff812d5879>] __crypto_alloc_tfm+0xc9/0x160
[ 69.438907] [<ffffffff812d6fac>] crypto_spawn_tfm+0x4c/0x90
[ 69.438966] [<ffffffff8115df3c>] ? __kmalloc+0x12c/0x190
[ 69.439025] [<ffffffff812da33b>] skcipher_geniv_init+0x2b/0x50
[ 69.439083] [<ffffffffa02d605c>] seqiv_init+0x1c/0x20 [seqiv]
[ 69.439142] [<ffffffff812d5879>] __crypto_alloc_tfm+0xc9/0x160
[ 69.439201] [<ffffffff812d97eb>] crypto_alloc_ablkcipher+0x6b/0xc0
[ 69.439261] [<ffffffffa02d036e>] skcipher_bind+0xe/0x10 [algif_skcipher]
[ 69.439322] [<ffffffffa02c6426>] alg_bind+0x76/0x130 [af_alg]
[ 69.439382] [<ffffffff8151d1c4>] sys_bind+0xe4/0x100
[ 69.439439] [<ffffffff8151cee0>] ? sys_socket+0x40/0x70
[ 69.439498] [<ffffffff81644c82>] system_call_fastpath+0x16/0x1b
[ 69.439556] Code: 53 48 83 ec 08 66 66 66 66 90 48 89 fb 48 c7 c7
c0 2e c6 81 e8 33 67 36 00 48 8b 13 48 8b 43 08 48 c7 c7 c0 2e c6 81
48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 00 ad de 48 ba 00 02 20 00
00 00

The kernel backtrace is not always the same. But crypto_alg_mod_lookup
is common across different back traces.
Can someone please shed some light on where the problem could be? I
can try and reproduce
and provide more information, if I get some context on why it happens
only during system startup.

Thanks,
Guru.


2012-11-30 16:46:56

by Jussi Kivilinna

[permalink] [raw]
Subject: Re: crypto: Kernel crashes at crypto_alg_mod_lookup

Quoting Guru Shetty <[email protected]>:

> Hello All,
> I see kernel crashes in some of my x86_64 machines only during the
> system boot up.
> Specifically, it happens when one of the strongswan daemons is started.
> I have seen it quite a few times in Linux 3.2 stable branch. I do not
> have a good reproduction scenario yet.
>
> The kernel backtrace is as follows.
> CPU 5
> [ 69.435004] Modules linked in: openvswitch(O) seqiv algif_skcipher
> md4 algif_hash af_alg aesni_intel
> cryptd aes_x86_64 xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp
> esp4 ah4 deflate ctr twofish_gener
> ic twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent
> blowfish_generic blowfish_x86_64
> blowfish_common cast5 des_generic xcbc rmd160 sha512_generic
> crypto_null af_key psmouse ioatdma i7core_
> edac serio_raw edac_core lp dca mac_hid parport usbhid hid bnx2x btrfs
> e1000e megaraid_sas mdio zlib_de
> flate libcrc32c
> [ 69.436890]
> [ 69.436927] Pid: 1293, comm: pluto Tainted: G O 3.2.30+
> #5 iXsystems iX22X4-TTH6RF/X8DTT-H
> [ 69.437076] RIP: 0010:[<ffffffff812d568f>] [<ffffffff812d568f>]
> crypto_larval_kill+0x2f/0x90
> [ 69.437163] RSP: 0018:ffff880be4033c38 EFLAGS: 00010292
> [ 69.437206] RAX: dead000000200200 RBX: ffff880be3c93e00 RCX:
> 0000000000000159
> [ 69.437252] RDX: ffff880be3c90a00 RSI: 0000000000016660 RDI:
> ffffffff81c62ec0
> [ 69.437298] RBP: ffff880be4033c48 R08: ffffea002f8f2400 R09:
> ffffffff812d5c20
> [ 69.437344] R10: 0000000000000000 R11: 0000000000000001 R12:
> fffffffffffffffe
> [ 69.437390] R13: 0000000000000004 R14: ffff880be4033cd0 R15:
> ffff880be461dbc0
> [ 69.437437] FS: 00007f1e97beb700(0000) GS:ffff880c0fca0000(0000)
> knlGS:0000000000000000
> [ 69.437499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 69.437557] CR2: 00007fab55f7e120 CR3: 0000000be46d8000 CR4:
> 00000000000006e0
> [ 69.437616] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 69.437676] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 69.437736] Process pluto (pid: 1293, threadinfo ffff880be4032000,
> task ffff880be461dbc0)
> [ 69.437811] Stack:
> [ 69.437861] ffff880be3c93e00 fffffffffffffffe ffff880be4033c68
> ffffffff812d5eeb
> [ 69.438067] 0000000000000000 000000000000020c ffff880be4033cb8
> ffffffff812d5f81
> [ 69.438271] ffff880be4033cc8 ffff880be4033c88 dead000000100100
> 0000000000000000
> [ 69.438477] Call Trace:
> [ 69.438531] [<ffffffff812d5eeb>] crypto_alg_mod_lookup+0x6b/0x90
> [ 69.438590] [<ffffffff812d5f81>] crypto_alloc_base+0x41/0xb0
> [ 69.438651] [<ffffffffa02a8a58>]
> cryptd_alloc_ablkcipher+0x88/0xc0 [cryptd]
> [ 69.438712] [<ffffffff812d5802>] ? __crypto_alloc_tfm+0x52/0x160
> [ 69.438773] [<ffffffffa02b84c1>] ablk_rfc3686_ctr_init+0x21/0x40
> [aesni_intel]
> [ 69.438848] [<ffffffff812d5879>] __crypto_alloc_tfm+0xc9/0x160
> [ 69.438907] [<ffffffff812d6fac>] crypto_spawn_tfm+0x4c/0x90
> [ 69.438966] [<ffffffff8115df3c>] ? __kmalloc+0x12c/0x190
> [ 69.439025] [<ffffffff812da33b>] skcipher_geniv_init+0x2b/0x50
> [ 69.439083] [<ffffffffa02d605c>] seqiv_init+0x1c/0x20 [seqiv]
> [ 69.439142] [<ffffffff812d5879>] __crypto_alloc_tfm+0xc9/0x160
> [ 69.439201] [<ffffffff812d97eb>] crypto_alloc_ablkcipher+0x6b/0xc0
> [ 69.439261] [<ffffffffa02d036e>] skcipher_bind+0xe/0x10 [algif_skcipher]
> [ 69.439322] [<ffffffffa02c6426>] alg_bind+0x76/0x130 [af_alg]
> [ 69.439382] [<ffffffff8151d1c4>] sys_bind+0xe4/0x100
> [ 69.439439] [<ffffffff8151cee0>] ? sys_socket+0x40/0x70
> [ 69.439498] [<ffffffff81644c82>] system_call_fastpath+0x16/0x1b
> [ 69.439556] Code: 53 48 83 ec 08 66 66 66 66 90 48 89 fb 48 c7 c7
> c0 2e c6 81 e8 33 67 36 00 48 8b 13 48 8b 43 08 48 c7 c7 c0 2e c6 81
> 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 00 ad de 48 ba 00 02 20 00
> 00 00

Looks like that crash happens in crypto_larval_kill() at
'list_del(&alg->cra_list);' because list item has already being
removed from list (oops happens because of writing to address
'0xdead000000200200', that is value for LIST_POISON2).

-Jussi

>
> The kernel backtrace is not always the same. But crypto_alg_mod_lookup
> is common across different back traces.
> Can someone please shed some light on where the problem could be? I
> can try and reproduce
> and provide more information, if I get some context on why it happens
> only during system startup.
>
> Thanks,
> Guru.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2012-12-04 01:23:41

by Guru Shetty

[permalink] [raw]
Subject: Re: crypto: Kernel crashes at crypto_alg_mod_lookup

On 30 November 2012 08:46, Jussi Kivilinna <[email protected]> wrote:
> Quoting Guru Shetty <[email protected]>:
>
>> Hello All,
>> I see kernel crashes in some of my x86_64 machines only during the
>> system boot up.
>> Specifically, it happens when one of the strongswan daemons is started.
>> I have seen it quite a few times in Linux 3.2 stable branch. I do not
>> have a good reproduction scenario yet.
>>
>> The kernel backtrace is as follows.
>> CPU 5
>> [ 69.435004] Modules linked in: openvswitch(O) seqiv algif_skcipher
>> md4 algif_hash af_alg aesni_intel
>> cryptd aes_x86_64 xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp
>> esp4 ah4 deflate ctr twofish_gener
>> ic twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent
>> blowfish_generic blowfish_x86_64
>> blowfish_common cast5 des_generic xcbc rmd160 sha512_generic
>> crypto_null af_key psmouse ioatdma i7core_
>> edac serio_raw edac_core lp dca mac_hid parport usbhid hid bnx2x btrfs
>> e1000e megaraid_sas mdio zlib_de
>> flate libcrc32c
>> [ 69.436890]
>> [ 69.436927] Pid: 1293, comm: pluto Tainted: G O 3.2.30+
>> #5 iXsystems iX22X4-TTH6RF/X8DTT-H
>> [ 69.437076] RIP: 0010:[<ffffffff812d568f>] [<ffffffff812d568f>]
>> crypto_larval_kill+0x2f/0x90
>> [ 69.437163] RSP: 0018:ffff880be4033c38 EFLAGS: 00010292
>> [ 69.437206] RAX: dead000000200200 RBX: ffff880be3c93e00 RCX:
>> 0000000000000159
>> [ 69.437252] RDX: ffff880be3c90a00 RSI: 0000000000016660 RDI:
>> ffffffff81c62ec0
>> [ 69.437298] RBP: ffff880be4033c48 R08: ffffea002f8f2400 R09:
>> ffffffff812d5c20
>> [ 69.437344] R10: 0000000000000000 R11: 0000000000000001 R12:
>> fffffffffffffffe
>> [ 69.437390] R13: 0000000000000004 R14: ffff880be4033cd0 R15:
>> ffff880be461dbc0
>> [ 69.437437] FS: 00007f1e97beb700(0000) GS:ffff880c0fca0000(0000)
>> knlGS:0000000000000000
>> [ 69.437499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 69.437557] CR2: 00007fab55f7e120 CR3: 0000000be46d8000 CR4:
>> 00000000000006e0
>> [ 69.437616] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [ 69.437676] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [ 69.437736] Process pluto (pid: 1293, threadinfo ffff880be4032000,
>> task ffff880be461dbc0)
>> [ 69.437811] Stack:
>> [ 69.437861] ffff880be3c93e00 fffffffffffffffe ffff880be4033c68
>> ffffffff812d5eeb
>> [ 69.438067] 0000000000000000 000000000000020c ffff880be4033cb8
>> ffffffff812d5f81
>> [ 69.438271] ffff880be4033cc8 ffff880be4033c88 dead000000100100
>> 0000000000000000
>> [ 69.438477] Call Trace:
>> [ 69.438531] [<ffffffff812d5eeb>] crypto_alg_mod_lookup+0x6b/0x90
>> [ 69.438590] [<ffffffff812d5f81>] crypto_alloc_base+0x41/0xb0
>> [ 69.438651] [<ffffffffa02a8a58>] cryptd_alloc_ablkcipher+0x88/0xc0
>> [cryptd]
>> [ 69.438712] [<ffffffff812d5802>] ? __crypto_alloc_tfm+0x52/0x160
>> [ 69.438773] [<ffffffffa02b84c1>] ablk_rfc3686_ctr_init+0x21/0x40
>> [aesni_intel]
>> [ 69.438848] [<ffffffff812d5879>] __crypto_alloc_tfm+0xc9/0x160
>> [ 69.438907] [<ffffffff812d6fac>] crypto_spawn_tfm+0x4c/0x90
>> [ 69.438966] [<ffffffff8115df3c>] ? __kmalloc+0x12c/0x190
>> [ 69.439025] [<ffffffff812da33b>] skcipher_geniv_init+0x2b/0x50
>> [ 69.439083] [<ffffffffa02d605c>] seqiv_init+0x1c/0x20 [seqiv]
>> [ 69.439142] [<ffffffff812d5879>] __crypto_alloc_tfm+0xc9/0x160
>> [ 69.439201] [<ffffffff812d97eb>] crypto_alloc_ablkcipher+0x6b/0xc0
>> [ 69.439261] [<ffffffffa02d036e>] skcipher_bind+0xe/0x10
>> [algif_skcipher]
>> [ 69.439322] [<ffffffffa02c6426>] alg_bind+0x76/0x130 [af_alg]
>> [ 69.439382] [<ffffffff8151d1c4>] sys_bind+0xe4/0x100
>> [ 69.439439] [<ffffffff8151cee0>] ? sys_socket+0x40/0x70
>> [ 69.439498] [<ffffffff81644c82>] system_call_fastpath+0x16/0x1b
>> [ 69.439556] Code: 53 48 83 ec 08 66 66 66 66 90 48 89 fb 48 c7 c7
>> c0 2e c6 81 e8 33 67 36 00 48 8b 13 48 8b 43 08 48 c7 c7 c0 2e c6 81
>> 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 00 ad de 48 ba 00 02 20 00
>> 00 00
>
>
> Looks like that crash happens in crypto_larval_kill() at
> 'list_del(&alg->cra_list);' because list item has already being removed from
> list (oops happens because of writing to address '0xdead000000200200', that
> is value for LIST_POISON2).

Thanks for looking into this. I reproduced it a few more times.

When the machine boots up and the strongswan daemon is started, it
triggers the loading of the
following kernel modules.

insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/ah4.ko
insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/esp4.ko
insmod /lib/modules/3.2.14-server-nn19+/kernel/net/xfrm/xfrm_ipcomp.ko
insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/ipcomp.ko
insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/tunnel4.ko
insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/xfrm4_tunnel.ko
insmod /lib/modules/3.2.14-server-nn19+/kernel/net/xfrm/xfrm_user.ko

After the last kernel module is loaded, the backtrace is printed on
the screen. It could be that
the kernel modules are all loaded fine and it is a system call that
triggers this bug as shown in
the trace.

Does adding any printk messages and doing a few reboots help
in providing any more information?


>
> -Jussi
>
>>
>> The kernel backtrace is not always the same. But crypto_alg_mod_lookup
>> is common across different back traces.
>> Can someone please shed some light on where the problem could be? I
>> can try and reproduce
>> and provide more information, if I get some context on why it happens
>> only during system startup.
>>
>> Thanks,
>> Guru.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>
>
>

2012-12-09 20:09:34

by Jussi Kivilinna

[permalink] [raw]
Subject: Re: crypto: Kernel crashes at crypto_alg_mod_lookup

Quoting Guru Shetty <[email protected]>:

>
> Thanks for looking into this. I reproduced it a few more times.
>
> When the machine boots up and the strongswan daemon is started, it
> triggers the loading of the
> following kernel modules.
>
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/ah4.ko
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/esp4.ko
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/xfrm/xfrm_ipcomp.ko
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/ipcomp.ko
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/tunnel4.ko
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/ipv4/xfrm4_tunnel.ko
> insmod /lib/modules/3.2.14-server-nn19+/kernel/net/xfrm/xfrm_user.ko
>
> After the last kernel module is loaded, the backtrace is printed on
> the screen. It could be that
> the kernel modules are all loaded fine and it is a system call that
> triggers this bug as shown in
> the trace.
>
> Does adding any printk messages and doing a few reboots help
> in providing any more information?
>

I think modules are loading fine. Problem might be related to nested
crypto allocations with aesni-intel. You could try kernel with
"CONFIG_DEBUG_LIST=y" config setting, for more output.

-Jussi