Reportedly on Linux 4.12 the LTP testsuite crashes at pcrypt_aead01 infrequently.
To get it reproduce more frequently I tried
n=0 ; while true ; do /opt/ltp/testcases/bin/pcrypt_aead01 >& /dev/null ; n=$(expr $n + 1) ; echo -ne $n\\r ; done
but this is quite stable. However, holding ^C in the terminal where the loop is running tends to trigger the crash.
The backtrace is:
[ 100.615804] Unable to handle kernel paging request for data at address 0x00000000
[ 100.615876] Faulting instruction address: 0xc000000000520e7c
[ 100.615943] Oops: Kernel access of bad area, sig: 11 [#1]
[ 100.616001] SMP NR_CPUS=2048
[ 100.616002] NUMA
[ 100.616030] pSeries
[ 100.616054] Modules linked in: authenc pcrypt crypto_user kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink ip_tables x_tables af_packet rtc_generic vmx_crypto ibmveth(X) gf128mul btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[ 100.616478] Supported: Yes, External
[ 100.616509] CPU: 5 PID: 6270 Comm: pcrypt_aead01 Tainted: G 4.12.14-150.22-default #1 SLE15
[ 100.616632] task: c000000595084d80 task.stack: c0000005be6dc000
[ 100.616708] NIP: c000000000520e7c LR: c000000000521e3c CTR: c000000000521de0
[ 100.616801] REGS: c0000005be6df620 TRAP: 0300 Tainted: G (4.12.14-150.22-default)
[ 100.616906] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 100.616912] CR: 24002844 XER: 20040000
[ 100.617003] CFAR: c000000000008860 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
GPR00: c0000005a331f810 c0000005be6df8a0 c00000000119aa00 c0000005a331f800
GPR04: c0000005be6df930 c0000005be6df8c0 c0000005be6df8d0 0000000000000000
GPR08: 7269632929290000 c0000005a331f800 0000000000000000 0000000000000000
GPR12: c000000000521de0 c000000007a33700 00000001271a0ee0 00007fffcb9e7bb8
GPR16: 00000001271c2d80 00000001271c2d88 00007fffcb9e7a50 00007fffcb9e7a44
GPR20: 00007fffcb9e7a98 00007fffcb9e7a60 0000000000000010 0000000000000010
GPR24: 0000000000000000 0000000000000000 fffffffffffff000 c0000005be6dfaf0
GPR28: c0000005b9929d00 0000000000000c93 c0000005be6df930 c0000005be6df8e0
[ 100.617774] NIP [c000000000520e7c] crypto_remove_spawns+0x6c/0x2e0
[ 100.617816] LR [c000000000521e3c] crypto_unregister_instance+0x5c/0xa0
[ 100.617881] Call Trace:
[ 100.617903] [c0000005be6df8a0] [c0000005b9929d00] 0xc0000005b9929d00 (unreliable)
[ 100.617971] [c0000005be6df910] [0000000000000000] (null)
[ 100.618021] [c0000005be6df960] [d0000000098d0894] crypto_del_alg+0xdc/0x110 [crypto_user]
[ 100.618119] [c0000005be6df990] [d0000000098d0b58] crypto_user_rcv_msg+0xe0/0x260 [crypto_user]
[ 100.618222] [c0000005be6dfa30] [c00000000086d678] netlink_rcv_skb+0x78/0x170
[ 100.618309] [c0000005be6dfaa0] [d0000000098d0064] crypto_netlink_rcv+0x4c/0x80 [crypto_user]
[ 100.618407] [c0000005be6dfad0] [c00000000086cb98] netlink_unicast+0x208/0x2f0
[ 100.618488] [c0000005be6dfb40] [c00000000086d170] netlink_sendmsg+0x380/0x440
[ 100.618582] [c0000005be6dfbd0] [c0000000007e9ba4] sock_sendmsg+0x64/0x90
[ 100.618650] [c0000005be6dfc00] [c0000000007eb94c] ___sys_sendmsg+0x2cc/0x330
[ 100.618710] [c0000005be6dfd90] [c0000000007ed02c] __sys_sendmsg+0x5c/0xc0
[ 100.618766] [c0000005be6dfe30] [c00000000000b188] system_call+0x3c/0x130
[ 100.618822] Instruction dump:
[ 100.618839] e9430010 83a90020 38a10020 fbe10040 fbe10048 f8c10030 f8c10038 f8a10020
[ 100.618902] f8a10028 38030010 7fa05040 7d475378 <e90a0000> 419e0064 60000000 60000000
[ 100.618980] ---[ end trace 60475621348ca387 ]---
The code looks like this:
0xc000000000520e10 <+0>: c8 00 4c 3c addis r2,r12,200
0xc000000000520e14 <+4>: f0 9b 42 38 addi r2,r2,-25616
0xc000000000520e18 <+8>: a6 02 08 7c mflr r0
0xc000000000520e1c <+12>: 00 00 00 60 nop
0xc000000000520e20 <+16>: 79 2b ab 7c mr. r11,r5
0xc000000000520e24 <+20>: f0 ff c1 fb std r30,-16(r1)
0xc000000000520e28 <+24>: e8 ff a1 fb std r29,-24(r1)
0xc000000000520e2c <+28>: f8 ff e1 fb std r31,-8(r1)
0xc000000000520e30 <+32>: 91 ff 21 f8 stdu r1,-112(r1)
0xc000000000520e34 <+36>: 78 1b 69 7c mr r9,r3
0xc000000000520e38 <+40>: 78 23 9e 7c mr r30,r4
0xc000000000520e3c <+44>: 08 00 82 41 beq 0xc000000000520e44 <crypto_remove_spawns+52>
0xc000000000520e40 <+48>: 78 5b 69 7d mr r9,r11
0xc000000000520e44 <+52>: 40 00 e1 3b addi r31,r1,64
0xc000000000520e48 <+56>: 30 00 c1 38 addi r6,r1,48
# 0xc000000000520e4c <+60>: 10 00 43 e9 ld r10,16(r3)
0xc000000000520e50 <+64>: 20 00 a9 83 lwz r29,32(r9)
0xc000000000520e54 <+68>: 20 00 a1 38 addi r5,r1,32
0xc000000000520e58 <+72>: 40 00 e1 fb std r31,64(r1)
0xc000000000520e5c <+76>: 48 00 e1 fb std r31,72(r1)
0xc000000000520e60 <+80>: 30 00 c1 f8 std r6,48(r1)
0xc000000000520e64 <+84>: 38 00 c1 f8 std r6,56(r1)
0xc000000000520e68 <+88>: 20 00 a1 f8 std r5,32(r1)
0xc000000000520e6c <+92>: 28 00 a1 f8 std r5,40(r1)
0xc000000000520e70 <+96>: 10 00 03 38 addi r0,r3,16
& 0xc000000000520e74 <+100>: 40 50 a0 7f cmpld cr7,r0,r10
0xc000000000520e78 <+104>: 78 53 47 7d mr r7,r10
* 0xc000000000520e7c <+108>: 00 00 0a e9 ld r8,0(r10)
0xc000000000520e80 <+112>: 64 00 9e 41 beq cr7,0xc000000000520ee4 <crypto_remove_spawns+212>
#) This looks like alg->cra_users.next is loaded to r10
&) This looks like r10 is compared with &alg->cra_users calculated on the line
above to terminate the loop
*) This looks like *alg->cra_users.next loaded into r8 which causes the null
pointer dereference
So the fixup needs to be applied to the first dereference of
alg->cra_users.next as well to prevent crash.
Fixes: 9a00674213a3 ("crypto: algapi - fix NULL dereference in crypto_remove_spawns()")
Reported-by: [email protected]
Signed-off-by: Michal Suchanek <[email protected]>
---
I cannot really test if this fix is effective because the crash is some
heisenbug that heavily depends on timing. When the bug is not triggered it does
not really mean anything. It is also qestionable if we should be getting these
algs with uninitialized spawns.
crypto/algapi.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 313a7682cef1..82125b82ffba 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -151,6 +151,18 @@ void crypto_remove_spawns(struct crypto_alg *alg, struct list_head *list,
LIST_HEAD(top);
spawns = &alg->cra_users;
+
+ /*
+ * We may encounter an unregistered instance here, since an instance's
+ * spawns are set up prior to the instance being registered.
+ * An unregistered instance will have NULL ->cra_users.next, since
+ * ->cra_users isn't properly initialized until registration. But an
+ * unregistered instance cannot have any users, so treat it the same as
+ * ->cra_users being empty.
+ */
+ if (spawns->next == NULL)
+ return;
+
list_for_each_entry_safe(spawn, n, spawns, list) {
if ((spawn->alg->cra_flags ^ new_type) & spawn->mask)
continue;
@@ -177,15 +189,7 @@ void crypto_remove_spawns(struct crypto_alg *alg, struct list_head *list,
spawn->alg = NULL;
spawns = &inst->alg.cra_users;
- /*
- * We may encounter an unregistered instance here, since
- * an instance's spawns are set up prior to the instance
- * being registered. An unregistered instance will have
- * NULL ->cra_users.next, since ->cra_users isn't
- * properly initialized until registration. But an
- * unregistered instance cannot have any users, so treat
- * it the same as ->cra_users being empty.
- */
+ /* Guard against unregistered instance */
if (spawns->next == NULL)
break;
}
--
2.21.0
Hi Michal,
On Tue, Jun 25, 2019 at 09:16:24AM +0200, Michal Suchanek wrote:
> Reportedly on Linux 4.12 the LTP testsuite crashes at pcrypt_aead01 infrequently.
>
> To get it reproduce more frequently I tried
>
> n=0 ; while true ; do /opt/ltp/testcases/bin/pcrypt_aead01 >& /dev/null ; n=$(expr $n + 1) ; echo -ne $n\\r ; done
>
> but this is quite stable. However, holding ^C in the terminal where the loop is running tends to trigger the crash.
>
> The backtrace is:
>
> [ 100.615804] Unable to handle kernel paging request for data at address 0x00000000
> [ 100.615876] Faulting instruction address: 0xc000000000520e7c
> [ 100.615943] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 100.616001] SMP NR_CPUS=2048
> [ 100.616002] NUMA
> [ 100.616030] pSeries
> [ 100.616054] Modules linked in: authenc pcrypt crypto_user kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink ip_tables x_tables af_packet rtc_generic vmx_crypto ibmveth(X) gf128mul btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> [ 100.616478] Supported: Yes, External
> [ 100.616509] CPU: 5 PID: 6270 Comm: pcrypt_aead01 Tainted: G 4.12.14-150.22-default #1 SLE15
> [ 100.616632] task: c000000595084d80 task.stack: c0000005be6dc000
> [ 100.616708] NIP: c000000000520e7c LR: c000000000521e3c CTR: c000000000521de0
> [ 100.616801] REGS: c0000005be6df620 TRAP: 0300 Tainted: G (4.12.14-150.22-default)
> [ 100.616906] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
> [ 100.616912] CR: 24002844 XER: 20040000
> [ 100.617003] CFAR: c000000000008860 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
> GPR00: c0000005a331f810 c0000005be6df8a0 c00000000119aa00 c0000005a331f800
> GPR04: c0000005be6df930 c0000005be6df8c0 c0000005be6df8d0 0000000000000000
> GPR08: 7269632929290000 c0000005a331f800 0000000000000000 0000000000000000
> GPR12: c000000000521de0 c000000007a33700 00000001271a0ee0 00007fffcb9e7bb8
> GPR16: 00000001271c2d80 00000001271c2d88 00007fffcb9e7a50 00007fffcb9e7a44
> GPR20: 00007fffcb9e7a98 00007fffcb9e7a60 0000000000000010 0000000000000010
> GPR24: 0000000000000000 0000000000000000 fffffffffffff000 c0000005be6dfaf0
> GPR28: c0000005b9929d00 0000000000000c93 c0000005be6df930 c0000005be6df8e0
> [ 100.617774] NIP [c000000000520e7c] crypto_remove_spawns+0x6c/0x2e0
> [ 100.617816] LR [c000000000521e3c] crypto_unregister_instance+0x5c/0xa0
> [ 100.617881] Call Trace:
> [ 100.617903] [c0000005be6df8a0] [c0000005b9929d00] 0xc0000005b9929d00 (unreliable)
> [ 100.617971] [c0000005be6df910] [0000000000000000] (null)
> [ 100.618021] [c0000005be6df960] [d0000000098d0894] crypto_del_alg+0xdc/0x110 [crypto_user]
> [ 100.618119] [c0000005be6df990] [d0000000098d0b58] crypto_user_rcv_msg+0xe0/0x260 [crypto_user]
> [ 100.618222] [c0000005be6dfa30] [c00000000086d678] netlink_rcv_skb+0x78/0x170
> [ 100.618309] [c0000005be6dfaa0] [d0000000098d0064] crypto_netlink_rcv+0x4c/0x80 [crypto_user]
> [ 100.618407] [c0000005be6dfad0] [c00000000086cb98] netlink_unicast+0x208/0x2f0
> [ 100.618488] [c0000005be6dfb40] [c00000000086d170] netlink_sendmsg+0x380/0x440
> [ 100.618582] [c0000005be6dfbd0] [c0000000007e9ba4] sock_sendmsg+0x64/0x90
> [ 100.618650] [c0000005be6dfc00] [c0000000007eb94c] ___sys_sendmsg+0x2cc/0x330
> [ 100.618710] [c0000005be6dfd90] [c0000000007ed02c] __sys_sendmsg+0x5c/0xc0
> [ 100.618766] [c0000005be6dfe30] [c00000000000b188] system_call+0x3c/0x130
> [ 100.618822] Instruction dump:
> [ 100.618839] e9430010 83a90020 38a10020 fbe10040 fbe10048 f8c10030 f8c10038 f8a10020
> [ 100.618902] f8a10028 38030010 7fa05040 7d475378 <e90a0000> 419e0064 60000000 60000000
> [ 100.618980] ---[ end trace 60475621348ca387 ]---
>
> The code looks like this:
>
> 0xc000000000520e10 <+0>: c8 00 4c 3c addis r2,r12,200
> 0xc000000000520e14 <+4>: f0 9b 42 38 addi r2,r2,-25616
> 0xc000000000520e18 <+8>: a6 02 08 7c mflr r0
> 0xc000000000520e1c <+12>: 00 00 00 60 nop
> 0xc000000000520e20 <+16>: 79 2b ab 7c mr. r11,r5
> 0xc000000000520e24 <+20>: f0 ff c1 fb std r30,-16(r1)
> 0xc000000000520e28 <+24>: e8 ff a1 fb std r29,-24(r1)
> 0xc000000000520e2c <+28>: f8 ff e1 fb std r31,-8(r1)
> 0xc000000000520e30 <+32>: 91 ff 21 f8 stdu r1,-112(r1)
> 0xc000000000520e34 <+36>: 78 1b 69 7c mr r9,r3
> 0xc000000000520e38 <+40>: 78 23 9e 7c mr r30,r4
> 0xc000000000520e3c <+44>: 08 00 82 41 beq 0xc000000000520e44 <crypto_remove_spawns+52>
> 0xc000000000520e40 <+48>: 78 5b 69 7d mr r9,r11
> 0xc000000000520e44 <+52>: 40 00 e1 3b addi r31,r1,64
> 0xc000000000520e48 <+56>: 30 00 c1 38 addi r6,r1,48
> # 0xc000000000520e4c <+60>: 10 00 43 e9 ld r10,16(r3)
> 0xc000000000520e50 <+64>: 20 00 a9 83 lwz r29,32(r9)
> 0xc000000000520e54 <+68>: 20 00 a1 38 addi r5,r1,32
> 0xc000000000520e58 <+72>: 40 00 e1 fb std r31,64(r1)
> 0xc000000000520e5c <+76>: 48 00 e1 fb std r31,72(r1)
> 0xc000000000520e60 <+80>: 30 00 c1 f8 std r6,48(r1)
> 0xc000000000520e64 <+84>: 38 00 c1 f8 std r6,56(r1)
> 0xc000000000520e68 <+88>: 20 00 a1 f8 std r5,32(r1)
> 0xc000000000520e6c <+92>: 28 00 a1 f8 std r5,40(r1)
> 0xc000000000520e70 <+96>: 10 00 03 38 addi r0,r3,16
> & 0xc000000000520e74 <+100>: 40 50 a0 7f cmpld cr7,r0,r10
> 0xc000000000520e78 <+104>: 78 53 47 7d mr r7,r10
> * 0xc000000000520e7c <+108>: 00 00 0a e9 ld r8,0(r10)
> 0xc000000000520e80 <+112>: 64 00 9e 41 beq cr7,0xc000000000520ee4 <crypto_remove_spawns+212>
>
> #) This looks like alg->cra_users.next is loaded to r10
> &) This looks like r10 is compared with &alg->cra_users calculated on the line
> above to terminate the loop
> *) This looks like *alg->cra_users.next loaded into r8 which causes the null
> pointer dereference
>
> So the fixup needs to be applied to the first dereference of
> alg->cra_users.next as well to prevent crash.
>
> Fixes: 9a00674213a3 ("crypto: algapi - fix NULL dereference in crypto_remove_spawns()")
>
> Reported-by: [email protected]
> Signed-off-by: Michal Suchanek <[email protected]>
> ---
> I cannot really test if this fix is effective because the crash is some
> heisenbug that heavily depends on timing. When the bug is not triggered it does
> not really mean anything. It is also qestionable if we should be getting these
> algs with uninitialized spawns.
>
> crypto/algapi.c | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/crypto/algapi.c b/crypto/algapi.c
> index 313a7682cef1..82125b82ffba 100644
> --- a/crypto/algapi.c
> +++ b/crypto/algapi.c
> @@ -151,6 +151,18 @@ void crypto_remove_spawns(struct crypto_alg *alg, struct list_head *list,
> LIST_HEAD(top);
>
> spawns = &alg->cra_users;
> +
> + /*
> + * We may encounter an unregistered instance here, since an instance's
> + * spawns are set up prior to the instance being registered.
> + * An unregistered instance will have NULL ->cra_users.next, since
> + * ->cra_users isn't properly initialized until registration. But an
> + * unregistered instance cannot have any users, so treat it the same as
> + * ->cra_users being empty.
> + */
> + if (spawns->next == NULL)
> + return;
> +
> list_for_each_entry_safe(spawn, n, spawns, list) {
> if ((spawn->alg->cra_flags ^ new_type) & spawn->mask)
> continue;
> @@ -177,15 +189,7 @@ void crypto_remove_spawns(struct crypto_alg *alg, struct list_head *list,
> spawn->alg = NULL;
> spawns = &inst->alg.cra_users;
>
> - /*
> - * We may encounter an unregistered instance here, since
> - * an instance's spawns are set up prior to the instance
> - * being registered. An unregistered instance will have
> - * NULL ->cra_users.next, since ->cra_users isn't
> - * properly initialized until registration. But an
> - * unregistered instance cannot have any users, so treat
> - * it the same as ->cra_users being empty.
> - */
> + /* Guard against unregistered instance */
> if (spawns->next == NULL)
> break;
> }
> --
> 2.21.0
>
The stack trace shows that crypto_remove_spawns() is being called from
crypto_unregister_instance(). Therefore, the instance should already be
registered and have initialized cra_users. Now, I don't claim to understand the
spawn lists stuff that well, so I could have missed something; but if there *is*
a bug, I'd like to see a proper explanation.
Did you check whether this is actually reproducible on mainline, and not just
the SUSE v4.12 based kernel?
- Eric
On Tue, 25 Jun 2019 09:40:54 -0700
Eric Biggers <[email protected]> wrote:
Hello,
> Hi Michal,
>
> The stack trace shows that crypto_remove_spawns() is being called from
> crypto_unregister_instance(). Therefore, the instance should already be
> registered and have initialized cra_users. Now, I don't claim to understand the
> spawn lists stuff that well, so I could have missed something; but if there *is*
> a bug, I'd like to see a proper explanation.
Unfortunately, I don't have an explanation either.
>
> Did you check whether this is actually reproducible on mainline, and not just
> the SUSE v4.12 based kernel?
Mainline crashes on boot:/
Need to find a recent working kernel.
Thanks
Michal
On Tue, 25 Jun 2019 09:40:54 -0700
Eric Biggers <[email protected]> wrote:
> Hi Michal,
>
> On Tue, Jun 25, 2019 at 09:16:24AM +0200, Michal Suchanek wrote:
> > Reportedly on Linux 4.12 the LTP testsuite crashes at pcrypt_aead01 infrequently.
> >
> > To get it reproduce more frequently I tried
> >
> > n=0 ; while true ; do /opt/ltp/testcases/bin/pcrypt_aead01 >& /dev/null ; n=$(expr $n + 1) ; echo -ne $n\\r ; done
> >
> > but this is quite stable. However, holding ^C in the terminal where the loop is running tends to trigger the crash.
> >
> >
> > The code looks like this:
> >
> > 0xc000000000520e10 <+0>: c8 00 4c 3c addis r2,r12,200
> > 0xc000000000520e14 <+4>: f0 9b 42 38 addi r2,r2,-25616
> > 0xc000000000520e18 <+8>: a6 02 08 7c mflr r0
> > 0xc000000000520e1c <+12>: 00 00 00 60 nop
> > 0xc000000000520e20 <+16>: 79 2b ab 7c mr. r11,r5
> > 0xc000000000520e24 <+20>: f0 ff c1 fb std r30,-16(r1)
> > 0xc000000000520e28 <+24>: e8 ff a1 fb std r29,-24(r1)
> > 0xc000000000520e2c <+28>: f8 ff e1 fb std r31,-8(r1)
> > 0xc000000000520e30 <+32>: 91 ff 21 f8 stdu r1,-112(r1)
> > 0xc000000000520e34 <+36>: 78 1b 69 7c mr r9,r3
> > 0xc000000000520e38 <+40>: 78 23 9e 7c mr r30,r4
> > 0xc000000000520e3c <+44>: 08 00 82 41 beq 0xc000000000520e44 <crypto_remove_spawns+52>
> > 0xc000000000520e40 <+48>: 78 5b 69 7d mr r9,r11
> > 0xc000000000520e44 <+52>: 40 00 e1 3b addi r31,r1,64
> > 0xc000000000520e48 <+56>: 30 00 c1 38 addi r6,r1,48
> > # 0xc000000000520e4c <+60>: 10 00 43 e9 ld r10,16(r3)
> > 0xc000000000520e50 <+64>: 20 00 a9 83 lwz r29,32(r9)
> > 0xc000000000520e54 <+68>: 20 00 a1 38 addi r5,r1,32
> > 0xc000000000520e58 <+72>: 40 00 e1 fb std r31,64(r1)
> > 0xc000000000520e5c <+76>: 48 00 e1 fb std r31,72(r1)
> > 0xc000000000520e60 <+80>: 30 00 c1 f8 std r6,48(r1)
> > 0xc000000000520e64 <+84>: 38 00 c1 f8 std r6,56(r1)
> > 0xc000000000520e68 <+88>: 20 00 a1 f8 std r5,32(r1)
> > 0xc000000000520e6c <+92>: 28 00 a1 f8 std r5,40(r1)
> > 0xc000000000520e70 <+96>: 10 00 03 38 addi r0,r3,16
> > & 0xc000000000520e74 <+100>: 40 50 a0 7f cmpld cr7,r0,r10
> > 0xc000000000520e78 <+104>: 78 53 47 7d mr r7,r10
> > * 0xc000000000520e7c <+108>: 00 00 0a e9 ld r8,0(r10)
> > 0xc000000000520e80 <+112>: 64 00 9e 41 beq cr7,0xc000000000520ee4 <crypto_remove_spawns+212>
> >
> > #) This looks like alg->cra_users.next is loaded to r10
> > &) This looks like r10 is compared with &alg->cra_users calculated on the line
> > above to terminate the loop
> > *) This looks like *alg->cra_users.next loaded into r8 which causes the null
> > pointer dereference
> >
> > So the fixup needs to be applied to the first dereference of
> > alg->cra_users.next as well to prevent crash.
> >
> > Fixes: 9a00674213a3 ("crypto: algapi - fix NULL dereference in crypto_remove_spawns()")
> >
> > Reported-by: [email protected]
>
> The stack trace shows that crypto_remove_spawns() is being called from
> crypto_unregister_instance(). Therefore, the instance should already be
> registered and have initialized cra_users. Now, I don't claim to understand the
> spawn lists stuff that well, so I could have missed something; but if there *is*
> a bug, I'd like to see a proper explanation.
>
> Did you check whether this is actually reproducible on mainline, and not just
> the SUSE v4.12 based kernel?
This is the crash with mainline:
BUG: Kernel NULL pointer dereference at 0x00000000
Faulting instruction address: 0xc0000000005bb280
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: authenc pcrypt crypto_user kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ibmveth(xX) vmx_crypto rtc_generic gf128mul btrfs libcrc32c xor zstd_decompress(nN) zstd_compress(nN) raid6_pq sd_mod sg dm_multipath dm_mod ibmvscsi(xX) scsi_dh_rdac scsi_dh_emc scsi_transport_srp scsi_dh_alua crc32c_vpmsum scsi_mod autofs4
Supported: No, Unreleased kernel
CPU: 6 PID: 24816 Comm: pcrypt_aead01 Kdump: loaded Tainted: G 5.2.0-rc6-11.g9d2be15-default #1 SLE15 (unreleased)
NIP: c0000000005bb280 LR: c0000000005bc108 CTR: c0000000005bc0b0
REGS: c0000005b574b590 TRAP: 0300 Tainted: G (5.2.0-rc6-11.g9d2be15-default)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44002840 XER: 20040000
CFAR: c00000000000e244 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
GPR00: c0000000005bc108 c0000005b574b820 c000000001406900 c0000005b2eabc00
GPR04: c0000005b574b8b0 0000000000000000 c0000005b574b850 0000000000000000
GPR08: 0000000000000000 c0000005b2eabc00 ffffffff00000001 c0000005b574b860
GPR12: c0000005b2eabc10 c000000007fa7800 0000000131b90ee0 00007fffc975b748
GPR16: 0000000131bb2d80 0000000131bb2d88 00007fffc975b5e0 00007fffc975b5d4
GPR20: 00007fffc975b628 00007fffc975b5f0 0000000000000010 0000000000000000
GPR24: 0000000000000000 0000000000000000 fffffffffffff000 0000000000000000
GPR28: c0000005b574b8b0 0000000000000cb3 c0000000013366f8 c0000005b574b840
CFAR: c00000000000e244 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
GPR00: c0000000005bc108 c0000005b574b820 c000000001406900 c0000005b2eabc00
GPR04: c0000005b574b8b0 0000000000000000 c0000005b574b850 0000000000000000
GPR08: 0000000000000000 c0000005b2eabc00 ffffffff00000001 c0000005b574b860
GPR12: c0000005b2eabc10 c000000007fa7800 0000000131b90ee0 00007fffc975b748
GPR16: 0000000131bb2d80 0000000131bb2d88 00007fffc975b5e0 00007fffc975b5d4
GPR20: 00007fffc975b628 00007fffc975b5f0 0000000000000010 0000000000000000
GPR24: 0000000000000000 0000000000000000 fffffffffffff000 0000000000000000
GPR28: c0000005b574b8b0 0000000000000cb3 c0000000013366f8 c0000005b574b840
NIP [c0000000005bb280] crypto_remove_spawns+0x70/0x2e0
LR [c0000000005bc108] crypto_unregister_instance+0x58/0xa0
Call Trace:
[c0000005b574b820] [000000000000000c] 0xc (unreliable)
[c0000005b574b890] [fffffffffffff000] 0xfffffffffffff000
[c0000005b574b8d0] [c0080000048811c4] crypto_del_alg+0xdc/0x110 [crypto_user]
[c0000005b574b900] [c0080000048802b8] crypto_user_rcv_msg+0xe0/0x270 [crypto_user]
[c0000005b574ba00] [c00000000095d8e4] netlink_rcv_skb+0x84/0x1a0
[c0000005b574ba70] [c008000004880074] crypto_netlink_rcv+0x4c/0x80 [crypto_user]
[c0000005b574baa0] [c00000000095ce1c] netlink_unicast+0x1dc/0x2a0
[c0000005b574bb00] [c00000000095d25c] netlink_sendmsg+0x20c/0x430
[c0000005b574bba0] [c0000000008a09d0] sock_sendmsg+0x60/0x90
[c0000005b574bbd0] [c0000000008a151c] ___sys_sendmsg+0x31c/0x370
[c0000005b574bd80] [c0000000008a320c] __sys_sendmsg+0x6c/0xe0
[c0000005b574be20] [c00000000000b688] system_call+0x5c/0x70
Instruction dump:
e9030010 83a90020 39610040 fbe10020 fbe10028 f8c10030 f8c10038 f9610040
f9610048 39830010 7c2c4040 7d074378 <e9480000> 41820060 60000000 60000000
---[ end trace 4ff8403d5fbae222 ]---
Attaching config and dmesg.
Thanks
Michal
On Mon, Jul 01, 2019 at 03:31:54PM +0200, Michal Such?nek wrote:
> On Tue, 25 Jun 2019 09:40:54 -0700
> Eric Biggers <[email protected]> wrote:
>
> > Hi Michal,
> >
> > On Tue, Jun 25, 2019 at 09:16:24AM +0200, Michal Suchanek wrote:
> > > Reportedly on Linux 4.12 the LTP testsuite crashes at pcrypt_aead01 infrequently.
> > >
> > > To get it reproduce more frequently I tried
> > >
> > > n=0 ; while true ; do /opt/ltp/testcases/bin/pcrypt_aead01 >& /dev/null ; n=$(expr $n + 1) ; echo -ne $n\\r ; done
> > >
> > > but this is quite stable. However, holding ^C in the terminal where the loop is running tends to trigger the crash.
> > >
>
> > >
> > > The code looks like this:
> > >
> > > 0xc000000000520e10 <+0>: c8 00 4c 3c addis r2,r12,200
> > > 0xc000000000520e14 <+4>: f0 9b 42 38 addi r2,r2,-25616
> > > 0xc000000000520e18 <+8>: a6 02 08 7c mflr r0
> > > 0xc000000000520e1c <+12>: 00 00 00 60 nop
> > > 0xc000000000520e20 <+16>: 79 2b ab 7c mr. r11,r5
> > > 0xc000000000520e24 <+20>: f0 ff c1 fb std r30,-16(r1)
> > > 0xc000000000520e28 <+24>: e8 ff a1 fb std r29,-24(r1)
> > > 0xc000000000520e2c <+28>: f8 ff e1 fb std r31,-8(r1)
> > > 0xc000000000520e30 <+32>: 91 ff 21 f8 stdu r1,-112(r1)
> > > 0xc000000000520e34 <+36>: 78 1b 69 7c mr r9,r3
> > > 0xc000000000520e38 <+40>: 78 23 9e 7c mr r30,r4
> > > 0xc000000000520e3c <+44>: 08 00 82 41 beq 0xc000000000520e44 <crypto_remove_spawns+52>
> > > 0xc000000000520e40 <+48>: 78 5b 69 7d mr r9,r11
> > > 0xc000000000520e44 <+52>: 40 00 e1 3b addi r31,r1,64
> > > 0xc000000000520e48 <+56>: 30 00 c1 38 addi r6,r1,48
> > > # 0xc000000000520e4c <+60>: 10 00 43 e9 ld r10,16(r3)
> > > 0xc000000000520e50 <+64>: 20 00 a9 83 lwz r29,32(r9)
> > > 0xc000000000520e54 <+68>: 20 00 a1 38 addi r5,r1,32
> > > 0xc000000000520e58 <+72>: 40 00 e1 fb std r31,64(r1)
> > > 0xc000000000520e5c <+76>: 48 00 e1 fb std r31,72(r1)
> > > 0xc000000000520e60 <+80>: 30 00 c1 f8 std r6,48(r1)
> > > 0xc000000000520e64 <+84>: 38 00 c1 f8 std r6,56(r1)
> > > 0xc000000000520e68 <+88>: 20 00 a1 f8 std r5,32(r1)
> > > 0xc000000000520e6c <+92>: 28 00 a1 f8 std r5,40(r1)
> > > 0xc000000000520e70 <+96>: 10 00 03 38 addi r0,r3,16
> > > & 0xc000000000520e74 <+100>: 40 50 a0 7f cmpld cr7,r0,r10
> > > 0xc000000000520e78 <+104>: 78 53 47 7d mr r7,r10
> > > * 0xc000000000520e7c <+108>: 00 00 0a e9 ld r8,0(r10)
> > > 0xc000000000520e80 <+112>: 64 00 9e 41 beq cr7,0xc000000000520ee4 <crypto_remove_spawns+212>
> > >
> > > #) This looks like alg->cra_users.next is loaded to r10
> > > &) This looks like r10 is compared with &alg->cra_users calculated on the line
> > > above to terminate the loop
> > > *) This looks like *alg->cra_users.next loaded into r8 which causes the null
> > > pointer dereference
> > >
> > > So the fixup needs to be applied to the first dereference of
> > > alg->cra_users.next as well to prevent crash.
> > >
> > > Fixes: 9a00674213a3 ("crypto: algapi - fix NULL dereference in crypto_remove_spawns()")
> > >
> > > Reported-by: [email protected]
>
> >
> > The stack trace shows that crypto_remove_spawns() is being called from
> > crypto_unregister_instance(). Therefore, the instance should already be
> > registered and have initialized cra_users. Now, I don't claim to understand the
> > spawn lists stuff that well, so I could have missed something; but if there *is*
> > a bug, I'd like to see a proper explanation.
> >
> > Did you check whether this is actually reproducible on mainline, and not just
> > the SUSE v4.12 based kernel?
>
> This is the crash with mainline:
>
> BUG: Kernel NULL pointer dereference at 0x00000000
> Faulting instruction address: 0xc0000000005bb280
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: authenc pcrypt crypto_user kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ibmveth(xX) vmx_crypto rtc_generic gf128mul btrfs libcrc32c xor zstd_decompress(nN) zstd_compress(nN) raid6_pq sd_mod sg dm_multipath dm_mod ibmvscsi(xX) scsi_dh_rdac scsi_dh_emc scsi_transport_srp scsi_dh_alua crc32c_vpmsum scsi_mod autofs4
> Supported: No, Unreleased kernel
> CPU: 6 PID: 24816 Comm: pcrypt_aead01 Kdump: loaded Tainted: G 5.2.0-rc6-11.g9d2be15-default #1 SLE15 (unreleased)
> NIP: c0000000005bb280 LR: c0000000005bc108 CTR: c0000000005bc0b0
> REGS: c0000005b574b590 TRAP: 0300 Tainted: G (5.2.0-rc6-11.g9d2be15-default)
> MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44002840 XER: 20040000
> CFAR: c00000000000e244 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
> GPR00: c0000000005bc108 c0000005b574b820 c000000001406900 c0000005b2eabc00
> GPR04: c0000005b574b8b0 0000000000000000 c0000005b574b850 0000000000000000
> GPR08: 0000000000000000 c0000005b2eabc00 ffffffff00000001 c0000005b574b860
> GPR12: c0000005b2eabc10 c000000007fa7800 0000000131b90ee0 00007fffc975b748
> GPR16: 0000000131bb2d80 0000000131bb2d88 00007fffc975b5e0 00007fffc975b5d4
> GPR20: 00007fffc975b628 00007fffc975b5f0 0000000000000010 0000000000000000
> GPR24: 0000000000000000 0000000000000000 fffffffffffff000 0000000000000000
> GPR28: c0000005b574b8b0 0000000000000cb3 c0000000013366f8 c0000005b574b840
> CFAR: c00000000000e244 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
> GPR00: c0000000005bc108 c0000005b574b820 c000000001406900 c0000005b2eabc00
> GPR04: c0000005b574b8b0 0000000000000000 c0000005b574b850 0000000000000000
> GPR08: 0000000000000000 c0000005b2eabc00 ffffffff00000001 c0000005b574b860
> GPR12: c0000005b2eabc10 c000000007fa7800 0000000131b90ee0 00007fffc975b748
> GPR16: 0000000131bb2d80 0000000131bb2d88 00007fffc975b5e0 00007fffc975b5d4
> GPR20: 00007fffc975b628 00007fffc975b5f0 0000000000000010 0000000000000000
> GPR24: 0000000000000000 0000000000000000 fffffffffffff000 0000000000000000
> GPR28: c0000005b574b8b0 0000000000000cb3 c0000000013366f8 c0000005b574b840
> NIP [c0000000005bb280] crypto_remove_spawns+0x70/0x2e0
> LR [c0000000005bc108] crypto_unregister_instance+0x58/0xa0
> Call Trace:
> [c0000005b574b820] [000000000000000c] 0xc (unreliable)
> [c0000005b574b890] [fffffffffffff000] 0xfffffffffffff000
> [c0000005b574b8d0] [c0080000048811c4] crypto_del_alg+0xdc/0x110 [crypto_user]
> [c0000005b574b900] [c0080000048802b8] crypto_user_rcv_msg+0xe0/0x270 [crypto_user]
> [c0000005b574ba00] [c00000000095d8e4] netlink_rcv_skb+0x84/0x1a0
> [c0000005b574ba70] [c008000004880074] crypto_netlink_rcv+0x4c/0x80 [crypto_user]
> [c0000005b574baa0] [c00000000095ce1c] netlink_unicast+0x1dc/0x2a0
> [c0000005b574bb00] [c00000000095d25c] netlink_sendmsg+0x20c/0x430
> [c0000005b574bba0] [c0000000008a09d0] sock_sendmsg+0x60/0x90
> [c0000005b574bbd0] [c0000000008a151c] ___sys_sendmsg+0x31c/0x370
> [c0000005b574bd80] [c0000000008a320c] __sys_sendmsg+0x6c/0xe0
> [c0000005b574be20] [c00000000000b688] system_call+0x5c/0x70
> Instruction dump:
> e9030010 83a90020 39610040 fbe10020 fbe10028 f8c10030 f8c10038 f9610040
> f9610048 39830010 7c2c4040 7d074378 <e9480000> 41820060 60000000 60000000
> ---[ end trace 4ff8403d5fbae222 ]---
>
> Attaching config and dmesg.
>
Thanks, I was able to reproduce this, and I came up with a different fix.
I sent it out, but for some reason it doesn't seem to have reached any of the
lists... If I still don't see after a little while, I'll resend it.
- Eric