On Sun, Jun 13, 2021 at 12:24:52PM +0000, David Mozes wrote:
> Hi *,
> Under a very high load of IO traffic, we got the below BUG trace.
> We can see that:
> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
>                 if (match_futex (&this->key, &key1))
>
> was called with hb1 = NULL in the futex wake path.
> And there is no protection in the code against such a scenario.
>
> The NULL could be coming from:
> hb1 = hash_futex(&key1);
>
> How can we protect against such a situation?
Can you reproduce it without loading proprietary modules?
Your analysis doesn't quite make sense:
hb1 = hash_futex(&key1);
hb2 = hash_futex(&key2);
retry_private:
double_lock_hb(hb1, hb2);
If hb1 were NULL, then the oops would come earlier, in double_lock_hb().
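For reference, double_lock_hb() in 4.19 looks roughly like this (quoted from
memory, so take it as a sketch rather than the exact source):

	static inline void double_lock_hb(struct futex_hash_bucket *hb1,
					  struct futex_hash_bucket *hb2)
	{
		/*
		 * Take the two bucket locks in address order; hb1->lock is
		 * dereferenced here, long before any walk of hb1->chain.
		 */
		if (hb1 <= hb2) {
			spin_lock(&hb1->lock);
			if (hb1 < hb2)
				spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING);
		} else {
			spin_lock(&hb2->lock);
			spin_lock_nested(&hb1->lock, SINGLE_DEPTH_NESTING);
		}
	}

So a NULL hb1 would already have faulted while taking hb1->lock, not in the
plist walk.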
> RIP: 0010:do_futex+0xdf/0xa90
>
> 0xffffffff81138eff is in do_futex (kernel/futex.c:1748).
> 1743                         put_futex_key(&key1);
> 1744                         cond_resched();
> 1745                         goto retry;
> 1746                 }
> 1747
> 1748                 plist_for_each_entry_safe(this, next, &hb1->chain, list) {
> 1749                         if (match_futex (&this->key, &key1)) {
> 1750                                 if (this->pi_state || this->rt_waiter) {
> 1751                                         ret = -EINVAL;
> 1752                                         goto out_unlock;
> (gdb)
>
> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
>         if (match_futex (&this->key, &key1)) {
>
> This happened in kernel 4.19.149 running on Azure vm
>
> Thx
> David
Thx Matthew
1) You are probably correct about where the actual crash happened, unless something happened in between...
But that is what gdb told us. In addition, RDI shows the value 0x0000000000000246, which matches the faulting address:
Jun 10 20:49:40 c-node04 kernel: [97562.144463] BUG: unable to handle kernel NULL pointer dereference at 0000000000000246
Jun 10 20:49:40 c-node04 kernel: [97562.145450] PGD 2012ee4067 P4D 2012ee4067 PUD 20135a0067 PMD 0
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Oops: 0000 [#1] SMP
Jun 10 20:49:40 c-node04 kernel: [97562.145450] CPU: 36 PID: 12668 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G W OE 4.19.149-KM6 #1
Jun 10 20:49:40 c-node04 kram: rpoll(0x7fe624135b90, 85, 50) returning 0 times: 0, 0, 0, 2203, 0 ccount 42
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RIP: 0010:do_futex+0xdf/0xa90
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Code: 08 4c 8d 6d 08 48 8b 3a 48 8d 72 e8 49 39 d5 4c 8d 67 e8 0f 84 89 00 00 00 31 c0 44 89 3c 24 41 89 df 44 89 f3 41 89 c6 eb 16 <49> 8b 7c 24 18 49 8d 54 24 18 4c 89 e6 4c 39 ea 4c 8d 67 e8 74 58
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RSP: 0018:ffff97f6ea8bbdf0 EFLAGS: 00010283
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RAX: 00007f6db1a5d000 RBX: 0000000000000001 RCX: ffffa5530c5f0140
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RDX: ffff97f6e4287d58 RSI: ffff97f6e4287d40 RDI: 0000000000000246
Jun 10 20:49:40 c-node04 kram: rpoll(0x7fe62414a860, 2, 50) returning 0 times: 0, 0, 0, 2191, 0 ccount 277
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RBP: ffffa5530c5bd580 R08: 00007f6db1a5d9c0 R09: 0000000000000001
2) In addition, we got a second crash in the same function, a few lines above the previous one:
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? pointer+0x137/0x350
Jun 12 11:20:43 c-node06 kernel: [91837.319613] printk+0x58/0x6f
Jun 12 11:20:43 c-node06 kernel: [91837.319613] panic+0xce/0x238
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? do_futex+0xa3d/0xa90
Jun 12 11:20:43 c-node06 kernel: [91837.319613] __stack_chk_fail+0x15/0x20
Jun 12 11:20:43 c-node06 kernel: [91837.319613] do_futex+0xa3d/0xa90
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? plist_add+0xc1/0xf0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? plist_add+0xc1/0xf0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? plist_del+0x5f/0xb0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] __schedule+0x243/0x830
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? schedule+0x28/0x80
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? exit_to_usermode_loop+0x57/0xe0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? prepare_exit_to_usermode+0x70/0x90
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? retint_user+0x8/0x8
(gdb) l *do_futex+0xa3d
0xffffffff8113985d is in do_futex (kernel/futex.c:1742).
1737 if (!(flags & FLAGS_SHARED)) {
1738 cond_resched();
1739 goto retry_private;
1740 }
1741
1742 put_futex_key(&key2);
1743 put_futex_key(&key1);
1744 cond_resched();
1745 goto retry;
1746 }
(gdb)
That is closer to the double_lock_hb(hb1, hb2) you mention.
Regarding running without the proprietary modules: we didn't manage to reproduce it that way, but we only get about half of the IO load at which this problem happens.
Thx
David
-----Original Message-----
From: Matthew Wilcox <[email protected]>
Sent: Sunday, June 13, 2021 11:04 PM
To: David Mozes <[email protected]>
Cc: [email protected]; Thomas Gleixner <[email protected]>; Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>; Darren Hart <[email protected]>; [email protected]
Subject: Re: futex/call -to plist_for_each_entry_safe with head=NULL
On Sun, Jun 13 2021 at 21:04, Matthew Wilcox wrote:
> On Sun, Jun 13, 2021 at 12:24:52PM +0000, David Mozes wrote:
>> Hi *,
>> Under a very high load of IO traffic, we got the below BUG trace.
>> We can see that:
>> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
>>                 if (match_futex (&this->key, &key1))
>>
>> was called with hb1 = NULL in the futex wake path.
>> And there is no protection in the code against such a scenario.
>>
>> The NULL could be coming from:
>> hb1 = hash_futex(&key1);
Definitely not.
>>
>> How can we protect against such a situation?
>
> Can you reproduce it without loading proprietary modules?
>
> Your analysis doesn't quite make sense:
>
> hb1 = hash_futex(&key1);
> hb2 = hash_futex(&key2);
>
> retry_private:
> double_lock_hb(hb1, hb2);
>
> If hb1 were NULL, then the oops would come earlier, in double_lock_hb().
Sure, but hash_futex() _cannot_ return a NULL pointer ever.
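For reference, it is essentially this (paraphrased from kernel/futex.c in 4.19,
from memory, so details may differ slightly):

	static struct futex_hash_bucket *hash_futex(union futex_key *key)
	{
		u32 hash = jhash2((u32 *)key,
				  offsetof(typeof(*key), both.offset) / 4,
				  key->both.offset);

		/*
		 * Index into the statically allocated futex_queues[] array;
		 * the returned pointer can never be NULL.
		 */
		return &futex_queues[hash & (futex_hashsize - 1)];
	}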
>>
>>
>> This happened in kernel 4.19.149 running on Azure vm
4.19.149 is almost 50 versions behind the latest 4.19.194 stable.
The other question is whether this happens with a less dead kernel as well.
Thanks,
tglx
I will try with the latest 4.19.195 and will see.
Thx
David
-----Original Message-----
From: Thomas Gleixner <[email protected]>
Sent: Tuesday, June 15, 2021 6:04 PM
To: Matthew Wilcox <[email protected]>; David Mozes <[email protected]>
Cc: [email protected]; Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>; Darren Hart <[email protected]>; [email protected]
Subject: Re: futex/call -to plist_for_each_entry_safe with head=NULL
Hi,
This is what we got with 4.19.195.
We caught the corruption. It could be a race in the kernel: the same CPU and the
same kernel thread try to add and delete the entry, and we have more CPUs under
this kind of heavy load.
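For reference, the warnings below come from the CONFIG_DEBUG_LIST checks in
lib/list_debug.c. Simplified from memory (poison checks omitted, so only a
sketch), the delete-side check that fires looks like:

	bool __list_del_entry_valid(struct list_head *entry)
	{
		struct list_head *prev = entry->prev;
		struct list_head *next = entry->next;

		/*
		 * Before unlinking, verify both neighbours still point back
		 * at the entry; otherwise print the "list_del corruption"
		 * message seen in the trace and refuse to unlink.
		 */
		if (CHECK_DATA_CORRUPTION(prev->next != entry,
				"list_del corruption. prev->next should be %px, but was %px\n",
				entry, prev->next) ||
		    CHECK_DATA_CORRUPTION(next->prev != entry,
				"list_del corruption. next->prev should be %px, but was %px\n",
				entry, next->prev))
			return false;

		return true;
	}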
Jun 17 18:31:45 localhost kernel: [ 0.000000] Linux version 4.19.195-KM9 (root@bb1ab379213a) (gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)) #1 SMP Wed Jun 16 14:02:34 UTC 2021
Jun 17 18:31:45 localhost kernel: [ 0.000000] Command line: ro root=LABEL=/ rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=512M,high nompath append="nmi_watchdog=2" printk.time=1 rd_NO_LVM rd_NO_DM dm_mod.use_blk_mq=y rcutree.kthread_prio=99 intel_pstate=enable intel_idle.max_cstate=0 processor.max_cstate=1 idle=halt console=tty0 console=ttyS0,38400n8d
Jun 17 18:58:23 c-node06 kernel: [26921.734822] ?
Jun 17 18:58:23 c-node06 kernel: [26921.734845] WARNING: CPU: 56 PID: 51893 at lib/list_debug.c:56 __list_del_entry_valid+0x8a/0x90
Jun 17 18:58:23 c-node06 kernel: [26921.734846] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.734884] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local]
Jun 17 18:58:23 c-node06 kernel: [26921.734896] CPU: 56 PID: 51893 Comm: km_target_creat Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
Jun 17 18:58:23 c-node06 kernel: [26921.734897] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 17 18:58:23 c-node06 kernel: [26921.734899] RIP: 0010:__list_del_entry_valid+0x8a/0x90
Jun 17 18:58:23 c-node06 kernel: [26921.734901] Code: 44 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 e0 0c 0e b9 e8 37 52 44 00 0f 0b 31 c0 c3 48 c7 c7 20 0d 0e b9 e8 26 52 44 00 <0f> 0b 31 c0 c3 90 48 85 d2 41 55 41 54 55 53 74 5f 48 85 f6 74 64
Jun 17 18:58:23 c-node06 kernel: [26921.734902] RSP: 0018:ffff8f8a2b2f7c08 EFLAGS: 00010086
Jun 17 18:58:23 c-node06 kernel: [26921.734903] RAX: 0000000000000000 RBX: ffff8f87dc9e2740 RCX: 0000000000000006
Jun 17 18:58:23 c-node06 kernel: [26921.734904] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f8adfc164f0
Jun 17 18:58:23 c-node06 kernel: [26921.734905] RBP: ffff8f8a2b3c8800 R08: 0000000000000064 R09: 0000000000000002
Jun 17 18:58:23 c-node06 kernel: [26921.734906] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.734906] R13: ffff8f87dc9e2520 R14: 0000000000021e00 R15: ffff8f8adf6a1e00
Jun 17 18:58:23 c-node06 kernel: [26921.734908] FS: 00007fb35f8ec700(0000) GS:ffff8f8adfc00000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.734908] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.734909] CR2: ffffffffff600800 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.734912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.734912] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.734913] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.734919] __delist_rt_entity+0x12/0x80
Jun 17 18:58:23 c-node06 kernel: [26921.734923] dequeue_rt_stack+0x75/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.734924] dequeue_rt_entity+0x1f/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.734925] dequeue_task_rt+0x26/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.734926] push_rt_task+0x1e2/0x220
Jun 17 18:58:23 c-node06 kernel: [26921.734928] task_woken_rt+0x47/0x50
Jun 17 18:58:23 c-node06 kernel: [26921.734933] ttwu_do_wakeup+0x44/0x140
Jun 17 18:58:23 c-node06 kernel: [26921.734935] try_to_wake_up+0x1d2/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.734940] ? sock_write_iter+0x97/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.734942] wake_up_q+0x54/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.734948] futex_wake+0x142/0x160
Jun 17 18:58:23 c-node06 kernel: [26921.734950] do_futex+0x2cc/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.734955] ? vfs_writev+0xc5/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.734958] ? __bad_area_nosemaphore+0x126/0x190
Jun 17 18:58:23 c-node06 kernel: [26921.734960] __x64_sys_futex+0x143/0x180
Jun 17 18:58:23 c-node06 kernel: [26921.734962] ? do_writev+0xe7/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.734968] do_syscall_64+0x59/0x1b0
Jun 17 18:58:23 c-node06 kernel: [26921.734973] ? page_fault+0x8/0x30
Jun 17 18:58:23 c-node06 kernel: [26921.734975] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 18:58:23 c-node06 kernel: [26921.734977] RIP: 0033:0x7fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.734978] Code: 00 00 00 00 00 56 52 c7 07 00 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 83 ce 01 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 84 00 00 00 00 00 41 54 41 55 49 89 fc 49 89 f5 48
Jun 17 18:58:23 c-node06 kernel: [26921.734979] RSP: 002b:00007fb35f8ea560 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
Jun 17 18:58:23 c-node06 kernel: [26921.734980] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.734981] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fbe040a0ec0
Jun 17 18:58:23 c-node06 kernel: [26921.734982] RBP: 00007fb35f8ea780 R08: 000000009a9c5a85 R09: 00007fb35f8ea4a8
Jun 17 18:58:23 c-node06 kernel: [26921.734982] R10: 0000000000000004 R11: 0000000000000206 R12: 00007fbe0409f398
Jun 17 18:58:23 c-node06 kernel: [26921.734983] R13: 0000000000000000 R14: 00000000000000fe R15: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.734984] ---[ end trace d290eac16902b305 ]---
Jun 17 18:58:23 c-node06 kernel: [26921.734988] WARNING: CPU: 56 PID: 51893 at kernel/sched/rt.c:1250 __enqueue_rt_entity+0x313/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.734989] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.735003] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local]
Jun 17 18:58:23 c-node06 kernel: [26921.735009] CPU: 56 PID: 51893 Comm: km_target_creat Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
Jun 17 18:58:23 c-node06 kernel: [26921.735010] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 17 18:58:23 c-node06 kernel: [26921.735011] RIP: 0010:__enqueue_rt_entity+0x313/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.735011] Code: ff ff e9 11 ff ff ff 48 83 c4 08 48 89 ee 48 89 df 5b 5d 41 5c 41 5d e9 fb f9 ff ff ba 01 00 00 00 66 89 53 24 e9 cf fd ff ff <0f> 0b e9 6d fd ff ff 48 8b 83 a0 01 00 00 48 8d ab 70 01 00 00 c7
Jun 17 18:58:23 c-node06 kernel: [26921.735012] RSP: 0018:ffff8f8a2b2f7c20 EFLAGS: 00010002
Jun 17 18:58:23 c-node06 kernel: [26921.735013] RAX: ffff8f8a2b3c8800 RBX: ffff8f8a2d2cb540 RCX: ffff8f8adf9a2050
Jun 17 18:58:23 c-node06 kernel: [26921.735014] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f8a2d2cb540
Jun 17 18:58:23 c-node06 kernel: [26921.735014] RBP: ffff8f8adf9a2040 R08: 0000000000000003 R09: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735015] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8adf9a2670
Jun 17 18:58:23 c-node06 kernel: [26921.735016] R13: ffff8f87dc9e2520 R14: 0000000000021e00 R15: ffff8f8adf6a1e00
Jun 17 18:58:23 c-node06 kernel: [26921.735016] FS: 00007fb35f8ec700(0000) GS:ffff8f8adfc00000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.735018] CR2: ffffffffff600800 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.735018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.735020] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.735021] ? dequeue_rt_stack+0x1ed/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.735022] dequeue_rt_entity+0x4d/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735023] dequeue_task_rt+0x26/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735025] push_rt_task+0x1e2/0x220
Jun 17 18:58:23 c-node06 kernel: [26921.735026] task_woken_rt+0x47/0x50
Jun 17 18:58:23 c-node06 kernel: [26921.735028] ttwu_do_wakeup+0x44/0x140
Jun 17 18:58:23 c-node06 kernel: [26921.735030] try_to_wake_up+0x1d2/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.735031] ? sock_write_iter+0x97/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.735032] wake_up_q+0x54/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735034] futex_wake+0x142/0x160
Jun 17 18:58:23 c-node06 kernel: [26921.735036] do_futex+0x2cc/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.735037] ? vfs_writev+0xc5/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.735039] ? __bad_area_nosemaphore+0x126/0x190
Jun 17 18:58:23 c-node06 kernel: [26921.735040] __x64_sys_futex+0x143/0x180
Jun 17 18:58:23 c-node06 kernel: [26921.735042] ? do_writev+0xe7/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.735043] do_syscall_64+0x59/0x1b0
Jun 17 18:58:23 c-node06 kernel: [26921.735045] ? page_fault+0x8/0x30
Jun 17 18:58:23 c-node06 kernel: [26921.735046] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 18:58:23 c-node06 kernel: [26921.735047] RIP: 0033:0x7fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.735047] Code: 00 00 00 00 00 56 52 c7 07 00 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 83 ce 01 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 84 00 00 00 00 00 41 54 41 55 49 89 fc 49 89 f5 48
Jun 17 18:58:23 c-node06 kernel: [26921.735048] RSP: 002b:00007fb35f8ea560 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
Jun 17 18:58:23 c-node06 kernel: [26921.735049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.735050] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fbe040a0ec0
Jun 17 18:58:23 c-node06 kernel: [26921.735051] RBP: 00007fb35f8ea780 R08: 000000009a9c5a85 R09: 00007fb35f8ea4a8
Jun 17 18:58:23 c-node06 kernel: [26921.735051] R10: 0000000000000004 R11: 0000000000000206 R12: 00007fbe0409f398
Jun 17 18:58:23 c-node06 kernel: [26921.735052] R13: 0000000000000000 R14: 00000000000000fe R15: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735053] ---[ end trace d290eac16902b306 ]---
Jun 17 18:58:23 c-node06 kernel: [26921.735054] ------------[ cut here ]------------
Jun 17 18:58:23 c-node06 kernel: [26921.735055] list_add double add: new=ffff8f8a2d2cb540, prev=ffff8f8a2d2cb540, next=ffff8f8adf9a2670.
Jun 17 18:58:23 c-node06 kernel: [26921.735066] WARNING: CPU: 56 PID: 51893 at lib/list_debug.c:31 __list_add_valid+0x67/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735066] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.735077] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local]
Jun 17 18:58:23 c-node06 kernel: [26921.735080] CPU: 56 PID: 51893 Comm: km_target_creat Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
Jun 17 18:58:23 c-node06 kernel: [26921.735080] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 17 18:58:23 c-node06 kernel: [26921.735082] RIP: 0010:__list_add_valid+0x67/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735082] Code: c1 4c 89 c6 48 c7 c7 e8 0b 0e b9 e8 d3 52 44 00 0f 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 38 0c 0e b9 e8 b9 52 44 00 <0f> 0b 31
c0 c3 0f 1f 40 00 48 b9 00 01 00 00 00 00 ad de 48 8b 07
Jun 17 18:58:23 c-node06 kernel: [26921.735083] RSP: 0018:ffff8f8a2b2f7c18 EFLAGS: 00010086
Jun 17 18:58:23 c-node06 kernel: [26921.735083] RAX: 0000000000000000 RBX: ffff8f8a2d2cb540 RCX: 0000000000000006
Jun 17 18:58:23 c-node06 kernel: [26921.735084] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff8f8adfc164f0
Jun 17 18:58:23 c-node06 kernel: [26921.735084] RBP: ffff8f8adf9a2040 R08: 0000000000000068 R09: 0000000000000002
Jun 17 18:58:23 c-node06 kernel: [26921.735084] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8adf9a2670
Jun 17 18:58:23 c-node06 kernel: [26921.735085] R13: ffff8f8a2d2cb540 R14: 0000000000021e00 R15: ffff8f8adf6a1e00
Jun 17 18:58:23 c-node06 kernel: [26921.735085] FS: 00007fb35f8ec700(0000) GS:ffff8f8adfc00000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.735086] CR2: ffffffffff600800 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.735087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.735088] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.735088] __enqueue_rt_entity+0x227/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.735089] ? dequeue_rt_stack+0x1ed/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.735090] dequeue_rt_entity+0x4d/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735091] dequeue_task_rt+0x26/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735091] push_rt_task+0x1e2/0x220
Jun 17 18:58:23 c-node06 kernel: [26921.735092] task_woken_rt+0x47/0x50
Jun 17 18:58:23 c-node06 kernel: [26921.735093] ttwu_do_wakeup+0x44/0x140
Jun 17 18:58:23 c-node06 kernel: [26921.735095] try_to_wake_up+0x1d2/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.735096] ? sock_write_iter+0x97/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.735098] wake_up_q+0x54/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735099] futex_wake+0x142/0x160
Jun 17 18:58:23 c-node06 kernel: [26921.735101] do_futex+0x2cc/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.735102] ? vfs_writev+0xc5/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.735104] ? __bad_area_nosemaphore+0x126/0x190
Jun 17 18:58:23 c-node06 kernel: [26921.735105] __x64_sys_futex+0x143/0x180
Jun 17 18:58:23 c-node06 kernel: [26921.735106] ? do_writev+0xe7/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.735108] do_syscall_64+0x59/0x1b0
Jun 17 18:58:23 c-node06 kernel: [26921.735109] ? page_fault+0x8/0x30
Jun 17 18:58:23 c-node06 kernel: [26921.735110] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 18:58:23 c-node06 kernel: [26921.735111] RIP: 0033:0x7fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.735112] Code: 00 00 00 00 00 56 52 c7 07 00 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 83 ce 01 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 84 00 00 00 00 00 41 54 41 55 49 89 fc 49 89 f5 48
Jun 17 18:58:23 c-node06 kernel: [26921.735112] RSP: 002b:00007fb35f8ea560 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
Jun 17 18:58:23 c-node06 kernel: [26921.735113] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.735114] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fbe040a0ec0
Jun 17 18:58:23 c-node06 kernel: [26921.735114] RBP: 00007fb35f8ea780 R08: 000000009a9c5a85 R09: 00007fb35f8ea4a8
Jun 17 18:58:23 c-node06 kernel: [26921.735115] R10: 0000000000000004 R11: 0000000000000206 R12: 00007fbe0409f398
Jun 17 18:58:23 c-node06 kernel: [26921.735115] R13: 0000000000000000 R14: 00000000000000fe R15: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735117] ---[ end trace d290eac16902b307 ]---
Jun 17 18:58:23 c-node06 kernel: [26921.735124] ------------[ cut here ]------------
Jun 17 18:58:23 c-node06 kernel: [26921.735126] list_del corruption. prev->next should be ffff8f7e2649a740, but was ffff8f8a3347e630
Jun 17 18:58:23 c-node06 kernel: [26921.735136] WARNING: CPU: 46 PID: 53761 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
Jun 17 18:58:23 c-node06 kernel: [26921.735137] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.735154] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local]
Jun 17 18:58:23 c-node06 kernel: [26921.735160] CPU: 46 PID: 53761 Comm: STAR4BLKS1_WORK Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
Jun 17 18:58:23 c-node06 kernel: [26921.735160] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 17 18:58:23 c-node06 kernel: [26921.735162] RIP: 0010:__list_del_entry_valid+0x79/0x90
Jun 17 18:58:23 c-node06 kernel: [26921.735163] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 a8 0c 0e b9 e8 4e 52 44 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 e0 0c 0e b9 e8 37 52 44 00 <0f> 0b 31 c0 c3 48 c7 c7 20 0d 0e b9 e8 26 52 44 00 0f 0b 31 c0 c3
Jun 17 18:58:23 c-node06 kernel: [26921.735164] RSP: 0018:ffff8f513372fb40 EFLAGS: 00010086
Jun 17 18:58:23 c-node06 kernel: [26921.735165] RAX: 0000000000000000 RBX: ffff8f7e2649a740 RCX: 0000000000000006
Jun 17 18:58:23 c-node06 kernel: [26921.735165] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8f8adf9964f0
Jun 17 18:58:23 c-node06 kernel: [26921.735166] RBP: ffff8f8a2b3c8800 R08: 0000000000000064 R09: 0000000000000002
Jun 17 18:58:23 c-node06 kernel: [26921.735166] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735167] R13: 0000000000000062 R14: 0000000000021e00 R15: ffff8f8adf9e1e00
Jun 17 18:58:23 c-node06 kernel: [26921.735168] FS: 00007fa03dac4700(0000) GS:ffff8f8adf980000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.735169] CR2: ffffffffff600400 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.735171] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735171] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.735172] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.735175] __delist_rt_entity+0x12/0x80
Jun 17 18:58:23 c-node06 kernel: [26921.735176] dequeue_rt_stack+0x75/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.735177] dequeue_rt_entity+0x1f/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735178] dequeue_task_rt+0x26/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735179] push_rt_task+0x1e2/0x220
Jun 17 18:58:23 c-node06 kernel: [26921.735180] push_rt_tasks+0x11/0x20
Jun 17 18:58:23 c-node06 kernel: [26921.735182] __balance_callback+0x3b/0x60
Jun 17 18:58:23 c-node06 kernel: [26921.735187] __schedule+0x6e9/0x830
Jun 17 18:58:23 c-node06 kernel: [26921.735190] schedule+0x28/0x80
Jun 17 18:58:23 c-node06 kernel: [26921.735193] futex_wait_queue_me+0xb9/0x120
Jun 17 18:58:23 c-node06 kernel: [26921.735194] futex_wait+0x139/0x250
Jun 17 18:58:23 c-node06 kernel: [26921.735196] ? try_to_wake_up+0x54/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.735197] ? enqueue_task_rt+0x9f/0xc0
Jun 17 18:58:23 c-node06 kernel: [26921.735199] do_futex+0x2eb/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.735204] ? plist_add+0xc1/0xf0
Jun 17 18:58:23 c-node06 kernel: [26921.735205] ? plist_add+0xc1/0xf0
Jun 17 18:58:23 c-node06 kernel: [26921.735206] ? plist_del+0x5f/0xb0
Jun 17 18:58:23 c-node06 kernel: [26921.735210] ? __switch_to+0x115/0x420
Jun 17 18:58:23 c-node06 kernel: [26921.735211] __x64_sys_futex+0x143/0x180
Jun 17 18:58:23 c-node06 kernel: [26921.735216] do_syscall_64+0x59/0x1b0
Jun 17 18:58:23 c-node06 kernel: [26921.735217] ? prepare_exit_to_usermode+0x70/0x90
Jun 17 18:58:23 c-node06 kernel: [26921.735219] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 18:58:23 c-node06 kernel: [26921.735220] RIP: 0033:0x7fc611475334
Jun 17 18:58:23 c-node06 kernel: [26921.735221] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48
Jun 17 18:58:23 c-node06 kernel: [26921.735221] RSP: 002b:00007fa03dac2f60 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca
Jun 17 18:58:23 c-node06 kernel: [26921.735222] RAX: ffffffffffffffda RBX: 00007fa1af1e9768 RCX: 00007fc611475334
Jun 17 18:58:23 c-node06 kernel: [26921.735223] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fa1af1e97f8
Jun 17 18:58:23 c-node06 kernel: [26921.735223] RBP: 00007fa03dac2f80 R08: 00007fa1af1e97f8 R09: 000000000000d201
Jun 17 18:58:23 c-node06 kernel: [26921.735224] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735224] R13: 00007fa1af1e97a8 R14: 0000000000000001 R15: 0000000000b477c4
Jun 17 18:58:23 c-node06 kernel: [26921.735225] ---[ end trace d290eac16902b308 ]---
Jun 17 18:58:23 c-node06 kernel: [26921.735232] list_add corruption. prev->next should be next (ffff8f8a2b3c8e30), but was ffff8f6a1fd14c40. (prev=ffff8f7e2649a740).
Jun 17 18:58:23 c-node06 kernel: [26921.735240] WARNING: CPU: 46 PID: 53761 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735240] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.735256] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local]
Jun 17 18:58:23 c-node06 kernel: [26921.735261] CPU: 46 PID: 53761 Comm: STAR4BLKS1_WORK Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
Jun 17 18:58:23 c-node06 kernel: [26921.735261] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 17 18:58:23 c-node06 kernel: [26921.735263] RIP: 0010:__list_add_valid+0x4d/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735264] Code: c3 48 89 d1 48 c7 c7 98 0b 0e b9 48 89 c2 e8 ea 52 44 00 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 e8 0b 0e b9 e8 d3 52 44 00 <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 38 0c 0e b9 e8
Jun 17 18:58:23 c-node06 kernel: [26921.735265] RSP: 0018:ffff8f8adf983f10 EFLAGS: 00010082
Jun 17 18:58:23 c-node06 kernel: [26921.735266] RAX: 0000000000000000 RBX: ffff8f7e2658a740 RCX: 0000000000000006
Jun 17 18:58:23 c-node06 kernel: [26921.735266] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8f8adf9964f0
Jun 17 18:58:23 c-node06 kernel: [26921.735267] RBP: ffff8f8a2b3c8800 R08: 0000000000000088 R09: 0000000000000002
Jun 17 18:58:23 c-node06 kernel: [26921.735268] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8a2b3c8e30
Jun 17 18:58:23 c-node06 kernel: [26921.735268] R13: ffff8f7e2649a740 R14: 0000000000000000 R15: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735269] FS: 00007fa03dac4700(0000) GS:ffff8f8adf980000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.735271] CR2: ffffffffff600400 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.735272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735273] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.735273] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.735274] <IRQ>
Jun 17 18:58:23 c-node06 kernel: [26921.735275] __enqueue_rt_entity+0x227/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.735276] ? dequeue_rt_stack+0x1b4/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.735277] enqueue_rt_entity+0x2d/0x50
Jun 17 18:58:23 c-node06 kernel: [26921.735278] enqueue_task_rt+0x2f/0xc0
Jun 17 18:58:23 c-node06 kernel: [26921.735280] ttwu_do_activate+0x44/0x80
Jun 17 18:58:23 c-node06 kernel: [26921.735283] sched_ttwu_pending+0x87/0xd0
Jun 17 18:58:23 c-node06 kernel: [26921.735285] scheduler_ipi+0xa4/0x120
Jun 17 18:58:23 c-node06 kernel: [26921.735287] reschedule_interrupt+0xf/0x20
Jun 17 18:58:23 c-node06 kernel: [26921.735288] </IRQ>
Jun 17 18:58:23 c-node06 kernel: [26921.735290] RIP: 0010:_raw_spin_unlock_irqrestore+0xd/0x20
Jun 17 18:58:23 c-node06 kernel: [26921.735291] Code: 87 ff 48 29 d8 48 3d 24 f4 00 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
Jun 17 18:58:23 c-node06 kernel: [26921.735292] RSP: 0018:ffff8f513372fc30 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff02
Jun 17 18:58:23 c-node06 kernel: [26921.735293] RAX: 0000000000000000 RBX: ffff8f8adfae1e00 RCX: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735294] RDX: ffff8f8adf9a26b8 RSI: 0000000000000286 RDI: 0000000000000286
Jun 17 18:58:23 c-node06 kernel: [26921.735295] RBP: ffff8f513372fc88 R08: 0000800000000000 R09: ffff8f6a5f00e0d8
Jun 17 18:58:23 c-node06 kernel: [26921.735295] R10: 00003f8cd33bbd2e R11: 0000000000000001 R12: ffff8f51007d0000
Jun 17 18:58:23 c-node06 kernel: [26921.735296] R13: ffff8f50fc504a00 R14: ffff8f69fcc31540 R15: ffff8f8adf9a1e00
Jun 17 18:58:23 c-node06 kernel: [26921.735299] __schedule+0x6e9/0x830
Jun 17 18:58:23 c-node06 kernel: [26921.735301] schedule+0x28/0x80
Jun 17 18:58:23 c-node06 kernel: [26921.735303] futex_wait_queue_me+0xb9/0x120
Jun 17 18:58:23 c-node06 kernel: [26921.735304] futex_wait+0x139/0x250
Jun 17 18:58:23 c-node06 kernel: [26921.735306] ? try_to_wake_up+0x54/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.735307] ? enqueue_task_rt+0x9f/0xc0
Jun 17 18:58:23 c-node06 kernel: [26921.735309] do_futex+0x2eb/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.735311] ? plist_add+0xc1/0xf0
-----Original Message-----
From: David Mozes <[email protected]>
Sent: Wednesday, June 16, 2021 6:42 PM
To: Thomas Gleixner <[email protected]>; Matthew Wilcox <[email protected]>
Cc: [email protected]; Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>; Darren Hart <[email protected]>; [email protected]
Subject: RE: futex/call -to plist_for_each_entry_safe with head=NULL