Prior to this change, the combination of `softlockup_panic=1` and
`softlockup_all_cpu_backtrace=1` may result in a deadlock when the reboot path
tries to grab the console lock while it is held by the stack trace printing
path. What seems to be happening is that, although there are multiple CPUs, only
one of them is tasked with printing the backtraces of all CPUs. On a machine with
many CPUs and a slow serial console (on Google Compute Engine, for example), the
stack trace printing routine hits a timeout and the reboot path kicks in. The
latter then tries to print something else, but can't get the lock because it is
still held by the earlier printing path. This is easily reproducible on a VM with
16+ vCPUs on Google Compute Engine, which is a very common scenario.
A quick repro is available at
https://github.com/wonderfly/printk-deadlock-repro. The system hangs 3 seconds
into executing repro.sh. Credit for both the deadlock analysis and the repro goes
to Peter Feiner.
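For anyone reproducing this, a hedged sketch of enabling the two settings at runtime through sysctl rather than the kernel command line. It assumes the standard sysctl names `kernel.softlockup_panic` and `kernel.softlockup_all_cpu_backtrace`; note that the kernel spells the second one with `backtrace`, both as a sysctl and as the `softlockup_all_cpu_backtrace=` boot parameter. The repro scripts may configure things differently.

```
# sysctl.d-style fragment; the boot-time equivalent is
# "softlockup_panic=1 softlockup_all_cpu_backtrace=1" on the kernel command line
kernel.softlockup_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
```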
Note that I have read previous discussions on backporting this to stable [1].
The argument against the backport was that this is a non-trivial fix and is
supposed to prevent hypothetical soft lockups. What we are hitting, however, is
a real deadlock in production. Hence this request.
[1] https://lore.kernel.org/lkml/[email protected]/T/#u
Serial console logs leading up to the deadlock. As can be seen, the stack trace
was incomplete because the printing path hit a timeout.
```
lockup-test-16-2 login: [ 206.648060] LoadPin: kernel-module pinning-ignored obj="/tmp/release/hog.ko" pid=3003 cmdline="insmod hog.ko"
[ 206.650851] hog: loading out-of-tree module taints kernel.
[ 206.654761] Hogging a CPU now
[ 209.577900] watchdog: BUG: soft lockup - CPU#13 stuck for 3s! [hog:3010]
[ 209.584883] Modules linked in: hog(O) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter ip6table_filter ip6_tables aesni_intel aes_x86_64 crypto_simd cryptd glue_helper
[ 209.603952] CPU: 13 PID: 3010 Comm: hog Tainted: G O 4.14.0+ #11
[ 209.611390] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.620733] task: ffff9501b8ca9d00 task.stack: ffffb99c0732c000
[ 209.626766] RIP: 0010:hog_thread+0x13/0x1000 [hog]
[ 209.631763] RSP: 0018:ffffb99c0732ff10 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff11
[ 209.639466] RAX: 0000000000000011 RBX: ffff9501bc1af580 RCX: 0000000000000000
[ 209.646818] RDX: ffff9501c3554d80 RSI: ffff9501c354cc38 RDI: ffff9501c354cc38
[ 209.654087] RBP: ffffb99c0732ff48 R08: 0000000000000030 R09: 0000000000000000
[ 209.661510] R10: ffffb99c08df3ce0 R11: 0000000000000000 R12: ffff9501aeab8e80
[ 209.668773] R13: ffffb99c0803bc28 R14: 0000000000000000 R15: ffff9501bc1af5c8
[ 209.676089] FS: 0000000000000000(0000) GS:ffff9501c3540000(0000) knlGS:0000000000000000
[ 209.684292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.690150] CR2: 00007f146fd8aba0 CR3: 0000000b0ba11006 CR4: 00000000003606a0
[ 209.697571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.704936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.712184] Call Trace:
[ 209.714853] kthread+0x120/0x160
[ 209.718198] ? 0xffffffffc0307000
[ 209.721641] ? kthread_stop+0x120/0x120
[ 209.725591] ? ret_from_fork+0x1f/0x30
[ 209.729462] Code: <eb> fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 209.737518] Sending NMI from CPU 13 to CPUs 0-12,14-15:
[ 209.742864] NMI backtrace for cpu 0
[ 209.742868] CPU: 0 PID: 2866 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.742868] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.742870] task: ffff95019d150000 task.stack: ffffb99c08d98000
[ 209.742875] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 209.742876] RSP: 0018:ffffb99c08d9bda8 EFLAGS: 00000002
[ 209.742877] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.742877] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.742878] RBP: ffffb99c08d9bdd8 R08: 000000007f569b7d R09: 000000006930609f
[ 209.742880] R10: 000000000a41d205 R11: 000000008b5d54b4 R12: ffff9501c2cdda68
[ 209.742881] R13: ffffb99c08d9be30 R14: ffffb99c08d9be30 R15: 0000000000000040
[ 209.742882] FS: 00007f2605cd3700(0000) GS:ffff9501c3200000(0000) knlGS:0000000000000000
[ 209.742883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.742884] CR2: 00007fe4acdfe9c0 CR3: 0000000ef55fa001 CR4: 00000000003606b0
[ 209.742888] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.742889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.742890] Call Trace:
[ 209.742895] do_raw_spin_lock+0xa0/0xb0
[ 209.742900] _raw_spin_lock_irqsave+0x20/0x30
[ 209.742905] _extract_crng+0x45/0x120
[ 209.742907] ? urandom_read+0xfa/0x2a0
[ 209.742910] ? vfs_read+0xad/0x170
[ 209.742912] ? SyS_read+0x4b/0xa0
[ 209.742916] ? __audit_syscall_exit+0x21e/0x2c0
[ 209.742918] ? do_syscall_64+0x63/0x1f0
[ 209.742920] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 209.742921] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 f5 b4 98 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 209.742940] NMI backtrace for cpu 8
[ 209.742942] CPU: 8 PID: 2883 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.742943] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.742943] task: ffff95019d260000 task.stack: ffffb99c07bec000
[ 209.742945] RIP: 0010:native_queued_spin_lock_slowpath+0x20/0x1b0
[ 209.742946] RSP: 0018:ffffb99c07befda8 EFLAGS: 00000097
[ 209.742947] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.742948] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.742948] RBP: ffffb99c07befdd8 R08: 000000007f6dbc9d R09: 0000000011140320
[ 209.742949] R10: 00000000cde5e021 R11: 000000008b7475d4 R12: ffff9501c2cdda68
[ 209.742950] R13: ffffb99c07befe30 R14: ffffb99c07befe30 R15: 0000000000000040
[ 209.742951] FS: 00007f9247798700(0000) GS:ffff9501c3400000(0000) knlGS:0000000000000000
[ 209.742952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.742953] CR2: 0000561d05e288b0 CR3: 0000000edd2c2001 CR4: 00000000003606a0
[ 209.742956] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.742956] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.742957] Call Trace:
[ 209.742959] do_raw_spin_lock+0xa0/0xb0
[ 209.742961] _raw_spin_lock_irqsave+0x20/0x30
[ 209.742962] _extract_crng+0x45/0x120
[ 209.742964] ? urandom_read+0xfa/0x2a0
[ 209.742966] ? vfs_read+0xad/0x170
[ 209.742967] ? SyS_read+0x4b/0xa0
[ 209.742969] ? __audit_syscall_exit+0x21e/0x2c0
[ 209.742970] ? do_syscall_64+0x63/0x1f0
[ 209.742971] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 209.742972] Code: 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 f5 b4 98 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 <85> c0 75 f2 5d c3 f3 90 eb ec 81 fe 00 01 00 00 0f 84 9a 00 00
[ 209.742991] NMI backtrace for cpu 5
[ 209.742994] CPU: 5 PID: 2872 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.742994] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.742995] task: ffff95019f229d00 task.stack: ffffb99c07d00000
[ 209.742998] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 209.742999] RSP: 0018:ffffb99c07d03da8 EFLAGS: 00000002
[ 209.743000] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.743001] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.743005] RBP: ffffb99c07d03dd8 R08: 00000000d10b17ce R09: 000000004db462a0
[ 209.743006] R10: 00000000fe50950b R11: 00000000dd11d105 R12: ffff9501c2cdda68
[ 209.743006] R13: ffffb99c07d03e30 R14: ffffb99c07d03e30 R15: 0000000000000040
[ 209.743008] FS: 00007fd82c5e2700(0000) GS:ffff9501c3340000(0000) knlGS:0000000000000000
[ 209.743009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.743009] CR2: 00007f43a5388b20 CR3: 0000000ef5f9d004 CR4: 00000000003606a0
[ 209.743013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.743013] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.743014] Call Trace:
[ 209.743017] do_raw_spin_lock+0xa0/0xb0
[ 209.743020] _raw_spin_lock_irqsave+0x20/0x30
[ 209.743024] _extract_crng+0x45/0x120
[ 209.743026] ? urandom_read+0xfa/0x2a0
[ 209.743028] ? vfs_read+0xad/0x170
[ 209.743030] ? SyS_read+0x4b/0xa0
[ 209.743033] ? __audit_syscall_exit+0x21e/0x2c0
[ 209.743034] ? do_syscall_64+0x63/0x1f0
[ 209.743036] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 209.743037] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 f5 b4 98 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 209.743059] NMI backtrace for cpu 6
[ 209.743061] CPU: 6 PID: 2893 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.743062] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.743063] task: ffff95019d2e2b80 task.stack: ffffb99c07e30000
[ 209.743066] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 209.743067] RSP: 0018:ffffb99c07e33da8 EFLAGS: 00000002
[ 209.743068] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.743069] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.743069] RBP: ffffb99c07e33dd8 R08: 00000000ae91c56e R09: 00000000aa3ad454
[ 209.743070] R10: 00000000646b9d65 R11: 00000000ba987ea5 R12: ffff9501c2cdda68
[ 209.743071] R13: ffffb99c07e33e30 R14: ffffb99c07e33e30 R15: 0000000000000040
[ 209.743072] FS: 00007f525c77c700(0000) GS:ffff9501c3380000(0000) knlGS:0000000000000000
[ 209.743073] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.743074] CR2: 00007f11dcd8c8c0 CR3: 0000000edd2b0004 CR4: 00000000003606a0
[ 209.743077] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.743078] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.743078] Call Trace:
[ 209.743082] do_raw_spin_lock+0xa0/0xb0
[ 209.743084] _raw_spin_lock_irqsave+0x20/0x30
[ 209.743087] _extract_crng+0x45/0x120
[ 209.743089] ? urandom_read+0xfa/0x2a0
[ 209.743091] ? vfs_read+0xad/0x170
[ 209.743092] ? SyS_read+0x4b/0xa0
[ 209.743094] ? __audit_syscall_exit+0x21e/0x2c0
[ 209.743096] ? do_syscall_64+0x63/0x1f0
[ 209.743097] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 209.743098] Code: 48 8b 2c 24 48 c7 00 00 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 f5 b4 98 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 <85> c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 eb ec 81 fe 00
[ 209.743119] NMI backtrace for cpu 14
[ 209.743120] CPU: 14 PID: 2885 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.743121] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.743122] task: ffff95019d1c8e80 task.stack: ffffb99c07e08000
[ 209.743124] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 209.743124] RSP: 0018:ffffb99c07e0bda8 EFLAGS: 00000002
[ 209.743126] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.743126] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.743127] RBP: ffffb99c07e0bdd8 R08: 00000000482066d2 R09: 000000005b007fea
[ 209.743128] R10: 0000000012b1557e R11: 0000000054272009 R12: ffff9501c2cdda68
[ 209.743129] R13: ffffb99c07e0be30 R14: ffffb99c07e0be30 R15: 0000000000000040
[ 209.743130] FS: 00007f4282ed5700(0000) GS:ffff9501c3580000(0000) knlGS:0000000000000000
[ 209.743131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.743131] CR2: 00007f32728e9ba0 CR3: 0000000edd0d7006 CR4: 00000000003606a0
[ 209.743135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.743136] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.743136] Call Trace:
[ 209.743139] do_raw_spin_lock+0xa0/0xb0
[ 209.743141] _raw_spin_lock_irqsave+0x20/0x30
[ 209.743143] _extract_crng+0x45/0x120
[ 209.743145] ? urandom_read+0xfa/0x2a0
[ 209.743147] ? vfs_read+0xad/0x170
[ 209.743148] ? SyS_read+0x4b/0xa0
[ 209.743150] ? __audit_syscall_exit+0x21e/0x2c0
[ 209.743151] ? do_syscall_64+0x63/0x1f0
[ 209.743152] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 209.743153] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 f5 b4 98 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 209.743174] NMI backtrace for cpu 1
[ 209.743176] CPU: 1 PID: 2865 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.743177] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.743178] task: ffff95019d110000 task.stack: ffffb99c07d18000
[ 209.743181] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 209.743182] RSP: 0018:ffffb99c07d1bda8 EFLAGS: 00000002
[ 209.743183] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.743184] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.743184] RBP: ffffb99c07d1bdd8 R08: 0000000052b91797 R09: 000000002f2f8e5c
[ 209.743185] R10: 00000000c5b37258 R11: 000000005ebfd0ce R12: ffff9501c2cdda68
[ 209.743186] R13: ffffb99c07d1be30 R14: ffffb99c07d1be30 R15: 0000000000000040
[ 209.743187] FS: 00007fd12de78700(0000) GS:ffff9501c3240000(0000) knlGS:0000000000000000
[ 209.743188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.743189] CR2: 00007f3f9d1beba0 CR3: 0000000eddffa003 CR4: 00000000003606a0
[ 209.743192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 209.743193] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 209.743193] Call Trace:
[ 209.743197] do_raw_spin_lock+0xa0/0xb0
[ 209.743200] _raw_spin_lock_irqsave+0x20/0x30
[ 209.743202] _extract_crng+0x45/0x120
[ 209.743204] ? urandom_read+0xfa/0x2a0
[ 209.743206] ? vfs_read+0xad/0x170
[ 209.743207] ? SyS_read+0x4b/0xa0
[ 209.743209] ? __audit_syscall_exit+0x21e/0x2c0
[ 209.743211] ? do_syscall_64+0x63/0x1f0
[ 209.743212] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 209.743213] Code: 48 8b 2c 24 48 c7 00 00 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 f5 b4 98 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 <85> c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 eb ec 81 fe 00
[ 209.743235] NMI backtrace for cpu 7
[ 209.743238] CPU: 7 PID: 2884 Comm: dd Tainted: G O 4.14.0+ #11
[ 209.743238] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 209.743239] task: ffff95019f259d00 task.stack: ffffb99c078f4000
[ 209.743242] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 209.743243] RSP: 0018:ffffb99c078f7da8 EFLAGS: 00000002
[ 209.743244] RAX: 0000000000000001 RBX: ffff9501c2cdda68 RCX: 0000000000000000
[ 209.743245] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9501c2cdda68
[ 209.74
```
On Thu, 27 Sep 2018 12:46:01 -0700
Daniel Wang <[email protected]> wrote:
> Serial console logs leading up to the deadlock. As can be seen the stack trace
> was incomplete because the printing path hit a timeout.
I'm fine with having this backported.
-- Steve
On Mon 2018-10-01 15:23:24, Steven Rostedt wrote:
> On Thu, 27 Sep 2018 12:46:01 -0700
> Daniel Wang <[email protected]> wrote:
>
> > Serial console logs leading up to the deadlock. As can be seen the stack trace
> > was incomplete because the printing path hit a timeout.
>
> I'm fine with having this backported.
Dunno. Is the patch perhaps a bit too complex? This is not exactly
a trivial bugfix.
pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
I see that it is pretty critical to Daniel, but maybe a kernel with
console locking redone should no longer be called 4.4?
Pavel
On 10/1/18 10:13 PM, Pavel Machek wrote:
>
> Dunno. Is the patch perhaps a bit too complex? This is not exactly
> trivial bugfix.
>
> pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
> printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>
> I see that it is pretty critical to Daniel, but maybe kernel with
> console locking redone should no longer be called 4.4?
In that case it probably should no longer have been called 4.4 since at
least the Meltdown/Spectre fixes :)
On Mon, 1 Oct 2018 22:13:10 +0200
Pavel Machek <[email protected]> wrote:
> > > [1] https://lore.kernel.org/lkml/[email protected]/T/#u
> > >
> > > Serial console logs leading up to the deadlock. As can be seen the stack trace
> > > was incomplete because the printing path hit a timeout.
> >
> > I'm fine with having this backported.
>
> Dunno. Is the patch perhaps a bit too complex? This is not exactly
> trivial bugfix.
>
> pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
> printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>
> I see that it is pretty critical to Daniel, but maybe kernel with
> console locking redone should no longer be called 4.4?
But it prevents a deadlock.
I usually weigh a backport's benefit against its risk, and I believe the
benefit outweighs the risk in this case.
-- Steve
On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt <[email protected]> wrote:
>
> > Serial console logs leading up to the deadlock. As can be seen the stack trace
> > was incomplete because the printing path hit a timeout.
>
> I'm fine with having this backported.
Thanks. I can send the cherry-picks your way. Do you recommend that I
include the three follow-up fixes, though?
c14376de3a1b printk: Wake klogd when passing console_lock owner
fd5f7cde1b85 printk: Never set console_may_schedule in console_trylock()
c162d5b4338d printk: Hide console waiter logic into helpers
dbdda842fe96 printk: Add console owner and waiter logic to load balance console writes
>
> -- Steve
--
Best,
Daniel
On Mon, Oct 1, 2018 at 1:23 PM Vlastimil Babka <[email protected]> wrote:
>
> On 10/1/18 10:13 PM, Pavel Machek wrote:
> >
> > Dunno. Is the patch perhaps a bit too complex? This is not exactly
> > trivial bugfix.
> >
> > pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
> > printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >
> > I see that it is pretty critical to Daniel, but maybe kernel with
> > console locking redone should no longer be called 4.4?
>
> In that case it probably should no longer be called 4.4 since at least
> Meltdown/Spectre fixes :)
To clarify, I am requesting a backport to 4.14. This bug doesn't repro on 4.4.
>
--
Best,
Daniel
On Mon, Oct 01, 2018 at 01:37:30PM -0700, Daniel Wang wrote:
>On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt <[email protected]> wrote:
>>
>> > Serial console logs leading up to the deadlock. As can be seen the stack trace
>> > was incomplete because the printing path hit a timeout.
>>
>> I'm fine with having this backported.
>
>Thanks. I can send the cherrypicks your way. Do you recommend that I
>include the three follow-up fixes though?
>
>c14376de3a1b printk: Wake klogd when passing console_lock owner
>fd5f7cde1b85 printk: Never set console_may_schedule in console_trylock()
>c162d5b4338d printk: Hide console waiter logic into helpers
>dbdda842fe96 printk: Add console owner and waiter logic to load balance console writes
Maybe it would also make sense to turn the reproducer into a test that can
go under tools/testing/, and we can backport that as well? It would be
helpful to have a way to make sure things are sane.
--
Thanks,
Sasha
On (09/27/18 12:46), Daniel Wang wrote:
> Prior to this change, the combination of `softlockup_panic=1` and
> `softlockup_all_cpu_stacktrace=1` may result in a deadlock when the reboot path
> is trying to grab the console lock that is held by the stack trace printing
> path. What seems to be happening is that while there are multiple CPUs, only one
> of them is tasked to print the back trace of all CPUs. On a machine with many
> CPUs and a slow serial console (on Google Compute Engine for example), the stack
> trace printing routine hits a timeout and the reboot path kicks in. The latter
> then tries to print something else, but can't get the lock because it's still
> held by earlier printing path.
Sorry, I'm missing something here. Steven's patch deals with lockups and
I understand why you want to backport the patch set; but console output
deadlock on panic() is another thing.
You said
"then tries to print something else, but can't get the lock
because it's still held by earlier printing path"
Which lock can't it get?
-ss
On Mon 2018-10-01 13:37:30, Daniel Wang wrote:
> On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt <[email protected]> wrote:
> >
> > > Serial console logs leading up to the deadlock. As can be seen the stack trace
> > > was incomplete because the printing path hit a timeout.
> >
> > I'm fine with having this backported.
>
> Thanks. I can send the cherrypicks your way. Do you recommend that I
> include the three follow-up fixes though?
>
> c14376de3a1b printk: Wake klogd when passing console_lock owner
> fd5f7cde1b85 printk: Never set console_may_schedule in console_trylock()
> c162d5b4338d printk: Hide console waiter logic into helpers
> dbdda842fe96 printk: Add console owner and waiter logic to load balance console writes
This list looks complete and I am fine with backporting it to 4.14.
Well, I still wonder why it helped and why you do not see it with 4.4.
I have a feeling that the console owner switch helped only by chance.
In fact, you might be affected by a race in
printk_safe_flush_on_panic() that was fixed by the commit:
554755be08fba31c7 printk: drop in_nmi check from printk_safe_flush_on_panic()
That one commit alone might be enough. There was also one more
NMI-related race, fixed by:
ba552399954dde1b printk: Split the code for storing a message into the log buffer
a338f84dc196f44b printk: Create helper function to queue deferred console handling
03fc7f9c99c1e7ae printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
Best Regards,
Petr
On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek <[email protected]> wrote:
> Well, I still wonder why it helped and why you do not see it with 4.4.
> I have a feeling that the console owner switch helped only by chance.
So do I. I don't think Steven had the deadlock in mind when working on
that patch, but with that patch and that patch alone, the deadlock
disappeared.
> In fact, you might be affected by a race in
> printk_safe_flush_on_panic() that was fixed by the commit:
>
> 554755be08fba31c7 printk: drop in_nmi check from printk_safe_flush_on_panic()
>
> The above one commit might be enough. Well, there was one more
Thanks for the pointer. Let me test this out.
> NMI-related race that was fixed by:
>
> ba552399954dde1b printk: Split the code for storing a message into the log buffer
> a338f84dc196f44b printk: Create helper function to queue deferred console handling
> 03fc7f9c99c1e7ae printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
>
> Best Regards,
> Petr
On Tue, Oct 2, 2018 at 1:17 AM Sergey Senozhatsky
<[email protected]> wrote:
> Sorry, I'm missing something here. Steven's patch deals with lockups and
> I understand why you want to backport the patch set; but console output
> deadlock on panic() is another thing.
Understood. But it did fix the deadlock for me, and without it I hit the deadlock
pretty consistently. Let me test the patch Petr pointed out, and I will update
here.
--
Best,
Daniel
On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek <[email protected]> wrote:
>
> Well, I still wonder why it helped and why you do not see it with 4.4.
> I have a feeling that the console owner switch helped only by chance.
> In fact, you might be affected by a race in
> printk_safe_flush_on_panic() that was fixed by the commit:
>
> 554755be08fba31c7 printk: drop in_nmi check from printk_safe_flush_on_panic()
>
> The above one commit might be enough. Well, there was one more
> NMI-related race that was fixed by:
>
> ba552399954dde1b printk: Split the code for storing a message into the log buffer
> a338f84dc196f44b printk: Create helper function to queue deferred console handling
> 03fc7f9c99c1e7ae printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
All of these commits have already been in 4.14 stable since 4.14.68. The deadlock
still happens with a kernel built from 4.14.73 (the latest tag), though, and
cherry-picking dbdda842fe96 fixes it.
>
> Best Regards,
> Petr
--
Best,
Daniel
On Tue, 2 Oct 2018 17:15:17 -0700
Daniel Wang <[email protected]> wrote:
> On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek <[email protected]> wrote:
> >
> > Well, I still wonder why it helped and why you do not see it with 4.4.
> > I have a feeling that the console owner switch helped only by chance.
> > In fact, you might be affected by a race in
> > printk_safe_flush_on_panic() that was fixed by the commit:
> >
> > 554755be08fba31c7 printk: drop in_nmi check from printk_safe_flush_on_panic()
> >
> > The above one commit might be enough. Well, there was one more
> > NMI-related race that was fixed by:
> >
> > ba552399954dde1b printk: Split the code for storing a message into the log buffer
> > a338f84dc196f44b printk: Create helper function to queue deferred console handling
> > 03fc7f9c99c1e7ae printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
>
> All of these commits already exist in 4.14 stable, since 4.14.68. The deadlock
> still exists even when built from 4.14.73 (latest tag) though. And cherrypicking
> dbdda842fe96 fixes it.
>
I don't see the big deal about backporting this. The biggest complaints
about backports are about fixes that were added in late -rc releases,
where the fixes didn't get much testing. This commit was added in 4.16
and hasn't had any issues due to its design, although one fix has been
added:
c14376de3a1 ("printk: Wake klogd when passing console_lock owner")
That one is also from 4.16, and searching for "Fixes" tags turns up
nothing else.
-- Steve
On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> On Tue, 2 Oct 2018 17:15:17 -0700
> Daniel Wang <[email protected]> wrote:
>
> I don't see the big deal of backporting this. The biggest complaints
> about backports are from fixes that were added to late -rc releases
> where the fixes didn't get much testing. This commit was added in 4.16,
> and hasn't had any issues due to the design. Although a fix has been
> added:
>
> c14376de3a1 ("printk: Wake klogd when passing console_lock owner")
As I said, I am fine with backporting the console_lock owner stuff
into the stable release.
I just wonder (like Sergey) what the real problem is. The console_lock
owner handshake is not fully reliable. It might be good enough
to prevent soft lockups, but we should not rely on it to prevent
a deadlock.
My new theory ;-)
printk_safe_flush() is called in nmi_trigger_cpumask_backtrace().
=> watchdog_timer_fn() is blocked until all backtraces are printed.
Now, the original report complained that the system rebooted before
all backtraces were printed. It means that panic() was called
on another CPU. My guess is that it is from the hardlockup detector.
And the panic() was not able to flush the console because it was
not able to take console_lock.
IMHO, there was not a real deadlock. The console_lock owner
handshake just helped panic() get console_lock and
flush all messages before reboot => it is a reasonable
and acceptable fix.
Just to be sure. Daniel, could you please send a log with
the console_lock owner stuff backported? There we would see
who called the panic() and why it rebooted early.
Best Regards,
Petr
On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek <[email protected]> wrote:
>
> On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > I don't see the big deal of backporting this. The biggest complaints
> > about backports are from fixes that were added to late -rc releases
> > where the fixes didn't get much testing. This commit was added in 4.16,
> > and hasn't had any issues due to the design. Although a fix has been
> > added:
> >
> > c14376de3a1 ("printk: Wake klogd when passing console_lock owner")
>
> As I said, I am fine with backporting the console_lock owner stuff
> into the stable release.
>
> I just wonder (like Sergey) what the real problem is. The console_lock
> owner handshake is not fully reliable. It might be good enough
> to prevent softlockup. But we should not rely on it to prevent
> a deadlock.
Yes. I myself was curious too. :)
>
> My new theory ;-)
>
> printk_safe_flush() is called in nmi_trigger_cpumask_backtrace().
> => watchdog_timer_fn() is blocked until all backtraces are printed.
>
> Now, the original report complained that the system rebooted before
> all backtraces were printed. It means that panic() was called
> on another CPU. My guess is that it is from the hardlockup detector.
> And the panic() was not able to flush the console because it was
> not able to take console_lock.
>
> IMHO, there was not a real deadlock. The console_lock owner
> handshake just helped to get console_lock in panic() and
> flush all messages before reboot => it is reasonable
> and acceptable fix.
I had the same speculation. Tried to capture a lockdep snippet with
CONFIG_PROVE_LOCKING turned on but didn't get anything. But
maybe I was doing it wrong.
>
> Just to be sure. Daniel, could you please send a log with
> the console_lock owner stuff backported? There we would see
> who called the panic() and why it rebooted early.
Sure. Here is one. It's a bit long but complete. Below it I attached another
log snippet, which is what I got when `softlockup_panic` was turned off. That
log was from the IRQ task that was flushing the printk buffer. I will be taking
a closer look at it too, but am including it in case you find it helpful.
lockup-test-16-2 login: [ 89.277372] LoadPin: kernel-module pinning-ignored obj="/tmp/release/hog.ko" pid=1992 cmdline="insmod hog.ko"
[ 89.280029] hog: loading out-of-tree module taints kernel.
[ 89.294559] Hogging a CPU now
[ 92.619688] watchdog: BUG: soft lockup - CPU#6 stuck for 3s! [hog:1993]
[ 92.626490] Modules linked in: hog(O) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter ip6table_filter ip6_tables aesni_intel aes_x86_64 crypto_simd cryptd glue_helper
[ 92.645567] CPU: 6 PID: 1993 Comm: hog Tainted: G O 4.15.0+ #12
[ 92.652899] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.662245] RIP: 0010:hog_thread+0x13/0x1000 [hog]
[ 92.667164] RSP: 0018:ffffb489c741ff10 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff11
[ 92.675139] RAX: 0000000000000011 RBX: ffff9f5c75a88900 RCX: 0000000000000000
[ 92.682474] RDX: ffff9f5c8339d840 RSI: ffff9f5c833954b8 RDI: ffff9f5c833954b8
[ 92.689727] RBP: ffffb489c741ff48 R08: 0000000000000030 R09: 0000000000000000
[ 92.696985] R10: 00000000000003a8 R11: 0000000000000000 R12: ffff9f5c7959e080
[ 92.704251] R13: ffffb489c7f2bc70 R14: 0000000000000000 R15: ffff9f5c75a88948
[ 92.711498] FS: 0000000000000000(0000) GS:ffff9f5c83380000(0000) knlGS:0000000000000000
[ 92.719699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.725556] CR2: 0000558184c9b89c CR3: 0000000499e12006 CR4: 00000000003606a0
[ 92.732976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.740231] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.747487] Call Trace:
[ 92.750054] kthread+0x120/0x160
[ 92.753419] ? 0xffffffffc030d000
[ 92.756859] ? kthread_stop+0x120/0x120
[ 92.760819] ret_from_fork+0x35/0x40
[ 92.764594] Code: <eb> fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 92.772656] Sending NMI from CPU 6 to CPUs 0-5,7-15:
[ 92.777743] NMI backtrace for cpu 0
[ 92.777746] CPU: 0 PID: 1844 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.777747] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.777755] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 92.777756] RSP: 0018:ffffb489c7dcbdb0 EFLAGS: 00000002
[ 92.777757] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.777758] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.777759] RBP: ffffb489c7dcbde0 R08: 00000000f9f8f56c R09: 000000004c55ba96
[ 92.777760] R10: 0000000084f6cd57 R11: 0000000041f66b45 R12: ffff9f5c82ca5a68
[ 92.777761] R13: ffffb489c7dcbe38 R14: ffffb489c7dcbe38 R15: 0000000000000040
[ 92.777762] FS: 00007f1e21116700(0000) GS:ffff9f5c83200000(0000) knlGS:0000000000000000
[ 92.777763] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.777764] CR2: 000055ada196235c CR3: 0000000edda30001 CR4: 00000000003606b0
[ 92.777768] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.777769] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.777769] Call Trace:
[ 92.777774] do_raw_spin_lock+0xa0/0xb0
[ 92.777778] _raw_spin_lock_irqsave+0x20/0x30
[ 92.777784] _extract_crng+0x45/0x120
[ 92.777787] urandom_read+0xea/0x270
[ 92.777793] vfs_read+0xad/0x170
[ 92.777795] SyS_read+0x4b/0xa0
[ 92.777798] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.777801] do_syscall_64+0x63/0x1f0
[ 92.777804] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.777806] RIP: 0033:0x7f1e20aec410
[ 92.777807] RSP: 002b:00007ffd42a321e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.777808] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f1e20aec410
[ 92.777809] RDX: 0000000000100000 RSI: 00007f1e2062f000 RDI: 0000000000000000
[ 92.777810] RBP: 00007ffd42a32210 R08: ffffffffffffffff R09: 0000000000000000
[ 92.777810] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.777811] R13: 00007f1e21116690 R14: 0000000000100000 R15: 00007f1e2062f000
[ 92.777812] Code: 48 8b 2c 24 48 c7 00 00 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 <85> c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 eb ec 81 fe 00
[ 92.777834] NMI backtrace for cpu 9
[ 92.777836] CPU: 9 PID: 1875 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.777837] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.777840] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 92.777841] RSP: 0018:ffffb489c785bdb0 EFLAGS: 00000002
[ 92.777842] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.777843] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.777844] RBP: ffffb489c785bde0 R08: 00000000af260603 R09: 0000000099b415a4
[ 92.777844] R10: 000000006ee0b179 R11: 00000000f7237bdc R12: ffff9f5c82ca5a68
[ 92.777845] R13: ffffb489c785be38 R14: ffffb489c785be38 R15: 0000000000000040
[ 92.777846] FS: 00007f4b27878700(0000) GS:ffff9f5c83440000(0000) knlGS:0000000000000000
[ 92.777847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.777848] CR2: 00005556ecb96938 CR3: 0000000edd498005 CR4: 00000000003606a0
[ 92.777851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.777852] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.777853] Call Trace:
[ 92.777856] do_raw_spin_lock+0xa0/0xb0
[ 92.777859] _raw_spin_lock_irqsave+0x20/0x30
[ 92.777862] _extract_crng+0x45/0x120
[ 92.777865] urandom_read+0xea/0x270
[ 92.777868] vfs_read+0xad/0x170
[ 92.777870] SyS_read+0x4b/0xa0
[ 92.777872] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.777874] do_syscall_64+0x63/0x1f0
[ 92.777876] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.777877] RIP: 0033:0x7f4b2724e410
[ 92.777878] RSP: 002b:00007fffcc371cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.777879] RAX: ffffffffffffffda RBX: 000000000000003d RCX: 00007f4b2724e410
[ 92.777880] RDX: 0000000000100000 RSI: 00007f4b26d91000 RDI: 0000000000000000
[ 92.777881] RBP: 00007fffcc371cf0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.777881] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.777882] R13: 00007f4b27878690 R14: 0000000000100000 R15: 00007f4b26d91000
[ 92.777883] Code: 48 8b 2c 24 48 c7 00 00 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 <85> c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 eb ec 81 fe 00
[ 92.777903] NMI backtrace for cpu 1
[ 92.777904] CPU: 1 PID: 1853 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.777905] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.777907] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.777908] RSP: 0018:ffffb489c7bd3db0 EFLAGS: 00000002
[ 92.777909] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.777909] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.777910] RBP: ffffb489c7bd3de0 R08: 000000004de274a7 R09: 000000007bb3f38c
[ 92.777911] R10: 00000000dcb5416d R11: 0000000095dfea80 R12: ffff9f5c82ca5a68
[ 92.777912] R13: ffffb489c7bd3e38 R14: ffffb489c7bd3e38 R15: 0000000000000040
[ 92.777913] FS: 00007fe443813700(0000) GS:ffff9f5c83240000(0000) knlGS:0000000000000000
[ 92.777913] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.777914] CR2: 00007f31139788c0 CR3: 0000000eddafa001 CR4: 00000000003606a0
[ 92.777917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.777918] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.777918] Call Trace:
[ 92.777921] do_raw_spin_lock+0xa0/0xb0
[ 92.777922] _raw_spin_lock_irqsave+0x20/0x30
[ 92.777924] _extract_crng+0x45/0x120
[ 92.777926] urandom_read+0xea/0x270
[ 92.777928] vfs_read+0xad/0x170
[ 92.777930] SyS_read+0x4b/0xa0
[ 92.777931] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.777932] do_syscall_64+0x63/0x1f0
[ 92.777934] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.777935] RIP: 0033:0x7fe4431e9410
[ 92.777936] RSP: 002b:00007ffe86708e88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.777937] RAX: ffffffffffffffda RBX: 000000000000003a RCX: 00007fe4431e9410
[ 92.777938] RDX: 0000000000100000 RSI: 00007fe442d2c000 RDI: 0000000000000000
[ 92.777938] RBP: 00007ffe86708eb0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.777939] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.777940] R13: 00007fe443813690 R14: 0000000000100000 R15: 00007fe442d2c000
[ 92.777940] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.777960] NMI backtrace for cpu 13
[ 92.777962] CPU: 13 PID: 1851 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.777963] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.777966] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.777967] RSP: 0018:ffffb489c7c9fdb0 EFLAGS: 00000002
[ 92.777968] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.777969] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.777970] RBP: ffffb489c7c9fde0 R08: 000000002e505de7 R09: 0000000094345515
[ 92.777970] R10: 00000000bde83e93 R11: 00000000764dd3c0 R12: ffff9f5c82ca5a68
[ 92.777971] R13: ffffb489c7c9fe38 R14: ffffb489c7c9fe38 R15: 0000000000000040
[ 92.777972] FS: 00007fc869785700(0000) GS:ffff9f5c83540000(0000) knlGS:0000000000000000
[ 92.777973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.777974] CR2: 000000c420d93000 CR3: 0000000ef4dee004 CR4: 00000000003606a0
[ 92.777977] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.777978] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.777978] Call Trace:
[ 92.777982] do_raw_spin_lock+0xa0/0xb0
[ 92.777984] _raw_spin_lock_irqsave+0x20/0x30
[ 92.777987] _extract_crng+0x45/0x120
[ 92.777989] urandom_read+0xea/0x270
[ 92.777991] vfs_read+0xad/0x170
[ 92.777993] SyS_read+0x4b/0xa0
[ 92.778006] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778007] do_syscall_64+0x63/0x1f0
[ 92.778010] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778011] RIP: 0033:0x7fc86915b410
[ 92.778012] RSP: 002b:00007ffc289f5578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778013] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007fc86915b410
[ 92.778014] RDX: 0000000000100000 RSI: 00007fc868c9e000 RDI: 0000000000000000
[ 92.778015] RBP: 00007ffc289f55a0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778015] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778016] R13: 00007fc869785690 R14: 0000000000100000 R15: 00007fc868c9e000
[ 92.778017] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778037] NMI backtrace for cpu 5
[ 92.778038] CPU: 5 PID: 1865 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778039] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778041] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778041] RSP: 0018:ffffb489c791fdb0 EFLAGS: 00000002
[ 92.778042] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778043] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778044] RBP: ffffb489c791fde0 R08: 00000000b5f5cc7e R09: 000000001db25a77
[ 92.778044] R10: 000000000ffafde2 R11: 00000000fdf34257 R12: ffff9f5c82ca5a68
[ 92.778045] R13: ffffb489c791fe38 R14: ffffb489c791fe38 R15: 0000000000000040
[ 92.778046] FS: 00007f495240c700(0000) GS:ffff9f5c83340000(0000) knlGS:0000000000000000
[ 92.778047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778048] CR2: 000000c420d8d068 CR3: 0000000edd40a004 CR4: 00000000003606a0
[ 92.778051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778052] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778052] Call Trace:
[ 92.778054] do_raw_spin_lock+0xa0/0xb0
[ 92.778056] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778058] _extract_crng+0x45/0x120
[ 92.778060] urandom_read+0xea/0x270
[ 92.778062] vfs_read+0xad/0x170
[ 92.778064] SyS_read+0x4b/0xa0
[ 92.778065] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778066] do_syscall_64+0x63/0x1f0
[ 92.778068] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778069] RIP: 0033:0x7f4951de2410
[ 92.778070] RSP: 002b:00007fff89373808 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778071] RAX: ffffffffffffffda RBX: 0000000000000029 RCX: 00007f4951de2410
[ 92.778071] RDX: 0000000000100000 RSI: 00007f4951925000 RDI: 0000000000000000
[ 92.778072] RBP: 00007fff89373830 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778073] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778073] R13: 00007f495240c690 R14: 0000000000100000 R15: 00007f4951925000
[ 92.778074] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778094] NMI backtrace for cpu 2
[ 92.778096] CPU: 2 PID: 1850 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778097] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778100] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778101] RSP: 0018:ffffb489c7573db0 EFLAGS: 00000002
[ 92.778102] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778103] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778104] RBP: ffffb489c7573de0 R08: 000000005881149c R09: 0000000016f603e6
[ 92.778105] R10: 000000007e14a1cc R11: 00000000a07e8a75 R12: ffff9f5c82ca5a68
[ 92.778105] R13: ffffb489c7573e38 R14: ffffb489c7573e38 R15: 0000000000000040
[ 92.778107] FS: 00007f30c1a57700(0000) GS:ffff9f5c83280000(0000) knlGS:0000000000000000
[ 92.778108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778108] CR2: 000055ada37359c0 CR3: 0000000eddabe001 CR4: 00000000003606a0
[ 92.778112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778113] Call Trace:
[ 92.778116] do_raw_spin_lock+0xa0/0xb0
[ 92.778118] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778121] _extract_crng+0x45/0x120
[ 92.778123] urandom_read+0xea/0x270
[ 92.778125] vfs_read+0xad/0x170
[ 92.778127] SyS_read+0x4b/0xa0
[ 92.778129] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778130] do_syscall_64+0x63/0x1f0
[ 92.778133] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778134] RIP: 0033:0x7f30c142d410
[ 92.778135] RSP: 002b:00007ffc67fe0ac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778136] RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007f30c142d410
[ 92.778137] RDX: 0000000000100000 RSI: 00007f30c0f70000 RDI: 0000000000000000
[ 92.778138] RBP: 00007ffc67fe0af0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778139] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778139] R13: 00007f30c1a57690 R14: 0000000000100000 R15: 00007f30c0f70000
[ 92.778140] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778160] NMI backtrace for cpu 10
[ 92.778162] CPU: 10 PID: 1846 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778162] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778164] RIP: 0010:native_queued_spin_lock_slowpath+0x20/0x1b0
[ 92.778165] RSP: 0018:ffffb489c7cfbdb0 EFLAGS: 00000097
[ 92.778166] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778167] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778168] RBP: ffffb489c7cfbde0 R08: 00000000c8adce20 R09: 00000000488b6915
[ 92.778168] R10: 00000000b1e0660c R11: 0000000010ab43f9 R12: ffff9f5c82ca5a68
[ 92.778169] R13: ffffb489c7cfbe38 R14: ffffb489c7cfbe38 R15: 0000000000000040
[ 92.778170] FS: 00007fc53087c700(0000) GS:ffff9f5c83480000(0000) knlGS:0000000000000000
[ 92.778171] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778172] CR2: 00007f31139788c0 CR3: 0000000edda36006 CR4: 00000000003606a0
[ 92.778175] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778176] Call Trace:
[ 92.778178] do_raw_spin_lock+0xa0/0xb0
[ 92.778180] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778182] _extract_crng+0x45/0x120
[ 92.778184] urandom_read+0xea/0x270
[ 92.778186] vfs_read+0xad/0x170
[ 92.778188] SyS_read+0x4b/0xa0
[ 92.778189] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778190] do_syscall_64+0x63/0x1f0
[ 92.778192] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778193] RIP: 0033:0x7fc530252410
[ 92.778193] RSP: 002b:00007fffe389c818 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778195] RAX: ffffffffffffffda RBX: 000000000000003a RCX: 00007fc530252410
[ 92.778195] RDX: 0000000000100000 RSI: 00007fc52fd95000 RDI: 0000000000000000
[ 92.778196] RBP: 00007fffe389c840 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778197] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778197] R13: 00007fc53087c690 R14: 0000000000100000 R15: 00007fc52fd95000
[ 92.778198] Code: 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 <85> c0 75 f2 5d c3 f3 90 eb ec 81 fe 00 01 00 00 0f 84 9a 00 00
[ 92.778218] NMI backtrace for cpu 8
[ 92.778220] CPU: 8 PID: 1848 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778220] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778222] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 92.778223] RSP: 0018:ffffb489c7d6bdb0 EFLAGS: 00000002
[ 92.778224] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778225] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778225] RBP: ffffb489c7d6bde0 R08: 000000008a2cdbe2 R09: 00000000b1c3e3b9
[ 92.778226] R10: 000000007230bf45 R11: 00000000d22a51bb R12: ffff9f5c82ca5a68
[ 92.778227] R13: ffffb489c7d6be38 R14: ffffb489c7d6be38 R15: 0000000000000040
[ 92.778228] FS: 00007f3d84692700(0000) GS:ffff9f5c83400000(0000) knlGS:0000000000000000
[ 92.778229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778229] CR2: 00007fe831dbc140 CR3: 0000000ede342001 CR4: 00000000003606a0
[ 92.778232] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778233] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778233] Call Trace:
[ 92.778235] do_raw_spin_lock+0xa0/0xb0
[ 92.778237] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778239] _extract_crng+0x45/0x120
[ 92.778241] urandom_read+0xea/0x270
[ 92.778243] vfs_read+0xad/0x170
[ 92.778245] SyS_read+0x4b/0xa0
[ 92.778246] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778247] do_syscall_64+0x63/0x1f0
[ 92.778249] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778250] RIP: 0033:0x7f3d84068410
[ 92.778251] RSP: 002b:00007fffea90d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778252] RAX: ffffffffffffffda RBX: 000000000000003b RCX: 00007f3d84068410
[ 92.778253] RDX: 0000000000100000 RSI: 00007f3d83bab000 RDI: 0000000000000000
[ 92.778253] RBP: 00007fffea90d950 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778254] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778255] R13: 00007f3d84692690 R14: 0000000000100000 R15: 00007f3d83bab000
[ 92.778255] Code: 48 8b 2c 24 48 c7 00 00 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 <85> c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 eb ec 81 fe 00
[ 92.778275] NMI backtrace for cpu 4
[ 92.778277] CPU: 4 PID: 1864 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778278] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778281] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778282] RSP: 0018:ffffb489c7ddbdb0 EFLAGS: 00000002
[ 92.778283] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778284] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778285] RBP: ffffb489c7ddbde0 R08: 000000000ffa62a0 R09: 000000002e18c499
[ 92.778285] R10: 000000007d13e3b0 R11: 0000000057f7d879 R12: ffff9f5c82ca5a68
[ 92.778286] R13: ffffb489c7ddbe38 R14: ffffb489c7ddbe38 R15: 0000000000000040
[ 92.778287] FS: 00007f2f743d5700(0000) GS:ffff9f5c83300000(0000) knlGS:0000000000000000
[ 92.778288] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778289] CR2: 000055c6f86eddf8 CR3: 0000000eddbde005 CR4: 00000000003606a0
[ 92.778292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778293] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778293] Call Trace:
[ 92.778297] do_raw_spin_lock+0xa0/0xb0
[ 92.778299] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778302] _extract_crng+0x45/0x120
[ 92.778304] urandom_read+0xea/0x270
[ 92.778306] vfs_read+0xad/0x170
[ 92.778308] SyS_read+0x4b/0xa0
[ 92.778310] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778311] do_syscall_64+0x63/0x1f0
[ 92.778313] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778315] RIP: 0033:0x7f2f73dab410
[ 92.778315] RSP: 002b:00007ffcbb71e838 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778317] RAX: ffffffffffffffda RBX: 0000000000000047 RCX: 00007f2f73dab410
[ 92.778317] RDX: 0000000000100000 RSI: 00007f2f738ee000 RDI: 0000000000000000
[ 92.778318] RBP: 00007ffcbb71e860 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778319] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778320] R13: 00007f2f743d5690 R14: 0000000000100000 R15: 00007f2f738ee000
[ 92.778321] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778341] NMI backtrace for cpu 12
[ 92.778342] CPU: 12 PID: 1860 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778343] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778345] RIP: 0010:native_queued_spin_lock_slowpath+0x20/0x1b0
[ 92.778346] RSP: 0018:ffffb489c74f3db0 EFLAGS: 00000046
[ 92.778346] RAX: 0000000000000000 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778347] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778348] RBP: ffffb489c74f3de0 R08: 00000000a966dd49 R09: 00000000fab8387b
[ 92.778349] R10: 0000000036100fb1 R11: 00000000f1645322 R12: ffff9f5c82ca5a68
[ 92.778349] R13: ffffb489c74f3e38 R14: ffffb489c74f3e38 R15: 0000000000000040
[ 92.778350] FS: 00007f214a3b0700(0000) GS:ffff9f5c83500000(0000) knlGS:0000000000000000
[ 92.778351] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778352] CR2: 00007efe1164bba0 CR3: 0000000eddb92002 CR4: 00000000003606a0
[ 92.778356] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778357] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778357] Call Trace:
[ 92.778359] do_raw_spin_lock+0xa0/0xb0
[ 92.778361] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778363] _extract_crng+0x45/0x120
[ 92.778365] urandom_read+0xea/0x270
[ 92.778367] vfs_read+0xad/0x170
[ 92.778369] SyS_read+0x4b/0xa0
[ 92.778370] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778371] do_syscall_64+0x63/0x1f0
[ 92.778373] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778374] RIP: 0033:0x7f2149d86410
[ 92.778375] RSP: 002b:00007ffec719e588 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778376] RAX: ffffffffffffffda RBX: 0000000000000044 RCX: 00007f2149d86410
[ 92.778376] RDX: 0000000000100000 RSI: 00007f21498c9000 RDI: 0000000000000000
[ 92.778377] RBP: 00007ffec719e5b0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778378] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778378] R13: 00007f214a3b0690 R14: 0000000000100000 R15: 00007f21498c9000
[ 92.778379] Code: 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 <85> c0 75 f2 5d c3 f3 90 eb ec 81 fe 00 01 00 00 0f 84 9a 00 00
[ 92.778399] NMI backtrace for cpu 7
[ 92.778402] CPU: 7 PID: 1871 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778402] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778406] RIP: 0010:native_queued_spin_lock_slowpath+0x18/0x1b0
[ 92.778406] RSP: 0018:ffffb489c7c03db0 EFLAGS: 00000002
[ 92.778408] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778408] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778409] RBP: ffffb489c7c03de0 R08: 00000000aa046079 R09: 000000001b50b1a2
[ 92.778410] R10: 00000000a366b6ee R11: 00000000f201d652 R12: ffff9f5c82ca5a68
[ 92.778411] R13: ffffb489c7c03e38 R14: ffffb489c7c03e38 R15: 0000000000000040
[ 92.778412] FS: 00007f2a946e7700(0000) GS:ffff9f5c833c0000(0000) knlGS:0000000000000000
[ 92.778413] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778413] CR2: 00005588da6ad210 CR3: 0000000edd470001 CR4: 00000000003606a0
[ 92.778417] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778418] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778418] Call Trace:
[ 92.778422] do_raw_spin_lock+0xa0/0xb0
[ 92.778424] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778426] _extract_crng+0x45/0x120
[ 92.778429] urandom_read+0xea/0x270
[ 92.778431] vfs_read+0xad/0x170
[ 92.778433] SyS_read+0x4b/0xa0
[ 92.778435] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778436] do_syscall_64+0x63/0x1f0
[ 92.778438] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778439] RIP: 0033:0x7f2a940bd410
[ 92.778440] RSP: 002b:00007fff62b4d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778441] RAX: ffffffffffffffda RBX: 0000000000000035 RCX: 00007f2a940bd410
[ 92.778442] RDX: 0000000000100000 RSI: 00007f2a93c00000 RDI: 0000000000000000
[ 92.778443] RBP: 00007fff62b4d7d0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778444] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778444] R13: 00007f2a946e7690 R14: 0000000000100000 R15: 00007f2a93c00000
[ 92.778445] Code: 48 8b 2c 24 48 c7 00 00 00 00 00 e9 1d fe ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 <85> c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 eb ec 81 fe 00
[ 92.778473] NMI backtrace for cpu 3
[ 92.778476] CPU: 3 PID: 1862 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778476] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778480] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778480] RSP: 0018:ffffb489c7c67db0 EFLAGS: 00000002
[ 92.778482] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778482] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778483] RBP: ffffb489c7c67de0 R08: 000000000bb268a5 R09: 0000000023d30aaf
[ 92.778484] R10: 00000000020fd5a8 R11: 0000000053afde7e R12: ffff9f5c82ca5a68
[ 92.778485] R13: ffffb489c7c67e38 R14: ffffb489c7c67e38 R15: 0000000000000040
[ 92.778486] FS: 00007f9aeb39d700(0000) GS:ffff9f5c832c0000(0000) knlGS:0000000000000000
[ 92.778487] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778488] CR2: 00007f8af1af22d0 CR3: 0000000eddbba006 CR4: 00000000003606a0
[ 92.778491] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778492] Call Trace:
[ 92.778496] do_raw_spin_lock+0xa0/0xb0
[ 92.778498] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778501] _extract_crng+0x45/0x120
[ 92.778503] urandom_read+0xea/0x270
[ 92.778505] vfs_read+0xad/0x170
[ 92.778507] SyS_read+0x4b/0xa0
[ 92.778509] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778510] do_syscall_64+0x63/0x1f0
[ 92.778512] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778514] RIP: 0033:0x7f9aead73410
[ 92.778514] RSP: 002b:00007fff1035a0a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778516] RAX: ffffffffffffffda RBX: 000000000000003e RCX: 00007f9aead73410
[ 92.778516] RDX: 0000000000100000 RSI: 00007f9aea8b6000 RDI: 0000000000000000
[ 92.778517] RBP: 00007fff1035a0d0 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778518] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778519] R13: 00007f9aeb39d690 R14: 0000000000100000 R15: 00007f9aea8b6000
[ 92.778520] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778541] NMI backtrace for cpu 11
[ 92.778542] CPU: 11 PID: 1870 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778543] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778545] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778546] RSP: 0018:ffffb489c7e13db0 EFLAGS: 00000002
[ 92.778547] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778548] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778548] RBP: ffffb489c7e13de0 R08: 00000000ee7a4106 R09: 00000000e50a300e
[ 92.778549] R10: 00000000dc4f7e72 R11: 000000003677b6df R12: ffff9f5c82ca5a68
[ 92.778550] R13: ffffb489c7e13e38 R14: ffffb489c7e13e38 R15: 0000000000000040
[ 92.778551] FS: 00007fab474ee700(0000) GS:ffff9f5c834c0000(0000) knlGS:0000000000000000
[ 92.778552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778553] CR2: 0000563ea45d4938 CR3: 0000000edd466005 CR4: 00000000003606a0
[ 92.778556] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778557] Call Trace:
[ 92.778559] do_raw_spin_lock+0xa0/0xb0
[ 92.778561] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778564] _extract_crng+0x45/0x120
[ 92.778566] urandom_read+0xea/0x270
[ 92.778568] vfs_read+0xad/0x170
[ 92.778570] SyS_read+0x4b/0xa0
[ 92.778571] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778572] do_syscall_64+0x63/0x1f0
[ 92.778574] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778575] RIP: 0033:0x7fab46ec4410
[ 92.778575] RSP: 002b:00007fff47b5e7e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778577] RAX: ffffffffffffffda RBX: 0000000000000036 RCX: 00007fab46ec4410
[ 92.778577] RDX: 0000000000100000 RSI: 00007fab46a07000 RDI: 0000000000000000
[ 92.778578] RBP: 00007fff47b5e810 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778579] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778579] R13: 00007fab474ee690 R14: 0000000000100000 R15: 00007fab46a07000
[ 92.778580] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778601] NMI backtrace for cpu 15
[ 92.778603] CPU: 15 PID: 1857 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778603] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778605] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778606] RSP: 0018:ffffb489c7cb3db0 EFLAGS: 00000002
[ 92.778607] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778607] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778608] RBP: ffffb489c7cb3de0 R08: 000000004be85ff2 R09: 0000000018b0b19c
[ 92.778609] R10: 0000000035e781b4 R11: 0000000093e5d5cb R12: ffff9f5c82ca5a68
[ 92.778610] R13: ffffb489c7cb3e38 R14: ffffb489c7cb3e38 R15: 0000000000000040
[ 92.778611] FS: 00007f05286cc700(0000) GS:ffff9f5c835c0000(0000) knlGS:0000000000000000
[ 92.778611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778612] CR2: 00005626d2f49210 CR3: 0000000eddb90005 CR4: 00000000003606a0
[ 92.778615] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778616] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778616] Call Trace:
[ 92.778618] do_raw_spin_lock+0xa0/0xb0
[ 92.778620] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778622] _extract_crng+0x45/0x120
[ 92.778624] urandom_read+0xea/0x270
[ 92.778626] vfs_read+0xad/0x170
[ 92.778628] SyS_read+0x4b/0xa0
[ 92.778629] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778630] do_syscall_64+0x63/0x1f0
[ 92.778632] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778633] RIP: 0033:0x7f05280a2410
[ 92.778634] RSP: 002b:00007ffc27d2fa58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778635] RAX: ffffffffffffffda RBX: 000000000000002e RCX: 00007f05280a2410
[ 92.778635] RDX: 0000000000100000 RSI: 00007f0527be5000 RDI: 0000000000000000
[ 92.778636] RBP: 00007ffc27d2fa80 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778637] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778637] R13: 00007f05286cc690 R14: 0000000000100000 R15: 00007f0527be5000
[ 92.778638] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778659] NMI backtrace for cpu 14
[ 92.778661] CPU: 14 PID: 1867 Comm: dd Tainted: G O 4.15.0+ #12
[ 92.778661] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 92.778664] RIP: 0010:native_queued_spin_lock_slowpath+0x28/0x1b0
[ 92.778665] RSP: 0018:ffffb489c7e03db0 EFLAGS: 00000002
[ 92.778666] RAX: 0000000000000001 RBX: ffff9f5c82ca5a68 RCX: 0000000000000000
[ 92.778667] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9f5c82ca5a68
[ 92.778668] RBP: ffffb489c7e03de0 R08: 00000000edb4d0c8 R09: 00000000a4a0a15d
[ 92.778669] R10: 000000004eecd136 R11: 0000000035b246a1 R12: ffff9f5c82ca5a68
[ 92.778669] R13: ffffb489c7e03e38 R14: ffffb489c7e03e38 R15: 0000000000000040
[ 92.778671] FS: 00007ff0fd2ee700(0000) GS:ffff9f5c83580000(0000) knlGS:0000000000000000
[ 92.778671] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.778672] CR2: 000000c420dbc010 CR3: 0000000edd43c004 CR4: 00000000003606a0
[ 92.778676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.778676] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.778677] Call Trace:
[ 92.778680] do_raw_spin_lock+0xa0/0xb0
[ 92.778682] _raw_spin_lock_irqsave+0x20/0x30
[ 92.778684] _extract_crng+0x45/0x120
[ 92.778686] urandom_read+0xea/0x270
[ 92.778689] vfs_read+0xad/0x170
[ 92.778691] SyS_read+0x4b/0xa0
[ 92.778692] ? __audit_syscall_exit+0x21e/0x2c0
[ 92.778694] do_syscall_64+0x63/0x1f0
[ 92.778696] entry_SYSCALL64_slow_path+0x25/0x25
[ 92.778697] RIP: 0033:0x7ff0fccc4410
[ 92.778698] RSP: 002b:00007ffe7ee66a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 92.778699] RAX: ffffffffffffffda RBX: 000000000000003c RCX: 00007ff0fccc4410
[ 92.778700] RDX: 0000000000100000 RSI: 00007ff0fc807000 RDI: 0000000000000000
[ 92.778701] RBP: 00007ffe7ee66a50 R08: ffffffffffffffff R09: 0000000000000000
[ 92.778702] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 92.778702] R13: 00007ff0fd2ee690 R14: 0000000000100000 R15: 00007ff0fc807000
[ 92.778703] Code: 0f 1f 00 0f 1f 44 00 00 8b 05 d5 ad d8 00 55 85 c0 7e 1a ba 01 00 00 00 90 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb> ec 81 fe 00 01 00 00 0f 84 9a 00 00 00 41 b8 01 01 00 00 b9
[ 92.778780] Kernel panic - not syncing: softlockup: hung tasks
[ 95.939261] CPU: 6 PID: 1993 Comm: hog Tainted: G O L 4.15.0+ #12
[ 95.946506] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 95.955921] Call Trace:
[ 95.958832] <IRQ>
[ 95.960962] dump_stack+0x63/0x8a
[ 95.964394] panic+0xd6/0x22d
[ 95.967473] ? cpumask_next+0x1a/0x20
[ 95.971280] watchdog_timer_fn+0x22b/0x240
[ 95.975486] ? watchdog+0x30/0x30
[ 95.979099] __hrtimer_run_queues+0xd6/0x2f0
[ 95.983585] hrtimer_interrupt+0x11b/0x290
[ 95.987793] smp_apic_timer_interrupt+0x6c/0x140
[ 95.992524] apic_timer_interrupt+0x98/0xa0
[ 95.996819] </IRQ>
[ 95.999032] RIP: 0010:hog_thread+0x13/0x1000 [hog]
[ 96.003933] RSP: 0018:ffffb489c741ff10 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff11
[ 96.011610] RAX: 0000000000000011 RBX: ffff9f5c75a88900 RCX: 0000000000000000
[ 96.018852] RDX: ffff9f5c8339d840 RSI: ffff9f5c833954b8 RDI: ffff9f5c833954b8
[ 96.026095] RBP: ffffb489c741ff48 R08: 0000000000000030 R09: 0000000000000000
[ 96.033339] R10: 00000000000003a8 R11: 0000000000000000 R12: ffff9f5c7959e080
[ 96.040621] R13: ffffb489c7f2bc70 R14: 0000000000000000 R15: ffff9f5c75a88948
[ 96.047876] kthread+0x120/0x160
[ 96.051235] ? 0xffffffffc030d000
[ 96.054662] ? kthread_stop+0x120/0x120
[ 96.058611] ret_from_fork+0x35/0x40
[ 96.064388] Kernel Offset: 0x31000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 96.075390] ACPI MEMORY or I/O RESET_REG.
SeaBIOS (version 1.8.2-20171012_061934-google) <----- Reboot happened here
Total RAM Size = 0x0000000f00000000 = 61440 MiB
CPUs found: 16 Max CPUs supported: 16
=====================
Log snippet for the buffer flushing worker when `softlockup_panic` is turned off:
```
[ 348.058207] NMI backtrace for cpu 8
[ 348.058207] CPU: 8 PID: 1700 Comm: dd Tainted: G O L 4.14.73 #18
[ 348.058208] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 348.058208] task: ffff9afe5dfc0000 task.stack: ffffbc14c7d14000
[ 348.058208] RIP: 0010:delay_tsc+0x35/0x50
[ 348.058209] RSP: 0018:ffff9afe83403e50 EFLAGS: 00000087
[ 348.058210] RAX: 000000b377ae8e51 RBX: ffffffffa13283c0 RCX: 000000b377ae8e28
[ 348.058210] RDX: 0000000000000029 RSI: 0000000000000008 RDI: 0000000000000899
[ 348.058210] RBP: ffff9afe83403e78 R08: 0000000000000030 R09: 0000000000000000
[ 348.058211] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000002708
[ 348.058211] R13: 0000000000000020 R14: ffffffffa12cfc89 R15: ffffffffa13283c0
[ 348.058212] FS: 00007f87d366d700(0000) GS:ffff9afe83400000(0000) knlGS:0000000000000000
[ 348.058212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 348.058213] CR2: 00007f3a666bf130 CR3: 0000000eeed3e005 CR4: 00000000003606a0
[ 348.058213] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 348.058213] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 348.058214] Call Trace:
[ 348.058214] <IRQ>
[ 348.058214] wait_for_xmitr+0x2c/0xb0
[ 348.058215] serial8250_console_putchar+0x1c/0x40
[ 348.058215] ? wait_for_xmitr+0xb0/0xb0
[ 348.058215] uart_console_write+0x33/0x70
[ 348.058216] serial8250_console_write+0xe2/0x2b0
[ 348.058216] ? msg_print_text+0xa6/0x110
[ 348.058216] console_unlock+0x306/0x4a0
[ 348.058217] wake_up_klogd_work_func+0x55/0x60
[ 348.058217] irq_work_run_list+0x50/0x80
[ 348.058217] smp_irq_work_interrupt+0x3f/0xe0
[ 348.058218] irq_work_interrupt+0x7d/0x90
[ 348.058218] </IRQ>
[ 348.058218] RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x20
[ 348.058219] RSP: 0018:ffffbc14c7d17e00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff09
[ 348.058220] RAX: 0000000000000008 RBX: 0000000000000212 RCX: 00000000f051d16f
[ 348.058220] RDX: 00000000d5d8d427 RSI: 0000000000000212 RDI: 0000000000000212
[ 348.058220] RBP: ffffbc14c7d17e08 R08: 000000007064b05b R09: 000000008702a7b3
[ 348.058221] R10: 000000007b5e67a9 R11: 00000000bd0b4c4f R12: 00007f87d2c63200
[ 348.058221] R13: 00000000000dd200 R14: ffffbc14c7d17e30 R15: 0000000000000040
[ 348.058221] urandom_read+0xf9/0x2c0
[ 348.058222] vfs_read+0xad/0x170
[ 348.058222] SyS_read+0x4b/0xa0
[ 348.058222] ? __audit_syscall_exit+0x21e/0x2c0
[ 348.058223] do_syscall_64+0x70/0x200
[ 348.058223] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 348.058224] RIP: 0033:0x7f87d3043410
[ 348.058224] RSP: 002b:00007ffff267bb58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 348.058225] RAX: ffffffffffffffda RBX: 000000000000002b RCX: 00007f87d3043410
[ 348.058225] RDX: 0000000000100000 RSI: 00007f87d2b86000 RDI: 0000000000000000
[ 348.058226] RBP: 00007ffff267bb80 R08: ffffffffffffffff R09: 0000000000000000
[ 348.058226] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
[ 348.058227] R13: 00007f87d366d690 R14: 0000000000100000 R15: 00007f87d2b86000
[ 348.058227] Code: a3 99 5f 0f ae e8 0f 31 48 89 d1 48 c1 e1 20 48 09 c1 0f ae e8 0f 31 48 c1 e2 20 48 09 d0 48 89 c2 48 29 ca 48 39 fa 73 15 f3 90 <65> 8b 15 9c a3 99 5f 39 d6 74 dc 48 29 c1 48 01 cf eb be 5d c3
```
--
Best,
Daniel
On Wed, 3 Oct 2018 10:16:08 -0700
Daniel Wang <[email protected]> wrote:
> On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek <[email protected]> wrote:
> >
> > On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > > I don't see the big deal of backporting this. The biggest complaints
> > > about backports are from fixes that were added to late -rc releases
> > > where the fixes didn't get much testing. This commit was added in 4.16,
> > > and hasn't had any issues due to the design. Although a fix has been
> > > added:
> > >
> > > c14376de3a1 ("printk: Wake klogd when passing console_lock owner")
> >
> > As I said, I am fine with backporting the console_lock owner stuff
> > into the stable release.
> >
> > I just wonder (like Sergey) what the real problem is. The console_lock
> > owner handshake is not fully reliable. It might be good enough
I'm not sure what you mean by 'not fully reliable'
> > to prevent softlockup. But we should not rely on it to prevent
> > a deadlock.
>
> Yes. I myself was curious too. :)
>
> >
> > My new theory ;-)
> >
> > printk_safe_flush() is called in nmi_trigger_cpumask_backtrace().
> > => watchdog_timer_fn() is blocked until all backtraces are printed.
> >
> > Now, the original report complained that the system rebooted before
> > all backtraces were printed. It means that panic() was called
> > on another CPU. My guess is that it is from the hardlockup detector.
> > And the panic() was not able to flush the console because it was
> > not able to take console_lock.
> >
> > IMHO, there was not a real deadlock. The console_lock owner
> > handshake just helped to get console_lock in panic() and
> > flush all messages before reboot => it is reasonable
> > and acceptable fix.
Agreed.
>
> I had the same speculation. Tried to capture a lockdep snippet with
> CONFIG_PROVE_LOCKING turned on but didn't get anything. But
> maybe I was doing it wrong.
>
> >
> > Just to be sure. Daniel, could you please send a log with
> > the console_lock owner stuff backported? There we would see
> > who called the panic() and why it rebooted early.
>
> Sure. Here is one. It's a bit long but complete. I attached another log
> snippet below it which is what I got when `softlockup_panic` was turned
> off. The log was from the IRQ task that was flushing the printk buffer. I
> will be taking a closer look at it too, but I'm sending it in case you find it helpful.
Just so I understand correctly. Does the panic hit with and without the
suggested backport patch? The only difference is that you get the full
output with the patch and limited output without it?
-- Steve
On Wed, Oct 3, 2018 at 10:37 AM Steven Rostedt <[email protected]> wrote:
> Just so I understand correctly. Does the panic hit with and without the
> suggested backport patch? The only difference is that you get the full
> output with the patch and limited output without it?
When `softlockup_panic` is set (which is what my original repro had and
what we use in production), without the backport patch, the expected panic
would hit what looks like a deadlock. So even when the machine is configured
to reboot immediately after the panic (kernel.panic=-1), it just hangs there
with an incomplete backtrace. With your patch, the deadlock doesn't happen
and the machine reboots successfully.
This was and still is the issue this thread is trying to fix. The last log
snippet was from an "experiment" that I did in order to understand what's really
happening. So far the speculation has been that the panic path was trying
to get a lock held by a backtrace dumping thread, but there is not enough
evidence of which thread is holding the lock and how it uses it. So I set
`softlockup_panic` to 0, to get panic out of the equation. Then I saw that one
CPU was indeed holding the console lock, trying to write something out. If
the panic was to hit while it's doing that, we might get a deadlock.
>
> -- Steve
>
--
Best,
Daniel
I wanted to let you know that I am leaving for a two-week vacation. So
if you don't hear from me during that period assume bad network
connectivity and not lack of enthusiasm. :) Feel free to go with the
backports if we reach an agreement here. Otherwise I'll do it when I get
back. Thank you all!
On (10/03/18 11:37), Daniel Wang wrote:
> When `softlockup_panic` is set (which is what my original repro had and
> what we use in production), without the backport patch, the expected panic
> would hit what looks like a deadlock. So even when the machine is configured
> to reboot immediately after the panic (kernel.panic=-1), it just hangs there
> with an incomplete backtrace. With your patch, the deadlock doesn't happen
> and the machine reboots successfully.
>
> This was and still is the issue this thread is trying to fix. The last log
> snippet was from an "experiment" that I did in order to understand what's really
> happening. So far the speculation has been that the panic path was trying
> to get a lock held by a backtrace dumping thread, but there is not enough
> evidence of which thread is holding the lock and how it uses it. So I set
> `softlockup_panic` to 0, to get panic out of the equation. Then I saw that one
> CPU was indeed holding the console lock, trying to write something out. If
> the panic was to hit while it's doing that, we might get a deadlock.
Hmm, console_sem state is ignored when we flush logbuf, so it's OK to
have it locked when we declare panic():
```c
void console_flush_on_panic(void)
{
	/*
	 * If someone else is holding the console lock, trylock will fail
	 * and may_schedule may be set. Ignore and proceed to unlock so
	 * that messages are flushed out. As this can be called from any
	 * context and we don't want to get preempted while flushing,
	 * ensure may_schedule is cleared.
	 */
	console_trylock();
	console_may_schedule = 0;
	console_unlock();
}
```
Things are not so simple with the uart_port lock. Generally speaking, we
should deadlock when an NMI panic() kills the system while one of the
CPUs holds the uart_port lock.
8250 has sort of a workaround for this scenario:
```c
serial8250_console_write()
{
	if (port->sysrq)
		locked = 0;
	else if (oops_in_progress)
		locked = spin_trylock_irqsave(&port->lock, flags);
	else
		spin_lock_irqsave(&port->lock, flags);

	...
	uart_console_write(port, s, count, serial8250_console_putchar);
	...

	if (locked)
		spin_unlock_irqrestore(&port->lock, flags);
}
```
Note the spin_trylock_irqsave() path.
So, as long as we are in sysrq or oops_in_progress, uart_port lock state
is sort of ignored.
Looking at your backtraces:
```
[ 348.058207] NMI backtrace for cpu 8
[ 348.058207] CPU: 8 PID: 1700 Comm: dd Tainted: G O L 4.14.73 #18
[ 348.058214] <IRQ>
[ 348.058214] wait_for_xmitr+0x2c/0xb0
[ 348.058215] serial8250_console_putchar+0x1c/0x40
[ 348.058215] ? wait_for_xmitr+0xb0/0xb0
[ 348.058215] uart_console_write+0x33/0x70
[ 348.058216] serial8250_console_write+0xe2/0x2b0
[ 348.058216] ? msg_print_text+0xa6/0x110
[ 348.058216] console_unlock+0x306/0x4a0
[ 348.058217] wake_up_klogd_work_func+0x55/0x60
[ 348.058217] irq_work_run_list+0x50/0x80
[ 348.058217] smp_irq_work_interrupt+0x3f/0xe0
[ 348.058218] irq_work_interrupt+0x7d/0x90
```
Now... the problem. A theory, in fact.
panic() sets oops_in_progress back to zero - bust_spinlocks(0) - too soon.
When we do console_flush_on_panic() we ignore console_sem state and go
to the 8250 driver - serial8250_console_write(). But at this point
oops_in_progress is zero, so we end up in spin_lock_irqsave(&port->lock, flags).
If the port->lock was already locked, then this is your deadlock. We
can't emergency_restart() because the panic() CPU is stuck spinning on
port->lock in serial8250_console_write(), so it never returns from
console_flush_on_panic() and there is no progress.
```c
void panic(const char *fmt, ...)
{
	....
	bust_spinlocks(0);

	/*
	 * We may have ended up stopping the CPU holding the lock (in
	 * smp_send_stop()) while still having some valuable data in the console
	 * buffer. Try to acquire the lock then release it regardless of the
	 * result. The release will also print the buffers out. Locks debug
	 * should be disabled to avoid reporting bad unlock balance when
	 * panic() is not being called from OOPS.
	 */
	debug_locks_off();
	console_flush_on_panic();

	if (!panic_blink)
		panic_blink = no_blink;

	if (panic_timeout > 0) {
		/*
		 * Delay timeout seconds before rebooting the machine.
		 * We can't use the "normal" timers since we just panicked.
		 */
		pr_emerg("Rebooting in %d seconds..\n", panic_timeout);

		for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
			touch_nmi_watchdog();
			if (i >= i_next) {
				i += panic_blink(state ^= 1);
				i_next = i + 3600 / PANIC_BLINK_SPD;
			}
			mdelay(PANIC_TIMER_STEP);
		}
	}
	if (panic_timeout != 0) {
		/*
		 * This will not be a clean reboot, with everything
		 * shutting down. But if there is a chance of
		 * rebooting the system it will be rebooted.
		 */
		emergency_restart();
	}
```
So... Just an idea. Can you try a very dirty hack? Forcibly increase
oops_in_progress in panic() before console_flush_on_panic(), so 8250
serial8250_console_write() will use spin_trylock_irqsave() and maybe
avoid deadlock.
-ss
On Wed 2018-10-03 13:37:04, Steven Rostedt wrote:
> On Wed, 3 Oct 2018 10:16:08 -0700
> Daniel Wang <[email protected]> wrote:
>
> > On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek <[email protected]> wrote:
> > >
> > > On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > > > I don't see the big deal of backporting this. The biggest complaints
> > > > about backports are from fixes that were added to late -rc releases
> > > > where the fixes didn't get much testing. This commit was added in 4.16,
> > > > and hasn't had any issues due to the design. Although a fix has been
> > > > added:
> > > >
> > > > c14376de3a1 ("printk: Wake klogd when passing console_lock owner")
> > >
> > > As I said, I am fine with backporting the console_lock owner stuff
> > > into the stable release.
> > >
> > > I just wonder (like Sergey) what the real problem is. The console_lock
> > > owner handshake is not fully reliable. It might be good enough
>
> I'm not sure what you mean by 'not fully reliable'
I mean that it is not guaranteed that the very first printk() takes over
the console. It will happen only when the other printk() calls
console_trylock_spinning() while the current console owner is executing
the code between:
```c
console_lock_spinning_enable();
console_lock_spinning_disable_and_check();
```
> > > Just to be sure. Daniel, could you please send a log with
> > > the console_lock owner stuff backported? There we would see
> > > who called the panic() and why it rebooted early.
> >
> > Sure. Here is one. It's a bit long but complete. I attached another log
> > snippet below it which is what I got when `softlockup_panic` was turned
> > off. The log was from the IRQ task that was flushing the printk buffer. I
> > will be taking a closer look at it too, but I'm sending it in case you find it helpful.
>
> Just so I understand correctly. Does the panic hit with and without the
> suggested backport patch? The only difference is that you get the full
> output with the patch and limited output without it?
Sigh, the other mail suggests that there was a real deadlock. It means
that the console owner logic might help but it would not prevent
the deadlock completely.
Best Regards,
Petr
On (10/04/18 16:44), Sergey Senozhatsky wrote:
> So... Just an idea. Can you try a very dirty hack? Forcibly increase
> oops_in_progress in panic() before console_flush_on_panic(), so 8250
> serial8250_console_write() will use spin_trylock_irqsave() and maybe
> avoid deadlock.
E.g. something like below?
[this is not a patch; just a theory]:
```diff
diff --git a/kernel/panic.c b/kernel/panic.c
index 8b2e002d52eb..188338a55d1c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,7 +233,13 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
+	/*
+	 * Decrement oops_in_progress and let bust_spinlocks() to
+	 * unblank_screen(), console_unblank() and wake_up_klogd()
+	 */
 	bust_spinlocks(0);
+	/* Set oops_in_progress, so we can reenter serial console driver*/
+	bust_spinlocks(1);
 
 	/*
 	 * We may have ended up stopping the CPU holding the lock (in
```
On Thu 2018-10-04 16:44:42, Sergey Senozhatsky wrote:
> On (10/03/18 11:37), Daniel Wang wrote:
> > When `softlockup_panic` is set (which is what my original repro had and
> > what we use in production), without the backport patch, the expected panic
> > would hit what looks like a deadlock. So even when the machine is configured
> > to reboot immediately after the panic (kernel.panic=-1), it just hangs there
> > with an incomplete backtrace. With your patch, the deadlock doesn't happen
> > and the machine reboots successfully.
> >
> > This was and still is the issue this thread is trying to fix. The last log
> > snippet was from an "experiment" that I did in order to understand what's really
> > happening. So far the speculation has been that the panic path was trying
> > to get a lock held by a backtrace dumping thread, but there is not enough
> > evidence of which thread is holding the lock and how it uses it. So I set
> > `softlockup_panic` to 0, to get panic out of the equation. Then I saw that one
> > CPU was indeed holding the console lock, trying to write something out. If
> > the panic was to hit while it's doing that, we might get a deadlock.
>
> Hmm, console_sem state is ignored when we flush logbuf, so it's OK to
> have it locked when we declare panic():
>
> void console_flush_on_panic(void)
> {
> 	/*
> 	 * If someone else is holding the console lock, trylock will fail
> 	 * and may_schedule may be set. Ignore and proceed to unlock so
> 	 * that messages are flushed out. As this can be called from any
> 	 * context and we don't want to get preempted while flushing,
> 	 * ensure may_schedule is cleared.
> 	 */
> 	console_trylock();
> 	console_may_schedule = 0;
> 	console_unlock();
> }
>
> Things are not so simple with the uart_port lock. Generally speaking, we
> should deadlock when an NMI panic() kills the system while one of the
> CPUs holds the uart_port lock.
This looks like a reasonable explanation of what is happening here.
It also explains why the console owner logic helped.
> 8250 has sort of a workaround for this scenario:
>
> serial8250_console_write()
> {
> 	if (port->sysrq)
> 		locked = 0;
> 	else if (oops_in_progress)
> 		locked = spin_trylock_irqsave(&port->lock, flags);
> 	else
> 		spin_lock_irqsave(&port->lock, flags);
>
> 	...
> 	uart_console_write(port, s, count, serial8250_console_putchar);
> 	...
>
> 	if (locked)
> 		spin_unlock_irqrestore(&port->lock, flags);
> }
>
> Now... the problem. A theory, in fact.
> panic() sets oops_in_progress back to zero - bust_spinlocks(0) - too soon.
I see your point. I am just a bit scared of this way. Ignoring locks
is a dangerous and painful approach in general.
Best Regards,
Petr
On (10/04/18 10:36), Petr Mladek wrote:
>
> This looks like a reasonable explanation of what is happening here.
> It also explains why the console owner logic helped.
Well, I'm still a bit puzzled, frankly speaking. I've two theories.
Theory #1 [most likely]
Steven is a wizard and his code cures whatever problem we throw it at.
Theory #2
console_sem hand over actually spreads the printing out, so we don't have one CPU
doing all the printing job. Instead every CPU prints its backtrace, while the
CPU which issued all_cpus_backtrace() waits for them. So all_cpus_backtrace()
still has to wait for NR_CPUS * strlen(backtrace), which still probably
triggers an NMI panic on it at some point. The panic CPU sends out a stop IPI,
then it waits for foreign CPUs to ACK the stop IPI request - for 10 seconds. So
each CPU prints its backtrace, then ACKs the stop IPI. So when the panic CPU
proceeds with console_flush_on_panic() and emergency_restart(), uart_port->lock
is unlocked. Without the patch we probably declare an NMI panic on the CPU which
does all the printing work, and panic sometimes jumps in when that CPU is busy in
serial8250_console_write(), holding the uart_port->lock. So we can't re-enter
the 8250 driver from the panic CPU and we can't reboot the system. In other
words... Steven is a wizard.
> > serial8250_console_write()
> > {
> > 	if (port->sysrq)
> > 		locked = 0;
> > 	else if (oops_in_progress)
> > 		locked = spin_trylock_irqsave(&port->lock, flags);
> > 	else
> > 		spin_lock_irqsave(&port->lock, flags);
> >
> > 	...
> > 	uart_console_write(port, s, count, serial8250_console_putchar);
> > 	...
> >
> > 	if (locked)
> > 		spin_unlock_irqrestore(&port->lock, flags);
> > }
> >
> > Now... the problem. A theory, in fact.
> > panic() sets oops_in_progress back to zero - bust_spinlocks(0) - too soon.
>
> I see your point. I am just a bit scared of this way. Ignoring locks
> is a dangerous and painful approach in general.
Well, I agree. But 8250 is not the only console which does ignore
uart_port lock state sometimes. Otherwise sysrq would be totally unreliable,
including emergency reboot. So it's sort of how it has been for quite some
time, I guess. We are in panic(), it's over, so we probably can ignore
uart_port->lock at this point.
-ss
Just got back from vacation. Thanks for the continued discussion. Just so
I understand the current state. Looks like we've got a pretty good explanation
of what's going on (though not completely sure), and backporting Steven's
patches is still the way to go? I see that Sergey had sent an RFC series
for similar things. Are those trying to solve the deadlock problem in a
different way?
--
Best,
Daniel
On Sun 2018-10-21 11:09:22, Daniel Wang wrote:
> Just got back from vacation. Thanks for the continued discussion. Just so
> I understand the current state. Looks like we've got a pretty good explanation
> of what's going on (though not completely sure), and backporting Steven's
> patches is still the way to go? I see that Sergey had sent an RFC series
> for similar things. Are those trying to solve the deadlock problem in a
> different way?
I suggest going with backporting Steven's patchset. We do not have
anything better at the moment.
Best Regards,
Petr
On (10/21/18 11:09), Daniel Wang wrote:
>
> Just got back from vacation. Thanks for the continued discussion. Just so
> I understand the current state. Looks like we've got a pretty good explanation
> of what's going on (though not completely sure), and backporting Steven's
> patches is still the way to go?
Up to -stable maintainers.
Note, with or without Steven's patch, the non-reentrant consoles are
still non-reentrant, so the deadlock is still there:
```
spin_lock_irqsave(&port->lock, flags)
  <NMI>
    panic()
      console_flush_on_panic()
        spin_lock_irqsave(&port->lock, flags)  // deadlock
```
(And I wouldn't mind having more reviews/testing of [1].)
Another deadlock scenario could be the following one:
```
printk()
  console_trylock()
    down_trylock()
      raw_spin_lock_irqsave(&sem->lock, flags)
        <NMI>
          panic()
            console_flush_on_panic()
              console_trylock()
                raw_spin_lock_irqsave(&sem->lock, flags)  // deadlock
```
There are no patches addressing this one at the moment. And it's
unclear if you are hitting this scenario.
> I see that Sergey had sent an RFC series for similar things. Are those
> trying to solve the deadlock problem in a different way?
Umm, I wouldn't call it "another way". It turns non-reentrant serial
consoles into reentrant ones. Did you test patch [1] from that series
in your environment, by the way?
[1] lkml.kernel.org/r/[email protected]
-ss
On Mon, Oct 22, 2018 at 3:10 AM Sergey Senozhatsky
<[email protected]> wrote:
> Another deadlock scenario could be the following one:
>
> printk()
>   console_trylock()
>     down_trylock()
>       raw_spin_lock_irqsave(&sem->lock, flags)
>         <NMI>
>           panic()
>             console_flush_on_panic()
>               console_trylock()
>                 raw_spin_lock_irqsave(&sem->lock, flags)  // deadlock
>
> There are no patches addressing this one at the moment. And it's
> unclear if you are hitting this scenario.
I am not sure, but Steven's patches did make the deadlock I saw go away...
>
>
> > I see that Sergey had sent an RFC series for similar things. Are those
> > trying to solve the deadlock problem in a different way?
>
> Umm, I wouldn't call it "another way". It turns non-reentrant serial
> consoles into re-entrant ones. Did you test patch [1] from that series
> in your environment, by the way?
A little swamped by other things lately but I'll run a test with it.
If it works, would you recommend taking your patch alone and not
bothering with Steven's (since, as you mentioned above, even with his
patches there's still a possibility of other deadlocks)?
>
> [1] lkml.kernel.org/r/[email protected]
>
> -ss
--
Best,
Daniel
On (11/01/18 09:05), Daniel Wang wrote:
> > Another deadlock scenario could be the following one:
> >
> >     printk()
> >      console_trylock()
> >       down_trylock()
> >        raw_spin_lock_irqsave(&sem->lock, flags)
> >         <NMI>
> >          panic()
> >           console_flush_on_panic()
> >            console_trylock()
> >             raw_spin_lock_irqsave(&sem->lock, flags)    // deadlock
> >
> > There are no patches addressing this one at the moment. And it's
> > unclear if you are hitting this scenario.
>
> I am not sure, but Steven's patches did make the deadlock I saw go away...
You certainly can find cases when "busy spin on console_sem owner" logic
can reduce some possibilities.
But spin_lock(&lock); NMI; spin_lock(&lock); code is still in the kernel.
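The "busy spin on console_sem owner" logic can be pictured with a deliberately simplified, single-threaded C model. Every name below (`struct console_model`, `model_printk()`, `model_unlock_step()`) is hypothetical; the real printk code uses atomics, a genuinely spinning waiter and lockdep annotations, and a CPU only becomes a waiter while the owner is actually inside the console drivers. The model shows just the handoff: a CPU that calls printk() while another CPU is printing waits and then takes over console_sem, instead of leaving the whole backlog to the current owner.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2

/* Hypothetical, single-threaded model of console owner/waiter handoff. */
struct console_model {
    int sem_holder;                /* CPU holding console_sem, or -1 */
    int waiter_cpu;                /* CPU spinning for a handoff, or -1 */
    int pending;                   /* records not yet written out */
    int printed_by[MODEL_NR_CPUS]; /* records written per CPU */
};

static void model_init(struct console_model *c)
{
    c->sem_holder = -1;
    c->waiter_cpu = -1;
    c->pending = 0;
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        c->printed_by[i] = 0;
}

/* printk() on @cpu: store a record, take console_sem if it is free, or
 * become the (single) spinning waiter if another CPU holds it.
 * Returns true if @cpu now holds console_sem. */
static bool model_printk(struct console_model *c, int cpu)
{
    c->pending++;
    if (c->sem_holder < 0) {
        c->sem_holder = cpu;
        return true;
    }
    if (c->waiter_cpu < 0 && c->sem_holder != cpu)
        c->waiter_cpu = cpu;
    return false;
}

/* One iteration of console_unlock() on @cpu: write one record, then hand
 * console_sem to a waiter without releasing it, keep printing, or
 * release it once nothing is pending. Returns true to keep going. */
static bool model_unlock_step(struct console_model *c, int cpu)
{
    assert(c->sem_holder == cpu);
    if (c->pending == 0) {
        c->sem_holder = -1;            /* up(&console_sem) */
        return false;
    }
    c->pending--;
    c->printed_by[cpu]++;              /* the slow serial write */
    if (c->waiter_cpu >= 0) {
        c->sem_holder = c->waiter_cpu; /* direct handoff */
        c->waiter_cpu = -1;
        return false;
    }
    return true;
}

/* CPU 0 starts printing, CPU 1 calls printk() meanwhile, CPU 0 hands
 * over after one record and CPU 1 flushes the rest. */
static void demo_handoff(int out[MODEL_NR_CPUS])
{
    struct console_model c;
    bool owns;

    model_init(&c);
    owns = model_printk(&c, 0);
    assert(owns);
    owns = model_printk(&c, 1);
    assert(!owns && c.waiter_cpu == 1);
    while (model_unlock_step(&c, 0))
        ;
    while (model_unlock_step(&c, 1))
        ;
    assert(c.sem_holder == -1 && c.pending == 0);
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        out[i] = c.printed_by[i];
    (void)owns;
}
```

In the demo each CPU writes one record. The model says nothing about the NMI cases above: a lock held by an interrupted context on the same CPU is never handed off.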
> A little swamped by other things lately but I'll run a test with it.
> If it works, would you recommend taking your patch alone
Let's first figure out if it works.
-ss
> Let's first figure out if it works.
I would still like to try applying your patches that went into
printk.git, but for now I wonder if we can get Steven's patch into
4.14 first, for at least we know it mitigated the issue if not
fundamentally addressed it, and we've agreed it's an innocuous change
that doesn't risk breaking stable.
I haven't done this before so I'll need your help. What's the next
step to actually get Steven's patch *in* linux-4.14.y? According to
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
I am supposed to send an email with the patch ID and subject, which
are both mentioned in this email. Should I send another one? What's
the process like? Thanks!
On Thu, Nov 8, 2018 at 10:47 PM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (11/01/18 09:05), Daniel Wang wrote:
> > > Another deadlock scenario could be the following one:
> > >
> > >     printk()
> > >      console_trylock()
> > >       down_trylock()
> > >        raw_spin_lock_irqsave(&sem->lock, flags)
> > >         <NMI>
> > >          panic()
> > >           console_flush_on_panic()
> > >            console_trylock()
> > >             raw_spin_lock_irqsave(&sem->lock, flags)    // deadlock
> > >
> > > There are no patches addressing this one at the moment. And it's
> > > unclear if you are hitting this scenario.
> >
> > I am not sure, but Steven's patches did make the deadlock I saw go away...
>
> You certainly can find cases when "busy spin on console_sem owner" logic
> can reduce some possibilities.
>
> But spin_lock(&lock); NMI; spin_lock(&lock); code is still in the kernel.
>
> > A little swamped by other things lately but I'll run a test with it.
> > If it works, would you recommend taking your patch alone
>
> Let's first figure out if it works.
>
> -ss
--
Best,
Daniel
On (12/11/18 17:16), Daniel Wang wrote:
> > Let's first figure out if it works.
>
> I would still like to try applying your patches that went into
> printk.git, but for now I wonder if we can get Steven's patch into
> 4.14 first, for at least we know it mitigated the issue if not
> fundamentally addressed it, and we've agreed it's an innocuous change
> that doesn't risk breaking stable.
So... did my patch address the deadlock you are seeing or it didn't?
> I haven't done this before so I'll need your help. What's the next
> step to actually get Steven's patch *in* linux-4.14.y? According to
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> I am supposed to send an email with the patch ID and subject, which
> are both mentioned in this email. Should I send another one? What's
> the process like? Thanks!
I'm not doing any -stable releases, so can't really answer, sorry.
Probably would be better to re-address this question to 4.14 -stable
maintainers.
---
I guess we still don't have a really clear understanding of what exactly
is going on in your system. We don't even know for sure which one of the locks
is deadlocking the system. And why exactly Steven's patch helps. If it
is uart_port->lock, then it's one thing; if it's console_sem->lock then
it's another thing. But those two are just theories, not supported by any
logs/backtraces from your systems.
If it's uart_port->lock and there will be 2 patch sets to choose from
for -stable, then -stable guys can pick up the one that requires less
effort: 1 two-liner patch vs. 3 or 4 bigger patches.
-ss
> So... did my patch address the deadlock you are seeing or it didn't?
I've been meaning to try it but kept getting distracted by other
things. I'll try to find some time for it this week or next. Right now
my intent is to get Steven's patch into 4.14 stable as it evidently
fixed the particular issue I was seeing, and as Steven said it has
been in upstream since 4.16 so it's not like backporting it will raise
any red flags. I will start another thread on -stable for it.
> I guess we still don't have a really clear understanding of what exactly
> is going on in your system
I would also like to get to the bottom of it. Unfortunately I haven't
got the expertise in this area nor the time to do it yet. Hence the
intent to take a step back and backport Steven's patch to fix the
issue that has resurfaced in our production recently.
> If it's uart_port->lock and there will be 2 patch sets to choose from
> for -stable, then -stable guys can pick up the one that requires less
> effort: 1 two-liner patch vs. 3 or 4 bigger patches.
Which two sets are you referring to specifically?
On Tue, Dec 11, 2018 at 9:21 PM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (12/11/18 17:16), Daniel Wang wrote:
> > > Let's first figure out if it works.
> >
> > I would still like to try applying your patches that went into
> > printk.git, but for now I wonder if we can get Steven's patch into
> > 4.14 first, for at least we know it mitigated the issue if not
> > fundamentally addressed it, and we've agreed it's an innocuous change
> > that doesn't risk breaking stable.
>
> So... did my patch address the deadlock you are seeing or it didn't?
>
> > I haven't done this before so I'll need your help. What's the next
> > step to actually get Steven's patch *in* linux-4.14.y? According to
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > I am supposed to send an email with the patch ID and subject, which
> > are both mentioned in this email. Should I send another one? What's
> > the process like? Thanks!
>
> I'm not doing any -stable releases, so can't really answer, sorry.
> Probably would be better to re-address this question to 4.14 -stable
> maintainers.
>
>
> ---
> I guess we still don't have a really clear understanding of what exactly
> is going on in your system. We don't even know for sure which one of the locks
> is deadlocking the system. And why exactly Steven's patch helps. If it
> is uart_port->lock, then it's one thing; if it's console_sem->lock then
> it's another thing. But those two are just theories, not supported by any
> logs/backtraces from your systems.
>
> If it's uart_port->lock and there will be 2 patch sets to choose from
> for -stable, then -stable guys can pick up the one that requires less
> effort: 1 two-liner patch vs. 3 or 4 bigger patches.
>
> -ss
--
Best,
Daniel
On (12/11/18 22:08), Daniel Wang wrote:
>
> I've been meaning to try it but kept getting distracted by other
> things. I'll try to find some time for it this week or next. Right now
> my intent is to get Steven's patch into 4.14 stable as it evidently
> fixed the particular issue I was seeing, and as Steven said it has
> been in upstream since 4.16 so it's not like backporting it will raise
> any red flags. I will start another thread on -stable for it.
OK.
> > I guess we still don't have a really clear understanding of what exactly
> > is going on in your system
>
> I would also like to get to the bottom of it. Unfortunately I haven't
> got the expertise in this area nor the time to do it yet. Hence the
> intent to take a step back and backport Steven's patch to fix the
> issue that has resurfaced in our production recently.
No problem.
I just meant that -stable people can be a bit "unconvinced".
> Which two sets are you referring to specifically?
I guess I used the wrong word:
The first set (actually just one patch) is the one that makes consoles
re-entrant from panic().
The other set is those 4 patches (Steven's patch + Petr's patch + a
patch that makes printk() atomic again).
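The general idea of a console that is re-entrant from panic() can be sketched as a write routine that switches from "spin on the port lock" to "try the port lock, and write without it if that fails" once an oops is in progress. Some serial drivers already do something along these lines, but the code below is a hypothetical userspace model (`model_oops_in_progress`, `model_console_write()` and the other names are invented), not the patch under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a global "oops/panic in progress" flag. */
static atomic_bool model_oops_in_progress;

typedef struct {
    atomic_bool locked; /* models the port lock */
} model_port_t;

typedef enum { WRITE_LOCKED, WRITE_UNLOCKED, WRITE_WOULD_DEADLOCK } write_result_t;

static bool model_trylock(model_port_t *p)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&p->locked, &expected, true);
}

static void model_unlock(model_port_t *p)
{
    bool was_locked = atomic_exchange(&p->locked, false);
    assert(was_locked);
    (void)was_locked;
}

/*
 * While an oops is in progress, only *try* the port lock and, if it is
 * held (perhaps by the context this CPU interrupted), write anyway,
 * accepting possibly interleaved output instead of a hang. Otherwise
 * spin for the lock, bounded here only so the sketch terminates.
 */
static write_result_t model_console_write(model_port_t *p)
{
    bool locked = false;

    if (atomic_load(&model_oops_in_progress)) {
        locked = model_trylock(p);
    } else {
        for (int i = 0; i < 1000 && !locked; i++)
            locked = model_trylock(p);
        if (!locked)
            return WRITE_WOULD_DEADLOCK; /* an unbounded spin never returns */
    }

    /* ... push characters to the UART ... */

    if (!locked)
        return WRITE_UNLOCKED;
    model_unlock(p);
    return WRITE_LOCKED;
}

/* One write with the port lock already held (or not) by an interrupted
 * context, with or without an oops in progress. */
static write_result_t scenario(bool port_held, bool oops)
{
    model_port_t port = { port_held };

    atomic_store(&model_oops_in_progress, oops);
    return model_console_write(&port);
}
```

The trade-off is deliberate: during a panic, possibly garbled output is preferred over no output at all.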
-ss
On Wed, Dec 12, 2018 at 03:28:41PM +0900, Sergey Senozhatsky wrote:
>On (12/11/18 22:08), Daniel Wang wrote:
>>
>> I've been meaning to try it but kept getting distracted by other
>> things. I'll try to find some time for it this week or next. Right now
>> my intent is to get Steven's patch into 4.14 stable as it evidently
>> fixed the particular issue I was seeing, and as Steven said it has
>> been in upstream since 4.16 so it's not like backporting it will raise
>> any red flags. I will start another thread on -stable for it.
>
>OK.
>
>> > I guess we still don't have a really clear understanding of what exactly
>> > is going on in your system
>>
>> I would also like to get to the bottom of it. Unfortunately I haven't
>> got the expertise in this area nor the time to do it yet. Hence the
>> intent to take a step back and backport Steven's patch to fix the
>> issue that has resurfaced in our production recently.
>
>No problem.
>I just meant that -stable people can be a bit "unconvinced".
The -stable people tried adding this patch back in April, but ended up
getting complaints up the wazoo (https://lkml.org/lkml/2018/4/9/154)
about how this is not -stable material.
So yes, testing/acks welcome :)
--
Thanks,
Sasha
On (12/12/18 01:48), Sasha Levin wrote:
> > > > I guess we still don't have a really clear understanding of what exactly
> > > > is going on in your system
> > >
> > > I would also like to get to the bottom of it. Unfortunately I haven't
> > > got the expertise in this area nor the time to do it yet. Hence the
> > > intent to take a step back and backport Steven's patch to fix the
> > > issue that has resurfaced in our production recently.
> >
> > No problem.
> > I just meant that -stable people can be a bit "unconvinced".
>
> The -stable people tried adding this patch back in April, but ended up
> getting complaints up the wazoo (https://lkml.org/lkml/2018/4/9/154)
> about how this is not -stable material.
OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
and I wasn't Cc-ed on this whole discussion and found it purely
accidentally while browsing linux-mm list.
I understand what Petr meant by his email. Not arguing; below are just
my 5 cents.
> So yes, testing/acks welcome :)
OK. The way I see it (and I can be utterly wrong here):
The patch set in question, most likely and probably (*and those are
theories*), makes panic() deadlock less likely because panic_cpu waits
for console_sem owner to release uart_port/console_owner locks before
panic_cpu pr_emerg("Kernel panic - not syncing"), dump_stack()-s and
brings other CPUs down via stop IPI or NMI.
So a precondition is
panic CPU != uart_port->lock owner CPU
If the panic happens on the same CPU which holds the uart_port spin_lock,
then the deadlock is still there, just like before; we have another patch
which attempts to fix this (it makes console drivers re-entrant from
panic()).
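The theory above can be condensed into a small decision table. The C function below only illustrates that reasoning and models no real kernel API; `model_panic_flush()` and its `stop_others_first` parameter (standing for the ordering described above) are hypothetical. Waiting for the owner helps only when the owner runs on another CPU and is still allowed to run.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    FLUSH_IMMEDIATE,     /* nobody holds the console */
    FLUSH_AFTER_HANDOFF, /* owner finishes its record, then lets go */
    FLUSH_STUCK          /* the lock is never released */
} flush_result_t;

/*
 * @owner_cpu holds console_sem and the uart port lock (or -1 if nobody
 * does), panic() runs on @panic_cpu, and @stop_others_first says whether
 * the other CPUs are stopped (IPI/NMI) before panic_cpu waits for the
 * owner.
 */
static flush_result_t model_panic_flush(int panic_cpu, int owner_cpu,
                                        bool stop_others_first)
{
    assert(panic_cpu >= 0);
    if (owner_cpu < 0)
        return FLUSH_IMMEDIATE;
    if (owner_cpu == panic_cpu)
        return FLUSH_STUCK; /* the interrupted owner never resumes */
    if (stop_others_first)
        return FLUSH_STUCK; /* owner frozen while holding the lock */
    return FLUSH_AFTER_HANDOFF;
}
```

Only the `owner_cpu == panic_cpu` row remains stuck with the ordering described above, which is the case the separate re-entrant console patch targets.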
So if you are willing to backport this set to -stable, then I wouldn't
mind, probably would be more correct if we don't advertise this as a
"panic() deadlock fix" tho; we know that deadlock is still possible.
And there will be another -stable backport request in a week or so.
In the meantime, I can add my Acked-by to this backport if it helps.
/* Assuming that my theories explain what's happening with
Daniel's systems. */
-ss
On Wed 2018-12-12 17:10:34, Sergey Senozhatsky wrote:
> On (12/12/18 01:48), Sasha Levin wrote:
> > > > > I guess we still don't have a really clear understanding of what exactly
> > > > > is going on in your system
> > > >
> > > > I would also like to get to the bottom of it. Unfortunately I haven't
> > > > got the expertise in this area nor the time to do it yet. Hence the
> > > > intent to take a step back and backport Steven's patch to fix the
> > > > issue that has resurfaced in our production recently.
> > >
> > > No problem.
> > > I just meant that -stable people can be a bit "unconvinced".
> >
> > The -stable people tried adding this patch back in April, but ended up
> > getting complaints up the wazoo (https://lkml.org/lkml/2018/4/9/154)
> > about how this is not -stable material.
>
> OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> and I wasn't Cc-ed on this whole discussion and found it purely
> accidentally while browsing linux-mm list.
I am sorry that I did not CC you. There were so many people in CC.
I expected that all people mentioned in the related commit message
were included by default.
> So if you are willing to backport this set to -stable, then I wouldn't
> mind, probably would be more correct if we don't advertise this as a
> "panic() deadlock fix"
This should not be a problem. I guess that stable does not modify
the original commit messages unless there is a change.
> In the meantime, I can add my Acked-by to this backport if it helps.
I am fine with back-porting the patches now. They have got much more
testing in the meantime and nobody reported any regression. They
seem to help in more situations than we expected. Finally, there is
someone requesting the backport who spent non-trivial time
on tracking the problem and testing.
Best Regards,
Petr
On (12/12/18 14:36), Petr Mladek wrote:
> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> > and I wasn't Cc-ed on this whole discussion and found it purely
> > accidentally while browsing linux-mm list.
>
> I am sorry that I did not CC you. There were so many people in CC.
> I expected that all people mentioned in the related commit message
> were included by default.
No worries! I'm not blaming anyone.
> > So if you are willing to backport this set to -stable, then I wouldn't
> > mind, probably would be more correct if we don't advertise this as a
> > "panic() deadlock fix"
>
> This should not be a problem. I guess that stable does not modify
> the original commit messages unless there is a change.
Agreed.
> > In the meantime, I can add my Acked-by to this backport if it helps.
>
> I am fine with back-porting the patches now. They have got much more
> testing in the meantime and nobody reported any regression. They
> seem to help in more situations than we expected. Finally, there is
> someone requesting the backport who spent non-trivial time
> on tracking the problem and testing.
Great!
Sasha, here is
Acked-by: Sergey Senozhatsky <[email protected]>
from me.
And expect another backport request in 1 or 2 weeks - the patch
which eliminates the existing "panic CPU != uart_port lock owner CPU"
limitation.
-ss
On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
>On (12/12/18 14:36), Petr Mladek wrote:
>> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
>> > and I wasn't Cc-ed on this whole discussion and found it purely
>> > accidentally while browsing linux-mm list.
>>
>> I am sorry that I did not CC you. There were so many people in CC.
>> I expected that all people mentioned in the related commit message
>> were included by default.
>
>No worries! I'm not blaming anyone.
>
>> > So if you are willing to backport this set to -stable, then I wouldn't
>> > mind, probably would be more correct if we don't advertise this as a
>> > "panic() deadlock fix"
>>
>> This should not be a problem. I guess that stable does not modify
>> the original commit messages unless there is a change.
>
>Agreed.
I'll be happy to add anything you want to the commit message. Do you
have a blurb you want to use?
--
Thanks,
Sasha
On Wed, Dec 12, 2018 at 9:43 AM Sasha Levin <[email protected]> wrote:
>
> On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
> >On (12/12/18 14:36), Petr Mladek wrote:
> >> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> >> > and I wasn't Cc-ed on this whole discussion and found it purely
> >> > accidentally while browsing linux-mm list.
> >>
> >> I am sorry that I did not CC you. There were so many people in CC.
> >> I expected that all people mentioned in the related commit message
> >> were included by default.
> >
> >No worries! I'm not blaming anyone.
> >
> >> > So if you are willing to backport this set to -stable, then I wouldn't
> >> > mind, probably would be more correct if we don't advertise this as a
> >> > "panic() deadlock fix"
> >>
> >> This should not be a problem. I guess that stable does not modify
> >> the original commit messages unless there is a change.
> >
> >Agreed.
>
> I'll be happy to add anything you want to the commit message. Do you
> have a blurb you want to use?
If we still get to amend the commit message, I'd like to add "Cc:
stable@vger.kernel.org" in the sign-off area. According to
https://www.kernel.org/doc/html/v4.12/process/stable-kernel-rules.html#option-1
patches with that tag will be automatically applied to -stable trees.
It's unclear though if it'll get applied to ALL -stable trees. For my
request, I care at least about 4.19 and 4.14. So maybe we can add two
lines, "Cc: <stable@vger.kernel.org> # 4.14.x" and "Cc:
<stable@vger.kernel.org> # 4.19.x".
>
> --
> Thanks,
> Sasha
--
Best,
Daniel
On Wed, Dec 12, 2018 at 12:11:29PM -0800, Daniel Wang wrote:
>On Wed, Dec 12, 2018 at 9:43 AM Sasha Levin <[email protected]> wrote:
>>
>> On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
>> >On (12/12/18 14:36), Petr Mladek wrote:
>> >> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
>> >> > and I wasn't Cc-ed on this whole discussion and found it purely
>> >> > accidentally while browsing linux-mm list.
>> >>
>> >> I am sorry that I did not CC you. There were so many people in CC.
>> >> I expected that all people mentioned in the related commit message
>> >> were included by default.
>> >
>> >No worries! I'm not blaming anyone.
>> >
>> >> > So if you are willing to backport this set to -stable, then I wouldn't
>> >> > mind, probably would be more correct if we don't advertise this as a
>> >> > "panic() deadlock fix"
>> >>
>> >> This should not be a problem. I guess that stable does not modify
>> >> the original commit messages unless there is a change.
>> >
>> >Agreed.
>>
>> I'll be happy to add anything you want to the commit message. Do you
>> have a blurb you want to use?
>
>If we still get to amend the commit message, I'd like to add "Cc:
>stable@vger.kernel.org" in the sign-off area. According to
>https://www.kernel.org/doc/html/v4.12/process/stable-kernel-rules.html#option-1
>patches with that tag will be automatically applied to -stable trees.
>It's unclear though if it'll get applied to ALL -stable trees. For my
>request, I care at least about 4.19 and 4.14. So maybe we can add two
>lines, "Cc: <stable@vger.kernel.org> # 4.14.x" and "Cc:
><stable@vger.kernel.org> # 4.19.x".
We can't change the original commit message (but that's fine, the
purpose of that tag is to let us know that this commit should go in
stable - and now we do :) ).
I was under the impression that Sergey or Petr wanted to add more
information about the purpose of this patch and the issue it solves.
--
Thanks,
Sasha
Thanks for the clarification. So I guess I don't need to start another
thread for it? What are the next steps?
On Wed, Dec 12, 2018 at 1:43 PM Sasha Levin <[email protected]> wrote:
>
> On Wed, Dec 12, 2018 at 12:11:29PM -0800, Daniel Wang wrote:
> >On Wed, Dec 12, 2018 at 9:43 AM Sasha Levin <[email protected]> wrote:
> >>
> >> On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
> >> >On (12/12/18 14:36), Petr Mladek wrote:
> >> >> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> >> >> > and I wasn't Cc-ed on this whole discussion and found it purely
> >> >> > accidentally while browsing linux-mm list.
> >> >>
> >> >> I am sorry that I did not CC you. There were so many people in CC.
> >> >> I expected that all people mentioned in the related commit message
> >> >> were included by default.
> >> >
> >> >No worries! I'm not blaming anyone.
> >> >
> >> >> > So if you are willing to backport this set to -stable, then I wouldn't
> >> >> > mind, probably would be more correct if we don't advertise this as a
> >> >> > "panic() deadlock fix"
> >> >>
> >> >> This should not be a problem. I guess that stable does not modify
> >> >> the original commit messages unless there is a change.
> >> >
> >> >Agreed.
> >>
> >> I'll be happy to add anything you want to the commit message. Do you
> >> have a blurb you want to use?
> >
> >If we still get to amend the commit message, I'd like to add "Cc:
> >stable@vger.kernel.org" in the sign-off area. According to
> >https://www.kernel.org/doc/html/v4.12/process/stable-kernel-rules.html#option-1
> >patches with that tag will be automatically applied to -stable trees.
> >It's unclear though if it'll get applied to ALL -stable trees. For my
> >request, I care at least about 4.19 and 4.14. So maybe we can add two
> >lines, "Cc: <stable@vger.kernel.org> # 4.14.x" and "Cc:
> ><stable@vger.kernel.org> # 4.19.x".
>
> We can't change the original commit message (but that's fine, the
> purpose of that tag is to let us know that this commit should go in
> stable - and now we do :) ).
>
> I was under the impression that Sergey or Petr wanted to add more
> information about the purpose of this patch and the issue it solves.
>
> --
> Thanks,
> Sasha
--
Best,
Daniel
On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
>Thanks for the clarification. So I guess I don't need to start another
>thread for it? What are the next steps?
Nothing here, I'll queue it once Sergey or Petr clarify if they wanted
additional information in the -stable commit message.
--
Thanks,
Sasha
Thank you!
On Wed, Dec 12, 2018 at 1:52 PM Sasha Levin <[email protected]> wrote:
>
> On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
> >Thanks for the clarification. So I guess I don't need to start another
> >thread for it? What are the next steps?
>
> Nothing here, I'll queue it once Sergey or Petr clarify if they wanted
> additional information in the -stable commit message.
>
> --
> Thanks,
> Sasha
>
--
Best,
Daniel
In case this was buried in previous messages, the commit I'd like to
get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
waiter logic to load balance console writes. But another followup
patch that fixes a bug in that patch is also required. That is
c14376de3a1b: printk: Wake klogd when passing console_lock owner.
On Wed, Dec 12, 2018 at 1:56 PM Daniel Wang <[email protected]> wrote:
>
> Thank you!
>
> On Wed, Dec 12, 2018 at 1:52 PM Sasha Levin <[email protected]> wrote:
> >
> > On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
> > >Thanks for the clarification. So I guess I don't need to start another
> > >thread for it? What are the next steps?
> >
> > Nothing here, I'll queue it once Sergey or Petr clarify if they wanted
> > additional information in the -stable commit message.
> >
> > --
> > Thanks,
> > Sasha
> >
>
>
> --
> Best,
> Daniel
--
Best,
Daniel
On (12/12/18 16:52), Sasha Levin wrote:
> On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
> > Thanks for the clarification. So I guess I don't need to start another
> > thread for it? What are the next steps?
>
> Nothing here, I'll queue it once Sergey or Petr clarify if they wanted
> additional information in the -stable commit message.
Hi Sasha,
No, no commit message change requests from my side.
-ss
On (12/12/18 16:40), Daniel Wang wrote:
> In case this was buried in previous messages, the commit I'd like to
> get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
> waiter logic to load balance console writes. But another followup
> patch that fixes a bug in that patch is also required. That is
> c14376de3a1b: printk: Wake klogd when passing console_lock owner.
Additionally, for dbdda842fe96f to work as expected we really
need fd5f7cde1b85d4c. Otherwise printk() can schedule under
console_sem and console_owner, which will deactivate the "load
balance" logic.
-ss
> Additionally, for dbdda842fe96f to work as expected we really
> need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> console_sem and console_owner, which will deactivate the "load
> balance" logic.
It looks like fd5f7cde1b85d4c got into 4.14.82 that was released last month.
On Wed, Dec 12, 2018 at 6:27 PM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (12/12/18 16:40), Daniel Wang wrote:
> > In case this was buried in previous messages, the commit I'd like to
> > get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
> > waiter logic to load balance console writes. But another followup
> > patch that fixes a bug in that patch is also required. That is
> > c14376de3a1b: printk: Wake klogd when passing console_lock owner.
>
> Additionally, for dbdda842fe96f to work as expected we really
> need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> console_sem and console_owner, which will deactivate the "load
> balance" logic.
>
> -ss
--
Best,
Daniel
On Wed 2018-12-12 18:39:42, Daniel Wang wrote:
> > Additionally, for dbdda842fe96f to work as expected we really
> > need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> > console_sem and console_owner, which will deactivate the "load
> > balance" logic.
>
> It looks like fd5f7cde1b85d4c got into 4.14.82 that was released last month.
>
> On Wed, Dec 12, 2018 at 6:27 PM Sergey Senozhatsky
> <[email protected]> wrote:
> >
> > On (12/12/18 16:40), Daniel Wang wrote:
> > > In case this was buried in previous messages, the commit I'd like to
> > > get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
> > > waiter logic to load balance console writes. But another followup
> > > patch that fixes a bug in that patch is also required. That is
> > > c14376de3a1b: printk: Wake klogd when passing console_lock owner.
> >
> > Additionally, for dbdda842fe96f to work as expected we really
> > need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> > console_sem and console_owner, which will deactivate the "load
> > balance" logic.
To make it clear. Please, make sure that the following commits are
backported together:
+ dbdda842fe96f8932ba ("printk: Add console owner and waiter
  logic to load balance console writes")
+ c162d5b4338d72deed6 ("printk: Hide console waiter logic into
  helpers")
+ fd5f7cde1b85d4c8e09 ("printk: Never set console_may_schedule
  in console_trylock()")
+ c14376de3a1befa70d9 ("printk: Wake klogd when passing
  console_lock owner")
I generated this list from git log using the "Fixes:" tag. It seems
to mention all commits discussed above.
Best Regards,
Petr
On Thu, Dec 13, 2018 at 10:59:31AM +0100, Petr Mladek wrote:
>On Wed 2018-12-12 18:39:42, Daniel Wang wrote:
>> > Additionally, for dbdda842fe96f to work as expected we really
>> > need fd5f7cde1b85d4c. Otherwise printk() can schedule under
>> > console_sem and console_owner, which will deactivate the "load
>> > balance" logic.
>>
>> It looks like fd5f7cde1b85d4c got into 4.14.82 that was released last month.
>>
>> On Wed, Dec 12, 2018 at 6:27 PM Sergey Senozhatsky
>> <[email protected]> wrote:
>> >
>> > On (12/12/18 16:40), Daniel Wang wrote:
>> > > In case this was buried in previous messages, the commit I'd like to
>> > > get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
>> > > waiter logic to load balance console writes. But another followup
>> > > patch that fixes a bug in that patch is also required. That is
>> > > c14376de3a1b: printk: Wake klogd when passing console_lock owner.
>> >
>> > Additionally, for dbdda842fe96f to work as expected we really
>> > need fd5f7cde1b85d4c. Otherwise printk() can schedule under
>> > console_sem and console_owner, which will deactivate the "load
>> > balance" logic.
>
>To make it clear. Please, make sure that the following commits are
>backported together:
>
>+ dbdda842fe96f8932ba ("printk: Add console owner and waiter
> logic to load balance console writes")
>+ c162d5b4338d72deed6 ("printk: Hide console waiter logic into
> helpers")
>+ fd5f7cde1b85d4c8e09 ("printk: Never set console_may_schedule
> in console_trylock()")
>+ c14376de3a1befa70d9 ("printk: Wake klogd when passing
> console_lock owner")
>
>
>I generated this list from git log using the "Fixes:" tag. It seems
>to mention all commits discussed above.
All 4 queued for 4.14, thank you.
--
Thanks,
Sasha
On (12/12/18 17:10), Sergey Senozhatsky wrote:
> And there will be another -stable backport request in a week or so.
The remaining one:
commit c7c3f05e341a9a2bd
-ss
On Fri, Dec 28, 2018 at 09:16:51AM +0900, Sergey Senozhatsky wrote:
> On (12/12/18 17:10), Sergey Senozhatsky wrote:
> > And there will be another -stable backport request in a week or so.
>
> The remaining one:
>
> commit c7c3f05e341a9a2bd
Now queued up, thanks.
greg k-h
Thanks. I was able to confirm that commit c7c3f05e341a9a2bd alone
fixed the problem for me. As expected, the stack traces of all 16 CPUs
were printed, followed by a final panic stack dump and a successful reboot.
[ 24.035044] Hogging a CPU now
[ 48.200258] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lockme:1102]
[ 48.207371] Modules linked in: lockme(O) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter ip6table_filter ip6_tables aesni_intel aes_x86_64 crypto_simd cryptd glue_helper
[ 48.226613] CPU: 3 PID: 1102 Comm: lockme Tainted: G O 4.14.79 #33
[ 48.234057] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.243388] task: ffffa3da1bd70000 task.stack: ffffc04e077e0000
[ 48.249425] RIP: 0010:hog_thread+0x13/0x1000 [lockme]
[ 48.255197] RSP: 0018:ffffc04e077e3f10 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
[ 48.262879] RAX: 0000000000000011 RBX: ffffa3da362ffa80 RCX: 0000000000000000
[ 48.270131] RDX: ffffa3da432dd740 RSI: ffffa3da432d54f8 RDI: ffffa3da432d54f8
[ 48.277382] RBP: ffffc04e077e3f48 R08: 0000000000000030 R09: 0000000000000000
[ 48.284629] R10: 0000000000000358 R11: 0000000000000000 R12: ffffa3da33f7c940
[ 48.291881] R13: ffffc04e079b7c58 R14: 0000000000000000 R15: ffffa3da362ffac8
[ 48.299134] FS: 0000000000000000(0000) GS:ffffa3da432c0000(0000) knlGS:0000000000000000
[ 48.307338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.313200] CR2: 00007f0142c77e5d CR3: 0000000b10e12002 CR4: 00000000003606a0
[ 48.320455] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.327705] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.334955] Call Trace:
[ 48.337534] kthread+0x127/0x160
[ 48.340878] ? 0xffffffffc04bc000
[ 48.344315] ? kthread_create_on_node+0x40/0x40
[ 48.348962] ret_from_fork+0x35/0x40
[ 48.352655] Code: <eb> fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 48.360712] Sending NMI from CPU 3 to CPUs 0-2,4-15:
[ 48.365891] NMI backtrace for cpu 5
[ 48.365892] CPU: 5 PID: 963 Comm: dd Tainted: G O 4.14.79 #33
[ 48.365892] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.365893] task: ffffa3da2e769c80 task.stack: ffffc04e072dc000
[ 48.365894] RIP: 0010:chacha20_block+0x203/0x350
[ 48.365894] RSP: 0018:ffffc04e072dfd08 EFLAGS: 00000086
[ 48.365895] RAX: 00000000430aa37c RBX: 000000008849a559 RCX: 000000001e380d02
[ 48.365896] RDX: 00000000f37255aa RSI: 00000000430aa37c RDI: 00000000d39ce109
[ 48.365896] RBP: 00000000242dad92 R08: 00000000942a2b36 R09: 000000006df44375
[ 48.365897] R10: 000000007f47d158 R11: 0000000080fde9af R12: 0000000092e47c5e
[ 48.365897] R13: 00000000ed09aada R14: 00000000c6fd956d R15: 000000001bb4deeb
[ 48.365898] FS: 00007f074a4a6700(0000) GS:ffffa3da43340000(0000) knlGS:0000000000000000
[ 48.365899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.365899] CR2: 000055a35c5b0520 CR3: 0000000edc900003 CR4: 00000000003606a0
[ 48.365900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.365900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.365900] Call Trace:
[ 48.365901] _extract_crng+0xdb/0x130
[ 48.365901] crng_backtrack_protect+0xb3/0xc0
[ 48.365902] urandom_read+0x13b/0x2c0
[ 48.365902] vfs_read+0xad/0x170
[ 48.365903] SyS_read+0x4b/0xa0
[ 48.365903] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.365904] do_syscall_64+0x70/0x200
[ 48.365904] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.365904] RIP: 0033:0x7f0749e7c410
[ 48.365905] RSP: 002b:00007ffd69532b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.365906] RAX: ffffffffffffffda RBX: 0000000000024ab3 RCX: 00007f0749e7c410
[ 48.365906] RDX: 0000000000000400 RSI: 000055e440267000 RDI: 0000000000000000
[ 48.365907] RBP: 00007ffd69532b40 R08: 0000000000000000 R09: 000000000000000d
[ 48.365907] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.365908] R13: 00007f074a4a6690 R14: 0000000000000400 R15: 000055e440267000
[ 48.365908] Code: c0 10 31 f0 01 d3 89 74 24 08 41 89 c7 8b 44 24 0c 41 31 dc 41 c1 c4 0c 46 8d 0c 1f 45 89 eb 41 c1 c7 0c 44 01 c0 45 31 cb 89 c6 <89> e8 41 c1 c3 08 89 74 24 0c 31 f0 41 8d 34 0c 8b 4c 24 10 c1
[ 48.365921] NMI backtrace for cpu 13
[ 48.365923] CPU: 13 PID: 967 Comm: dd Tainted: G O 4.14.79 #33
[ 48.365924] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.365924] task: ffffa3da1c8b0000 task.stack: ffffc04e07798000
[ 48.365925] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.365925] RSP: 0018:ffffc04e0779bda8 EFLAGS: 00000002
[ 48.365926] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.365926] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.365927] RBP: ffffc04e0779bdd8 R08: 000000005afce914 R09: 00000000e446e58a
[ 48.365927] R10: 000000004789081f R11: 000000001fb8dc14 R12: ffffc04e0779be30
[ 48.365928] R13: ffffffffadecd3c8 R14: ffffc04e0779be30 R15: 0000000000000040
[ 48.365928] FS: 00007f79835ff700(0000) GS:ffffa3da43540000(0000) knlGS:0000000000000000
[ 48.365929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.365929] CR2: 00007ff20c75d140 CR3: 0000000edc928004 CR4: 00000000003606a0
[ 48.365929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.365930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.365930] Call Trace:
[ 48.365931] do_raw_spin_lock+0xa0/0xb0
[ 48.365931] _raw_spin_lock_irqsave+0x20/0x26
[ 48.365932] _extract_crng+0x52/0x130
[ 48.365932] urandom_read+0xf9/0x2c0
[ 48.365932] vfs_read+0xad/0x170
[ 48.365933] SyS_read+0x4b/0xa0
[ 48.365933] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.365934] do_syscall_64+0x70/0x200
[ 48.365934] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.365934] RIP: 0033:0x7f7982fd5410
[ 48.365935] RSP: 002b:00007ffc84173ec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.365936] RAX: ffffffffffffffda RBX: 0000000000022155 RCX: 00007f7982fd5410
[ 48.365936] RDX: 0000000000000400 RSI: 0000560209655000 RDI: 0000000000000000
[ 48.365936] RBP: 00007ffc84173ef0 R08: 0000000000000000 R09: 000000000000000d
[ 48.365937] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.365937] R13: 00007f79835ff690 R14: 0000000000000400 R15: 0000560209655000
[ 48.365938] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.365953] NMI backtrace for cpu 9
[ 48.365953] CPU: 9 PID: 974 Comm: dd Tainted: G O 4.14.79 #33
[ 48.365954] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.365954] task: ffffa3da1e310e40 task.stack: ffffc04e077a0000
[ 48.365955] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.365955] RSP: 0018:ffffc04e077a3da8 EFLAGS: 00000002
[ 48.365956] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.365956] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.365957] RBP: ffffc04e077a3dd8 R08: 00000000c4612c74 R09: 00000000b23896fe
[ 48.365957] R10: 0000000081037022 R11: 0000000022fc570d R12: ffffc04e077a3e30
[ 48.365957] R13: ffffffffadecd3c8 R14: ffffc04e077a3e30 R15: 0000000000000040
[ 48.365958] FS: 00007f758a7fe700(0000) GS:ffffa3da43440000(0000) knlGS:0000000000000000
[ 48.365958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.365959] CR2: 000055a35e272620 CR3: 0000000edca6e002 CR4: 00000000003606a0
[ 48.365959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.365959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.365960] Call Trace:
[ 48.365960] do_raw_spin_lock+0xa0/0xb0
[ 48.365960] _raw_spin_lock_irqsave+0x20/0x26
[ 48.365961] _extract_crng+0x52/0x130
[ 48.365961] urandom_read+0xf9/0x2c0
[ 48.365961] vfs_read+0xad/0x170
[ 48.365962] SyS_read+0x4b/0xa0
[ 48.365962] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.365962] do_syscall_64+0x70/0x200
[ 48.365963] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.365963] RIP: 0033:0x7f758a1d4410
[ 48.365963] RSP: 002b:00007fffde09c978 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.365964] RAX: ffffffffffffffda RBX: 0000000000022850 RCX: 00007f758a1d4410
[ 48.365965] RDX: 0000000000000400 RSI: 000055abdd543000 RDI: 0000000000000000
[ 48.365965] RBP: 00007fffde09c9a0 R08: 0000000000000000 R09: 000000000000000d
[ 48.365965] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.365966] R13: 00007f758a7fe690 R14: 0000000000000400 R15: 000055abdd543000
[ 48.365966] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.365979] NMI backtrace for cpu 11
[ 48.365980] CPU: 11 PID: 979 Comm: dd Tainted: G O 4.14.79 #33
[ 48.365980] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.365981] task: ffffa3da1c932ac0 task.stack: ffffc04e077c8000
[ 48.365981] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.365982] RSP: 0018:ffffc04e077cbda8 EFLAGS: 00000002
[ 48.365982] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: ffffc04e077cbef0
[ 48.365983] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.365983] RBP: ffffc04e077cbdd8 R08: 0000000000000000 R09: 0000000000000000
[ 48.365984] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc04e077cbe30
[ 48.365984] R13: ffffffffadecd3c8 R14: ffffc04e077cbe30 R15: 0000000000000040
[ 48.365985] FS: 00007f8747be2700(0000) GS:ffffa3da434c0000(0000) knlGS:0000000000000000
[ 48.365985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.365986] CR2: 000055de0b60635c CR3: 0000000edca70001 CR4: 00000000003606a0
[ 48.365986] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.365986] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.365987] Call Trace:
[ 48.365987] do_raw_spin_lock+0xa0/0xb0
[ 48.365987] _raw_spin_lock_irqsave+0x20/0x26
[ 48.365988] _extract_crng+0x52/0x130
[ 48.365988] urandom_read+0xf9/0x2c0
[ 48.365988] vfs_read+0xad/0x170
[ 48.365989] SyS_read+0x4b/0xa0
[ 48.365989] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.365989] do_syscall_64+0x70/0x200
[ 48.365990] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.365990] RIP: 0033:0x7f87475b8410
[ 48.365990] RSP: 002b:00007fff13681918 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.365991] RAX: ffffffffffffffda RBX: 000000000001cf65 RCX: 00007f87475b8410
[ 48.365992] RDX: 0000000000000400 RSI: 0000561a3760c000 RDI: 0000000000000000
[ 48.365992] RBP: 00007fff13681940 R08: 0000000000000000 R09: 000000000000000d
[ 48.365992] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.365993] R13: 00007f8747be2690 R14: 0000000000000400 R15: 0000561a3760c000
[ 48.365993] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366007] NMI backtrace for cpu 6
[ 48.366008] CPU: 6 PID: 960 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366009] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366009] task: ffffa3da2c1c8e40 task.stack: ffffc04e07428000
[ 48.366009] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366010] RSP: 0018:ffffc04e0742bda8 EFLAGS: 00000002
[ 48.366011] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366011] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366011] RBP: ffffc04e0742bdd8 R08: 00000000a3d78655 R09: 000000001654483e
[ 48.366012] R10: 0000000010c7e4a4 R11: 00000000abedc2d0 R12: ffffc04e0742be30
[ 48.366012] R13: ffffffffadecd3c8 R14: ffffc04e0742be30 R15: 0000000000000040
[ 48.366013] FS: 00007fe757e41700(0000) GS:ffffa3da43380000(0000) knlGS:0000000000000000
[ 48.366013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366013] CR2: 000055a35e25613c CR3: 0000000ede082004 CR4: 00000000003606a0
[ 48.366014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366014] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366014] Call Trace:
[ 48.366015] do_raw_spin_lock+0xa0/0xb0
[ 48.366015] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366015] _extract_crng+0x52/0x130
[ 48.366016] urandom_read+0xf9/0x2c0
[ 48.366016] vfs_read+0xad/0x170
[ 48.366016] SyS_read+0x4b/0xa0
[ 48.366017] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366017] do_syscall_64+0x70/0x200
[ 48.366017] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366018] RIP: 0033:0x7fe757817410
[ 48.366018] RSP: 002b:00007fff37fd8518 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366019] RAX: ffffffffffffffda RBX: 000000000002baaf RCX: 00007fe757817410
[ 48.366019] RDX: 0000000000000400 RSI: 000055cd666e2000 RDI: 0000000000000000
[ 48.366020] RBP: 00007fff37fd8540 R08: 0000000000000000 R09: 000000000000000d
[ 48.366020] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366020] R13: 00007fe757e41690 R14: 0000000000000400 R15: 000055cd666e2000
[ 48.366021] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 0
[ 48.366034] NMI backtrace for cpu 14
[ 48.366035] CPU: 14 PID: 962 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366035] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366036] task: ffffa3da1e3a0e40 task.stack: ffffc04e076c0000
[ 48.366036] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366036] RSP: 0018:ffffc04e076c3da8 EFLAGS: 00000002
[ 48.366037] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366037] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366038] RBP: ffffc04e076c3dd8 R08: 000000002f43f57e R09: 0000000072c41751
[ 48.366038] R10: 0000000066350959 R11: 00000000f3ffe87e R12: ffffc04e076c3e30
[ 48.366038] R13: ffffffffadecd3c8 R14: ffffc04e076c3e30 R15: 0000000000000040
[ 48.366039] FS: 00007f016e063700(0000) GS:ffffa3da43580000(0000) knlGS:0000000000000000
[ 48.366039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366040] CR2: 0000562d8b3d63fa CR3: 0000000edc88a002 CR4: 00000000003606a0
[ 48.366040] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366040] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366041] Call Trace:
[ 48.366041] do_raw_spin_lock+0xa0/0xb0
[ 48.366041] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366042] _extract_crng+0x52/0x130
[ 48.366042] urandom_read+0xf9/0x2c0
[ 48.366042] vfs_read+0xad/0x170
[ 48.366043] SyS_read+0x4b/0xa0
[ 48.366043] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366044] do_syscall_64+0x70/0x200
[ 48.366044] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366044] RIP: 0033:0x7f016da39410
[ 48.366045] RSP: 002b:00007ffc2c45a3d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366046] RAX: ffffffffffffffda RBX: 0000000000029ee7 RCX: 00007f016da39410
[ 48.366046] RDX: 0000000000000400 RSI: 0000558176fbc000 RDI: 0000000000000000
[ 48.366046] RBP: 00007ffc2c45a400 R08: 0000000000000000 R09: 000000000000000d
[ 48.366047] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366047] R13: 00007f016e063690 R14: 0000000000000400 R15: 0000558176fbc000
[ 48.366048] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366062] NMI backtrace for cpu 12
[ 48.366062] CPU: 12 PID: 958 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366063] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366063] task: ffffa3da2caeaac0 task.stack: ffffc04e07930000
[ 48.366064] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366064] RSP: 0018:ffffc04e07933da8 EFLAGS: 00000002
[ 48.366065] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366065] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366066] RBP: ffffc04e07933dd8 R08: 00000000944dcd42 R09: 000000000f07d125
[ 48.366066] R10: 000000001e88050f R11: 000000005909c042 R12: ffffc04e07933e30
[ 48.366067] R13: ffffffffadecd3c8 R14: ffffc04e07933e30 R15: 0000000000000040
[ 48.366067] FS: 00007fa18e75f700(0000) GS:ffffa3da43500000(0000) knlGS:0000000000000000
[ 48.366067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366068] CR2: 00007f4254fd62d0 CR3: 0000000ede3e6006 CR4: 00000000003606a0
[ 48.366068] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366069] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366069] Call Trace:
[ 48.366069] do_raw_spin_lock+0xa0/0xb0
[ 48.366070] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366070] _extract_crng+0x52/0x130
[ 48.366070] urandom_read+0xf9/0x2c0
[ 48.366071] vfs_read+0xad/0x170
[ 48.366071] SyS_read+0x4b/0xa0
[ 48.366071] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366072] do_syscall_64+0x70/0x200
[ 48.366072] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366072] RIP: 0033:0x7fa18e135410
[ 48.366073] RSP: 002b:00007ffcf4ceb518 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366074] RAX: ffffffffffffffda RBX: 000000000002861a RCX: 00007fa18e135410
[ 48.366074] RDX: 0000000000000400 RSI: 000055ef19030000 RDI: 0000000000000000
[ 48.366074] RBP: 00007ffcf4ceb540 R08: 0000000000000000 R09: 000000000000000d
[ 48.366075] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366075] R13: 00007fa18e75f690 R14: 0000000000000400 R15: 000055ef19030000
[ 48.366075] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366089] NMI backtrace for cpu 10
[ 48.366090] CPU: 10 PID: 970 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366090] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366091] task: ffffa3da1e2b9c80 task.stack: ffffc04e07648000
[ 48.366091] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366091] RSP: 0018:ffffc04e0764bda8 EFLAGS: 00000002
[ 48.366092] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366093] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366093] RBP: ffffc04e0764bdd8 R08: 000000005ff6e80a R09: 00000000488cc97c
[ 48.366093] R10: 0000000049a24741 R11: 000000003c8a7f14 R12: ffffc04e0764be30
[ 48.366094] R13: ffffffffadecd3c8 R14: ffffc04e0764be30 R15: 0000000000000040
[ 48.366094] FS: 00007fce45b0a700(0000) GS:ffffa3da43480000(0000) knlGS:0000000000000000
[ 48.366095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366095] CR2: 000055a39aef4938 CR3: 0000000edc924001 CR4: 00000000003606a0
[ 48.366095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366096] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366096] Call Trace:
[ 48.366096] do_raw_spin_lock+0xa0/0xb0
[ 48.366097] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366097] _extract_crng+0x52/0x130
[ 48.366097] urandom_read+0xf9/0x2c0
[ 48.366098] vfs_read+0xad/0x170
[ 48.366098] SyS_read+0x4b/0xa0
[ 48.366098] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366099] do_syscall_64+0x70/0x200
[ 48.366099] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366099] RIP: 0033:0x7fce454e0410
[ 48.366100] RSP: 002b:00007ffc04eebd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366101] RAX: ffffffffffffffda RBX: 0000000000025a27 RCX: 00007fce454e0410
[ 48.366101] RDX: 0000000000000400 RSI: 000055cb94ecc000 RDI: 0000000000000000
[ 48.366101] RBP: 00007ffc04eebd80 R08: 0000000000000000 R09: 000000000000000d
[ 48.366102] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366102] R13: 00007fce45b0a690 R14: 0000000000000400 R15: 000055cb94ecc000
[ 48.366102] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366116] NMI backtrace for cpu 7
[ 48.366117] CPU: 7 PID: 956 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366117] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366118] task: ffffa3da1e3a0000 task.stack: ffffc04e07778000
[ 48.366118] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366118] RSP: 0018:ffffc04e0777bda8 EFLAGS: 00000002
[ 48.366119] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366120] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366120] RBP: ffffc04e0777bdd8 R08: 0000000087f524c7 R09: 00000000130c71f1
[ 48.366121] R10: 00000000a72ccfc7 R11: 00000000900b6142 R12: ffffc04e0777be30
[ 48.366121] R13: ffffffffadecd3c8 R14: ffffc04e0777be30 R15: 0000000000000040
[ 48.366122] FS: 00007f5312a09700(0000) GS:ffffa3da433c0000(0000) knlGS:0000000000000000
[ 48.366122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366123] CR2: 00007fdd9d09aba0 CR3: 0000000ee16f6004 CR4: 00000000003606a0
[ 48.366123] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366123] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366124] Call Trace:
[ 48.366124] do_raw_spin_lock+0xa0/0xb0
[ 48.366124] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366125] _extract_crng+0x52/0x130
[ 48.366125] urandom_read+0xf9/0x2c0
[ 48.366125] vfs_read+0xad/0x170
[ 48.366126] SyS_read+0x4b/0xa0
[ 48.366126] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366126] do_syscall_64+0x70/0x200
[ 48.366127] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366127] RIP: 0033:0x7f53123df410
[ 48.366127] RSP: 002b:00007ffd5c3163d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366128] RAX: ffffffffffffffda RBX: 00000000000267bd RCX: 00007f53123df410
[ 48.366128] RDX: 0000000000000400 RSI: 0000562641ec2000 RDI: 0000000000000000
[ 48.366129] RBP: 00007ffd5c316400 R08: 0000000000000000 R09: 000000000000000d
[ 48.366129] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366130] R13: 00007f5312a09690 R14: 0000000000000400 R15: 0000562641ec2000
[ 48.366130] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366143] NMI backtrace for cpu 15
[ 48.366144] CPU: 15 PID: 968 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366144] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366144] task: ffffa3da2caeb900 task.stack: ffffc04e07758000
[ 48.366145] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366145] RSP: 0018:ffffc04e0775bda8 EFLAGS: 00000002
[ 48.366146] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366146] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366147] RBP: ffffc04e0775bdd8 R08: 000000000cedf674 R09: 000000001205cfdd
[ 48.366147] R10: 00000000a2a512e0 R11: 00000000171b5795 R12: ffffc04e0775be30
[ 48.366147] R13: ffffffffadecd3c8 R14: ffffc04e0775be30 R15: 0000000000000040
[ 48.366148] FS: 00007fc929c81700(0000) GS:ffffa3da435c0000(0000) knlGS:0000000000000000
[ 48.366148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366148] CR2: 00007f73a627b750 CR3: 0000000ee40b4001 CR4: 00000000003606a0
[ 48.366149] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366149] Call Trace:
[ 48.366150] do_raw_spin_lock+0xa0/0xb0
[ 48.366150] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366150] _extract_crng+0x52/0x130
[ 48.366151] urandom_read+0xf9/0x2c0
[ 48.366151] vfs_read+0xad/0x170
[ 48.366151] SyS_read+0x4b/0xa0
[ 48.366152] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366152] do_syscall_64+0x70/0x200
[ 48.366152] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366153] RIP: 0033:0x7fc929657410
[ 48.366153] RSP: 002b:00007ffe58971538 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366154] RAX: ffffffffffffffda RBX: 0000000000023d40 RCX: 00007fc929657410
[ 48.366154] RDX: 0000000000000400 RSI: 000055cc31cf3000 RDI: 0000000000000000
[ 48.366154] RBP: 00007ffe58971560 R08: 0000000000000000 R09: 000000000000000d
[ 48.366155] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366155] R13: 00007fc929c81690 R14: 0000000000000400 R15: 000055cc31cf3000
[ 48.366156] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366169] NMI backtrace for cpu 4
[ 48.366170] CPU: 4 PID: 953 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366170] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366171] task: ffffa3da1e288000 task.stack: ffffc04e076f8000
[ 48.366171] RIP: 0010:native_queued_spin_lock_slowpath+0x12/0x1b0
[ 48.366172] RSP: 0018:ffffc04e076fbda8 EFLAGS: 00000002
[ 48.366172] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366173] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366173] RBP: ffffc04e076fbdd8 R08: 0000000073a95a12 R09: 0000000050c6526e
[ 48.366174] R10: 00000000854d712d R11: 000000007bbf968d R12: ffffc04e076fbe30
[ 48.366174] R13: ffffffffadecd3c8 R14: ffffc04e076fbe30 R15: 0000000000000040
[ 48.366175] FS: 00007fcd2ba98700(0000) GS:ffffa3da43300000(0000) knlGS:0000000000000000
[ 48.366175] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366175] CR2: 00007f364c1f08c0 CR3: 0000000ede396001 CR4: 00000000003606a0
[ 48.366176] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366176] Call Trace:
[ 48.366177] do_raw_spin_lock+0xa0/0xb0
[ 48.366177] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366177] _extract_crng+0x52/0x130
[ 48.366178] urandom_read+0xf9/0x2c0
[ 48.366178] vfs_read+0xad/0x170
[ 48.366178] SyS_read+0x4b/0xa0
[ 48.366179] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366179] do_syscall_64+0x70/0x200
[ 48.366180] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366180] RIP: 0033:0x7fcd2b46e410
[ 48.366180] RSP: 002b:00007ffc41e60388 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366181] RAX: ffffffffffffffda RBX: 000000000002b1a8 RCX: 00007fcd2b46e410
[ 48.366182] RDX: 0000000000000400 RSI: 000055f8d658e000 RDI: 0000000000000000
[ 48.366182] RBP: 00007ffc41e603b0 R08: 0000000000000000 R09: 000000000000000d
[ 48.366182] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366183] R13: 00007fcd2ba98690 R14: 0000000000000400 R15: 000055f8d658e000
[ 48.366183] Code: 44 24 08 c6 03 01 48 8b 2c 24 48 c7 00 00 00 00 00 e9 29 fe ff ff 0f 1f 00 0f 1f 44 00 00 55 0f 1f 44 00 00 ba 01 00 00 00 8b 07 <85> c0 0f 85 b2 00 00 00 f0 0f b1 17 85 c0 75 ee 5d c3 81 fe 00
[ 48.366197] NMI backtrace for cpu 1
[ 48.366197] CPU: 1 PID: 972 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366198] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366198] task: ffffa3da2e76aac0 task.stack: ffffc04e07788000
[ 48.366199] RIP: 0010:native_queued_spin_lock_slowpath+0x12/0x1b0
[ 48.366199] RSP: 0018:ffffc04e0778bda8 EFLAGS: 00000002
[ 48.366200] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: ffffc04e0778bef0
[ 48.366200] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366201] RBP: ffffc04e0778bdd8 R08: 0000000000000000 R09: 0000000000000000
[ 48.366201] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc04e0778be30
[ 48.366201] R13: ffffffffadecd3c8 R14: ffffc04e0778be30 R15: 0000000000000040
[ 48.366202] FS: 00007fcb41458700(0000) GS:ffffa3da43240000(0000) knlGS:0000000000000000
[ 48.366202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366203] CR2: 000055a35e25613c CR3: 0000000edc9ca005 CR4: 00000000003606a0
[ 48.366203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366203] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366204] Call Trace:
[ 48.366204] do_raw_spin_lock+0xa0/0xb0
[ 48.366204] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366205] _extract_crng+0x52/0x130
[ 48.366205] urandom_read+0xf9/0x2c0
[ 48.366205] vfs_read+0xad/0x170
[ 48.366206] SyS_read+0x4b/0xa0
[ 48.366206] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366206] do_syscall_64+0x70/0x200
[ 48.366207] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366207] RIP: 0033:0x7fcb40e2e410
[ 48.366207] RSP: 002b:00007ffdd2b5d348 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366208] RAX: ffffffffffffffda RBX: 0000000000025648 RCX: 00007fcb40e2e410
[ 48.366208] RDX: 0000000000000400 RSI: 000055b7a0519000 RDI: 0000000000000000
[ 48.366209] RBP: 00007ffdd2b5d370 R08: 0000000000000000 R09: 000000000000000d
[ 48.366209] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366209] R13: 00007fcb41458690 R14: 0000000000000400 R15: 000055b7a0519000
[ 48.366210] Code: 44 24 08 c6 03 01 48 8b 2c 24 48 c7 00 00 00 00 00 e9 29 fe ff ff 0f 1f 00 0f 1f 44 00 00 55 0f 1f 44 00 00 ba 01 00 00 00 8b 07 <85> c0 0f 85 b2 00 00 00 f0 0f b1 17 85 c0 75 ee 5d c3 81 fe 00
[ 48.366223] NMI backtrace for cpu 2
[ 48.366223] CPU: 2 PID: 952 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366224] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366224] task: ffffa3da1e2b8000 task.stack: ffffc04e077e8000
[ 48.366224] RIP: 0010:native_queued_spin_lock_slowpath+0x12/0x1b0
[ 48.366225] RSP: 0018:ffffc04e077ebda8 EFLAGS: 00000002
[ 48.366226] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366226] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366226] RBP: ffffc04e077ebdd8 R08: 0000000043cef058 R09: 00000000cfa81335
[ 48.366227] R10: 000000003ab03ada R11: 00000000a26a1af1 R12: ffffc04e077ebe30
[ 48.366227] R13: ffffffffadecd3c8 R14: ffffc04e077ebe30 R15: 0000000000000040
[ 48.366227] FS: 00007f24d6c90700(0000) GS:ffffa3da43280000(0000) knlGS:0000000000000000
[ 48.366228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366228] CR2: 0000555a040f73fa CR3: 0000000ef31c6005 CR4: 00000000003606a0
[ 48.366228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366229] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366229] Call Trace:
[ 48.366229] do_raw_spin_lock+0xa0/0xb0
[ 48.366230] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366230] _extract_crng+0x52/0x130
[ 48.366231] urandom_read+0xf9/0x2c0
[ 48.366231] vfs_read+0xad/0x170
[ 48.366231] SyS_read+0x4b/0xa0
[ 48.366232] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366232] do_syscall_64+0x70/0x200
[ 48.366233] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366233] RIP: 0033:0x7f24d6666410
[ 48.366233] RSP: 002b:00007ffc09334398 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366234] RAX: ffffffffffffffda RBX: 0000000000025b40 RCX: 00007f24d6666410
[ 48.366235] RDX: 0000000000000400 RSI: 000055ac9bc9e000 RDI: 0000000000000000
[ 48.366235] RBP: 00007ffc093343c0 R08: 0000000000000000 R09: 000000000000000d
[ 48.366235] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366236] R13: 00007f24d6c90690 R14: 0000000000000400 R15: 000055ac9bc9e000
[ 48.366236] Code: 44 24 08 c6 03 01 48 8b 2c 24 48 c7 00 00 00 00 00 e9 29 fe ff ff 0f 1f 00 0f 1f 44 00 00 55 0f 1f 44 00 00 ba 01 00 00 00 8b 07 <85> c0 0f 85 b2 00 00 00 f0 0f b1 17 85 c0 75 ee 5d c3 81 fe 00
[ 48.366257] NMI backtrace for cpu 8
[ 48.366258] CPU: 8 PID: 978 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366259] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366259] task: ffffa3da1e311c80 task.stack: ffffc04e077c0000
[ 48.366260] RIP: 0010:native_queued_spin_lock_slowpath+0xce/0x1b0
[ 48.366260] RSP: 0018:ffffc04e077c3da8 EFLAGS: 00000002
[ 48.366261] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366262] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366262] RBP: ffffc04e077c3dd8 R08: 000000009f7f8ef3 R09: 0000000048862586
[ 48.366263] R10: 00000000c997070d R11: 00000000fe1ab98c R12: ffffc04e077c3e30
[ 48.366263] R13: ffffffffadecd3c8 R14: ffffc04e077c3e30 R15: 0000000000000040
[ 48.366264] FS: 00007fdb118b0700(0000) GS:ffffa3da43400000(0000) knlGS:0000000000000000
[ 48.366264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366265] CR2: 000055de0cdbcac0 CR3: 0000000edca4c006 CR4: 00000000003606a0
[ 48.366265] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366266] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366266] Call Trace:
[ 48.366266] do_raw_spin_lock+0xa0/0xb0
[ 48.366267] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366267] _extract_crng+0x52/0x130
[ 48.366267] urandom_read+0xf9/0x2c0
[ 48.366268] vfs_read+0xad/0x170
[ 48.366268] SyS_read+0x4b/0xa0
[ 48.366269] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366269] do_syscall_64+0x70/0x200
[ 48.366269] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366270] RIP: 0033:0x7fdb11286410
[ 48.366270] RSP: 002b:00007ffdbddda708 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366271] RAX: ffffffffffffffda RBX: 000000000002658b RCX: 00007fdb11286410
[ 48.366271] RDX: 0000000000000400 RSI: 00005637eee18000 RDI: 0000000000000000
[ 48.366272] RBP: 00007ffdbddda730 R08: 0000000000000000 R09: 000000000000000d
[ 48.366272] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366273] R13: 00007fdb118b0690 R14: 0000000000000400 R15: 00005637eee18000
[ 48.366273] Code: 75 2e be 01 00 00 00 f0 0f b1 37 85 c0 75 21 65 ff 0d 93 ce f7 52 5d c3 f3 90 8b 37 81 fe 00 01 00 00 74 f4 e9 64 ff ff ff f3 90 <e9> 3d ff ff ff 8d 71 01 c1 e2 10 c1 e6 12 09 d6 89 f0 c1 e8 10
[ 48.366287] NMI backtrace for cpu 0
[ 48.366288] CPU: 0 PID: 950 Comm: dd Tainted: G O 4.14.79 #33
[ 48.366289] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366289] task: ffffa3da1e310000 task.stack: ffffc04e076e8000
[ 48.366290] RIP: 0010:native_queued_spin_lock_slowpath+0x12/0x1b0
[ 48.366290] RSP: 0018:ffffc04e076ebda8 EFLAGS: 00000002
[ 48.366291] RAX: 0000000000000001 RBX: ffffffffadecd3c8 RCX: 0000000000000000
[ 48.366292] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffadecd3c8
[ 48.366292] RBP: ffffc04e076ebdd8 R08: 00000000173ee25a R09: 00000000307066a7
[ 48.366293] R10: 000000007bb0d182 R11: 0000000075da0cf3 R12: ffffc04e076ebe30
[ 48.366293] R13: ffffffffadecd3c8 R14: ffffc04e076ebe30 R15: 0000000000000040
[ 48.366294] FS: 00007f75e9e55700(0000) GS:ffffa3da43200000(0000) knlGS:0000000000000000
[ 48.366294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.366295] CR2: 000000c420dd4000 CR3: 0000000ede370005 CR4: 00000000003606b0
[ 48.366295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.366295] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.366296] Call Trace:
[ 48.366296] do_raw_spin_lock+0xa0/0xb0
[ 48.366296] _raw_spin_lock_irqsave+0x20/0x26
[ 48.366297] _extract_crng+0x52/0x130
[ 48.366297] urandom_read+0xf9/0x2c0
[ 48.366297] vfs_read+0xad/0x170
[ 48.366298] SyS_read+0x4b/0xa0
[ 48.366298] ? __audit_syscall_exit+0x21e/0x2c0
[ 48.366298] do_syscall_64+0x70/0x200
[ 48.366299] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 48.366299] RIP: 0033:0x7f75e982b410
[ 48.366299] RSP: 002b:00007ffd3f4b76c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 48.366300] RAX: ffffffffffffffda RBX: 0000000000029034 RCX: 00007f75e982b410
[ 48.366301] RDX: 0000000000000400 RSI: 000055da49bec000 RDI: 0000000000000000
[ 48.366301] RBP: 00007ffd3f4b76f0 R08: 0000000000000000 R09: 000000000000000d
[ 48.366301] R10: fffffffffffff000 R11: 0000000000000246 R12: 0000000000000000
[ 48.366302] R13: 00007f75e9e55690 R14: 0000000000000400 R15: 000055da49bec000
[ 48.366302] Code: 44 24 08 c6 03 01 48 8b 2c 24 48 c7 00 00 00 00 00 e9 29 fe ff ff 0f 1f 00 0f 1f 44 00 00 55 0f 1f 44 00 00 ba 01 00 00 00 8b 07 <85> c0 0f 85 b2 00 00 00 f0 0f b1 17 85 c0 75 ee 5d c3 81 fe 00
[ 48.366857] Kernel panic - not syncing: softlockup: hung tasks
[ 48.366858] CPU: 3 PID: 1102 Comm: lockme Tainted: G O L 4.14.79 #33
[ 48.366859] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 48.366860] Call Trace:
[ 48.366862] <IRQ>
[ 48.366864] dump_stack+0x63/0x82
[ 48.366868] panic+0xd6/0x22d
[ 48.366871] ? cpumask_next+0x1a/0x20
[ 48.366874] watchdog_timer_fn+0x22b/0x240
[ 48.366876] ? watchdog+0x30/0x30
[ 48.366879] __hrtimer_run_queues+0xed/0x240
[ 48.366881] hrtimer_interrupt+0xac/0x1b0
[ 48.366884] smp_apic_timer_interrupt+0x70/0x140
[ 48.366886] apic_timer_interrupt+0x7d/0x90
[ 48.366887] </IRQ>
[ 48.366890] RIP: 0010:hog_thread+0x13/0x1000 [lockme]
[ 48.366890] RSP: 0018:ffffc04e077e3f10 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
[ 48.366892] RAX: 0000000000000011 RBX: ffffa3da362ffa80 RCX: 0000000000000000
[ 48.366893] RDX: ffffa3da432dd740 RSI: ffffa3da432d54f8 RDI: ffffa3da432d54f8
[ 48.366893] RBP: ffffc04e077e3f48 R08: 0000000000000030 R09: 0000000000000000
[ 48.366894] R10: 0000000000000358 R11: 0000000000000000 R12: ffffa3da33f7c940
[ 48.366895] R13: ffffc04e079b7c58 R14: 0000000000000000 R15: ffffa3da362ffac8
[ 48.366898] kthread+0x127/0x160
[ 48.366899] ? 0xffffffffc04bc000
[ 48.366900] ? kthread_create_on_node+0x40/0x40
[ 48.366902] ret_from_fork+0x35/0x40
[ 49.433843] Shutting down cpus with NMI
[ 49.434570] Kernel Offset: 0x2c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 51.728081] ACPI MEMORY or I/O RESET_REG.
SeaBIOS (version 1.8.2-20181112_143635-google)
Total RAM Size = 0x0000000f00000000 = 61440 MiB
CPUs found: 16 Max CPUs supported: 16
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=14680064 = 7168 MiB
drive 0x000f2c60: PCHS=0/0/0 translation=lba LCHS=913/255/63 s=14680064
Booting from Hard Disk 0...
<abbreviated>
On Fri, Dec 28, 2018 at 2:27 AM Greg KH <[email protected]> wrote:
>
> On Fri, Dec 28, 2018 at 09:16:51AM +0900, Sergey Senozhatsky wrote:
> > On (12/12/18 17:10), Sergey Senozhatsky wrote:
> > > And there will be another -stable backport request in a week or so.
> >
> > The remaining one:
> >
> > commit c7c3f05e341a9a2bd
>
> Now queued up, thanks.
>
> greg k-h
--
Best,
Daniel
On (12/28/18 16:03), Daniel Wang wrote:
> Thanks. I was able to confirm that commit c7c3f05e341a9a2bd alone
> fixed the problem for me. As expected, the stack traces of all 16 CPUs
> were printed, followed by a final panic stack dump and a successful reboot.
Cool, thanks!
-ss