2019-01-16 08:23:24

by Ivan Babrou

Subject: ipmi_msghandler crashes in 4.19

Hey,

We've upgraded some machines from 4.14 to 4.19 and started seeing rare
crashes like these:

[75855.909507] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000d00
[75855.925667] PGD 0 P4D 0
[75855.936359] Oops: 0000 [#1] SMP PTI
[75855.947951] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G O
4.19.13-cloudflare-2019.1.4 #2019.1.4
[75855.966028] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
T42S-2U(LBG-4) -/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10 06/29/2018
[75855.994246] RIP: 0010:__srcu_read_unlock+0xe/0x20
[75856.006851] Code: 01 48 63 c8 65 48 ff 04 ca f0 83 44 24 fc 00 c3
66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f0 83 44 24 fc 00
48 63 f6 <48> 8b 87 e8 0c 00 00 65 48 ff 44 f0 10 c3 0f 1f 40 00 0f 1f
44 00
[75856.041551] RSP: 0018:ffffba00cc66fd48 EFLAGS: 00010286
[75856.054564] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[75856.069449] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000018
[75856.084168] RBP: ffffa28276abb200 R08: ffffa29119772540 R09: 0000000000000000
[75856.098756] R10: 00000000000c1425 R11: ffffa29120a201c8 R12: ffffa29118d57e08
[75856.113422] R13: dead000000000200 R14: dead000000000100 R15: ffffa27dcbafa400
[75856.127798] FS: 0000000000000000(0000) GS:ffffa29120a00000(0000)
knlGS:0000000000000000
[75856.138973] perf: interrupt took too long (7735 > 7677), lowering
kernel.perf_event_max_sample_rate to 25000
[75856.143083] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75856.172956] CR2: 0000000000000d00 CR3: 000000187ca0a005 CR4: 00000000007606f0
[75856.187116] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75856.201312] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[75856.215274] PKRU: 55555554
[75856.224621] Call Trace:
[75856.230942] perf: interrupt took too long (9748 > 9668), lowering
kernel.perf_event_max_sample_rate to 20000
[75856.233560] deliver_response+0x88/0xd0 [ipmi_msghandler]
[75856.261744] deliver_local_response+0xe/0x30 [ipmi_msghandler]
[75856.273937] handle_one_recv_msg+0x164/0xbf0 [ipmi_msghandler]
[75856.285962] ? __switch_to_asm+0x34/0x70
[75856.295957] ? __switch_to_asm+0x40/0x70
[75856.306011] ? __switch_to_asm+0x34/0x70
[75856.315872] ? __switch_to_asm+0x40/0x70
[75856.325562] ? __switch_to_asm+0x34/0x70
[75856.325565] ? __switch_to_asm+0x40/0x70
[75856.325567] ? __switch_to_asm+0x34/0x70
[75856.325569] ? __switch_to_asm+0x40/0x70
[75856.325578] handle_new_recv_msgs+0x16d/0x1e0 [ipmi_msghandler]
[75856.325583] ? __switch_to_asm+0x34/0x70
[75856.381815] tasklet_action_common.isra.21+0x4e/0xf0
[75856.381823] __do_softirq+0xd8/0x2d2
[75856.399498] ? sort_range+0x20/0x20
[75856.399506] run_ksoftirqd+0x1a/0x20
[75856.415184] smpboot_thread_fn+0xc5/0x160
[75856.415190] kthread+0x113/0x130
[75856.430502] ? kthread_create_worker_on_cpu+0x70/0x70
[75856.430512] ret_from_fork+0x35/0x40
[75856.446793] Modules linked in: xt_connlimit nf_conncount xt_bpf
xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul crc32c_intel
ipmi_ssif pcbc aesni_intel aes_x86_64 crypto_simd sfc(O)
[75856.446862] cryptd glue_helper mdio ipmi_si xhci_pci i40e tpm_crb
ioatdma ipmi_devintf xhci_hcd dca ipmi_msghandler tpm_tis tpm_tis_core
tpm efivarfs ip_tables x_tables
[75856.569103] CR2: 0000000000000d00
[75856.569124] ---[ end trace 604e13a0789ee766 ]---
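
A note for readers following the first trace: the reported fault address can be cross-checked against the instruction bytes. The bytes at the <48> marker in the Code: line, 48 8b 87 e8 0c 00 00, encode `mov 0xce8(%rdi),%rax`, and RDI holds the first argument under the x86_64 calling convention. The sketch below is only a sanity check, not a disassembler; the helper names are made up for illustration, and it handles just this one encoding.

```python
# Cross-check of the first oops: does RDI plus the displacement encoded in
# the faulting instruction equal the address reported in CR2?

# Bytes starting at the <48> marker in the "Code:" line of the first oops.
FAULTING_INSN = bytes.fromhex("488b87e80c0000")


def decode_mov_rdi_disp32(insn: bytes) -> int:
    """Return the signed disp32 of a `mov disp32(%rdi),%rax` encoding.

    Hypothetical helper: only this single encoding is recognised
    (REX.W prefix 48, opcode 8b, ModRM 87); anything else raises.
    """
    if len(insn) != 7 or insn[:3] != bytes.fromhex("488b87"):
        raise ValueError("not the expected mov disp32(%rdi),%rax encoding")
    return int.from_bytes(insn[3:7], "little", signed=True)


def effective_address(base: int, disp: int) -> int:
    """Add base and displacement with 64-bit wraparound."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


RDI_FIRST_OOPS = 0x18   # from the register dump above
CR2_FIRST_OOPS = 0xD00  # from the register dump above

disp = decode_mov_rdi_disp32(FAULTING_INSN)
addr = effective_address(RDI_FIRST_OOPS, disp)
print(hex(disp), hex(addr), addr == CR2_FIRST_OOPS)
```

The computed address matches CR2 (0x18 + 0xce8 = 0xd00). An RDI value as small as 0x18 would be consistent with the address of a field inside a structure whose base pointer was NULL, though the trace alone does not prove that.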

[117620.868720] general protection fault: 0000 [#1] SMP PTI
[117620.911871] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G
O 4.19.0-cloudflare-2018.10.3 #1
[117620.937885] Hardware name: Quanta Computer Inc QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[117620.963750] RIP: 0010:__srcu_read_unlock+0xe/0x20
[117620.984950] Code: 01 48 63 c8 65 48 ff 04 ca f0 83 44 24 fc 00 c3
66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f0 83 44 24 fc 00
48 63 f6 <48> 8b 87 e8 0c 00 00 65 48 ff 44 f0 10 c3 0f 1f 40
00 0f 1f 44 00
[117621.020240] perf: interrupt took too long (10250 > 10230),
lowering kernel.perf_event_max_sample_rate to 19000
[117621.036578] RSP: 0018:ffff89007f603e38 EFLAGS: 00010286
[117621.073528] perf: interrupt took too long (12979 > 12812),
lowering kernel.perf_event_max_sample_rate to 15000
[117621.084232] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000000
[117621.133897] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
403a080083ad0878
[117621.156877] RBP: ffff890d90a78e00 R08: 0000000000000002 R09:
0000000000020900
[117621.179507] R10: 0000eb0270fbf3f0 R11: ffff89007f603ca4 R12:
ffff89107b411e08
[117621.179509] R13: dead000000000200 R14: dead000000000100 R15:
ffff890a9b3e6800
[117621.179511] FS: 0000000000000000(0000) GS:ffff89007f600000(0000)
knlGS:0000000000000000
[117621.179513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[117621.179514] CR2: 00007f193f3095e0 CR3: 0000001f79e0a001 CR4:
00000000003606f0
[117621.179526] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[117621.179527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[117621.179529] Call Trace:
[117621.179532] <IRQ>
[117621.179552] deliver_response+0x88/0xd0 [ipmi_msghandler]
[117621.179557] deliver_local_response+0xe/0x30 [ipmi_msghandler]
[117621.179561] handle_one_recv_msg+0x164/0xbf0 [ipmi_msghandler]
[117621.179568] ? try_to_wake_up+0x54/0x470
[117621.179575] ? ipmi_si_platform_shutdown+0x20/0x20 [ipmi_si]
[117621.236448] perf: interrupt took too long (16285 > 16223),
lowering kernel.perf_event_max_sample_rate to 12000
[117621.247534] ? kcs_event+0x17d/0x730 [ipmi_si]
[117621.426069] perf: interrupt took too long (20619 > 20356),
lowering kernel.perf_event_max_sample_rate to 9000
[117621.437773] handle_new_recv_msgs+0x16d/0x1e0 [ipmi_msghandler]
[117621.535276] tasklet_action_common.isra.21+0x4e/0xf0
[117621.535284] __do_softirq+0xd8/0x2d2
[117621.567383] irq_exit+0xb4/0xc0
[117621.567387] smp_apic_timer_interrupt+0x74/0x140
[117621.567390] apic_timer_interrupt+0xf/0x20
[117621.567392] </IRQ>
[117621.567397] RIP: 0010:finish_task_switch+0x78/0x260
[117621.567399] Code: 65 48 8b 1c 25 00 4d 01 00 0f 1f 44 00 00 0f 1f
44 00 00 41 c7 46 38 00 00 00 00 41 c6 04 24 00 fb 65 48 8b 04 25 00
4d 01 00 <0f> 1f 44 00 00 4d 85 ed 74 1a 41 8b 85 80 03 00 00
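
A note on why the second trace reports a general protection fault rather than a NULL pointer dereference at the same RIP: the faulting instruction is the same `mov 0xce8(%rdi),%rax`, but here RDI is 403a080083ad0878, which does not look like a canonical x86_64 address. The sketch below checks that, assuming 48-bit virtual addresses (4-level paging); the function name is made up for illustration.

```python
# Check whether the address the second oops tried to load from is canonical.
# Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal.


def is_canonical_48(addr: int) -> bool:
    """True if bits 63..47 of addr are all zeros or all ones."""
    top = (addr >> 47) & 0x1FFFF  # the 17 bits from bit 47 up to bit 63
    return top == 0 or top == 0x1FFFF


RDI_SECOND_OOPS = 0x403A080083AD0878  # from the second register dump
DISP = 0xCE8                          # displacement in the faulting mov

target = (RDI_SECOND_OOPS + DISP) & 0xFFFFFFFFFFFFFFFF
print(hex(target), is_canonical_48(target))

# For comparison, the address from the first oops is canonical but unmapped.
print(hex(0xD00), is_canonical_48(0xD00))
```

A load from a non-canonical address raises #GP instead of a page fault, and #GP does not update CR2, so the CR2 value in the second dump (00007f193f3095e0) is most likely left over from an earlier, unrelated page fault.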


2019-01-16 22:53:03

by Greg Kroah-Hartman

Subject: Re: ipmi_msghandler crashes in 4.19

On Tue, Jan 15, 2019 at 10:36:42AM -0800, Ivan Babrou wrote:
> Hey,
>
> We've upgraded some machines from 4.14 to 4.19 and started seeing rare
> crashes like these:

<snip>

Does the fix posted here:
https://lore.kernel.org/lkml/[email protected]/

help resolve this?

thanks,

greg k-h

2019-01-22 19:20:24

by Ivan Babrou

Subject: Re: ipmi_msghandler crashes in 4.19

We're going to try this, but crashes are really infrequent and our
stack is slightly different:

* We have RIP = __srcu_read_unlock on x86_64
* Patch mentions PC = __srcu_read_lock on aarch64


On Wed, Jan 16, 2019 at 7:49 AM Greg KH <[email protected]> wrote:
>
> On Tue, Jan 15, 2019 at 10:36:42AM -0800, Ivan Babrou wrote:
> > Hey,
> >
> > We've upgraded some machines from 4.14 to 4.19 and started seeing rare
> > crashes like these:
>
> <snip>
>
> Does the fix posted here:
> https://lore.kernel.org/lkml/[email protected]/
>
> help resolve this?
>
> thanks,
>
> greg k-h

2019-01-29 10:29:38

by Greg Kroah-Hartman

Subject: Re: ipmi_msghandler crashes in 4.19

On Tue, Jan 15, 2019 at 10:36:42AM -0800, Ivan Babrou wrote:
> Hey,
>
> We've upgraded some machines from 4.14 to 4.19 and started seeing rare
> crashes like these:
>

This should all be fixed in the latest 4.19.y release, right?

thanks,

greg k-h

2019-01-30 15:59:22

by Ignat Korchagin

Subject: Re: ipmi_msghandler crashes in 4.19

We're rolling out 4.19.18 across the fleet. Hopefully, we'll not see
it anymore, but if we do, we'll let you know.

Regards,
Ignat

On Tue, Jan 29, 2019 at 10:29 AM Greg KH <[email protected]> wrote:
>
> On Tue, Jan 15, 2019 at 10:36:42AM -0800, Ivan Babrou wrote:
> > Hey,
> >
> > We've upgraded some machines from 4.14 to 4.19 and started seeing rare
> > crashes like these:
> >
>
> This should all be fixed in the latest 4.19.y release, right?
>
> thanks,
>
> greg k-h