There are reports of kernels crashing due to stack overflow while
running OpenShift (Kubernetes). The primary contributor to the stack
usage seems to be openvswitch, which is used by OVN-Kubernetes (based on
OVN (Open Virtual Network)), but NFS also contributes in some stack
traces.

There may be some opportunities to reduce stack usage in the openvswitch
code, but doing so potentially requires tradeoffs vs performance, and
also requires testing across architectures.

Looking at stack usage across the kernel (using -fstack-usage) shows
that ppc64le stack frames are on average 50-100% larger than those of the
equivalent functions built for x86-64, which is not surprising given the
minimum stack frame size is 32 bytes on ppc64le vs 16 bytes on x86-64.

So increase the default stack size to 32KB for the modern 64-bit Book3S
platforms, i.e. pseries (virtualised) and powernv (bare metal). That
leaves the older systems like G5s and the AmigaOne (pasemi) with a 16KB
stack, which should be sufficient on those machines.

Signed-off-by: Michael Ellerman <[email protected]>
---
arch/powerpc/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..2df545c1446e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -858,6 +858,7 @@ config THREAD_SHIFT
 	int "Thread shift" if EXPERT
 	range 13 15
 	default "15" if PPC_256K_PAGES
+	default "15" if PPC_PSERIES || PPC_POWERNV
 	default "14" if PPC64
 	default "13"
 	help
--
2.43.0
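
For reference, a minimal sketch of how the THREAD_SHIFT defaults in the hunk above map to a stack size. It assumes the kernel stack is 1 << THREAD_SHIFT bytes and that Kconfig uses the first "default" line whose condition holds. The function and parameter names are made up for illustration, and the sketch ignores the EXPERT override and any other adjustment the kernel may apply on top.

```python
def default_thread_shift(ppc_256k_pages=False, ppc_pseries=False,
                         ppc_powernv=False, ppc64=False):
    """Walk the 'default' lines of config THREAD_SHIFT in order."""
    if ppc_256k_pages:
        return 15
    if ppc_pseries or ppc_powernv:  # the line added by this patch
        return 15
    if ppc64:
        return 14
    return 13


def stack_size_kb(thread_shift):
    """Stack size in KB, assuming THREAD_SIZE = 1 << THREAD_SHIFT."""
    return (1 << thread_shift) // 1024


if __name__ == "__main__":
    # pseries or powernv builds (which are also PPC64) now get 32KB
    print(stack_size_kb(default_thread_shift(ppc_pseries=True, ppc64=True)))
    # other 64-bit platforms (e.g. G5, pasemi) stay at 16KB
    print(stack_size_kb(default_thread_shift(ppc64=True)))
    # everything else falls through to the last default, 8KB
    print(stack_size_kb(default_thread_shift()))
```

With the new default line, any config that selects PPC_PSERIES or PPC_POWERNV resolves to 15 before the generic PPC64 line is considered.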
Michael Ellerman <[email protected]> writes:
> There are reports of kernels crashing due to stack overflow while
> running OpenShift (Kubernetes). The primary contributor to the stack
> usage seems to be openvswitch, which is used by OVN-Kubernetes (based on
> OVN (Open Virtual Network)), but NFS also contributes in some stack
> traces.
For the archives here's an example trace.
This comes from the openshift CI:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-remote-libvirt-ppc64le/1703597644732960768
Which links through to the kdump.tar:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-remote-libvirt-ppc64le/1703597644732960768/artifacts/ocp-e2e-ovn-remote-libvirt-ppc64le/ipi-conf-debug-kdump-gather-logs/artifacts/kdump.tar
Which contains vmcore-dmesg.txt, which includes this trace:
[ 1805.324030] do_IRQ: stack overflow: 1808
[ 1805.324179] CPU: 0 PID: 263384 Comm: mount.nfs Kdump: loaded Not tainted 5.14.0-284.32.1.el9_2.ppc64le #1
[ 1805.324184] Call Trace:
[ 1805.324186] [c00000037d4806d0] [c0000000008427d0] dump_stack_lvl+0x74/0xa8 (unreliable)
[ 1805.324199] [c00000037d480710] [c000000000016bbc] __do_IRQ+0x11c/0x130
[ 1805.324205] [c00000037d4807a0] [c000000000016c10] do_IRQ+0x40/0xa0
[ 1805.324210] [c00000037d4807d0] [c000000000009080] hardware_interrupt_common_virt+0x210/0x220
[ 1805.324215] --- interrupt: 500 at slab_pre_alloc_hook.constprop.0+0x7c/0x340
[ 1805.324221] NIP: c0000000004feb3c LR: c0000000004feb24 CTR: c00000000092b770
[ 1805.324223] REGS: c00000037d480840 TRAP: 0500 Not tainted (5.14.0-284.32.1.el9_2.ppc64le)
[ 1805.324226] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24424442 XER: 00000000
[ 1805.324240] CFAR: c00000000045ef8c IRQMASK: 0
GPR00: c0000000004feb24 c00000037d480ae0 c000000002b12700 0000000000000000
GPR04: 0000000000000a20 c00000037d480b60 0000000000000001 0000000000000a20
GPR08: c00000000133ca80 0000000000000000 0000000000000028 0000000000004000
GPR12: c00000000092b770 c000000002ea0000 0000000000000000 0000000000000000
GPR16: 0000000000000005 0000000000000040 000000000000012e c0000000566930e0
GPR20: 0000000000000008 0000000000000000 c0000000566930e0 0000000000000000
GPR24: c00000000092bac4 c000000003010400 c00000037d480b60 0000000000000001
GPR28: 0000000000000000 0000000000000a20 0000000000000000 c000000003010400
[ 1805.324284] NIP [c0000000004feb3c] slab_pre_alloc_hook.constprop.0+0x7c/0x340
[ 1805.324288] LR [c0000000004feb24] slab_pre_alloc_hook.constprop.0+0x64/0x340
[ 1805.324291] --- interrupt: 500
[ 1805.324292] [c00000037d480ae0] [0000000000000000] 0x0 (unreliable)
[ 1805.324298] [c00000037d480b40] [c00000000050560c] __kmalloc+0x8c/0x5e0
[ 1805.324302] [c00000037d480bc0] [c00000000092bac4] virtqueue_add_outbuf+0x354/0xac0
[ 1805.324307] [c00000037d480cc0] [c0080000011b3a84] xmit_skb+0x1dc/0x350 [virtio_net]
[ 1805.324317] [c00000037d480d50] [c0080000011b3ccc] start_xmit+0xd4/0x3b0 [virtio_net]
[ 1805.324321] [c00000037d480e00] [c000000000c4baac] dev_hard_start_xmit+0x11c/0x280
[ 1805.324327] [c00000037d480e80] [c000000000cf1c8c] sch_direct_xmit+0xec/0x330
[ 1805.324332] [c00000037d480f20] [c000000000c4a03c] __dev_xmit_skb+0x41c/0xa80
[ 1805.324336] [c00000037d480f90] [c000000000c4c194] __dev_queue_xmit+0x414/0x950
[ 1805.324340] [c00000037d481070] [c008000002abdfdc] ovs_vport_send+0xb4/0x210 [openvswitch]
[ 1805.324351] [c00000037d4810f0] [c008000002aa14a4] do_output+0x7c/0x200 [openvswitch]
[ 1805.324359] [c00000037d481140] [c008000002aa33b0] do_execute_actions+0xe48/0xeb0 [openvswitch]
[ 1805.324366] [c00000037d481300] [c008000002aa3800] ovs_execute_actions+0x78/0x1f0 [openvswitch]
[ 1805.324373] [c00000037d481380] [c008000002aa970c] ovs_dp_process_packet+0xb4/0x2e0 [openvswitch]
[ 1805.324380] [c00000037d481450] [c008000002abde84] ovs_vport_receive+0x8c/0x130 [openvswitch]
[ 1805.324388] [c00000037d481660] [c008000002abe638] internal_dev_xmit+0x40/0xd0 [openvswitch]
[ 1805.324396] [c00000037d481690] [c000000000c4baac] dev_hard_start_xmit+0x11c/0x280
[ 1805.324401] [c00000037d481710] [c000000000c4c3b4] __dev_queue_xmit+0x634/0x950
[ 1805.324405] [c00000037d4817f0] [c000000000d50810] neigh_hh_output+0xd0/0x180
[ 1805.324410] [c00000037d481840] [c000000000d516ec] ip_finish_output2+0x31c/0x5c0
[ 1805.324415] [c00000037d4818e0] [c000000000d53f94] ip_local_out+0x64/0x90
[ 1805.324419] [c00000037d481920] [c000000000dd83e4] iptunnel_xmit+0x194/0x290
[ 1805.324423] [c00000037d4819c0] [c008000003160408] udp_tunnel_xmit_skb+0x100/0x140 [udp_tunnel]
[ 1805.324429] [c00000037d481a80] [c008000003203a54] geneve_xmit_skb+0x34c/0x610 [geneve]
[ 1805.324434] [c00000037d481bb0] [c00800000320596c] geneve_xmit+0x94/0x1e8 [geneve]
[ 1805.324438] [c00000037d481c30] [c000000000c4baac] dev_hard_start_xmit+0x11c/0x280
[ 1805.324442] [c00000037d481cb0] [c000000000c4c3b4] __dev_queue_xmit+0x634/0x950
[ 1805.324446] [c00000037d481d90] [c008000002abdfdc] ovs_vport_send+0xb4/0x210 [openvswitch]
[ 1805.324454] [c00000037d481e10] [c008000002aa14a4] do_output+0x7c/0x200 [openvswitch]
[ 1805.324461] [c00000037d481e60] [c008000002aa33b0] do_execute_actions+0xe48/0xeb0 [openvswitch]
[ 1805.324468] [c00000037d482020] [c008000002aa3800] ovs_execute_actions+0x78/0x1f0 [openvswitch]
[ 1805.324475] [c00000037d4820a0] [c008000002aa970c] ovs_dp_process_packet+0xb4/0x2e0 [openvswitch]
[ 1805.324482] [c00000037d482170] [c008000002aa36e0] clone_execute+0x2c8/0x370 [openvswitch]
[ 1805.324489] [c00000037d482210] [c008000002aa2a20] do_execute_actions+0x4b8/0xeb0 [openvswitch]
[ 1805.324495] [c00000037d4823d0] [c008000002aa3800] ovs_execute_actions+0x78/0x1f0 [openvswitch]
[ 1805.324502] [c00000037d482450] [c008000002aa970c] ovs_dp_process_packet+0xb4/0x2e0 [openvswitch]
[ 1805.324509] [c00000037d482520] [c008000002abde84] ovs_vport_receive+0x8c/0x130 [openvswitch]
[ 1805.324516] [c00000037d482730] [c008000002abe638] internal_dev_xmit+0x40/0xd0 [openvswitch]
[ 1805.324524] [c00000037d482760] [c000000000c4baac] dev_hard_start_xmit+0x11c/0x280
[ 1805.324528] [c00000037d4827e0] [c000000000c4c3b4] __dev_queue_xmit+0x634/0x950
[ 1805.324532] [c00000037d4828c0] [c000000000d50810] neigh_hh_output+0xd0/0x180
[ 1805.324536] [c00000037d482910] [c000000000d516ec] ip_finish_output2+0x31c/0x5c0
[ 1805.324541] [c00000037d4829b0] [c000000000d54440] __ip_queue_xmit+0x1b0/0x4f0
[ 1805.324545] [c00000037d482a40] [c000000000d821e0] __tcp_transmit_skb+0x450/0x9a0
[ 1805.324549] [c00000037d482b10] [c000000000d84230] tcp_write_xmit+0x4e0/0xb40
[ 1805.324553] [c00000037d482be0] [c000000000d848d4] __tcp_push_pending_frames+0x44/0x130
[ 1805.324557] [c00000037d482c50] [c000000000d63aac] __tcp_sock_set_cork.part.0+0x8c/0xb0
[ 1805.324561] [c00000037d482c80] [c000000000d63b48] tcp_sock_set_cork+0x78/0xa0
[ 1805.324565] [c00000037d482cb0] [c0080000061b2acc] xs_tcp_send_request+0x2d4/0x430 [sunrpc]
[ 1805.324594] [c00000037d482e50] [c0080000061ab120] xprt_request_transmit.constprop.0+0xa8/0x3c0 [sunrpc]
[ 1805.324619] [c00000037d482eb0] [c0080000061acc74] xprt_transmit+0x12c/0x260 [sunrpc]
[ 1805.324644] [c00000037d482f20] [c0080000061a1de8] call_transmit+0xd0/0x100 [sunrpc]
[ 1805.324667] [c00000037d482f50] [c0080000061c8dc4] __rpc_execute+0xec/0x570 [sunrpc]
[ 1805.324696] [c00000037d482fd0] [c0080000061d00e0] rpc_execute+0x168/0x1d0 [sunrpc]
[ 1805.324725] [c00000037d483010] [c0080000061a4a74] rpc_run_task+0x1cc/0x2a0 [sunrpc]
[ 1805.324754] [c00000037d483070] [c008000006013970] nfs4_call_sync_sequence+0x98/0x100 [nfsv4]
[ 1805.324811] [c00000037d483120] [c008000006013dec] _nfs4_server_capabilities+0xd4/0x3c0 [nfsv4]
[ 1805.324832] [c00000037d483210] [c00800000602036c] nfs4_server_capabilities+0x74/0xd0 [nfsv4]
[ 1805.324854] [c00000037d483270] [c008000006020404] nfs4_proc_get_root+0x3c/0x150 [nfsv4]
[ 1805.324876] [c00000037d4832f0] [c0080000062bee54] nfs_get_root+0xac/0x660 [nfs]
[ 1805.324907] [c00000037d483420] [c0080000062c7ccc] nfs_get_tree_common+0x104/0x5f0 [nfs]
[ 1805.324946] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1805.325103] CPU: 0 PID: 263384 Comm: mount.nfs Kdump: loaded Not tainted 5.14.0-284.32.1.el9_2.ppc64le #1
[ 1805.325316] Call Trace:
[ 1805.325368] [c00000037d482c50] [c0000000008427d0] dump_stack_lvl+0x74/0xa8 (unreliable)
[ 1805.325549] [c00000037d482c90] [c0000000001492b4] panic+0x160/0x3ec
[ 1805.325706] [c00000037d482d30] [c000000000efce90] __schedule+0x710/0x720
[ 1805.325838] [c00000037d482e00] [c000000000efcf7c] schedule+0x3c/0xa0
[ 1805.325978] [c00000037d482e30] [c0080000061c4f84] rpc_wait_bit_killable+0x3c/0x110 [sunrpc]
[ 1805.326185] [c00000037d482e60] [c000000000efd664] __wait_on_bit+0xd4/0x210
[ 1805.326325] [c00000037d482ee0] [c000000000efd840] out_of_line_wait_on_bit+0xa0/0xd0
[ 1805.326502] [c00000037d482f50] [c0080000061c8e54] __rpc_execute+0x17c/0x570 [sunrpc]
[ 1805.326751] [c00000037d482fd0] [c0080000061d00e0] rpc_execute+0x168/0x1d0 [sunrpc]
[ 1805.326936] [c00000037d483010] [c0080000061a4a74] rpc_run_task+0x1cc/0x2a0 [sunrpc]
[ 1805.327120] [c00000037d483070] [c008000006013970] nfs4_call_sync_sequence+0x98/0x100 [nfsv4]
[ 1805.327346] [c00000037d483120] [c008000006013dec] _nfs4_server_capabilities+0xd4/0x3c0 [nfsv4]
[ 1805.327548] [c00000037d483210] [c00800000602036c] nfs4_server_capabilities+0x74/0xd0 [nfsv4]
[ 1805.327747] [c00000037d483270] [c008000006020404] nfs4_proc_get_root+0x3c/0x150 [nfsv4]
[ 1805.327972] [c00000037d4832f0] [c0080000062bee54] nfs_get_root+0xac/0x660 [nfs]
[ 1805.328174] [c00000037d483420] [c0080000062c7ccc] nfs_get_tree_common+0x104/0x5f0 [nfs]
[ 1805.328366] [c00000037d4834b0] [c0080000062ec6f8] nfs_get_tree+0x90/0xc0 [nfs]
[ 1805.328556] [c00000037d4834e0] [c00000000056cd38] vfs_get_tree+0x48/0x160
[ 1805.328715] [c00000037d483560] [c0080000062d8b68] nfs_do_submount+0x170/0x210 [nfs]
[ 1805.328911] [c00000037d483600] [c008000006055b58] nfs4_submount+0x250/0x360 [nfsv4]
[ 1805.329115] [c00000037d4836b0] [c0080000062d8eac] nfs_d_automount+0x194/0x2d0 [nfs]
[ 1805.329303] [c00000037d483710] [c00000000057c7f4] __traverse_mounts+0x114/0x330
[ 1805.329459] [c00000037d483770] [c000000000583d54] step_into+0x364/0x4d0
[ 1805.329581] [c00000037d4837f0] [c00000000058465c] walk_component+0x8c/0x300
[ 1805.329700] [c00000037d483870] [c000000000585868] path_lookupat+0xa8/0x260
[ 1805.329819] [c00000037d4838c0] [c000000000586ab8] filename_lookup+0xc8/0x230
[ 1805.329962] [c00000037d483a00] [c000000000586d18] vfs_path_lookup+0x68/0xc0
[ 1805.330093] [c00000037d483a60] [c0000000005b0760] mount_subtree+0xd0/0x1e0
[ 1805.330214] [c00000037d483ad0] [c0080000060496b8] do_nfs4_mount+0x280/0x520 [nfsv4]
[ 1805.330370] [c00000037d483ba0] [c0080000060499b8] nfs4_try_get_tree+0x60/0x140 [nfsv4]
[ 1805.330526] [c00000037d483c20] [c0080000062ec6c8] nfs_get_tree+0x60/0xc0 [nfs]
[ 1805.330681] [c00000037d483c50] [c00000000056cd38] vfs_get_tree+0x48/0x160
[ 1805.330821] [c00000037d483cd0] [c0000000005ae154] do_new_mount+0x204/0x3c0
[ 1805.330972] [c00000037d483d40] [c0000000005af8f8] sys_mount+0x168/0x1c0
[ 1805.331086] [c00000037d483db0] [c00000000002f544] system_call_exception+0x164/0x310
[ 1805.331227] [c00000037d483e10] [c00000000000bfe8] system_call_vectored_common+0xe8/0x278
[ 1805.331367] --- interrupt: 3000 at 0x7fffb235f4d0
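
A small sketch of how the numbers in the trace fit together. It assumes a 16KB stack (THREAD_SHIFT=14) aligned to its own size, and that the value printed by do_IRQ is the stack pointer's offset from the bottom of that stack. The helper name is illustrative only; the two addresses are copied from the trace above.

```python
THREAD_SIZE = 1 << 14  # assumption: 16KB stack, aligned to its size


def stack_offset(sp, thread_size=THREAD_SIZE):
    """Offset of a stack pointer from the bottom of its (aligned) stack."""
    return sp & (thread_size - 1)


# Stack pointer of the __do_IRQ frame in the first call trace
DO_IRQ_SP = 0xc00000037d480710
# Stack pointer of the outermost frame (system_call_vectored_common)
OUTERMOST_SP = 0xc00000037d483e10

if __name__ == "__main__":
    remaining = stack_offset(DO_IRQ_SP)
    print(remaining)                       # bytes left below the frame
    print(THREAD_SIZE - remaining)         # bytes already consumed
    print(stack_offset(OUTERMOST_SP))      # where the syscall entry frame sits
```

Under those assumptions the first value comes out as 1808, the same number reported in the "do_IRQ: stack overflow" line, so roughly 14.2KB of the 16KB stack was already in use when the interrupt arrived.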
cheers
On Fri, 15 Dec, 2023 23:44:49 +1100 Michael Ellerman <[email protected]> wrote:
> There are reports of kernels crashing due to stack overflow while
> running OpenShift (Kubernetes). The primary contributor to the stack
> usage seems to be openvswitch, which is used by OVN-Kubernetes (based on
> OVN (Open Virtual Network)), but NFS also contributes in some stack
> traces.
>
> There may be some opportunities to reduce stack usage in the openvswitch
> code, but doing so potentially requires tradeoffs vs performance, and
> also requires testing across architectures.
>
> Looking at stack usage across the kernel (using -fstack-usage) shows
> that ppc64le stack frames are on average 50-100% larger than those of the
> equivalent functions built for x86-64, which is not surprising given the
> minimum stack frame size is 32 bytes on ppc64le vs 16 bytes on x86-64.
>
> So increase the default stack size to 32KB for the modern 64-bit Book3S
> platforms, i.e. pseries (virtualised) and powernv (bare metal). That
> leaves the older systems like G5s and the AmigaOne (pasemi) with a 16KB
> stack, which should be sufficient on those machines.
>
> Signed-off-by: Michael Ellerman <[email protected]>
> ---
We noticed this change is causing assembler issues for us when building
the kernel.
make ARCH=powerpc KERNELRELEASE=6.8.0-rc2_for_upstream_debug_2024_02_06_20_01 KBUILD_BUILD_VERSION=1
arch/powerpc/kernel/switch.S: Assembler messages:
arch/powerpc/kernel/switch.S:249: Error: operand out of range (0x000000000000fe50 is not between 0xffffffffffff8000 and 0x0000000000007fff)
make[6]: *** [scripts/Makefile.build:361: arch/powerpc/kernel/switch.o] Error 1
make[5]: *** [scripts/Makefile.build:481: arch/powerpc/kernel] Error 2
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [scripts/Makefile.build:481: arch/powerpc] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [Makefile:1921: .] Error 2
The issue lies with the do_switch_64 macro.
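
To illustrate the error message (a sketch only, not the fix): the range quoted by the assembler is that of a signed 16-bit immediate, and 0xfe50 falls outside it. Reading 0xfe50 as "a 64KB thread size minus a 0x1b0-byte frame" is an assumption based purely on the arithmetic, not on the contents of switch.S, and the helper names are hypothetical.

```python
def as_signed64(value):
    """Interpret a 64-bit two's-complement bit pattern as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def fits_signed_16bit(value):
    """True if value can be encoded as a signed 16-bit immediate."""
    return -0x8000 <= value <= 0x7fff


if __name__ == "__main__":
    # The lower bound in the message is -0x8000 printed as a 64-bit pattern
    print(hex(as_signed64(0xffffffffffff8000)))

    operand = 0xfe50
    print(fits_signed_16bit(operand))      # the operand the assembler rejected

    # Assumption: operand = thread size - frame size, with a 64KB thread size
    frame = (1 << 16) - operand
    print(hex(frame))

    # The same expression with a 32KB thread size would still be encodable
    print(fits_signed_16bit((1 << 15) - frame))
```

If that reading is right, the failing build ended up with a thread size larger than the 32KB default set by this patch, for example through a debug option that increases it further.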
--
Thanks,
Rahul Rameshbabu
Rahul Rameshbabu <[email protected]> writes:
> On Fri, 15 Dec, 2023 23:44:49 +1100 Michael Ellerman <[email protected]> wrote:
>> There are reports of kernels crashing due to stack overflow while
>> running OpenShift (Kubernetes). The primary contributor to the stack
>> usage seems to be openvswitch, which is used by OVN-Kubernetes (based on
>> OVN (Open Virtual Network)), but NFS also contributes in some stack
>> traces.
>>
>> There may be some opportunities to reduce stack usage in the openvswitch
>> code, but doing so potentially requires tradeoffs vs performance, and
>> also requires testing across architectures.
>>
>> Looking at stack usage across the kernel (using -fstack-usage) shows
>> that ppc64le stack frames are on average 50-100% larger than those of the
>> equivalent functions built for x86-64, which is not surprising given the
>> minimum stack frame size is 32 bytes on ppc64le vs 16 bytes on x86-64.
>>
>> So increase the default stack size to 32KB for the modern 64-bit Book3S
>> platforms, i.e. pseries (virtualised) and powernv (bare metal). That
>> leaves the older systems like G5s and the AmigaOne (pasemi) with a 16KB
>> stack, which should be sufficient on those machines.
>>
>> Signed-off-by: Michael Ellerman <[email protected]>
>> ---
>
> We noticed this change is causing assembler issues for us when building
> the kernel.
>
> make ARCH=powerpc KERNELRELEASE=6.8.0-rc2_for_upstream_debug_2024_02_06_20_01 KBUILD_BUILD_VERSION=1
> arch/powerpc/kernel/switch.S: Assembler messages:
> arch/powerpc/kernel/switch.S:249: Error: operand out of range (0x000000000000fe50 is not between 0xffffffffffff8000 and 0x0000000000007fff)
> make[6]: *** [scripts/Makefile.build:361: arch/powerpc/kernel/switch.o] Error 1
> make[5]: *** [scripts/Makefile.build:481: arch/powerpc/kernel] Error 2
> make[5]: *** Waiting for unfinished jobs....
> make[4]: *** [scripts/Makefile.build:481: arch/powerpc] Error 2
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [Makefile:1921: .] Error 2
There's a fix in my fixes branch:
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=f1acb109505d983779bbb7e20a1ee6244d2b5736
I'll send it to Linus this week.
cheers