2013-09-04 21:40:29

by Benjamin Herrenschmidt

Subject: do_softirq() vs __do_softirq() in irq_exit() and stack overflow

Hi Folks !

It appears that the current version of irq_exit() calls __do_softirq()
directly rather than do_softirq().

That means we are going to call the softirqs in the current interrupt
frame rather than on the separate softirq stack.

The current frame is also still the normal kernel stack, because
do_IRQ() itself only switches to the interrupt stack for processing
the handlers (it's back to the original stack by the time it calls
irq_exit).
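
For reference, here is roughly what that path looks like (a sketch
approximated from 3.10-era kernel/softirq.c, so take the irq_exit()
details with a grain of salt; invoke_softirq() is quoted exactly in the
patches later in this thread):

void irq_exit(void)
{
        /* ... accounting, sub_preempt_count(HARDIRQ_OFFSET), ... */
        if (!in_interrupt() && local_softirq_pending())
                invoke_softirq();
        /* ... */
}

static inline void invoke_softirq(void)
{
        if (!force_irqthreads)
                __do_softirq();         /* runs on whatever stack we are on */
        else
                wakeup_softirqd();
}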

That means that we end up stacking the normal stack, the actually HW
interrupt stack frame (which can be pretty big on ppc) + do_IRQ's own,
then the softirq (networks stack can create HUGE stack frames) and ...
we are in softirq, so HW irqs are enable, we can thus can another irq
stack frame piled up on top of that (or a perf stack).

We are observing actual overflows, here's an example blowing up our 16k
stack on ppc64, you notice that it's all on the normal kernel stack:

[ 1002.364577] do_IRQ: stack overflow: 1920
[ 1002.364653] CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
[ 1002.364734] Call Trace:
[ 1002.364770] [c0000000050a8740] [c0000000000157c0] .show_stack+0x130/0x200 (unreliable)
[ 1002.364872] [c0000000050a8810] [c00000000082f6d0] .dump_stack+0x28/0x3c
[ 1002.364955] [c0000000050a8880] [c000000000010b98] .do_IRQ+0x2b8/0x2c0
[ 1002.365039] [c0000000050a8930] [c0000000000026d4] hardware_interrupt_common+0x154/0x180
[ 1002.365148] --- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
[ 1002.365148] LR = .cp_start_xmit+0x390/0x820 [8139cp]
[ 1002.365359] [c0000000050a8d40] [c0000000006d7f14] .dev_hard_start_xmit+0x394/0x640
[ 1002.365433] [c0000000050a8e00] [c0000000007028c0] .sch_direct_xmit+0x110/0x260
[ 1002.365499] [c0000000050a8ea0] [c0000000006d8420] .dev_queue_xmit+0x260/0x630
[ 1002.365571] [c0000000050a8f40] [d0000000027d30d4] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
[ 1002.365641] [c0000000050a8fc0] [d0000000027d01f8] .br_dev_xmit+0x198/0x270 [bridge]
[ 1002.365707] [c0000000050a9070] [c0000000006d7f14] .dev_hard_start_xmit+0x394/0x640
[ 1002.365774] [c0000000050a9130] [c0000000006d85e8] .dev_queue_xmit+0x428/0x630
[ 1002.365834] [c0000000050a91d0] [c000000000729764] .ip_finish_output+0x2a4/0x550
[ 1002.365902] [c0000000050a9290] [c00000000072aaf0] .ip_local_out+0x50/0x70
[ 1002.365960] [c0000000050a9310] [c00000000072aed8] .ip_queue_xmit+0x148/0x420
[ 1002.366018] [c0000000050a93b0] [c000000000749524] .tcp_transmit_skb+0x4e4/0xaf0
[ 1002.366085] [c0000000050a94a0] [c00000000073de9c] .__tcp_ack_snd_check+0x7c/0xf0
[ 1002.366152] [c0000000050a9520] [c0000000007451d8] .tcp_rcv_established+0x1e8/0x930
[ 1002.366217] [c0000000050a95f0] [c00000000075326c] .tcp_v4_do_rcv+0x21c/0x570
[ 1002.366274] [c0000000050a96c0] [c000000000754a44] .tcp_v4_rcv+0x734/0x930
[ 1002.366332] [c0000000050a97a0] [c000000000724144] .ip_local_deliver_finish+0x184/0x360
[ 1002.366398] [c0000000050a9840] [c000000000724468] .ip_rcv_finish+0x148/0x400
[ 1002.366457] [c0000000050a98d0] [c0000000006d3248] .__netif_receive_skb_core+0x4f8/0xb00
[ 1002.366523] [c0000000050a99d0] [c0000000006d5414] .netif_receive_skb+0x44/0x110
[ 1002.366594] [c0000000050a9a70] [d0000000027d4e2c] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
[ 1002.366674] [c0000000050a9b20] [d0000000027de5ac] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
[ 1002.366754] [c0000000050a9bd0] [d0000000027df5ec] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
[ 1002.366820] [c0000000050a9c70] [c000000000717aa4] .nf_iterate+0x114/0x130
[ 1002.366877] [c0000000050a9d30] [c000000000717b74] .nf_hook_slow+0xb4/0x1e0
[ 1002.366938] [c0000000050a9e00] [d0000000027d51f0] .br_handle_frame+0x290/0x330 [bridge]
[ 1002.367005] [c0000000050a9ea0] [c0000000006d309c] .__netif_receive_skb_core+0x34c/0xb00
[ 1002.367072] [c0000000050a9fa0] [c0000000006d5414] .netif_receive_skb+0x44/0x110
[ 1002.367138] [c0000000050aa040] [c0000000006d6218] .napi_gro_receive+0xe8/0x120
[ 1002.367210] [c0000000050aa0c0] [d00000000208536c] .cp_rx_poll+0x31c/0x590 [8139cp]
[ 1002.367276] [c0000000050aa1d0] [c0000000006d59cc] .net_rx_action+0x1dc/0x310
[ 1002.367335] [c0000000050aa2b0] [c0000000000a0568] .__do_softirq+0x158/0x330
[ 1002.367394] [c0000000050aa3b0] [c0000000000a0978] .irq_exit+0xc8/0x110
[ 1002.367452] [c0000000050aa430] [c0000000000109bc] .do_IRQ+0xdc/0x2c0
[ 1002.367510] [c0000000050aa4e0] [c0000000000026d4] hardware_interrupt_common+0x154/0x180
[ 1002.367580] --- Exception: 501 at .bad_range+0x1c/0x110
[ 1002.367580] LR = .get_page_from_freelist+0x908/0xbb0
[ 1002.367658] [c0000000050aa7d0] [c00000000041d758] .list_del+0x18/0x50 (unreliable)
[ 1002.367725] [c0000000050aa850] [c0000000001bfa98] .get_page_from_freelist+0x908/0xbb0
[ 1002.367792] [c0000000050aa9e0] [c0000000001bff5c] .__alloc_pages_nodemask+0x21c/0xae0
[ 1002.367860] [c0000000050aaba0] [c0000000002126d0] .alloc_pages_vma+0xd0/0x210
[ 1002.367918] [c0000000050aac60] [c0000000001e93f4] .handle_pte_fault+0x814/0xb70
[ 1002.367985] [c0000000050aad50] [c0000000001eade4] .__get_user_pages+0x1a4/0x640
[ 1002.368052] [c0000000050aae60] [c00000000004606c] .get_user_pages_fast+0xec/0x160
[ 1002.368130] [c0000000050aaf10] [d000000001f73930] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
[ 1002.368205] [c0000000050aafd0] [d000000001f7e214] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
[ 1002.368280] [c0000000050ab070] [d000000001f8a824] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
[ 1002.368354] [c0000000050ab190] [d000000001f85064] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
[ 1002.368429] [c0000000050ab270] [d000000001f85b74] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
[ 1002.368504] [c0000000050ab320] [d000000001f88ec4] kvm_start_lightweight+0x1ec/0x1fc [kvm]
[ 1002.368578] [c0000000050ab4f0] [d000000001f86a58] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
[ 1002.368652] [c0000000050ab9c0] [d000000001f7f218] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
[ 1002.368725] [c0000000050aba50] [d000000001f7bdac] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
[ 1002.368797] [c0000000050abae0] [d000000001f74618] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
[ 1002.368865] [c0000000050abc90] [c00000000025302c] .do_vfs_ioctl+0x4ec/0x7c0
[ 1002.368923] [c0000000050abd80] [c0000000002533d4] .SyS_ioctl+0xd4/0xf0
[ 1002.368981] [c0000000050abe30] [c000000000009ed4] syscall_exit+0x0/0x98
[ 1002.369117] ------------[ cut here ]------------

Cheers,
Ben.


2013-09-05 00:27:36

by Benjamin Herrenschmidt

Subject: Re: do_softirq() vs __do_softirq() in irq_exit() and stack overflow

On Thu, 2013-09-05 at 07:39 +1000, Benjamin Herrenschmidt wrote:

> That means that we end up stacking the normal stack, the actually HW
> interrupt stack frame (which can be pretty big on ppc) + do_IRQ's own,
> then the softirq (networks stack can create HUGE stack frames) and ...
> we are in softirq, so HW irqs are enable, we can thus can another irq
> stack frame piled up on top of that (or a perf stack).

Omg ... obviously I wasn't fully woken up when I wrote the above :-)

I assume you still understood the gist of it.

Cheers,
Ben.

> We are observing actual overflows, here's an example blowing up our 16k
> stack on ppc64, you notice that it's all on the normal kernel stack:
>
> [... stack trace snipped ...]

2013-09-05 13:29:20

by Frederic Weisbecker

Subject: Re: do_softirq() vs __do_softirq() in irq_exit() and stack overflow

On Thu, Sep 05, 2013 at 07:39:56AM +1000, Benjamin Herrenschmidt wrote:
> Hi Folks !
>
> It appears that the current version of irq_exit() calls __do_softirq()
> directly rather than do_softirq().
>
> That means we are going to call the softirq's in the current interrupt
> frame rather than on the separate softirq stack.
>
> The current frame is also still the normal kernel stack, because
> do_IRQ() itself only switches to the interrupt stack for processing
> the handlers (it's back to the original stack by the time it calls
> irq_exit).
>
> That means that we end up stacking the normal stack, the actually HW
> interrupt stack frame (which can be pretty big on ppc) + do_IRQ's own,
> then the softirq (networks stack can create HUGE stack frames) and ...
> we are in softirq, so HW irqs are enable, we can thus can another irq
> stack frame piled up on top of that (or a perf stack).
>
> We are observing actual overflows, here's an example blowing up our 16k
> stack on ppc64, you notice that it's all on the normal kernel stack:

I see: __do_softirq() is sometimes called directly to avoid the irq-safe and
softirq_pending checks when they are not necessary, but OTOH this bypasses
the arch-overridden handler.
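
For reference, powerpc's current do_softirq() shows the pattern that
every arch override repeats; irq_exit() already runs with irqs off and
checks local_softirq_pending() itself, so the checks are redundant
there, but calling __do_softirq() directly also loses the stack switch
hidden behind do_softirq_onstack():

void do_softirq(void)
{
        unsigned long flags;

        if (in_interrupt())
                return;

        local_irq_save(flags);

        if (local_softirq_pending())
                do_softirq_onstack();   /* switch to the softirq stack */

        local_irq_restore(flags);
}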

I'm going to try something and post soon.

Thanks.

2013-09-05 15:33:32

by Frederic Weisbecker

Subject: [RFC PATCH 0/3] irq: Fix stack overflow due to softirq called on current stack

Hi,

This series is a proposal to fix the crash reported here: http://lkml.kernel.org/r/1378330796.4321.50.camel%40pasglop
It also has the upside of consolidating a bit the arch-overridden do_softirq() implementations.

Only tested on x86-64 for now.

Thanks.

Frederic Weisbecker (3):
irq: Consolidate do_softirq() arch overriden implementations
irq: Execute softirq on its own stack on irq exit
irq: Comment on the use of inline stack for ksoftirq

arch/metag/kernel/irq.c | 56 +++++++++++++++++------------------------
arch/parisc/kernel/irq.c | 17 +-----------
arch/powerpc/kernel/irq.c | 17 +-----------
arch/s390/kernel/irq.c | 52 +++++++++++++++----------------------
arch/sh/kernel/irq.c | 60 ++++++++++++++++++-------------------------
arch/sparc/kernel/irq_64.c | 31 +++++++---------------
arch/x86/kernel/irq_32.c | 34 +++++++++----------------
arch/x86/kernel/irq_64.c | 18 ++-----------
include/linux/interrupt.h | 11 ++++++++
kernel/softirq.c | 10 +++----
10 files changed, 112 insertions(+), 194 deletions(-)

--
1.7.5.4

2013-09-05 15:33:36

by Frederic Weisbecker

Subject: [PATCH 1/3] irq: Consolidate do_softirq() arch overriden implementations

All archs' overridden implementations of do_softirq() do the same:
disable irqs, check if there are softirqs pending, then execute
__do_softirq() on a specific stack.

Consolidate the common parts such that archs only need to worry about
the stack switch.
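
Condensed, the resulting split looks like this (taken from the
interrupt.h and softirq.c hunks below; the unchanged top of
do_softirq() is filled in from the surrounding code):

/* include/linux/interrupt.h */
#ifdef __ARCH_HAS_DO_SOFTIRQ
void do_softirq_own_stack(void);  /* arch switches stacks, calls __do_softirq() */
#else
static inline void do_softirq_own_stack(void)
{
        __do_softirq();           /* no dedicated softirq stack on this arch */
}
#endif

/* kernel/softirq.c */
asmlinkage void do_softirq(void)
{
        __u32 pending;
        unsigned long flags;

        if (in_interrupt())
                return;

        local_irq_save(flags);

        pending = local_softirq_pending();
        if (pending)
                do_softirq_own_stack();

        local_irq_restore(flags);
}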

Reported-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/metag/kernel/irq.c | 56 +++++++++++++++++------------------------
arch/parisc/kernel/irq.c | 17 +-----------
arch/powerpc/kernel/irq.c | 17 +-----------
arch/s390/kernel/irq.c | 52 +++++++++++++++----------------------
arch/sh/kernel/irq.c | 60 ++++++++++++++++++-------------------------
arch/sparc/kernel/irq_64.c | 31 +++++++---------------
arch/x86/kernel/irq_32.c | 34 +++++++++----------------
arch/x86/kernel/irq_64.c | 18 ++-----------
include/linux/interrupt.h | 11 ++++++++
kernel/softirq.c | 7 +---
10 files changed, 110 insertions(+), 193 deletions(-)

diff --git a/arch/metag/kernel/irq.c b/arch/metag/kernel/irq.c
index 2a2c9d5..12a12b4 100644
--- a/arch/metag/kernel/irq.c
+++ b/arch/metag/kernel/irq.c
@@ -159,44 +159,34 @@ void irq_ctx_exit(int cpu)

extern asmlinkage void __do_softirq(void);

-asmlinkage void do_softirq(void)
+void do_softirq_own_stack(void)
{
- unsigned long flags;
struct thread_info *curctx;
union irq_ctx *irqctx;
u32 *isp;

- if (in_interrupt())
- return;
-
- local_irq_save(flags);
-
- if (local_softirq_pending()) {
- curctx = current_thread_info();
- irqctx = softirq_ctx[smp_processor_id()];
- irqctx->tinfo.task = curctx->task;
-
- /* build the stack frame on the softirq stack */
- isp = (u32 *) ((char *)irqctx + sizeof(struct thread_info));
-
- asm volatile (
- "MOV D0.5,%0\n"
- "SWAP A0StP,D0.5\n"
- "CALLR D1RtP,___do_softirq\n"
- "MOV A0StP,D0.5\n"
- :
- : "r" (isp)
- : "memory", "cc", "D1Ar1", "D0Ar2", "D1Ar3", "D0Ar4",
- "D1Ar5", "D0Ar6", "D0Re0", "D1Re0", "D0.4", "D1RtP",
- "D0.5"
- );
- /*
- * Shouldn't happen, we returned above if in_interrupt():
- */
- WARN_ON_ONCE(softirq_count());
- }
-
- local_irq_restore(flags);
+ curctx = current_thread_info();
+ irqctx = softirq_ctx[smp_processor_id()];
+ irqctx->tinfo.task = curctx->task;
+
+ /* build the stack frame on the softirq stack */
+ isp = (u32 *) ((char *)irqctx + sizeof(struct thread_info));
+
+ asm volatile (
+ "MOV D0.5,%0\n"
+ "SWAP A0StP,D0.5\n"
+ "CALLR D1RtP,___do_softirq\n"
+ "MOV A0StP,D0.5\n"
+ :
+ : "r" (isp)
+ : "memory", "cc", "D1Ar1", "D0Ar2", "D1Ar3", "D0Ar4",
+ "D1Ar5", "D0Ar6", "D0Re0", "D1Re0", "D0.4", "D1RtP",
+ "D0.5"
+ );
+ /*
+ * Shouldn't happen, we returned above if in_interrupt():
+ */
+ WARN_ON_ONCE(softirq_count());
}
#endif

diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
index 2e6443b..ef59276 100644
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -499,22 +499,9 @@ static void execute_on_irq_stack(void *func, unsigned long param1)
*irq_stack_in_use = 1;
}

-asmlinkage void do_softirq(void)
+void do_softirq_own_stack(void)
{
- __u32 pending;
- unsigned long flags;
-
- if (in_interrupt())
- return;
-
- local_irq_save(flags);
-
- pending = local_softirq_pending();
-
- if (pending)
- execute_on_irq_stack(__do_softirq, 0);
-
- local_irq_restore(flags);
+ execute_on_irq_stack(__do_softirq, 0);
}
#endif /* CONFIG_IRQSTACKS */

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index c69440c..7d0da88 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -601,7 +601,7 @@ void irq_ctx_init(void)
}
}

-static inline void do_softirq_onstack(void)
+void do_softirq_own_stack(void)
{
struct thread_info *curtp, *irqtp;
unsigned long saved_sp_limit = current->thread.ksp_limit;
@@ -623,21 +623,6 @@ static inline void do_softirq_onstack(void)
set_bits(irqtp->flags, &curtp->flags);
}

-void do_softirq(void)
-{
- unsigned long flags;
-
- if (in_interrupt())
- return;
-
- local_irq_save(flags);
-
- if (local_softirq_pending())
- do_softirq_onstack();
-
- local_irq_restore(flags);
-}
-
irq_hw_number_t virq_to_hw(unsigned int virq)
{
struct irq_data *irq_data = irq_get_irq_data(virq);
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 54b0995..c289daa 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -124,39 +124,29 @@ skip_arch_irqs:
/*
* Switch to the asynchronous interrupt stack for softirq execution.
*/
-asmlinkage void do_softirq(void)
+void do_softirq_own_stack(void)
{
- unsigned long flags, old, new;
-
- if (in_interrupt())
- return;
-
- local_irq_save(flags);
-
- if (local_softirq_pending()) {
- /* Get current stack pointer. */
- asm volatile("la %0,0(15)" : "=a" (old));
- /* Check against async. stack address range. */
- new = S390_lowcore.async_stack;
- if (((new - old) >> (PAGE_SHIFT + THREAD_ORDER)) != 0) {
- /* Need to switch to the async. stack. */
- new -= STACK_FRAME_OVERHEAD;
- ((struct stack_frame *) new)->back_chain = old;
-
- asm volatile(" la 15,0(%0)\n"
- " basr 14,%2\n"
- " la 15,0(%1)\n"
- : : "a" (new), "a" (old),
- "a" (__do_softirq)
- : "0", "1", "2", "3", "4", "5", "14",
- "cc", "memory" );
- } else {
- /* We are already on the async stack. */
- __do_softirq();
- }
+ unsigned long old, new;
+
+ /* Get current stack pointer. */
+ asm volatile("la %0,0(15)" : "=a" (old));
+ /* Check against async. stack address range. */
+ new = S390_lowcore.async_stack;
+ if (((new - old) >> (PAGE_SHIFT + THREAD_ORDER)) != 0) {
+ /* Need to switch to the async. stack. */
+ new -= STACK_FRAME_OVERHEAD;
+ ((struct stack_frame *) new)->back_chain = old;
+ asm volatile(" la 15,0(%0)\n"
+ " basr 14,%2\n"
+ " la 15,0(%1)\n"
+ : : "a" (new), "a" (old),
+ "a" (__do_softirq)
+ : "0", "1", "2", "3", "4", "5", "14",
+ "cc", "memory" );
+ } else {
+ /* We are already on the async stack. */
+ __do_softirq();
}
-
- local_irq_restore(flags);
}

#ifdef CONFIG_PROC_FS
diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c
index 063af10..50ecd48 100644
--- a/arch/sh/kernel/irq.c
+++ b/arch/sh/kernel/irq.c
@@ -149,47 +149,37 @@ void irq_ctx_exit(int cpu)
hardirq_ctx[cpu] = NULL;
}

-asmlinkage void do_softirq(void)
+void do_softirq_own_stack(void)
{
- unsigned long flags;
struct thread_info *curctx;
union irq_ctx *irqctx;
u32 *isp;

- if (in_interrupt())
- return;
-
- local_irq_save(flags);
+ curctx = current_thread_info();
+ irqctx = softirq_ctx[smp_processor_id()];
+ irqctx->tinfo.task = curctx->task;
+ irqctx->tinfo.previous_sp = current_stack_pointer;
+
+ /* build the stack frame on the softirq stack */
+ isp = (u32 *)((char *)irqctx + sizeof(*irqctx));
+
+ __asm__ __volatile__ (
+ "mov r15, r9 \n"
+ "jsr @%0 \n"
+ /* switch to the softirq stack */
+ " mov %1, r15 \n"
+ /* restore the thread stack */
+ "mov r9, r15 \n"
+ : /* no outputs */
+ : "r" (__do_softirq), "r" (isp)
+ : "memory", "r0", "r1", "r2", "r3", "r4",
+ "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr"
+ );

- if (local_softirq_pending()) {
- curctx = current_thread_info();
- irqctx = softirq_ctx[smp_processor_id()];
- irqctx->tinfo.task = curctx->task;
- irqctx->tinfo.previous_sp = current_stack_pointer;
-
- /* build the stack frame on the softirq stack */
- isp = (u32 *)((char *)irqctx + sizeof(*irqctx));
-
- __asm__ __volatile__ (
- "mov r15, r9 \n"
- "jsr @%0 \n"
- /* switch to the softirq stack */
- " mov %1, r15 \n"
- /* restore the thread stack */
- "mov r9, r15 \n"
- : /* no outputs */
- : "r" (__do_softirq), "r" (isp)
- : "memory", "r0", "r1", "r2", "r3", "r4",
- "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr"
- );
-
- /*
- * Shouldn't happen, we returned above if in_interrupt():
- */
- WARN_ON_ONCE(softirq_count());
- }
-
- local_irq_restore(flags);
+ /*
+ * Shouldn't happen, we returned above if in_interrupt():
+ */
+ WARN_ON_ONCE(softirq_count());
}
#else
static inline void handle_one_irq(unsigned int irq)
diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index d4840ce..666193f 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -698,30 +698,19 @@ void __irq_entry handler_irq(int pil, struct pt_regs *regs)
set_irq_regs(old_regs);
}

-void do_softirq(void)
+void do_softirq_own_stack(void)
{
- unsigned long flags;
-
- if (in_interrupt())
- return;
-
- local_irq_save(flags);
+ void *orig_sp, *sp = softirq_stack[smp_processor_id()];

- if (local_softirq_pending()) {
- void *orig_sp, *sp = softirq_stack[smp_processor_id()];
-
- sp += THREAD_SIZE - 192 - STACK_BIAS;
-
- __asm__ __volatile__("mov %%sp, %0\n\t"
- "mov %1, %%sp"
- : "=&r" (orig_sp)
- : "r" (sp));
- __do_softirq();
- __asm__ __volatile__("mov %0, %%sp"
- : : "r" (orig_sp));
- }
+ sp += THREAD_SIZE - 192 - STACK_BIAS;

- local_irq_restore(flags);
+ __asm__ __volatile__("mov %%sp, %0\n\t"
+ "mov %1, %%sp"
+ : "=&r" (orig_sp)
+ : "r" (sp));
+ __do_softirq();
+ __asm__ __volatile__("mov %0, %%sp"
+ : : "r" (orig_sp));
}

#ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index 4186755..1036f03 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -149,35 +149,25 @@ void irq_ctx_init(int cpu)
cpu, per_cpu(hardirq_ctx, cpu), per_cpu(softirq_ctx, cpu));
}

-asmlinkage void do_softirq(void)
+void do_softirq_own_stack(void)
{
- unsigned long flags;
struct thread_info *curctx;
union irq_ctx *irqctx;
u32 *isp;

- if (in_interrupt())
- return;
-
- local_irq_save(flags);
-
- if (local_softirq_pending()) {
- curctx = current_thread_info();
- irqctx = __this_cpu_read(softirq_ctx);
- irqctx->tinfo.task = curctx->task;
- irqctx->tinfo.previous_esp = current_stack_pointer;
-
- /* build the stack frame on the softirq stack */
- isp = (u32 *) ((char *)irqctx + sizeof(*irqctx));
+ curctx = current_thread_info();
+ irqctx = __this_cpu_read(softirq_ctx);
+ irqctx->tinfo.task = curctx->task;
+ irqctx->tinfo.previous_esp = current_stack_pointer;

- call_on_stack(__do_softirq, isp);
- /*
- * Shouldn't happen, we returned above if in_interrupt():
- */
- WARN_ON_ONCE(softirq_count());
- }
+ /* build the stack frame on the softirq stack */
+ isp = (u32 *) ((char *)irqctx + sizeof(*irqctx));

- local_irq_restore(flags);
+ call_on_stack(__do_softirq, isp);
+ /*
+ * Shouldn't happen, we returned above if in_interrupt():
+ */
+ WARN_ON_ONCE(softirq_count());
}

bool handle_irq(unsigned irq, struct pt_regs *regs)
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index d04d3ec..8dd8c96 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -91,20 +91,8 @@ bool handle_irq(unsigned irq, struct pt_regs *regs)

extern void call_softirq(void);

-asmlinkage void do_softirq(void)
+void do_softirq_own_stack(void)
{
- __u32 pending;
- unsigned long flags;
-
- if (in_interrupt())
- return;
-
- local_irq_save(flags);
- pending = local_softirq_pending();
- /* Switch to interrupt stack */
- if (pending) {
- call_softirq();
- WARN_ON_ONCE(softirq_count());
- }
- local_irq_restore(flags);
+ call_softirq();
+ WARN_ON_ONCE(softirq_count());
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5fa5afe..6554954 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -19,6 +19,7 @@

#include <linux/atomic.h>
#include <asm/ptrace.h>
+#include <asm/irq.h>

/*
* These correspond to the IORESOURCE_IRQ_* defines in
@@ -443,6 +444,16 @@ struct softirq_action

asmlinkage void do_softirq(void);
asmlinkage void __do_softirq(void);
+
+#ifdef __ARCH_HAS_DO_SOFTIRQ
+void do_softirq_own_stack(void);
+#else
+static inline void do_softirq_own_stack(void)
+{
+ __do_softirq();
+}
+#endif
+
extern void open_softirq(int nr, void (*action)(struct softirq_action *));
extern void softirq_init(void);
extern void __raise_softirq_irqoff(unsigned int nr);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index be3d351..39d27ff 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -29,7 +29,6 @@
#define CREATE_TRACE_POINTS
#include <trace/events/irq.h>

-#include <asm/irq.h>
/*
- No shared variables, all the data are CPU local.
- If a softirq needs serialization, let it serialize itself
@@ -283,7 +282,7 @@ restart:
tsk_restore_flags(current, old_flags, PF_MEMALLOC);
}

-#ifndef __ARCH_HAS_DO_SOFTIRQ
+

asmlinkage void do_softirq(void)
{
@@ -298,13 +297,11 @@ asmlinkage void do_softirq(void)
pending = local_softirq_pending();

if (pending)
- __do_softirq();
+ do_softirq_own_stack();

local_irq_restore(flags);
}

-#endif
-
/*
* Enter an interrupt context.
*/
--
1.7.5.4

2013-09-05 15:33:41

by Frederic Weisbecker

Subject: [PATCH 3/3] irq: Comment on the use of inline stack for ksoftirq

Ksoftirqd shouldn't need the softirq stack since it executes in a
kernel thread whose call stack is only just beginning at this stage.

Let's add a comment about that for clarity.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Andrew Morton <[email protected]>
---
kernel/softirq.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 657e047..1de0322 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -759,6 +759,7 @@ static void run_ksoftirqd(unsigned int cpu)
{
local_irq_disable();
if (local_softirq_pending()) {
+ /* No need to use softirq stack here */
__do_softirq();
rcu_note_context_switch(cpu);
local_irq_enable();
--
1.7.5.4

2013-09-05 15:34:03

by Frederic Weisbecker

Subject: [PATCH 2/3] irq: Execute softirq on its own stack on irq exit

When a softirq executes in irq_exit(), it can contribute to a random,
complicated and large stack scenario involving task calls, hw interrupt
calls, softirq handler calls, and then other irqs interrupting the
softirq, which can dig further with an irq handler.

Softirqs executing on the inline hw interrupt stack can favour stack
overflows in such circumstances, as has been reported on powerpc, where
task -> irq -> softirq -> irq can end up forming a huge call trace on
the single kernel stack.

So if there are softirqs pending on hardirq exit, let's execute them
on the softirq stack to minimize this.
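
Combined with the do_softirq_own_stack() helper introduced in the
previous patch, the irq exit path then becomes (condensed from the hunk
below; the fallback behaviour comes from include/linux/interrupt.h):

static inline void invoke_softirq(void)
{
        if (!force_irqthreads)
                /*
                 * Switches to the per-arch softirq stack when
                 * __ARCH_HAS_DO_SOFTIRQ is defined, otherwise falls
                 * back to an inline __do_softirq() as before.
                 */
                do_softirq_own_stack();
        else
                wakeup_softirqd();
}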

Reported-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Andrew Morton <[email protected]>
---
kernel/softirq.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 39d27ff..657e047 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -326,7 +326,7 @@ void irq_enter(void)
static inline void invoke_softirq(void)
{
if (!force_irqthreads)
- __do_softirq();
+ do_softirq_own_stack();
else
wakeup_softirqd();
}
--
1.7.5.4

2013-09-05 22:19:56

by Benjamin Herrenschmidt

Subject: Re: [RFC PATCH 0/3] irq: Fix stack overflow due to softirq called on current stack

On Thu, 2013-09-05 at 17:33 +0200, Frederic Weisbecker wrote:
> Hi,
>
> This series is a proposition to fix the crash reported here: http://lkml.kernel.org/r/1378330796.4321.50.camel%40pasglop
> And it has the upside to also consolidate a bit the arch do_softirq overriden implementation.
>
> Only tested in x86-64 for now.

Thanks. I'm off for a few days, Paul, can you give that a spin? My
understanding is that the crash was fairly reproducible... which makes
me wonder what the chances are of getting RH to pick that series up for
RHEL7...

Cheers,
Ben.


2013-09-05 22:57:01

by Frederic Weisbecker

Subject: Re: [PATCH 3/3] irq: Comment on the use of inline stack for ksoftirq

On Thu, Sep 05, 2013 at 05:33:24PM +0200, Frederic Weisbecker wrote:
> Ksoftirqd shouldn't need softirq stack since it's executing
> in a kernel thread with a callstack that is only beginning at
> this stage.
>
> Lets comment about that for clarity.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: James Hogan <[email protected]>
> Cc: James E.J. Bottomley <[email protected]>
> Cc: Helge Deller <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> Cc: David S. Miller <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> kernel/softirq.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 657e047..1de0322 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -759,6 +759,7 @@ static void run_ksoftirqd(unsigned int cpu)
> {
> local_irq_disable();
> if (local_softirq_pending()) {
> + /* No need to use softirq stack here */

Looking at that comment now, it probably deserves more details :)

> __do_softirq();
> rcu_note_context_switch(cpu);
> local_irq_enable();
> --
> 1.7.5.4
>

2013-09-18 06:51:13

by Paul Mackerras

Subject: Re: [RFC PATCH 0/3] irq: Fix stack overflow due to softirq called on current stack

Frederic,

On Thu, Sep 05, 2013 at 05:33:21PM +0200, Frederic Weisbecker wrote:
>
> This series is a proposition to fix the crash reported here: http://lkml.kernel.org/r/1378330796.4321.50.camel%40pasglop
> And it has the upside to also consolidate a bit the arch do_softirq overriden implementation.

I tried this series on a ppc64 machine running KVM guests and doing
network traffic, while under memory pressure, and it seems to work
fine. Although the original problem was never completely
deterministically reproducible, the problem did originally show up
with the combination of memory pressure, network traffic and KVM
guests. I pushed the test machine moderately hard for a while and
there was no sign of stack overflow. So:

Tested-by: Paul Mackerras <[email protected]>

Are you going to push this upstream?

Thanks,
Paul.

2013-09-19 15:38:44

by Frederic Weisbecker

Subject: Re: [RFC PATCH 0/3] irq: Fix stack overflow due to softirq called on current stack

2013/9/18 Paul Mackerras <[email protected]>:
> Frederic,
>
> On Thu, Sep 05, 2013 at 05:33:21PM +0200, Frederic Weisbecker wrote:
>>
>> This series is a proposition to fix the crash reported here: http://lkml.kernel.org/r/1378330796.4321.50.camel%40pasglop
>> And it has the upside to also consolidate a bit the arch do_softirq overriden implementation.
>
> I tried this series on a ppc64 machine running KVM guests and doing
> network traffic, while under memory pressure, and it seems to work
> fine. Although the original problem was never completely
> deterministically reproducible, the problem did originally show up
> with the combination of memory pressure, network traffic and KVM
> guests. I pushed the test machine moderately hard for a while and
> there was no sign of stack overflow. So:
>
> Tested-by: Paul Mackerras <[email protected]>

Thanks!

>
> Are you going to push this upstream?

Yeah I'll prepare a pull request and let Thomas decide about the fate
of these patches.

Thanks!