2020-12-15 19:08:44

by Heiko Carstens

[permalink] [raw]
Subject: __local_bh_enable_ip() vs lockdep

Hello,

the ftrace stack tracer kernel selftest is able to trigger the warning
below from time to time. This looks like there is an ordering problem
in __local_bh_enable_ip():
first there is a call to lockdep_softirqs_on() and afterwards
preempt_count_sub() is ftraced before it was able to modify
preempt_count:

[ 1016.245418] ------------[ cut here ]------------
[ 1016.245428] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[ 1016.245441] WARNING: CPU: 8 PID: 8300 at kernel/locking/lockdep.c:5298 check_flags.part.0+0x196/0x208
[ 1016.245580] CPU: 8 PID: 8300 Comm: sshd Not tainted 5.11.0-20201215.rc0.git0.d33ce49dca6c.300.fc33.s390x+debug #1
...
[ 1016.245691] Call Trace:
[ 1016.245698] [<000000004c1537fa>] check_flags.part.0+0x19a/0x208
[ 1016.245705] ([<000000004c1537f6>] check_flags.part.0+0x196/0x208)
[ 1016.245711] [<000000004cced786>] lock_is_held_type+0x8e/0x1b8
[ 1016.245716] [<000000004c172924>] rcu_read_lock_sched_held+0x64/0xb8
[ 1016.245724] [<000000004c1b151c>] module_assert_mutex_or_preempt+0x34/0x68
[ 1016.245730] [<000000004c1b2e04>] __module_address.part.0+0x2c/0x118
[ 1016.245735] [<000000004c1b9dca>] __module_text_address+0x3a/0x90
[ 1016.245741] [<000000004c1b9ed4>] is_module_text_address+0x34/0x78
[ 1016.245748] [<000000004c0f9a1a>] kernel_text_address+0x5a/0x130
[ 1016.245752] [<000000004c0f9b16>] __kernel_text_address+0x26/0x70
[ 1016.245757] [<000000004c094038>] unwind_get_return_address+0x40/0x68
[ 1016.245763] [<000000004c099dac>] arch_stack_walk+0xac/0xd0
[ 1016.245768] [<000000004c18be10>] stack_trace_save+0x50/0x68
[ 1016.245774] [<000000004c22d80c>] check_stack+0xc4/0x348
[ 1016.245780] [<000000004c22db46>] stack_trace_call+0xb6/0xd0
[ 1016.245785] [<000000004cd00082>] ftrace_caller+0x7a/0x7e
[ 1016.245791] [<000000004c1081d6>] preempt_count_sub+0x6/0x138 <---
[ 1016.245795] [<000000004c0d3d46>] __local_bh_enable_ip+0x13e/0x190 <---
[ 1016.245811] [<000003ff8023c34c>] nft_update_chain_stats+0xdc/0x168 [nf_tables]
[ 1016.245820] [<000003ff8023c916>] nft_do_chain+0x53e/0x550 [nf_tables]
[ 1016.245827] [<000003ff80251974>] nft_do_chain_ipv4+0x6c/0x78 [nf_tables]
[ 1016.245833] [<000000004cb0ab00>] nf_hook_slow+0x58/0xf8
[ 1016.245839] [<000000004cb1dc24>] nf_hook.constprop.0+0xfc/0x1d0
[ 1016.245844] [<000000004cb207b2>] __ip_local_out+0x92/0xe8
[ 1016.245848] [<000000004cb20d00>] __ip_queue_xmit+0x1d8/0x640
[ 1016.245854] [<000000004cb4578c>] __tcp_transmit_skb+0x3dc/0x770
[ 1016.245858] [<000000004cb46e86>] tcp_write_xmit+0x38e/0x758
[ 1016.245863] [<000000004cb47298>] __tcp_push_pending_frames+0x48/0x118
[ 1016.245868] [<000000004cb2f604>] tcp_sendmsg_locked+0x95c/0xb78
[ 1016.245872] [<000000004cb2f864>] tcp_sendmsg+0x44/0x68
[ 1016.245878] [<000000004ca30c3c>] sock_sendmsg+0x64/0x78
[ 1016.245882] [<000000004ca30cc2>] sock_write_iter+0x72/0x98
[ 1016.245887] [<000000004c3dcfda>] new_sync_write+0x10a/0x198
[ 1016.245891] [<000000004c3dd6a6>] vfs_write.part.0+0x196/0x290
[ 1016.245896] [<000000004c3e0220>] ksys_write+0xb8/0xf8
[ 1016.245900] [<000000004ccfd326>] system_call+0xe2/0x29c
[ 1016.245904] INFO: lockdep is turned off.


2020-12-15 19:51:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: __local_bh_enable_ip() vs lockdep

On Tue, 15 Dec 2020 20:01:52 +0100
Heiko Carstens <[email protected]> wrote:

> Hello,
>
> the ftrace stack tracer kernel selftest is able to trigger the warning
> below from time to time. This looks like there is an ordering problem
> in __local_bh_enable_ip():
> first there is a call to lockdep_softirqs_on() and afterwards
> preempt_count_sub() is ftraced before it was able to modify
> preempt_count:

Don't run ftrace stack tracer when debugging lockdep. ;-)

/me runs!

-- Steve

>
> [ 1016.245418] ------------[ cut here ]------------
> [ 1016.245428] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
> [ 1016.245441] WARNING: CPU: 8 PID: 8300 at kernel/locking/lockdep.c:5298 check_flags.part.0+0x196/0x208
> [ 1016.245580] CPU: 8 PID: 8300 Comm: sshd Not tainted 5.11.0-20201215.rc0.git0.d33ce49dca6c.300.fc33.s390x+debug #1
> ...
> [ 1016.245691] Call Trace:
> [ 1016.245698] [<000000004c1537fa>] check_flags.part.0+0x19a/0x208
> [ 1016.245705] ([<000000004c1537f6>] check_flags.part.0+0x196/0x208)
> [ 1016.245711] [<000000004cced786>] lock_is_held_type+0x8e/0x1b8
> [ 1016.245716] [<000000004c172924>] rcu_read_lock_sched_held+0x64/0xb8
> [ 1016.245724] [<000000004c1b151c>] module_assert_mutex_or_preempt+0x34/0x68
> [ 1016.245730] [<000000004c1b2e04>] __module_address.part.0+0x2c/0x118
> [ 1016.245735] [<000000004c1b9dca>] __module_text_address+0x3a/0x90
> [ 1016.245741] [<000000004c1b9ed4>] is_module_text_address+0x34/0x78
> [ 1016.245748] [<000000004c0f9a1a>] kernel_text_address+0x5a/0x130
> [ 1016.245752] [<000000004c0f9b16>] __kernel_text_address+0x26/0x70
> [ 1016.245757] [<000000004c094038>] unwind_get_return_address+0x40/0x68
> [ 1016.245763] [<000000004c099dac>] arch_stack_walk+0xac/0xd0
> [ 1016.245768] [<000000004c18be10>] stack_trace_save+0x50/0x68
> [ 1016.245774] [<000000004c22d80c>] check_stack+0xc4/0x348
> [ 1016.245780] [<000000004c22db46>] stack_trace_call+0xb6/0xd0
> [ 1016.245785] [<000000004cd00082>] ftrace_caller+0x7a/0x7e
> [ 1016.245791] [<000000004c1081d6>] preempt_count_sub+0x6/0x138 <---
> [ 1016.245795] [<000000004c0d3d46>] __local_bh_enable_ip+0x13e/0x190 <---
> [ 1016.245811] [<000003ff8023c34c>] nft_update_chain_stats+0xdc/0x168 [nf_tables]
> [ 1016.245820] [<000003ff8023c916>] nft_do_chain+0x53e/0x550 [nf_tables]
> [ 1016.245827] [<000003ff80251974>] nft_do_chain_ipv4+0x6c/0x78 [nf_tables]
> [ 1016.245833] [<000000004cb0ab00>] nf_hook_slow+0x58/0xf8
> [ 1016.245839] [<000000004cb1dc24>] nf_hook.constprop.0+0xfc/0x1d0
> [ 1016.245844] [<000000004cb207b2>] __ip_local_out+0x92/0xe8
> [ 1016.245848] [<000000004cb20d00>] __ip_queue_xmit+0x1d8/0x640
> [ 1016.245854] [<000000004cb4578c>] __tcp_transmit_skb+0x3dc/0x770
> [ 1016.245858] [<000000004cb46e86>] tcp_write_xmit+0x38e/0x758
> [ 1016.245863] [<000000004cb47298>] __tcp_push_pending_frames+0x48/0x118
> [ 1016.245868] [<000000004cb2f604>] tcp_sendmsg_locked+0x95c/0xb78
> [ 1016.245872] [<000000004cb2f864>] tcp_sendmsg+0x44/0x68
> [ 1016.245878] [<000000004ca30c3c>] sock_sendmsg+0x64/0x78
> [ 1016.245882] [<000000004ca30cc2>] sock_write_iter+0x72/0x98
> [ 1016.245887] [<000000004c3dcfda>] new_sync_write+0x10a/0x198
> [ 1016.245891] [<000000004c3dd6a6>] vfs_write.part.0+0x196/0x290
> [ 1016.245896] [<000000004c3e0220>] ksys_write+0xb8/0xf8
> [ 1016.245900] [<000000004ccfd326>] system_call+0xe2/0x29c
> [ 1016.245904] INFO: lockdep is turned off.

2020-12-18 15:37:44

by Heiko Carstens

[permalink] [raw]
Subject: Re: __local_bh_enable_ip() vs lockdep

On Wed, Dec 16, 2020 at 06:52:59PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 15, 2020 at 02:47:24PM -0500, Steven Rostedt wrote:
> > On Tue, 15 Dec 2020 20:01:52 +0100
> > Heiko Carstens <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > the ftrace stack tracer kernel selftest is able to trigger the warning
> > > below from time to time. This looks like there is an ordering problem
> > > in __local_bh_enable_ip():
> > > first there is a call to lockdep_softirqs_on() and afterwards
> > > preempt_count_sub() is ftraced before it was able to modify
> > > preempt_count:
> >
> > Don't run ftrace stack tracer when debugging lockdep. ;-)
> >
> > /me runs!
>
> Ha!, seriously though; that seems like something we've encountered
> before, but my google-fu is failing me.
>
> Do you remember what, if anything, was the problem with this?
>
> ---
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index d5bfd5e661fc..9d71046ea247 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -186,7 +186,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
> * Keep preemption disabled until we are done with
> * softirq processing:
> */
> - preempt_count_sub(cnt - 1);
> + __preempt_count_sub(cnt - 1);
>
> if (unlikely(!in_interrupt() && local_softirq_pending())) {
> /*

FWIW,

Tested-by: Heiko Carstens <[email protected]>

Peter, will you make proper patch out of this?

2020-12-18 15:40:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: __local_bh_enable_ip() vs lockdep

On Fri, Dec 18, 2020 at 04:33:56PM +0100, Heiko Carstens wrote:

> Peter, will you make proper patch out of this?

Yes, let me go do that now, and stick it in the quilt series before I go
off and don't think for 2 weeks.

2020-12-18 15:49:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: __local_bh_enable_ip() vs lockdep

On Fri, Dec 18, 2020 at 04:37:16PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 18, 2020 at 04:33:56PM +0100, Heiko Carstens wrote:
>
> > Peter, will you make proper patch out of this?
>
> Yes, let me go do that now, and stick it in the quilt series before I go
> off and don't think for 2 weeks.

As below, I'll try and push it out to tip/locking/urgent before I really
dissapear.

---

Subject: softirq: Avoid bad tracing / lockdep interaction
From: Peter Zijlstra <[email protected]>
Date: Fri Dec 18 16:39:14 CET 2020

Similar to commit:

1a63dcd8765b ("softirq: Reorder trace_softirqs_on to prevent lockdep splat")

__local_bh_enable_ip() can also call into tracing with inconsistent
state. Unlike that commit we don't need to bother about the tracepoint
because 'cnt-1' never matches preempt_count() (by construction).

Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Heiko Carstens <[email protected]>
---
kernel/softirq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -186,7 +186,7 @@ void __local_bh_enable_ip(unsigned long
* Keep preemption disabled until we are done with
* softirq processing:
*/
- preempt_count_sub(cnt - 1);
+ __preempt_count_sub(cnt - 1);

if (unlikely(!in_interrupt() && local_softirq_pending())) {
/*

Subject: [tip: locking/urgent] softirq: Avoid bad tracing / lockdep interaction

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: 91ea62d58bd661827c328a2c6c02a87fa4aae88b
Gitweb: https://git.kernel.org/tip/91ea62d58bd661827c328a2c6c02a87fa4aae88b
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 18 Dec 2020 16:39:14 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 18 Dec 2020 16:53:13 +01:00

softirq: Avoid bad tracing / lockdep interaction

Similar to commit:

1a63dcd8765b ("softirq: Reorder trace_softirqs_on to prevent lockdep splat")

__local_bh_enable_ip() can also call into tracing with inconsistent
state. Unlike that commit we don't need to bother about the tracepoint
because 'cnt-1' never matches preempt_count() (by construction).

Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Heiko Carstens <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/softirq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 09229ad..0f1d3a3 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -185,7 +185,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
* Keep preemption disabled until we are done with
* softirq processing:
*/
- preempt_count_sub(cnt - 1);
+ __preempt_count_sub(cnt - 1);

if (unlikely(!in_interrupt() && local_softirq_pending())) {
/*