2020-09-27 16:14:19

by Chengming Zhou

[permalink] [raw]
Subject: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
deadlock on rq lock in the printk path.

Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Muchun Song <[email protected]>
---
kernel/sched/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d95dc3f4644..81d8bf614225 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4444,6 +4444,7 @@ static void __sched notrace __schedule(bool preempt)
*/
rq_lock(rq, &rf);
smp_mb__after_spinlock();
+ printk_deferred_enter();

/* Promote REQ to ACT */
rq->clock_update_flags <<= 1;
@@ -4530,6 +4531,7 @@ static void __sched notrace __schedule(bool preempt)
rq_unlock_irq(rq, &rf);
}

+ printk_deferred_exit();
balance_callback(rq);
}

--
2.11.0


2020-09-28 07:34:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
> The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
> deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
> deadlock on rq lock in the printk path.

It also shouldn't happen in the first place, so who bloody cares.

> Signed-off-by: Chengming Zhou <[email protected]>
> Signed-off-by: Muchun Song <[email protected]>
> ---
> kernel/sched/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d95dc3f4644..81d8bf614225 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4444,6 +4444,7 @@ static void __sched notrace __schedule(bool preempt)
> */
> rq_lock(rq, &rf);
> smp_mb__after_spinlock();
> + printk_deferred_enter();
>
> /* Promote REQ to ACT */
> rq->clock_update_flags <<= 1;
> @@ -4530,6 +4531,7 @@ static void __sched notrace __schedule(bool preempt)
> rq_unlock_irq(rq, &rf);
> }
>
> + printk_deferred_exit();
> balance_callback(rq);
> }

NAK printk_deferred is an abomination, kill that.

2020-09-28 07:54:30

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon, 2020-09-28 at 09:32 +0200, Peter Zijlstra wrote:
> On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
> > The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
> > deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
> > deadlock on rq lock in the printk path.
[]
> NAK printk_deferred is an abomination, kill that.

Didn't you introduce it?
Should you be complaining to yourself?

---

commit 3ccf3e8306156a28213adc720aba807e9a901ad5
Author: Peter Zijlstra <[email protected]>
Date: Mon Feb 27 10:47:00 2012 +0100

printk/sched: Introduce special printk_sched() for those awkward moments

There's a few awkward printk()s inside of scheduler guts that people
prefer to keep but really are rather deadlock prone. Fudge around it
by storing the text in a per-cpu buffer and poll it using the existing
printk_tick() handler.

This will drop output when its more frequent than once a tick, however
only the affinity thing could possible go that fast and for that just
one should suffice to notify the admin he's done something silly..

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

commit aac74dc495456412c4130a1167ce4beb6c1f0b38
Author: John Stultz <[email protected]>
Date: Wed Jun 4 16:11:40 2014 -0700

printk: rename printk_sched to printk_deferred

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Bohac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


2020-09-28 08:02:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon, Sep 28, 2020 at 12:52:44AM -0700, Joe Perches wrote:
> On Mon, 2020-09-28 at 09:32 +0200, Peter Zijlstra wrote:
> > On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
> > > The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
> > > deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
> > > deadlock on rq lock in the printk path.
> []
> > NAK printk_deferred is an abomination, kill that.
>
> Didn't you introduce it?
> Should you be complaining to yourself?

Yeah. I should've hacked around it then I suppose. Still, no reason to
proliferate that crap. Afaik the new printk should be able to deal with
this at some point.

The thing we're not going to do it add a cache-miss to schedule just
because.

2020-09-28 08:10:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon, Sep 28, 2020 at 10:00:22AM +0200, Peter Zijlstra wrote:
> On Mon, Sep 28, 2020 at 12:52:44AM -0700, Joe Perches wrote:
> > On Mon, 2020-09-28 at 09:32 +0200, Peter Zijlstra wrote:
> > > On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
> > > > The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
> > > > deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
> > > > deadlock on rq lock in the printk path.
> > []
> > > NAK printk_deferred is an abomination, kill that.
> >
> > Didn't you introduce it?
> > Should you be complaining to yourself?
>
> Yeah. I should've hacked around it then I suppose. Still, no reason to
> proliferate that crap. Afaik the new printk should be able to deal with
> this at some point.
>
> The thing we're not going to do it add a cache-miss to schedule just
> because.

Also, using deferred for WARN/BUG is fundamentally wrong, you want that
splat _NOW_, there might not be a later.

2020-09-28 08:56:28

by Chengming Zhou

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()


在 2020/9/28 下午3:32, Peter Zijlstra 写道:
> On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
>> The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
>> deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
>> deadlock on rq lock in the printk path.
> It also shouldn't happen in the first place, so who bloody cares.

Yes, but if our box deadlock just because a WARN_ON_ONCE, we have to
reboot : (

So these WARN_ON_ONCE have BUG_ON effect ?  Or we should change to use
BUG_ON ?

Thanks.

>> Signed-off-by: Chengming Zhou <[email protected]>
>> Signed-off-by: Muchun Song <[email protected]>
>> ---
>> kernel/sched/core.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 2d95dc3f4644..81d8bf614225 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4444,6 +4444,7 @@ static void __sched notrace __schedule(bool preempt)
>> */
>> rq_lock(rq, &rf);
>> smp_mb__after_spinlock();
>> + printk_deferred_enter();
>>
>> /* Promote REQ to ACT */
>> rq->clock_update_flags <<= 1;
>> @@ -4530,6 +4531,7 @@ static void __sched notrace __schedule(bool preempt)
>> rq_unlock_irq(rq, &rf);
>> }
>>
>> + printk_deferred_exit();
>> balance_callback(rq);
>> }
> NAK printk_deferred is an abomination, kill that.

2020-09-28 09:06:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon, Sep 28, 2020 at 04:54:53PM +0800, Chengming Zhou wrote:
>
> 在 2020/9/28 下午3:32, Peter Zijlstra 写道:
> > On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
> >> The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
> >> deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
> >> deadlock on rq lock in the printk path.
> > It also shouldn't happen in the first place, so who bloody cares.
>
> Yes, but if our box deadlock just because a WARN_ON_ONCE, we have to
> reboot : (

You have to reboot anyway to get into the fixed kernel.

> So these WARN_ON_ONCE have BUG_ON effect ?  Or we should change to use
> BUG_ON ?

Or use a sane printk implementation, I never suffer this.

2020-09-28 10:06:38

by Chengming Zhou

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()


在 2020/9/28 下午5:01, Peter Zijlstra 写道:
> On Mon, Sep 28, 2020 at 04:54:53PM +0800, Chengming Zhou wrote:
>> 在 2020/9/28 下午3:32, Peter Zijlstra 写道:
>>> On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
>>>> The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
>>>> deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
>>>> deadlock on rq lock in the printk path.
>>> It also shouldn't happen in the first place, so who bloody cares.
>> Yes, but if our box deadlock just because a WARN_ON_ONCE, we have to
>> reboot : (
> You have to reboot anyway to get into the fixed kernel.

Mostly, WARN_ON_ONCE happened in the perf code on our machines, we actually

don't care too much about the perf function works : )   Looks like we
have to find and

fix that perf bug before go on...

>> So these WARN_ON_ONCE have BUG_ON effect ?  Or we should change to use
>> BUG_ON ?
> Or use a sane printk implementation, I never suffer this.

Well, you are lucky. So it's a problem in our printk implementation.

The deadlock path is:

printk

  vprintk_emit

    console_unlock

      vt_console_print

        hide_cursor

          bit_cursor

            soft_cursor

              queue_work_on

                __queue_work

                  try_to_wake_up

                    _raw_spin_lock

                      native_queued_spin_lock_slowpath

Looks like it's introduced by this commit:

eaa434defaca1781fb2932c685289b610aeb8b4b

"drm/fb-helper: Add fb_deferred_io support"

    This adds deferred io support to drm_fb_helper.
    The fbdev framebuffer changes are flushed using the callback
    (struct drm_framebuffer *)->funcs->dirty() by a dedicated worker
    ensuring that it always runs in process context.
   
    For those wondering why we need to be able to handle atomic calling
    contexts: Both panic paths and cursor code and fbcon blanking can run
    from atomic. See
   
    commit bcb39af4486be07e896fc374a2336bad3104ae0a
    Author: Dave Airlie <[email protected]>
    Date:   Thu Feb 7 11:19:15 2013 +1000
   
        drm/udl: make usage as a console safer
   
    for where this was originally discovered.

2020-09-28 10:28:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote:

> Well, you are lucky. So it's a problem in our printk implementation.

Not lucky; I just kicked it in the groin really hard:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental

> The deadlock path is:
>
> printk
> ? vprintk_emit
> ??? console_unlock
> ? ? ? vt_console_print
> ? ? ? ? hide_cursor
> ? ? ? ? ? bit_cursor
> ? ? ? ? ? ? soft_cursor
> ? ? ? ? ? ? ? queue_work_on
> ? ? ? ? ? ? ? ? __queue_work
> ? ? ? ? ? ? ? ? ? try_to_wake_up
> ? ? ? ? ? ? ? ? ? ? _raw_spin_lock
> ? ? ? ? ? ? ? ? ? ? ? native_queued_spin_lock_slowpath
>
> Looks like it's introduced by this commit:
>
> eaa434defaca1781fb2932c685289b610aeb8b4b
>
> "drm/fb-helper: Add fb_deferred_io support"

Oh gawd, yeah, all the !serial consoles are utter batshit.

Please look at John's last printk rewrite, IIRC it farms all that off to
a kernel thread instead of doing it from the printk() caller's context.

I'm not sure where he hides his latests patches, but I'm sure he'll be
more than happy to tell you.

2020-09-28 23:53:14

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On (20/09/28 12:25), Peter Zijlstra wrote:
[..]
> > printk
> > ? vprintk_emit
> > ??? console_unlock
> > ? ? ? vt_console_print
> > ? ? ? ? hide_cursor
> > ? ? ? ? ? bit_cursor
> > ? ? ? ? ? ? soft_cursor
> > ? ? ? ? ? ? ? queue_work_on
> > ? ? ? ? ? ? ? ? __queue_work
> > ? ? ? ? ? ? ? ? ? try_to_wake_up
> > ? ? ? ? ? ? ? ? ? ? _raw_spin_lock
> > ? ? ? ? ? ? ? ? ? ? ? native_queued_spin_lock_slowpath
> >
> > Looks like it's introduced by this commit:
> >
> > eaa434defaca1781fb2932c685289b610aeb8b4b
> >
> > "drm/fb-helper: Add fb_deferred_io support"
>
> Oh gawd, yeah, all the !serial consoles are utter batshit.
>
> Please look at John's last printk rewrite, IIRC it farms all that off to
> a kernel thread instead of doing it from the printk() caller's context.

Not yet. Scheduler is still part of the printk() - either in the
form of !serial consoles or console_sem, or both.

-ss

2020-09-29 14:31:48

by Petr Mladek

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Mon 2020-09-28 12:25:59, Peter Zijlstra wrote:
> On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote:
>
> > Well, you are lucky. So it's a problem in our printk implementation.
>
> Not lucky; I just kicked it in the groin really hard:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental
>
> > The deadlock path is:
> >
> > printk
> > ? vprintk_emit
> > ??? console_unlock
> > ? ? ? vt_console_print
> > ? ? ? ? hide_cursor
> > ? ? ? ? ? bit_cursor
> > ? ? ? ? ? ? soft_cursor
> > ? ? ? ? ? ? ? queue_work_on
> > ? ? ? ? ? ? ? ? __queue_work
> > ? ? ? ? ? ? ? ? ? try_to_wake_up
> > ? ? ? ? ? ? ? ? ? ? _raw_spin_lock
> > ? ? ? ? ? ? ? ? ? ? ? native_queued_spin_lock_slowpath
> >
> > Looks like it's introduced by this commit:
> >
> > eaa434defaca1781fb2932c685289b610aeb8b4b
> >
> > "drm/fb-helper: Add fb_deferred_io support"
>
> Oh gawd, yeah, all the !serial consoles are utter batshit.
>
> Please look at John's last printk rewrite, IIRC it farms all that off to
> a kernel thread instead of doing it from the printk() caller's context.
>
> I'm not sure where he hides his latests patches, but I'm sure he'll be
> more than happy to tell you.

AFAIK, John is just working on updating the patchset so that it will
be based on the lockless ringbuffer that is finally in the queue
for-5.10.

Upstreaming the console handling will be the next big step. I am sure
that there will be long discussion about it. But there might be
few things that would help removing printk_deferred().

1. Messages will be printed on consoles by dedicated kthreads. It will
be safe context. No deadlocks.

2. The registration and unregistration of consoles should not longer
be handled by console_lock (semaphore). It should be possible to
call most consoles without a sleeping lock. It should remove all
these deadlocks between printk() and scheduler().

There might be problems with some consoles. For example, tty would
most likely still need a sleeping lock because it is using the console
semaphore also internally.


3. We will try harder to get the messages out immediately during
panic().


It would take some time until the above reaches upstream. But it seems
to be the right way to go.


About printk_deferred():

It is a whack a mole game. It is easy to miss printk() that might
eventually cause the deadlock.

printk deferred context is more safe. But it is still a what a mole
game. The kthreads will do the same job for sure.

Finally, the deadlock happens "only" when someone is waiting on
console_lock() in parallel. Otherwise, the waitqueue for the semaphore
is empty and scheduler is not called.

It means that there is quite a big change to see the WARN(). It might
be even bigger than with printk_deferred() because WARN() in scheduler
means that the scheduler is big troubles. Nobody guarantees that
the deferred messages will get handled later.

Best Regards,
Petr

2020-09-29 15:13:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On Tue, Sep 29, 2020 at 04:27:51PM +0200, Petr Mladek wrote:

> Upstreaming the console handling will be the next big step. I am sure
> that there will be long discussion about it. But there might be
> few things that would help removing printk_deferred().
>
> 1. Messages will be printed on consoles by dedicated kthreads. It will
> be safe context. No deadlocks.

This, that's what I remember. With sane consoles having a
->write_atomic() callback which is called in-line.

Current -RT has a single kthread that's flushing the consoles.

> 2. The registration and unregistration of consoles should not longer
> be handled by console_lock (semaphore). It should be possible to
> call most consoles without a sleeping lock. It should remove all
> these deadlocks between printk() and scheduler().

I'm confused, who cares about registation? That only happens once at
boot, right?

> There might be problems with some consoles. For example, tty would
> most likely still need a sleeping lock because it is using the console
> semaphore also internally.

But per 1) above, that's done from a kthread, so nobody cares.

> 3. We will try harder to get the messages out immediately during
> panic().

As long as you guarantee to first write everything to consoles with
->write_atomic() before attempting to flush others that should be fine.

> It would take some time until the above reaches upstream. But it seems
> to be the right way to go.

Yep.

> Finally, the deadlock happens "only" when someone is waiting on
> console_lock() in parallel. Otherwise, the waitqueue for the semaphore
> is empty and scheduler is not called.

There's more deadlocks. In fact the one described earlier in this
thread isn't the one you mention.

> It means that there is quite a big change to see the WARN(). It might
> be even bigger than with printk_deferred() because WARN() in scheduler
> means that the scheduler is big troubles. Nobody guarantees that
> the deferred messages will get handled later.

I only care about ->write_atomic() :-) Anything else is a
best-effort-loss anyway.

2020-09-30 00:54:29

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

On (20/09/29 17:09), Peter Zijlstra wrote:
> > 2. The registration and unregistration of consoles should not longer
> > be handled by console_lock (semaphore). It should be possible to
> > call most consoles without a sleeping lock. It should remove all
> > these deadlocks between printk() and scheduler().
>
> I'm confused, who cares about registation? That only happens once at
> boot, right?

You can modprobe or rmmod (register/unregister) netconsole anytime,
for example.

-ss