2014-06-10 16:46:47

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Thu, May 29, 2014 at 10:09:48AM +0200, Jiri Kosina wrote:
> On Thu, 29 May 2014, Frederic Weisbecker wrote:
>
> > > I am rather surprised that this patchset hasn't received a single review
> > > comment for 3 weeks.
> > >
> > > Let me point out that the issues Petr is talking about in the cover letter
> > > are real -- we've actually seen the lockups triggered by RCU stall
> > > detector trying to dump stacks on all CPUs, and hard-locking machine up
> > > while doing so.
> > >
> > > So this really needs to be solved.
> >
> > The lack of review may be partly due to a not very appealing changestat
> > on an old codebase that is already unpopular:
> >
> > Documentation/kernel-parameters.txt | 19 +-
> > kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------
> > 2 files changed, 878 insertions(+), 359 deletions(-)
> >
> >
> > Your patches look clean and pretty nice actually. They must be seriously
> > considered if we want to keep the current locked ring buffer design and
> > extend it to multiple per context buffers. But I wonder if it's worth to
> > continue that way with the printk ancient design.
> >
> > If it takes more than 1000 line changes (including 500 added) to make it
> > finally work correctly with NMIs by working around its fundamental
> > flaws, shouldn't we rather redesign it to use a lockless ring buffer
> > like ftrace or perf ones?
>
> Yeah, printk() has grown over years to a stinking pile of you-know-what,
> no argument to that.
>
> I also agree that performing a massive rewrite, which will make it use a
> lockless buffer, and therefore ultimately solve all its problems
> (scheduler deadlocks, NMI deadlocks, xtime_lock deadlocks) at once, is
> necessary in the long run.
>
> On the other hand, I am completely sure that the diffstat for such rewrite
> is going to be much more scary :)

Indeed, but probably much more valuable in the long term.

>
> This is not adding fancy features to printk(), where we really should be
> saying no; horrible commits like 7ff9554bb5 is exactly something that
> should be pushed against *heavily*. But bugfixes for hard machine lockups
> are a completely different story to me (until we have a whole new printk()
> buffer handling implementation).

Yeah bugfixes are certainly another story. Still it looks like yet another
layer of workaround on a big hack.

But yeah I'm certainly not in a right position to set anyone to do a massive
rewrite on such a boring subsystem :)

There is also a big risk that if we push back this bugfix, nobody will actually do
that desired rewrite.

Lets be crazy and Cc Linus on that.


2014-06-10 16:57:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Tue, Jun 10, 2014 at 9:46 AM, Frederic Weisbecker <[email protected]> wrote:
>
> There is also a big risk that if we push back this bugfix, nobody will actually do
> that desired rewrite.
>
> Lets be crazy and Cc Linus on that.

Quite frankly, I hate seeing something like this:

kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------

for something that is stupid and broken. Printing from NMI context
isn't really supposed to work, and we all *know* it's not supposed to
work.

I'd much rather disallow it, and if there is one or two places that
really want to print a warning and know that they are in NMI context,
have a special workaround just for them, with something that does
*not* try to make printk in general work any better.

Dammit, NMI context is special. I absolutely refuse to buy into the
broken concept that we should make more stuff work in NMI context.
Hell no, we should *not* try to make more crap work in NMI. NMI people
should be careful.

Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
logbuf_lock, and *maybe* the existing sequence of

if (console_trylock_for_printk())
console_unlock();

then works for actually triggering the printout. But the wrapper
should be 15 lines of code for "if possible, try to print things", and
*not* a thousand lines of changes.

Linus

2014-06-10 17:32:58

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Tue, 10 Jun 2014, Linus Torvalds wrote:

> > Lets be crazy and Cc Linus on that.
>
> Quite frankly, I hate seeing something like this:
>
> kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------
>
> for something that is stupid and broken. Printing from NMI context
> isn't really supposed to work, and we all *know* it's not supposed to
> work.

It's OTOH rather useful in a few scenarios -- particularly it's the only
way to dump stacktraces from remote CPUs in order to obtain traces that
actually make sense (in situations like RCU stall); using workqueue-based
dumping is useless there.

> I'd much rather disallow it, and if there is one or two places that
> really want to print a warning and know that they are in NMI context,
> have a special workaround just for them, with something that does
> *not* try to make printk in general work any better.

Well, that'd mean that at least our stack dumping mechanism would need to
know both ways of printing; but yes, it'll still probably be less than 880
lines added.

> Dammit, NMI context is special. I absolutely refuse to buy into the
> broken concept that we should make more stuff work in NMI context.
> Hell no, we should *not* try to make more crap work in NMI. NMI people
> should be careful.

In parallel, I'd for the sake of argument propose to just drop the whole
_CONT printing (and all the things that followed on top) as that made
printk() a complete hell to maintain for a disputable gain IMO.

> Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> logbuf_lock, and *maybe* the existing sequence of
>
> if (console_trylock_for_printk())
> console_unlock();
>
> then works for actually triggering the printout. But the wrapper
> should be 15 lines of code for "if possible, try to print things", and
> *not* a thousand lines of changes.

Well, we are carrying much simpler fix for this whole braindamage in our
enterprise kernel that is from pre-7ff9554bb578 era, and it was rather
simple fix in principle (the diffstat is much larger than it had to be due
to code movements):

http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

But after the scary 7ff9554bb578 and its successors, things got a lot more
complicated.

--
Jiri Kosina
SUSE Labs

2014-06-11 09:01:13

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Tue 2014-06-10 19:32:51, Jiri Kosina wrote:
> On Tue, 10 Jun 2014, Linus Torvalds wrote:
>
> > > Lets be crazy and Cc Linus on that.
> >
> > Quite frankly, I hate seeing something like this:
> >
> > kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------
> >
> > for something that is stupid and broken. Printing from NMI context
> > isn't really supposed to work, and we all *know* it's not supposed to
> > work.
>
> It's OTOH rather useful in a few scenarios -- particularly it's the only
> way to dump stacktraces from remote CPUs in order to obtain traces that
> actually make sense (in situations like RCU stall); using workqueue-based
> dumping is useless there.
>
> > Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> > logbuf_lock, and *maybe* the existing sequence of
> >
> > if (console_trylock_for_printk())
> > console_unlock();
> >
> > then works for actually triggering the printout. But the wrapper
> > should be 15 lines of code for "if possible, try to print things", and
> > *not* a thousand lines of changes.

I am afraid that basically this is done in my patch set. It does
trylock and uses the main buffer when possible. I am just not able to
squeeze it into 15 lines :-(

One problem is that we do not want to loose the messages,
e.g. stacktraces. We need to store them somewhere and merge them into
the main ring buffer later.

> Well, we are carrying much simpler fix for this whole braindamage in our
> enterprise kernel that is from pre-7ff9554bb578 era, and it was rather
> simple fix in principle (the diffstat is much larger than it had to be due
> to code movements):
>
> http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
>
> But after the scary 7ff9554bb578 and its successors, things got a lot more
> complicated.

Yes, another big problem is the above mentioned commit. The reading from the
temporary storage has to be in the normal context and thus lockless.
It is much more complicated when we work with whole messages and all
the added flags.

Also note that we want to save the last messages when the temporary storage
is full. This is why I used a ring buffer and was not able to use a
more simple producer and consumer algorithm.


Best Regards,
Petr

2014-06-18 11:03:10

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Tue, 10 Jun 2014, Linus Torvalds wrote:

> > Lets be crazy and Cc Linus on that.
>
> Quite frankly, I hate seeing something like this:
>
> kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------
>
> for something that is stupid and broken. Printing from NMI context
> isn't really supposed to work, and we all *know* it's not supposed to
> work.
>
> I'd much rather disallow it, and if there is one or two places that
> really want to print a warning and know that they are in NMI context,
> have a special workaround just for them, with something that does
> *not* try to make printk in general work any better.
>
> Dammit, NMI context is special. I absolutely refuse to buy into the
> broken concept that we should make more stuff work in NMI context.
> Hell no, we should *not* try to make more crap work in NMI. NMI people
> should be careful.
>
> Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> logbuf_lock, and *maybe* the existing sequence of
>
> if (console_trylock_for_printk())
> console_unlock();
>
> then works for actually triggering the printout. But the wrapper
> should be 15 lines of code for "if possible, try to print things", and
> *not* a thousand lines of changes.

Alright, so this went silent again without any real progress. Is everyone
hoping this gets sorted out on kernel summit, or ... ?

Let me sum up the current situation:

- both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
seen it happening for real) cause a complete, undebuggable, silent hang
of machine (deadlock in NMI context)

- before 7ff9554bb578 and friends, this was trivial to fix more or less
exactly the way Linus is proposing above. We've been carrying the
fix in our kernels for a while already [1]. With printk() having got
overly complicated recently, the "in principle trivial" fix turns out
into crazy mess due to handling of all the indexes, sequence numbers,
etc.

- printk() from NMI is actually useful in rare cases (such as inter-CPU
stack dumping)

- using lockless buffers that trace_printk() is using has its own
problems, as described by Petr elsewhere in this thread


I find it rather outrageous that fixing *real bugs* (leading to hangs)
becomes impossible due to printk() being too complex. It's very
unfortunate that the same level of pushback didn't happen when new
features (that actually *made* it so complicated) have been pushed; that
would be much more valuable an appropriate.

I believe Jan Kara is in the same situation with his softlockup fixes for
printk. Those are real problems, as they are bringing machines down, and
after two years, still not fixed, because "printk() code is scary enough
as-is"

[1] http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

--
Jiri Kosina
SUSE Labs

2014-06-18 14:36:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 01:03:05PM +0200, Jiri Kosina wrote:
> On Tue, 10 Jun 2014, Linus Torvalds wrote:
>
> > > Lets be crazy and Cc Linus on that.
> >
> > Quite frankly, I hate seeing something like this:
> >
> > kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------
> >
> > for something that is stupid and broken. Printing from NMI context
> > isn't really supposed to work, and we all *know* it's not supposed to
> > work.
> >
> > I'd much rather disallow it, and if there is one or two places that
> > really want to print a warning and know that they are in NMI context,
> > have a special workaround just for them, with something that does
> > *not* try to make printk in general work any better.
> >
> > Dammit, NMI context is special. I absolutely refuse to buy into the
> > broken concept that we should make more stuff work in NMI context.
> > Hell no, we should *not* try to make more crap work in NMI. NMI people
> > should be careful.
> >
> > Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> > logbuf_lock, and *maybe* the existing sequence of
> >
> > if (console_trylock_for_printk())
> > console_unlock();
> >
> > then works for actually triggering the printout. But the wrapper
> > should be 15 lines of code for "if possible, try to print things", and
> > *not* a thousand lines of changes.
>
> Alright, so this went silent again without any real progress. Is everyone
> hoping this gets sorted out on kernel summit, or ... ?
>
> Let me sum up the current situation:
>
> - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> seen it happening for real) cause a complete, undebuggable, silent hang
> of machine (deadlock in NMI context)

I could easily add an option to RCU to allow people to tell it not to
use NMIs to dump the stack. Would that help?

Thanx, Paul

> - before 7ff9554bb578 and friends, this was trivial to fix more or less
> exactly the way Linus is proposing above. We've been carrying the
> fix in our kernels for a while already [1]. With printk() having got
> overly complicated recently, the "in principle trivial" fix turns out
> into crazy mess due to handling of all the indexes, sequence numbers,
> etc.
>
> - printk() from NMI is actually useful in rare cases (such as inter-CPU
> stack dumping)
>
> - using lockless buffers that trace_printk() is using has its own
> problems, as described by Petr elsewhere in this thread
>
>
> I find it rather outrageous that fixing *real bugs* (leading to hangs)
> becomes impossible due to printk() being too complex. It's very
> unfortunate that the same level of pushback didn't happen when new
> features (that actually *made* it so complicated) have been pushed; that
> would be much more valuable an appropriate.
>
> I believe Jan Kara is in the same situation with his softlockup fixes for
> printk. Those are real problems, as they are bringing machines down, and
> after two years, still not fixed, because "printk() code is scary enough
> as-is"
>
> [1] http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
>
> --
> Jiri Kosina
> SUSE Labs
>

2014-06-18 14:41:13

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> > seen it happening for real) cause a complete, undebuggable, silent hang
> > of machine (deadlock in NMI context)
>
> I could easily add an option to RCU to allow people to tell it not to
> use NMIs to dump the stack. Would that help?

Well, that would make unfortunately the information provided by RCU stall
detector rather useless ... workqueue-based stack dumping is very unlikely
to point its finger to the real offender, as it'd be coming way too late.

Thanks,

--
Jiri Kosina
SUSE Labs

2014-06-18 14:45:09

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 04:41:09PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
>
> > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> > > seen it happening for real) cause a complete, undebuggable, silent hang
> > > of machine (deadlock in NMI context)
> >
> > I could easily add an option to RCU to allow people to tell it not to
> > use NMIs to dump the stack. Would that help?
>
> Well, that would make unfortunately the information provided by RCU stall
> detector rather useless ... workqueue-based stack dumping is very unlikely
> to point its finger to the real offender, as it'd be coming way too late.

I would not use workqueues, but rather have the CPU detecting the
stall grovel through the other CPUs' stacks, which is what I do now for
architectures that don't support NMI-based stack dumps. Would that be
a reasonable approach?

Thanx, Paul

2014-06-18 14:53:53

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> > > > seen it happening for real) cause a complete, undebuggable, silent hang
> > > > of machine (deadlock in NMI context)
> > >
> > > I could easily add an option to RCU to allow people to tell it not to
> > > use NMIs to dump the stack. Would that help?
> >
> > Well, that would make unfortunately the information provided by RCU stall
> > detector rather useless ... workqueue-based stack dumping is very unlikely
> > to point its finger to the real offender, as it'd be coming way too late.
>
> I would not use workqueues, but rather have the CPU detecting the
> stall grovel through the other CPUs' stacks, which is what I do now for
> architectures that don't support NMI-based stack dumps. Would that be
> a reasonable approach?

That would indeed solve lockups induced by RCU stall detector (and we
should convert sysrq stack dumping code to use the same mechanism
afterwards).

But then, the kernel is still polluted by quite a few instances of

WARN_ON(in_nmi())

BUG_IN(in_nmi())

if (in_nmi())
printk(....)

which need to be fixed separately afterwards anyway.

Thanks,

--
Jiri Kosina
SUSE Labs

2014-06-18 15:07:32

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 04:53:14PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
>
> > > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> > > > > seen it happening for real) cause a complete, undebuggable, silent hang
> > > > > of machine (deadlock in NMI context)
> > > >
> > > > I could easily add an option to RCU to allow people to tell it not to
> > > > use NMIs to dump the stack. Would that help?
> > >
> > > Well, that would make unfortunately the information provided by RCU stall
> > > detector rather useless ... workqueue-based stack dumping is very unlikely
> > > to point its finger to the real offender, as it'd be coming way too late.
> >
> > I would not use workqueues, but rather have the CPU detecting the
> > stall grovel through the other CPUs' stacks, which is what I do now for
> > architectures that don't support NMI-based stack dumps. Would that be
> > a reasonable approach?
>
> That would indeed solve lockups induced by RCU stall detector (and we
> should convert sysrq stack dumping code to use the same mechanism
> afterwards).
>
> But then, the kernel is still polluted by quite a few instances of
>
> WARN_ON(in_nmi())
>
> BUG_IN(in_nmi())
>
> if (in_nmi())
> printk(....)
>
> which need to be fixed separately afterwards anyway.

True enough!

Thanx, Paul