2013-03-01 21:47:14

by David Miller

[permalink] [raw]
Subject: WARNING at tty_buffer.c:428 process_one_work()


I'm getting these non-stop right when the hypervisor console registers
on sparc64, and the machine won't boot up properly. This is with
Linus's current tree.

[511865.556835] console [ttyHV0] enabled
[511865.564555] ------------[ cut here ]------------
[511865.612410] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
[511865.627846] tty is NULL
[511865.687052] Modules linked in:
[511865.693305] Call Trace:
[511865.698343] [0000000000464f2c] warn_slowpath_fmt+0x2c/0x40
[511865.709637] [0000000000480f24] process_one_work+0x164/0x420
[511865.721097] [0000000000481580] worker_thread+0x120/0x440
[511865.732036] [00000000004867a8] kthread+0x88/0xa0
[511865.741585] [0000000000406044] ret_from_syscall+0x1c/0x2c
[511865.752695] [0000000000000000] (null)
[511865.762080] ---[ end trace 40d7a6b076cabd64 ]---
[511865.771806] ------------[ cut here ]------------
[511865.780935] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
[511865.796379] tty is NULL
[511865.801409] Modules linked in:
[511865.807649] Call Trace:
[511865.812689] [0000000000464f2c] warn_slowpath_fmt+0x2c/0x40
[511865.823981] [0000000000480f24] process_one_work+0x164/0x420
[511865.835447] [0000000000481580] worker_thread+0x120/0x440
[511865.846385] [00000000004867a8] kthread+0x88/0xa0
[511865.855930] [0000000000406044] ret_from_syscall+0x1c/0x2c
[511865.867042] [0000000000000000] (null)
[511865.876414] ---[ end trace 40d7a6b076cabd65 ]---
[511865.886157] ------------[ cut here ]------------
[511865.895284] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
[511865.910730] tty is NULL
[511865.915757] Modules linked in:
[511865.922000] Call Trace:
[511865.927035] [0000000000464f2c] warn_slowpath_fmt+0x2c/0x40
[511865.938330] [0000000000480f24] process_one_work+0x164/0x420
[511865.949796] [0000000000481580] worker_thread+0x120/0x440
[511865.960733] [00000000004867a8] kthread+0x88/0xa0
[511865.970288] [0000000000406044] ret_from_syscall+0x1c/0x2c
[511865.981402] [0000000000000000] (null)
[511865.990773] ---[ end trace 40d7a6b076cabd66 ]---
[511866.000507] ------------[ cut here ]------------
[511866.009618] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
[511866.025074] tty is NULL
[511866.030102] Modules linked in:
[511866.036361] Call Trace:
[511866.041400] [0000000000464f2c] warn_slowpath_fmt+0x2c/0x40
[511866.052675] [0000000000480f24] process_one_work+0x164/0x420
[511866.064147] [0000000000481580] worker_thread+0x120/0x440
[511866.075077] [00000000004867a8] kthread+0x88/0xa0
[511866.084639] [0000000000406044] ret_from_syscall+0x1c/0x2c
[511866.095739] [0000000000000000] (null)
[511866.105123] ---[ end trace 40d7a6b076cabd67 ]---


2013-03-01 21:56:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On Fri, Mar 01, 2013 at 04:47:11PM -0500, David Miller wrote:
>
> I'm getting these non-stop right when the hypervisor console registers
> on sparc64, and the machine won't boot up properly. This is with
> Linus's current tree.
>
> [511865.556835] console [ttyHV0] enabled
> [511865.564555] ------------[ cut here ]------------
> [511865.612410] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
> [511865.627846] tty is NULL

Sorry about this, I have a patch, from Jiri, to get to Linus after
3.9-rc1 to remove the warning. It's safe to ignore. Maybe I should
just push it today, I wasn't aware that it was being hit so easily.

thanks,

greg k-h

2013-03-01 22:00:59

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: David Miller <[email protected]>
Date: Fri, 01 Mar 2013 16:47:11 -0500 (EST)

>
> I'm getting these non-stop right when the hypervisor console registers
> on sparc64, and the machine won't boot up properly. This is with
> Linus's current tree.

As a quick addendum I'm looking at all of these tty_insert_flip_char()
conversions and I'm extremely disappointed.

Let's just look at just two of the sparc drivers converted in commit
92a19f9cec9a80ad93c06e115822deb729e2c6ad, namely sunhv.c and sunsab.c.

The changes to the uart_handle_sysrq_char() call sites are handled
completely differently in these two drivers, in nearly identical
situations. How can this be correct?

In the sunhv.c case the uart_handle_sysrq_char() call is preserved
but the guarding test is changed from:

if (tty == NULL)

into

if (port->start == NULL)

Whereas in the sunsab.c case, the entire guarding test as well as
the uart_handle_sysrq_char() invocation itself are both
completly removed.

How in the world can that be right?

Either the receive_chars() method's invocations of
uart_handle_sysrq_char() should be retained, or this sysrq processing
is handled now elsewhere and in which case that should be documented
clearly and in detail in the commit log message.

2013-03-01 22:10:12

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: Greg KH <[email protected]>
Date: Fri, 1 Mar 2013 13:56:09 -0800

> On Fri, Mar 01, 2013 at 04:47:11PM -0500, David Miller wrote:
>>
>> I'm getting these non-stop right when the hypervisor console registers
>> on sparc64, and the machine won't boot up properly. This is with
>> Linus's current tree.
>>
>> [511865.556835] console [ttyHV0] enabled
>> [511865.564555] ------------[ cut here ]------------
>> [511865.612410] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
>> [511865.627846] tty is NULL
>
> Sorry about this, I have a patch, from Jiri, to get to Linus after
> 3.9-rc1 to remove the warning. It's safe to ignore. Maybe I should
> just push it today, I wasn't aware that it was being hit so easily.

Ok, next I'm hitting some regression in Al Viro's signal changes when userland
starts up. :-)

2013-03-02 04:49:57

by Stephen Rothwell

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On Fri, 01 Mar 2013 17:10:08 -0500 (EST) David Miller <[email protected]> wrote:
>
> From: Greg KH <[email protected]>
> Date: Fri, 1 Mar 2013 13:56:09 -0800
>
> > On Fri, Mar 01, 2013 at 04:47:11PM -0500, David Miller wrote:
> >>
> >> I'm getting these non-stop right when the hypervisor console registers
> >> on sparc64, and the machine won't boot up properly. This is with
> >> Linus's current tree.
> >>
> >> [511865.556835] console [ttyHV0] enabled
> >> [511865.564555] ------------[ cut here ]------------
> >> [511865.612410] WARNING: at drivers/tty/tty_buffer.c:428 process_one_work+0x164/0x420()
> >> [511865.627846] tty is NULL
> >
> > Sorry about this, I have a patch, from Jiri, to get to Linus after
> > 3.9-rc1 to remove the warning. It's safe to ignore. Maybe I should
> > just push it today, I wasn't aware that it was being hit so easily.
>
> Ok, next I'm hitting some regression in Al Viro's signal changes when userland
> starts up. :-)

If only we had some way of testing this stuff before it gets into Linus'
tree ... ;-)

--
Cheers,
Stephen Rothwell [email protected]


Attachments:
(No filename) (1.11 kB)
(No filename) (836.00 B)
Download all attachments

2013-03-02 05:23:38

by Al Viro

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On Sat, Mar 02, 2013 at 03:49:35PM +1100, Stephen Rothwell wrote:
> On Fri, 01 Mar 2013 17:10:08 -0500 (EST) David Miller <[email protected]> wrote:

> > Ok, next I'm hitting some regression in Al Viro's signal changes when userland
> > starts up. :-)
>
> If only we had some way of testing this stuff before it gets into Linus'
> tree ... ;-)

Dave, my deep apologies. I only now realized that you hadn't been on Cc of
the discussion of that crap; see signal.git#for-linus (commit aee41fe) for
fix... I thought I'd Cc'd you back then (about a week ago) ;-/

2013-03-02 06:35:08

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: Al Viro <[email protected]>
Date: Sat, 2 Mar 2013 05:23:30 +0000

> On Sat, Mar 02, 2013 at 03:49:35PM +1100, Stephen Rothwell wrote:
>> On Fri, 01 Mar 2013 17:10:08 -0500 (EST) David Miller <[email protected]> wrote:
>
>> > Ok, next I'm hitting some regression in Al Viro's signal changes when userland
>> > starts up. :-)
>>
>> If only we had some way of testing this stuff before it gets into Linus'
>> tree ... ;-)
>
> Dave, my deep apologies. I only now realized that you hadn't been on Cc of
> the discussion of that crap; see signal.git#for-linus (commit aee41fe) for
> fix... I thought I'd Cc'd you back then (about a week ago) ;-/

No apology necessary.

2013-03-05 11:08:37

by Jiri Slaby

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On 03/01/2013 11:00 PM, David Miller wrote:
> From: David Miller <[email protected]>
> Date: Fri, 01 Mar 2013 16:47:11 -0500 (EST)
>
>>
>> I'm getting these non-stop right when the hypervisor console registers
>> on sparc64, and the machine won't boot up properly. This is with
>> Linus's current tree.
>
> As a quick addendum I'm looking at all of these tty_insert_flip_char()
> conversions and I'm extremely disappointed.
>
> Let's just look at just two of the sparc drivers converted in commit
> 92a19f9cec9a80ad93c06e115822deb729e2c6ad, namely sunhv.c and sunsab.c.
>
> The changes to the uart_handle_sysrq_char() call sites are handled
> completely differently in these two drivers, in nearly identical
> situations. How can this be correct?
>
> In the sunhv.c case the uart_handle_sysrq_char() call is preserved
> but the guarding test is changed from:
>
> if (tty == NULL)
>
> into
>
> if (port->start == NULL)
>
> Whereas in the sunsab.c case, the entire guarding test as well as
> the uart_handle_sysrq_char() invocation itself are both
> completly removed.
>
> How in the world can that be right?
>
> Either the receive_chars() method's invocations of
> uart_handle_sysrq_char() should be retained, or this sysrq processing
> is handled now elsewhere and in which case that should be documented
> clearly and in detail in the commit log message.

Ok, uart_handle_sysrq_char is still there, in both of them. But since we
do not care about the tty pointer in those routines any more, the tty ==
NULL test is removed along with its body. I didn't considered that so
critical to document that in the commit log. I thought it's obvious.

All those "if (port->state == NULL)"'s are mostly wrong. Look for
example to subsab's receive_chars and transmit_chars. One checks that,
one dereferences that immediately OTOH. Yes, serial_core should ensure
the interrupts are off by the time port->state is NULL (by calling
driver's ->shutdown). I will remove the test from receive_chars too
which will make the code cleaner.

I left that "if (port->start == NULL)" in sunhv in place because it
behaves completely differently. It checks port->start on all paths prior
dereferencing it. And it does not stop interrupts on ->shutdown.

thanks,
--
js
suse labs

2013-03-05 19:39:44

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: Jiri Slaby <[email protected]>
Date: Tue, 05 Mar 2013 12:01:06 +0100

> I left that "if (port->start == NULL)" in sunhv in place because it
> behaves completely differently. It checks port->start on all paths prior
> dereferencing it. And it does not stop interrupts on ->shutdown.

But this code really does care if a TTY is attached, because it wants
to make sure that SYSRQ handling occurs unconditionally, even if there
is no TTY to queue the characters to.

This is critically important during bootup before the initial shell
is spawned, if you want to do a SYSRQ register dump or reset out of
a hung boot.

Whether that test is now ->state == NULL or whatever, the same logic
still needs to exist in all of these places.

2013-03-05 19:44:59

by Jiri Slaby

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On 03/05/2013 08:39 PM, David Miller wrote:
> From: Jiri Slaby <[email protected]>
> Date: Tue, 05 Mar 2013 12:01:06 +0100
>
>> I left that "if (port->start == NULL)" in sunhv in place because it
>> behaves completely differently. It checks port->start on all paths prior
>> dereferencing it. And it does not stop interrupts on ->shutdown.
>
> But this code really does care if a TTY is attached, because it wants
> to make sure that SYSRQ handling occurs unconditionally, even if there
> is no TTY to queue the characters to.
>
> This is critically important during bootup before the initial shell
> is spawned, if you want to do a SYSRQ register dump or reset out of
> a hung boot.
>
> Whether that test is now ->state == NULL or whatever, the same logic
> still needs to exist in all of these places.

Hi, I must admit I don't understand. I now checked both of them and they
call uart_handle_sysrq_char unconditionally, or?

--
js
suse labs

2013-03-05 20:03:43

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: Jiri Slaby <[email protected]>
Date: Tue, 05 Mar 2013 20:44:49 +0100

> Hi, I must admit I don't understand. I now checked both of them and they
> call uart_handle_sysrq_char unconditionally, or?

Nope, in the sunsab.c receive function, we used to handle the SYSRQ
stuff before break checking when TTY is NULL, now we don't.

2013-03-05 20:28:06

by Jiri Slaby

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On 03/05/2013 09:03 PM, David Miller wrote:
> From: Jiri Slaby <[email protected]>
> Date: Tue, 05 Mar 2013 20:44:49 +0100
>
>> Hi, I must admit I don't understand. I now checked both of them and they
>> call uart_handle_sysrq_char unconditionally, or?
>
> Nope, in the sunsab.c receive function, we used to handle the SYSRQ
> stuff before break checking when TTY is NULL, now we don't.

But how could tty be NULL in there? subsab enables interrupts after the
TTY is open and disables before it is closed/hupped. So the tty couldn't
be NULL there, right?

--
js
suse labs

2013-03-05 20:34:01

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: Jiri Slaby <[email protected]>
Date: Tue, 05 Mar 2013 21:27:59 +0100

> On 03/05/2013 09:03 PM, David Miller wrote:
>> From: Jiri Slaby <[email protected]>
>> Date: Tue, 05 Mar 2013 20:44:49 +0100
>>
>>> Hi, I must admit I don't understand. I now checked both of them and they
>>> call uart_handle_sysrq_char unconditionally, or?
>>
>> Nope, in the sunsab.c receive function, we used to handle the SYSRQ
>> stuff before break checking when TTY is NULL, now we don't.
>
> But how could tty be NULL in there? subsab enables interrupts after the
> TTY is open and disables before it is closed/hupped. So the tty couldn't
> be NULL there, right?

TTY is NULL until bootup and SYSRQ works perfectly fine during this
time when sunsab is the console, which via sunsab_console_setup()
invokes sunsab_startup() which requests the IRQ.

2013-03-05 20:34:52

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: David Miller <[email protected]>
Date: Tue, 05 Mar 2013 15:33:58 -0500 (EST)

> From: Jiri Slaby <[email protected]>
> Date: Tue, 05 Mar 2013 21:27:59 +0100
>
>> On 03/05/2013 09:03 PM, David Miller wrote:
>>> From: Jiri Slaby <[email protected]>
>>> Date: Tue, 05 Mar 2013 20:44:49 +0100
>>>
>>>> Hi, I must admit I don't understand. I now checked both of them and they
>>>> call uart_handle_sysrq_char unconditionally, or?
>>>
>>> Nope, in the sunsab.c receive function, we used to handle the SYSRQ
>>> stuff before break checking when TTY is NULL, now we don't.
>>
>> But how could tty be NULL in there? subsab enables interrupts after the
>> TTY is open and disables before it is closed/hupped. So the tty couldn't
>> be NULL there, right?
>
> TTY is NULL until bootup and SYSRQ works perfectly fine during this
> time when sunsab is the console, which via sunsab_console_setup()
> invokes sunsab_startup() which requests the IRQ.

And let's be honest about the fact that it's very possible that you've
overlooked many issues like this in the extremely large number of
drivers that you made changes of this nature.

All of those commits need serious auditing :-(

2013-03-05 20:47:14

by Peter Hurley

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On Tue, 2013-03-05 at 15:03 -0500, David Miller wrote:
> From: Jiri Slaby <[email protected]>
> Date: Tue, 05 Mar 2013 20:44:49 +0100
>
> > Hi, I must admit I don't understand. I now checked both of them and they
> > call uart_handle_sysrq_char unconditionally, or?
>
> Nope, in the sunsab.c receive function, we used to handle the SYSRQ
> stuff before break checking when TTY is NULL, now we don't.

Hi David,

SysRq is signalled first by a BRK condition, then followed by the input
character indicating which SysRq function to perform.

sunsab.c: receive_char() is behaving as you would expect.

First, a BRK status is indicated so uart_handle_break() records a
timestamp. If the next input is received within 5 sec. of that
timestamp, the character received is interpreted as a SysRq function --
handled by uart_handle_sysrq_char().

Are you observing that SysRq processing is not occurring with this
driver when only a console exists, or are you hypothesizing that this is
possible?

Regards,
Peter Hurley

2013-03-05 20:54:18

by David Miller

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

From: Peter Hurley <[email protected]>
Date: Tue, 05 Mar 2013 15:47:04 -0500

> On Tue, 2013-03-05 at 15:03 -0500, David Miller wrote:
>> From: Jiri Slaby <[email protected]>
>> Date: Tue, 05 Mar 2013 20:44:49 +0100
>>
>> > Hi, I must admit I don't understand. I now checked both of them and they
>> > call uart_handle_sysrq_char unconditionally, or?
>>
>> Nope, in the sunsab.c receive function, we used to handle the SYSRQ
>> stuff before break checking when TTY is NULL, now we don't.
>
> Hi David,
>
> SysRq is signalled first by a BRK condition, then followed by the input
> character indicating which SysRq function to perform.
>
> sunsab.c: receive_char() is behaving as you would expect.
>
> First, a BRK status is indicated so uart_handle_break() records a
> timestamp. If the next input is received within 5 sec. of that
> timestamp, the character received is interpreted as a SysRq function --
> handled by uart_handle_sysrq_char().
>
> Are you observing that SysRq processing is not occurring with this
> driver when only a console exists, or are you hypothesizing that this is
> possible?

Before we go down this road we need to first address the fact that
non-trivial semantic changes we made to these functions without any
of that being documented.

And it was done to a large swath of serial and TTY drivers.

The author of these changes doesn't even grok that receive interrupts
for these devices can be enabled even if TTY is NULL.

2013-03-05 21:43:23

by Jiri Slaby

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On 03/05/2013 09:34 PM, David Miller wrote:
>> which via sunsab_console_setup()
>> invokes sunsab_startup() which requests the IRQ.

Of course I see now, I overlooked that invocation before.

> And let's be honest about the fact that it's very possible that you've
> overlooked many issues like this in the extremely large number of
> drivers that you made changes of this nature.

That is indeed possible; I don't have a solver in my head to prove
correctness (unfortunately). Now I'm going to revise the patches (and
send a fix for this one).

--
js
suse labs

2013-03-05 21:44:33

by Peter Hurley

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On Tue, 2013-03-05 at 15:53 -0500, David Miller wrote:
> From: Peter Hurley <[email protected]>
> Date: Tue, 05 Mar 2013 15:47:04 -0500
>
> > On Tue, 2013-03-05 at 15:03 -0500, David Miller wrote:
> >> From: Jiri Slaby <[email protected]>
> >> Date: Tue, 05 Mar 2013 20:44:49 +0100
> >>
> >> > Hi, I must admit I don't understand. I now checked both of them and they
> >> > call uart_handle_sysrq_char unconditionally, or?
> >>
> >> Nope, in the sunsab.c receive function, we used to handle the SYSRQ
> >> stuff before break checking when TTY is NULL, now we don't.
> >
> > Hi David,
> >
> > SysRq is signalled first by a BRK condition, then followed by the input
> > character indicating which SysRq function to perform.
> >
> > sunsab.c: receive_char() is behaving as you would expect.
> >
> > First, a BRK status is indicated so uart_handle_break() records a
> > timestamp. If the next input is received within 5 sec. of that
> > timestamp, the character received is interpreted as a SysRq function --
> > handled by uart_handle_sysrq_char().
> >
> > Are you observing that SysRq processing is not occurring with this
> > driver when only a console exists, or are you hypothesizing that this is
> > possible?
>
> Before we go down this road we need to first address the fact that
> non-trivial semantic changes we made to these functions without any
> of that being documented.

Agreed. I had my surprise moments back in January on linux-next (partly
my own fault for being away over the holidays).

And it is not my intention to camouflage or distract from the issues:
the change to sunsab.c may indeed affect the way SysRq is handled, and
that's what prompted my question.

> And it was done to a large swath of serial and TTY drivers.
>
> The author of these changes doesn't even grok that receive interrupts
> for these devices can be enabled even if TTY is NULL.

I guarantee Jiri groks that, since he authored these patches with that
very concern in mind (despite his momentary confusion).

The whole motivation behind this series is to have tty (and by
extension, console) drivers __always__ drive input if they have some,
without checking if tty == NULL. All the input is buffered anyway.

[BTW, that's why you were seeing that "tty == NULL" WARN earlier. That
message uncovered an entire class of hangs, oops, and access-after-free
errors. But now that drivers push input all the time, it's pointless.
Those hangs, oops and accesses-after-free are still happening though.]

By ignoring the tty lifetime, a whole collection of race conditions that
tty drivers were not handling properly go away.

Plus, since the actual handling of any input was scheduled anyway, the
mechanism for 'holding a tty reference while scheduling the push' was
pointless. The instant the reference was dropped the tty could go away
before flush_to_ldisc() might even run.

Now, it could be that other problems show up as a result of these
changes, not just with respect to the Sun drivers, but rather,
unforeseen consequences. I think we should address those that come up,
unless we discover this was a bad idea.

Regards,
Peter Hurley

[As an aside, in this driver we've been referencing, it's unclear that
SysRq worked before this change with just the console -- I mean if tty
== NULL, how could uart_handle_break() indicate the SysRq status pending
so that uart_handle_sysrq_char() would actually do anything?]

2013-03-05 22:11:54

by Jiri Slaby

[permalink] [raw]
Subject: Re: WARNING at tty_buffer.c:428 process_one_work()

On 03/05/2013 10:43 PM, Jiri Slaby wrote:
> send a fix for this one).

Hmm, did it work before actually? Even though the change was
unintentional and undocumented, it fixed the sysrq handling as far as I
can tell.

--
js
suse labs