2014-01-14 00:45:49

by Pavel Roskin

[permalink] [raw]
Subject: serial8250: bogus low_latency destabilizes kernel, need sanity check

Hello!

I've been debugging an instability of a kernel on some 32-bit x86
embedded system. The kernel would just hang randomly. I had to enable
most debug options to find the reason.

The system has several serial ports, including ttyS4. There is also a
file called /etc/serial.conf that contains a line

/dev/ttyS4 uart 16550a irq 17 baud_base 921600 port 0xd000 low_latency

That file is processed by the setserial utility on startup that makes
the port as low_latency.

And then the kernel reports this:

BUG: sleeping function called from invalid context
at /root/src/linux-3.12.6/kernel/mutex.c:616 in_atomic(): 1,
irqs_disabled(): 1, pid: 0, name: swapper/0 INFO: lockdep is turned off.
irq event stamp: 296476
hardirqs last enabled at (296475): [<c10b4ff1>]
tick_nohz_idle_exit+0x151/0x1b0 hardirqs last disabled at (296476):
[<c1588f45>] _raw_spin_lock_irq+0x15/0x80 softirqs last enabled at
(296458): [<c1049c9d>] __do_softirq+0x2ad/0x3b0 softirqs last disabled
at (296421): [<c1004637>] do_softirq+0x97/0xf0 CPU: 0 PID: 0 Comm:
swapper/0 Not tainted 3.12.6 #3 Hardware name: RadiSys SandyBridge
Platform/S-CEQM67-i5-2515EE , BIOS 20.02.01 08/29/2012 00000000
00000000 f480de5c c1580991 c17d7a00 f480de84 c1076726 c1710154 00000001
00000001 00000000 c17d7cfc f495b000 f3df097c 00000000 f480decc c1584223
f495b058 00000001 f495b018 f480deb8 c134cf7c 00000001 00000002 Call
Trace: [<c1580991>] dump_stack+0x4b/0x66
[<c1076726>] __might_sleep+0x166/0x210
[<c1584223>] mutex_lock_nested+0x23/0x380
[<c134cf7c>] ? ldsem_down_read_trylock+0x7c/0xa0
[<c134acf2>] ? tty_ldisc_ref+0x22/0x50
[<c134acf2>] ? tty_ldisc_ref+0x22/0x50
[<c134bc5e>] flush_to_ldisc+0x3e/0x100
[<c134bd60>] tty_flip_buffer_push+0x40/0x50
[<c1361a15>] serial8250_rx_chars+0xc5/0x200
[<c158902b>] ? _raw_spin_lock_irqsave+0x7b/0x90
[<c1362e67>] ? serial8250_handle_irq+0x37/0xa0
[<c1362eb1>] serial8250_handle_irq+0x81/0xa0
[<c1362eec>] serial8250_default_handle_irq+0x1c/0x20
[<c1360d0c>] serial8250_interrupt+0x5c/0xd0
[<c10a3744>] handle_irq_event_percpu+0x54/0x390
[<c10a67e6>] ? handle_fasteoi_irq+0x16/0xe0
[<c10a3ab1>] ? handle_irq_event+0x31/0x60
[<c10a3aba>] handle_irq_event+0x3a/0x60
[<c10a67d0>] ? unmask_irq+0x30/0x30
[<c10a681e>] handle_fasteoi_irq+0x4e/0xe0
<IRQ> [<c15922d2>] ? do_IRQ+0x42/0xc0

That's a backtrace for Linux 3.12.6, but 3.13-rc8 does the same thing.

serial8250_handle_irq() tries to use the DMA and fails, so it calls
serial8250_rx_chars(). That function calls tty_flip_buffer_push().
The comment above tty_flip_buffer_push() says:

"This function must not be called from IRQ context if port->low_latency
is set"

And that's precisely what we are doing.

Sure, root can damage the system by using incorrect configuration
files. However, I think we need some sanity checking. After all, the
device may degrade and stop working as a low-latency port, and we don't
want the whole system to hang because of that.

Maybe we should unset the low_latency flag as soon as DMA fails? There
are two flags, one is state->uart_port->flags and the other is
port->low_latency. I guess we need to unset both.

--
Regards,
Pavel Roskin


2014-01-14 12:08:35

by Alan Cox

[permalink] [raw]
Subject: Re: serial8250: bogus low_latency destabilizes kernel, need sanity check

> Maybe we should unset the low_latency flag as soon as DMA fails? There
> are two flags, one is state->uart_port->flags and the other is
> port->low_latency. I guess we need to unset both.

Well low latency and DMA are pretty much exclusive in the real world so
probably DMA ports shouldn't allow low_latency to be set at all in DMA
mode.

Alan

2014-01-14 16:25:01

by Pavel Roskin

[permalink] [raw]
Subject: Re: serial8250: bogus low_latency destabilizes kernel, need sanity check

Hi Alan,

Quoting One Thousand Gnomes <[email protected]>:

>> Maybe we should unset the low_latency flag as soon as DMA fails? There
>> are two flags, one is state->uart_port->flags and the other is
>> port->low_latency. I guess we need to unset both.
>
> Well low latency and DMA are pretty much exclusive in the real world so
> probably DMA ports shouldn't allow low_latency to be set at all in DMA
> mode.

That's a useful insight. I assumed exactly the opposite.

But how should a real low_latency device work? Are they supported by
8250_core? Do they have hard IRQs? Are those IRQs handled by
serial8250_handle_irq()?

If DMA is not used, then serial8250_rx_chars() is the only way to
receive data. But serial8250_rx_chars() calls tty_flip_buffer_push()
unconditionally, and the later should not be called from the IRQ
context for low_latency devices, if the comment about it is to be
trusted.

--
Regards,
Pavel Roskin

2014-02-01 15:09:13

by Peter Hurley

[permalink] [raw]
Subject: Re: serial8250: bogus low_latency destabilizes kernel, need sanity check

On 01/14/2014 11:24 AM, Pavel Roskin wrote:
> Hi Alan,
>
> Quoting One Thousand Gnomes <[email protected]>:
>
>>> Maybe we should unset the low_latency flag as soon as DMA fails? There
>>> are two flags, one is state->uart_port->flags and the other is
>>> port->low_latency. I guess we need to unset both.
>>
>> Well low latency and DMA are pretty much exclusive in the real world so
>> probably DMA ports shouldn't allow low_latency to be set at all in DMA
>> mode.
>
> That's a useful insight. I assumed exactly the opposite.

The meaning of low_latency has migrated since 2.6.28

Although it used to mean 'process the input immediately even in
interrupt context', it has become 'the device doesn't receive data
in interrupt context so process the data now'.

> But how should a real low_latency device work?

A low_latency device does not receive data in interrupt context;
ie., tty_flip_buffers() cannot be called from an irq handler
or with interrupts disabled.

> Are they supported by 8250_core?

yes

> Do they have hard IRQs?

possibly but not for rx

> Are those IRQs handled by serial8250_handle_irq()?

possibly, but again not for rx

> If DMA is not used, then serial8250_rx_chars() is the only way to receive data.
> But serial8250_rx_chars() calls tty_flip_buffer_push() unconditionally,
> and the later should not be called from the IRQ context for low_latency devices,
> if the comment about it is to be trusted.

Exactly.

Perhaps we should unconditionally unset low_latency (or remove it entirely).
Real low latency can be addressed by using the -RT kernel.

Regards,
Peter Hurley

2014-02-03 11:11:14

by Alan Cox

[permalink] [raw]
Subject: Re: serial8250: bogus low_latency destabilizes kernel, need sanity check

On Sat, 01 Feb 2014 10:09:03 -0500
Peter Hurley <[email protected]> wrote:

> On 01/14/2014 11:24 AM, Pavel Roskin wrote:
> > Hi Alan,
> >
> > Quoting One Thousand Gnomes <[email protected]>:
> >
> >>> Maybe we should unset the low_latency flag as soon as DMA fails? There
> >>> are two flags, one is state->uart_port->flags and the other is
> >>> port->low_latency. I guess we need to unset both.
> >>
> >> Well low latency and DMA are pretty much exclusive in the real world so
> >> probably DMA ports shouldn't allow low_latency to be set at all in DMA
> >> mode.
> >
> > That's a useful insight. I assumed exactly the opposite.
>
> The meaning of low_latency has migrated since 2.6.28

Not really. The meaning of low latency was always "get the turn around
time for command/response protocols down as low as possible". DMA driven
serial usually reports a transfer completion on a watermark or a timeout,
so tends to work very badly within the Linux definition of 'low latency'
for tty.

What it does has certainly changed but thats implementation detail.

> Perhaps we should unconditionally unset low_latency (or remove it entirely).
> Real low latency can be addressed by using the -RT kernel.

Just saying "use -RT" would be a regression and actually hurt quite a few
annoying "simple protocol" using tools for all sorts of control systems.
We are talking about milliseconds not microseconds here.

The expected behaviour in low_latency is probably best described as

data arrives
processed
wakeup

and to avoid the case of

data arrives
queued for back end
[up to 10mS delay, but typically 1-2mS]
processed
wakeup


which multipled over a 50,000 S record download is a lot of time

Everything else is not user visible so can be changed freely to get that
assumption to work (including ending up not needing it in the first
place).

Getting tty to the point everything but N_TTY canonical mode is a fast
path would probably eliminate the need nicely - I don't know of any use
cases that expect ICANON, ECHO or I*/O* processing for low latency.

2014-02-04 12:42:53

by Peter Hurley

[permalink] [raw]
Subject: Re: serial8250: bogus low_latency destabilizes kernel, need sanity check

On 02/03/2014 06:10 AM, One Thousand Gnomes wrote:
> On Sat, 01 Feb 2014 10:09:03 -0500
> Peter Hurley <[email protected]> wrote:
>
>> On 01/14/2014 11:24 AM, Pavel Roskin wrote:
>>> Hi Alan,
>>>
>>> Quoting One Thousand Gnomes <[email protected]>:
>>>
>>>>> Maybe we should unset the low_latency flag as soon as DMA fails? There
>>>>> are two flags, one is state->uart_port->flags and the other is
>>>>> port->low_latency. I guess we need to unset both.
>>>>
>>>> Well low latency and DMA are pretty much exclusive in the real world so
>>>> probably DMA ports shouldn't allow low_latency to be set at all in DMA
>>>> mode.
>>>
>>> That's a useful insight. I assumed exactly the opposite.
>>
>> The meaning of low_latency has migrated since 2.6.28
>
> Not really. The meaning of low latency was always "get the turn around
> time for command/response protocols down as low as possible". DMA driven
> serial usually reports a transfer completion on a watermark or a timeout,
> so tends to work very badly within the Linux definition of 'low latency'
> for tty.
>
> What it does has certainly changed but thats implementation detail.

I meant the meaning as interpreted by the kernel, not the ideal meaning nor
its original intent.

>> Perhaps we should unconditionally unset low_latency (or remove it entirely).
>> Real low latency can be addressed by using the -RT kernel.
>
> Just saying "use -RT" would be a regression and actually hurt quite a few
> annoying "simple protocol" using tools for all sorts of control systems.
> We are talking about milliseconds not microseconds here.

Ok, fair enough.

[Although my gut feeling is that nominal overhead is more like sub 10 usecs,
and only when the scheduler is I/O-bound does worst case get near 1 msec.]

> The expected behaviour in low_latency is probably best described as
>
> data arrives
> processed
> wakeup

low_latency cannot guarantee that data will be processed, only that
it will not wait.

Examples:
1) SLIP is changing the mtu size. In this case, data will be dropped
because, since the net queue is stopped, no data is taken up but any
data passed to the receive_buf() is assumed to have been consumed.
2) tty buffers are being flushed. There may or may not be any data to
process but there's no way to know without waiting.
3) termios is changing/has been changed. Depending on the line
discipline, data may or may not be processed until termios changes
complete.

etc.

> and to avoid the case of
>
> data arrives
> queued for back end
> [up to 10mS delay, but typically 1-2mS]
> processed
> wakeup
>
>
> which multipled over a 50,000 S record download is a lot of time
>
> Everything else is not user visible so can be changed freely to get that
> assumption to work (including ending up not needing it in the first
> place).
>
> Getting tty to the point everything but N_TTY canonical mode is a fast
> path would probably eliminate the need nicely - I don't know of any use
> cases that expect ICANON, ECHO or I*/O* processing for low latency.

Easier said than done.

For example, what happens if termios is changing?
Presumably, data cannot be processed at that time. So the line discipline
returns early without having processed the data. [For example, the
receive_buf() path could use trylocks and abort, rather than waiting.]

But then, what restarts the attempt to process the data and can that wait?

Similarly for throttling. Unthrottling may be in progress; and even though
in progress, the condition that prompted the unthrottle may no longer be
true and throttling must be done. Ok, it can't happen right
now, but then when?

Regards,
Peter Hurley