2004-09-28 15:34:49

by Roland Caßebohm

[permalink] [raw]
Subject: Serial driver hangs

Hi,

my platform is uClinux kernel 2.4.24 with an ARM with PCI and
a 2 port VSCOM 200I PCI card.

I use the serial driver with very high load. With my
application sends and receives on two ports with 921600 baud.

After a while the driver hangs in an endless loop in the
interrupt routine.
As I can see, the bit TTY_DONT_FLIP in tty->flags is set, so
the receive-buffer can't be flipped. In receive_chars() all
ports are checked for received bytes, but if the buffer is
full and can't be flipped, no byte are read from the UART and
the interrupt will never go inactive.

>>>>>>>>>>>>
do {
if (tty->flip.count >= TTY_FLIPBUF_SIZE) {
tty->flip.tqueue.routine((void *) tty);
if (tty->flip.count >= TTY_FLIPBUF_SIZE)
return; // if TTY_DONT_FLIP is set
}
ch = serial_inp(info, UART_RX);
*tty->flip.char_buf_ptr = ch;
icount->rx++;
>>>>>>>>>>>>

I have tried just to read all byte left in the FIFO of the
UART in that case and throw them away.

>>>>>>>>>>>>
if (tty->flip.count >= TTY_FLIPBUF_SIZE)
{
do {
ch = serial_inp(info, UART_RX);
icount->overrun++;
*status = serial_inp(info, UART_LSR);
} while (*status & UART_LSR_DR);
return; // if TTY_DONT_FLIP is set
}
>>>>>>>>>>>>

This is working but would probably not the best way, because
there could be enough place in the other flip buffer. Maybe
it is possible to disable the receive interrupt of the UART
till the receive routine read_chan(), which sets
TTY_DONT_FLIP, releases the buffer.

Thanks for any help,
Roland
--
___________________________________________________

VS Vision Systems GmbH, Industrial Image Processing
Dipl.-Ing. Roland Ca?ebohm
Aspelohe 27A, D-22848 Norderstedt, Germany
http://www.visionsystems.de
___________________________________________________


2004-09-28 21:11:04

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Tue, 2004-09-28 at 10:34, Roland Caßebohm wrote:
> As I can see, the bit TTY_DONT_FLIP in tty->flags is set, so
> the receive-buffer can't be flipped. In receive_chars() all
> ports are checked for received bytes, but if the buffer is
> full and can't be flipped, no byte are read from the UART and
> the interrupt will never go inactive.

Definately a bug.
It is the same in 2.6 kernels (serial/8250.c).

One way or another, the interrupt must be cleared.
(serviced or deactivated)

> I have tried just to read all byte left in the FIFO of the
> UART in that case and throw them away.

In my opinion, this is the correct way to handle the problem.
This is what I do in the SyncLink drivers.

> This is working but would probably not the best way, because
> there could be enough place in the other flip buffer. Maybe
> it is possible to disable the receive interrupt of the UART
> till the receive routine read_chan(), which sets
> TTY_DONT_FLIP, releases the buffer.

The flip buffer and N_TTY line disciplines are
generic facilities for all tty drivers. They can't
directly perform device specific tasks like
enabling/disabling UART interrupts.

Adding notification to the driver to do these
tasks does not seem like a win either.
On a receive IRQ, the UART Rx FIFO
is already running out of space. If you
disable the rx IRQ, you will likely still lose
data shortly after.

That said, the flip buffer code could be improved
to provide better buffering to prevent lost data.
There may also be synchronization problems:
The driver usually calls tty_flip_buffer_push at
hard IRQ context to queue a task (flush_to_ldisc)
to feed the data to the line discipline. In the case
of a full flip buffer, the driver calls flush_to_ldisc
directly (via tty->flip.tqueue.routine) at hard IRQ context.
I don't see any locking in the flip buffer code
to synchronize this.

--
Paul Fulghum
[email protected]

2004-09-28 21:19:01

by Russell King

[permalink] [raw]
Subject: Re: Serial driver hangs

On Tue, Sep 28, 2004 at 04:10:31PM -0500, Paul Fulghum wrote:
> On Tue, 2004-09-28 at 10:34, Roland Caßebohm wrote:
> > I have tried just to read all byte left in the FIFO of the
> > UART in that case and throw them away.
>
> In my opinion, this is the correct way to handle the problem.
> This is what I do in the SyncLink drivers.

Some 16x50 ports (most of the ones higher than 16550A) have auto flow
control, so if this is enabled you really don't want to drop bytes in
the FIFO on the floor.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-09-28 23:03:32

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Tue, 2004-09-28 at 16:16, Russell King wrote:
> On Tue, Sep 28, 2004 at 04:10:31PM -0500, Paul Fulghum wrote:
> Some 16x50 ports (most of the ones higher than 16550A) have auto flow
> control, so if this is enabled you really don't want to drop bytes in
> the FIFO on the floor.

The alternative is to implement a flow control
mechanism between the flip buffer layer and
the tty drivers to (at the very least)
enable/disable receive interrupts.

Since the flip buffer implementation is probably
going to need rework anyways (eventually) along
with the other tty locking issues, this may not
be a trivial task.

Dropping the bytes is a simple, local fix that
will be more sane than the current
behavior of locking the machine.

A more optimized solution is likely to take a while.

--
Paul Fulghum
[email protected]


2004-09-29 00:46:42

by Alan

[permalink] [raw]
Subject: Re: Serial driver hangs

On Mer, 2004-09-29 at 00:03, Paul Fulghum wrote:
> The alternative is to implement a flow control
> mechanism between the flip buffer layer and
> the tty drivers to (at the very least)
> enable/disable receive interrupts.

We have throttle()/unthrottle(). Drivers also know if they can't
push data.

> Since the flip buffer implementation is probably
> going to need rework anyways (eventually) along
> with the other tty locking issues, this may not
> be a trivial task.

TTY_DONT_FLIP has to die. It's already broken for many cases.
Its basically a nasty quickfix someone did ages ago

2004-09-29 01:13:16

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Tue, 2004-09-28 at 17:12, Alan Cox wrote:
> We have throttle()/unthrottle(). Drivers also know if they can't
> push data.

Yes, though these are manipulated by the ldisc
in relation to the ldisc receive buffer.
Coordinating the use of these functions between
a buffering layer (like the flip buffer) and
the ldisc would require each to have
knowledge of the other's state to know who
calls what and when (yuck).

But much of that may go away when...

> TTY_DONT_FLIP has to die.

*bang*

Until then, flushing the UART receive
FIFO and dropping the bytes (and updating
overrun stat) seems a reasonable short term
solution to stop the machine from locking up
while leaving the device in a recoverable state.

We can even mark it with *FIXME* in a comment.
That always seems to work :-)

--
Paul Fulghum
[email protected]


2004-09-29 13:10:53

by Roland Caßebohm

[permalink] [raw]
Subject: Re: Serial driver hangs

Hi,

Am Mittwoch, 29. September 2004 03:12 schrieb Paul Fulghum:
> On Tue, 2004-09-28 at 17:12, Alan Cox wrote:
> > We have throttle()/unthrottle(). Drivers also know if
> > they can't push data.
>
> Yes, though these are manipulated by the ldisc
> in relation to the ldisc receive buffer.
> Coordinating the use of these functions between
> a buffering layer (like the flip buffer) and
> the ldisc would require each to have
> knowledge of the other's state to know who
> calls what and when (yuck).
>
> But much of that may go away when...
>
> > TTY_DONT_FLIP has to die.
>
> *bang*
>
> Until then, flushing the UART receive
> FIFO and dropping the bytes (and updating
> overrun stat) seems a reasonable short term
> solution to stop the machine from locking up
> while leaving the device in a recoverable state.
>
> We can even mark it with *FIXME* in a comment.
> That always seems to work :-)

I have made a little test, at which the receive interrupt is
disabled in that state. It seems to be no improvement to the
solution of just trow away the bytes of the FIFO. In both
cases characters got lost.

So I think you are right, it would be the best to make the
simple solution with flushing the UART receive FIFO till the
flip buffer implementation will be reworked.

Roland
--
___________________________________________________

VS Vision Systems GmbH, Industrial Image Processing
Dipl.-Ing. Roland Caßebohm
Aspelohe 27A, D-22848 Norderstedt, Germany
Mail: [email protected]
http://www.visionsystems.de
___________________________________________________

2004-09-29 13:18:12

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

Roland Caßebohm wrote:
> I have made a little test, at which the receive interrupt is
> disabled in that state. It seems to be no improvement to the
> solution of just trow away the bytes of the FIFO. In both
> cases characters got lost.

How did you reenable the receive interrupt in your test?

--
Paul Fulghum
[email protected]

2004-09-29 14:08:10

by Roland Caßebohm

[permalink] [raw]
Subject: Re: Serial driver hangs

Am Mittwoch, 29. September 2004 15:17 schrieb Paul Fulghum:
> Roland Caßebohm wrote:
> > I have made a little test, at which the receive interrupt
> > is disabled in that state. It seems to be no improvement
> > to the solution of just trow away the bytes of the FIFO.
> > In both cases characters got lost.
>
> How did you reenable the receive interrupt in your test?

I have added a routine to "struct tty_driver" for restarting
the RX interrupt after TTY_DONT_FLIP bit is cleared in
read_chan().

>>>>>>>>>>>>
clear_bit(TTY_DONT_FLIP, &tty->flags);
if (tty->driver.restart_rx)
tty->driver.restart_rx(tty);
>>>>>>>>>>>>

and in the interrupt routine I have disabled the RX interrupt,
if TTY_DONT_FLIP is set.

>>>>>>>>>>>>
if (tty->flip.count >= TTY_FLIPBUF_SIZE)
{
info->IER &= ~UART_IER_RDI;
serial_outp(info, UART_IER, info->IER);
return; // if TTY_DONT_FLIP is set
}
>>>>>>>>>>>>

and tty->driver.restart_rx() is:

>>>>>>>>>>>>
static void rs_restart_rx(struct tty_struct *tty)
{
struct async_struct *info = (struct async_struct
*)tty->driver_data;
unsigned long flags;

if (serial_paranoia_check(info, tty->device,
"rs_restart_rx"))
return;

save_flags(flags); cli();
if (!(info->IER & UART_IER_RDI)) {
info->IER |= UART_IER_RDI;
serial_out(info, UART_IER, info->IER);
}
restore_flags(flags);
}
>>>>>>>>>>>>

It seems to take to long time in read_chan(). Do you now what
is the exact reason of locking the filp buffer with the
TTY_DONT_FLIP flag? For a short look I would say the buffers
are safe locked by the spinlock tty->read_lock.

Roland
--
___________________________________________________

VS Vision Systems GmbH, Industrial Image Processing
Dipl.-Ing. Roland Caßebohm
Aspelohe 27A, D-22848 Norderstedt, Germany
Mail: [email protected]
http://www.visionsystems.de
___________________________________________________

2004-09-29 14:18:53

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

Here is Roland's change in patch form with added comment.
(against 2.4.28-pre3)

Roland: can you test this?

All: Comments?

--
Paul Fulghum
[email protected]

--- a/drivers/char/serial.c 2004-09-29 09:08:35.000000000 -0500
+++ b/drivers/char/serial.c 2004-09-29 09:09:07.000000000 -0500
@@ -573,8 +573,19 @@
do {
if (tty->flip.count >= TTY_FLIPBUF_SIZE) {
tty->flip.tqueue.routine((void *) tty);
- if (tty->flip.count >= TTY_FLIPBUF_SIZE)
+ if (tty->flip.count >= TTY_FLIPBUF_SIZE) {
+ /* no room in flip buffer, discard rx FIFO contents to clear IRQ
+ * *FIXME* Hardware with auto flow control
+ * would benefit from leaving the data in the FIFO and
+ * disabling the rx IRQ until space becomes available.
+ */
+ do {
+ serial_inp(info, UART_RX);
+ icount->overrun++;
+ *status = serial_inp(info, UART_LSR);
+ } while ((*status & UART_LSR_DR) && (max_count-- > 0));
return; // if TTY_DONT_FLIP is set
+ }
}
ch = serial_inp(info, UART_RX);
*tty->flip.char_buf_ptr = ch;


2004-09-29 14:28:04

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Wed, 2004-09-29 at 09:07, Roland Caßebohm wrote:
> I have added a routine to "struct tty_driver" for restarting
> the RX interrupt after TTY_DONT_FLIP bit is cleared in
> read_chan().

If you are using RTS/CTS flow control,
your scheme might prevent data loss if you also
drop RTS (like driver throttle method) when disabling
the rx IRQ and reasserting RTS (unthrottle) when
reenabling the IRQ. Unfortunately, this may interfere
with the line discipline's use of throttle/unthrottle.

> It seems to take to long time in read_chan(). Do you now what
> is the exact reason of locking the filp buffer with the
> TTY_DONT_FLIP flag? For a short look I would say the buffers
> are safe locked by the spinlock tty->read_lock.

I can't identify the reason.
If you feel brave, remove the setting/clearing
of TTY_DONT_FLIP and see what happens.

--
Paul Fulghum
[email protected]

2004-09-30 16:18:13

by Roland Caßebohm

[permalink] [raw]
Subject: Re: Serial driver hangs

Am Mittwoch, 29. September 2004 16:25 schrieb Paul Fulghum:
> On Wed, 2004-09-29 at 09:07, Roland Caßebohm wrote:
> > I have added a routine to "struct tty_driver" for
> > restarting the RX interrupt after TTY_DONT_FLIP bit is
> > cleared in read_chan().
>
> If you are using RTS/CTS flow control,
> your scheme might prevent data loss if you also
> drop RTS (like driver throttle method) when disabling
> the rx IRQ and reasserting RTS (unthrottle) when
> reenabling the IRQ. Unfortunately, this may interfere
> with the line discipline's use of throttle/unthrottle.

Maybe I can use the functions rs_throttle() and
rs_unthrottle(). In rs_unthrottle I could reenable the RX
interrupt. So I don't need to add a function in "struct
tty_driver". I only need to set the flag TTY_THROTTLED if I
disable the interrupt.

>
> > It seems to take to long time in read_chan(). Do you now
> > what is the exact reason of locking the filp buffer with
> > the TTY_DONT_FLIP flag? For a short look I would say the
> > buffers are safe locked by the spinlock tty->read_lock.
>
> I can't identify the reason.
> If you feel brave, remove the setting/clearing
> of TTY_DONT_FLIP and see what happens.

I've just commented out all places where TTY_DONT_FLIP would
be set and left everything else original. It seems to work
without problems.

My system is sending and receiving on two ports with 921600
baud with CPU load of 85%. Some bytes get still lost, but
less then before. The test is working for 2h now and I have
forced sometimes some more activity to have a CPU load of
100%, but it is still working.

Maybe TTY_DONT_FLIP is really don't needed anymore.
I think to be save and fast maybe one way could be, if the
flip buffer is full it should be flipped but not processed
with tty->ldisc.receive_buf() in the interrupt routine.
flush_to_ldisc() has then always to look at both flip buffers
and process them.
If the second flip buffer is still not clean, if the interrupt
routine needs to flip it, it has to stop the flow and disable
the receive interrupt.
unthrottle() could then reenable the interrupt.

Maybe my thinking is to simple, what do you think?

Roland
--
___________________________________________________

VS Vision Systems GmbH, Industrial Image Processing
Dipl.-Ing. Roland Caßebohm
Aspelohe 27A, D-22848 Norderstedt, Germany
Mail: [email protected]
http://www.visionsystems.de
___________________________________________________

2004-09-30 19:15:10

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, 2004-09-30 at 11:16, Roland Caßebohm wrote:
> Maybe I can use the functions rs_throttle() and
> rs_unthrottle(). In rs_unthrottle I could reenable the RX
> interrupt. So I don't need to add a function in "struct
> tty_driver". I only need to set the flag TTY_THROTTLED if I
> disable the interrupt.

The question is when to unthrottle:

Doing it in N_TTY read_chan() does not mean a
flip buffer is available.

Doing it in flush_to_ldisc() could unthrottle
when the ldisc receive buffer is still full.

It is a problem of using a single mechanism to
throttle the receiver in response to
two seperate triggers (flip buffer state and
ldisc receive buffer state).

> Maybe TTY_DONT_FLIP is really don't needed anymore.

The only use I see for TTY_DONT_FLIP is
delaying transfers of data from a flip buffer
to the N_TTY receive buffer via ldisc->receive_buf()
while N_TTY read_chan() is returning data
from the N_TTY receive buffer to the user app.
This behavior maximizes free space in the
N_TTY receive buffer before sending the next
flip buffer to ldisc->receive_buf().

N_TTY ldisc receive buffer access is
protected by tty->read_lock, so not using
TTY_DONT_FLIP wont corrupt this buffer.
However, if there is not enough room in the
N_TTY receive buffer when ldisc->receive_buf()
is called then data is lost.

> I think to be save and fast maybe one way could be, if the
> flip buffer is full it should be flipped but not processed
> with tty->ldisc.receive_buf() in the interrupt routine.
> flush_to_ldisc() has then always to look at both flip buffers
> and process them.
> If the second flip buffer is still not clean, if the interrupt
> routine needs to flip it, it has to stop the flow and disable
> the receive interrupt.
> unthrottle() could then reenable the interrupt.

The gaping hole in the flip buffer scheme is
flush_to_ldisc() can be called from hard IRQ
context while ldisc->receive_buf() is running.

This can flip buffers while ldisc->receive_buf() is
still reading from one of the flip buffers.
That corrupts data if the device ISR overwrites
the buffer before ldisc->receive_buf() finishes
reading from it.

A mechanism is required to prevent flipping
buffers while ldisc->receive_buf() is running.
TTY_DONT_FLIP would work, but is already in
use for the purpose described above :-)

I understand the plan is to eliminate or replace
the flip buffer scheme. I doubt patches tinkering
with the current scheme will be accepted.

--
Paul Fulghum
[email protected]

2004-09-30 19:37:12

by Alan

[permalink] [raw]
Subject: Re: Serial driver hangs

On Iau, 2004-09-30 at 20:09, Paul Fulghum wrote:
> The gaping hole in the flip buffer scheme is
> flush_to_ldisc() can be called from hard IRQ
> context while ldisc->receive_buf() is running.

This is strictly forbidden and always has been. I've no
plan to touch that restriction merely to re-educate
any offender

2004-09-30 19:52:06

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, 2004-09-30 at 13:34, Alan Cox wrote:
> This is strictly forbidden and always has been. I've no
> plan to touch that restriction merely to re-educate
> any offender

Any offender in this case is most serial drivers,
including the 8520/serial driver in current 2.6

--
Paul Fulghum
[email protected]

2004-09-30 20:02:29

by Russell King

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, Sep 30, 2004 at 02:51:52PM -0500, Paul Fulghum wrote:
> On Thu, 2004-09-30 at 13:34, Alan Cox wrote:
> > This is strictly forbidden and always has been. I've no
> > plan to touch that restriction merely to re-educate
> > any offender
>
> Any offender in this case is most serial drivers,
> including the 8520/serial driver in current 2.6

which in turn also means the bug exists in 2.4...

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-09-30 20:06:58

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, 2004-09-30 at 14:59, Russell King wrote:
> On Thu, Sep 30, 2004 at 02:51:52PM -0500, Paul Fulghum wrote:
> > On Thu, 2004-09-30 at 13:34, Alan Cox wrote:
> > > This is strictly forbidden and always has been. I've no
> > > plan to touch that restriction merely to re-educate
> > > any offender
> >
> > Any offender in this case is most serial drivers,
> > including the 8520/serial driver in current 2.6
>
> which in turn also means the bug exists in 2.4...

Yes, it is also in serial.c of 2.4

My statement of 'most drivers' is wrong.
I should have said 'some drivers'
including 2.4 serial.c and the 8250/serial of 2.6

--
Paul Fulghum
[email protected]

2004-09-30 20:33:22

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, 2004-09-30 at 15:05, Paul Fulghum wrote:
> My statement of 'most drivers' is wrong.
> I should have said 'some drivers'
> including 2.4 serial.c and the 8250/serial of 2.6

Specifically it is the use of this code fragment
in the ISR of serial.c (2.4) and the various
drivers in driver/serial of 2.6:

if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
tty->flip.work.func((void *)tty);
if (tty->flip.count >= TTY_FLIPBUF_SIZE)
return; // if TTY_DONT_FLIP is set
}

in 2.4 it is

if (tty->flip.count >= TTY_FLIPBUF_SIZE) {
tty->flip.tqueue.routine((void *) tty);
if (tty->flip.count >= TTY_FLIPBUF_SIZE)
return; // if TTY_DONT_FLIP is set
}

tty->flip.work.func and tty->flip.tqueue.routine
are set to flush_to_ldisc()

--
Paul Fulghum
[email protected]

2004-09-30 21:13:07

by Alan

[permalink] [raw]
Subject: Re: Serial driver hangs

On Iau, 2004-09-30 at 21:30, Paul Fulghum wrote:
> tty->flip.work.func and tty->flip.tqueue.routine
> are set to flush_to_ldisc()

flush_to_ldisc was ok, then someone added the low latency
flag. In the current 2.6.9rc3 patch flush_to_ldisc honours
TTY_DONT_FLIP also

2004-09-30 21:25:39

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, 2004-09-30 at 15:10, Alan Cox wrote:
> On Iau, 2004-09-30 at 21:30, Paul Fulghum wrote:
> > tty->flip.work.func and tty->flip.tqueue.routine
> > are set to flush_to_ldisc()
>
> flush_to_ldisc was ok, then someone added the low latency
> flag. In the current 2.6.9rc3 patch flush_to_ldisc honours
> TTY_DONT_FLIP also

In the cases I described the low latency flag
does not come into play because flush_to_ldisc()
is called directly instead of
through tty_flip_buffer_push().

TTY_DONT_FLIP is only set in read_chan().
If read_chan() is not running, TTY_DONT_FLIP is not
set and does not prevent buffers from flipping
if the ISR calls flush_to_ldisc() directly
while ldisc->receive_buf() is running.

The answer seems to be: don't call
flush_to_ldisc directly like the current
serial driver does.

--
Paul Fulghum
[email protected]

2004-10-01 00:50:14

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Thu, 2004-09-30 at 16:25, Paul Fulghum wrote:
> > flush_to_ldisc was ok, then someone added the low latency
> > flag. In the current 2.6.9rc3 patch flush_to_ldisc honours
> > TTY_DONT_FLIP also
>
> In the cases I described the low latency flag
> does not come into play because flush_to_ldisc()
> is called directly instead of
> through tty_flip_buffer_push().
>
> TTY_DONT_FLIP is only set in read_chan().
> If read_chan() is not running, TTY_DONT_FLIP is not
> set and does not prevent buffers from flipping
> if the ISR calls flush_to_ldisc() directly
> while ldisc->receive_buf() is running.
>
> The answer seems to be: don't call
> flush_to_ldisc directly like the current
> serial driver does.

I've looked at this more on 2.6

If a driver only calls tty_flip_buffer_push(),
with low_latency cleared, it is still possible for
flush_to_ldisc() to run concurrently on SMP machines.

* IRQ on proc 0, flush_to_ldisc work queued for events/0
* events/0 processes work item:
1) work->pending cleared (work can now be queued again)
2) work function runs on proc 0

While work function is running on proc 0:

* IRQ on proc 1, flush_to_ldisc work queued for events/1
* events/1 processes work item:
1) work->pending cleared (work can now be queued again)
2) work function runs on proc 1

flush_to_ldisc/ldisc->receive_buf do not set TTY_DONT_FLIP
and I see no other mechanism to serialize flush_to_ldisc

That means the buffers can flip while running in
ldisc->receive_buf() which reads from the buffers.

This is contrived, and timing may prevent
this from actually occurring in practice, but it
seems to indicate a hole that needs to be plugged.

I wrong in my reading of the code?

--
Paul Fulghum
[email protected]


2004-10-01 15:22:45

by Roland Caßebohm

[permalink] [raw]
Subject: Re: Serial driver hangs

Am Donnerstag, 30. September 2004 23:25 schrieb Paul Fulghum:
> On Thu, 2004-09-30 at 15:10, Alan Cox wrote:
> > On Iau, 2004-09-30 at 21:30, Paul Fulghum wrote:
> > > tty->flip.work.func and tty->flip.tqueue.routine
> > > are set to flush_to_ldisc()
> >
> > flush_to_ldisc was ok, then someone added the low latency
> > flag. In the current 2.6.9rc3 patch flush_to_ldisc
> > honours TTY_DONT_FLIP also
>
> In the cases I described the low latency flag
> does not come into play because flush_to_ldisc()
> is called directly instead of
> through tty_flip_buffer_push().
>
> TTY_DONT_FLIP is only set in read_chan().
> If read_chan() is not running, TTY_DONT_FLIP is not
> set and does not prevent buffers from flipping
> if the ISR calls flush_to_ldisc() directly
> while ldisc->receive_buf() is running.
>
> The answer seems to be: don't call
> flush_to_ldisc directly like the current
> serial driver does.

Yes, I think you are right, if the system is to slow to fetch
the data fast enough the buffer will be sometime full. And if
it would be possible to use the second flip buffer then, this
buffer would be full too sometime.
It would just take a little longer till data got lost. But if
I want that I could just make the buffers larger.

...

I have just tested it, but unfortunately I've got a very bad
result. :-( In my test case (2 port with 921600 baud) I get
very much data loss.

I think I will stay now by the solution of just trow away the
characters from the FIFO if flip buffer is full and
TTY_DONT_FLIP is set. I will test now the patch Paul has
made.

Roland

2004-10-01 15:28:46

by Roland Caßebohm

[permalink] [raw]
Subject: Re: Serial driver hangs

Am Mittwoch, 29. September 2004 16:13 schrieb Paul Fulghum:
> Here is Roland's change in patch form with added comment.
> (against 2.4.28-pre3)
>
> Roland: can you test this?

Yes, it works fine for me.

Roland
--
___________________________________________________

VS Vision Systems GmbH, Industrial Image Processing
Dipl.-Ing. Roland Caßebohm
Aspelohe 27A, D-22848 Norderstedt, Germany
http://www.visionsystems.de
___________________________________________________

2004-10-01 16:08:48

by Paul Fulghum

[permalink] [raw]
Subject: Re: Serial driver hangs

On Fri, 2004-10-01 at 10:22, Roland Caßebohm wrote:
> Yes, I think you are right, if the system is to slow to fetch
> the data fast enough the buffer will be sometime full. And if
> it would be possible to use the second flip buffer then, this
> buffer would be full too sometime.
> It would just take a little longer till data got lost. But if
> I want that I could just make the buffers larger.
>
> ...
>
> I have just tested it, but unfortunately I've got a very bad
> result. :-( In my test case (2 port with 921600 baud) I get
> very much data loss.

Roland:

Can I impose on you to try the following patches?

The differences here are:
* don't call flush_to_ldisc() directly from ISR
* flush_to_ldisc() keeps flushing flip buffers until empty

These patches are for testing purposes only
and not intended for general use.
I would like to see how your high speed setup reacts.

Thanks in advance.

--
Paul Fulghum
[email protected]

--- a/drivers/char/serial.c 2004-09-30 15:25:17.000000000 -0500
+++ b/drivers/char/serial.c 2004-10-01 10:42:33.000000000 -0500
@@ -572,9 +572,17 @@ static _INLINE_ void receive_chars(struc
icount = &info->state->icount;
do {
if (tty->flip.count >= TTY_FLIPBUF_SIZE) {
- tty->flip.tqueue.routine((void *) tty);
- if (tty->flip.count >= TTY_FLIPBUF_SIZE)
- return; // if TTY_DONT_FLIP is set
+ /* no room in flip buffer, discard rx FIFO contents to clear IRQ
+ * *FIXME* Hardware with auto flow control
+ * would benefit from leaving the data in the FIFO and
+ * disabling the rx IRQ until space becomes available.
+ */
+ do {
+ serial_inp(info, UART_RX);
+ icount->overrun++;
+ *status = serial_inp(info, UART_LSR);
+ } while ((*status & UART_LSR_DR) && (max_count-- > 0));
+ return; // if TTY_DONT_FLIP is set
}
ch = serial_inp(info, UART_RX);
*tty->flip.char_buf_ptr = ch;
--- a/drivers/char/tty_io.c 2004-04-14 08:05:29.000000000 -0500
+++ b/drivers/char/tty_io.c 2004-10-01 10:49:45.000000000 -0500
@@ -1944,32 +1944,34 @@ static void flush_to_ldisc(void *private
int count;
unsigned long flags;

- if (test_bit(TTY_DONT_FLIP, &tty->flags)) {
- queue_task(&tty->flip.tqueue, &tq_timer);
- return;
- }
- if (tty->flip.buf_num) {
- cp = tty->flip.char_buf + TTY_FLIPBUF_SIZE;
- fp = tty->flip.flag_buf + TTY_FLIPBUF_SIZE;
- tty->flip.buf_num = 0;
-
- save_flags(flags); cli();
- tty->flip.char_buf_ptr = tty->flip.char_buf;
- tty->flip.flag_buf_ptr = tty->flip.flag_buf;
- } else {
- cp = tty->flip.char_buf;
- fp = tty->flip.flag_buf;
- tty->flip.buf_num = 1;
-
- save_flags(flags); cli();
- tty->flip.char_buf_ptr = tty->flip.char_buf + TTY_FLIPBUF_SIZE;
- tty->flip.flag_buf_ptr = tty->flip.flag_buf + TTY_FLIPBUF_SIZE;
- }
- count = tty->flip.count;
- tty->flip.count = 0;
- restore_flags(flags);
+ while(tty->flip.count) {
+ if (test_bit(TTY_DONT_FLIP, &tty->flags)) {
+ queue_task(&tty->flip.tqueue, &tq_timer);
+ return;
+ }
+ if (tty->flip.buf_num) {
+ cp = tty->flip.char_buf + TTY_FLIPBUF_SIZE;
+ fp = tty->flip.flag_buf + TTY_FLIPBUF_SIZE;
+ tty->flip.buf_num = 0;
+
+ save_flags(flags); cli();
+ tty->flip.char_buf_ptr = tty->flip.char_buf;
+ tty->flip.flag_buf_ptr = tty->flip.flag_buf;
+ } else {
+ cp = tty->flip.char_buf;
+ fp = tty->flip.flag_buf;
+ tty->flip.buf_num = 1;
+
+ save_flags(flags); cli();
+ tty->flip.char_buf_ptr = tty->flip.char_buf + TTY_FLIPBUF_SIZE;
+ tty->flip.flag_buf_ptr = tty->flip.flag_buf + TTY_FLIPBUF_SIZE;
+ }
+ count = tty->flip.count;
+ tty->flip.count = 0;
+ restore_flags(flags);

- tty->ldisc.receive_buf(tty, cp, fp, count);
+ tty->ldisc.receive_buf(tty, cp, fp, count);
+ }
}

/*


2004-10-01 20:15:51

by Stuart MacDonald

[permalink] [raw]
Subject: RE: Serial driver hangs

From: [email protected]
> flush_to_ldisc was ok, then someone added the low latency
> flag. In the current 2.6.9rc3 patch flush_to_ldisc honours
> TTY_DONT_FLIP also

I've come late to this discussion. Not sure what the scope of this
cleanup is, but I'd like to see the flip buffers done away with
entirely, to be replaced by a single buffer with proper r/w locking.
Or keep the flip arrangement, but move it out of tty_struct so that it
can be made larger. Some of our high speed products find the rx buffer
to be less than sufficient.

..Stu

2004-10-01 20:46:16

by Paul Fulghum

[permalink] [raw]
Subject: RE: Serial driver hangs

On Fri, 2004-10-01 at 15:13, Stuart MacDonald wrote:
> I've come late to this discussion. Not sure what the scope of this
> cleanup is, but I'd like to see the flip buffers done away with
> entirely, to be replaced by a single buffer with proper r/w locking.
> Or keep the flip arrangement, but move it out of tty_struct so that it
> can be made larger. Some of our high speed products find the rx buffer
> to be less than sufficient.

This started as a bug report of a lockup under high load.
(2 ports @ 921600bps)

The cause was serial.c (kernel 2.4) not clearing
the receive IRQ if the flip buffer was full.
The ISR simply returned without flushing the
receive FIFO or disabling receive interrupts.
The short term cure for the lock up is to
flush the receive FIFO and discard the data.

The discussion then descended into analysis of the
flip buffer scheme in general. Everyone seems to agree
it should be eliminated or replaced.
It has synchronization problems for SMP.

Alan is busy reworking other tty locking
issues, and it probably annoyed by the noise
created by this thread :-)

I suspect when those issues are resovled there
will be an opportunity to submit suggestions and patches.

--
Paul Fulghum
[email protected]