2004-10-29 20:00:53

by Tim_T_Murphy

Subject: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

I am new to the list, hope this is ok..
I've read about several problems others are having with the new 2.6 serial driver in the list, and tried to see if their solutions solved my issue also, but unfortunately none that I have tried yet have helped.

We're migrating our applications for the Dell Remote Access Controller (DRAC) from a 2.4 kernel to a 2.6 kernel. Communication between the apps and the DRAC happens over a ppp link, which is established by a service startup script: the script uses setserial to prepare an unused tty (based on the assigned hardware information, obtained via lspci) and then calls pppd to establish the link.

Everything works fine with the UP kernel, although there is a message in syslog regarding a spinlock (issued at approximately the same point in time at which the SMP kernel hangs):
---
Oct 29 13:34:47 racjag-1 kernel: CSLIP: code copyright 1989 Regents of the University of California
Oct 29 13:34:47 racjag-1 kernel: PPP generic driver version 2.4.2
Oct 29 13:34:47 racjag-1 udev[3875]: creating device node '/dev/ppp'
Oct 29 13:34:47 racjag-1 pppd[3884]: pppd 2.4.2 started by root, uid 0
Oct 29 13:34:47 racjag-1 racser: pppd startup succeeded
Oct 29 13:34:48 racjag-1 chat[3886]: send (CLIENT^M)
Oct 29 13:34:48 racjag-1 chat[3886]: expect (CLIENTSERVER)
Oct 29 13:34:48 racjag-1 kernel: drivers/serial/serial_core.c:102: spin_lock(drivers/serial/serial_core.c:023f2548) already locked by drivers/serial/8250.c/1015
Oct 29 13:34:48 racjag-1 kernel: drivers/serial/8250.c:1017: spin_unlock(drivers/serial/serial_core.c:023f2548) not locked
Oct 29 13:34:48 racjag-1 chat[3886]: CLIENTSERVER
Oct 29 13:34:48 racjag-1 chat[3886]: -- got it
Oct 29 13:34:48 racjag-1 chat[3886]: send ()
Oct 29 13:34:48 racjag-1 pppd[3884]: Serial connection established.
Oct 29 13:34:48 racjag-1 pppd[3884]: Using interface ppp0
Oct 29 13:34:48 racjag-1 pppd[3884]: Connect: ppp0 <--> /dev/ttyS2
Oct 29 13:34:49 racjag-1 pppd[3884]: local IP address 192.168.234.235
Oct 29 13:34:49 racjag-1 pppd[3884]: remote IP address 192.168.234.236
---

With the SMP kernel, it hangs very soon after starting pppd.
I enabled DEBUG in the serial driver and captured the syslog when the problem happens, but this is not detailed enough for me to pinpoint the exact problem:
---
Oct 28 14:04:52 racjag-1 kernel: CSLIP: code copyright 1989 Regents of the University of California
Oct 28 14:04:52 racjag-1 kernel: PPP generic driver version 2.4.2
Oct 28 14:04:52 racjag-1 udev[3621]: creating device node '/dev/ppp'
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 kernel: Trying to free nonexistent resource <00000000-00000007>
Oct 28 14:05:19 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_flush_buffer(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_flush_buffer(2) called
Oct 28 14:05:19 racjag-1 pppd[3681]: pppd 2.4.1 started by root, uid 0
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 racser: pppd startup succeeded
Oct 28 14:05:20 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:20 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:20 racjag-1 chat[3683]: send (CLIENT^M)
---
The system hangs right there; must press and hold power to get the system to shut down.

Any suggestions to narrow down the cause? Please cc my email as I do not subscribe to this list.
Thanks,
Tim


2004-10-29 21:07:15

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel


> Shouldn't 8250_pci setup the ports already for you? If not, what needs
> to be done to achieve this. Using setserial to setup ports for PCI cards
> isn't the preferred way of doing this.

good question, i will have to understand more to answer it though.
our product has used this method for almost 2 years now.

> At a guess, you've enabled "low latency" setting on this port ?

yes. here's a snippet from the script:

echo -n "Starting ${racsvc}: "
# set serial characteristics for RAC device
setserial /dev/${ttyid} \
    port 0x${maddr} irq ${irqno} ^skip_test autoconfig
setserial /dev/${ttyid} \
    uart 16550A low_latency baud_base 1382400 \
    close_delay 0 closing_wait infinite
# now start pppd
/sbin/modprobe -q ppp >/dev/null 2>&1
/sbin/modprobe -q ppp_async >/dev/null 2>&1
daemon pppd call ${service}
RETVAL=$?

Thanks
Tim

2004-10-29 21:20:05

by Russell King

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Fri, Oct 29, 2004 at 04:04:40PM -0500, [email protected] wrote:
> > Shouldn't 8250_pci setup the ports already for you? If not, what
> > needs to be done to achieve this. Using setserial to setup ports
> > for PCI cards isn't the preferred way of doing this.
>
> good question, i will have to understand more to answer it though.
> our product has used this method for almost 2 years now.

Well, if you forward lspci -vvx and the "maddr" and "irqno" information
(in private mail if you prefer) then I'll fix 8250_pci to work.

> > At a guess, you've enabled "low latency" setting on this port ?
>
> yes. here's a snippet from the script:
>
> echo -n "Starting ${racsvc}: "
> # set serial characteristics for RAC device
> setserial /dev/${ttyid} \
> port 0x${maddr} irq ${irqno} ^skip_test autoconfig
> setserial /dev/${ttyid} \
> uart 16550A low_latency baud_base 1382400 \
> close_delay 0 closing_wait infinite
> # now start pppd
> /sbin/modprobe -q ppp >/dev/null 2>&1
> /sbin/modprobe -q ppp_async >/dev/null 2>&1
> daemon pppd call ${service}
> RETVAL=$?

I think dropping low_latency will work around the problem for the time
being.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-10-29 21:17:22

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Fri, 2004-10-29 at 14:55, [email protected] wrote:
> Oct 29 13:34:48 racjag-1 chat[3886]: expect (CLIENTSERVER)
> Oct 29 13:34:48 racjag-1 kernel: drivers/serial/serial_core.c:102: spin_lock(drivers/serial/serial_core.c:023f2548) already locked by drivers/serial/8250.c/1015
> Oct 29 13:34:48 racjag-1 kernel: drivers/serial/8250.c:1017: spin_unlock(drivers/serial/serial_core.c:023f2548) not locked
> Oct 29 13:34:48 racjag-1 chat[3886]: CLIENTSERVER

One way this can happen is a receive interrupt:

serial8250_interrupt();
  spin_lock(port->lock);
  serial8250_handle_port();
    receive_chars();
      flip.work.func(); /* if FLIP buffer full */
        ldisc->receive_buf(); /* N_TTY */
          tty->driver->flush_chars();
            uart_start();
              spin_lock(port->lock); *BANG*

Try the attached patch and report what happens.

--
Paul Fulghum
[email protected]

--- linux-2.6.8/drivers/serial/8250.c 2004-08-14 00:36:13.000000000 -0500
+++ b/drivers/serial/8250.c 2004-10-29 15:58:28.076014336 -0500
@@ -830,9 +830,13 @@ receive_chars(struct uart_8250_port *up,

do {
if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
- tty->flip.work.func((void *)tty);
- if (tty->flip.count >= TTY_FLIPBUF_SIZE)
- return; // if TTY_DONT_FLIP is set
+ /* no room in flip buffer, discard rx FIFO contents to clear IRQ */
+ do {
+ serial_inp(up, UART_RX);
+ up->port.icount.overrun++;
+ *status = serial_inp(up, UART_LSR);
+ } while ((*status & UART_LSR_DR) && (max_count-- > 0));
+ return; /* if TTY_DONT_FLIP is set */
}
ch = serial_inp(up, UART_RX);
*tty->flip.char_buf_ptr = ch;


2004-10-29 20:27:44

by Russell King

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Fri, Oct 29, 2004 at 02:55:10PM -0500, [email protected] wrote:
> I've read about several problems others are having with the new 2.6
> serial driver in the list, and tried to see if their solutions solved
> my issue also, but unfortunately none that I have tried yet have helped.

Well, this is the first I know of this kind of problem...

> We're migrating our applications for the Dell Remote Access Controller
> (DRAC) to run on a 2.6 kernel from a 2.4 kernel. Communication between
> the apps and the DRAC happen over a ppp link which is established via
> a service startup script; the script uses setserial to prepare an unused
> tty (based on the assigned hardware information, obtained via lspci),
> and the script then calls pppd to finish/establish the link.

Shouldn't 8250_pci setup the ports already for you? If not, what needs
to be done to achieve this. Using setserial to setup ports for PCI cards
isn't the preferred way of doing this.

At a guess, you've enabled "low latency" setting on this port ?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-10-29 22:32:49

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Fri, 2004-10-29 at 15:20, Russell King wrote:
> At a guess, you've enabled "low latency" setting on this port ?

Ah, that would explain the problem better than
the code path I saw (flip buffer full).
The problem is still the same: calling the flip
work routine from the ISR, which calls through
N_TTY receive_buf->flush_chars->start_tx.

--
Paul Fulghum
[email protected]


2004-10-29 23:34:30

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel


> Well, if you forward lspci -vvx and the "maddr" and "irqno" information
> (in private mail if you prefer) then I'll fix 8250_pci to work.

maddr: 10 # note, this is for the UP kernel. for SMP, maddr=201
irqno: ec40
lspci -d 1028:0008 -vvx:

00:08.1 Class ff00: Dell Remote Access Card III
Subsystem: Dell Remote Access Card III
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 10
Region 0: Memory at fe202000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at ec40 [size=64]
Region 2: Memory at feb00000 (32-bit, prefetchable) [size=512K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 28 10 08 00 03 01 90 02 00 00 00 ff 10 20 80 00
10: 00 20 20 fe 41 ec 00 00 08 00 b0 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 08 00
30: 00 00 00 00 48 00 00 00 00 00 00 00 0a 02 00 00

> I think dropping low_latency will work around the problem for the time
> being.

Thanks a lot for the help and advice, I will try this and report
results.

Tim

2004-10-29 23:37:53

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

> maddr: 10 # note, this is for the UP kernel. for SMP, maddr=201
> irqno: ec40

duh, i got maddr and irqno backwards in my last post, sorry.
Tim

2004-10-29 23:58:34

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Fri, 2004-10-29 at 15:20, Russell King wrote:
> At a guess, you've enabled "low latency" setting on this port ?

Would it make sense to do something like (in tty_io.c) the following?

void tty_flip_buffer_push(struct tty_struct *tty)
{
    if (tty->low_latency) {
        if (in_interrupt()) {
            printk(KERN_ERR "tty_flip_buffer_push called with low latency from interrupt!\n");
            dump_stack();
            schedule_delayed_work(&tty->flip.work, 1);
        }
        else
            flush_to_ldisc((void *) tty);
    }
    else
        schedule_delayed_work(&tty->flip.work, 1);
}

--
Paul Fulghum
[email protected]


2004-10-30 16:16:48

by Russell King

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Fri, Oct 29, 2004 at 06:30:01PM -0500, [email protected] wrote:
>
> > Well, if you forward lspci -vvx and the "maddr" and "irqno" information
> > (in private mail if you prefer) then I'll fix 8250_pci to work.
>
> maddr: 10 # note, this is for the UP kernel. for SMP, maddr=201
> irqno: ec40
> lspci -d 1028:0008 -vvx:

Ok, could you check whether this patch automatically detects the serial
port please?

Thanks.

diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x *.orig -x *.rej orig/drivers/serial/8250_pci.c linux/drivers/serial/8250_pci.c
--- orig/drivers/serial/8250_pci.c Sat Oct 23 11:39:13 2004
+++ linux/drivers/serial/8250_pci.c Sat Oct 30 16:57:59 2004
@@ -1026,6 +1026,7 @@ enum pci_board_num_t {

pbn_b1_bt_2_921600,

+ pbn_b1_1_1382400,
pbn_b1_2_1382400,
pbn_b1_4_1382400,
pbn_b1_8_1382400,
@@ -1253,6 +1254,12 @@ static struct pci_board pci_boards[] __d
.uart_offset = 8,
},

+ [pbn_b1_1_1382400] = {
+ .flags = FL_BASE1,
+ .num_ports = 1,
+ .base_baud = 1382400,
+ .uart_offest = 8,
+ },
[pbn_b1_2_1382400] = {
.flags = FL_BASE1,
.num_ports = 2,
@@ -2109,6 +2116,13 @@ static struct pci_device_id serial_pci_t
pbn_b0_bt_1_460800 },

/*
+ * Dell Remote Access Card III - [email protected]
+ */
+ { PCI_VENDOR_ID_DELL, PCI_DEVICE_ID_DELL_RACIII,
+ PCI_ID_ANY, PCI_ID_ANY, 0, 0,
+ pbn_b1_1_1382400 },
+
+ /*
* RAStel 2 port modem, [email protected]
*/
{ PCI_VENDOR_ID_MORETON, PCI_DEVICE_ID_RASTEL_2PORT,
diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x *.orig -x *.rej orig/include/linux/pci_ids.h linux/include/linux/pci_ids.h
--- orig/include/linux/pci_ids.h Sat Oct 23 11:40:03 2004
+++ linux/include/linux/pci_ids.h Sat Oct 30 16:52:46 2004
@@ -522,6 +522,7 @@
#define PCI_DEVICE_ID_AI_M1435 0x1435

#define PCI_VENDOR_ID_DELL 0x1028
+#define PCI_DEVICE_ID_DELL_RACIII 0x0008

#define PCI_VENDOR_ID_MATROX 0x102B
#define PCI_DEVICE_ID_MATROX_MGA_2 0x0518

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-10-30 23:47:20

by Alan

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Sad, 2004-10-30 at 00:40, Paul Fulghum wrote:
> On Fri, 2004-10-29 at 15:20, Russell King wrote:
> > At a guess, you've enabled "low latency" setting on this port ?
>
> Would it make sense to do something like (in tty_io.c) the following?

Not really because it can legally occur if you flip the low latency
flag while a transaction is queued. It might work if you waited for
scheduled work to complete in the flag changing.

2004-10-31 00:26:52

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Sat, 2004-10-30 at 17:43, Alan Cox wrote:
> On Sad, 2004-10-30 at 00:40, Paul Fulghum wrote:
> > Would it make sense to do something like (in tty_io.c) the following?
>
> Not really because it can legally occur if you flip the low latency
> flag while a transaction is queued. It might work if you waited for
> scheduled work to complete in the flag changing.

I don't see how having flush_to_ldisc() queued
or already running (on another processor) negates
the prohibition on calling tty_flip_buffer_push()
with low_latency set in interrupt context.

The comments for tty_flip_buffer_push() state the
function should not be called in interrupt context
if low_latency is set (no exceptions are listed).
Meaning flush_to_ldisc() should only be called
in process context.

If flush_to_ldisc() is queued or already executing,
there is no protection against calling
flush_to_ldisc() again, directly in interrupt context.
TTY_DONT_FLIP is no protection, that is only set
in read_chan() of n_tty.c

If I'm missing something, please point it out.

--
Paul Fulghum
[email protected]


2004-11-01 07:15:24

by Stuart MacDonald

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel

From: Paul Fulghum
> I don't see how having flush_to_ldisc() queued
> or already running (on another processor) negates
> the prohibition on calling tty_flip_buffer_push()
> with low_latency set in interrupt context.

I always thought the whole point of low_latency was to make the
receive-path very fast, which means specifically allowing the flip
routine to run from the ISR. So checking for calling from the ISR and
specifically disallowing that is basically negating the entire raison
d'etre for low_latency.

Having said that, the interrupt context "taint" that is allowed by the
low_latency flag has been a thorn in our side for some time. It would
be nice if that path was cleaned up to run properly from interrupt or
process context.

..Stu
http://www.connecttech.com

2004-11-01 14:42:58

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

> Ok, could you check whether this patch automatically detects
> the serial port please?

Yes, other than fixing a couple typos:
uart_offest -> uart_offset
PCI_ID_ANY -> PCI_ANY_ID
I now get ttyS4 in my /proc/tty/driver/serial output, on bootup:

serinfo:1.0 driver revision:
0: uart:16550A port:000003F8 irq:4 tx:22 rx:0 RI
1: uart:16550A port:000002F8 irq:3 tx:22 rx:0 RI
2: uart:unknown port:000003E8 irq:4
3: uart:unknown port:000002E8 irq:3
4: uart:16550A port:0000EC40 irq:201 tx:0 rx:0 CTS|DSR|CD
5: uart:unknown port:00000000 irq:0
6: uart:unknown port:00000000 irq:0
7: uart:unknown port:00000000 irq:0

Also: the removal of "low_latency" does avoid the hang with the SMP
kernel; I am removing this setting from our service startup script. In
addition, I will be changing the script to only perform the setserial
commands against an unused tty if it cannot first identify a tty that
already describes our virtual uart (ala Russell's 8250_pci fix).

Thanks to all who replied, much appreciated!
Tim

2004-11-01 15:21:59

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel

Stuart MacDonald wrote:
> From: Paul Fulghum
> I always thought the whole point of low_latency was to make the
> receive-path very fast, which means specifically allowing the flip
> routine to run from the ISR. So checking for calling from the ISR and
> specifically disallowing that is basically negating the entire raison
> d'etre for low_latency.

I thought it was to speed processing if the
caller was already in process context. Maybe the
real intentions are lost to history.

Moving forward, Alan stated that the flip
routine should not be called in interrupt context.
His last post concerning some transient state
of low_latency has confused me.

Currently, with the 8250 driver and N_TTY
line discipline, calling the flip routine from
ISR causes an SMP deadlock. There are two paths that
cause this:
1. low_latency is set
2. flip buffer becomes full

So calling the flip routine from the ISR may work
with some specific drivers, but it would be
dangerous to assume this works in all cases.

--
Paul Fulghum
[email protected]

2004-11-01 15:36:30

by Stuart MacDonald

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel

From: Paul Fulghum [mailto:[email protected]]
> Stuart MacDonald wrote:
> > I always thought the whole point of low_latency was to make the
> > receive-path very fast, which means specifically allowing the flip
> > routine to run from the ISR. So checking for calling from the ISR and
> > specifically disallowing that is basically negating the entire raison
> > d'etre for low_latency.
>
> I thought it was to speed processing if the
> caller was already in process context. Maybe the
> real intentions are lost to history.

Best person to ask may be Ted; he was once the serial maintainer. Ted?

> Moving forward, Alan stated that the flip
> routine should not be called in interrupt context.
> His last post concerning some transient state
> of low_latency has confused me.

I didn't follow that either, but I wasn't reading too closely.

> Currently, with the 8250 driver and N_TTY
> line discipline, calling the flip routine from
> ISR causes an SMP deadlock. There are two paths that
> cause this:
> 1. low_latency is set
> 2. flip buffer becomes full
>
> So calling the flip routine from the ISR may work
> with some specific drivers, but it would be
> dangerous to assume this works in all cases.

I haven't looked at the 2.6 serial rewrite in depth yet, but the
problem always existed in the 2.4 driver. I got around the problem by
checking for interrupt context and taking the locks or not at a much
earlier stage.

..Stu
http://www.connecttech.com

2004-11-01 15:45:40

by Russell King

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Mon, Nov 01, 2004 at 08:28:35AM -0600, [email protected] wrote:
> > Ok, could you check whether this patch automatically detects
> > the serial port please?
>
> Yes, other than fixing a couple typos:
> uart_offest -> uart_offset
> PCI_ID_ANY -> PCI_ANY_ID

Thanks for testing - I'll be adding this to mainline kernels.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-11-01 16:13:17

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

> Thanks for testing - I'll be adding this to mainline kernels.
Thanks Russell.
I'd be glad to help by testing any further low_latency related patches
also.
Tim

2004-11-02 00:23:01

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel

On Mon, 2004-11-01 at 17:02, Alan Cox wrote:
> On Llu, 2004-11-01 at 14:10, Paul Fulghum wrote:
> > His last post concerning some transient state
> > of low_latency has confused me.
>
> You were correct about that

What? That I'm easily confused?
*snicker*

--
Paul Fulghum
[email protected]


2004-11-02 00:12:17

by Alan

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel

On Llu, 2004-11-01 at 14:10, Paul Fulghum wrote:
> I thought it was to speed processing if the
> caller was already in process context. Maybe the
> real intentions are lost to history.

It was added way back by Ted to improve performance when dealing with
low latency requirements for I/O.

> Moving forward, Alan stated that the flip
> routine should not be called in interrupt context.
> His last post concerning some transient state
> of low_latency has confused me.

You were correct about that


2005-01-06 14:58:10

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

sorry for the huge delay since my last post on this, but disabling
low_latency is resulting in dropped characters.

this looks to be exactly what was reported in
http://www.uwsg.iu.edu/hypermail/linux/kernel/0212.0/0412.html

anything i can do to avoid dropping characters without using
low_latency, which still hangs SMP kernels?
thanks,
tim
> -----Original Message-----
> From: Murphy, Tim T
> Sent: Monday, November 01, 2004 10:07 AM
> To: 'Russell King'
> Cc: [email protected]
> Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
>
>
> > Thanks for testing - I'll be adding this to mainline kernels.
> Thanks Russell.
> I'd be glad to help by testing any further low_latency
> related patches also.
> Tim
>

2005-01-06 22:50:28

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel


> anything i can do to avoid dropping characters without using
> low_latency, which still hangs SMP kernels?

this patch fixes the problem for me, but it's probably an awful hack -- a
brief interrupt storm occurs until tty processes its buffer, but IMHO
that's better than dropping characters.

is there a better alternative?
thanks,
tim

--- 8250-orig.c 2005-01-06 16:25:24.000000000 -0600
+++ 8250.c 2005-01-06 16:27:21.000000000 -0600
@@ -989,8 +989,10 @@
if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
if(tty->low_latency)
tty_flip_buffer_push(tty);
- /* If this failed then we will throw away the
- bytes but must do so to clear interrupts */
+ else
+ break;
+ /* If this failed then we will just leave now
+ rather than dropping bytes (interrupts not cleared) */
}
ch = serial_inp(up, UART_RX);
flag = TTY_NORMAL;

2005-01-06 23:58:50

by Tim_T_Murphy

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel


> this patch fixes the problem for me, but it's probably an awful hack --
> a brief interrupt storm occurs until tty processes its buffer,
> but IMHO that's better than dropping characters.

sorry, i see now that it's not an interrupt storm; rather, the
interrupt handler doesn't end until it quits due to 'too much work'.

tim

2005-01-07 00:24:49

by Alan

Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Iau, 2005-01-06 at 22:47, [email protected] wrote:
> > anything i can do to avoid dropping characters without using
> > low_latency, which still hangs SMP kernels?
>
> this patch fixes the problem for me, but it's probably an awful hack -- a
> brief interrupt storm occurs until tty processes its buffer, but IMHO
> that's better than dropping characters.

On a PCI device you may never get to process the buffer if you do that.
2.6.10 throws away the other bytes carefully and clears the IRQ.

Presumably this is a device with a fake 8250 that produces sudden large
bursts of data ? If so then for now you -need- to set low_latency and
should probably do it by the PCI vendor subid/device id. The problem is
that the serial layer expects serial data arriving at serial speeds. It
completely breaks down when it hits an emulation of a generic uart that
suddenly receives 32Kbytes of data at ethernet speed.

The longer term fix for this is when the flip buffers go away, and the
same problem gets cleaned up for things like mainframes and some of the
high performance DMA devices. Until then just set low_latency and
comment it as "not your fault" 8)

Alan

2005-01-07 00:51:54

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

Alan Cox wrote:
> On Iau, 2005-01-06 at 22:47, [email protected] wrote:
>
>>>anything i can do to avoid dropping characters without using
>>>low_latency, which still hangs SMP kernels?
>>
>>this patch fixes the problem for me, but it's probably an awful hack -- a
>>brief interrupt storm occurs until tty processes its buffer, but IMHO
>>that's better than dropping characters.
>
> Presumably this is a device with a fake 8250 that produces sudden large
> bursts of data ? If so then for now you -need- to set low_latency and
> should probably do it by the PCI vendor subid/device id. The problem is
> that the serial layer expects serial data arriving at serial speeds. It
> completely breaks down when it hits an emulation of a generic uart that
> suddenly receives 32Kbytes of data at ethernet speed.
>
> The longer term fix for this is when the flip buffers go away, and the
> same problem gets cleaned up for things like mainframes and some of the
> high performance DMA devices. Until then just set low_latency and
> comment it as "not your fault" 8)

IIRC that guarantees a deadlock on SMP due to the
generic serial layer trying to grab a spinlock
that is already held. (Which prompted the original
bug report by Tim several months ago)

Perhaps the FIFO trigger threshold for this
specific device can be altered
to try and smooth the amount of data dumped
per IRQ.

--
Paul Fulghum
[email protected]

2005-01-07 03:00:48

by Alan

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

On Gwe, 2005-01-07 at 00:43, Paul Fulghum wrote:
> IIRC that guarantees a deadlock on SMP due to the
> generic serial layer trying to grab a spinlock
> that is already held. (Which prompted the original
> bug report by Tim several months ago)

I fixed the tty locking issues with that. If there are any left they
should be solely in the serial generic code and I've no idea there

2005-01-07 14:09:25

by Paul Fulghum

Subject: Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel

Alan Cox wrote:
> On Gwe, 2005-01-07 at 00:43, Paul Fulghum wrote:
>
>>IIRC that guarantees a deadlock on SMP due to the
>>generic serial layer trying to grab a spinlock
>>that is already held. (Which prompted the original
>>bug report by Tim several months ago)
>
>
> I fixed the tty locking issues with that. If there are any left they
> should be solely in the serial generic code and I've no idea there

Yes, that is where the locking problems were.
When I last looked at it the problem call path was:

serial8250_interrupt();
  spin_lock(port->lock);
  serial8250_handle_port();
    receive_chars();
      flip.work.func(); /* if FLIP buffer full or low_latency set */
        ldisc->receive_buf(); /* N_TTY */
          tty->driver->flush_chars();
            uart_start();
              spin_lock(port->lock); *BANG*

--
Paul Fulghum
Microgate Systems, Ltd