2008-06-23 23:30:17

by Anton Vorontsov

Subject: [PATCH -rt] serial: 8250: fix shared interrupts issues under RT kernels

8250's initialization routines rely on the fact that an _irqsave spinlock
disables local hardirqs, so that the driver can issue IRQ-triggering
operations before registering the port in the IRQ chain.

With RT kernels and preemptable hardirqs this isn't true: an _irqsave
spinlock does not disable local hardirqs, which causes the following
trace:

$ cat /dev/ttyS1
irq 42: nobody cared (try booting with the "irqpoll" option)
Call Trace:
[C0475EB0] [C0008A98] show_stack+0x4c/0x1ac (unreliable)
[C0475EF0] [C004BBD4] __report_bad_irq+0x34/0xb8
[C0475F10] [C004BD38] note_interrupt+0xe0/0x308
[C0475F50] [C004B09C] thread_simple_irq+0xdc/0x104
[C0475F70] [C004B3FC] do_irqd+0x338/0x3c8
[C0475FC0] [C00398E0] kthread+0xf8/0x100
[C0475FF0] [C0011FE0] original_kernel_thread+0x44/0x60
handlers:
[<c02112c4>] (serial8250_interrupt+0x0/0x138)
Disabling IRQ #42

After this, all serial ports on the given IRQ are non-functional.

To fix the issue we should explicitly disable the shared IRQ before
issuing any IRQ-triggering operations.

I also changed spin_lock_irqsave to the ordinary spin_lock, since it
seems to be safe: the chain does not contain the new port (yet), thus
nobody will interfere with us from the ISRs.

Signed-off-by: Anton Vorontsov <[email protected]>
---
drivers/serial/8250.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
index 76ccef7..702e0d3 100644
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -1831,7 +1831,9 @@ static int serial8250_startup(struct uart_port *port)
 		 * the interrupt is enabled. Delays are necessary to
 		 * allow register changes to become visible.
 		 */
-		spin_lock_irqsave(&up->port.lock, flags);
+		spin_lock(&up->port.lock);
+		if (up->port.flags & UPF_SHARE_IRQ)
+			disable_irq(up->port.irq);

 		wait_for_xmitr(up, UART_LSR_THRE);
 		serial_out_sync(up, UART_IER, UART_IER_THRI);
@@ -1843,7 +1845,9 @@ static int serial8250_startup(struct uart_port *port)
 		iir = serial_in(up, UART_IIR);
 		serial_out(up, UART_IER, 0);

-		spin_unlock_irqrestore(&up->port.lock, flags);
+		if (up->port.flags & UPF_SHARE_IRQ)
+			enable_irq(up->port.irq);
+		spin_unlock(&up->port.lock);

 		/*
 		 * If the interrupt is not reasserted, setup a timer to
--
1.5.5.4
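
Editorial sketch: the "nobody cared" report in the trace above is produced by the kernel's accounting of interrupts that no handler claimed (note_interrupt() in the call trace). The toy model below, in plain userspace C, shows the general idea: count the interrupts nobody claimed and shut the line down once nearly a whole window of them went unhandled. The struct, the toy_note_interrupt() name, and the thresholds (a window of 100000 interrupts with more than 99900 unhandled) are assumptions recalled from kernel/irq/spurious.c of that era, not something stated in this thread, and the -rt tree may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-IRQ descriptor; not the kernel's struct. */
struct toy_irq_desc {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;               /* the line has been shut down */
};

/* Assumed thresholds; see the paragraph above. */
#define TOY_WINDOW        100000u
#define TOY_UNHANDLED_MAX  99900u

/*
 * Account for one interrupt. `handled` is false when every handler on the
 * shared line said the interrupt was not theirs. Returns true if this call
 * disabled the IRQ.
 */
static bool toy_note_interrupt(struct toy_irq_desc *desc, bool handled)
{
	bool stuck;

	if (desc->disabled)
		return false;
	if (!handled)
		desc->irqs_unhandled++;
	if (++desc->irq_count < TOY_WINDOW)
		return false;

	/* Window complete: decide, then start a new window. */
	stuck = desc->irqs_unhandled > TOY_UNHANDLED_MAX;
	desc->irq_count = 0;
	desc->irqs_unhandled = 0;
	if (stuck)
		desc->disabled = true; /* corresponds to "Disabling IRQ #42" */
	return stuck;
}
```

In the scenario reported here, the unclaimed interrupts would be the ones raised by the port that is still starting up, which the handler for the already registered ports does not recognise. Once the line is disabled, every port sharing it stops receiving interrupts.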


2008-06-24 00:12:52

by Ingo Molnar

Subject: Re: [PATCH -rt] serial: 8250: fix shared interrupts issues under RT kernels


* Anton Vorontsov <[email protected]> wrote:

> 8250's initialization routines rely on the fact that an _irqsave
> spinlock disables local hardirqs, so that the driver can issue
> IRQ-triggering operations before registering the port in the IRQ
> chain.
>
> With RT kernels and preemptable hardirqs this isn't true: an _irqsave
> spinlock does not disable local hardirqs, which causes the following
> trace:

again, please let the -rt maintainers sort out which patches need to be
propagated to upstream maintainers.

Ingo

2008-07-01 13:43:55

by Anton Vorontsov

Subject: [PATCH] serial: 8250: fix shared interrupts issues with SMP and RT kernels

With SMP kernels an _irqsave spinlock disables only local interrupts, while
the shared serial interrupt could be assigned to a CPU other than the one
currently starting up the serial port.

This might cause issues because the serial8250_startup() routine issues
IRQ-triggering operations before registering the port in the IRQ chain
(this in itself is fine and is done deliberately, because we don't want to
process any interrupts during port startup).

With RT kernels and preemptable hardirqs, an _irqsave spinlock does not
disable local hardirqs, and the bug can be reproduced much more easily:

$ cat /dev/ttyS0 &
$ cat /dev/ttyS1
irq 42: nobody cared (try booting with the "irqpoll" option)
Call Trace:
[C0475EB0] [C0008A98] show_stack+0x4c/0x1ac (unreliable)
[C0475EF0] [C004BBD4] __report_bad_irq+0x34/0xb8
[C0475F10] [C004BD38] note_interrupt+0xe0/0x308
[C0475F50] [C004B09C] thread_simple_irq+0xdc/0x104
[C0475F70] [C004B3FC] do_irqd+0x338/0x3c8
[C0475FC0] [C00398E0] kthread+0xf8/0x100
[C0475FF0] [C0011FE0] original_kernel_thread+0x44/0x60
handlers:
[<c02112c4>] (serial8250_interrupt+0x0/0x138)
Disabling IRQ #42

After this, all serial ports on the given IRQ are non-functional.

To fix the issue we should explicitly disable the shared IRQ before
issuing any IRQ-triggering operations.

I also changed spin_lock_irqsave to the ordinary spin_lock, since it
seems to be safe: the chain does not contain the new port (yet), thus
nobody will interfere with us from the ISRs.

Signed-off-by: Anton Vorontsov <[email protected]>
---

On Tue, Jun 24, 2008 at 02:12:21AM +0200, Ingo Molnar wrote:
>
> * Anton Vorontsov <[email protected]> wrote:
>
> > 8250's initialization routines rely on the fact that an _irqsave
> > spinlock disables local hardirqs, so that the driver can issue
> > IRQ-triggering operations before registering the port in the IRQ
> > chain.
> >
> > With RT kernels and preemptable hardirqs this isn't true: an _irqsave
> > spinlock does not disable local hardirqs, which causes the following
> > trace:
>
> again, please let the -rt maintainers sort out which patches need to be
> propagated to upstream maintainers.

This appears not to be only an RT issue, though. In theory, it can be
triggered on SMP as well. Thanks to Daniel Walker for pointing this out.

drivers/serial/8250.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
index 76ccef7..702e0d3 100644
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -1831,7 +1831,9 @@ static int serial8250_startup(struct uart_port *port)
 		 * the interrupt is enabled. Delays are necessary to
 		 * allow register changes to become visible.
 		 */
-		spin_lock_irqsave(&up->port.lock, flags);
+		spin_lock(&up->port.lock);
+		if (up->port.flags & UPF_SHARE_IRQ)
+			disable_irq(up->port.irq);

 		wait_for_xmitr(up, UART_LSR_THRE);
 		serial_out_sync(up, UART_IER, UART_IER_THRI);
@@ -1843,7 +1845,9 @@ static int serial8250_startup(struct uart_port *port)
 		iir = serial_in(up, UART_IIR);
 		serial_out(up, UART_IER, 0);

-		spin_unlock_irqrestore(&up->port.lock, flags);
+		if (up->port.flags & UPF_SHARE_IRQ)
+			enable_irq(up->port.irq);
+		spin_unlock(&up->port.lock);

 		/*
 		 * If the interrupt is not reasserted, setup a timer to
--
1.5.5.4

2008-07-01 14:08:59

by Alan

Subject: Re: [PATCH] serial: 8250: fix shared interrupts issues with SMP and RT kernels

> > again, please let the -rt maintainers sort out which patches need to be
> > propagated to upstream maintainers.
>
> This appears to be not only RT issue though. In theory, this can be

Agreed - RT is showing up a real bug here.

> triggered on SMP also. Thanks to Daniel Walker for pointing this out.

It looks correct to me except that you cannot use spin_lock/disable_irq
in that way safely. You must always disable_irq before taking the lock,
or prove it is safe and use disable_irq_nosync.

The reason:
CPU#0 spin_lock_... [taken]
CPU#1 IRQ
CPU#1 spin_lock [waits]
CPU#0 disable_irq (deadlock)

Note that it is also not generally safe to do

disable IRQ on device
spin_lock
disable_irq

because IRQ propagation occurs asynchronously to PCI bus traffic even on
PC class systems (especially Pentium-PII era boxes with SMP). You can
disable the device IRQ and still have an IRQ 'in flight' that arrives
afterwards.

So the fix needs some reworking in its ordering, I think.

Alan
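
Editorial sketch: the interleaving described above can be checked mechanically with a small userspace model that tries every ordering of two CPUs. This is not kernel code. The operation names, the state struct and the three CPU#0 "programs" are hypothetical. disable_irq() is modelled as masking the line and then waiting for a running handler to finish, and the handler on CPU#1 is assumed to take the same lock as CPU#0, which is the general case the warning is about.

```c
#include <stdbool.h>

/* Toy operations; hypothetical names, not kernel APIs. */
enum op {
	LOCK,    /* spin_lock: can proceed only while the lock is free  */
	UNLOCK,  /* spin_unlock                                         */
	MASK,    /* disable_irq_nosync: mask the line, do not wait      */
	SYNC,    /* wait until no handler for the line is running       */
	ENABLE,  /* enable_irq                                          */
	H_ENTER, /* handler starts: only possible while unmasked        */
	H_EXIT,  /* handler returns                                     */
	END
};

struct state {
	int pc[2];            /* next op index for CPU#0 and CPU#1 */
	bool lock_held;
	bool irq_masked;
	bool handler_running;
};

static bool can_step(const struct state *s, enum op op)
{
	switch (op) {
	case LOCK:    return !s->lock_held;
	case SYNC:    return !s->handler_running;
	case H_ENTER: return !s->irq_masked;
	case END:     return false;
	default:      return true;
	}
}

static void apply(struct state *s, enum op op)
{
	switch (op) {
	case LOCK:    s->lock_held = true;        break;
	case UNLOCK:  s->lock_held = false;       break;
	case MASK:    s->irq_masked = true;       break;
	case ENABLE:  s->irq_masked = false;      break;
	case H_ENTER: s->handler_running = true;  break;
	case H_EXIT:  s->handler_running = false; break;
	default:      break; /* SYNC only waits; END does nothing */
	}
}

/* Depth-first search over all interleavings; true if any of them gets stuck. */
static bool can_deadlock(struct state s, const enum op *cpu0, const enum op *cpu1)
{
	const enum op *prog[2] = { cpu0, cpu1 };
	bool progressed = false;

	for (int cpu = 0; cpu < 2; cpu++) {
		enum op op = prog[cpu][s.pc[cpu]];
		struct state next = s;

		if (!can_step(&s, op))
			continue;
		apply(&next, op);
		next.pc[cpu]++;
		progressed = true;
		if (can_deadlock(next, cpu0, cpu1))
			return true;
	}
	/* Stuck: nobody can move, yet at least one CPU has not finished. */
	return !progressed && (cpu0[s.pc[0]] != END || cpu1[s.pc[1]] != END);
}

static bool deadlock_possible(const enum op *cpu0, const enum op *cpu1)
{
	struct state start = { { 0, 0 }, false, false, false };
	return can_deadlock(start, cpu0, cpu1);
}

/* CPU#1: an interrupt handler that takes the same lock. */
static const enum op handler_with_lock[] = { H_ENTER, LOCK, UNLOCK, H_EXIT, END };

/* CPU#0 variants. */
static const enum op lock_then_disable_irq[] = { LOCK, MASK, SYNC, ENABLE, UNLOCK, END };
static const enum op disable_irq_then_lock[] = { MASK, SYNC, LOCK, UNLOCK, ENABLE, END };
static const enum op lock_then_nosync[]      = { LOCK, MASK, ENABLE, UNLOCK, END };
```

Under these assumptions, deadlock_possible(lock_then_disable_irq, handler_with_lock) finds the stuck interleaving listed above, while the disable_irq_then_lock and lock_then_nosync variants never get stuck. The model says nothing about whether a handler running concurrently is otherwise safe; it only checks for the wait cycle.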

2008-07-01 15:03:14

by Anton Vorontsov

Subject: [PATCH v2] serial: 8250: fix shared interrupts issues with SMP and RT kernels

With SMP kernels an _irqsave spinlock disables only local interrupts, while
the shared serial interrupt could be assigned to a CPU other than the one
currently starting up the serial port.

This might cause issues because the serial8250_startup() routine issues
IRQ-triggering operations before registering the port in the IRQ chain
(this in itself is fine and is done deliberately, because we don't want to
process any interrupts during port startup).

With RT kernels and preemptable hardirqs, an _irqsave spinlock does not
disable local hardirqs, and the bug can be reproduced much more easily:

$ cat /dev/ttyS0 &
$ cat /dev/ttyS1
irq 42: nobody cared (try booting with the "irqpoll" option)
Call Trace:
[C0475EB0] [C0008A98] show_stack+0x4c/0x1ac (unreliable)
[C0475EF0] [C004BBD4] __report_bad_irq+0x34/0xb8
[C0475F10] [C004BD38] note_interrupt+0xe0/0x308
[C0475F50] [C004B09C] thread_simple_irq+0xdc/0x104
[C0475F70] [C004B3FC] do_irqd+0x338/0x3c8
[C0475FC0] [C00398E0] kthread+0xf8/0x100
[C0475FF0] [C0011FE0] original_kernel_thread+0x44/0x60
handlers:
[<c02112c4>] (serial8250_interrupt+0x0/0x138)
Disabling IRQ #42

After this, all serial ports on the given IRQ are non-functional.

To fix the issue we should explicitly disable the shared IRQ before
issuing any IRQ-triggering operations.

I also changed spin_lock_irqsave to the ordinary spin_lock, since it
seems to be safe: the chain does not contain the new port (yet), thus
nobody will interfere with us from the ISRs.

Signed-off-by: Anton Vorontsov <[email protected]>
---

On Tue, Jul 01, 2008 at 02:43:53PM +0100, Alan Cox wrote:
> > > again, please let the -rt maintainers sort out which patches need to be
> > > propagated to upstream maintainers.
> >
> > This appears to be not only RT issue though. In theory, this can be
>
> Agreed - RT is showing up a real bug here.
>
> > triggered on SMP also. Thanks to Daniel Walker for pointing this out.
>
> It looks correct to me except that you cannot use spin_lock/disable_irq
> in that way safely. You must always disable_irq before taking the lock,
> or prove it is safe and use disable_irq_nosync
>
> The reason:
> CPU#0 spin_lock_... [taken]
> CPU#1 IRQ
> CPU#1 spin_lock [waits]
> CPU#0 disable_irq (deadlock)

This deadlock possibility is interesting in itself, thanks for
mentioning it.

But it can't happen here. The IRQ handler will not grab up->port.lock,
because the port isn't registered in the 8250 IRQ handling chain (yet).

As for _nosync, it is probably a good idea indeed, and should be safe
AFAICS.

drivers/serial/8250.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
index 76ccef7..cad0c2d 100644
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -1831,7 +1831,9 @@ static int serial8250_startup(struct uart_port *port)
 		 * the interrupt is enabled. Delays are necessary to
 		 * allow register changes to become visible.
 		 */
-		spin_lock_irqsave(&up->port.lock, flags);
+		spin_lock(&up->port.lock);
+		if (up->port.flags & UPF_SHARE_IRQ)
+			disable_irq_nosync(up->port.irq);

 		wait_for_xmitr(up, UART_LSR_THRE);
 		serial_out_sync(up, UART_IER, UART_IER_THRI);
@@ -1843,7 +1845,9 @@ static int serial8250_startup(struct uart_port *port)
 		iir = serial_in(up, UART_IIR);
 		serial_out(up, UART_IER, 0);

-		spin_unlock_irqrestore(&up->port.lock, flags);
+		if (up->port.flags & UPF_SHARE_IRQ)
+			enable_irq(up->port.irq);
+		spin_unlock(&up->port.lock);

 		/*
 		 * If the interrupt is not reasserted, setup a timer to
--
1.5.5.4

2008-07-09 19:29:59

by Anton Vorontsov

Subject: Re: [PATCH v2] serial: 8250: fix shared interrupts issues with SMP and RT kernels

On Tue, Jul 01, 2008 at 07:02:54PM +0400, Anton Vorontsov wrote:
[...]
> Signed-off-by: Anton Vorontsov <[email protected]>
> ---
>
> On Tue, Jul 01, 2008 at 02:43:53PM +0100, Alan Cox wrote:
[...]
> > It looks correct to me except that you cannot use spin_lock/disable_irq
> > in that way safely. You must always disable_irq before taking the lock,
> > or prove it is safe and use disable_irq_nosync
> >
> > The reason:
> > CPU#0 spin_lock_... [taken]
> > CPU#1 IRQ
> > CPU#1 spin_lock [waits]
> > CPU#0 disable_irq (deadlock)
>
> This deadlock possibility is interesting by itself, thanks for
> mentioning it.
>
> But this can't happen here. IRQ will not grab the up->port.lock,
> because port isn't registered in the 8250 IRQ handling chain (yet).

Alan, are there any issues with the proof or the patch itself?

Thanks,

> As for _nosync, probably this is good idea indeed, and should be safe
> AFAICS.
>
> drivers/serial/8250.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
> index 76ccef7..cad0c2d 100644
> --- a/drivers/serial/8250.c
> +++ b/drivers/serial/8250.c
> @@ -1831,7 +1831,9 @@ static int serial8250_startup(struct uart_port *port)
>  		 * the interrupt is enabled. Delays are necessary to
>  		 * allow register changes to become visible.
>  		 */
> -		spin_lock_irqsave(&up->port.lock, flags);
> +		spin_lock(&up->port.lock);
> +		if (up->port.flags & UPF_SHARE_IRQ)
> +			disable_irq_nosync(up->port.irq);
>
>  		wait_for_xmitr(up, UART_LSR_THRE);
>  		serial_out_sync(up, UART_IER, UART_IER_THRI);
> @@ -1843,7 +1845,9 @@ static int serial8250_startup(struct uart_port *port)
>  		iir = serial_in(up, UART_IIR);
>  		serial_out(up, UART_IER, 0);
>
> -		spin_unlock_irqrestore(&up->port.lock, flags);
> +		if (up->port.flags & UPF_SHARE_IRQ)
> +			enable_irq(up->port.irq);
> +		spin_unlock(&up->port.lock);
>
>  		/*
>  		 * If the interrupt is not reasserted, setup a timer to
> --
> 1.5.5.4
>

--
Anton Vorontsov
email: [email protected]
irc://irc.freenode.net/bd2

2008-07-09 19:57:31

by Alan

Subject: Re: [PATCH v2] serial: 8250: fix shared interrupts issues with SMP and RT kernels

> > But this can't happen here. IRQ will not grab the up->port.lock,
> > because port isn't registered in the 8250 IRQ handling chain (yet).
>
> Alan, are there any issues with the proof or the patch itself?

The logic looks fine, the analysis looks fine.

Alan
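
Editorial sketch of the argument accepted above: the shared handler only walks ports that are already linked into the per-IRQ chain, so a port that is still starting up cannot have its lock taken from interrupt context. Everything here (struct toy_port, struct toy_irq_chain, toy_irq_handler(), and the counter standing in for a spinlock) is hypothetical and much simpler than the real driver.

```c
#include <stddef.h>

/* Hypothetical, simplified port; the counter stands in for up->port.lock. */
struct toy_port {
	const char *name;
	unsigned int lock_acquisitions; /* how often the "lock" was taken */
	struct toy_port *next;          /* next port sharing the same IRQ */
};

/* Hypothetical per-IRQ chain of registered ports. */
struct toy_irq_chain {
	struct toy_port *head;
};

/* Register a port on the chain; before this call the handler cannot see it. */
static void toy_chain_add(struct toy_irq_chain *chain, struct toy_port *port)
{
	port->next = chain->head;
	chain->head = port;
}

/*
 * Shared interrupt handler: visits only the ports linked into the chain and
 * "locks" each one while servicing it. Returns the number of ports visited.
 */
static unsigned int toy_irq_handler(struct toy_irq_chain *chain)
{
	unsigned int visited = 0;

	for (struct toy_port *p = chain->head; p != NULL; p = p->next) {
		p->lock_acquisitions++; /* stands in for lock, service, unlock */
		visited++;
	}
	return visited;
}
```

With one port on the chain and a second one still in startup, a call to toy_irq_handler() touches only the first port's lock. The second port's lock is reachable from the handler only after toy_chain_add() has linked it, which in the patch happens after the startup sequence has released the lock.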