2018-09-12 14:54:36

by Phil Elwell

[permalink] [raw]
Subject: [PATCH 0/2] sc16is7xx interrupt fixes

The interrupt handling of the sc16is7xx driver is broken in a number of ways,
as observed by multiple Raspberry Pi users. The attached patches attempt to
address its failings, with apparent success.

The first is a workaround for a side-effect of the switch away from using a
thread IRQ, a change which has necessitated using what is actually a
level-triggered interrupt as if it were edge-triggered. Doing so is fraught
with potential race conditions, but the patch makes them much less likely.

The second is a workaround for a bug in the design of the SC16IS752 which
requires mutual exclusion between the interrupt handler and access to the
Enhanced Features Register.

Phil Elwell (2):
sc16is7xx: Fix for multi-channel stall
sc16is7xx: Fix for "Unexpected interrupt: 8"

drivers/tty/serial/sc16is7xx.c | 50 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 44 insertions(+), 6 deletions(-)

--
2.7.4



2018-09-12 14:54:45

by Phil Elwell

[permalink] [raw]
Subject: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

The SC16IS752 is a dual-channel device. The two channels are largely
independent, but the IRQ signals are wired together as an open-drain,
active low signal which will be driven low while either of the
channels requires attention, which can be for significant periods of
time until operations complete and the interrupt can be acknowledged.
In that respect it is should be treated as a true level-sensitive IRQ.

The kernel, however, needs to be able to exit interrupt context in
order to use I2C or SPI to access the device registers (which may
involve sleeping). Therefore the interrupt needs to be masked out or
paused in some way.

The usual way to manage sleeping from within an interrupt handler
is to use a threaded interrupt handler - a regular interrupt routine
does the minimum amount of work needed to triage the interrupt before
waking the interrupt service thread. If the threaded IRQ is marked as
IRQF_ONESHOT the kernel will automatically mask out the interrupt
until the thread runs to completion. The sc16is7xx driver used to
use a threaded IRQ, but a patch switched to using a kthread_worker
in order to set realtime priorities on the handler thread and for
other optimisations. The end result is non-threaded IRQ that
schedules some work then returns IRQ_HANDLED, making the kernel
think that all IRQ processing has completed.

The work-around to prevent a constant stream of interrupts is to
mark the interrupt as edge-sensitive rather than level-sensitive,
but interpreting an active-low source as a falling-edge source
requires care to prevent a total cessation of interrupts. Whereas
an edge-triggering source will generate a new edge for every interrupt
condition a level-triggering source will keep the signal at the
interrupting level until it no longer requires attention; in other
words, the host won't see another edge until all interrupt conditions
are cleared. It is therefore vital that the interrupt handler does not
exit with an outstanding interrupt condition, otherwise the kernel
will not receive another interrupt unless some other operation causes
the interrupt state on the device to be cleared.

The existing sc16is7xx driver has a very simple interrupt "thread"
(kthread_work job) that processes interrupts on each channel in turn
until there are no more. If both channels are active and the first
channel starts interrupting while the handler for the second channel
is running then it will not be detected and an IRQ stall ensues. This
could be handled easily if there was a shared IRQ status register, or
a convenient way to determine if the IRQ had been deasserted for any
length of time, but both appear to be lacking.

Avoid this problem (or at least make it much less likely to happen)
by reducing the granularity of per-channel interrupt processing
to one condition per iteration, only exiting the overall loop when
both channels are no longer interrupting.

Signed-off-by: Phil Elwell <[email protected]>
---
drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 243c960..47b4115 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -657,7 +657,7 @@ static void sc16is7xx_handle_tx(struct uart_port *port)
uart_write_wakeup(port);
}

-static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
+static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
{
struct uart_port *port = &s->p[portno].port;

@@ -666,7 +666,7 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)

iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG);
if (iir & SC16IS7XX_IIR_NO_INT_BIT)
- break;
+ return false;

iir &= SC16IS7XX_IIR_ID_MASK;

@@ -688,16 +688,23 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
port->line, iir);
break;
}
- } while (1);
+ } while (0);
+ return true;
}

static void sc16is7xx_ist(struct kthread_work *ws)
{
struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);
- int i;

- for (i = 0; i < s->devtype->nr_uart; ++i)
- sc16is7xx_port_irq(s, i);
+ while (1) {
+ bool keep_polling = false;
+ int i;
+
+ for (i = 0; i < s->devtype->nr_uart; ++i)
+ keep_polling |= sc16is7xx_port_irq(s, i);
+ if (!keep_polling)
+ break;
+ }
}

static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
--
2.7.4


2018-09-12 14:54:49

by Phil Elwell

[permalink] [raw]
Subject: [PATCH 2/2] sc16is7xx: Fix for "Unexpected interrupt: 8"

The SC16IS752 has an Enhanced Feature Register which is aliased at the
same address as the Interrupt Identification Register; accessing it
requires that a magic value is written to the Line Configuration
Register. If an interrupt is raised while the EFR is mapped in then
the ISR won't be able to access the IIR, leading to the "Unexpected
interrupt" error messages.

Avoid the problem by claiming a mutex around accesses to the EFR
register, also claiming the mutex in the interrupt handler work
item (this is equivalent to disabling interrupts to interlock against
a non-threaded interrupt handler).

See: https://github.com/raspberrypi/linux/issues/2529

Signed-off-by: Phil Elwell <[email protected]>
---
drivers/tty/serial/sc16is7xx.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 47b4115..2680986 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -328,6 +328,7 @@ struct sc16is7xx_port {
struct kthread_worker kworker;
struct task_struct *kworker_task;
struct kthread_work irq_work;
+ struct mutex efr_lock;
struct sc16is7xx_one p[0];
};

@@ -499,6 +500,21 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
div /= 4;
}

+ /* In an amazing feat of design, the Enhanced Features Register shares
+ * the address of the Interrupt Identification Register, and is
+ * switched in by writing a magic value (0xbf) to the Line Control
+ * Register. Any interrupt firing during this time will see the EFR
+ * where it expects the IIR to be, leading to "Unexpected interrupt"
+ * messages.
+ *
+ * Prevent this possibility by claiming a mutex while accessing the
+ * EFR, and claiming the same mutex from within the interrupt handler.
+ * This is similar to disabling the interrupt, but that doesn't work
+ * because the bulk of the interrupt processing is run as a workqueue
+ * job in thread context.
+ */
+ mutex_lock(&s->efr_lock);
+
lcr = sc16is7xx_port_read(port, SC16IS7XX_LCR_REG);

/* Open the LCR divisors for configuration */
@@ -514,6 +530,8 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
/* Put LCR back to the normal mode */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);

+ mutex_unlock(&s->efr_lock);
+
sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
SC16IS7XX_MCR_CLKSEL_BIT,
prescaler);
@@ -696,6 +714,8 @@ static void sc16is7xx_ist(struct kthread_work *ws)
{
struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);

+ mutex_lock(&s->efr_lock);
+
while (1) {
bool keep_polling = false;
int i;
@@ -705,6 +725,8 @@ static void sc16is7xx_ist(struct kthread_work *ws)
if (!keep_polling)
break;
}
+
+ mutex_unlock(&s->efr_lock);
}

static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
@@ -899,6 +921,9 @@ static void sc16is7xx_set_termios(struct uart_port *port,
if (!(termios->c_cflag & CREAD))
port->ignore_status_mask |= SC16IS7XX_LSR_BRK_ERROR_MASK;

+ /* As above, claim the mutex while accessing the EFR. */
+ mutex_lock(&s->efr_lock);
+
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
SC16IS7XX_LCR_CONF_MODE_B);

@@ -920,6 +945,8 @@ static void sc16is7xx_set_termios(struct uart_port *port,
/* Update LCR register */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);

+ mutex_unlock(&s->efr_lock);
+
/* Get baud rate generator configuration */
baud = uart_get_baud_rate(port, termios, old,
port->uartclk / 16 / 4 / 0xffff,
@@ -1185,6 +1212,7 @@ static int sc16is7xx_probe(struct device *dev,
s->regmap = regmap;
s->devtype = devtype;
dev_set_drvdata(dev, s);
+ mutex_init(&s->efr_lock);

kthread_init_worker(&s->kworker);
kthread_init_work(&s->irq_work, sc16is7xx_ist);
--
2.7.4


2018-09-18 13:04:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

On Wed, Sep 12, 2018 at 03:31:55PM +0100, Phil Elwell wrote:
> The SC16IS752 is a dual-channel device. The two channels are largely
> independent, but the IRQ signals are wired together as an open-drain,
> active low signal which will be driven low while either of the
> channels requires attention, which can be for significant periods of
> time until operations complete and the interrupt can be acknowledged.
> In that respect it is should be treated as a true level-sensitive IRQ.
>
> The kernel, however, needs to be able to exit interrupt context in
> order to use I2C or SPI to access the device registers (which may
> involve sleeping). Therefore the interrupt needs to be masked out or
> paused in some way.
>
> The usual way to manage sleeping from within an interrupt handler
> is to use a threaded interrupt handler - a regular interrupt routine
> does the minimum amount of work needed to triage the interrupt before
> waking the interrupt service thread. If the threaded IRQ is marked as
> IRQF_ONESHOT the kernel will automatically mask out the interrupt
> until the thread runs to completion. The sc16is7xx driver used to
> use a threaded IRQ, but a patch switched to using a kthread_worker
> in order to set realtime priorities on the handler thread and for
> other optimisations. The end result is non-threaded IRQ that
> schedules some work then returns IRQ_HANDLED, making the kernel
> think that all IRQ processing has completed.
>
> The work-around to prevent a constant stream of interrupts is to
> mark the interrupt as edge-sensitive rather than level-sensitive,
> but interpreting an active-low source as a falling-edge source
> requires care to prevent a total cessation of interrupts. Whereas
> an edge-triggering source will generate a new edge for every interrupt
> condition a level-triggering source will keep the signal at the
> interrupting level until it no longer requires attention; in other
> words, the host won't see another edge until all interrupt conditions
> are cleared. It is therefore vital that the interrupt handler does not
> exit with an outstanding interrupt condition, otherwise the kernel
> will not receive another interrupt unless some other operation causes
> the interrupt state on the device to be cleared.
>
> The existing sc16is7xx driver has a very simple interrupt "thread"
> (kthread_work job) that processes interrupts on each channel in turn
> until there are no more. If both channels are active and the first
> channel starts interrupting while the handler for the second channel
> is running then it will not be detected and an IRQ stall ensues. This
> could be handled easily if there was a shared IRQ status register, or
> a convenient way to determine if the IRQ had been deasserted for any
> length of time, but both appear to be lacking.
>
> Avoid this problem (or at least make it much less likely to happen)
> by reducing the granularity of per-channel interrupt processing
> to one condition per iteration, only exiting the overall loop when
> both channels are no longer interrupting.
>
> Signed-off-by: Phil Elwell <[email protected]>
> ---
> drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index 243c960..47b4115 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -657,7 +657,7 @@ static void sc16is7xx_handle_tx(struct uart_port *port)
> uart_write_wakeup(port);
> }
>
> -static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> +static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> {
> struct uart_port *port = &s->p[portno].port;
>
> @@ -666,7 +666,7 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>
> iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG);
> if (iir & SC16IS7XX_IIR_NO_INT_BIT)
> - break;
> + return false;
>
> iir &= SC16IS7XX_IIR_ID_MASK;
>
> @@ -688,16 +688,23 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> port->line, iir);
> break;
> }
> - } while (1);
> + } while (0);
> + return true;
> }
>
> static void sc16is7xx_ist(struct kthread_work *ws)
> {
> struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);
> - int i;
>
> - for (i = 0; i < s->devtype->nr_uart; ++i)
> - sc16is7xx_port_irq(s, i);
> + while (1) {
> + bool keep_polling = false;
> + int i;
> +
> + for (i = 0; i < s->devtype->nr_uart; ++i)
> + keep_polling |= sc16is7xx_port_irq(s, i);
> + if (!keep_polling)
> + break;

This makes me worried, there is no "timeout" now? What happens if this
never happens, will you just sit and spin forever? What prevents that?

thanks,

greg k-h

2018-09-18 13:16:08

by Phil Elwell

[permalink] [raw]
Subject: Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

Hi Greg,

On 18/09/2018 14:02, Greg Kroah-Hartman wrote:
> On Wed, Sep 12, 2018 at 03:31:55PM +0100, Phil Elwell wrote:
>> The SC16IS752 is a dual-channel device. The two channels are largely
>> independent, but the IRQ signals are wired together as an open-drain,
>> active low signal which will be driven low while either of the
>> channels requires attention, which can be for significant periods of
>> time until operations complete and the interrupt can be acknowledged.
>> In that respect it is should be treated as a true level-sensitive IRQ.
>>
>> The kernel, however, needs to be able to exit interrupt context in
>> order to use I2C or SPI to access the device registers (which may
>> involve sleeping). Therefore the interrupt needs to be masked out or
>> paused in some way.
>>
>> The usual way to manage sleeping from within an interrupt handler
>> is to use a threaded interrupt handler - a regular interrupt routine
>> does the minimum amount of work needed to triage the interrupt before
>> waking the interrupt service thread. If the threaded IRQ is marked as
>> IRQF_ONESHOT the kernel will automatically mask out the interrupt
>> until the thread runs to completion. The sc16is7xx driver used to
>> use a threaded IRQ, but a patch switched to using a kthread_worker
>> in order to set realtime priorities on the handler thread and for
>> other optimisations. The end result is non-threaded IRQ that
>> schedules some work then returns IRQ_HANDLED, making the kernel
>> think that all IRQ processing has completed.
>>
>> The work-around to prevent a constant stream of interrupts is to
>> mark the interrupt as edge-sensitive rather than level-sensitive,
>> but interpreting an active-low source as a falling-edge source
>> requires care to prevent a total cessation of interrupts. Whereas
>> an edge-triggering source will generate a new edge for every interrupt
>> condition a level-triggering source will keep the signal at the
>> interrupting level until it no longer requires attention; in other
>> words, the host won't see another edge until all interrupt conditions
>> are cleared. It is therefore vital that the interrupt handler does not
>> exit with an outstanding interrupt condition, otherwise the kernel
>> will not receive another interrupt unless some other operation causes
>> the interrupt state on the device to be cleared.
>>
>> The existing sc16is7xx driver has a very simple interrupt "thread"
>> (kthread_work job) that processes interrupts on each channel in turn
>> until there are no more. If both channels are active and the first
>> channel starts interrupting while the handler for the second channel
>> is running then it will not be detected and an IRQ stall ensues. This
>> could be handled easily if there was a shared IRQ status register, or
>> a convenient way to determine if the IRQ had been deasserted for any
>> length of time, but both appear to be lacking.
>>
>> Avoid this problem (or at least make it much less likely to happen)
>> by reducing the granularity of per-channel interrupt processing
>> to one condition per iteration, only exiting the overall loop when
>> both channels are no longer interrupting.
>>
>> Signed-off-by: Phil Elwell <[email protected]>
>> ---
>> drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
>> index 243c960..47b4115 100644
>> --- a/drivers/tty/serial/sc16is7xx.c
>> +++ b/drivers/tty/serial/sc16is7xx.c
>> @@ -657,7 +657,7 @@ static void sc16is7xx_handle_tx(struct uart_port *port)
>> uart_write_wakeup(port);
>> }
>>
>> -static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>> +static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>> {
>> struct uart_port *port = &s->p[portno].port;
>>
>> @@ -666,7 +666,7 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>>
>> iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG);
>> if (iir & SC16IS7XX_IIR_NO_INT_BIT)
>> - break;
>> + return false;
>>
>> iir &= SC16IS7XX_IIR_ID_MASK;
>>
>> @@ -688,16 +688,23 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>> port->line, iir);
>> break;
>> }
>> - } while (1);
>> + } while (0);
>> + return true;
>> }
>>
>> static void sc16is7xx_ist(struct kthread_work *ws)
>> {
>> struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);
>> - int i;
>>
>> - for (i = 0; i < s->devtype->nr_uart; ++i)
>> - sc16is7xx_port_irq(s, i);
>> + while (1) {
>> + bool keep_polling = false;
>> + int i;
>> +
>> + for (i = 0; i < s->devtype->nr_uart; ++i)
>> + keep_polling |= sc16is7xx_port_irq(s, i);
>> + if (!keep_polling)
>> + break;
>
> This makes me worried, there is no "timeout" now? What happens if this
> never happens, will you just sit and spin forever? What prevents that?

The patch is keeping to the spirit of the original, which also has a
potentially infinite loop (in sc16is7xx_port_irq) - this just moves it
up one level.

I could add a limit on the number of iterations, but if the limit is ever hit,
leading to an early exit, the port is basically dead because it will never
receive another interrupt.

Phil

2018-09-18 13:29:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

On Tue, Sep 18, 2018 at 02:13:15PM +0100, Phil Elwell wrote:
> Hi Greg,
>
> On 18/09/2018 14:02, Greg Kroah-Hartman wrote:
> > On Wed, Sep 12, 2018 at 03:31:55PM +0100, Phil Elwell wrote:
> >> The SC16IS752 is a dual-channel device. The two channels are largely
> >> independent, but the IRQ signals are wired together as an open-drain,
> >> active low signal which will be driven low while either of the
> >> channels requires attention, which can be for significant periods of
> >> time until operations complete and the interrupt can be acknowledged.
> >> In that respect it is should be treated as a true level-sensitive IRQ.
> >>
> >> The kernel, however, needs to be able to exit interrupt context in
> >> order to use I2C or SPI to access the device registers (which may
> >> involve sleeping). Therefore the interrupt needs to be masked out or
> >> paused in some way.
> >>
> >> The usual way to manage sleeping from within an interrupt handler
> >> is to use a threaded interrupt handler - a regular interrupt routine
> >> does the minimum amount of work needed to triage the interrupt before
> >> waking the interrupt service thread. If the threaded IRQ is marked as
> >> IRQF_ONESHOT the kernel will automatically mask out the interrupt
> >> until the thread runs to completion. The sc16is7xx driver used to
> >> use a threaded IRQ, but a patch switched to using a kthread_worker
> >> in order to set realtime priorities on the handler thread and for
> >> other optimisations. The end result is non-threaded IRQ that
> >> schedules some work then returns IRQ_HANDLED, making the kernel
> >> think that all IRQ processing has completed.
> >>
> >> The work-around to prevent a constant stream of interrupts is to
> >> mark the interrupt as edge-sensitive rather than level-sensitive,
> >> but interpreting an active-low source as a falling-edge source
> >> requires care to prevent a total cessation of interrupts. Whereas
> >> an edge-triggering source will generate a new edge for every interrupt
> >> condition a level-triggering source will keep the signal at the
> >> interrupting level until it no longer requires attention; in other
> >> words, the host won't see another edge until all interrupt conditions
> >> are cleared. It is therefore vital that the interrupt handler does not
> >> exit with an outstanding interrupt condition, otherwise the kernel
> >> will not receive another interrupt unless some other operation causes
> >> the interrupt state on the device to be cleared.
> >>
> >> The existing sc16is7xx driver has a very simple interrupt "thread"
> >> (kthread_work job) that processes interrupts on each channel in turn
> >> until there are no more. If both channels are active and the first
> >> channel starts interrupting while the handler for the second channel
> >> is running then it will not be detected and an IRQ stall ensues. This
> >> could be handled easily if there was a shared IRQ status register, or
> >> a convenient way to determine if the IRQ had been deasserted for any
> >> length of time, but both appear to be lacking.
> >>
> >> Avoid this problem (or at least make it much less likely to happen)
> >> by reducing the granularity of per-channel interrupt processing
> >> to one condition per iteration, only exiting the overall loop when
> >> both channels are no longer interrupting.
> >>
> >> Signed-off-by: Phil Elwell <[email protected]>
> >> ---
> >> drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
> >> 1 file changed, 13 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> >> index 243c960..47b4115 100644
> >> --- a/drivers/tty/serial/sc16is7xx.c
> >> +++ b/drivers/tty/serial/sc16is7xx.c
> >> @@ -657,7 +657,7 @@ static void sc16is7xx_handle_tx(struct uart_port *port)
> >> uart_write_wakeup(port);
> >> }
> >>
> >> -static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> >> +static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> >> {
> >> struct uart_port *port = &s->p[portno].port;
> >>
> >> @@ -666,7 +666,7 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> >>
> >> iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG);
> >> if (iir & SC16IS7XX_IIR_NO_INT_BIT)
> >> - break;
> >> + return false;
> >>
> >> iir &= SC16IS7XX_IIR_ID_MASK;
> >>
> >> @@ -688,16 +688,23 @@ static void sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
> >> port->line, iir);
> >> break;
> >> }
> >> - } while (1);
> >> + } while (0);
> >> + return true;
> >> }
> >>
> >> static void sc16is7xx_ist(struct kthread_work *ws)
> >> {
> >> struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);
> >> - int i;
> >>
> >> - for (i = 0; i < s->devtype->nr_uart; ++i)
> >> - sc16is7xx_port_irq(s, i);
> >> + while (1) {
> >> + bool keep_polling = false;
> >> + int i;
> >> +
> >> + for (i = 0; i < s->devtype->nr_uart; ++i)
> >> + keep_polling |= sc16is7xx_port_irq(s, i);
> >> + if (!keep_polling)
> >> + break;
> >
> > This makes me worried, there is no "timeout" now? What happens if this
> > never happens, will you just sit and spin forever? What prevents that?
>
> The patch is keeping to the spirit of the original, which also has a
> potentially infinite loop (in sc16is7xx_port_irq) - this just moves it
> up one level.
>
> I could add a limit on the number of iterations, but if the limit is ever hit,
> leading to an early exit, the port is basically dead because it will never
> receive another interrupt.

Ok, it's your hardware, good luck with it! :)

greg k-h

2018-09-18 13:39:27

by Rogier Wolff

[permalink] [raw]
Subject: Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

On Tue, Sep 18, 2018 at 02:13:15PM +0100, Phil Elwell wrote:
> I could add a limit on the number of iterations, but if the limit is ever hit,
> leading to an early exit, the port is basically dead because it will never
> receive another interrupt.

Especially if you print something like: "<driver name>: Too many
iterations with work-to-do bailing out" while bailing out, then
hanging just one driver/piece of hardware as opposed to the whole
system when somehow the hardware never indicates "all work done"
would be preferable.

Under normal circumstances you never expect to hit that number of
iterations. But if the card keeps hitting the driver with "more work
to do" then you'd hang the system. Better try and recover, and provide
debug info for the user who knows where to look.

Best would be to ignore the driver for say a second and start handling
interrupts again a while later. Should the system be overloaded with
work, (and a slow CPU?) then you can recover and just make things slow
down a bit without hanging the system.

Getting edge-triggered interrupts right is VERY difficult. In general
I'd advise against them. It looks like a nice solution, but in reality
the chances for difficult-to-debug race conditions is enormous. In
those race conditions the card will get "new work to do" and
(re-)assert the interrupt line when the driver is already on the "no
more work to do" path.

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike
Phil, this plan just might work.

2018-09-29 14:05:13

by Andreas Färber

[permalink] [raw]
Subject: Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

Hi Phil and Greg,

Am 12.09.18 um 16:31 schrieb Phil Elwell:
> The SC16IS752 is a dual-channel device. The two channels are largely
> independent, but the IRQ signals are wired together as an open-drain,
> active low signal which will be driven low while either of the
> channels requires attention, which can be for significant periods of
> time until operations complete and the interrupt can be acknowledged.
> In that respect it is should be treated as a true level-sensitive IRQ.
>
> The kernel, however, needs to be able to exit interrupt context in
> order to use I2C or SPI to access the device registers (which may
> involve sleeping). Therefore the interrupt needs to be masked out or
> paused in some way.
>
> The usual way to manage sleeping from within an interrupt handler
> is to use a threaded interrupt handler - a regular interrupt routine
> does the minimum amount of work needed to triage the interrupt before
> waking the interrupt service thread. If the threaded IRQ is marked as
> IRQF_ONESHOT the kernel will automatically mask out the interrupt
> until the thread runs to completion. The sc16is7xx driver used to
> use a threaded IRQ, but a patch switched to using a kthread_worker
> in order to set realtime priorities on the handler thread and for
> other optimisations. The end result is non-threaded IRQ that
> schedules some work then returns IRQ_HANDLED, making the kernel
> think that all IRQ processing has completed.
>
> The work-around to prevent a constant stream of interrupts is to
> mark the interrupt as edge-sensitive rather than level-sensitive,
> but interpreting an active-low source as a falling-edge source
> requires care to prevent a total cessation of interrupts. Whereas
> an edge-triggering source will generate a new edge for every interrupt
> condition a level-triggering source will keep the signal at the
> interrupting level until it no longer requires attention; in other
> words, the host won't see another edge until all interrupt conditions
> are cleared. It is therefore vital that the interrupt handler does not
> exit with an outstanding interrupt condition, otherwise the kernel
> will not receive another interrupt unless some other operation causes
> the interrupt state on the device to be cleared.
>
> The existing sc16is7xx driver has a very simple interrupt "thread"
> (kthread_work job) that processes interrupts on each channel in turn
> until there are no more. If both channels are active and the first
> channel starts interrupting while the handler for the second channel
> is running then it will not be detected and an IRQ stall ensues. This
> could be handled easily if there was a shared IRQ status register, or
> a convenient way to determine if the IRQ had been deasserted for any
> length of time, but both appear to be lacking.
>
> Avoid this problem (or at least make it much less likely to happen)
> by reducing the granularity of per-channel interrupt processing
> to one condition per iteration, only exiting the overall loop when
> both channels are no longer interrupting.
>
> Signed-off-by: Phil Elwell <[email protected]>
> ---
> drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)

These two patches seem to be applied in linux-next tree, but are lacking
a Fixes: header for backporting to affected stable trees.

openSUSE Tumbleweed's 4.18 appears to be affected, and I didn't see it
in linux.git for upcoming 4.19 either.

Can the commit message still be updated to get this fixed everywhere?

Which is the offending commit, this one from 2016? The only other 2018
change seems an unrelated error handling cleanup.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/tty/serial/sc16is7xx.c?id=04da73803c05dc1150ccc31cbf93e8cd56679c09

Also, the above commit message confuses me as to how after this commit
the driver should be used: Should the interrupt still be falling-edge,
as documented in the DT bindings, or do user/documentation need to
switch to level-triggered now?

Thanks,
Andreas

--
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

2018-10-02 20:39:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

On Sat, Sep 29, 2018 at 04:04:48PM +0200, Andreas F?rber wrote:
> Hi Phil and Greg,
>
> Am 12.09.18 um 16:31 schrieb Phil Elwell:
> > The SC16IS752 is a dual-channel device. The two channels are largely
> > independent, but the IRQ signals are wired together as an open-drain,
> > active low signal which will be driven low while either of the
> > channels requires attention, which can be for significant periods of
> > time until operations complete and the interrupt can be acknowledged.
> > In that respect it is should be treated as a true level-sensitive IRQ.
> >
> > The kernel, however, needs to be able to exit interrupt context in
> > order to use I2C or SPI to access the device registers (which may
> > involve sleeping). Therefore the interrupt needs to be masked out or
> > paused in some way.
> >
> > The usual way to manage sleeping from within an interrupt handler
> > is to use a threaded interrupt handler - a regular interrupt routine
> > does the minimum amount of work needed to triage the interrupt before
> > waking the interrupt service thread. If the threaded IRQ is marked as
> > IRQF_ONESHOT the kernel will automatically mask out the interrupt
> > until the thread runs to completion. The sc16is7xx driver used to
> > use a threaded IRQ, but a patch switched to using a kthread_worker
> > in order to set realtime priorities on the handler thread and for
> > other optimisations. The end result is non-threaded IRQ that
> > schedules some work then returns IRQ_HANDLED, making the kernel
> > think that all IRQ processing has completed.
> >
> > The work-around to prevent a constant stream of interrupts is to
> > mark the interrupt as edge-sensitive rather than level-sensitive,
> > but interpreting an active-low source as a falling-edge source
> > requires care to prevent a total cessation of interrupts. Whereas
> > an edge-triggering source will generate a new edge for every interrupt
> > condition a level-triggering source will keep the signal at the
> > interrupting level until it no longer requires attention; in other
> > words, the host won't see another edge until all interrupt conditions
> > are cleared. It is therefore vital that the interrupt handler does not
> > exit with an outstanding interrupt condition, otherwise the kernel
> > will not receive another interrupt unless some other operation causes
> > the interrupt state on the device to be cleared.
> >
> > The existing sc16is7xx driver has a very simple interrupt "thread"
> > (kthread_work job) that processes interrupts on each channel in turn
> > until there are no more. If both channels are active and the first
> > channel starts interrupting while the handler for the second channel
> > is running then it will not be detected and an IRQ stall ensues. This
> > could be handled easily if there was a shared IRQ status register, or
> > a convenient way to determine if the IRQ had been deasserted for any
> > length of time, but both appear to be lacking.
> >
> > Avoid this problem (or at least make it much less likely to happen)
> > by reducing the granularity of per-channel interrupt processing
> > to one condition per iteration, only exiting the overall loop when
> > both channels are no longer interrupting.
> >
> > Signed-off-by: Phil Elwell <[email protected]>
> > ---
> > drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
> > 1 file changed, 13 insertions(+), 6 deletions(-)
>
> These two patches seem to be applied in linux-next tree, but are lacking
> a Fixes: header for backporting to affected stable trees.
>
> openSUSE Tumbleweed's 4.18 appears to be affected, and I didn't see it
> in linux.git for upcoming 4.19 either.
>
> Can the commit message still be updated to get this fixed everywhere?

I can't change the tree, sorry, I can not rebase.

If you want these applied to the stable tree, please email
[email protected] when they hit Linus's tree with the git commit
ids and I will be glad to queue them up then.

thanks,

greg k-h