Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
connection failed due corrupted Rx payload. The problem was narrowed
down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs
despite LSR having DR bit set, which is precondition for attempting to
start DMA Rx in the first place.
From a debug patch:
[x.807834] 8250irq: iir=cc lsr+saved=60 received=0/15 ier=0f dma_t/rx/err=0/0/0
[x.808676] 8250irq: iir=c2 lsr+saved=61 received=0/0 ier=0f dma_t/rx/err=0/0/0
[x.808776] 8250irq: iir=cc lsr+saved=60 received=1/12 ier=0d dma_t/rx/err=0/1/0
[x.808870] Bluetooth: hci0: Frame reassembly failed (-84)
In the debug snippet, received field indicates 1 byte was transferred
over DMA and 12 bytes after that with the non-DMA Rx. The sole byte DMA
handled was corrupted (gets zeroed) which leads to the HCI failure.
This problem became apparent after commit e8ffbb71f783 ("serial: 8250:
use THRE & __stop_tx also with DMA") changed Tx stop behavior. Tx stop
is now triggered from a THRI interrupt.
Despite that this problem looks like a HW bug, this fix is not adding
UART_BUG_xx flag to the driver beucase it seems useful in general to
avoid starting DMA when there are only a few bytes to transfer.
Skipping DMA for small transfers avoids the extra overhead DMA incurs.
Thus, don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
interrupt which has Rx a related IIR value.
By returning false from handle_rx_dma(), the DMA vs non-DMA decision is
postponed until either UART_IIR_RDI (FIFO threshold worth of bytes
awaiting) or UART_IIR_TIMEOUT (inter-character timeout) triggers at a
later time which allows better to discern whether the number of bytes
warrants starting DMA or not.
Reported-by: Hans de Goede <[email protected]>
Tested-by: Hans de Goede <[email protected]>
Fixes: e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA")
Cc: [email protected]
Signed-off-by: Ilpo Järvinen <[email protected]>
---
drivers/tty/serial/8250/8250_port.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fa43df05342b..3ba9c8b93ae6 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1903,6 +1903,17 @@ EXPORT_SYMBOL_GPL(serial8250_modem_status);
static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
{
switch (iir & 0x3f) {
+ case UART_IIR_THRI:
+ /*
+ * Postpone DMA or not decision to IIR_RDI or IIR_RX_TIMEOUT
+ * because it's impossible to do an informed decision about
+ * that with IIR_THRI.
+ *
+ * This also fixes one known DMA Rx corruption issue where
+ * DR is asserted but DMA Rx only gets a corrupted zero byte
+ * (too early DR?).
+ */
+ return false;
case UART_IIR_RDI:
if (!up->dma->rx_running)
break;
--
2.30.2
Hi,
On 3/17/23 11:30, Ilpo Järvinen wrote:
> Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
> connection failed due corrupted Rx payload. The problem was narrowed
> down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs
> despite LSR having DR bit set, which is precondition for attempting to
> start DMA Rx in the first place.
>
> From a debug patch:
> [x.807834] 8250irq: iir=cc lsr+saved=60 received=0/15 ier=0f dma_t/rx/err=0/0/0
> [x.808676] 8250irq: iir=c2 lsr+saved=61 received=0/0 ier=0f dma_t/rx/err=0/0/0
> [x.808776] 8250irq: iir=cc lsr+saved=60 received=1/12 ier=0d dma_t/rx/err=0/1/0
> [x.808870] Bluetooth: hci0: Frame reassembly failed (-84)
>
> In the debug snippet, received field indicates 1 byte was transferred
> over DMA and 12 bytes after that with the non-DMA Rx. The sole byte DMA
> handled was corrupted (gets zeroed) which leads to the HCI failure.
>
> This problem became apparent after commit e8ffbb71f783 ("serial: 8250:
> use THRE & __stop_tx also with DMA") changed Tx stop behavior. Tx stop
> is now triggered from a THRI interrupt.
>
> Despite that this problem looks like a HW bug, this fix is not adding
> UART_BUG_xx flag to the driver beucase it seems useful in general to
> avoid starting DMA when there are only a few bytes to transfer.
> Skipping DMA for small transfers avoids the extra overhead DMA incurs.
>
> Thus, don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
> interrupt which has Rx a related IIR value.
>
> By returning false from handle_rx_dma(), the DMA vs non-DMA decision is
> postponed until either UART_IIR_RDI (FIFO threshold worth of bytes
> awaiting) or UART_IIR_TIMEOUT (inter-character timeout) triggers at a
> later time which allows better to discern whether the number of bytes
> warrants starting DMA or not.
>
> Reported-by: Hans de Goede <[email protected]>
> Tested-by: Hans de Goede <[email protected]>
> Fixes: e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA")
> Cc: [email protected]
> Signed-off-by: Ilpo Järvinen <[email protected]>
Thanks, patch looks good to me:
Acked-by: Hans de Goede <[email protected]>
Regards,
Hans
> ---
> drivers/tty/serial/8250/8250_port.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index fa43df05342b..3ba9c8b93ae6 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -1903,6 +1903,17 @@ EXPORT_SYMBOL_GPL(serial8250_modem_status);
> static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
> {
> switch (iir & 0x3f) {
> + case UART_IIR_THRI:
> + /*
> + * Postpone DMA or not decision to IIR_RDI or IIR_RX_TIMEOUT
> + * because it's impossible to do an informed decision about
> + * that with IIR_THRI.
> + *
> + * This also fixes one known DMA Rx corruption issue where
> + * DR is asserted but DMA Rx only gets a corrupted zero byte
> + * (too early DR?).
> + */
> + return false;
> case UART_IIR_RDI:
> if (!up->dma->rx_running)
> break;