2023-03-10 17:21:13

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH 0/2] Fix TX/RX interrupt handling

Previously a patch series was sent up to change the way that DROM was read
to prefer directly from NVM instead of bit banging.

This series was produced due to issues found where TBT3 DROM CRC wouldn't
match. In looking at it from USB4 analyzer the DROM wasn't corrupted
before it arrived at the router. In analyzing the failure mode, every
single failure occurred during a retried TX because RX interrupt
"never came".

This was actually a smoking gun; when the hardware responded too quickly
both TX and RX interrupt status bits were set before the ISR would run.
By the ISR using auto clear on read to process the TX this would make the
RX interrupt bit get lost and the RX interrupt was never handled.

To fix this issue, disable auto clear in the ISR and instead only clear
the interrupt that is actually triggering the ISR.

This fixes the communication for a long series of transactions such as
bit banging and probably also fixes other situations that control transfers
were retried a number of times due to a missing RX.

Mario Limonciello (2):
thunderbolt: Use const qualifier for `ring_interrupt_index`
thunderbolt: Disable interrupt auto clear for rings

drivers/thunderbolt/nhi.c | 42 +++++++++++++++++++++-------------
drivers/thunderbolt/nhi_regs.h | 6 +++--
2 files changed, 30 insertions(+), 18 deletions(-)

--
2.25.1



2023-03-10 17:21:19

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH 1/2] thunderbolt: Use const qualifier for `ring_interrupt_index`

`ring_interrupt_index` doesn't change the data for `ring` so
mark it as const.

Cc: Sanju Mehta <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/thunderbolt/nhi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 4dce2edd86ea0..fdc0c3ba2ef01 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -46,7 +46,7 @@
#define QUIRK_AUTO_CLEAR_INT BIT(0)
#define QUIRK_E2E BIT(1)

-static int ring_interrupt_index(struct tb_ring *ring)
+static int ring_interrupt_index(const struct tb_ring *ring)
{
int bit = ring->hop;
if (!ring->is_tx)
--
2.25.1


2023-03-10 17:21:26

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH 2/2] thunderbolt: Disable interrupt auto clear for rings

When interrupt auto clear is programmed, any read to the interrupt
status register will clear all interrupts. If two interrupts have
come in before one can be serviced then this will cause lost interrupts.

On AMD USB4 routers this has manifested in odd problems particularly
with long strings of control tranfers such as reading the DROM via bit
banging.

Instead of clearing interrupts automatically, clear the bit corresponding
to the given ring's interrupt in the ISR.

Fixes: 7a1808f82a37 ("thunderbolt: Handle ring interrupt by reading interrupt status register")
Cc: Sanju Mehta <[email protected]>
Tested-by: Anson Tsao <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/thunderbolt/nhi.c | 40 +++++++++++++++++++++-------------
drivers/thunderbolt/nhi_regs.h | 6 +++--
2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index fdc0c3ba2ef01..318d20bd5b695 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -71,24 +71,31 @@ static void ring_interrupt_active(struct tb_ring *ring, bool active)
u32 step, shift, ivr, misc;
void __iomem *ivr_base;
int index;
+ int bit;

if (ring->is_tx)
index = ring->hop;
else
index = ring->hop + ring->nhi->hop_count;

- if (ring->nhi->quirks & QUIRK_AUTO_CLEAR_INT) {
- /*
- * Ask the hardware to clear interrupt status
- * bits automatically since we already know
- * which interrupt was triggered.
- */
- misc = ioread32(ring->nhi->iobase + REG_DMA_MISC);
- if (!(misc & REG_DMA_MISC_INT_AUTO_CLEAR)) {
- misc |= REG_DMA_MISC_INT_AUTO_CLEAR;
- iowrite32(misc, ring->nhi->iobase + REG_DMA_MISC);
- }
- }
+ /*
+ * Intel routers support a bit that isn't part of
+ * the USB4 spec to ask the hardware to clear
+ * interrupt status bits automatically since
+ * we already know which interrupt was triggered.
+ *
+ * Other routers explicitly disable auto-clear
+ * to prevent conditions that may occur where two
+ * MSIX interrupts are simultaneously active and
+ * reading the register clears both of them.
+ */
+ misc = ioread32(ring->nhi->iobase + REG_DMA_MISC);
+ if (ring->nhi->quirks & QUIRK_AUTO_CLEAR_INT)
+ bit = REG_DMA_MISC_INT_AUTO_CLEAR;
+ else
+ bit = REG_DMA_MISC_DISABLE_AUTO_CLEAR;
+ if (!(misc & bit))
+ iowrite32(misc | bit, ring->nhi->iobase + REG_DMA_MISC);

ivr_base = ring->nhi->iobase + REG_INT_VEC_ALLOC_BASE;
step = index / REG_INT_VEC_ALLOC_REGS * REG_INT_VEC_ALLOC_BITS;
@@ -393,14 +400,17 @@ EXPORT_SYMBOL_GPL(tb_ring_poll_complete);

static void ring_clear_msix(const struct tb_ring *ring)
{
+ int bit;
+
if (ring->nhi->quirks & QUIRK_AUTO_CLEAR_INT)
return;

+ bit = ring_interrupt_index(ring) & 31;
if (ring->is_tx)
- ioread32(ring->nhi->iobase + REG_RING_NOTIFY_BASE);
+ iowrite32(BIT(bit), ring->nhi->iobase + REG_RING_INT_CLEAR);
else
- ioread32(ring->nhi->iobase + REG_RING_NOTIFY_BASE +
- 4 * (ring->nhi->hop_count / 32));
+ iowrite32(BIT(bit), ring->nhi->iobase + REG_RING_INT_CLEAR +
+ 4 * (ring->nhi->hop_count / 32));
}

static irqreturn_t ring_msix(int irq, void *data)
diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
index 0d4970dcef842..faef165a919cc 100644
--- a/drivers/thunderbolt/nhi_regs.h
+++ b/drivers/thunderbolt/nhi_regs.h
@@ -77,12 +77,13 @@ struct ring_desc {

/*
* three bitfields: tx, rx, rx overflow
- * Every bitfield contains one bit for every hop (REG_HOP_COUNT). Registers are
- * cleared on read. New interrupts are fired only after ALL registers have been
+ * Every bitfield contains one bit for every hop (REG_HOP_COUNT).
+ * New interrupts are fired only after ALL registers have been
* read (even those containing only disabled rings).
*/
#define REG_RING_NOTIFY_BASE 0x37800
#define RING_NOTIFY_REG_COUNT(nhi) ((31 + 3 * nhi->hop_count) / 32)
+#define REG_RING_INT_CLEAR 0x37808

/*
* two bitfields: rx, tx
@@ -105,6 +106,7 @@ struct ring_desc {

#define REG_DMA_MISC 0x39864
#define REG_DMA_MISC_INT_AUTO_CLEAR BIT(2)
+#define REG_DMA_MISC_DISABLE_AUTO_CLEAR BIT(17)

#define REG_INMAIL_DATA 0x39900

--
2.25.1


2023-03-13 05:46:20

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 2/2] thunderbolt: Disable interrupt auto clear for rings

Hi Mario,

On Fri, Mar 10, 2023 at 11:20:50AM -0600, Mario Limonciello wrote:
> When interrupt auto clear is programmed, any read to the interrupt
> status register will clear all interrupts. If two interrupts have
> come in before one can be serviced then this will cause lost interrupts.
>
> On AMD USB4 routers this has manifested in odd problems particularly
> with long strings of control tranfers such as reading the DROM via bit
> banging.

Nice catch! Does this mean we can drop [1] now?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/commit/?h=next&id=8d7f459107f74fbbdde3dd5b3874d2e748cb8a21

I would still like to keep the nice refactor you did for the DROM
parsing, though.

2023-03-14 14:28:36

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix TX/RX interrupt handling

Hi Mario,

On Fri, Mar 10, 2023 at 11:20:48AM -0600, Mario Limonciello wrote:
> Previously a patch series was sent up to change the way that DROM was read
> to prefer directly from NVM instead of bit banging.
>
> This series was produced due to issues found where TBT3 DROM CRC wouldn't
> match. In looking at it from USB4 analyzer the DROM wasn't corrupted
> before it arrived at the router. In analyzing the failure mode, every
> single failure occurred during a retried TX because RX interrupt
> "never came".
>
> This was actually a smoking gun; when the hardware responded too quickly
> both TX and RX interrupt status bits were set before the ISR would run.
> By the ISR using auto clear on read to process the TX this would make the
> RX interrupt bit get lost and the RX interrupt was never handled.
>
> To fix this issue, disable auto clear in the ISR and instead only clear
> the interrupt that is actually triggering the ISR.
>
> This fixes the communication for a long series of transactions such as
> bit banging and probably also fixes other situations that control transfers
> were retried a number of times due to a missing RX.
>
> Mario Limonciello (2):
> thunderbolt: Use const qualifier for `ring_interrupt_index`
> thunderbolt: Disable interrupt auto clear for rings

Applied both to thunderbolt.git/fixes for v6.3-rc and marked them for
stable as well. Thanks! I dropped the other patch that adjusted the NVM
reading as now it is not necessary anymore (please correct me if I'm
mistaken).