2024-05-30 22:46:28

by Doug Anderson

Subject: [PATCH v2 0/7] serial: qcom-geni: Overhaul TX handling to fix crashes/hangs


While trying to reproduce -EBUSY errors that our lab was getting in
suspend/resume testing, I ended up finding a whole pile of problems
with the Qualcomm GENI serial driver. I've posted a fix for the -EBUSY
issue separately [1]. This series is fixing all of the Qualcomm GENI
problems that I found.

As far as I can tell most of the problems have been in the Qualcomm
GENI serial driver since inception, but it can be noted that the
behavior got worse with the new kfifo changes. Previously when the OS
took data out of the circular queue we'd just spit stale data onto the
serial port. Now we'll hard lockup. :-P

I've tried to break this series up as much as possible to make it
easier to understand but the final patch is still a lot of change at
once. Hopefully it's OK.

[1] https://lore.kernel.org/r/20240530084841.v2.1.I2395e66cf70c6e67d774c56943825c289b9c13e4@changeid

Changes in v2:
- soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers
- serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()
- serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit()
- serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield()
- serial: qcom-geni: Just set the watermark level once
- Totally rework / rename patch to handle suspend while active xfer
- serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

Douglas Anderson (7):
soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers
serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()
serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit()
serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield()
serial: qcom-geni: Just set the watermark level once
serial: qcom-geni: Fix suspend while active UART xfer
serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

drivers/tty/serial/qcom_geni_serial.c | 316 ++++++++++++++++----------
include/linux/soc/qcom/geni-se.h | 6 +
2 files changed, 203 insertions(+), 119 deletions(-)

--
2.45.1.288.g0e0cd299f1-goog



2024-05-30 22:46:37

by Doug Anderson

Subject: [PATCH v2 1/7] soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers

For UART devices the M_GP_LENGTH is the TX word count. For other
devices this is the transaction word count.

For UART devices the S_GP_LENGTH is the RX word count.

The IRQ_EN set/clear registers allow you to set or clear bits in the
IRQ_EN register without needing a read-modify-write.
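
To illustrate the difference, here is a small user-space model (not driver
code). The struct and helper names are made up for this sketch, and it
assumes that writing a 1 to a bit in the SET/CLEAR register sets/clears the
matching bit in IRQ_EN. The counters just make visible that the SET/CLEAR
path needs no register read.

```c
#include <stdint.h>

/* Hypothetical software model of IRQ_EN and its SET/CLEAR aliases. */
struct irq_en_model {
	uint32_t irq_en;
	unsigned int reads;	/* simulated register reads */
	unsigned int writes;	/* simulated register writes */
};

/* Classic read-modify-write: one read plus one write. */
static void enable_with_rmw(struct irq_en_model *m, uint32_t bits)
{
	uint32_t val;

	val = m->irq_en;	/* stands in for readl(IRQ_EN) */
	m->reads++;
	m->irq_en = val | bits;	/* stands in for writel(val, IRQ_EN) */
	m->writes++;
}

/* Modeled write to IRQ_EN_SET: the "hardware" does the OR itself. */
static void enable_with_set(struct irq_en_model *m, uint32_t bits)
{
	m->irq_en |= bits;
	m->writes++;
}

/* Modeled write to IRQ_EN_CLEAR: the "hardware" clears the bits. */
static void disable_with_clear(struct irq_en_model *m, uint32_t bits)
{
	m->irq_en &= ~bits;
	m->writes++;
}

/* Helpers that run each path once and return the final model state. */
static struct irq_en_model demo_rmw(uint32_t initial, uint32_t bits)
{
	struct irq_en_model m = { .irq_en = initial };

	enable_with_rmw(&m, bits);
	return m;
}

static struct irq_en_model demo_set_clear(uint32_t initial, uint32_t set,
					  uint32_t clear)
{
	struct irq_en_model m = { .irq_en = initial };

	enable_with_set(&m, set);
	disable_with_clear(&m, clear);
	return m;
}
```

In the model both paths end up with the same enable mask, but only the
read-modify-write path has a window between the read and the write.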

Signed-off-by: Douglas Anderson <[email protected]>
---
Since these new definitions are used in the future UART patches the
hope is that they could be acked by Qualcomm folks and then go through
the same tree as the UART patches that need them.

Changes in v2:
- New

include/linux/soc/qcom/geni-se.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/include/linux/soc/qcom/geni-se.h b/include/linux/soc/qcom/geni-se.h
index 0f038a1a0330..8d07c442029b 100644
--- a/include/linux/soc/qcom/geni-se.h
+++ b/include/linux/soc/qcom/geni-se.h
@@ -88,11 +88,15 @@ struct geni_se {
#define SE_GENI_M_IRQ_STATUS 0x610
#define SE_GENI_M_IRQ_EN 0x614
#define SE_GENI_M_IRQ_CLEAR 0x618
+#define SE_GENI_M_IRQ_EN_SET 0x61c
+#define SE_GENI_M_IRQ_EN_CLEAR 0x620
#define SE_GENI_S_CMD0 0x630
#define SE_GENI_S_CMD_CTRL_REG 0x634
#define SE_GENI_S_IRQ_STATUS 0x640
#define SE_GENI_S_IRQ_EN 0x644
#define SE_GENI_S_IRQ_CLEAR 0x648
+#define SE_GENI_S_IRQ_EN_SET 0x64c
+#define SE_GENI_S_IRQ_EN_CLEAR 0x650
#define SE_GENI_TX_FIFOn 0x700
#define SE_GENI_RX_FIFOn 0x780
#define SE_GENI_TX_FIFO_STATUS 0x800
@@ -101,6 +105,8 @@ struct geni_se {
#define SE_GENI_RX_WATERMARK_REG 0x810
#define SE_GENI_RX_RFR_WATERMARK_REG 0x814
#define SE_GENI_IOS 0x908
+#define SE_GENI_M_GP_LENGTH 0x910
+#define SE_GENI_S_GP_LENGTH 0x914
#define SE_DMA_TX_IRQ_STAT 0xc40
#define SE_DMA_TX_IRQ_CLR 0xc44
#define SE_DMA_TX_FSM_RST 0xc58
--
2.45.1.288.g0e0cd299f1-goog


2024-05-30 22:46:55

by Doug Anderson

Subject: [PATCH v2 2/7] serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()

qcom_geni_serial_poll_bit() is supposed to be usable for polling a
bit that will become set when a TX transfer finishes. Because
of this it tries to set its timeout based on how long the UART will
take to shift out all of the queued bytes. There are two problems
here:
1. There appears to be a hidden extra word on the firmware side which
is the word that the firmware has already taken out of the FIFO and
is currently shifting out. We need to account for this.
2. The timeout calculation was assuming that it would only need 8 bits
on the wire to shift out 1 byte. This isn't true. Typically 10 bits
are used (8 data bits, 1 start and 1 stop bit), but as much as 13
bits could be used (14 if we allowed 9 bits per byte, which we
don't).
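
As a rough worked example of the two points above, the stand-alone sketch
below compares the old and new timeout arithmetic. It is user-space code
with made-up function names, not the driver itself. The inputs used in the
note that follows (16 words deep, 32 bits wide, 115200 baud) are the
driver's default FIFO geometry and fallback baud rate; real ports may
differ.

```c
#include <stdint.h>

#define USEC_PER_SEC		1000000ULL
#define BITS_PER_BYTE		8
#define MAX_WIRE_BITS_PER_BYTE	13	/* 1 start + 8 data + 1 parity + 3 stop */

/* Old calculation: FIFO bits only, 8 wire bits per byte assumed. */
static uint64_t old_timeout_us(unsigned int depth_words,
			       unsigned int width_bits, unsigned int baud)
{
	uint64_t fifo_bits = (uint64_t)depth_words * width_bits;

	return fifo_bits * USEC_PER_SEC / baud + 500;
}

/* New calculation: one extra hidden word, 13 wire bits per byte. */
static uint64_t new_timeout_us(unsigned int depth_words,
			       unsigned int width_bits, unsigned int baud)
{
	uint64_t queued_bits = (uint64_t)(depth_words + 1) * width_bits;
	/* Round up to whole bytes, like DIV_ROUND_UP() in the kernel. */
	uint64_t queued_bytes =
		(queued_bits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
	uint64_t wire_bits = queued_bytes * MAX_WIRE_BITS_PER_BYTE;

	return wire_bits * USEC_PER_SEC / baud + 500;
}
```

With those inputs the old formula gives 4944 us and the new one gives
8173 us, so the old timeout could expire well before a full FIFO had
actually been shifted out.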

The too-short timeout was seen causing problems in a future patch
which more properly waited for bytes to transfer out of the UART
before cancelling.

Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 32 ++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..32e025705f99 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -271,7 +271,8 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
u32 reg;
struct qcom_geni_serial_port *port;
unsigned int baud;
- unsigned int fifo_bits;
+ unsigned int max_queued_bytes;
+ unsigned int max_queued_bits;
unsigned long timeout_us = 20000;
struct qcom_geni_private_data *private_data = uport->private_data;

@@ -280,12 +281,37 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
baud = port->baud;
if (!baud)
baud = 115200;
- fifo_bits = port->tx_fifo_depth * port->tx_fifo_width;
+
+ /*
+ * Add 1 to tx_fifo_depth to account for the hidden register
+ * on the firmware side that can hold a word.
+ */
+ max_queued_bytes =
+ DIV_ROUND_UP((port->tx_fifo_depth + 1) * port->tx_fifo_width,
+ BITS_PER_BYTE);
+
+ /*
+ * The maximum number of bits per byte on the wire is 13 from:
+ * - 1 start bit
+ * - 8 data bits
+ * - 1 parity bit
+ * - 3 stop bits
+ *
+ * While we could try count the actual bits per byte based on
+ * the port configuration, this is a rough timeout anyway so
+ * using the max is fine.
+ */
+ max_queued_bits = max_queued_bytes * 13;
+
/*
* Total polling iterations based on FIFO worth of bytes to be
* sent at current baud. Add a little fluff to the wait.
+ *
+ * NOTE: this assumes that flow control isn't used, but with
+ * flow control we could wait indefinitely and that wouldn't
+ * be OK.
*/
- timeout_us = ((fifo_bits * USEC_PER_SEC) / baud) + 500;
+ timeout_us = ((max_queued_bits * USEC_PER_SEC) / baud) + 500;
}

/*
--
2.45.1.288.g0e0cd299f1-goog


2024-05-30 22:47:20

by Doug Anderson

Subject: [PATCH v2 3/7] serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit()

The "offset" passed in should be unsigned since it's always a positive
offset from our memory mapped IO.

The "field" should be u32 since we're anding it with a 32-bit value
read from the device.

Suggested-by: Stephen Boyd <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 32e025705f99..71258eefa654 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -266,7 +266,7 @@ static bool qcom_geni_serial_secondary_active(struct uart_port *uport)
}

static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
- int offset, int field, bool set)
+ unsigned int offset, u32 field, bool set)
{
u32 reg;
struct qcom_geni_serial_port *port;
--
2.45.1.288.g0e0cd299f1-goog


2024-05-30 22:47:22

by Doug Anderson

Subject: [PATCH v2 4/7] serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield()

With a small modification the qcom_geni_serial_poll_bit() function
could be used to poll more than just a single bit. Let's generalize
it. We'll make the qcom_geni_serial_poll_bit() into just a wrapper of
the general function.
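
The generalization boils down to the comparison done on each polling
iteration. A minimal user-space sketch of just that comparison follows; the
function names are invented for illustration and take a register snapshot
instead of calling readl().

```c
#include <stdbool.h>
#include <stdint.h>

/* General form: the masked register value must equal "val" exactly. */
static bool field_matches(uint32_t reg, uint32_t field, uint32_t val)
{
	return (reg & field) == val;
}

/*
 * Single-bit form expressed through the general one: "set" means the
 * masked value must equal the mask, "clear" means it must be zero.
 */
static bool bit_matches(uint32_t reg, uint32_t field, bool set)
{
	return field_matches(reg, field, set ? field : 0);
}
```

The general form can also wait for a multi-bit field to reach a specific
value, which a plain set/clear test on the mask cannot express.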

Signed-off-by: Douglas Anderson <[email protected]>
---
The new function isn't used yet (except by the wrapper) but will be
used in a future change.

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 71258eefa654..539a6ac85511 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -265,8 +265,8 @@ static bool qcom_geni_serial_secondary_active(struct uart_port *uport)
return readl(uport->membase + SE_GENI_STATUS) & S_GENI_CMD_ACTIVE;
}

-static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
- unsigned int offset, u32 field, bool set)
+static bool qcom_geni_serial_poll_bitfield(struct uart_port *uport,
+ unsigned int offset, u32 field, u32 val)
{
u32 reg;
struct qcom_geni_serial_port *port;
@@ -321,7 +321,7 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
timeout_us = DIV_ROUND_UP(timeout_us, 10) * 10;
while (timeout_us) {
reg = readl(uport->membase + offset);
- if ((bool)(reg & field) == set)
+ if ((reg & field) == val)
return true;
udelay(10);
timeout_us -= 10;
@@ -329,6 +329,12 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
return false;
}

+static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
+ unsigned int offset, u32 field, bool set)
+{
+ return qcom_geni_serial_poll_bitfield(uport, offset, field, set ? field : 0);
+}
+
static void qcom_geni_serial_setup_tx(struct uart_port *uport, u32 xmit_size)
{
u32 m_cmd;
--
2.45.1.288.g0e0cd299f1-goog


2024-05-30 22:47:34

by Doug Anderson

Subject: [PATCH v2 5/7] serial: qcom-geni: Just set the watermark level once

There's no reason to set the TX watermark level to 0 when we disable
TX since we're disabling the interrupt anyway. Just set the watermark
level once at init time and leave it alone.

Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 539a6ac85511..d7814f9e5c26 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -418,7 +418,6 @@ static int qcom_geni_serial_get_char(struct uart_port *uport)
static void qcom_geni_serial_poll_put_char(struct uart_port *uport,
unsigned char c)
{
- writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_setup_tx(uport, 1);
WARN_ON(!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
M_TX_FIFO_WATERMARK_EN, true));
@@ -462,7 +461,6 @@ __qcom_geni_serial_console_write(struct uart_port *uport, const char *s,
bytes_to_send++;
}

- writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_setup_tx(uport, bytes_to_send);
for (i = 0; i < count; ) {
size_t chars_to_write = 0;
@@ -690,7 +688,6 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;

- writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}

@@ -701,7 +698,6 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)

irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
- writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
/* Possible stop tx is called multiple times. */
if (!qcom_geni_serial_main_active(uport))
@@ -1153,6 +1149,7 @@ static int qcom_geni_serial_port_setup(struct uart_port *uport)
false, true, true);
geni_se_init(&port->se, UART_RX_WM, port->rx_fifo_depth - 2);
geni_se_select_mode(&port->se, port->dev_data->mode);
+ writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_start_rx(uport);
port->setup = true;

--
2.45.1.288.g0e0cd299f1-goog


2024-05-30 22:47:49

by Doug Anderson

Subject: [PATCH v2 6/7] serial: qcom-geni: Fix suspend while active UART xfer

On devices using Qualcomm's GENI UART it is possible to get the UART
stuck such that it no longer outputs data. Specifically, I could
reproduce this problem by logging in via an agetty on the debug serial
port (which was _not_ used for kernel console) and running:
cat /var/log/messages
...and then (via an SSH session) forcing a few suspend/resume cycles.

Digging into this showed a number of problems that are all related.

The root of the problems was with qcom_geni_serial_stop_tx_fifo()
which is called as part of the suspend process. Specific problems with
that function:
- When we cancel an in-progress "tx" command it doesn't appear to
fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
continued to report that the FIFO wasn't empty. The
qcom_geni_serial_start_tx_fifo() function didn't re-enable
interrupts in this case so we'd never start transferring again.
- We cancelled the current "tx" command but we forgot to zero out
"tx_remaining". This confused logic elsewhere in the driver.
- From experimentation, it appears that cancelling the "tx" command
could drop some of the queued up bytes. While maybe not the end of
the world, it doesn't seem like we should be dropping bytes when
stopping the FIFO, which is defined more as a "pause".

One idea to fix the above would be to add FIFO draining to
qcom_geni_serial_stop_tx_fifo(). However, digging into the
documentation in serial_core.h for stop_tx() makes this seem like the
wrong choice. Specifically stop_tx() is called with local interrupts
disabled. Waiting for a FIFO (which might be 64 bytes big) to drain at
115.2 kbps doesn't seem like a wise move.
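
For a sense of scale, a back-of-the-envelope calculation is sketched below.
The function name is made up, and the 10 wire bits per byte (8 data bits
plus 1 start and 1 stop bit) in the note that follows is an assumption
about the port configuration.

```c
#include <stdint.h>

/* Time to shift out "bytes" bytes at "baud", in whole microseconds. */
static uint64_t fifo_drain_us(unsigned int bytes,
			      unsigned int wire_bits_per_byte,
			      unsigned int baud)
{
	uint64_t wire_bits = (uint64_t)bytes * wire_bits_per_byte;

	return wire_bits * 1000000ULL / baud;
}
```

For a 64-byte FIFO at 115200 baud this comes to 5555 us with 10 wire bits
per byte, i.e. more than 5 ms spent with local interrupts disabled.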

Ideally qcom_geni_serial_stop_tx_fifo() would be able to pause the
transmitter, but nothing in the documentation for the GENI UART makes
me believe that is possible.

Given the lack of better choices, we'll change
qcom_geni_serial_stop_tx_fifo() to simply disable the
TX_FIFO_WATERMARK interrupt and call it a day. This seems OK as per
the serial core docs since stop_tx() is supposed to stop transferring
bytes "as soon as possible" and there doesn't seem to be any possible
way to stop transferring sooner. As part of this, get rid of some of
the extra conditions on qcom_geni_serial_start_tx_fifo() which simply
weren't needed and are now getting in the way. It's always fine to
turn the interrupts on if we want to receive and it'll be up to the
IRQ handler to turn them back off if somehow they're not needed. This
works fine.

Unfortunately, doing just the above change causes new/different
problems with suspend/resume. Now if you suspend while an active
transfer is happening you can find that after resume time you're no
longer receiving UART interrupts at all. It appears to be important to
drain the FIFO and send a "cancel" command if the UART is active to
avoid this. Since we've already decided that
qcom_geni_serial_stop_tx_fifo() shouldn't be doing this, let's add the
draining / cancelling logic to the shutdown() call where it should be
OK to delay a bit. This is called as part of the suspend process via
uart_suspend_port().
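
The drain-then-cancel ordering can be sketched with a toy user-space model.
Everything below (struct, field names, step function) is hypothetical and
only mimics the behavior described in this message; it is not how the
hardware or the driver is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the TX side; not the real hardware. */
struct fake_tx {
	bool main_active;	/* a TX command is in progress */
	uint32_t tx_total;	/* size of the current command */
	uint32_t tx_remaining;	/* part not yet written to the FIFO */
	uint32_t sent;		/* stand-in for the M_GP_LENGTH count */
	bool cancel_sent;
};

/* One simulated step: the UART shifts out one queued unit. */
static void fake_tx_step(struct fake_tx *tx)
{
	if (tx->sent < tx->tx_total - tx->tx_remaining)
		tx->sent++;
	/* The command completes by itself once everything was sent. */
	if (tx->sent == tx->tx_total)
		tx->main_active = false;
}

static void fake_drain(struct fake_tx *tx, unsigned int max_steps)
{
	uint32_t target;

	/* Nothing in flight: nothing to do. */
	if (!tx->main_active)
		return;

	/* Wait (bounded) until everything already queued has gone out. */
	target = tx->tx_total - tx->tx_remaining;
	while (tx->sent != target && max_steps--)
		fake_tx_step(tx);

	/* Draining may have completed the command; then skip the cancel. */
	if (!tx->main_active)
		return;

	/* Otherwise cancel and forget the unfinished part of the command. */
	tx->cancel_sent = true;
	tx->main_active = false;
	tx->tx_remaining = 0;
}

/* Helper: run the drain on a fresh model and return the final state. */
static struct fake_tx run_drain(uint32_t total, uint32_t remaining,
				uint32_t sent)
{
	struct fake_tx tx = {
		.main_active = true,
		.tx_total = total,
		.tx_remaining = remaining,
		.sent = sent,
	};

	fake_drain(&tx, 1000);
	return tx;
}
```

In the model, a command that was only partly queued ends with a cancel
after the queued part went out, while a fully queued command simply
finishes and needs no cancel.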

Finally, with all of the above, the test case where we're spamming the
UART with data and going through suspend/resume cycles doesn't kill
the UART and doesn't drop bytes.

NOTE: though I haven't gone back and validated on ancient code, it
appears from code inspection that many of these problems have existed
since the start of the driver. At the very least, I could reproduce
the problems on vanilla v5.15. The problems don't seem to reproduce
when using the serial port for kernel console output and also don't
seem to reproduce if nothing is being printed to the console at
suspend time, so this is presumably why they were not noticed until
now.

Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <[email protected]>
---
There are still a number of problems with GENI UART after this but
I've kept this change separate to make it easier to understand.
Specifically on mainline just hitting "Ctrl-C" after dumping
/var/log/messages to the serial port hangs things after the kfifo
changes. Those issues will be addressed in future patches.

Changes in v2:
- Totally rework / rename patch to handle suspend while active xfer

drivers/tty/serial/qcom_geni_serial.c | 97 +++++++++++++++++++++------
1 file changed, 75 insertions(+), 22 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index d7814f9e5c26..10aeb0313f9b 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -131,6 +131,7 @@ struct qcom_geni_serial_port {
bool brk;

unsigned int tx_remaining;
+ unsigned int tx_total;
int wakeup_irq;
bool rx_tx_swap;
bool cts_rts_swap;
@@ -337,11 +338,14 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,

static void qcom_geni_serial_setup_tx(struct uart_port *uport, u32 xmit_size)
{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
u32 m_cmd;

writel(xmit_size, uport->membase + SE_UART_TX_TRANS_LEN);
m_cmd = UART_START_TX << M_OPCODE_SHFT;
writel(m_cmd, uport->membase + SE_GENI_M_CMD0);
+
+ port->tx_total = xmit_size;
}

static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
@@ -361,6 +365,64 @@ static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
writel(irq_clear, uport->membase + SE_GENI_M_IRQ_CLEAR);
}

+static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
+ /*
+ * If the main sequencer is inactive it means that the TX command has
+ * been completed and all bytes have been sent. Nothing to do in that
+ * case.
+ */
+ if (!qcom_geni_serial_main_active(uport))
+ return;
+
+ /*
+ * Wait until the FIFO has been drained. We've already taken bytes out
+ * of the higher level queue in qcom_geni_serial_send_chunk_fifo() so
+ * if we don't drain the FIFO but send the "cancel" below they seem to
+ * get lost.
+ */
+ qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, 0xffffffff,
+ port->tx_total - port->tx_remaining);
+
+ /*
+ * If clearing the FIFO made us inactive then we're done--no need for
+ * a cancel.
+ */
+ if (!qcom_geni_serial_main_active(uport))
+ return;
+
+ /*
+ * Cancel the current command. After this the main sequencer will
+ * stop reporting that it's active and we'll have to start a new
+ * transfer command.
+ *
+ * If we skip doing this cancel and then continue with a system
+ * suspend while there's an active command in the main sequencer
+ * then after resume time we won't get any more interrupts on the
+ * main sequencer until we send the cancel.
+ */
+ geni_se_cancel_m_cmd(&port->se);
+ if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
+ M_CMD_CANCEL_EN, true)) {
+ /* The cancel failed; try an abort as a fallback. */
+ geni_se_abort_m_cmd(&port->se);
+ qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
+ M_CMD_ABORT_EN, true);
+ writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ }
+ writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ /*
+ * We've cancelled the current command. "tx_remaining" stores how
+ * many bytes are left to finish in the current command so we know
+ * when to start a new command. Since the command was cancelled we
+ * need to zero "tx_remaining".
+ */
+ port->tx_remaining = 0;
+}
+
static void qcom_geni_serial_abort_rx(struct uart_port *uport)
{
u32 irq_clear = S_CMD_DONE_EN | S_CMD_ABORT_EN;
@@ -681,37 +743,18 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
u32 irq_en;

- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
-
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}

static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);

irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
- if (!qcom_geni_serial_main_active(uport))
- return;
-
- geni_se_cancel_m_cmd(&port->se);
- if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
- M_CMD_CANCEL_EN, true)) {
- geni_se_abort_m_cmd(&port->se);
- qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
- M_CMD_ABORT_EN, true);
- writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
- }
- writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}

static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1093,7 +1136,15 @@ static int setup_fifos(struct qcom_geni_serial_port *port)
}


-static void qcom_geni_serial_shutdown(struct uart_port *uport)
+static void qcom_geni_serial_shutdown_dma(struct uart_port *uport)
+{
+ disable_irq(uport->irq);
+
+ qcom_geni_serial_stop_tx(uport);
+ qcom_geni_serial_stop_rx(uport);
+}
+
+static void qcom_geni_serial_shutdown_fifo(struct uart_port *uport)
{
disable_irq(uport->irq);

@@ -1102,6 +1153,8 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)

qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_drain_tx_fifo(uport);
}

static int qcom_geni_serial_port_setup(struct uart_port *uport)
@@ -1560,7 +1613,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.startup = qcom_geni_serial_startup,
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
- .shutdown = qcom_geni_serial_shutdown,
+ .shutdown = qcom_geni_serial_shutdown_fifo,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
@@ -1582,7 +1635,7 @@ static const struct uart_ops qcom_geni_uart_pops = {
.startup = qcom_geni_serial_startup,
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
- .shutdown = qcom_geni_serial_shutdown,
+ .shutdown = qcom_geni_serial_shutdown_dma,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
--
2.45.1.288.g0e0cd299f1-goog


2024-05-30 22:48:01

by Doug Anderson

Subject: [PATCH v2 7/7] serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

The fact that the Qualcomm GENI hardware interface is based around
"packets" is really awkward to fit into Linux's UART design.
Specifically, in order to send bytes you need to start up a new
"command" saying how many bytes you want to send and then you need to
send all those bytes. Once you've committed to sending that number of
bytes it's very awkward to change your mind and send fewer, especially
if you want to do so without dropping bytes on the ground.

There may be a few cases where you might want to send fewer bytes than
you originally expected:
1. You might want to interrupt the transfer with something higher
priority, like the kernel console or kdb.
2. You might want to enter system suspend.
3. The user might have killed the program that had queued bytes for
sending over the UART.

Despite this awkwardness the Linux driver has still tried to send
bytes using large transfers. Whenever the driver started a new
transfer it would look at the number of bytes in the OS's queue and
start a transfer for that many. The idea of using larger transfers is
that it should be more efficient. When we're in the middle of a large
transfer we can get interrupted when the hardware FIFO is close to
empty and add more bytes in. Whenever we get to the end of a transfer
we have to wait until the transfer is totally done before we can add
more bytes and, depending on interrupt latency, that can cause the
UART to idle a bit.

Unfortunately there were lots of corner cases that the Linux driver
didn't handle.

One problem with the current driver is that if the user killed the
program that queued bytes for sending over the UART then bad things
would happen. Before commit 1788cf6a91d9 ("tty: serial: switch from
circ_buf to kfifo") we'd just send stale data out the UART. After that
commit we'll hard lockup.

Another problem with the current driver can be seen if you queue a
bunch of data to the UART and enter kdb. Specifically on a device
_without_ kernel console on the UART, with an agetty on the UART, and
with kgdb on the UART, I did `cat /var/log/messages` and then dropped
into kdb. After resuming from kdb, console output stopped.

Let's give up on trying to use large transfers in FIFO mode on GENI
UART since there doesn't appear to be any way to solve these problems
cleanly. Visually inspecting the console output even after these
patches doesn't show any big pauses so this should be fine.

In order to make this all work:
- Switch the watermark interrupt to just being used to prime the TX
pump. Once transfers are running we'll use "done" to queue the next
batch. As part of this, change the watermark to fire whenever the
queue is empty.
- Never queue more than what can fit in the FIFO. This means we don't
need to keep track of a command we're partway through.
- For the console code and kgdb code where we can safely block while
the queue empties we can just do that rather than trying to queue a
command when one was already in progress. This didn't actually work
so well which is presumably why there were some weird/awkward hacks
in qcom_geni_serial_console_write().
- Leave the CMD_DONE interrupt enabled all the time since there's
never any reason we don't want to see it.
- Start using the "SE_GENI_M_IRQ_EN_SET" and "SE_GENI_M_IRQ_EN_CLEAR"
registers to avoid read-modify-write of the "SE_GENI_M_IRQ_EN"
register. We could do this in more of the driver if needed but for
now just update code we're touching.
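
The "never queue more than what can fit in the FIFO" rule can be sketched
as simple chunking arithmetic. This is a user-space illustration with
made-up names. It assumes the usable FIFO capacity is depth times width
(64 bytes with the default 16 words of 32 bits), which may not match the
exact limit the driver ends up using.

```c
#include <stdint.h>

/* FIFO capacity in bytes from its geometry (assumed fully usable). */
static unsigned int fifo_capacity_bytes(unsigned int depth_words,
					unsigned int width_bits)
{
	return depth_words * width_bits / 8;
}

/* Size of the next TX command: never more than one FIFO's worth. */
static unsigned int next_chunk(unsigned int pending, unsigned int fifo_bytes)
{
	return pending < fifo_bytes ? pending : fifo_bytes;
}

/* Number of TX commands needed to send "pending" bytes. */
static unsigned int commands_needed(unsigned int pending,
				    unsigned int fifo_bytes)
{
	return (pending + fifo_bytes - 1) / fifo_bytes;
}
```

Since each command is fully written to the FIFO when it is started, there
is no partially queued command to track, which is why "tx_remaining" can go
away.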

Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Signed-off-by: Douglas Anderson <[email protected]>
---
I'm listing two "fixes" commits here. The first is the kfifo change
since it is very easy to see a hardlockup after that change. Almost
certainly anyone with the kfifo patch wants this patch. I've also
listed a much earlier patch as one being fixed since that was the one
that made us send larger transfers.

I've tested this commit on an sc7180-trogdor board both with and
without kernel console going to the UART. I've tested across some
suspend/resume cycles and with kgdb. I've also confirmed that
bluetooth, which uses the DMA paths in this driver, continues to work.
That all being said, a lot of things change here so I'd love any
testing folks want to do.

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 192 +++++++++++++-------------
1 file changed, 94 insertions(+), 98 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 10aeb0313f9b..853f5288dde5 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -78,7 +78,7 @@
#define GENI_UART_CONS_PORTS 1
#define GENI_UART_PORTS 3
#define DEF_FIFO_DEPTH_WORDS 16
-#define DEF_TX_WM 2
+#define DEF_TX_WM 1
#define DEF_FIFO_WIDTH_BITS 32
#define UART_RX_WM 2

@@ -129,8 +129,8 @@ struct qcom_geni_serial_port {
void *rx_buf;
u32 loopback;
bool brk;
+ bool tx_fifo_stopped;

- unsigned int tx_remaining;
unsigned int tx_total;
int wakeup_irq;
bool rx_tx_swap;
@@ -363,6 +363,14 @@ static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
M_CMD_ABORT_EN, true);
}
writel(irq_clear, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ /*
+ * Re-enable the TX watermark interrupt when we clear the "done"
+ * in case we were waiting on the "done" bit before starting a new
+ * command. The interrupt routine will re-disable this if it's not
+ * appropriate.
+ */
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_EN_SET);
}

static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
@@ -384,7 +392,7 @@ static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
* get lost.
*/
qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, 0xffffffff,
- port->tx_total - port->tx_remaining);
+ port->tx_total);

/*
* If clearing the FIFO made us inactive then we're done--no need for
@@ -413,14 +421,6 @@ static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
-
- /*
- * We've cancelled the current command. "tx_remaining" stores how
- * many bytes are left to finish in the current command so we know
- * when to start a new command. Since the command was cancelled we
- * need to zero "tx_remaining".
- */
- port->tx_remaining = 0;
}

static void qcom_geni_serial_abort_rx(struct uart_port *uport)
@@ -480,11 +480,12 @@ static int qcom_geni_serial_get_char(struct uart_port *uport)
static void qcom_geni_serial_poll_put_char(struct uart_port *uport,
unsigned char c)
{
+ qcom_geni_serial_drain_tx_fifo(uport);
+
qcom_geni_serial_setup_tx(uport, 1);
WARN_ON(!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
M_TX_FIFO_WATERMARK_EN, true));
writel(c, uport->membase + SE_GENI_TX_FIFOn);
- writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
qcom_geni_serial_poll_tx_done(uport);
}
#endif
@@ -514,6 +515,8 @@ __qcom_geni_serial_console_write(struct uart_port *uport, const char *s,
int i;
u32 bytes_to_send = count;

+ qcom_geni_serial_drain_tx_fifo(uport);
+
for (i = 0; i < count; i++) {
/*
* uart_console_write() adds a carriage return for each newline.
@@ -564,7 +567,6 @@ static void qcom_geni_serial_console_write(struct console *co, const char *s,
bool locked = true;
unsigned long flags;
u32 geni_status;
- u32 irq_en;

WARN_ON(co->index < 0 || co->index >= GENI_UART_CONS_PORTS);

@@ -580,38 +582,10 @@ static void qcom_geni_serial_console_write(struct console *co, const char *s,

geni_status = readl(uport->membase + SE_GENI_STATUS);

- if (!locked) {
- /*
- * We can only get here if an oops is in progress then we were
- * unable to get the lock. This means we can't safely access
- * our state variables like tx_remaining. About the best we
- * can do is wait for the FIFO to be empty before we start our
- * transfer, so we'll do that.
- */
- qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
- M_TX_FIFO_NOT_EMPTY_EN, false);
- } else if ((geni_status & M_GENI_CMD_ACTIVE) && !port->tx_remaining) {
- /*
- * It seems we can't interrupt existing transfers if all data
- * has been sent, in which case we need to look for done first.
- */
- qcom_geni_serial_poll_tx_done(uport);
-
- if (!kfifo_is_empty(&uport->state->port.xmit_fifo)) {
- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- writel(irq_en | M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_EN);
- }
- }
-
__qcom_geni_serial_console_write(uport, s, count);

-
- if (locked) {
- if (port->tx_remaining)
- qcom_geni_serial_setup_tx(uport, port->tx_remaining);
+ if (locked)
uart_port_unlock_irqrestore(uport, flags);
- }
}

static void handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
@@ -688,9 +662,9 @@ static void qcom_geni_serial_stop_tx_dma(struct uart_port *uport)

if (port->tx_dma_addr) {
geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr,
- port->tx_remaining);
+ port->tx_total);
port->tx_dma_addr = 0;
- port->tx_remaining = 0;
+ port->tx_total = 0;
}

geni_se_cancel_m_cmd(&port->se);
@@ -735,26 +709,27 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
qcom_geni_serial_stop_tx_dma(uport);
return;
}
-
- port->tx_remaining = xmit_size;
}

static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
- u32 irq_en;
+ struct qcom_geni_serial_port *port = to_dev_port(uport);

- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
- writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
+ port->tx_fifo_stopped = false;
+
+ /* Prime the pump to get data flowing. */
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_EN_SET);
}

static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
- u32 irq_en;
+ struct qcom_geni_serial_port *port = to_dev_port(uport);

- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
- writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
+ /*
+ * We can't do anything to safely pause the bytes that have already
+ * been queued up so just set a flag saying we shouldn't queue any more.
+ */
+ port->tx_fifo_stopped = true;
}

static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -922,10 +897,20 @@ static void qcom_geni_serial_stop_tx(struct uart_port *uport)
uport->ops->stop_tx(uport);
}

+static void qcom_geni_serial_enable_cmd_done(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
+ /* If we're not in FIFO mode we don't use CMD_DONE. */
+ if (port->dev_data->mode != GENI_SE_FIFO)
+ return;
+
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_EN_SET);
+}
+
static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
unsigned int chunk)
{
- struct qcom_geni_serial_port *port = to_dev_port(uport);
unsigned int tx_bytes, remaining = chunk;
u8 buf[BYTES_PER_FIFO_WORD];

@@ -938,52 +923,74 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);

remaining -= tx_bytes;
- port->tx_remaining -= tx_bytes;
}
}

-static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
- bool done, bool active)
+static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
struct tty_port *tport = &uport->state->port;
size_t avail;
size_t pending;
u32 status;
- u32 irq_en;
unsigned int chunk;
+ bool active;

- status = readl(uport->membase + SE_GENI_TX_FIFO_STATUS);
-
- /* Complete the current tx command before taking newly added data */
- if (active)
- pending = port->tx_remaining;
- else
- pending = kfifo_len(&tport->xmit_fifo);
+ /*
+ * The TX watermark interrupt is only used to "prime the pump" for
+ * transfers. Once transfers have been kicked off we always use the
+ * "done" interrupt to queue the next batch. Once we're here we can
+ * always disable the TX watermark interrupt.
+ *
+ * NOTE: we use the TX watermark in this way because we don't ever
+ * kick off TX transfers larger than we can stuff into the FIFO. This
+ * is because bytes from the OS's circular queue can disappear and
+ * there's no known safe/non-blocking way to cancel the larger
+ * transfer when bytes disappear. See qcom_geni_serial_drain_tx_fifo()
+ * for an example of a safe (but blocking) way to drain, but that's
+ * not appropriate in an IRQ handler. We also can't just kick off one
+ * large transfer and queue bytes whenever because we're using 4 bytes
+ * per FIFO word and thus we can only queue a non-multiple-of-4 number
+ * of bytes in the last word of a transfer.
+ */
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_EN_CLEAR);

- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
- qcom_geni_serial_stop_tx_fifo(uport);
+ /*
+ * If we've got an active TX command running then we expect to still
+ * see the "done" bit in the future and we can't kick off another
+ * transfer till then. Bail. NOTE: it's important that we read "active"
+ * after we've cleared the "done" interrupt (which the caller already
+ * did for us) so that we know that if we show as non-active we're
+ * guaranteed to later get "done".
+ *
+ * If nothing is pending we _also_ want to bail. Later start_tx()
+ * will start transfers again by temporarily turning on the TX
+ * watermark.
+ */
+ active = readl(uport->membase + SE_GENI_STATUS) & M_GENI_CMD_ACTIVE;
+ pending = port->tx_fifo_stopped ? 0 : kfifo_len(&tport->xmit_fifo);
+ if (active || !pending)
goto out_write_wakeup;
- }

+ /* Calculate how much space is available in the FIFO right now. */
+ status = readl(uport->membase + SE_GENI_TX_FIFO_STATUS);
avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
avail *= BYTES_PER_FIFO_WORD;

- chunk = min(avail, pending);
- if (!chunk)
+ /*
+ * It's a bit odd if we get here and have bytes pending and we're
+ * handling a "done" or "TX watermark" interrupt but we don't
+ * have space in the FIFO. Stick in a warning and bail.
+ */
+ if (!avail) {
+ dev_warn(uport->dev, "FIFO unexpectedly out of space\n");
goto out_write_wakeup;
-
- if (!port->tx_remaining) {
- qcom_geni_serial_setup_tx(uport, pending);
- port->tx_remaining = pending;
-
- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- if (!(irq_en & M_TX_FIFO_WATERMARK_EN))
- writel(irq_en | M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_EN);
}

+
+ /* We're ready to throw some bytes into the FIFO. */
+ chunk = min(avail, pending);
+ qcom_geni_serial_setup_tx(uport, chunk);
qcom_geni_serial_send_chunk_fifo(uport, chunk);

/*
@@ -991,17 +998,9 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
* cleared it in qcom_geni_serial_isr it will have already reasserted
* so we must clear it again here after our writes.
*/
- writel(M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_CLEAR);
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);

out_write_wakeup:
- if (!port->tx_remaining) {
- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- if (irq_en & M_TX_FIFO_WATERMARK_EN)
- writel(irq_en & ~M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_EN);
- }
-
if (kfifo_len(&tport->xmit_fifo) < WAKEUP_CHARS)
uart_write_wakeup(uport);
}
@@ -1011,10 +1010,10 @@ static void qcom_geni_serial_handle_tx_dma(struct uart_port *uport)
struct qcom_geni_serial_port *port = to_dev_port(uport);
struct tty_port *tport = &uport->state->port;

- uart_xmit_advance(uport, port->tx_remaining);
- geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr, port->tx_remaining);
+ uart_xmit_advance(uport, port->tx_total);
+ geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr, port->tx_total);
port->tx_dma_addr = 0;
- port->tx_remaining = 0;
+ port->tx_total = 0;

if (!kfifo_is_empty(&tport->xmit_fifo))
qcom_geni_serial_start_tx_dma(uport);
@@ -1028,7 +1027,6 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
u32 m_irq_en;
u32 m_irq_status;
u32 s_irq_status;
- u32 geni_status;
u32 dma;
u32 dma_tx_status;
u32 dma_rx_status;
@@ -1046,7 +1044,6 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
s_irq_status = readl(uport->membase + SE_GENI_S_IRQ_STATUS);
dma_tx_status = readl(uport->membase + SE_DMA_TX_IRQ_STAT);
dma_rx_status = readl(uport->membase + SE_DMA_RX_IRQ_STAT);
- geni_status = readl(uport->membase + SE_GENI_STATUS);
dma = readl(uport->membase + SE_GENI_DMA_MODE_EN);
m_irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
writel(m_irq_status, uport->membase + SE_GENI_M_IRQ_CLEAR);
@@ -1093,9 +1090,7 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
} else {
if (m_irq_status & m_irq_en &
(M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN))
- qcom_geni_serial_handle_tx_fifo(uport,
- m_irq_status & M_CMD_DONE_EN,
- geni_status & M_GENI_CMD_ACTIVE);
+ qcom_geni_serial_handle_tx_fifo(uport);

if (s_irq_status & (S_RX_FIFO_WATERMARK_EN | S_RX_FIFO_LAST_EN))
qcom_geni_serial_handle_rx_fifo(uport, drop_rx);
@@ -1203,6 +1198,7 @@ static int qcom_geni_serial_port_setup(struct uart_port *uport)
geni_se_init(&port->se, UART_RX_WM, port->rx_fifo_depth - 2);
geni_se_select_mode(&port->se, port->dev_data->mode);
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
+ qcom_geni_serial_enable_cmd_done(uport);
qcom_geni_serial_start_rx(uport);
port->setup = true;

--
2.45.1.288.g0e0cd299f1-goog
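
The chunk sizing in the reworked qcom_geni_serial_handle_tx_fifo() can be modelled in userspace. The following is only a sketch under stated assumptions: struct fifo_model, next_chunk() and chunk_words() are hypothetical names that do not exist in the driver, and the "active" and "stopped" flags stand in for M_GENI_CMD_ACTIVE and tx_fifo_stopped. It shows that a TX command is never sized larger than what fits in the FIFO right now, and that a chunk whose length is not a multiple of 4 only leaves a partial word at its end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_FIFO_WORD 4U

/* Hypothetical stand-in for the FIFO state the driver reads from hardware. */
struct fifo_model {
	uint32_t depth_words;	/* total FIFO depth, in words */
	uint32_t used_words;	/* words currently queued (the TX_FIFO_WC field) */
};

/* Bytes that can be written into the FIFO right now. */
static size_t fifo_avail_bytes(const struct fifo_model *f)
{
	assert(f->used_words <= f->depth_words);
	return (size_t)(f->depth_words - f->used_words) * BYTES_PER_FIFO_WORD;
}

/*
 * Size of the next TX command. Returns 0 when nothing should be queued:
 * a command is still active, TX has been stopped, or nothing is pending.
 */
static size_t next_chunk(const struct fifo_model *f, size_t pending,
			 bool active, bool stopped)
{
	size_t avail;

	if (active || stopped || pending == 0)
		return 0;

	avail = fifo_avail_bytes(f);
	return pending < avail ? pending : avail;
}

/* FIFO words needed for a chunk; only the last word may be partial. */
static size_t chunk_words(size_t chunk)
{
	return (chunk + BYTES_PER_FIFO_WORD - 1) / BYTES_PER_FIFO_WORD;
}
```

With a 16-word FIFO that is empty, 100 pending bytes yield a 64-byte command; the remaining 36 bytes wait for the next "done" interrupt.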


2024-05-31 08:34:22

by Andy Shevchenko

Subject: Re: [PATCH v2 2/7] serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()

On Thu, May 30, 2024 at 03:45:54PM -0700, Douglas Anderson wrote:
> The qcom_geni_serial_poll_bit() is supposed to be able to be used to
> poll a bit that will become set when a TX transfer finishes. Because
> of this it tries to set its timeout based on how long the UART will
> take to shift out all of the queued bytes. There are two problems
> here:
> 1. There appears to be a hidden extra word on the firmware side which
> is the word that the firmware has already taken out of the FIFO and
> is currently shifting out. We need to account for this.
> 2. The timeout calculation was assuming that it would only need 8 bits
> on the wire to shift out 1 byte. This isn't true. Typically 10 bits
> are used (8 data bits, 1 start and 1 stop bit), but as much as 13
> bits could be used (14 if we allowed 9 bits per byte, which we
> don't).
>
> The too-short timeout was seen causing problems in a future patch
> which more properly waited for bytes to transfer out of the UART
> before cancelling.

..

> + /*
> + * Add 1 to tx_fifo_depth to account for the hidden register
> + * on the firmware side that can hold a word.
> + */
> + max_queued_bytes =
> + DIV_ROUND_UP((port->tx_fifo_depth + 1) * port->tx_fifo_width,
> + BITS_PER_BYTE);

BITS_TO_BYTES()

..

> - timeout_us = ((fifo_bits * USEC_PER_SEC) / baud) + 500;
> + timeout_us = ((max_queued_bits * USEC_PER_SEC) / baud) + 500;

Too many parentheses. (The outer ones can be dropped.)

--
With Best Regards,
Andy Shevchenko
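
For reference, the timeout arithmetic under discussion can be checked in userspace. This is a minimal sketch, not the driver code: tx_drain_timeout_us() is a hypothetical name, the BITS_TO_BYTES() macro here is a local stand-in for the kernel helper of the same name, and the 13 bits per byte and the 500 us of slop are taken from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE		8U
#define USEC_PER_SEC		1000000ULL
#define MAX_BITS_PER_FRAME	13U	/* 1 start + 8 data + 1 parity + 3 stop */

/* Local stand-in for the kernel's BITS_TO_BYTES(): round up to whole bytes. */
#define BITS_TO_BYTES(nr)	(((nr) + BITS_PER_BYTE - 1) / BITS_PER_BYTE)

/*
 * Worst-case time (in microseconds) to shift out a full TX FIFO plus the
 * one hidden word the firmware may already have pulled out of the FIFO.
 */
static uint64_t tx_drain_timeout_us(uint32_t fifo_depth_words,
				    uint32_t fifo_width_bits, uint32_t baud)
{
	uint64_t max_queued_bytes;
	uint64_t max_queued_bits;

	assert(baud != 0);

	max_queued_bytes =
		BITS_TO_BYTES((uint64_t)(fifo_depth_words + 1) * fifo_width_bits);
	max_queued_bits = max_queued_bytes * MAX_BITS_PER_FRAME;

	/* No outer parentheses needed: '*' and '/' bind tighter than '+'. */
	return max_queued_bits * USEC_PER_SEC / baud + 500;
}
```

For a 16-word, 32-bit-wide FIFO at 115200 baud this gives 68 bytes, 884 bits on the wire and a timeout of 8173 us, versus 4944 us with the old 8-bits-per-byte, no-hidden-word calculation.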



2024-05-31 08:38:12

by Andy Shevchenko

Subject: Re: [PATCH v2 6/7] serial: qcom-geni: Fix suspend while active UART xfer

On Thu, May 30, 2024 at 03:45:58PM -0700, Douglas Anderson wrote:
> On devices using Qualcomm's GENI UART it is possible to get the UART
> stuck such that it no longer outputs data. Specifically, I could
> reproduce this problem by logging in via an agetty on the debug serial
> port (which was _not_ used for kernel console) and running:
> cat /var/log/messages
> ...and then (via an SSH session) forcing a few suspend/resume cycles.
>
> Digging into this showed a number of problems that are all related.
>
> The root of the problems was with qcom_geni_serial_stop_tx_fifo()
> which is called as part of the suspend process. Specific problems with
> that function:
> - When we cancel an in-progress "tx" command it doesn't appear to
> fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
> continued to report that the FIFO wasn't empty. The
> qcom_geni_serial_start_tx_fifo() function didn't re-enable
> interrupts in this case so we'd never start transferring again.
> - We cancelled the current "tx" command but we forgot to zero out
> "tx_remaining". This confused logic elsewhere in the driver.
> - From experimentation, it appears that cancelling the "tx" command
> could drop some of the queued up bytes. While maybe not the end of
> the world, it doesn't seem like we should be dropping bytes when
> stopping the FIFO, which is defined more as a "pause".
>
> One idea to fix the above would be to add FIFO draining to
> qcom_geni_serial_stop_tx_fifo(). However, digging into the
> documentation in serial_core.h for stop_tx() makes this seem like the
> wrong choice. Specifically stop_tx() is called with local interrupts
> disabled. Waiting for a FIFO (which might be 64 bytes big) to drain at
> 115.2 kbps doesn't seem like a wise move.
>
> Ideally qcom_geni_serial_stop_tx_fifo() would be able to pause the
> transmitter, but nothing in the documentation for the GENI UART makes
> me believe that is possible.
>
> Given the lack of better choices, we'll change
> qcom_geni_serial_stop_tx_fifo() to simply disable the
> TX_FIFO_WATERMARK interrupt and call it a day. This seems OK as per
> the serial core docs since stop_tx() is supposed to stop transferring
> bytes "as soon as possible" and there doesn't seem to be any possible
> way to stop transferring sooner. As part of this, get rid of some of
> the extra conditions on qcom_geni_serial_start_tx_fifo() which simply
> weren't needed and are now getting in the way. It's always fine to
> turn the interrupts on if we want to receive and it'll be up to the
> IRQ handler to turn them back off if somehow they're not needed. This
> works fine.
>
> Unfortunately, doing just the above change causes new/different
> problems with suspend/resume. Now if you suspend while an active
> transfer is happening you can find that after resume time you're no
> longer receiving UART interrupts at all. It appears to be important to
> drain the FIFO and send a "cancel" command if the UART is active to
> avoid this. Since we've already decided that
> qcom_geni_serial_stop_tx_fifo() shouldn't be doing this, let's add the
> draining / cancelling logic to the shutdown() call where it should be
> OK to delay a bit. This is called as part of the suspend process via
> uart_suspend_port().
>
> Finally, with all of the above, the test case where we're spamming the
> UART with data and going through suspend/resume cycles doesn't kill
> the UART and doesn't drop bytes.
>
> NOTE: though I haven't gone back and validated on ancient code, it
> appears from code inspection that many of these problems have existed
> since the start of the driver. At the very least, I could reproduce
> the problems on vanilla v5.15. The problems don't seem to reproduce
> when using the serial port for kernel console output and also don't
> seem to reproduce if nothing is being printed to the console at
> suspend time, so this is presumably why they were not noticed until
> now.

..

> + qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, 0xffffffff,

It's easy to miscount the f's; use GENMASK()?

> + port->tx_total - port->tx_remaining);

--
With Best Regards,
Andy Shevchenko
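
To illustrate the suggestion, here is a small userspace sketch. GENMASK_U32() is a hypothetical 32-bit stand-in for the kernel's GENMASK(), and bitfield_matches() is an assumed model of the comparison that qcom_geni_serial_poll_bitfield() repeats until it succeeds or times out; the real helper's exact behaviour is defined by patch 4/7, not by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit stand-in for the kernel's GENMASK(h, l): a mask with
 * bits l..h (inclusive) set. Valid for 0 <= l <= h <= 31.
 */
#define GENMASK_U32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Assumed poll condition: the masked register value equals the target. */
static bool bitfield_matches(uint32_t reg, uint32_t field, uint32_t val)
{
	return (reg & field) == val;
}
```

GENMASK_U32(31, 0) evaluates to the same value as 0xffffffff, but the bit range is stated explicitly instead of being encoded in a count of f digits.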



2024-05-31 14:58:24

by Ilpo Järvinen

Subject: Re: [PATCH v2 2/7] serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()

On Thu, 30 May 2024, Douglas Anderson wrote:

> The qcom_geni_serial_poll_bit() is supposed to be able to be used to
> poll a bit that will become set when a TX transfer finishes. Because
> of this it tries to set its timeout based on how long the UART will
> take to shift out all of the queued bytes. There are two problems
> here:
> 1. There appears to be a hidden extra word on the firmware side which
> is the word that the firmware has already taken out of the FIFO and
> is currently shifting out. We need to account for this.
> 2. The timeout calculation was assuming that it would only need 8 bits
> on the wire to shift out 1 byte. This isn't true. Typically 10 bits
> are used (8 data bits, 1 start and 1 stop bit), but as much as 13
> bits could be used (14 if we allowed 9 bits per byte, which we
> don't).
>
> The too-short timeout was seen causing problems in a future patch
> which more properly waited for bytes to transfer out of the UART
> before cancelling.
>
> Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> Changes in v2:
> - New
>
> drivers/tty/serial/qcom_geni_serial.c | 32 ++++++++++++++++++++++++---
> 1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> index 2bd25afe0d92..32e025705f99 100644
> --- a/drivers/tty/serial/qcom_geni_serial.c
> +++ b/drivers/tty/serial/qcom_geni_serial.c
> @@ -271,7 +271,8 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
> u32 reg;
> struct qcom_geni_serial_port *port;
> unsigned int baud;
> - unsigned int fifo_bits;
> + unsigned int max_queued_bytes;
> + unsigned int max_queued_bits;
> unsigned long timeout_us = 20000;
> struct qcom_geni_private_data *private_data = uport->private_data;
>
> @@ -280,12 +281,37 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
> baud = port->baud;
> if (!baud)
> baud = 115200;
> - fifo_bits = port->tx_fifo_depth * port->tx_fifo_width;
> +
> + /*
> + * Add 1 to tx_fifo_depth to account for the hidden register
> + * on the firmware side that can hold a word.
> + */
> + max_queued_bytes =
> + DIV_ROUND_UP((port->tx_fifo_depth + 1) * port->tx_fifo_width,
> + BITS_PER_BYTE);
> +
> + /*
> + * The maximum number of bits per byte on the wire is 13 from:
> + * - 1 start bit
> + * - 8 data bits
> + * - 1 parity bit
> + * - 3 stop bits
> + *
> + * While we could try to count the actual bits per byte based on
> + * the port configuration, this is a rough timeout anyway so
> + * using the max is fine.
> + */
> + max_queued_bits = max_queued_bytes * 13;
> +
> /*
> * Total polling iterations based on FIFO worth of bytes to be
> * sent at current baud. Add a little fluff to the wait.
> + *
> + * NOTE: this assumes that flow control isn't used, but with
> + * flow control we could wait indefinitely and that wouldn't
> + * be OK.
> */
> - timeout_us = ((fifo_bits * USEC_PER_SEC) / baud) + 500;
> + timeout_us = ((max_queued_bits * USEC_PER_SEC) / baud) + 500;

You should try to generalize the existing uart_fifo_timeout() to suit what
you're trying to do here instead of writing more variants of code with
this same intent.

--
i.
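
One way to picture the generalization being asked for is a helper that takes the per-frame time and the number of queued bytes as inputs rather than hard-coding them. The sketch below is a userspace model only: frame_time_ns() and fifo_timeout_ns() are hypothetical names, and the 20 ms of slop is an assumption modelled on what uart_fifo_timeout() adds, as far as I recall; check serial_core.h before relying on it.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define NSEC_PER_MSEC	1000000ULL

/* Time to shift out one frame (start + data + parity + stop bits), rounded up. */
static uint64_t frame_time_ns(uint32_t bits_per_frame, uint32_t baud)
{
	assert(baud != 0);
	return (bits_per_frame * NSEC_PER_SEC + baud - 1) / baud;
}

/*
 * Timeout for draining queued_bytes frames. The caller decides how many
 * bytes can be queued, e.g. (fifo depth + 1 hidden word) * bytes per word.
 */
static uint64_t fifo_timeout_ns(uint64_t frame_ns, uint32_t queued_bytes)
{
	/* Assumed slop of 20 ms on top of the computed wire time. */
	return frame_ns * queued_bytes + 20 * NSEC_PER_MSEC;
}
```

With 10-bit frames at 115200 baud and 68 queued bytes this gives roughly 25.9 ms, most of which is slop.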


2024-05-31 15:14:22

by Ilpo Järvinen

Subject: Re: [PATCH v2 6/7] serial: qcom-geni: Fix suspend while active UART xfer

On Thu, 30 May 2024, Douglas Anderson wrote:

> On devices using Qualcomm's GENI UART it is possible to get the UART
> stuck such that it no longer outputs data. Specifically, I could
> reproduce this problem by logging in via an agetty on the debug serial
> port (which was _not_ used for kernel console) and running:
> cat /var/log/messages
> ...and then (via an SSH session) forcing a few suspend/resume cycles.
>
> Digging into this showed a number of problems that are all related.
>
> The root of the problems was with qcom_geni_serial_stop_tx_fifo()
> which is called as part of the suspend process. Specific problems with
> that function:
> - When we cancel an in-progress "tx" command it doesn't appear to
> fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
> continued to report that the FIFO wasn't empty. The
> qcom_geni_serial_start_tx_fifo() function didn't re-enable
> interrupts in this case so we'd never start transferring again.
> - We cancelled the current "tx" command but we forgot to zero out
> "tx_remaining". This confused logic elsewhere in the driver.
> - From experimentation, it appears that cancelling the "tx" command
> could drop some of the queued up bytes. While maybe not the end of
> the world, it doesn't seem like we should be dropping bytes when
> stopping the FIFO, which is defined more as a "pause".
>
> One idea to fix the above would be to add FIFO draining to
> qcom_geni_serial_stop_tx_fifo(). However, digging into the
> documentation in serial_core.h for stop_tx() makes this seem like the
> wrong choice. Specifically stop_tx() is called with local interrupts
> disabled. Waiting for a FIFO (which might be 64 bytes big) to drain at
> 115.2 kbps doesn't seem like a wise move.
>
> Ideally qcom_geni_serial_stop_tx_fifo() would be able to pause the
> transmitter, but nothing in the documentation for the GENI UART makes
> me believe that is possible.
>
> Given the lack of better choices, we'll change
> qcom_geni_serial_stop_tx_fifo() to simply disable the
> TX_FIFO_WATERMARK interrupt and call it a day. This seems OK as per
> the serial core docs since stop_tx() is supposed to stop transferring
> bytes "as soon as possible" and there doesn't seem to be any possible
> way to stop transferring sooner. As part of this, get rid of some of
> the extra conditions on qcom_geni_serial_start_tx_fifo() which simply
> weren't needed and are now getting in the way. It's always fine to
> turn the interrupts on if we want to receive and it'll be up to the
> IRQ handler to turn them back off if somehow they're not needed. This
> works fine.
>
> Unfortunately, doing just the above change causes new/different
> problems with suspend/resume. Now if you suspend while an active
> transfer is happening you can find that after resume time you're no
> longer receiving UART interrupts at all. It appears to be important to
> drain the FIFO and send a "cancel" command if the UART is active to
> avoid this. Since we've already decided that
> qcom_geni_serial_stop_tx_fifo() shouldn't be doing this, let's add the
> draining / cancelling logic to the shutdown() call where it should be
> OK to delay a bit. This is called as part of the suspend process via
> uart_suspend_port().
>
> Finally, with all of the above, the test case where we're spamming the
> UART with data and going through suspend/resume cycles doesn't kill
> the UART and doesn't drop bytes.
>
> NOTE: though I haven't gone back and validated on ancient code, it
> appears from code inspection that many of these problems have existed
> since the start of the driver. At the very least, I could reproduce
> the problems on vanilla v5.15. The problems don't seem to reproduce
> when using the serial port for kernel console output and also don't
> seem to reproduce if nothing is being printed to the console at
> suspend time, so this is presumably why they were not noticed until
> now.

Hi,

This was quite tiring to read. :-) It has lots of useful information but
it could be structured better.

Could you rewrite the description so that the problem and the final
solution are easy to find? Start with those two things and, in that part,
avoid detouring into the extra branches you took while finding and solving
the problem.

Describe how to reproduce the problem only after the root cause and the
final solution have been covered. An explanation of why other approaches
do not work is also useful, but please place it after the final solution.

Also, try to avoid I/you/we; use the imperative mood.

--
i.


> Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> There are still a number of problems with GENI UART after this but
> I've kept this change separate to make it easier to understand.
> Specifically on mainline just hitting "Ctrl-C" after dumping
> /var/log/messages to the serial port hangs things after the kfifo
> changes. Those issues will be addressed in future patches.
>
> Changes in v2:
> - Totally rework / rename patch to handle suspend while active xfer
>
> drivers/tty/serial/qcom_geni_serial.c | 97 +++++++++++++++++++++------
> 1 file changed, 75 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> index d7814f9e5c26..10aeb0313f9b 100644
> --- a/drivers/tty/serial/qcom_geni_serial.c
> +++ b/drivers/tty/serial/qcom_geni_serial.c
> @@ -131,6 +131,7 @@ struct qcom_geni_serial_port {
> bool brk;
>
> unsigned int tx_remaining;
> + unsigned int tx_total;
> int wakeup_irq;
> bool rx_tx_swap;
> bool cts_rts_swap;
> @@ -337,11 +338,14 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
>
> static void qcom_geni_serial_setup_tx(struct uart_port *uport, u32 xmit_size)
> {
> + struct qcom_geni_serial_port *port = to_dev_port(uport);
> u32 m_cmd;
>
> writel(xmit_size, uport->membase + SE_UART_TX_TRANS_LEN);
> m_cmd = UART_START_TX << M_OPCODE_SHFT;
> writel(m_cmd, uport->membase + SE_GENI_M_CMD0);
> +
> + port->tx_total = xmit_size;
> }
>
> static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
> @@ -361,6 +365,64 @@ static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
> writel(irq_clear, uport->membase + SE_GENI_M_IRQ_CLEAR);
> }
>
> +static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
> +{
> + struct qcom_geni_serial_port *port = to_dev_port(uport);
> +
> + /*
> + * If the main sequencer is inactive it means that the TX command has
> + * been completed and all bytes have been sent. Nothing to do in that
> + * case.
> + */
> + if (!qcom_geni_serial_main_active(uport))
> + return;
> +
> + /*
> + * Wait until the FIFO has been drained. We've already taken bytes out
> + * of the higher level queue in qcom_geni_serial_send_chunk_fifo() so
> + * if we don't drain the FIFO but send the "cancel" below they seem to
> + * get lost.
> + */
> + qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, 0xffffffff,
> + port->tx_total - port->tx_remaining);
> +
> + /*
> + * If clearing the FIFO made us inactive then we're done--no need for
> + * a cancel.
> + */
> + if (!qcom_geni_serial_main_active(uport))
> + return;
> +
> + /*
> + * Cancel the current command. After this the main sequencer will
> + * stop reporting that it's active and we'll have to start a new
> + * transfer command.
> + *
> + * If we skip doing this cancel and then continue with a system
> + * suspend while there's an active command in the main sequencer
> + * then after resume time we won't get any more interrupts on the
> + * main sequencer until we send the cancel.
> + */
> + geni_se_cancel_m_cmd(&port->se);
> + if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> + M_CMD_CANCEL_EN, true)) {
> + /* The cancel failed; try an abort as a fallback. */
> + geni_se_abort_m_cmd(&port->se);
> + qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> + M_CMD_ABORT_EN, true);
> + writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> + }
> + writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> +
> + /*
> + * We've cancelled the current command. "tx_remaining" stores how
> + * many bytes are left to finish in the current command so we know
> + * when to start a new command. Since the command was cancelled we
> + * need to zero "tx_remaining".
> + */
> + port->tx_remaining = 0;
> +}
> +
> static void qcom_geni_serial_abort_rx(struct uart_port *uport)
> {
> u32 irq_clear = S_CMD_DONE_EN | S_CMD_ABORT_EN;
> @@ -681,37 +743,18 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
> {
> u32 irq_en;
>
> - if (qcom_geni_serial_main_active(uport) ||
> - !qcom_geni_serial_tx_empty(uport))
> - return;
> -
> irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
> irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
> -
> writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
> }
>
> static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
> {
> u32 irq_en;
> - struct qcom_geni_serial_port *port = to_dev_port(uport);
>
> irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
> irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
> writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
> - /* Possible stop tx is called multiple times. */
> - if (!qcom_geni_serial_main_active(uport))
> - return;
> -
> - geni_se_cancel_m_cmd(&port->se);
> - if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> - M_CMD_CANCEL_EN, true)) {
> - geni_se_abort_m_cmd(&port->se);
> - qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> - M_CMD_ABORT_EN, true);
> - writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> - }
> - writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> }
>
> static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
> @@ -1093,7 +1136,15 @@ static int setup_fifos(struct qcom_geni_serial_port *port)
> }
>
>
> -static void qcom_geni_serial_shutdown(struct uart_port *uport)
> +static void qcom_geni_serial_shutdown_dma(struct uart_port *uport)
> +{
> + disable_irq(uport->irq);
> +
> + qcom_geni_serial_stop_tx(uport);
> + qcom_geni_serial_stop_rx(uport);
> +}
> +
> +static void qcom_geni_serial_shutdown_fifo(struct uart_port *uport)
> {
> disable_irq(uport->irq);
>
> @@ -1102,6 +1153,8 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
>
> qcom_geni_serial_stop_tx(uport);
> qcom_geni_serial_stop_rx(uport);
> +
> + qcom_geni_serial_drain_tx_fifo(uport);
> }
>
> static int qcom_geni_serial_port_setup(struct uart_port *uport)
> @@ -1560,7 +1613,7 @@ static const struct uart_ops qcom_geni_console_pops = {
> .startup = qcom_geni_serial_startup,
> .request_port = qcom_geni_serial_request_port,
> .config_port = qcom_geni_serial_config_port,
> - .shutdown = qcom_geni_serial_shutdown,
> + .shutdown = qcom_geni_serial_shutdown_fifo,
> .type = qcom_geni_serial_get_type,
> .set_mctrl = qcom_geni_serial_set_mctrl,
> .get_mctrl = qcom_geni_serial_get_mctrl,
> @@ -1582,7 +1635,7 @@ static const struct uart_ops qcom_geni_uart_pops = {
> .startup = qcom_geni_serial_startup,
> .request_port = qcom_geni_serial_request_port,
> .config_port = qcom_geni_serial_config_port,
> - .shutdown = qcom_geni_serial_shutdown,
> + .shutdown = qcom_geni_serial_shutdown_dma,
> .type = qcom_geni_serial_get_type,
> .set_mctrl = qcom_geni_serial_set_mctrl,
> .get_mctrl = qcom_geni_serial_get_mctrl,
>

2024-06-02 04:23:47

by Bjorn Andersson

Subject: Re: [PATCH v2 1/7] soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers

On Thu, May 30, 2024 at 03:45:53PM GMT, Douglas Anderson wrote:
> For UART devices the M_GP_LENGTH is the TX word count. For other
> devices this is the transaction word count.
>
> For UART devices the S_GP_LENGTH is the RX word count.
>
> The IRQ_EN set/clear registers allow you to set or clear bits in the
> IRQ_EN register without needing a read-modify-write.
>

Acked-by: Bjorn Andersson <[email protected]>

Regards,
Bjorn

> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> Since these new definitions are used in the future UART patches the
> hope is that they could be acked by Qualcomm folks and then go through
> the same tree as the UART patches that need them.
>
> Changes in v2:
> - New
>
> include/linux/soc/qcom/geni-se.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/soc/qcom/geni-se.h b/include/linux/soc/qcom/geni-se.h
> index 0f038a1a0330..8d07c442029b 100644
> --- a/include/linux/soc/qcom/geni-se.h
> +++ b/include/linux/soc/qcom/geni-se.h
> @@ -88,11 +88,15 @@ struct geni_se {
> #define SE_GENI_M_IRQ_STATUS 0x610
> #define SE_GENI_M_IRQ_EN 0x614
> #define SE_GENI_M_IRQ_CLEAR 0x618
> +#define SE_GENI_M_IRQ_EN_SET 0x61c
> +#define SE_GENI_M_IRQ_EN_CLEAR 0x620
> #define SE_GENI_S_CMD0 0x630
> #define SE_GENI_S_CMD_CTRL_REG 0x634
> #define SE_GENI_S_IRQ_STATUS 0x640
> #define SE_GENI_S_IRQ_EN 0x644
> #define SE_GENI_S_IRQ_CLEAR 0x648
> +#define SE_GENI_S_IRQ_EN_SET 0x64c
> +#define SE_GENI_S_IRQ_EN_CLEAR 0x650
> #define SE_GENI_TX_FIFOn 0x700
> #define SE_GENI_RX_FIFOn 0x780
> #define SE_GENI_TX_FIFO_STATUS 0x800
> @@ -101,6 +105,8 @@ struct geni_se {
> #define SE_GENI_RX_WATERMARK_REG 0x810
> #define SE_GENI_RX_RFR_WATERMARK_REG 0x814
> #define SE_GENI_IOS 0x908
> +#define SE_GENI_M_GP_LENGTH 0x910
> +#define SE_GENI_S_GP_LENGTH 0x914
> #define SE_DMA_TX_IRQ_STAT 0xc40
> #define SE_DMA_TX_IRQ_CLR 0xc44
> #define SE_DMA_TX_FSM_RST 0xc58
> --
> 2.45.1.288.g0e0cd299f1-goog
>
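
The benefit of the IRQ_EN set/clear registers described in the commit message can be shown with a small userspace model. This is a sketch under assumptions: struct irq_model and the two write helpers are hypothetical and only mimic the semantics the commit message states (a write sets or clears the given bits in IRQ_EN), and the bit positions used for the two interrupt flags are illustrative.

```c
#include <stdint.h>

/* Illustrative bit positions; see geni-se.h for the real definitions. */
#define M_CMD_DONE_EN		(UINT32_C(1) << 0)
#define M_TX_FIFO_WATERMARK_EN	(UINT32_C(1) << 30)

/* Hypothetical model of the main sequencer's interrupt-enable state. */
struct irq_model {
	uint32_t irq_en;
};

/* Model of a write to SE_GENI_M_IRQ_EN_SET: set only the given bits. */
static void write_irq_en_set(struct irq_model *m, uint32_t bits)
{
	m->irq_en |= bits;
}

/* Model of a write to SE_GENI_M_IRQ_EN_CLEAR: clear only the given bits. */
static void write_irq_en_clear(struct irq_model *m, uint32_t bits)
{
	m->irq_en &= ~bits;
}
```

In the driver each of these is a single writel(), so no readl() of IRQ_EN and no window between read and write are involved, which is what lets the later patches turn the TX watermark interrupt on and off without touching M_CMD_DONE_EN.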

2024-06-04 16:06:24

by Doug Anderson

Subject: Re: [PATCH v2 6/7] serial: qcom-geni: Fix suspend while active UART xfer

Hi,

On Fri, May 31, 2024 at 8:13 AM Ilpo Järvinen
<[email protected]> wrote:
>
> On Thu, 30 May 2024, Douglas Anderson wrote:
>
> > On devices using Qualcomm's GENI UART it is possible to get the UART
> > stuck such that it no longer outputs data. Specifically, I could
> > reproduce this problem by logging in via an agetty on the debug serial
> > port (which was _not_ used for kernel console) and running:
> > cat /var/log/messages
> > ...and then (via an SSH session) forcing a few suspend/resume cycles.
> >
> > Digging into this showed a number of problems that are all related.
> >
> > The root of the problems was with qcom_geni_serial_stop_tx_fifo()
> > which is called as part of the suspend process. Specific problems with
> > that function:
> > - When we cancel an in-progress "tx" command it doesn't appear to
> > fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
> > continued to report that the FIFO wasn't empty. The
> > qcom_geni_serial_start_tx_fifo() function didn't re-enable
> > interrupts in this case so we'd never start transferring again.
> > - We cancelled the current "tx" command but we forgot to zero out
> > "tx_remaining". This confused logic elsewhere in the driver.
> > - From experimentation, it appears that cancelling the "tx" command
> > could drop some of the queued up bytes. While maybe not the end of
> > the world, it doesn't seem like we should be dropping bytes when
> > stopping the FIFO, which is defined as more of a "pause".
> >
> > One idea to fix the above would be to add FIFO draining to
> > qcom_geni_serial_stop_tx_fifo(). However, digging into the
> > documentation in serial_core.h for stop_tx() makes this seem like the
> > wrong choice. Specifically stop_tx() is called with local interrupts
> > disabled. Waiting for a FIFO (which might be 64 bytes big) to drain at
> > 115.2 kbps doesn't seem like a wise move.
> >
> > Ideally qcom_geni_serial_stop_tx_fifo() would be able to pause the
> > transmitter, but nothing in the documentation for the GENI UART makes
> > me believe that is possible.
> >
> > Given the lack of better choices, we'll change
> > qcom_geni_serial_stop_tx_fifo() to simply disable the
> > TX_FIFO_WATERMARK interrupt and call it a day. This seems OK as per
> > the serial core docs since stop_tx() is supposed to stop transferring
> > bytes "as soon as possible" and there doesn't seem to be any possible
> > way to stop transferring sooner. As part of this, get rid of some of
> > the extra conditions on qcom_geni_serial_start_tx_fifo() which simply
> > weren't needed and are now getting in the way. It's always fine to
> > turn the interrupts on if we want to receive and it'll be up to the
> > IRQ handler to turn them back off if somehow they're not needed. This
> > works fine.
> >
> > Unfortunately, doing just the above change causes new/different
> > problems with suspend/resume. Now if you suspend while an active
> > transfer is happening you can find that after resume time you're no
> > longer receiving UART interrupts at all. It appears to be important to
> > drain the FIFO and send a "cancel" command if the UART is active to
> > avoid this. Since we've already decided that
> > qcom_geni_serial_stop_tx_fifo() shouldn't be doing this, let's add the
> > draining / cancelling logic to the shutdown() call where it should be
> > OK to delay a bit. This is called as part of the suspend process via
> > uart_suspend_port().
> >
> > Finally, with all of the above, the test case where we're spamming the
> > UART with data and going through suspend/resume cycles doesn't kill
> > the UART and doesn't drop bytes.
> >
> > NOTE: though I haven't gone back and validated on ancient code, it
> > appears from code inspection that many of these problems have existed
> > since the start of the driver. At the very least, I could reproduce
> > the problems on vanilla v5.15. The problems don't seem to reproduce
> > when using the serial port for kernel console output and also don't
> > seem to reproduce if nothing is being printed to the console at
> > suspend time, so this is presumably why they were not noticed until
> > now.
>
> Hi,
>
> This was quite tiring to read. :-) It has lots of useful information but
> it could be structured better.
>
> Could you try to rewrite this entire description so that it's easier to
> find the problem and final solution information from it. Start with those
> two things, and in that part, try to avoid detouring to extra branches you
> took while finding and solving the problem.
>
> You can place how the problem can be reproduced after you've described the
> root cause & final solution first. Extra information why some other
> approaches do not work is also useful information, but please place it
> after the final solution has been covered first.
>
> Also, try to avoid I/you/we, use imperative tone.

Sure. I'll try. It's always a tradeoff between providing too much
information and not providing enough. In general I find that providing
the thought process can help someone else who is likely going to go
through the same thing as they're trying to understand the patch, but
I agree it can also be overwhelming.

As for the imperative tone: I've attempted to use it where possible. In
general (unless my understanding is flawed) it's not possible to use
the imperative when explaining to the reader how the hardware/driver
works or what the problem is, and (IMO) we shouldn't fully remove these
types of explanations from the commit message. When describing what the
patch actually does, though, I've tried to make sure it's in
imperative form. If you have wording changes on v3 then please suggest
specific changes.

-Doug
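
For a rough sense of the delay discussed in the quoted commit message (waiting for a 64-byte FIFO to drain at 115.2 kbps with local interrupts disabled), the arithmetic can be sketched as below. The helper name fifo_drain_time_us() is hypothetical, and the figure of 10 bits per character assumes 8N1 framing (1 start + 8 data + 1 stop); neither comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case time, in microseconds (rounded up), to shift fifo_bytes
 * characters out of a full FIFO at the given baud rate.
 * bits_per_char counts start, data, parity and stop bits on the wire.
 */
static uint64_t fifo_drain_time_us(uint32_t fifo_bytes,
				   uint32_t bits_per_char,
				   uint32_t baud)
{
	uint64_t bits;

	assert(baud != 0);
	bits = (uint64_t)fifo_bytes * bits_per_char;

	/* Ceiling division so the estimate never undershoots. */
	return (bits * 1000000u + baud - 1) / baud;
}
```

With fifo_drain_time_us(64, 10, 115200) the result is 5556 us, i.e. roughly 5.6 ms, which is a long time to spin in a path that runs with interrupts off.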