2024-06-10 22:25:46

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 0/8] serial: qcom-geni: Overhaul TX handling to fix crashes/hangs


While trying to reproduce -EBUSY errors that our lab was getting in
suspend/resume testing, I ended up finding a whole pile of problems
with the Qualcomm GENI serial driver. I've posted a fix for the -EBUSY
issue separately [1]. This series is fixing all of the Qualcomm GENI
problems that I found.

As far as I can tell most of the problems have been in the Qualcomm
GENI serial driver since inception, but it can be noted that the
behavior got worse with the new kfifo changes. Previously when the OS
took data out of the circular queue we'd just spit stale data onto the
serial port. Now we'll hard lockup. :-P

I've tried to break this series up as much as possible to make it
easier to understand but the final patch is still a lot of change at
once. Hopefully it's OK.

[1] https://lore.kernel.org/r/20240530084841.v2.1.I2395e66cf70c6e67d774c56943825c289b9c13e4@changeid

Changes in v4:
- Add GP_LENGTH field definition.
- Fix indentation.
- GENMASK(31, 0) -> GP_LENGTH.
- Use uart_fifo_timeout_ms() for timeout.
- tty: serial: Add uart_fifo_timeout_ms()

Changes in v3:
- 0xffffffff => GENMASK(31, 0)
- Reword commit message.
- Use uart_fifo_timeout() for timeout.

Changes in v2:
- Totally rework / rename patch to handle suspend while active xfer
- serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit()
- serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()
- serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield()
- serial: qcom-geni: Just set the watermark level once
- serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups
- soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers

Douglas Anderson (8):
soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers
tty: serial: Add uart_fifo_timeout_ms()
serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()
serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit()
serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield()
serial: qcom-geni: Just set the watermark level once
serial: qcom-geni: Fix suspend while active UART xfer
serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

drivers/tty/serial/qcom_geni_serial.c | 322 +++++++++++++++-----------
include/linux/serial_core.h | 15 +-
include/linux/soc/qcom/geni-se.h | 9 +
3 files changed, 206 insertions(+), 140 deletions(-)

--
2.45.2.505.gda0bf45e8d-goog



2024-06-10 22:25:56

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 1/8] soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers

For UART devices the M_GP_LENGTH is the TX word count. For other
devices this is the transaction word count.

For UART devices the S_GP_LENGTH is the RX word count.

The IRQ_EN set/clear registers allow you to set or clear bits in the
IRQ_EN register without needing a read-modify-write.

Acked-by: Bjorn Andersson <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
Since these new definitions are used in the future UART patches and
Bjorn has Acked them, I'd expect them to go through the same tree as
the UART patches that need them.

Note: in v4 I added the GP_LENGTH but kept Bjorn's Ack since it seemed
very minor.

Changes in v4:
- Add GP_LENGTH field definition.

Changes in v2:
- New

include/linux/soc/qcom/geni-se.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/linux/soc/qcom/geni-se.h b/include/linux/soc/qcom/geni-se.h
index 0f038a1a0330..c3bca9c0bf2c 100644
--- a/include/linux/soc/qcom/geni-se.h
+++ b/include/linux/soc/qcom/geni-se.h
@@ -88,11 +88,15 @@ struct geni_se {
#define SE_GENI_M_IRQ_STATUS 0x610
#define SE_GENI_M_IRQ_EN 0x614
#define SE_GENI_M_IRQ_CLEAR 0x618
+#define SE_GENI_M_IRQ_EN_SET 0x61c
+#define SE_GENI_M_IRQ_EN_CLEAR 0x620
#define SE_GENI_S_CMD0 0x630
#define SE_GENI_S_CMD_CTRL_REG 0x634
#define SE_GENI_S_IRQ_STATUS 0x640
#define SE_GENI_S_IRQ_EN 0x644
#define SE_GENI_S_IRQ_CLEAR 0x648
+#define SE_GENI_S_IRQ_EN_SET 0x64c
+#define SE_GENI_S_IRQ_EN_CLEAR 0x650
#define SE_GENI_TX_FIFOn 0x700
#define SE_GENI_RX_FIFOn 0x780
#define SE_GENI_TX_FIFO_STATUS 0x800
@@ -101,6 +105,8 @@ struct geni_se {
#define SE_GENI_RX_WATERMARK_REG 0x810
#define SE_GENI_RX_RFR_WATERMARK_REG 0x814
#define SE_GENI_IOS 0x908
+#define SE_GENI_M_GP_LENGTH 0x910
+#define SE_GENI_S_GP_LENGTH 0x914
#define SE_DMA_TX_IRQ_STAT 0xc40
#define SE_DMA_TX_IRQ_CLR 0xc44
#define SE_DMA_TX_FSM_RST 0xc58
@@ -234,6 +240,9 @@ struct geni_se {
#define IO2_DATA_IN BIT(1)
#define RX_DATA_IN BIT(0)

+/* SE_GENI_M_GP_LENGTH and SE_GENI_S_GP_LENGTH fields */
+#define GP_LENGTH GENMASK(31, 0)
+
/* SE_DMA_TX_IRQ_STAT Register fields */
#define TX_DMA_DONE BIT(0)
#define TX_EOT BIT(1)
--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:26:16

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()

The current uart_fifo_timeout() returns jiffies, which is not always
the most convenient for callers. Add a variant uart_fifo_timeout_ms()
that returns the timeout in milliseconds.

NOTES:
- msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
because msecs_to_jiffies() is actually intended for device drivers
to calculate timeout value. This means we don't need to take the max
of the timeout and "1" since the timeout will always be > 0 ms (we
add 20 ms of slop).
- uart_fifo_timeout_ms() returns "unsigned int" but we leave
uart_fifo_timeout() returning "unsigned long". This matches the
types of msecs_to_jiffies().

Suggested-by: Ilpo Järvinen <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v4:
- New

include/linux/serial_core.h | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 8cb65f50e830..97968acfd564 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
/*
* Calculates FIFO drain time.
*/
-static inline unsigned long uart_fifo_timeout(struct uart_port *port)
+static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
{
u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
+ unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);

- /* Add .02 seconds of slop */
- fifo_timeout += 20 * NSEC_PER_MSEC;
+ /*
+ * Add .02 seconds of slop. This also helps account for the fact that
+ * when we converted from ns to ms that we didn't round up.
+ */
+ return fifo_timeout_ms + 20;
+}

- return max(nsecs_to_jiffies(fifo_timeout), 1UL);
+static inline unsigned long uart_fifo_timeout(struct uart_port *port)
+{
+ return msecs_to_jiffies(uart_fifo_timeout_ms(port));
}

/* Base timer interval for polling */
--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:27:01

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 5/8] serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield()

With a small modification the qcom_geni_serial_poll_bit() function
could be used to poll more than just a single bit. Let's generalize
it. We'll make the qcom_geni_serial_poll_bit() into just a wrapper of
the general function.

Signed-off-by: Douglas Anderson <[email protected]>
---
The new function isn't used yet (except by the wrapper) but will be
used in a future change.

(no changes since v2)

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index db933a1549ad..bd03b998ed04 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -264,8 +264,8 @@ static bool qcom_geni_serial_secondary_active(struct uart_port *uport)
return readl(uport->membase + SE_GENI_STATUS) & S_GENI_CMD_ACTIVE;
}

-static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
- unsigned int offset, u32 field, bool set)
+static bool qcom_geni_serial_poll_bitfield(struct uart_port *uport,
+ unsigned int offset, u32 field, u32 val)
{
u32 reg;
unsigned long timeout_us;
@@ -294,7 +294,7 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
*/
while (timeout_us) {
reg = readl(uport->membase + offset);
- if ((bool)(reg & field) == set)
+ if ((reg & field) == val)
return true;
udelay(10);
timeout_us -= 10;
@@ -302,6 +302,12 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
return false;
}

+static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
+ unsigned int offset, u32 field, bool set)
+{
+ return qcom_geni_serial_poll_bitfield(uport, offset, field, set ? field : 0);
+}
+
static void qcom_geni_serial_setup_tx(struct uart_port *uport, u32 xmit_size)
{
u32 m_cmd;
--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:27:03

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 4/8] serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit()

The "offset" passed in should be unsigned since it's always a positive
offset from our memory mapped IO.

The "field" should be u32 since we're anding it with a 32-bit value
read from the device.

Suggested-by: Stephen Boyd <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---

(no changes since v2)

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index e44edf63db78..db933a1549ad 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -265,7 +265,7 @@ static bool qcom_geni_serial_secondary_active(struct uart_port *uport)
}

static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
- int offset, int field, bool set)
+ unsigned int offset, u32 field, bool set)
{
u32 reg;
unsigned long timeout_us;
--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:27:27

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 6/8] serial: qcom-geni: Just set the watermark level once

There's no reason to set the TX watermark level to 0 when we disable
TX since we're disabling the interrupt anyway. Just set the watermark
level once at init time and leave it alone.

Signed-off-by: Douglas Anderson <[email protected]>
---

(no changes since v2)

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index bd03b998ed04..132669a2da34 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -391,7 +391,6 @@ static int qcom_geni_serial_get_char(struct uart_port *uport)
static void qcom_geni_serial_poll_put_char(struct uart_port *uport,
unsigned char c)
{
- writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_setup_tx(uport, 1);
WARN_ON(!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
M_TX_FIFO_WATERMARK_EN, true));
@@ -435,7 +434,6 @@ __qcom_geni_serial_console_write(struct uart_port *uport, const char *s,
bytes_to_send++;
}

- writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_setup_tx(uport, bytes_to_send);
for (i = 0; i < count; ) {
size_t chars_to_write = 0;
@@ -663,7 +661,6 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;

- writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}

@@ -674,7 +671,6 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)

irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
- writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
/* Possible stop tx is called multiple times. */
if (!qcom_geni_serial_main_active(uport))
@@ -1126,6 +1122,7 @@ static int qcom_geni_serial_port_setup(struct uart_port *uport)
false, true, true);
geni_se_init(&port->se, UART_RX_WM, port->rx_fifo_depth - 2);
geni_se_select_mode(&port->se, port->dev_data->mode);
+ writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_start_rx(uport);
port->setup = true;

--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:27:35

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 3/8] serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit()

The qcom_geni_serial_poll_bit() is supposed to be able to be used to
poll a bit that's will become set when a TX transfer finishes. Because
of this it tries to set its timeout based on how long the UART will
take to shift out all of the queued bytes. There are two problems
here:
1. There appears to be a hidden extra word on the firmware side which
is the word that the firmware has already taken out of the FIFO and
is currently shifting out. We need to account for this.
2. The timeout calculation was assuming that it would only need 8 bits
on the wire to shift out 1 byte. This isn't true. Typically 10 bits
are used (8 data bits, 1 start and 1 stop bit), but as much as 13
bits could be used (14 if we allowed 9 bits per byte, which we
don't).

The too-short timeout was seen causing problems in a future patch
which more properly waited for bytes to transfer out of the UART
before cancelling.

Rather than fix the calculation, replace it with the core-provided
uart_fifo_timeout() function.

NOTE: during earlycon, uart_fifo_timeout() has the same limitations
about not being able to figure out the exact timeout that the old
function did. Luckily uart_fifo_timeout() returns the same default
timeout of 20ms in this case. We'll add a comment about it, though, to
make it more obvious what's happening.

Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Suggested-by: Ilpo Järvinen <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v4:
- Use uart_fifo_timeout_ms() for timeout.

Changes in v3:
- Use uart_fifo_timeout() for timeout.

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 38 +++++++++++++--------------
1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..e44edf63db78 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -124,7 +124,6 @@ struct qcom_geni_serial_port {
dma_addr_t tx_dma_addr;
dma_addr_t rx_dma_addr;
bool setup;
- unsigned int baud;
unsigned long clk_rate;
void *rx_buf;
u32 loopback;
@@ -269,30 +268,30 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
int offset, int field, bool set)
{
u32 reg;
- struct qcom_geni_serial_port *port;
- unsigned int baud;
- unsigned int fifo_bits;
- unsigned long timeout_us = 20000;
- struct qcom_geni_private_data *private_data = uport->private_data;
+ unsigned long timeout_us;

- if (private_data->drv) {
- port = to_dev_port(uport);
- baud = port->baud;
- if (!baud)
- baud = 115200;
- fifo_bits = port->tx_fifo_depth * port->tx_fifo_width;
- /*
- * Total polling iterations based on FIFO worth of bytes to be
- * sent at current baud. Add a little fluff to the wait.
- */
- timeout_us = ((fifo_bits * USEC_PER_SEC) / baud) + 500;
- }
+ /*
+ * This function is used to poll bits, some of which (like CMD_DONE)
+ * might take as long as it takes for the FIFO plus the temp register
+ * on the geni side to drain. The Linux core calculates such a timeout
+ * for us and we can get it from uart_fifo_timeout().
+ *
+ * It should be noted that during earlycon the variables that
+ * uart_fifo_timeout() makes use of in "uport" may not be setup yet.
+ * It's difficult to set things up for earlycon since it can't
+ * necessarily figure out the baud rate and reading the FIFO depth
+ * from the wrapper means some extra MMIO maps that we don't get by
+ * default. This isn't a big problem, though, since uart_fifo_timeout()
+ * gives back its "slop" of 20ms as a minimum and that should be
+ * plenty of time for earlycon unless we're running at an extremely
+ * low baud rate.
+ */
+ timeout_us = uart_fifo_timeout_ms(uport) * USEC_PER_MSEC;

/*
* Use custom implementation instead of readl_poll_atomic since ktimer
* is not ready at the time of early console.
*/
- timeout_us = DIV_ROUND_UP(timeout_us, 10) * 10;
while (timeout_us) {
reg = readl(uport->membase + offset);
if ((bool)(reg & field) == set)
@@ -1224,7 +1223,6 @@ static void qcom_geni_serial_set_termios(struct uart_port *uport,
qcom_geni_serial_stop_rx(uport);
/* baud rate */
baud = uart_get_baud_rate(uport, termios, old, 300, 4000000);
- port->baud = baud;

sampling_rate = UART_OVERSAMPLING;
/* Sampling rate is halved for IP versions >= 2.5 */
--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:27:41

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 7/8] serial: qcom-geni: Fix suspend while active UART xfer

On devices using Qualcomm's GENI UART it is possible to get the UART
stuck such that it no longer outputs data. Specifically, logging in
via an agetty on the debug serial port (which was _not_ used for
kernel console) and running:
cat /var/log/messages
...and then (via an SSH session) forcing a few suspend/resume cycles
causes the UART to stop transmitting.

The root of the problems was with qcom_geni_serial_stop_tx_fifo()
which is called as part of the suspend process. Specific problems with
that function:
- When an in-progress "tx" command is cancelled it doesn't appear to
fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
continued to report that the FIFO wasn't empty. The
qcom_geni_serial_start_tx_fifo() function didn't re-enable
interrupts in this case so the driver would never start transferring
again.
- When the driver cancelled the current "tx" command but it forgot to
zero out "tx_remaining". This confused logic elsewhere in the
driver.
- From experimentation, it appears that cancelling the "tx" command
could drop some of the queued up bytes.

While qcom_geni_serial_stop_tx_fifo() could be fixed to drain the FIFO
and shut things down properly, stop_tx() isn't supposed to be a slow
function. It is run with local interrupts off and is documented to
stop transmitting "as soon as possible". Change the function to just
stop new bytes from being queued. In order to make this work, change
qcom_geni_serial_start_tx_fifo() to remove some conditions. It's
always safe to enable the watermark interrupt and the IRQ handler will
disable it if it's not needed.

For system suspend the queue still needs to be drained. Failure to do
so means that the hardware won't provide new interrupts until a
"cancel" command is sent. Add draining logic (fixing the issues noted
above) at suspend time.

NOTE: It would be ideal if qcom_geni_serial_stop_tx_fifo() could
"pause" the transmitter right away. There is no obvious way to do this
in the docs and experimentation didn't find any tricks either, so
stopping TX "as soon as possible" isn't very soon but is the best
possible.

Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <[email protected]>
---
There are still a number of problems with GENI UART after this but
I've kept this change separate to make it easier to understand.
Specifically on mainline just hitting "Ctrl-C" after dumping
/var/log/messages to the serial port hangs things after the kfifo
changes. Those issues will be addressed in future patches.

It should also be noted that the "Fixes" tag here is a bit of a
swag. I haven't gone and tested on ancient code, but at least the
problems exist on kernel 5.15 and much of the code touched here has
been here since the beginning, or at least since as long as the driver
was stable.

Changes in v4:
- Fix indentation.
- GENMASK(31, 0) -> GP_LENGTH.

Changes in v3:
- 0xffffffff => GENMASK(31, 0)
- Reword commit message.

Changes in v2:
- Totally rework / rename patch to handle suspend while active xfer

drivers/tty/serial/qcom_geni_serial.c | 97 +++++++++++++++++++++------
1 file changed, 75 insertions(+), 22 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 132669a2da34..1a66424f0f5f 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -130,6 +130,7 @@ struct qcom_geni_serial_port {
bool brk;

unsigned int tx_remaining;
+ unsigned int tx_total;
int wakeup_irq;
bool rx_tx_swap;
bool cts_rts_swap;
@@ -310,11 +311,14 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,

static void qcom_geni_serial_setup_tx(struct uart_port *uport, u32 xmit_size)
{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
u32 m_cmd;

writel(xmit_size, uport->membase + SE_UART_TX_TRANS_LEN);
m_cmd = UART_START_TX << M_OPCODE_SHFT;
writel(m_cmd, uport->membase + SE_GENI_M_CMD0);
+
+ port->tx_total = xmit_size;
}

static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
@@ -334,6 +338,64 @@ static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
writel(irq_clear, uport->membase + SE_GENI_M_IRQ_CLEAR);
}

+static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
+ /*
+ * If the main sequencer is inactive it means that the TX command has
+ * been completed and all bytes have been sent. Nothing to do in that
+ * case.
+ */
+ if (!qcom_geni_serial_main_active(uport))
+ return;
+
+ /*
+ * Wait until the FIFO has been drained. We've already taken bytes out
+ * of the higher level queue in qcom_geni_serial_send_chunk_fifo() so
+ * if we don't drain the FIFO but send the "cancel" below they seem to
+ * get lost.
+ */
+ qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, GP_LENGTH,
+ port->tx_total - port->tx_remaining);
+
+ /*
+ * If clearing the FIFO made us inactive then we're done--no need for
+ * a cancel.
+ */
+ if (!qcom_geni_serial_main_active(uport))
+ return;
+
+ /*
+ * Cancel the current command. After this the main sequencer will
+ * stop reporting that it's active and we'll have to start a new
+ * transfer command.
+ *
+ * If we skip doing this cancel and then continue with a system
+ * suspend while there's an active command in the main sequencer
+ * then after resume time we won't get any more interrupts on the
+ * main sequencer until we send the cancel.
+ */
+ geni_se_cancel_m_cmd(&port->se);
+ if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
+ M_CMD_CANCEL_EN, true)) {
+ /* The cancel failed; try an abort as a fallback. */
+ geni_se_abort_m_cmd(&port->se);
+ qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
+ M_CMD_ABORT_EN, true);
+ writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ }
+ writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ /*
+ * We've cancelled the current command. "tx_remaining" stores how
+ * many bytes are left to finish in the current command so we know
+ * when to start a new command. Since the command was cancelled we
+ * need to zero "tx_remaining".
+ */
+ port->tx_remaining = 0;
+}
+
static void qcom_geni_serial_abort_rx(struct uart_port *uport)
{
u32 irq_clear = S_CMD_DONE_EN | S_CMD_ABORT_EN;
@@ -654,37 +716,18 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
u32 irq_en;

- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
-
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}

static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);

irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
- if (!qcom_geni_serial_main_active(uport))
- return;
-
- geni_se_cancel_m_cmd(&port->se);
- if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
- M_CMD_CANCEL_EN, true)) {
- geni_se_abort_m_cmd(&port->se);
- qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
- M_CMD_ABORT_EN, true);
- writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
- }
- writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}

static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1066,7 +1109,15 @@ static int setup_fifos(struct qcom_geni_serial_port *port)
}


-static void qcom_geni_serial_shutdown(struct uart_port *uport)
+static void qcom_geni_serial_shutdown_dma(struct uart_port *uport)
+{
+ disable_irq(uport->irq);
+
+ qcom_geni_serial_stop_tx(uport);
+ qcom_geni_serial_stop_rx(uport);
+}
+
+static void qcom_geni_serial_shutdown_fifo(struct uart_port *uport)
{
disable_irq(uport->irq);

@@ -1075,6 +1126,8 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)

qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_drain_tx_fifo(uport);
}

static int qcom_geni_serial_port_setup(struct uart_port *uport)
@@ -1532,7 +1585,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.startup = qcom_geni_serial_startup,
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
- .shutdown = qcom_geni_serial_shutdown,
+ .shutdown = qcom_geni_serial_shutdown_fifo,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
@@ -1554,7 +1607,7 @@ static const struct uart_ops qcom_geni_uart_pops = {
.startup = qcom_geni_serial_startup,
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
- .shutdown = qcom_geni_serial_shutdown,
+ .shutdown = qcom_geni_serial_shutdown_dma,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
--
2.45.2.505.gda0bf45e8d-goog


2024-06-10 22:27:44

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v4 8/8] serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

The fact that the Qualcomm GENI hardware interface is based around
"packets" is really awkward to fit into Linux's UART design.
Specifically, in order to send bytes you need to start up a new
"command" saying how many bytes you want to send and then you need to
send all those bytes. Once you've committed to sending that number of
bytes it's very awkward to change your mind and send fewer, especially
if you want to do so without dropping bytes on the ground.

There may be a few cases where you might want to send fewer bytes than
you originally expected:
1. You might want to interrupt the transfer with something higher
priority, like the kernel console or kdb.
2. You might want to enter system suspend.
3. The user might have killed the program that had queued bytes for
sending over the UART.

Despite this awkwardness the Linux driver has still tried to send
bytes using large transfers. Whenever the driver started a new
transfer it would look at the number of bytes in the OS's queue and
start a transfer for that many. The idea of using larger transfers is
that it should be more efficient. When you're in the middle of a large
transfer you can get interrupted when the hardware FIFO is close to
empty and add more bytes in. Whenever you get to the end of a transfer
you have to wait until the transfer is totally done before you can add
more bytes and, depending on interrupt latency, that can cause the
UART to idle a bit.

Unfortunately there were lots of corner cases that the Linux driver
didn't handle.

One problem with the current driver is that if the user killed the
program that queued bytes for sending over the UART then bad things
would happen. Before commit 1788cf6a91d9 ("tty: serial: switch from
circ_buf to kfifo") we'd just send stale data out the UART. After that
commit we'll hard lockup.

Another problem with the current driver can be seen if you queue a
bunch of data to the UART and enter kdb. Specifically on a device
_without_ kernel console on the UART, with an agetty on the UART, and
with kgdb on the UART, doing `cat /var/log/messages` and then dropping
into kdb and resuming caused console output to stop.

Give up on trying to use large transfers in FIFO mode on GENI UART
since there doesn't appear to be any way to solve these problems
cleanly. Visually inspecting the console output even after these
patches doesn't show any big pauses.

In order to make this all work:
- Switch the watermark interrupt to just being used to prime the TX
pump. Once transfers are running, use "done" to queue the next
batch. As part of this, change the watermark to fire whenever the
queue is empty.
- Never queue more than what can fit in the FIFO. This means we don't
need to keep track of a command we're partway through.
- For the console code and kgdb code where we can safely block while
the queue empties, just do that rather than trying to queue a
command when one was already in progress (which didn't work so well
and is why there were some weird/awkward hacks in
qcom_geni_serial_console_write()).
- Leave the CMD_DONE interrupt enabled all the time since there's
never any reason we don't want to see it.
- Start using the "SE_GENI_M_IRQ_EN_SET" and "SE_GENI_M_IRQ_EN_CLEAR"
registers to avoid read-modify-write of the "SE_GENI_M_IRQ_EN"
register. This could be done in more of the driver if needed but for
now just update code that's touched.

Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Signed-off-by: Douglas Anderson <[email protected]>
---
I'm listing two "fixes" commits here. The first is the kfifo change
since it is very easy to see a hardlockup after that change. Almost
certainly anyone with the kfifo patch wants this patch. I've also
listed a much earlier patch as one being fixed since that was the one
that made us send larger transfers.

I've tested this commit on an sc7180-trogdor board both with and
without kernel console going to the UART. I've tested across some
suspend/resume cycles and with kgdb. I've also confirmed that
bluetooth, which uses the DMA paths in this driver, continues to work.
That all being said, a lot of things change here so I'd love any
testing folks want to do.

I'm not explicitly CCing stable here. The only truly terrible problem
is the hardlockup introduced by the kfifo change. The rest of the
issue have been around for years. If someone wants the fixes ported
back to stable that's fine but IMO unless you're seeing problems it's
not 100% required.

(no changes since v3)

Changes in v3:
- Reword commit message.

Changes in v2:
- New

drivers/tty/serial/qcom_geni_serial.c | 192 +++++++++++++-------------
1 file changed, 94 insertions(+), 98 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 1a66424f0f5f..9d71296eae11 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -78,7 +78,7 @@
#define GENI_UART_CONS_PORTS 1
#define GENI_UART_PORTS 3
#define DEF_FIFO_DEPTH_WORDS 16
-#define DEF_TX_WM 2
+#define DEF_TX_WM 1
#define DEF_FIFO_WIDTH_BITS 32
#define UART_RX_WM 2

@@ -128,8 +128,8 @@ struct qcom_geni_serial_port {
void *rx_buf;
u32 loopback;
bool brk;
+ bool tx_fifo_stopped;

- unsigned int tx_remaining;
unsigned int tx_total;
int wakeup_irq;
bool rx_tx_swap;
@@ -336,6 +336,14 @@ static void qcom_geni_serial_poll_tx_done(struct uart_port *uport)
M_CMD_ABORT_EN, true);
}
writel(irq_clear, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ /*
+ * Re-enable the TX watermark interrupt when we clear the "done"
+ * in case we were waiting on the "done" bit before starting a new
+ * command. The interrupt routine will re-disable this if it's not
+ * appropriate.
+ */
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_EN_SET);
}

static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
@@ -357,7 +365,7 @@ static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
* get lost.
*/
qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, GP_LENGTH,
- port->tx_total - port->tx_remaining);
+ port->tx_total);

/*
* If clearing the FIFO made us inactive then we're done--no need for
@@ -386,14 +394,6 @@ static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
-
- /*
- * We've cancelled the current command. "tx_remaining" stores how
- * many bytes are left to finish in the current command so we know
- * when to start a new command. Since the command was cancelled we
- * need to zero "tx_remaining".
- */
- port->tx_remaining = 0;
}

static void qcom_geni_serial_abort_rx(struct uart_port *uport)
@@ -453,11 +453,12 @@ static int qcom_geni_serial_get_char(struct uart_port *uport)
static void qcom_geni_serial_poll_put_char(struct uart_port *uport,
unsigned char c)
{
+ qcom_geni_serial_drain_tx_fifo(uport);
+
qcom_geni_serial_setup_tx(uport, 1);
WARN_ON(!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
M_TX_FIFO_WATERMARK_EN, true));
writel(c, uport->membase + SE_GENI_TX_FIFOn);
- writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
qcom_geni_serial_poll_tx_done(uport);
}
#endif
@@ -487,6 +488,8 @@ __qcom_geni_serial_console_write(struct uart_port *uport, const char *s,
int i;
u32 bytes_to_send = count;

+ qcom_geni_serial_drain_tx_fifo(uport);
+
for (i = 0; i < count; i++) {
/*
* uart_console_write() adds a carriage return for each newline.
@@ -537,7 +540,6 @@ static void qcom_geni_serial_console_write(struct console *co, const char *s,
bool locked = true;
unsigned long flags;
u32 geni_status;
- u32 irq_en;

WARN_ON(co->index < 0 || co->index >= GENI_UART_CONS_PORTS);

@@ -553,38 +555,10 @@ static void qcom_geni_serial_console_write(struct console *co, const char *s,

geni_status = readl(uport->membase + SE_GENI_STATUS);

- if (!locked) {
- /*
- * We can only get here if an oops is in progress then we were
- * unable to get the lock. This means we can't safely access
- * our state variables like tx_remaining. About the best we
- * can do is wait for the FIFO to be empty before we start our
- * transfer, so we'll do that.
- */
- qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
- M_TX_FIFO_NOT_EMPTY_EN, false);
- } else if ((geni_status & M_GENI_CMD_ACTIVE) && !port->tx_remaining) {
- /*
- * It seems we can't interrupt existing transfers if all data
- * has been sent, in which case we need to look for done first.
- */
- qcom_geni_serial_poll_tx_done(uport);
-
- if (!kfifo_is_empty(&uport->state->port.xmit_fifo)) {
- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- writel(irq_en | M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_EN);
- }
- }
-
__qcom_geni_serial_console_write(uport, s, count);

-
- if (locked) {
- if (port->tx_remaining)
- qcom_geni_serial_setup_tx(uport, port->tx_remaining);
+ if (locked)
uart_port_unlock_irqrestore(uport, flags);
- }
}

static void handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
@@ -661,9 +635,9 @@ static void qcom_geni_serial_stop_tx_dma(struct uart_port *uport)

if (port->tx_dma_addr) {
geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr,
- port->tx_remaining);
+ port->tx_total);
port->tx_dma_addr = 0;
- port->tx_remaining = 0;
+ port->tx_total = 0;
}

geni_se_cancel_m_cmd(&port->se);
@@ -708,26 +682,27 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
qcom_geni_serial_stop_tx_dma(uport);
return;
}
-
- port->tx_remaining = xmit_size;
}

static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
- u32 irq_en;
+ struct qcom_geni_serial_port *port = to_dev_port(uport);

- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
- writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
+ port->tx_fifo_stopped = false;
+
+ /* Prime the pump to get data flowing. */
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_EN_SET);
}

static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
- u32 irq_en;
+ struct qcom_geni_serial_port *port = to_dev_port(uport);

- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
- writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
+ /*
+ * We can't do anything to safely pause the bytes that have already
+ * been queued up so just set a flag saying we shouldn't queue any more.
+ */
+ port->tx_fifo_stopped = true;
}

static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -895,10 +870,20 @@ static void qcom_geni_serial_stop_tx(struct uart_port *uport)
uport->ops->stop_tx(uport);
}

+static void qcom_geni_serial_enable_cmd_done(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
+ /* If we're not in FIFO mode we don't use CMD_DONE. */
+ if (port->dev_data->mode != GENI_SE_FIFO)
+ return;
+
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_EN_SET);
+}
+
static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
unsigned int chunk)
{
- struct qcom_geni_serial_port *port = to_dev_port(uport);
unsigned int tx_bytes, remaining = chunk;
u8 buf[BYTES_PER_FIFO_WORD];

@@ -911,52 +896,74 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);

remaining -= tx_bytes;
- port->tx_remaining -= tx_bytes;
}
}

-static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
- bool done, bool active)
+static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
struct tty_port *tport = &uport->state->port;
size_t avail;
size_t pending;
u32 status;
- u32 irq_en;
unsigned int chunk;
+ bool active;

- status = readl(uport->membase + SE_GENI_TX_FIFO_STATUS);
-
- /* Complete the current tx command before taking newly added data */
- if (active)
- pending = port->tx_remaining;
- else
- pending = kfifo_len(&tport->xmit_fifo);
+ /*
+ * The TX watermark interrupt is only used to "prime the pump" for
+ * transfers. Once transfers have been kicked off we always use the
+ * "done" interrupt to queue the next batch. Once were here we can
+ * always disable the TX watermark interrupt.
+ *
+ * NOTE: we use the TX watermark in this way because we don't ever
+ * kick off TX transfers larger than we can stuff into the FIFO. This
+ * is because bytes from the OS's circular queue can disappear and
+ * there's no known safe/non-blocking way to cancel the larger
+ * transfer when bytes disappear. See qcom_geni_serial_drain_tx_fifo()
+ * for an example of a safe (but blocking) way to drain, but that's
+ * not appropriate in an IRQ handler. We also can't just kick off one
+ * large transfer and queue bytes whenever because we're using 4 bytes
+ * per FIFO word and thus we can only queue non-multiple-of-4 bytes as
+ * in the last word of a transfer.
+ */
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_EN_CLEAR);

- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
- qcom_geni_serial_stop_tx_fifo(uport);
+ /*
+ * If we've got an active TX command running then we expect to still
+ * see the "done" bit in the future and we can't kick off another
+ * transfer till then. Bail. NOTE: it's important that we read "active"
+ * after we've cleared the "done" interrupt (which the caller already
+ * did for us) so that we know that if we show as non-active we're
+ * guaranteed to later get "done".
+ *
+ * If nothing is pending we _also_ want to bail. Later start_tx()
+ * will start transfers again by temporarily turning on the TX
+ * watermark.
+ */
+ active = readl(uport->membase + SE_GENI_STATUS) & M_GENI_CMD_ACTIVE;
+ pending = port->tx_fifo_stopped ? 0 : kfifo_len(&tport->xmit_fifo);
+ if (active || !pending)
goto out_write_wakeup;
- }

+ /* Calculate how much space is available in the FIFO right now. */
+ status = readl(uport->membase + SE_GENI_TX_FIFO_STATUS);
avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
avail *= BYTES_PER_FIFO_WORD;

- chunk = min(avail, pending);
- if (!chunk)
+ /*
+ * It's a bit odd if we get here and have bytes pending and we're
+ * handling a "done" or "TX watermark" interrupt but we don't
+ * have space in the FIFO. Stick in a warning and bail.
+ */
+ if (!avail) {
+ dev_warn(uport->dev, "FIFO unexpectedly out of space\n");
goto out_write_wakeup;
-
- if (!port->tx_remaining) {
- qcom_geni_serial_setup_tx(uport, pending);
- port->tx_remaining = pending;
-
- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- if (!(irq_en & M_TX_FIFO_WATERMARK_EN))
- writel(irq_en | M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_EN);
}

+
+ /* We're ready to throw some bytes into the FIFO. */
+ chunk = min(avail, pending);
+ qcom_geni_serial_setup_tx(uport, chunk);
qcom_geni_serial_send_chunk_fifo(uport, chunk);

/*
@@ -964,17 +971,9 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
* cleared it in qcom_geni_serial_isr it will have already reasserted
* so we must clear it again here after our writes.
*/
- writel(M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_CLEAR);
+ writel(M_TX_FIFO_WATERMARK_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);

out_write_wakeup:
- if (!port->tx_remaining) {
- irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
- if (irq_en & M_TX_FIFO_WATERMARK_EN)
- writel(irq_en & ~M_TX_FIFO_WATERMARK_EN,
- uport->membase + SE_GENI_M_IRQ_EN);
- }
-
if (kfifo_len(&tport->xmit_fifo) < WAKEUP_CHARS)
uart_write_wakeup(uport);
}
@@ -984,10 +983,10 @@ static void qcom_geni_serial_handle_tx_dma(struct uart_port *uport)
struct qcom_geni_serial_port *port = to_dev_port(uport);
struct tty_port *tport = &uport->state->port;

- uart_xmit_advance(uport, port->tx_remaining);
- geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr, port->tx_remaining);
+ uart_xmit_advance(uport, port->tx_total);
+ geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr, port->tx_total);
port->tx_dma_addr = 0;
- port->tx_remaining = 0;
+ port->tx_total = 0;

if (!kfifo_is_empty(&tport->xmit_fifo))
qcom_geni_serial_start_tx_dma(uport);
@@ -1001,7 +1000,6 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
u32 m_irq_en;
u32 m_irq_status;
u32 s_irq_status;
- u32 geni_status;
u32 dma;
u32 dma_tx_status;
u32 dma_rx_status;
@@ -1019,7 +1017,6 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
s_irq_status = readl(uport->membase + SE_GENI_S_IRQ_STATUS);
dma_tx_status = readl(uport->membase + SE_DMA_TX_IRQ_STAT);
dma_rx_status = readl(uport->membase + SE_DMA_RX_IRQ_STAT);
- geni_status = readl(uport->membase + SE_GENI_STATUS);
dma = readl(uport->membase + SE_GENI_DMA_MODE_EN);
m_irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
writel(m_irq_status, uport->membase + SE_GENI_M_IRQ_CLEAR);
@@ -1066,9 +1063,7 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
} else {
if (m_irq_status & m_irq_en &
(M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN))
- qcom_geni_serial_handle_tx_fifo(uport,
- m_irq_status & M_CMD_DONE_EN,
- geni_status & M_GENI_CMD_ACTIVE);
+ qcom_geni_serial_handle_tx_fifo(uport);

if (s_irq_status & (S_RX_FIFO_WATERMARK_EN | S_RX_FIFO_LAST_EN))
qcom_geni_serial_handle_rx_fifo(uport, drop_rx);
@@ -1176,6 +1171,7 @@ static int qcom_geni_serial_port_setup(struct uart_port *uport)
geni_se_init(&port->se, UART_RX_WM, port->rx_fifo_depth - 2);
geni_se_select_mode(&port->se, port->dev_data->mode);
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
+ qcom_geni_serial_enable_cmd_done(uport);
qcom_geni_serial_start_rx(uport);
port->setup = true;

--
2.45.2.505.gda0bf45e8d-goog


2024-06-12 11:53:27

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()

On Mon, 10 Jun 2024, Douglas Anderson wrote:

> The current uart_fifo_timeout() returns jiffies, which is not always
> the most convenient for callers. Add a variant uart_fifo_timeout_ms()
> that returns the timeout in milliseconds.
>
> NOTES:
> - msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
> because msecs_to_jiffies() is actually intended for device drivers
> to calculate timeout value. This means we don't need to take the max
> of the timeout and "1" since the timeout will always be > 0 ms (we
> add 20 ms of slop).
> - uart_fifo_timeout_ms() returns "unsigned int" but we leave
> uart_fifo_timeout() returning "unsigned long". This matches the
> types of msecs_to_jiffies().
>
> Suggested-by: Ilpo Järvinen <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> Changes in v4:
> - New
>
> include/linux/serial_core.h | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> index 8cb65f50e830..97968acfd564 100644
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
> /*
> * Calculates FIFO drain time.
> */
> -static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> +static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
> {
> u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
> + unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);
>
> - /* Add .02 seconds of slop */
> - fifo_timeout += 20 * NSEC_PER_MSEC;
> + /*
> + * Add .02 seconds of slop. This also helps account for the fact that
> + * when we converted from ns to ms that we didn't round up.
> + */
> + return fifo_timeout_ms + 20;
> +}
>
> - return max(nsecs_to_jiffies(fifo_timeout), 1UL);
> +static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> +{
> + return msecs_to_jiffies(uart_fifo_timeout_ms(port));
> }

Hi,

This is definitely towards the right direction! However, it now does
double conversion, first div_u64() and then msecs_to_jiffies(). Perhaps it
would be better to retain the nsecs version (maybe rename it to _ns for
consistency) and add _ms variant that does the nsec -> msec conversion.

--
i.

2024-06-12 19:16:51

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()

Hi,

On Wed, Jun 12, 2024 at 12:38 AM Ilpo Järvinen
<[email protected]> wrote:
>
> On Mon, 10 Jun 2024, Douglas Anderson wrote:
>
> > The current uart_fifo_timeout() returns jiffies, which is not always
> > the most convenient for callers. Add a variant uart_fifo_timeout_ms()
> > that returns the timeout in milliseconds.
> >
> > NOTES:
> > - msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
> > because msecs_to_jiffies() is actually intended for device drivers
> > to calculate timeout value. This means we don't need to take the max
> > of the timeout and "1" since the timeout will always be > 0 ms (we
> > add 20 ms of slop).
> > - uart_fifo_timeout_ms() returns "unsigned int" but we leave
> > uart_fifo_timeout() returning "unsigned long". This matches the
> > types of msecs_to_jiffies().
> >
> > Suggested-by: Ilpo Järvinen <[email protected]>
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
> >
> > Changes in v4:
> > - New
> >
> > include/linux/serial_core.h | 15 +++++++++++----
> > 1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> > index 8cb65f50e830..97968acfd564 100644
> > --- a/include/linux/serial_core.h
> > +++ b/include/linux/serial_core.h
> > @@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
> > /*
> > * Calculates FIFO drain time.
> > */
> > -static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > +static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
> > {
> > u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
> > + unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);
> >
> > - /* Add .02 seconds of slop */
> > - fifo_timeout += 20 * NSEC_PER_MSEC;
> > + /*
> > + * Add .02 seconds of slop. This also helps account for the fact that
> > + * when we converted from ns to ms that we didn't round up.
> > + */
> > + return fifo_timeout_ms + 20;
> > +}
> >
> > - return max(nsecs_to_jiffies(fifo_timeout), 1UL);
> > +static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > +{
> > + return msecs_to_jiffies(uart_fifo_timeout_ms(port));
> > }
>
> Hi,
>
> This is definitely towards the right direction! However, it now does
> double conversion, first div_u64() and then msecs_to_jiffies(). Perhaps it
> would be better to retain the nsecs version (maybe rename it to _ns for
> consistency) and add _ms variant that does the nsec -> msec conversion.

I spent a bit of time thinking about it and I don't agree. If you feel
very strongly about it or someone else wants to jump in and break the
tie then I can look again, but:

1. The comment before nsecs_to_jiffies() specifically states that it's
not supposed to be used for this purpose. Specifically, it says:

* Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
* And this doesn't return MAX_JIFFY_OFFSET since this function is designed
* for scheduler, not for use in device drivers to calculate timeout value.

...so switching away from nsecs_to_jiffies() to msecs_to_jiffies() is
arguably a "bugfix", or at least avoids using the API in a way that's
against the documentation.

2. As mentioned in the commit message, nsecs_to_jiffies() truncates
where msecs_to_jiffies() rounds up. Presumably this difference is
related to the comment above where the "ns" version is intended for
the scheduler. Using the "ms" version allows us to get rid of the
extra call to "max()" which is a benefit. Technically since the
timeout is at least 20 ms the minimum HZ is 100 I guess we didn't need
the max anyway, but I guess someone thought it was cleaner and now we
can definitely get rid of it.

3. These functions are inline anyway, so I don't think it's causing a
huge bloat of instructions. In fact, moving from 64-bit math to 32-bit
math sooner could make the code smaller.

4. I don't feel like it hurts the readability to convert down to ms
and then to jiffies. In fact, IMO it helps since it makes it more
obvious that we're working with ms.

2024-06-13 06:56:21

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()

On Wed, 12 Jun 2024, Doug Anderson wrote:
> On Wed, Jun 12, 2024 at 12:38 AM Ilpo Järvinen
> <[email protected]> wrote:
> >
> > On Mon, 10 Jun 2024, Douglas Anderson wrote:
> >
> > > The current uart_fifo_timeout() returns jiffies, which is not always
> > > the most convenient for callers. Add a variant uart_fifo_timeout_ms()
> > > that returns the timeout in milliseconds.
> > >
> > > NOTES:
> > > - msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
> > > because msecs_to_jiffies() is actually intended for device drivers
> > > to calculate timeout value. This means we don't need to take the max
> > > of the timeout and "1" since the timeout will always be > 0 ms (we
> > > add 20 ms of slop).
> > > - uart_fifo_timeout_ms() returns "unsigned int" but we leave
> > > uart_fifo_timeout() returning "unsigned long". This matches the
> > > types of msecs_to_jiffies().
> > >
> > > Suggested-by: Ilpo Järvinen <[email protected]>
> > > Signed-off-by: Douglas Anderson <[email protected]>
> > > ---
> > >
> > > Changes in v4:
> > > - New
> > >
> > > include/linux/serial_core.h | 15 +++++++++++----
> > > 1 file changed, 11 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> > > index 8cb65f50e830..97968acfd564 100644
> > > --- a/include/linux/serial_core.h
> > > +++ b/include/linux/serial_core.h
> > > @@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
> > > /*
> > > * Calculates FIFO drain time.
> > > */
> > > -static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > +static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
> > > {
> > > u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
> > > + unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);
> > >
> > > - /* Add .02 seconds of slop */
> > > - fifo_timeout += 20 * NSEC_PER_MSEC;
> > > + /*
> > > + * Add .02 seconds of slop. This also helps account for the fact that
> > > + * when we converted from ns to ms that we didn't round up.
> > > + */
> > > + return fifo_timeout_ms + 20;
> > > +}
> > >
> > > - return max(nsecs_to_jiffies(fifo_timeout), 1UL);
> > > +static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > +{
> > > + return msecs_to_jiffies(uart_fifo_timeout_ms(port));
> > > }
> >
> > Hi,
> >
> > This is definitely towards the right direction! However, it now does
> > double conversion, first div_u64() and then msecs_to_jiffies(). Perhaps it
> > would be better to retain the nsecs version (maybe rename it to _ns for
> > consistency) and add _ms variant that does the nsec -> msec conversion.
>
> I spent a bit of time thinking about it and I don't agree. If you feel
> very strongly about it or someone else wants to jump in and break the
> tie then I can look again, but:
>
> 1. The comment before nsecs_to_jiffies() specifically states that it's
> not supposed to be used for this purpose. Specifically, it says:
>
> * Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
> * And this doesn't return MAX_JIFFY_OFFSET since this function is designed
> * for scheduler, not for use in device drivers to calculate timeout value.
>
> ...so switching away from nsecs_to_jiffies() to msecs_to_jiffies() is
> arguably a "bugfix", or at least avoids using the API in a way that's
> against the documentation.

Okay, I see. However, there's no way around using u64 here even with your
version that does not use nsecs_to_jiffies() because nsecs is the most
useful form of input when starting from frame_time, usecs is a bit
coarse-grained for higher data rates.

> 2. As mentioned in the commit message, nsecs_to_jiffies() truncates
> where msecs_to_jiffies() rounds up. Presumably this difference is
> related to the comment above where the "ns" version is intended for
> the scheduler. Using the "ms" version allows us to get rid of the
> extra call to "max()" which is a benefit. Technically since the
> timeout is at least 20 ms the minimum HZ is 100 I guess we didn't need
> the max anyway, but I guess someone thought it was cleaner and now we
> can definitely get rid of it.
>
> 3. These functions are inline anyway, so I don't think it's causing a
> huge bloat of instructions. In fact, moving from 64-bit math to 32-bit
> math sooner could make the code smaller.
>
> 4. I don't feel like it hurts the readability to convert down to ms
> and then to jiffies. In fact, IMO it helps since it makes it more
> obvious that we're working with ms.

I'd be lying if I'd say I feel strongly about it but my only argument
involves doing an extra divide which is somewhat costly. It's a
plain 32-bit divide though so not as bad as the u64 one that is
unavoidable.


--
i.

2024-06-13 14:03:41

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()

Hi,

On Wed, Jun 12, 2024 at 11:56 PM Ilpo Järvinen
<[email protected]> wrote:
>
> On Wed, 12 Jun 2024, Doug Anderson wrote:
> > On Wed, Jun 12, 2024 at 12:38 AM Ilpo Järvinen
> > <[email protected]> wrote:
> > >
> > > On Mon, 10 Jun 2024, Douglas Anderson wrote:
> > >
> > > > The current uart_fifo_timeout() returns jiffies, which is not always
> > > > the most convenient for callers. Add a variant uart_fifo_timeout_ms()
> > > > that returns the timeout in milliseconds.
> > > >
> > > > NOTES:
> > > > - msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
> > > > because msecs_to_jiffies() is actually intended for device drivers
> > > > to calculate timeout value. This means we don't need to take the max
> > > > of the timeout and "1" since the timeout will always be > 0 ms (we
> > > > add 20 ms of slop).
> > > > - uart_fifo_timeout_ms() returns "unsigned int" but we leave
> > > > uart_fifo_timeout() returning "unsigned long". This matches the
> > > > types of msecs_to_jiffies().
> > > >
> > > > Suggested-by: Ilpo Järvinen <[email protected]>
> > > > Signed-off-by: Douglas Anderson <[email protected]>
> > > > ---
> > > >
> > > > Changes in v4:
> > > > - New
> > > >
> > > > include/linux/serial_core.h | 15 +++++++++++----
> > > > 1 file changed, 11 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> > > > index 8cb65f50e830..97968acfd564 100644
> > > > --- a/include/linux/serial_core.h
> > > > +++ b/include/linux/serial_core.h
> > > > @@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
> > > > /*
> > > > * Calculates FIFO drain time.
> > > > */
> > > > -static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > > +static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
> > > > {
> > > > u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
> > > > + unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);
> > > >
> > > > - /* Add .02 seconds of slop */
> > > > - fifo_timeout += 20 * NSEC_PER_MSEC;
> > > > + /*
> > > > + * Add .02 seconds of slop. This also helps account for the fact that
> > > > + * when we converted from ns to ms that we didn't round up.
> > > > + */
> > > > + return fifo_timeout_ms + 20;
> > > > +}
> > > >
> > > > - return max(nsecs_to_jiffies(fifo_timeout), 1UL);
> > > > +static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > > +{
> > > > + return msecs_to_jiffies(uart_fifo_timeout_ms(port));
> > > > }
> > >
> > > Hi,
> > >
> > > This is definitely towards the right direction! However, it now does
> > > double conversion, first div_u64() and then msecs_to_jiffies(). Perhaps it
> > > would be better to retain the nsecs version (maybe rename it to _ns for
> > > consistency) and add _ms variant that does the nsec -> msec conversion.
> >
> > I spent a bit of time thinking about it and I don't agree. If you feel
> > very strongly about it or someone else wants to jump in and break the
> > tie then I can look again, but:
> >
> > 1. The comment before nsecs_to_jiffies() specifically states that it's
> > not supposed to be used for this purpose. Specifically, it says:
> >
> > * Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
> > * And this doesn't return MAX_JIFFY_OFFSET since this function is designed
> > * for scheduler, not for use in device drivers to calculate timeout value.
> >
> > ...so switching away from nsecs_to_jiffies() to msecs_to_jiffies() is
> > arguably a "bugfix", or at least avoids using the API in a way that's
> > against the documentation.
>
> Okay, I see. However, there's no way around using u64 here even with your
> version that does not use nsecs_to_jiffies() because nsecs is the most
> useful form of input when starting from frame_time, usecs is a bit
> coarse-grained for higher data rates.

Right. We have to start with u64 because the frame time is in ns and
we can only fit ~4 seconds worth of ns in 32-bits. That seems iffy.


> > 2. As mentioned in the commit message, nsecs_to_jiffies() truncates
> > where msecs_to_jiffies() rounds up. Presumably this difference is
> > related to the comment above where the "ns" version is intended for
> > the scheduler. Using the "ms" version allows us to get rid of the
> > extra call to "max()" which is a benefit. Technically since the
> > timeout is at least 20 ms the minimum HZ is 100 I guess we didn't need
> > the max anyway, but I guess someone thought it was cleaner and now we
> > can definitely get rid of it.
> >
> > 3. These functions are inline anyway, so I don't think it's causing a
> > huge bloat of instructions. In fact, moving from 64-bit math to 32-bit
> > math sooner could make the code smaller.
> >
> > 4. I don't feel like it hurts the readability to convert down to ms
> > and then to jiffies. In fact, IMO it helps since it makes it more
> > obvious that we're working with ms.
>
> I'd be lying if I'd say I feel strongly about it

Fair enough. If someone wants to throw in an opinion and tiebreak then they can.


> but my only argument
> involves doing an extra divide which is somewhat costly. It's a
> plain 32-bit divide though so not as bad as the u64 one that is
> unavoidable.

We shouldn't be calling this in a loop anyway, so it's unlikely to
matter. In any case, I'd note that with the old code we had:

1. 64-bit multiply (time * fifosize)
2. 64-bit addition (result + 20ms)
3. 64-bit => 32-bit division (to jiffies)
4. 32-bit comparison against the value 1.
5. Conditional setting of the value to 1.

Now we have:

1. 64-bit multiply (time * fifosize)
2. 64-bit => 32-bit division (to ms)
3. 32-bit addition with a small immediate (20)
4. 32-bit addition (div round up) if HZ != 1000
5. 32-bit division (div round up) if HZ != 1000

I didn't try disassembling to see what the compiler did and it would
be different for each compiler / ISA / optimization level / value of
HZ, but I guess my point is that while we have one more divide (unless
HZ == 1000) we may have one less conditional. We're also tending to do
our math with small immediates which some ISAs can handle more
efficiently.

I think the real answer, though, is that this doesn't really matter
and that we should pick the solution that's cleaner/easier to
understand. I'm still in favor of the patch as it is. As I said, if
folks feel really strongly then it doesn't matter and I can change it,
but otherwise I'd rather keep it the way it is.


-Doug