Many serial drivers do the same thing:
* send x_char if set
* keep sending from the xmit circular buffer until either
- the loop reaches the end of the xmit buffer
- TX is stopped
- HW fifo is full
* check for pending characters and:
- wake up tty writers to fill for more data into xmit buffer
- stop TX if there is nothing in the xmit buffer
The only differences are:
* how to write the character to the HW fifo
* the check of the end condition:
- is the HW fifo full?
- is limit of the written characters reached?
So unify the above into two helpers:
* uart_port_tx_limit() -- the generic one, it performs the above taking
into account the written characters limit
* uart_port_tx() -- calls the above with ~0 as the limit. So it only
checks the HW fullness.
We need three more hooks in struct uart_ops for all this to work:
* tx_ready() -- returns true if HW can accept more data.
* put_char() -- write a character to the device.
* tx_done() -- when the write loop is done, perform arbitrary action
before potential invocation of ops->stop_tx() happens.
NOTE1: Maybe the three hooks in uart_ops above are overkill. We can
instead pass pointers to the three functions directly to the new helpers
as they are not used elsewhere. Similar to uart_console_write() and its
putchar().
NOTE2: These two new helper functions call the hooks per every character
processed. I was unable to measure any difference, provided most time is
spent by readb (or alike) in the hooks themselves. First, LTO might
help to eliminate these explicit calls (we might need NOTE1 to be
implemented for this to be true). Second, if this turns out to be a
problem, we can introduce a macro to build the helper in the driver's
code instead of serial_core. That is, similar to wait_event().
Signed-off-by: Jiri Slaby <[email protected]>
---
Documentation/driver-api/serial/driver.rst | 28 ++++++++++++
drivers/tty/serial/serial_core.c | 53 ++++++++++++++++++++++
include/linux/serial_core.h | 9 ++++
3 files changed, 90 insertions(+)
diff --git a/Documentation/driver-api/serial/driver.rst b/Documentation/driver-api/serial/driver.rst
index 06ec04ba086f..7dc3791addeb 100644
--- a/Documentation/driver-api/serial/driver.rst
+++ b/Documentation/driver-api/serial/driver.rst
@@ -80,6 +80,34 @@ hardware.
This call must not sleep
+ tx_ready(port)
+ The driver returns true if the HW can accept more data to be sent.
+
+ Locking: port->lock taken.
+
+ Interrupts: locally disabled.
+
+ This call must not sleep.
+
+ put_char(port, ch)
+ The driver is asked to write ch to the device.
+
+ Locking: port->lock taken.
+
+ Interrupts: locally disabled.
+
+ This call must not sleep.
+
+ tx_done(port)
+ When the write loop is done, the driver can perform arbitrary action
+ here before potential invocation of ops->stop_tx() happens.
+
+ Locking: port->lock taken.
+
+ Interrupts: locally disabled.
+
+ This call must not sleep.
+
set_mctrl(port, mctrl)
This function sets the modem control lines for port described
by 'port' to the state described by mctrl. The relevant bits
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 6a8963caf954..1be14e90066c 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -107,6 +107,59 @@ void uart_write_wakeup(struct uart_port *port)
}
EXPORT_SYMBOL(uart_write_wakeup);
+static bool uart_port_tx_always_ready(struct uart_port *port)
+{
+ return true;
+}
+
+/**
+ * uart_port_tx_limit -- transmit helper for uart_port
+ * @port: from which port to transmit
+ * @count: limit count
+ *
+ * uart_port_tx_limit() transmits characters from the xmit buffer to the
+ * hardware using @uart_port::ops::put_char(). It does so until @count
+ * characters are sent and while @uart_port::ops::tx_ready() still returns
+ * non-zero (if non-NULL).
+ *
+ * Return: number of characters in the xmit buffer when done.
+ */
+unsigned int uart_port_tx_limit(struct uart_port *port, unsigned int count)
+{
+ struct circ_buf *xmit = &port->state->xmit;
+ bool (*tx_ready)(struct uart_port *) = port->ops->tx_ready ? :
+ uart_port_tx_always_ready;
+ unsigned int pending;
+
+ for (; count && tx_ready(port); count--, port->icount.tx++) {
+ if (port->x_char) {
+ port->ops->put_char(port, port->x_char);
+ port->x_char = 0;
+ continue;
+ }
+
+ if (uart_circ_empty(xmit) || uart_tx_stopped(port))
+ break;
+
+ port->ops->put_char(port, xmit->buf[xmit->tail]);
+ xmit->tail = (xmit->tail + 1) % UART_XMIT_SIZE;
+ }
+
+ if (port->ops->tx_done)
+ port->ops->tx_done(port);
+
+ pending = uart_circ_chars_pending(xmit);
+ if (pending < WAKEUP_CHARS) {
+ uart_write_wakeup(port);
+
+ if (pending == 0)
+ port->ops->stop_tx(port);
+ }
+
+ return pending;
+}
+EXPORT_SYMBOL(uart_port_tx_limit);
+
static void uart_stop(struct tty_struct *tty)
{
struct uart_state *state = tty->driver_data;
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index d4828e69087a..c990a20b9989 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -37,6 +37,9 @@ struct gpio_desc;
*/
struct uart_ops {
unsigned int (*tx_empty)(struct uart_port *);
+ bool (*tx_ready)(struct uart_port *);
+ void (*put_char)(struct uart_port *, unsigned char ch);
+ void (*tx_done)(struct uart_port *);
void (*set_mctrl)(struct uart_port *, unsigned int mctrl);
unsigned int (*get_mctrl)(struct uart_port *);
void (*stop_tx)(struct uart_port *);
@@ -321,6 +324,12 @@ struct uart_driver {
};
void uart_write_wakeup(struct uart_port *port);
+unsigned int uart_port_tx_limit(struct uart_port *port, unsigned int count);
+
+static inline unsigned int uart_port_tx(struct uart_port *port)
+{
+ return uart_port_tx_limit(port, ~0U);
+}
/*
* Baud rate helpers.
--
2.35.1
On Mon, Apr 11, 2022 at 12:54:03PM +0200, Jiri Slaby wrote:
> Many serial drivers do the same thing:
> * send x_char if set
> * keep sending from the xmit circular buffer until either
> - the loop reaches the end of the xmit buffer
> - TX is stopped
> - HW fifo is full
> * check for pending characters and:
> - wake up tty writers to fill for more data into xmit buffer
> - stop TX if there is nothing in the xmit buffer
>
> The only differences are:
> * how to write the character to the HW fifo
> * the check of the end condition:
> - is the HW fifo full?
> - is limit of the written characters reached?
>
> So unify the above into two helpers:
> * uart_port_tx_limit() -- the generic one, it performs the above taking
> into account the written characters limit
> * uart_port_tx() -- calls the above with ~0 as the limit. So it only
> checks the HW fullness.
>
> We need three more hooks in struct uart_ops for all this to work:
> * tx_ready() -- returns true if HW can accept more data.
> * put_char() -- write a character to the device.
> * tx_done() -- when the write loop is done, perform arbitrary action
> before potential invocation of ops->stop_tx() happens.
>
> NOTE1: Maybe the three hooks in uart_ops above are overkill. We can
> instead pass pointers to the three functions directly to the new helpers
> as they are not used elsewhere. Similar to uart_console_write() and its
> putchar().
>
> NOTE2: These two new helper functions call the hooks per every character
> processed. I was unable to measure any difference, provided most time is
> spent by readb (or alike) in the hooks themselves. First, LTO might
> help to eliminate these explicit calls (we might need NOTE1 to be
> implemented for this to be true). Second, if this turns out to be a
> problem, we can introduce a macro to build the helper in the driver's
> code instead of serial_core. That is, similar to wait_event().
>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> Documentation/driver-api/serial/driver.rst | 28 ++++++++++++
> drivers/tty/serial/serial_core.c | 53 ++++++++++++++++++++++
> include/linux/serial_core.h | 9 ++++
> 3 files changed, 90 insertions(+)
>
> diff --git a/Documentation/driver-api/serial/driver.rst b/Documentation/driver-api/serial/driver.rst
> index 06ec04ba086f..7dc3791addeb 100644
> --- a/Documentation/driver-api/serial/driver.rst
> +++ b/Documentation/driver-api/serial/driver.rst
> @@ -80,6 +80,34 @@ hardware.
>
> This call must not sleep
>
> + tx_ready(port)
> + The driver returns true if the HW can accept more data to be sent.
> +
> + Locking: port->lock taken.
> +
> + Interrupts: locally disabled.
> +
> + This call must not sleep.
> +
> + put_char(port, ch)
> + The driver is asked to write ch to the device.
> +
> + Locking: port->lock taken.
> +
> + Interrupts: locally disabled.
> +
> + This call must not sleep.
> +
> + tx_done(port)
> + When the write loop is done, the driver can perform arbitrary action
> + here before potential invocation of ops->stop_tx() happens.
> +
> + Locking: port->lock taken.
> +
> + Interrupts: locally disabled.
> +
> + This call must not sleep.
> +
> set_mctrl(port, mctrl)
> This function sets the modem control lines for port described
> by 'port' to the state described by mctrl. The relevant bits
> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> index 6a8963caf954..1be14e90066c 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -107,6 +107,59 @@ void uart_write_wakeup(struct uart_port *port)
> }
> EXPORT_SYMBOL(uart_write_wakeup);
>
> +static bool uart_port_tx_always_ready(struct uart_port *port)
> +{
> + return true;
> +}
> +
> +/**
> + * uart_port_tx_limit -- transmit helper for uart_port
> + * @port: from which port to transmit
> + * @count: limit count
> + *
> + * uart_port_tx_limit() transmits characters from the xmit buffer to the
> + * hardware using @uart_port::ops::put_char(). It does so until @count
> + * characters are sent and while @uart_port::ops::tx_ready() still returns
> + * non-zero (if non-NULL).
> + *
> + * Return: number of characters in the xmit buffer when done.
> + */
> +unsigned int uart_port_tx_limit(struct uart_port *port, unsigned int count)
> +{
> + struct circ_buf *xmit = &port->state->xmit;
> + bool (*tx_ready)(struct uart_port *) = port->ops->tx_ready ? :
> + uart_port_tx_always_ready;
> + unsigned int pending;
> +
> + for (; count && tx_ready(port); count--, port->icount.tx++) {
> + if (port->x_char) {
> + port->ops->put_char(port, port->x_char);
> + port->x_char = 0;
> + continue;
> + }
> +
> + if (uart_circ_empty(xmit) || uart_tx_stopped(port))
> + break;
> +
> + port->ops->put_char(port, xmit->buf[xmit->tail]);
That's a lot of redirection and function pointer mess per each character
sent now. With the spectre overhead here (and only getting worse), this
feels like a step backwards.
I doubt throughput matters here given cpu speeds now, _but_ the cpu load
should go up.
Although on smaller cpus with slower Mhz and faster line rates, this
feels like a lot of extra work happening for no real good reason.
Any benchmarks?
thanks,
greg k-h
On Fri, Apr 15, 2022 at 09:47:26AM +0200, Jiri Slaby wrote:
> On 14. 04. 22, 18:32, Greg KH wrote:
> > On Mon, Apr 11, 2022 at 12:54:03PM +0200, Jiri Slaby wrote:
> > > Many serial drivers do the same thing:
> > > * send x_char if set
> > > * keep sending from the xmit circular buffer until either
> > > - the loop reaches the end of the xmit buffer
> > > - TX is stopped
> > > - HW fifo is full
> > > * check for pending characters and:
> > > - wake up tty writers to fill for more data into xmit buffer
> > > - stop TX if there is nothing in the xmit buffer
> > >
> > > The only differences are:
> > > * how to write the character to the HW fifo
> > > * the check of the end condition:
> > > - is the HW fifo full?
> > > - is limit of the written characters reached?
> > >
> > > So unify the above into two helpers:
> > > * uart_port_tx_limit() -- the generic one, it performs the above taking
> > > into account the written characters limit
> > > * uart_port_tx() -- calls the above with ~0 as the limit. So it only
> > > checks the HW fullness.
> > >
> > > We need three more hooks in struct uart_ops for all this to work:
> > > * tx_ready() -- returns true if HW can accept more data.
> > > * put_char() -- write a character to the device.
> > > * tx_done() -- when the write loop is done, perform arbitrary action
> > > before potential invocation of ops->stop_tx() happens.
> > >
> > > NOTE1: Maybe the three hooks in uart_ops above are overkill. We can
> > > instead pass pointers to the three functions directly to the new helpers
> > > as they are not used elsewhere. Similar to uart_console_write() and its
> > > putchar().
> > >
> > > NOTE2: These two new helper functions call the hooks per every character
> > > processed. I was unable to measure any difference, provided most time is
> > > spent by readb (or alike) in the hooks themselves. First, LTO might
> > > help to eliminate these explicit calls (we might need NOTE1 to be
> > > implemented for this to be true). Second, if this turns out to be a
> > > problem, we can introduce a macro to build the helper in the driver's
> > > code instead of serial_core. That is, similar to wait_event().
> > >
> > > Signed-off-by: Jiri Slaby <[email protected]>
> > > ---
> > > Documentation/driver-api/serial/driver.rst | 28 ++++++++++++
> > > drivers/tty/serial/serial_core.c | 53 ++++++++++++++++++++++
> > > include/linux/serial_core.h | 9 ++++
> > > 3 files changed, 90 insertions(+)
> > >
> > > diff --git a/Documentation/driver-api/serial/driver.rst b/Documentation/driver-api/serial/driver.rst
> > > index 06ec04ba086f..7dc3791addeb 100644
> > > --- a/Documentation/driver-api/serial/driver.rst
> > > +++ b/Documentation/driver-api/serial/driver.rst
> > > @@ -80,6 +80,34 @@ hardware.
> > > This call must not sleep
> > > + tx_ready(port)
> > > + The driver returns true if the HW can accept more data to be sent.
> > > +
> > > + Locking: port->lock taken.
> > > +
> > > + Interrupts: locally disabled.
> > > +
> > > + This call must not sleep.
> > > +
> > > + put_char(port, ch)
> > > + The driver is asked to write ch to the device.
> > > +
> > > + Locking: port->lock taken.
> > > +
> > > + Interrupts: locally disabled.
> > > +
> > > + This call must not sleep.
> > > +
> > > + tx_done(port)
> > > + When the write loop is done, the driver can perform arbitrary action
> > > + here before potential invocation of ops->stop_tx() happens.
> > > +
> > > + Locking: port->lock taken.
> > > +
> > > + Interrupts: locally disabled.
> > > +
> > > + This call must not sleep.
> > > +
> > > set_mctrl(port, mctrl)
> > > This function sets the modem control lines for port described
> > > by 'port' to the state described by mctrl. The relevant bits
> > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > index 6a8963caf954..1be14e90066c 100644
> > > --- a/drivers/tty/serial/serial_core.c
> > > +++ b/drivers/tty/serial/serial_core.c
> > > @@ -107,6 +107,59 @@ void uart_write_wakeup(struct uart_port *port)
> > > }
> > > EXPORT_SYMBOL(uart_write_wakeup);
> > > +static bool uart_port_tx_always_ready(struct uart_port *port)
> > > +{
> > > + return true;
> > > +}
> > > +
> > > +/**
> > > + * uart_port_tx_limit -- transmit helper for uart_port
> > > + * @port: from which port to transmit
> > > + * @count: limit count
> > > + *
> > > + * uart_port_tx_limit() transmits characters from the xmit buffer to the
> > > + * hardware using @uart_port::ops::put_char(). It does so until @count
> > > + * characters are sent and while @uart_port::ops::tx_ready() still returns
> > > + * non-zero (if non-NULL).
> > > + *
> > > + * Return: number of characters in the xmit buffer when done.
> > > + */
> > > +unsigned int uart_port_tx_limit(struct uart_port *port, unsigned int count)
> > > +{
> > > + struct circ_buf *xmit = &port->state->xmit;
> > > + bool (*tx_ready)(struct uart_port *) = port->ops->tx_ready ? :
> > > + uart_port_tx_always_ready;
> > > + unsigned int pending;
> > > +
> > > + for (; count && tx_ready(port); count--, port->icount.tx++) {
> > > + if (port->x_char) {
> > > + port->ops->put_char(port, port->x_char);
> > > + port->x_char = 0;
> > > + continue;
> > > + }
> > > +
> > > + if (uart_circ_empty(xmit) || uart_tx_stopped(port))
> > > + break;
> > > +
> > > + port->ops->put_char(port, xmit->buf[xmit->tail]);
> >
> > That's a lot of redirection and function pointer mess per each character
> > sent now. With the spectre overhead here (and only getting worse), this
> > feels like a step backwards.
> >
> > I doubt throughput matters here given cpu speeds now, _but_ the cpu load
> > should go up.
> >
> > Although on smaller cpus with slower Mhz and faster line rates, this
> > feels like a lot of extra work happening for no real good reason.
>
> I know… Did you miss NOTE2 in the commit log? Any idea on that?
I did see it, but I LTO can not handle function pointer redirection. I
was wondering if you ran any benchmarks to see if this is noticeable.
I am all for making the drivers smaller, but not at the increased
overhead of every character being sent :(
thanks,
greg k-h
On 14. 04. 22, 18:32, Greg KH wrote:
> On Mon, Apr 11, 2022 at 12:54:03PM +0200, Jiri Slaby wrote:
>> Many serial drivers do the same thing:
>> * send x_char if set
>> * keep sending from the xmit circular buffer until either
>> - the loop reaches the end of the xmit buffer
>> - TX is stopped
>> - HW fifo is full
>> * check for pending characters and:
>> - wake up tty writers to fill for more data into xmit buffer
>> - stop TX if there is nothing in the xmit buffer
>>
>> The only differences are:
>> * how to write the character to the HW fifo
>> * the check of the end condition:
>> - is the HW fifo full?
>> - is limit of the written characters reached?
>>
>> So unify the above into two helpers:
>> * uart_port_tx_limit() -- the generic one, it performs the above taking
>> into account the written characters limit
>> * uart_port_tx() -- calls the above with ~0 as the limit. So it only
>> checks the HW fullness.
>>
>> We need three more hooks in struct uart_ops for all this to work:
>> * tx_ready() -- returns true if HW can accept more data.
>> * put_char() -- write a character to the device.
>> * tx_done() -- when the write loop is done, perform arbitrary action
>> before potential invocation of ops->stop_tx() happens.
>>
>> NOTE1: Maybe the three hooks in uart_ops above are overkill. We can
>> instead pass pointers to the three functions directly to the new helpers
>> as they are not used elsewhere. Similar to uart_console_write() and its
>> putchar().
>>
>> NOTE2: These two new helper functions call the hooks per every character
>> processed. I was unable to measure any difference, provided most time is
>> spent by readb (or alike) in the hooks themselves. First, LTO might
>> help to eliminate these explicit calls (we might need NOTE1 to be
>> implemented for this to be true). Second, if this turns out to be a
>> problem, we can introduce a macro to build the helper in the driver's
>> code instead of serial_core. That is, similar to wait_event().
>>
>> Signed-off-by: Jiri Slaby <[email protected]>
>> ---
>> Documentation/driver-api/serial/driver.rst | 28 ++++++++++++
>> drivers/tty/serial/serial_core.c | 53 ++++++++++++++++++++++
>> include/linux/serial_core.h | 9 ++++
>> 3 files changed, 90 insertions(+)
>>
>> diff --git a/Documentation/driver-api/serial/driver.rst b/Documentation/driver-api/serial/driver.rst
>> index 06ec04ba086f..7dc3791addeb 100644
>> --- a/Documentation/driver-api/serial/driver.rst
>> +++ b/Documentation/driver-api/serial/driver.rst
>> @@ -80,6 +80,34 @@ hardware.
>>
>> This call must not sleep
>>
>> + tx_ready(port)
>> + The driver returns true if the HW can accept more data to be sent.
>> +
>> + Locking: port->lock taken.
>> +
>> + Interrupts: locally disabled.
>> +
>> + This call must not sleep.
>> +
>> + put_char(port, ch)
>> + The driver is asked to write ch to the device.
>> +
>> + Locking: port->lock taken.
>> +
>> + Interrupts: locally disabled.
>> +
>> + This call must not sleep.
>> +
>> + tx_done(port)
>> + When the write loop is done, the driver can perform arbitrary action
>> + here before potential invocation of ops->stop_tx() happens.
>> +
>> + Locking: port->lock taken.
>> +
>> + Interrupts: locally disabled.
>> +
>> + This call must not sleep.
>> +
>> set_mctrl(port, mctrl)
>> This function sets the modem control lines for port described
>> by 'port' to the state described by mctrl. The relevant bits
>> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
>> index 6a8963caf954..1be14e90066c 100644
>> --- a/drivers/tty/serial/serial_core.c
>> +++ b/drivers/tty/serial/serial_core.c
>> @@ -107,6 +107,59 @@ void uart_write_wakeup(struct uart_port *port)
>> }
>> EXPORT_SYMBOL(uart_write_wakeup);
>>
>> +static bool uart_port_tx_always_ready(struct uart_port *port)
>> +{
>> + return true;
>> +}
>> +
>> +/**
>> + * uart_port_tx_limit -- transmit helper for uart_port
>> + * @port: from which port to transmit
>> + * @count: limit count
>> + *
>> + * uart_port_tx_limit() transmits characters from the xmit buffer to the
>> + * hardware using @uart_port::ops::put_char(). It does so until @count
>> + * characters are sent and while @uart_port::ops::tx_ready() still returns
>> + * non-zero (if non-NULL).
>> + *
>> + * Return: number of characters in the xmit buffer when done.
>> + */
>> +unsigned int uart_port_tx_limit(struct uart_port *port, unsigned int count)
>> +{
>> + struct circ_buf *xmit = &port->state->xmit;
>> + bool (*tx_ready)(struct uart_port *) = port->ops->tx_ready ? :
>> + uart_port_tx_always_ready;
>> + unsigned int pending;
>> +
>> + for (; count && tx_ready(port); count--, port->icount.tx++) {
>> + if (port->x_char) {
>> + port->ops->put_char(port, port->x_char);
>> + port->x_char = 0;
>> + continue;
>> + }
>> +
>> + if (uart_circ_empty(xmit) || uart_tx_stopped(port))
>> + break;
>> +
>> + port->ops->put_char(port, xmit->buf[xmit->tail]);
>
> That's a lot of redirection and function pointer mess per each character
> sent now. With the spectre overhead here (and only getting worse), this
> feels like a step backwards.
>
> I doubt throughput matters here given cpu speeds now, _but_ the cpu load
> should go up.
>
> Although on smaller cpus with slower Mhz and faster line rates, this
> feels like a lot of extra work happening for no real good reason.
I know… Did you miss NOTE2 in the commit log? Any idea on that?
thanks,
--
js
suse labs