2024-02-19 22:35:08

by David Lechner

Subject: [PATCH v2 0/5] spi: add support for pre-cooking messages

This is a follow-up to [1] where it was suggested to break down the
proposed SPI offload support into smaller series.

This takes on the first suggested task of introducing an API to
"pre-cook" SPI messages. This idea was first discussed extensively in
2013 [2][3] and revisited more briefly in 2022 [4].

The goal is to improve performance (higher throughput and reduced CPU
usage) by allowing peripheral drivers that use the same struct
spi_message repeatedly to "pre-cook" the message once. This avoids
repeating the same validation, and possibly other operations, each time
the message is sent.

This series includes __spi_validate() and the automatic splitting of
xfers in the optimizations. Another frequently suggested optimization
is doing DMA mapping only once. This is not included in this series, but
can be added later (preferably by someone with a real use case for it).

To show how this all works and get some real-world measurements, this
series includes the core changes, optimization of a SPI controller
driver, and optimization of an ADC driver. This test case was only able
to take advantage of the single validation optimization, since it didn't
require splitting transfers. With these changes, CPU usage of the
threaded interrupt handler, which calls spi_sync(), was reduced from
83% to 73% while at the same time the sample rate (frequency of SPI
xfers) was increased from 20kHz to 25kHz.
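
A back-of-the-envelope reading of these figures, as a sketch only: it
assumes the CPU percentage is the share of one core spent in the handler
and that each sample costs exactly one spi_sync() call. Neither
assumption is stated in the measurements, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the series: estimated CPU time per
 * sample in microseconds, assuming cpu_percent is the share of one core
 * used by the handler and one spi_sync() call is made per sample.
 */
double cpu_us_per_sample(double cpu_percent, double sample_rate_hz)
{
	assert(sample_rate_hz > 0.0);

	return (cpu_percent / 100.0) / sample_rate_hz * 1e6;
}

/* Print the before/after estimate for the figures quoted above. */
void print_estimate(void)
{
	double before = cpu_us_per_sample(83.0, 20000.0);
	double after = cpu_us_per_sample(73.0, 25000.0);

	printf("before: %.1f us, after: %.1f us per sample\n", before, after);
}
```

Under those assumptions the handler cost per sample drops from roughly
41.5 us to roughly 29.2 us, which is a larger relative change than the
raw CPU percentages suggest.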

[1]: https://lore.kernel.org/linux-spi/[email protected]/T/
[2]: https://lore.kernel.org/linux-spi/[email protected]/T/
[3]: https://lore.kernel.org/linux-spi/[email protected]/T/
[4]: https://lore.kernel.org/linux-spi/20220525163946.48ea40c9@erd992/T/

---
Changes in v2:
- Removed pre_optimized parameter from __spi_optimize_message()
- Added comment explaining purpose of pre_optimized flag
- Fixed missing doc comment for @pre_optimized
- Removed kernel doc inclusion (/** -> /*) from static members
- Removed unrelated comment about calling spi_finalize_current_message()
- Reworked IIO driver patch
- Link to v1: https://lore.kernel.org/r/20240212-mainline-spi-precook-message-v1-0-a2373cd72d36@baylibre.com

---
David Lechner (5):
spi: add spi_optimize_message() APIs
spi: move splitting transfers to spi_optimize_message()
spi: stm32: move splitting transfers to optimize_message
spi: axi-spi-engine: move message compile to optimize_message
iio: adc: ad7380: use spi_optimize_message()

drivers/iio/adc/ad7380.c | 36 +++++-
drivers/spi/spi-axi-spi-engine.c | 40 +++---
drivers/spi/spi-stm32.c | 28 +++--
drivers/spi/spi.c | 259 ++++++++++++++++++++++++++++++++-------
include/linux/spi/spi.h | 20 +++
5 files changed, 297 insertions(+), 86 deletions(-)
---
base-commit: 55072343f1df834879b8bae9e419cd5cbb5f3259
prerequisite-patch-id: 844c06b6caf25a2724e130dfa7999dc90dd26fde
change-id: 20240208-mainline-spi-precook-message-189b2f08ba7f


2024-02-19 22:35:37

by David Lechner

Subject: [PATCH v2 2/5] spi: move splitting transfers to spi_optimize_message()

Splitting transfers is an expensive operation, so we can potentially
optimize it by doing it only once per optimization of the message
instead of repeating it each time the message is transferred.

The transfer splitting functions are currently the only user of
spi_res_alloc() so spi_res_release() can be safely moved at this time
from spi_finalize_current_message() to spi_unoptimize_message().

The doc comments of the public functions for splitting transfers are
also updated so that callers will know when it is safe to call them
to ensure proper resource management.
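
To make the cost concrete, here is a simplified userspace sketch of what
splitting does to a single transfer. It is not the kernel code: the real
helpers replace entries in the message's transfer list and allocate the
replacements via spi_res_alloc(), whereas this model writes into a
caller-supplied array. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a transfer; not the kernel's struct spi_transfer. */
struct model_xfer {
	size_t len;
	int cs_change;
};

/*
 * Split one transfer of @len bytes into chunks of at most @maxsize bytes,
 * writing them to @out (capacity @cap). Returns the number of chunks, or
 * -1 if they do not fit in @out.
 */
int model_split_maxsize(size_t len, size_t maxsize,
			struct model_xfer *out, size_t cap)
{
	size_t count, i;

	assert(maxsize > 0);

	count = (len + maxsize - 1) / maxsize;	/* round up */
	if (count > cap)
		return -1;

	for (i = 0; i < count; i++) {
		out[i].len = len < maxsize ? len : maxsize;
		out[i].cs_change = 0;
		len -= out[i].len;
	}

	return (int)count;
}

/*
 * Emulate CS-per-word: set cs_change on every chunk except the last,
 * mirroring the loop in spi_split_transfers().
 */
void model_set_cs_change(struct model_xfer *xfers, size_t count)
{
	size_t i;

	for (i = 0; i + 1 < count; i++)
		xfers[i].cs_change = 1;
}
```

Since the result depends only on the transfer lengths and the limit, it
stays valid for as long as the message itself is not modified, which is
why it can be computed once at optimize time.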

Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: David Lechner <[email protected]>
---

v2 changes:
- Changed line break for multiline if condition
- Removed kernel doc inclusion (/** -> /*) from static members
- Picked up Jonathan's Reviewed-by

drivers/spi/spi.c | 110 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 68 insertions(+), 42 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index f68d92b57543..ba4d3fde2054 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1747,38 +1747,6 @@ static int __spi_pump_transfer_message(struct spi_controller *ctlr,

trace_spi_message_start(msg);

- /*
- * If an SPI controller does not support toggling the CS line on each
- * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
- * for the CS line, we can emulate the CS-per-word hardware function by
- * splitting transfers into one-word transfers and ensuring that
- * cs_change is set for each transfer.
- */
- if ((msg->spi->mode & SPI_CS_WORD) && (!(ctlr->mode_bits & SPI_CS_WORD) ||
- spi_is_csgpiod(msg->spi))) {
- ret = spi_split_transfers_maxwords(ctlr, msg, 1);
- if (ret) {
- msg->status = ret;
- spi_finalize_current_message(ctlr);
- return ret;
- }
-
- list_for_each_entry(xfer, &msg->transfers, transfer_list) {
- /* Don't change cs_change on the last entry in the list */
- if (list_is_last(&xfer->transfer_list, &msg->transfers))
- break;
- xfer->cs_change = 1;
- }
- } else {
- ret = spi_split_transfers_maxsize(ctlr, msg,
- spi_max_transfer_size(msg->spi));
- if (ret) {
- msg->status = ret;
- spi_finalize_current_message(ctlr);
- return ret;
- }
- }
-
if (ctlr->prepare_message) {
ret = ctlr->prepare_message(ctlr, msg);
if (ret) {
@@ -2124,6 +2092,8 @@ static void __spi_unoptimize_message(struct spi_message *msg)
if (ctlr->unoptimize_message)
ctlr->unoptimize_message(msg);

+ spi_res_release(ctlr, msg);
+
msg->optimized = false;
msg->opt_state = NULL;
}
@@ -2169,15 +2139,6 @@ void spi_finalize_current_message(struct spi_controller *ctlr)

spi_unmap_msg(ctlr, mesg);

- /*
- * In the prepare_messages callback the SPI bus has the opportunity
- * to split a transfer to smaller chunks.
- *
- * Release the split transfers here since spi_map_msg() is done on
- * the split transfers.
- */
- spi_res_release(ctlr, mesg);
-
if (mesg->prepared && ctlr->unprepare_message) {
ret = ctlr->unprepare_message(ctlr, mesg);
if (ret) {
@@ -3819,6 +3780,10 @@ static int __spi_split_transfer_maxsize(struct spi_controller *ctlr,
* @msg: the @spi_message to transform
* @maxsize: the maximum when to apply this
*
+ * This function allocates resources that are automatically freed during the
+ * spi message unoptimize phase so this function should only be called from
+ * optimize_message callbacks.
+ *
* Return: status of transformation
*/
int spi_split_transfers_maxsize(struct spi_controller *ctlr,
@@ -3857,6 +3822,10 @@ EXPORT_SYMBOL_GPL(spi_split_transfers_maxsize);
* @msg: the @spi_message to transform
* @maxwords: the number of words to limit each transfer to
*
+ * This function allocates resources that are automatically freed during the
+ * spi message unoptimize phase so this function should only be called from
+ * optimize_message callbacks.
+ *
* Return: status of transformation
*/
int spi_split_transfers_maxwords(struct spi_controller *ctlr,
@@ -4231,6 +4200,57 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
return 0;
}

+/*
+ * spi_split_transfers - generic handling of transfer splitting
+ * @msg: the message to split
+ *
+ * Under certain conditions, a SPI controller may not support arbitrary
+ * transfer sizes or other features required by a peripheral. This function
+ * will split the transfers in the message into smaller transfers that are
+ * supported by the controller.
+ *
+ * Controllers with special requirements not covered here can also split
+ * transfers in the optimize_message() callback.
+ *
+ * Context: can sleep
+ * Return: zero on success, else a negative error code
+ */
+static int spi_split_transfers(struct spi_message *msg)
+{
+ struct spi_controller *ctlr = msg->spi->controller;
+ struct spi_transfer *xfer;
+ int ret;
+
+ /*
+ * If an SPI controller does not support toggling the CS line on each
+ * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
+ * for the CS line, we can emulate the CS-per-word hardware function by
+ * splitting transfers into one-word transfers and ensuring that
+ * cs_change is set for each transfer.
+ */
+ if ((msg->spi->mode & SPI_CS_WORD) &&
+ (!(ctlr->mode_bits & SPI_CS_WORD) || spi_is_csgpiod(msg->spi))) {
+ ret = spi_split_transfers_maxwords(ctlr, msg, 1);
+ if (ret)
+ return ret;
+
+ list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+ /* Don't change cs_change on the last entry in the list */
+ if (list_is_last(&xfer->transfer_list, &msg->transfers))
+ break;
+
+ xfer->cs_change = 1;
+ }
+ } else {
+ ret = spi_split_transfers_maxsize(ctlr, msg,
+ spi_max_transfer_size(msg->spi));
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
/*
* __spi_optimize_message - shared implementation for spi_optimize_message()
* and spi_maybe_optimize_message()
@@ -4254,10 +4274,16 @@ static int __spi_optimize_message(struct spi_device *spi,
if (ret)
return ret;

+ ret = spi_split_transfers(msg);
+ if (ret)
+ return ret;
+
if (ctlr->optimize_message) {
ret = ctlr->optimize_message(msg);
- if (ret)
+ if (ret) {
+ spi_res_release(ctlr, msg);
return ret;
+ }
}

msg->optimized = true;

--
2.43.2


2024-02-19 22:35:37

by David Lechner

Subject: [PATCH v2 1/5] spi: add spi_optimize_message() APIs

This adds a new spi_optimize_message() function that can be used to
optimize SPI messages that are used more than once. Peripheral drivers
that use the same message multiple times can use this API to perform SPI
message validation and controller-specific optimizations once and then
reuse the message while avoiding the overhead of revalidating the
message on each spi_(a)sync() call.

Internally, the SPI core will also call this function for each message
if the peripheral driver did not explicitly call it. This is done so
that controller drivers don't have to have multiple code paths for
optimized and non-optimized messages.

A hook is provided for controller drivers to perform controller-specific
optimizations.
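
The interaction between the explicit and the implicit path can be
sketched with a small userspace model. This is an illustration only: the
struct keeps just the two flags introduced by this patch plus a
hypothetical counter, and the model_* functions stand in for the kernel
functions without reproducing them.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in; only the two flags from the patch are modelled. */
struct model_msg {
	bool pre_optimized;	/* peripheral driver optimized the message */
	bool optimized;		/* message is currently in the optimized state */
	int optimize_calls;	/* hypothetical counter, for illustration only */
};

/* Shared implementation: stands in for validation and the controller hook. */
void model_do_optimize(struct model_msg *msg)
{
	assert(!msg->optimized);	/* not valid on an optimized message */
	msg->optimize_calls++;
	msg->optimized = true;
}

void model_do_unoptimize(struct model_msg *msg)
{
	assert(msg->optimized);		/* not valid on a non-optimized message */
	msg->optimized = false;
}

/* What a peripheral driver calls once, before reusing the message. */
void model_optimize(struct model_msg *msg)
{
	model_do_optimize(msg);
	msg->pre_optimized = true;
}

/* What a peripheral driver calls once it is done with the message. */
void model_unoptimize(struct model_msg *msg)
{
	model_do_unoptimize(msg);
	msg->pre_optimized = false;
}

/* What the core does around every transfer. */
void model_sync(struct model_msg *msg)
{
	if (!msg->pre_optimized)
		model_do_optimize(msg);

	/* ... the transfer itself would happen here ... */

	if (!msg->pre_optimized && msg->optimized)
		model_do_unoptimize(msg);
}
```

In this model, three model_sync() calls on a fresh message do the
one-time work three times, while calling model_optimize() first reduces
that to a single time and leaves the message optimized until
model_unoptimize() is called.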

Suggested-by: Martin Sperl <[email protected]>
Link: https://lore.kernel.org/linux-spi/[email protected]/
Signed-off-by: David Lechner <[email protected]>
---

v2 changes:
- Removed pre_optimized parameter from __spi_optimize_message()
- Added comment explaining purpose of pre_optimized flag
- Fixed missing doc comment for @pre_optimized
- Removed kernel doc inclusion (/** -> /*) from static members
- Removed unrelated comment about calling spi_finalize_current_message()

drivers/spi/spi.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/spi/spi.h | 20 +++++++
2 files changed, 167 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index c2b10e2c75f0..f68d92b57543 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2106,6 +2106,41 @@ struct spi_message *spi_get_next_queued_message(struct spi_controller *ctlr)
}
EXPORT_SYMBOL_GPL(spi_get_next_queued_message);

+/*
+ * __spi_unoptimize_message - shared implementation of spi_unoptimize_message()
+ * and spi_maybe_unoptimize_message()
+ * @msg: the message to unoptimize
+ *
+ * Peripheral drivers should use spi_unoptimize_message() and callers inside
+ * core should use spi_maybe_unoptimize_message() rather than calling this
+ * function directly.
+ *
+ * It is not valid to call this on a message that is not currently optimized.
+ */
+static void __spi_unoptimize_message(struct spi_message *msg)
+{
+ struct spi_controller *ctlr = msg->spi->controller;
+
+ if (ctlr->unoptimize_message)
+ ctlr->unoptimize_message(msg);
+
+ msg->optimized = false;
+ msg->opt_state = NULL;
+}
+
+/*
+ * spi_maybe_unoptimize_message - unoptimize msg not managed by a peripheral
+ * @msg: the message to unoptimize
+ *
+ * This function is used to unoptimize a message if and only if it was
+ * optimized by the core (via spi_maybe_optimize_message()).
+ */
+static void spi_maybe_unoptimize_message(struct spi_message *msg)
+{
+ if (!msg->pre_optimized && msg->optimized)
+ __spi_unoptimize_message(msg);
+}
+
/**
* spi_finalize_current_message() - the current message is complete
* @ctlr: the controller to return the message to
@@ -2153,6 +2188,8 @@ void spi_finalize_current_message(struct spi_controller *ctlr)

mesg->prepared = false;

+ spi_maybe_unoptimize_message(mesg);
+
WRITE_ONCE(ctlr->cur_msg_incomplete, false);
smp_mb(); /* See __spi_pump_transfer_message()... */
if (READ_ONCE(ctlr->cur_msg_need_completion))
@@ -4194,6 +4231,110 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
return 0;
}

+/*
+ * __spi_optimize_message - shared implementation for spi_optimize_message()
+ * and spi_maybe_optimize_message()
+ * @spi: the device that will be used for the message
+ * @msg: the message to optimize
+ *
+ * Peripheral drivers will call spi_optimize_message() and the spi core will
+ * call spi_maybe_optimize_message() instead of calling this directly.
+ *
+ * It is not valid to call this on a message that has already been optimized.
+ *
+ * Return: zero on success, else a negative error code
+ */
+static int __spi_optimize_message(struct spi_device *spi,
+ struct spi_message *msg)
+{
+ struct spi_controller *ctlr = spi->controller;
+ int ret;
+
+ ret = __spi_validate(spi, msg);
+ if (ret)
+ return ret;
+
+ if (ctlr->optimize_message) {
+ ret = ctlr->optimize_message(msg);
+ if (ret)
+ return ret;
+ }
+
+ msg->optimized = true;
+
+ return 0;
+}
+
+/*
+ * spi_maybe_optimize_message - optimize message if it isn't already pre-optimized
+ * @spi: the device that will be used for the message
+ * @msg: the message to optimize
+ * Return: zero on success, else a negative error code
+ */
+static int spi_maybe_optimize_message(struct spi_device *spi,
+ struct spi_message *msg)
+{
+ if (msg->pre_optimized)
+ return 0;
+
+ return __spi_optimize_message(spi, msg);
+}
+
+/**
+ * spi_optimize_message - do any one-time validation and setup for a SPI message
+ * @spi: the device that will be used for the message
+ * @msg: the message to optimize
+ *
+ * Peripheral drivers that reuse the same message repeatedly may call this to
+ * perform as much message prep as possible once, rather than repeating it each
+ * time a message transfer is performed to improve throughput and reduce CPU
+ * usage.
+ *
+ * Once a message has been optimized, it cannot be modified with the exception
+ * of updating the contents of any xfer->tx_buf (the pointer can't be changed,
+ * only the data in the memory it points to).
+ *
+ * Calls to this function must be balanced with calls to spi_unoptimize_message()
+ * to avoid leaking resources.
+ *
+ * Context: can sleep
+ * Return: zero on success, else a negative error code
+ */
+int spi_optimize_message(struct spi_device *spi, struct spi_message *msg)
+{
+ int ret;
+
+ ret = __spi_optimize_message(spi, msg);
+ if (ret)
+ return ret;
+
+ /*
+ * This flag indicates that the peripheral driver called spi_optimize_message()
+ * and therefore we shouldn't unoptimize message automatically when finalizing
+ * the message but rather wait until spi_unoptimize_message() is called
+ * by the peripheral driver.
+ */
+ msg->pre_optimized = true;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(spi_optimize_message);
+
+/**
+ * spi_unoptimize_message - releases any resources allocated by spi_optimize_message()
+ * @msg: the message to unoptimize
+ *
+ * Calls to this function must be balanced with calls to spi_optimize_message().
+ *
+ * Context: can sleep
+ */
+void spi_unoptimize_message(struct spi_message *msg)
+{
+ __spi_unoptimize_message(msg);
+ msg->pre_optimized = false;
+}
+EXPORT_SYMBOL_GPL(spi_unoptimize_message);
+
static int __spi_async(struct spi_device *spi, struct spi_message *message)
{
struct spi_controller *ctlr = spi->controller;
@@ -4258,8 +4399,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
int ret;
unsigned long flags;

- ret = __spi_validate(spi, message);
- if (ret != 0)
+ ret = spi_maybe_optimize_message(spi, message);
+ if (ret)
return ret;

spin_lock_irqsave(&ctlr->bus_lock_spinlock, flags);
@@ -4271,6 +4412,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)

spin_unlock_irqrestore(&ctlr->bus_lock_spinlock, flags);

+ spi_maybe_unoptimize_message(message);
+
return ret;
}
EXPORT_SYMBOL_GPL(spi_async);
@@ -4331,8 +4474,8 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
return -ESHUTDOWN;
}

- status = __spi_validate(spi, message);
- if (status != 0)
+ status = spi_maybe_optimize_message(spi, message);
+ if (status)
return status;

SPI_STATISTICS_INCREMENT_FIELD(ctlr->pcpu_statistics, spi_sync);
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 2b8e2746769a..ddfb66dd4caf 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -475,6 +475,8 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch
*
* @set_cs: set the logic level of the chip select line. May be called
* from interrupt context.
+ * @optimize_message: optimize the message for reuse
+ * @unoptimize_message: release resources allocated by optimize_message
* @prepare_message: set up the controller to transfer a single message,
* for example doing DMA mapping. Called from threaded
* context.
@@ -715,6 +717,8 @@ struct spi_controller {
struct completion xfer_completion;
size_t max_dma_len;

+ int (*optimize_message)(struct spi_message *msg);
+ int (*unoptimize_message)(struct spi_message *msg);
int (*prepare_transfer_hardware)(struct spi_controller *ctlr);
int (*transfer_one_message)(struct spi_controller *ctlr,
struct spi_message *mesg);
@@ -1111,6 +1115,8 @@ struct spi_transfer {
* @spi: SPI device to which the transaction is queued
* @is_dma_mapped: if true, the caller provided both DMA and CPU virtual
* addresses for each transfer buffer
+ * @pre_optimized: peripheral driver pre-optimized the message
+ * @optimized: the message is in the optimized state
* @prepared: spi_prepare_message was called for the this message
* @status: zero for success, else negative errno
* @complete: called to report transaction completions
@@ -1120,6 +1126,7 @@ struct spi_transfer {
* successful segments
* @queue: for use by whichever driver currently owns the message
* @state: for use by whichever driver currently owns the message
+ * @opt_state: for use by whichever driver currently owns the message
* @resources: for resource management when the SPI message is processed
*
* A @spi_message is used to execute an atomic sequence of data transfers,
@@ -1143,6 +1150,11 @@ struct spi_message {

unsigned is_dma_mapped:1;

+ /* spi_optimize_message() was called for this message */
+ bool pre_optimized;
+ /* __spi_optimize_message() was called for this message */
+ bool optimized;
+
/* spi_prepare_message() was called for this message */
bool prepared;

@@ -1172,6 +1184,11 @@ struct spi_message {
*/
struct list_head queue;
void *state;
+ /*
+ * Optional state for use by controller driver between calls to
+ * __spi_optimize_message() and __spi_unoptimize_message().
+ */
+ void *opt_state;

/* List of spi_res resources when the SPI message is processed */
struct list_head resources;
@@ -1255,6 +1272,9 @@ static inline void spi_message_free(struct spi_message *m)
kfree(m);
}

+extern int spi_optimize_message(struct spi_device *spi, struct spi_message *msg);
+extern void spi_unoptimize_message(struct spi_message *msg);
+
extern int spi_setup(struct spi_device *spi);
extern int spi_async(struct spi_device *spi, struct spi_message *message);
extern int spi_slave_abort(struct spi_device *spi);

--
2.43.2


2024-02-19 22:35:47

by David Lechner

Subject: [PATCH v2 4/5] spi: axi-spi-engine: move message compile to optimize_message

In the AXI SPI Engine driver, compiling the message is an expensive
operation. Previously, it was done per message transfer in the
prepare_message hook. This patch moves the message compile to the
optimize_message hook so that it is only done once per message in
cases where the peripheral driver calls spi_optimize_message().

This can be a significant performance improvement for some peripherals.
For example, the ad7380 driver saw a 13% improvement in throughput
when using the AXI SPI Engine driver with this patch.

We now need two message states: one for the optimization stage, which
doesn't change for the lifetime of the message, and one for managing the
current transfer, which is reset on each transfer. The old msg->state is
therefore split into msg->opt_state and spi_engine->msg_state. The
latter now lives in the driver struct because only one message can be
current at a time. This is a hot path, so avoiding a new allocation on
each message transfer saves a few CPU cycles and lets us get rid of the
prepare_message callback.
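
The split between the two states can be illustrated with a userspace
sketch. It is a simplified model, not the driver code: the struct layouts
are reduced to the fields needed here, the instruction contents are made
up, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compiled once per optimized message; read-only afterwards. */
struct model_program {
	unsigned int length;
	unsigned short instructions[];
};

/* Scratch state for the message currently being transferred. */
struct model_msg_state {
	const unsigned short *cmd_buf;
	unsigned int cmd_length;
};

/* One scratch state per controller, since only one message is in flight. */
struct model_engine {
	struct model_msg_state msg_state;
};

/* Optimize step: allocate and fill the program (contents are made up). */
struct model_program *model_compile(unsigned int length)
{
	struct model_program *p;
	unsigned int i;

	p = calloc(1, sizeof(*p) + length * sizeof(p->instructions[0]));
	if (!p)
		return NULL;

	p->length = length;
	for (i = 0; i < length; i++)
		p->instructions[i] = (unsigned short)i;

	return p;
}

/* Transfer step: no allocation, just reset the scratch state. */
void model_start_transfer(struct model_engine *engine,
			  const struct model_program *p)
{
	struct model_msg_state *st = &engine->msg_state;

	assert(p);
	memset(st, 0, sizeof(*st));
	st->cmd_buf = p->instructions;
	st->cmd_length = p->length;
}

/* Simulate the controller consuming @n instructions from the scratch state. */
void model_consume(struct model_engine *engine, unsigned int n)
{
	struct model_msg_state *st = &engine->msg_state;

	assert(n <= st->cmd_length);
	st->cmd_buf += n;
	st->cmd_length -= n;
}
```

Consuming instructions only advances the scratch state, so starting the
next transfer of the same message needs nothing more than the reset in
model_start_transfer(); the compiled program is untouched.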

Signed-off-by: David Lechner <[email protected]>
---

v2 changes: none

drivers/spi/spi-axi-spi-engine.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/spi/spi-axi-spi-engine.c b/drivers/spi/spi-axi-spi-engine.c
index ca66d202f0e2..6177c1a8d56e 100644
--- a/drivers/spi/spi-axi-spi-engine.c
+++ b/drivers/spi/spi-axi-spi-engine.c
@@ -109,6 +109,7 @@ struct spi_engine {
spinlock_t lock;

void __iomem *base;
+ struct spi_engine_message_state msg_state;
struct completion msg_complete;
unsigned int int_enable;
};
@@ -499,17 +500,11 @@ static irqreturn_t spi_engine_irq(int irq, void *devid)
return IRQ_HANDLED;
}

-static int spi_engine_prepare_message(struct spi_controller *host,
- struct spi_message *msg)
+static int spi_engine_optimize_message(struct spi_message *msg)
{
struct spi_engine_program p_dry, *p;
- struct spi_engine_message_state *st;
size_t size;

- st = kzalloc(sizeof(*st), GFP_KERNEL);
- if (!st)
- return -ENOMEM;
-
spi_engine_precompile_message(msg);

p_dry.length = 0;
@@ -517,31 +512,22 @@ static int spi_engine_prepare_message(struct spi_controller *host,

size = sizeof(*p->instructions) * (p_dry.length + 1);
p = kzalloc(sizeof(*p) + size, GFP_KERNEL);
- if (!p) {
- kfree(st);
+ if (!p)
return -ENOMEM;
- }

spi_engine_compile_message(msg, false, p);

spi_engine_program_add_cmd(p, false, SPI_ENGINE_CMD_SYNC(
AXI_SPI_ENGINE_CUR_MSG_SYNC_ID));

- st->p = p;
- st->cmd_buf = p->instructions;
- st->cmd_length = p->length;
- msg->state = st;
+ msg->opt_state = p;

return 0;
}

-static int spi_engine_unprepare_message(struct spi_controller *host,
- struct spi_message *msg)
+static int spi_engine_unoptimize_message(struct spi_message *msg)
{
- struct spi_engine_message_state *st = msg->state;
-
- kfree(st->p);
- kfree(st);
+ kfree(msg->opt_state);

return 0;
}
@@ -550,10 +536,18 @@ static int spi_engine_transfer_one_message(struct spi_controller *host,
struct spi_message *msg)
{
struct spi_engine *spi_engine = spi_controller_get_devdata(host);
- struct spi_engine_message_state *st = msg->state;
+ struct spi_engine_message_state *st = &spi_engine->msg_state;
+ struct spi_engine_program *p = msg->opt_state;
unsigned int int_enable = 0;
unsigned long flags;

+ /* reinitialize message state for this transfer */
+ memset(st, 0, sizeof(*st));
+ st->p = p;
+ st->cmd_buf = p->instructions;
+ st->cmd_length = p->length;
+ msg->state = st;
+
reinit_completion(&spi_engine->msg_complete);

spin_lock_irqsave(&spi_engine->lock, flags);
@@ -658,8 +652,8 @@ static int spi_engine_probe(struct platform_device *pdev)
host->bits_per_word_mask = SPI_BPW_RANGE_MASK(1, 32);
host->max_speed_hz = clk_get_rate(spi_engine->ref_clk) / 2;
host->transfer_one_message = spi_engine_transfer_one_message;
- host->prepare_message = spi_engine_prepare_message;
- host->unprepare_message = spi_engine_unprepare_message;
+ host->optimize_message = spi_engine_optimize_message;
+ host->unoptimize_message = spi_engine_unoptimize_message;
host->num_chipselect = 8;

if (host->max_speed_hz == 0)

--
2.43.2


2024-02-19 22:35:47

by David Lechner

Subject: [PATCH v2 3/5] spi: stm32: move splitting transfers to optimize_message

Since splitting transfers was moved to spi_optimize_message() in the
core SPI code, we now need to use the optimize_message callback in the
STM32 SPI driver to ensure that the operation is only performed once
when spi_optimize_message() is used by peripheral drivers explicitly.

Signed-off-by: David Lechner <[email protected]>
---

v2 changes: none

drivers/spi/spi-stm32.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c
index c32e57bb38bd..e4e7ddb7524a 100644
--- a/drivers/spi/spi-stm32.c
+++ b/drivers/spi/spi-stm32.c
@@ -1118,6 +1118,21 @@ static irqreturn_t stm32h7_spi_irq_thread(int irq, void *dev_id)
return IRQ_HANDLED;
}

+static int stm32_spi_optimize_message(struct spi_message *msg)
+{
+ struct spi_controller *ctrl = msg->spi->controller;
+ struct stm32_spi *spi = spi_controller_get_devdata(ctrl);
+
+ /* On STM32H7, messages should not exceed a maximum size set
+ * later via the set_number_of_data function. In order to
+ * ensure that, split large messages into several messages
+ */
+ if (spi->cfg->set_number_of_data)
+ return spi_split_transfers_maxwords(ctrl, msg, spi->t_size_max);
+
+ return 0;
+}
+
/**
* stm32_spi_prepare_msg - set up the controller to transfer a single message
* @ctrl: controller interface
@@ -1163,18 +1178,6 @@ static int stm32_spi_prepare_msg(struct spi_controller *ctrl,
!!(spi_dev->mode & SPI_LSB_FIRST),
!!(spi_dev->mode & SPI_CS_HIGH));

- /* On STM32H7, messages should not exceed a maximum size setted
- * afterward via the set_number_of_data function. In order to
- * ensure that, split large messages into several messages
- */
- if (spi->cfg->set_number_of_data) {
- int ret;
-
- ret = spi_split_transfers_maxwords(ctrl, msg, spi->t_size_max);
- if (ret)
- return ret;
- }
-
spin_lock_irqsave(&spi->lock, flags);

/* CPOL, CPHA and LSB FIRST bits have common register */
@@ -2180,6 +2183,7 @@ static int stm32_spi_probe(struct platform_device *pdev)
ctrl->max_speed_hz = spi->clk_rate / spi->cfg->baud_rate_div_min;
ctrl->min_speed_hz = spi->clk_rate / spi->cfg->baud_rate_div_max;
ctrl->use_gpio_descriptors = true;
+ ctrl->optimize_message = stm32_spi_optimize_message;
ctrl->prepare_message = stm32_spi_prepare_msg;
ctrl->transfer_one = stm32_spi_transfer_one;
ctrl->unprepare_message = stm32_spi_unprepare_msg;

--
2.43.2


2024-02-19 22:36:07

by David Lechner

Subject: [PATCH v2 5/5] iio: adc: ad7380: use spi_optimize_message()

This modifies the ad7380 ADC driver to use spi_optimize_message() to
optimize the SPI message for the buffered read operation. Since buffered
reads reuse the same SPI message for each read, this can improve
performance by reducing the overhead of setting up some parts of the SPI
message in each spi_sync() call.

Signed-off-by: David Lechner <[email protected]>
---

v2 changes:
- Removed dynamic allocation of spi xfer/msg
- Moved spi message optimization to probe function
- Dropped buffer pre/post callbacks

drivers/iio/adc/ad7380.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
index abd746aef868..6b3fd20c8f1f 100644
--- a/drivers/iio/adc/ad7380.c
+++ b/drivers/iio/adc/ad7380.c
@@ -133,6 +133,9 @@ struct ad7380_state {
struct spi_device *spi;
struct regulator *vref;
struct regmap *regmap;
+ /* xfer and msg for buffer reads */
+ struct spi_transfer xfer;
+ struct spi_message msg;
/*
* DMA (thus cache coherency maintenance) requires the
* transfer buffers to live in their own cache lines.
@@ -236,14 +239,9 @@ static irqreturn_t ad7380_trigger_handler(int irq, void *p)
struct iio_poll_func *pf = p;
struct iio_dev *indio_dev = pf->indio_dev;
struct ad7380_state *st = iio_priv(indio_dev);
- struct spi_transfer xfer = {
- .bits_per_word = st->chip_info->channels[0].scan_type.realbits,
- .len = 4,
- .rx_buf = st->scan_data.raw,
- };
int ret;

- ret = spi_sync_transfer(st->spi, &xfer, 1);
+ ret = spi_sync(st->spi, &st->msg);
if (ret)
goto out;

@@ -335,6 +333,28 @@ static const struct iio_info ad7380_info = {
.debugfs_reg_access = &ad7380_debugfs_reg_access,
};

+static void ad7380_unoptimize_spi_msg(void *msg)
+{
+ spi_unoptimize_message(msg);
+}
+
+static int devm_ad7380_setup_spi_msg(struct device *dev, struct ad7380_state *st)
+{
+ int ret;
+
+ st->xfer.bits_per_word = st->chip_info->channels[0].scan_type.realbits;
+ st->xfer.len = 4;
+ st->xfer.rx_buf = st->scan_data.raw;
+
+ spi_message_init_with_transfers(&st->msg, &st->xfer, 1);
+
+ ret = spi_optimize_message(st->spi, &st->msg);
+ if (ret)
+ return dev_err_probe(dev, ret, "failed to optimize message\n");
+
+ return devm_add_action_or_reset(dev, ad7380_unoptimize_spi_msg, &st->msg);
+}
+
static int ad7380_init(struct ad7380_state *st)
{
int ret;
@@ -411,6 +431,10 @@ static int ad7380_probe(struct spi_device *spi)
return dev_err_probe(&spi->dev, PTR_ERR(st->regmap),
"failed to allocate register map\n");

+ ret = devm_ad7380_setup_spi_msg(&spi->dev, st);
+ if (ret)
+ return ret;
+
indio_dev->channels = st->chip_info->channels;
indio_dev->num_channels = st->chip_info->num_channels;
indio_dev->name = st->chip_info->name;

--
2.43.2


2024-02-20 10:42:04

by Nuno Sá

Subject: Re: [PATCH v2 5/5] iio: adc: ad7380: use spi_optimize_message()

On Mon, 2024-02-19 at 16:33 -0600, David Lechner wrote:
> This modifies the ad7380 ADC driver to use spi_optimize_message() to
> optimize the SPI message for the buffered read operation. Since buffered
> reads reuse the same SPI message for each read, this can improve
> performance by reducing the overhead of setting up some parts the SPI
> message in each spi_sync() call.
>
> Signed-off-by: David Lechner <[email protected]>
> ---
>

Reviewed-by: Nuno Sa <[email protected]>

> v2 changes:
> - Removed dynamic allocation of spi xfer/msg
> - Moved spi message optimization to probe function
> - Dropped buffer pre/post callbacks
>
>  drivers/iio/adc/ad7380.c | 36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
> index abd746aef868..6b3fd20c8f1f 100644
> --- a/drivers/iio/adc/ad7380.c
> +++ b/drivers/iio/adc/ad7380.c
> @@ -133,6 +133,9 @@ struct ad7380_state {
>   struct spi_device *spi;
>   struct regulator *vref;
>   struct regmap *regmap;
> + /* xfer and msg for buffer reads */
> + struct spi_transfer xfer;
> + struct spi_message msg;
>   /*
>   * DMA (thus cache coherency maintenance) requires the
>   * transfer buffers to live in their own cache lines.
> @@ -236,14 +239,9 @@ static irqreturn_t ad7380_trigger_handler(int irq, void *p)
>   struct iio_poll_func *pf = p;
>   struct iio_dev *indio_dev = pf->indio_dev;
>   struct ad7380_state *st = iio_priv(indio_dev);
> - struct spi_transfer xfer = {
> - .bits_per_word = st->chip_info->channels[0].scan_type.realbits,
> - .len = 4,
> - .rx_buf = st->scan_data.raw,
> - };
>   int ret;
>  
> - ret = spi_sync_transfer(st->spi, &xfer, 1);
> + ret = spi_sync(st->spi, &st->msg);
>   if (ret)
>   goto out;
>  
> @@ -335,6 +333,28 @@ static const struct iio_info ad7380_info = {
>   .debugfs_reg_access = &ad7380_debugfs_reg_access,
>  };
>  
> +static void ad7380_unoptimize_spi_msg(void *msg)
> +{
> + spi_unoptimize_message(msg);
> +}
> +
> +static int devm_ad7380_setup_spi_msg(struct device *dev, struct ad7380_state *st)
> +{
> + int ret;
> +
> + st->xfer.bits_per_word = st->chip_info->channels[0].scan_type.realbits;
> + st->xfer.len = 4;
> + st->xfer.rx_buf = st->scan_data.raw;
> +
> + spi_message_init_with_transfers(&st->msg, &st->xfer, 1);
> +
> + ret = spi_optimize_message(st->spi, &st->msg);
> + if (ret)
> + return dev_err_probe(dev, ret, "failed to optimize message\n");
> +
> + return devm_add_action_or_reset(dev, ad7380_unoptimize_spi_msg, &st->msg);
> +}
> +
>  static int ad7380_init(struct ad7380_state *st)
>  {
>   int ret;
> @@ -411,6 +431,10 @@ static int ad7380_probe(struct spi_device *spi)
>   return dev_err_probe(&spi->dev, PTR_ERR(st->regmap),
>        "failed to allocate register map\n");
>  
> + ret = devm_ad7380_setup_spi_msg(&spi->dev, st);
> + if (ret)
> + return ret;
> +
>   indio_dev->channels = st->chip_info->channels;
>   indio_dev->num_channels = st->chip_info->num_channels;
>   indio_dev->name = st->chip_info->name;
>


2024-02-20 10:46:09

by Nuno Sá

Subject: Re: [PATCH v2 4/5] spi: axi-spi-engine: move message compile to optimize_message

On Mon, 2024-02-19 at 16:33 -0600, David Lechner wrote:
> In the AXI SPI Engine driver, compiling the message is an expensive
> operation. Previously, it was done per message transfer in the
> prepare_message hook. This patch moves the message compile to the
> optimize_message hook so that it is only done once per message in
> cases where the peripheral driver calls spi_optimize_message().
>
> This can be a significant performance improvement for some peripherals.
> For example, the ad7380 driver saw a 13% improvement in throughput
> when using the AXI SPI Engine driver with this patch.
>
> Since we now need two message states, one for the optimization stage
> that doesn't change for the lifetime of the message and one that is
> reset on each transfer for managing the current transfer state, the old
> msg->state is split into msg->opt_state and spi_engine->msg_state. The
> latter is included in the driver struct now since there is only one
> current message at a time that can ever use it and it is in a hot path
> so avoiding allocating a new one on each message transfer saves a few
> cpu cycles and lets us get rid of the prepare_message callback.
>
> Signed-off-by: David Lechner <[email protected]>
> ---

Reviewed-by: Nuno Sa <[email protected]>

>
> v2 changes: none
>
>  drivers/spi/spi-axi-spi-engine.c | 40 +++++++++++++++++-----------------------
>  1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/spi/spi-axi-spi-engine.c b/drivers/spi/spi-axi-spi-engine.c
> index ca66d202f0e2..6177c1a8d56e 100644
> --- a/drivers/spi/spi-axi-spi-engine.c
> +++ b/drivers/spi/spi-axi-spi-engine.c
> @@ -109,6 +109,7 @@ struct spi_engine {
>   spinlock_t lock;
>  
>   void __iomem *base;
> + struct spi_engine_message_state msg_state;
>   struct completion msg_complete;
>   unsigned int int_enable;
>  };
> @@ -499,17 +500,11 @@ static irqreturn_t spi_engine_irq(int irq, void *devid)
>   return IRQ_HANDLED;
>  }
>  
> -static int spi_engine_prepare_message(struct spi_controller *host,
> -       struct spi_message *msg)
> +static int spi_engine_optimize_message(struct spi_message *msg)
>  {
>   struct spi_engine_program p_dry, *p;
> - struct spi_engine_message_state *st;
>   size_t size;
>  
> - st = kzalloc(sizeof(*st), GFP_KERNEL);
> - if (!st)
> - return -ENOMEM;
> -
>   spi_engine_precompile_message(msg);
>  
>   p_dry.length = 0;
> @@ -517,31 +512,22 @@ static int spi_engine_prepare_message(struct spi_controller *host,
>  
>   size = sizeof(*p->instructions) * (p_dry.length + 1);
>   p = kzalloc(sizeof(*p) + size, GFP_KERNEL);
> - if (!p) {
> - kfree(st);
> + if (!p)
>   return -ENOMEM;
> - }
>  
>   spi_engine_compile_message(msg, false, p);
>  
>   spi_engine_program_add_cmd(p, false, SPI_ENGINE_CMD_SYNC(
>   AXI_SPI_ENGINE_CUR_MSG_SYNC_ID));
>  
> - st->p = p;
> - st->cmd_buf = p->instructions;
> - st->cmd_length = p->length;
> - msg->state = st;
> + msg->opt_state = p;
>  
>   return 0;
>  }
>  
> -static int spi_engine_unprepare_message(struct spi_controller *host,
> - struct spi_message *msg)
> +static int spi_engine_unoptimize_message(struct spi_message *msg)
>  {
> - struct spi_engine_message_state *st = msg->state;
> -
> - kfree(st->p);
> - kfree(st);
> + kfree(msg->opt_state);
>  
>   return 0;
>  }
> @@ -550,10 +536,18 @@ static int spi_engine_transfer_one_message(struct spi_controller *host,
>   struct spi_message *msg)
>  {
>   struct spi_engine *spi_engine = spi_controller_get_devdata(host);
> - struct spi_engine_message_state *st = msg->state;
> + struct spi_engine_message_state *st = &spi_engine->msg_state;
> + struct spi_engine_program *p = msg->opt_state;
>   unsigned int int_enable = 0;
>   unsigned long flags;
>  
> + /* reinitialize message state for this transfer */
> + memset(st, 0, sizeof(*st));
> + st->p = p;
> + st->cmd_buf = p->instructions;
> + st->cmd_length = p->length;
> + msg->state = st;
> +
>   reinit_completion(&spi_engine->msg_complete);
>  
>   spin_lock_irqsave(&spi_engine->lock, flags);
> @@ -658,8 +652,8 @@ static int spi_engine_probe(struct platform_device *pdev)
>   host->bits_per_word_mask = SPI_BPW_RANGE_MASK(1, 32);
>   host->max_speed_hz = clk_get_rate(spi_engine->ref_clk) / 2;
>   host->transfer_one_message = spi_engine_transfer_one_message;
> - host->prepare_message = spi_engine_prepare_message;
> - host->unprepare_message = spi_engine_unprepare_message;
> + host->optimize_message = spi_engine_optimize_message;
> + host->unoptimize_message = spi_engine_unoptimize_message;
>   host->num_chipselect = 8;
>  
>   if (host->max_speed_hz == 0)
>


2024-02-20 11:15:30

by Nuno Sá

Subject: Re: [PATCH v2 1/5] spi: add spi_optimize_message() APIs

On Mon, 2024-02-19 at 16:33 -0600, David Lechner wrote:
> This adds a new spi_optimize_message() function that can be used to
> optimize SPI messages that are used more than once. Peripheral drivers
> that use the same message multiple times can use this API to perform SPI
> message validation and controller-specific optimizations once and then
> reuse the message while avoiding the overhead of revalidating the
> message on each spi_(a)sync() call.
>
> Internally, the SPI core will also call this function for each message
> if the peripheral driver did not explicitly call it. This is done to so
> that controller drivers don't have to have multiple code paths for
> optimized and non-optimized messages.
>
> A hook is provided for controller drivers to perform controller-specific
> optimizations.
>
> Suggested-by: Martin Sperl <[email protected]>
> Link: https://lore.kernel.org/linux-spi/[email protected]/
> Signed-off-by: David Lechner <[email protected]>
> ---

Acked-by: Nuno Sa <[email protected]>

>
> v2 changes:
> - Removed pre_optimized parameter from __spi_optimize_message()
> - Added comment explaining purpose of pre_optimized flag
> - Fixed missing doc comment for @pre_optimized
> - Removed kernel doc inclusion (/** -> /*) from static members
> - Removed unrelated comment about calling spi_finalize_current_message()
>
>  drivers/spi/spi.c       | 151 ++++++++++++++++++++++++++++++++++++++++++++++--
>  include/linux/spi/spi.h |  20 +++++++
>  2 files changed, 167 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index c2b10e2c75f0..f68d92b57543 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -2106,6 +2106,41 @@ struct spi_message *spi_get_next_queued_message(struct spi_controller *ctlr)
>  }
>  EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
>  
> +/*
> + * __spi_unoptimize_message - shared implementation of spi_unoptimize_message()
> + *                            and spi_maybe_unoptimize_message()
> + * @msg: the message to unoptimize
> + *
> + * Peripheral drivers should use spi_unoptimize_message() and callers inside
> + * core should use spi_maybe_unoptimize_message() rather than calling this
> + * function directly.
> + *
> + * It is not valid to call this on a message that is not currently optimized.
> + */
> +static void __spi_unoptimize_message(struct spi_message *msg)
> +{
> + struct spi_controller *ctlr = msg->spi->controller;
> +
> + if (ctlr->unoptimize_message)
> + ctlr->unoptimize_message(msg);
> +
> + msg->optimized = false;
> + msg->opt_state = NULL;
> +}
> +
> +/*
> + * spi_maybe_unoptimize_message - unoptimize msg not managed by a peripheral
> + * @msg: the message to unoptimize
> + *
> + * This function is used to unoptimize a message if and only if it was
> + * optimized by the core (via spi_maybe_optimize_message()).
> + */
> +static void spi_maybe_unoptimize_message(struct spi_message *msg)
> +{
> + if (!msg->pre_optimized && msg->optimized)
> + __spi_unoptimize_message(msg);
> +}
> +
>  /**
>   * spi_finalize_current_message() - the current message is complete
>   * @ctlr: the controller to return the message to
> @@ -2153,6 +2188,8 @@ void spi_finalize_current_message(struct spi_controller *ctlr)
>  
>   mesg->prepared = false;
>  
> + spi_maybe_unoptimize_message(mesg);
> +
>   WRITE_ONCE(ctlr->cur_msg_incomplete, false);
>   smp_mb(); /* See __spi_pump_transfer_message()... */
>   if (READ_ONCE(ctlr->cur_msg_need_completion))
> @@ -4194,6 +4231,110 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
>   return 0;
>  }
>  
> +/*
> + * __spi_optimize_message - shared implementation for spi_optimize_message()
> + *                          and spi_maybe_optimize_message()
> + * @spi: the device that will be used for the message
> + * @msg: the message to optimize
> + *
> + * Peripheral drivers will call spi_optimize_message() and the spi core will
> + * call spi_maybe_optimize_message() instead of calling this directly.
> + *
> + * It is not valid to call this on a message that has already been optimized.
> + *
> + * Return: zero on success, else a negative error code
> + */
> +static int __spi_optimize_message(struct spi_device *spi,
> +   struct spi_message *msg)
> +{
> + struct spi_controller *ctlr = spi->controller;
> + int ret;
> +
> + ret = __spi_validate(spi, msg);
> + if (ret)
> + return ret;
> +
> + if (ctlr->optimize_message) {
> + ret = ctlr->optimize_message(msg);
> + if (ret)
> + return ret;
> + }
> +
> + msg->optimized = true;
> +
> + return 0;
> +}
> +
> +/*
> + * spi_maybe_optimize_message - optimize message if it isn't already pre-optimized
> + * @spi: the device that will be used for the message
> + * @msg: the message to optimize
> + * Return: zero on success, else a negative error code
> + */
> +static int spi_maybe_optimize_message(struct spi_device *spi,
> +       struct spi_message *msg)
> +{
> + if (msg->pre_optimized)
> + return 0;
> +
> + return __spi_optimize_message(spi, msg);
> +}
> +
> +/**
> + * spi_optimize_message - do any one-time validation and setup for a SPI message
> + * @spi: the device that will be used for the message
> + * @msg: the message to optimize
> + *
> + * Peripheral drivers that reuse the same message repeatedly may call this to
> + * perform as much message prep as possible once, rather than repeating it each
> + * time a message transfer is performed to improve throughput and reduce CPU
> + * usage.
> + *
> + * Once a message has been optimized, it cannot be modified with the exception
> + * of updating the contents of any xfer->tx_buf (the pointer can't be changed,
> + * only the data in the memory it points to).
> + *
> + * Calls to this function must be balanced with calls to spi_unoptimize_message()
> + * to avoid leaking resources.
> + *
> + * Context: can sleep
> + * Return: zero on success, else a negative error code
> + */
> +int spi_optimize_message(struct spi_device *spi, struct spi_message *msg)
> +{
> + int ret;
> +
> + ret = __spi_optimize_message(spi, msg);
> + if (ret)
> + return ret;
> +
> + /*
> + * This flag indicates that the peripheral driver called spi_optimize_message()
> + * and therefore we shouldn't unoptimize message automatically when finalizing
> + * the message but rather wait until spi_unoptimize_message() is called
> + * by the peripheral driver.
> + */
> + msg->pre_optimized = true;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(spi_optimize_message);
> +
> +/**
> + * spi_unoptimize_message - releases any resources allocated by spi_optimize_message()
> + * @msg: the message to unoptimize
> + *
> + * Calls to this function must be balanced with calls to spi_optimize_message().
> + *
> + * Context: can sleep
> + */
> +void spi_unoptimize_message(struct spi_message *msg)
> +{
> + __spi_unoptimize_message(msg);
> + msg->pre_optimized = false;
> +}
> +EXPORT_SYMBOL_GPL(spi_unoptimize_message);
> +
>  static int __spi_async(struct spi_device *spi, struct spi_message *message)
>  {
>   struct spi_controller *ctlr = spi->controller;
> @@ -4258,8 +4399,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
>   int ret;
>   unsigned long flags;
>  
> - ret = __spi_validate(spi, message);
> - if (ret != 0)
> + ret = spi_maybe_optimize_message(spi, message);
> + if (ret)
>   return ret;
>  
>   spin_lock_irqsave(&ctlr->bus_lock_spinlock, flags);
> @@ -4271,6 +4412,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
>  
>   spin_unlock_irqrestore(&ctlr->bus_lock_spinlock, flags);
>  
> + spi_maybe_unoptimize_message(message);
> +
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(spi_async);
> @@ -4331,8 +4474,8 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
>   return -ESHUTDOWN;
>   }
>  
> - status = __spi_validate(spi, message);
> - if (status != 0)
> + status = spi_maybe_optimize_message(spi, message);
> + if (status)
>   return status;
>  
>   SPI_STATISTICS_INCREMENT_FIELD(ctlr->pcpu_statistics, spi_sync);
> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> index 2b8e2746769a..ddfb66dd4caf 100644
> --- a/include/linux/spi/spi.h
> +++ b/include/linux/spi/spi.h
> @@ -475,6 +475,8 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch
>   *
>   * @set_cs: set the logic level of the chip select line.  May be called
>   *          from interrupt context.
> + * @optimize_message: optimize the message for reuse
> + * @unoptimize_message: release resources allocated by optimize_message
>   * @prepare_message: set up the controller to transfer a single message,
>   *                   for example doing DMA mapping  Called from threaded
>   *                   context.
> @@ -715,6 +717,8 @@ struct spi_controller {
>   struct completion               xfer_completion;
>   size_t max_dma_len;
>  
> + int (*optimize_message)(struct spi_message *msg);
> + int (*unoptimize_message)(struct spi_message *msg);
>   int (*prepare_transfer_hardware)(struct spi_controller *ctlr);
>   int (*transfer_one_message)(struct spi_controller *ctlr,
>       struct spi_message *mesg);
> @@ -1111,6 +1115,8 @@ struct spi_transfer {
>   * @spi: SPI device to which the transaction is queued
>   * @is_dma_mapped: if true, the caller provided both DMA and CPU virtual
>   * addresses for each transfer buffer
> + * @pre_optimized: peripheral driver pre-optimized the message
> + * @optimized: the message is in the optimized state
>   * @prepared: spi_prepare_message was called for the this message
>   * @status: zero for success, else negative errno
>   * @complete: called to report transaction completions
> @@ -1120,6 +1126,7 @@ struct spi_transfer {
>   * successful segments
>   * @queue: for use by whichever driver currently owns the message
>   * @state: for use by whichever driver currently owns the message
> + * @opt_state: for use by whichever driver currently owns the message
>   * @resources: for resource management when the SPI message is processed
>   *
>   * A @spi_message is used to execute an atomic sequence of data transfers,
> @@ -1143,6 +1150,11 @@ struct spi_message {
>  
>   unsigned is_dma_mapped:1;
>  
> + /* spi_optimize_message() was called for this message */
> + bool pre_optimized;
> + /* __spi_optimize_message() was called for this message */
> + bool optimized;
> +
>   /* spi_prepare_message() was called for this message */
>   bool prepared;
>  
> @@ -1172,6 +1184,11 @@ struct spi_message {
>   */
>   struct list_head queue;
>   void *state;
> + /*
> + * Optional state for use by controller driver between calls to
> + * __spi_optimize_message() and __spi_unoptimize_message().
> + */
> + void *opt_state;
>  
>   /* List of spi_res resources when the SPI message is processed */
>   struct list_head        resources;
> @@ -1255,6 +1272,9 @@ static inline void spi_message_free(struct spi_message *m)
>   kfree(m);
>  }
>  
> +extern int spi_optimize_message(struct spi_device *spi, struct spi_message *msg);
> +extern void spi_unoptimize_message(struct spi_message *msg);
> +
>  extern int spi_setup(struct spi_device *spi);
>  extern int spi_async(struct spi_device *spi, struct spi_message *message);
>  extern int spi_slave_abort(struct spi_device *spi);
>


2024-02-20 11:16:21

by Nuno Sá

Subject: Re: [PATCH v2 2/5] spi: move splitting transfers to spi_optimize_message()

On Mon, 2024-02-19 at 16:33 -0600, David Lechner wrote:
> Splitting transfers is an expensive operation so we can potentially
> optimize it by doing it only once per optimization of the message
> instead of repeating each time the message is transferred.
>
> The transfer splitting functions are currently the only user of
> spi_res_alloc() so spi_res_release() can be safely moved at this time
> from spi_finalize_current_message() to spi_unoptimize_message().
>
> The doc comments of the public functions for splitting transfers are
> also updated so that callers will know when it is safe to call them
> to ensure proper resource management.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: David Lechner <[email protected]>
> ---

Acked-by: Nuno Sa <[email protected]>

>
> v2 changes:
> - Changed line break for multiline if condition
> - Removed kernel doc inclusion (/** -> /*) from static members
> - Picked up Jonathan's Reviewed-by
>
>  drivers/spi/spi.c | 110 +++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 68 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index f68d92b57543..ba4d3fde2054 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -1747,38 +1747,6 @@ static int __spi_pump_transfer_message(struct spi_controller *ctlr,
>  
>   trace_spi_message_start(msg);
>  
> - /*
> - * If an SPI controller does not support toggling the CS line on each
> - * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
> - * for the CS line, we can emulate the CS-per-word hardware function by
> - * splitting transfers into one-word transfers and ensuring that
> - * cs_change is set for each transfer.
> - */
> - if ((msg->spi->mode & SPI_CS_WORD) && (!(ctlr->mode_bits & SPI_CS_WORD) ||
> -        spi_is_csgpiod(msg->spi))) {
> - ret = spi_split_transfers_maxwords(ctlr, msg, 1);
> - if (ret) {
> - msg->status = ret;
> - spi_finalize_current_message(ctlr);
> - return ret;
> - }
> -
> - list_for_each_entry(xfer, &msg->transfers, transfer_list) {
> - /* Don't change cs_change on the last entry in the list */
> - if (list_is_last(&xfer->transfer_list, &msg->transfers))
> - break;
> - xfer->cs_change = 1;
> - }
> - } else {
> - ret = spi_split_transfers_maxsize(ctlr, msg,
> -   spi_max_transfer_size(msg->spi));
> - if (ret) {
> - msg->status = ret;
> - spi_finalize_current_message(ctlr);
> - return ret;
> - }
> - }
> -
>   if (ctlr->prepare_message) {
>   ret = ctlr->prepare_message(ctlr, msg);
>   if (ret) {
> @@ -2124,6 +2092,8 @@ static void __spi_unoptimize_message(struct spi_message *msg)
>   if (ctlr->unoptimize_message)
>   ctlr->unoptimize_message(msg);
>  
> + spi_res_release(ctlr, msg);
> +
>   msg->optimized = false;
>   msg->opt_state = NULL;
>  }
> @@ -2169,15 +2139,6 @@ void spi_finalize_current_message(struct spi_controller *ctlr)
>  
>   spi_unmap_msg(ctlr, mesg);
>  
> - /*
> - * In the prepare_messages callback the SPI bus has the opportunity
> - * to split a transfer to smaller chunks.
> - *
> - * Release the split transfers here since spi_map_msg() is done on
> - * the split transfers.
> - */
> - spi_res_release(ctlr, mesg);
> -
>   if (mesg->prepared && ctlr->unprepare_message) {
>   ret = ctlr->unprepare_message(ctlr, mesg);
>   if (ret) {
> @@ -3819,6 +3780,10 @@ static int __spi_split_transfer_maxsize(struct spi_controller *ctlr,
>   * @msg:   the @spi_message to transform
>   * @maxsize:  the maximum when to apply this
>   *
> + * This function allocates resources that are automatically freed during the
> + * spi message unoptimize phase so this function should only be called from
> + * optimize_message callbacks.
> + *
>   * Return: status of transformation
>   */
>  int spi_split_transfers_maxsize(struct spi_controller *ctlr,
> @@ -3857,6 +3822,10 @@ EXPORT_SYMBOL_GPL(spi_split_transfers_maxsize);
>   * @msg:      the @spi_message to transform
>   * @maxwords: the number of words to limit each transfer to
>   *
> + * This function allocates resources that are automatically freed during the
> + * spi message unoptimize phase so this function should only be called from
> + * optimize_message callbacks.
> + *
>   * Return: status of transformation
>   */
>  int spi_split_transfers_maxwords(struct spi_controller *ctlr,
> @@ -4231,6 +4200,57 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
>   return 0;
>  }
>  
> +/*
> + * spi_split_transfers - generic handling of transfer splitting
> + * @msg: the message to split
> + *
> + * Under certain conditions, a SPI controller may not support arbitrary
> + * transfer sizes or other features required by a peripheral. This function
> + * will split the transfers in the message into smaller transfers that are
> + * supported by the controller.
> + *
> + * Controllers with special requirements not covered here can also split
> + * transfers in the optimize_message() callback.
> + *
> + * Context: can sleep
> + * Return: zero on success, else a negative error code
> + */
> +static int spi_split_transfers(struct spi_message *msg)
> +{
> + struct spi_controller *ctlr = msg->spi->controller;
> + struct spi_transfer *xfer;
> + int ret;
> +
> + /*
> + * If an SPI controller does not support toggling the CS line on each
> + * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
> + * for the CS line, we can emulate the CS-per-word hardware function by
> + * splitting transfers into one-word transfers and ensuring that
> + * cs_change is set for each transfer.
> + */
> + if ((msg->spi->mode & SPI_CS_WORD) &&
> +     (!(ctlr->mode_bits & SPI_CS_WORD) || spi_is_csgpiod(msg->spi))) {
> + ret = spi_split_transfers_maxwords(ctlr, msg, 1);
> + if (ret)
> + return ret;
> +
> + list_for_each_entry(xfer, &msg->transfers, transfer_list) {
> + /* Don't change cs_change on the last entry in the list */
> + if (list_is_last(&xfer->transfer_list, &msg->transfers))
> + break;
> +
> + xfer->cs_change = 1;
> + }
> + } else {
> + ret = spi_split_transfers_maxsize(ctlr, msg,
> +   spi_max_transfer_size(msg->spi));
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
>  /*
>   * __spi_optimize_message - shared implementation for spi_optimize_message()
>   *                          and spi_maybe_optimize_message()
> @@ -4254,10 +4274,16 @@ static int __spi_optimize_message(struct spi_device *spi,
>   if (ret)
>   return ret;
>  
> + ret = spi_split_transfers(msg);
> + if (ret)
> + return ret;
> +
>   if (ctlr->optimize_message) {
>   ret = ctlr->optimize_message(msg);
> - if (ret)
> + if (ret) {
> + spi_res_release(ctlr, msg);
>   return ret;
> + }
>   }
>  
>   msg->optimized = true;
>
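
For readers following the CS-per-word branch of spi_split_transfers() in the
patch above, here is a minimal user-space sketch of the idea, not the kernel
code. It assumes a single transfer whose length is a whole number of words.
struct sketch_xfer and sketch_split_cs_word() are hypothetical names invented
for this illustration; the real work is done by spi_split_transfers_maxwords()
on a linked list of struct spi_transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct spi_transfer; not the kernel type. */
struct sketch_xfer {
	size_t len;          /* length in bytes */
	unsigned cs_change;  /* toggle CS after this transfer */
};

/*
 * Split one transfer of total_len bytes into one-word transfers and set
 * cs_change on every entry except the last, mirroring the loop in the patch.
 * Returns the number of transfers written, or 0 if the length is not a whole
 * number of words or the result does not fit in out[].
 */
size_t sketch_split_cs_word(size_t total_len, size_t word_bytes,
			    struct sketch_xfer *out, size_t out_max)
{
	size_t n, i;

	if (word_bytes == 0 || total_len % word_bytes)
		return 0;

	n = total_len / word_bytes;
	if (n > out_max)
		return 0;

	for (i = 0; i < n; i++) {
		out[i].len = word_bytes;
		/* Don't change cs_change on the last entry in the list */
		out[i].cs_change = (i + 1 < n);
	}

	return n;
}
```

Doing this once at optimize time rather than on every pump of the message is
the saving the commit message describes.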


2024-02-24 16:36:46

by Jonathan Cameron

Subject: Re: [PATCH v2 1/5] spi: add spi_optimize_message() APIs

On Mon, 19 Feb 2024 16:33:18 -0600
David Lechner <[email protected]> wrote:

> This adds a new spi_optimize_message() function that can be used to
> optimize SPI messages that are used more than once. Peripheral drivers
> that use the same message multiple times can use this API to perform SPI
> message validation and controller-specific optimizations once and then
> reuse the message while avoiding the overhead of revalidating the
> message on each spi_(a)sync() call.
>
> Internally, the SPI core will also call this function for each message
> if the peripheral driver did not explicitly call it. This is done to so
> that controller drivers don't have to have multiple code paths for
> optimized and non-optimized messages.
>
> A hook is provided for controller drivers to perform controller-specific
> optimizations.
>
> Suggested-by: Martin Sperl <[email protected]>
> Link: https://lore.kernel.org/linux-spi/[email protected]/
> Signed-off-by: David Lechner <[email protected]>

Very nice.

Reviewed-by: Jonathan Cameron <[email protected]>
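
To make the interplay of the two flags from this patch easier to follow, here
is a minimal user-space model, assuming nothing beyond what the quoted diff
shows. All sketch_* names are hypothetical and exist only for this
illustration; the counter stands in for the cost of __spi_validate() plus the
controller's optimize_message hook.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two flags the patch adds to struct spi_message. */
struct sketch_msg {
	bool pre_optimized;  /* peripheral driver optimized it explicitly */
	bool optimized;      /* message is currently in the optimized state */
	int optimize_calls;  /* counts how often the expensive work ran */
};

/* Stands in for __spi_optimize_message(): validation plus controller hook. */
void sketch_do_optimize(struct sketch_msg *m)
{
	m->optimize_calls++;
	m->optimized = true;
}

/* Stands in for spi_optimize_message(), called once by a peripheral driver. */
void sketch_optimize(struct sketch_msg *m)
{
	sketch_do_optimize(m);
	m->pre_optimized = true;
}

/* Stands in for spi_unoptimize_message(). */
void sketch_unoptimize(struct sketch_msg *m)
{
	m->optimized = false;
	m->pre_optimized = false;
}

/* Stands in for one spi_sync() round trip through the core. */
void sketch_sync(struct sketch_msg *m)
{
	/* spi_maybe_optimize_message(): skip if the driver already did it */
	if (!m->pre_optimized)
		sketch_do_optimize(m);

	/* ... the transfer itself would happen here ... */

	/* spi_maybe_unoptimize_message(): only undo what the core did */
	if (!m->pre_optimized && m->optimized)
		m->optimized = false;
}
```

In this model a message that is never pre-optimized pays the optimize cost on
every sync, while a pre-optimized one pays it once and stays optimized until
the driver calls the unoptimize function.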

2024-02-24 16:46:25

by Jonathan Cameron

Subject: Re: [PATCH v2 3/5] spi: stm32: move splitting transfers to optimize_message

On Mon, 19 Feb 2024 16:33:20 -0600
David Lechner <[email protected]> wrote:

> Since splitting transfers was moved to spi_optimize_message() in the
> core SPI code, we now need to use the optimize_message callback in the
> STM32 SPI driver to ensure that the operation is only performed once
> when spi_optimize_message() is used by peripheral drivers explicitly.
>
> Signed-off-by: David Lechner <[email protected]>
Trivial comment inline. Otherwise LGTM
Reviewed-by: Jonathan Cameron <[email protected]>

There are changes to when this happens with respect to locking, but I think those
are all positive as the bus lock is held for less time and there
is nothing in here that needs that lock held.
> ---
>
> v2 changes: none
>
> drivers/spi/spi-stm32.c | 28 ++++++++++++++++------------
> 1 file changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c
> index c32e57bb38bd..e4e7ddb7524a 100644
> --- a/drivers/spi/spi-stm32.c
> +++ b/drivers/spi/spi-stm32.c
> @@ -1118,6 +1118,21 @@ static irqreturn_t stm32h7_spi_irq_thread(int irq, void *dev_id)
> return IRQ_HANDLED;
> }
>
> +static int stm32_spi_optimize_message(struct spi_message *msg)
> +{
> + struct spi_controller *ctrl = msg->spi->controller;
> + struct stm32_spi *spi = spi_controller_get_devdata(ctrl);
> +
> + /* On STM32H7, messages should not exceed a maximum size set
If you spin a v3, this isn't in keeping with local comment style.

/*
* On...

> + * later via the set_number_of_data function. In order to
> + * ensure that, split large messages into several messages
> + */
> + if (spi->cfg->set_number_of_data)
> + return spi_split_transfers_maxwords(ctrl, msg, spi->t_size_max);
> +
> + return 0;
> +}
> +
> /**
> * stm32_spi_prepare_msg - set up the controller to transfer a single message
> * @ctrl: controller interface
> @@ -1163,18 +1178,6 @@ static int stm32_spi_prepare_msg(struct spi_controller *ctrl,
> !!(spi_dev->mode & SPI_LSB_FIRST),
> !!(spi_dev->mode & SPI_CS_HIGH));
>
> - /* On STM32H7, messages should not exceed a maximum size setted
> - * afterward via the set_number_of_data function. In order to
> - * ensure that, split large messages into several messages
> - */
> - if (spi->cfg->set_number_of_data) {
> - int ret;
> -
> - ret = spi_split_transfers_maxwords(ctrl, msg, spi->t_size_max);
> - if (ret)
> - return ret;
> - }
> -
> spin_lock_irqsave(&spi->lock, flags);
>
> /* CPOL, CPHA and LSB FIRST bits have common register */
> @@ -2180,6 +2183,7 @@ static int stm32_spi_probe(struct platform_device *pdev)
> ctrl->max_speed_hz = spi->clk_rate / spi->cfg->baud_rate_div_min;
> ctrl->min_speed_hz = spi->clk_rate / spi->cfg->baud_rate_div_max;
> ctrl->use_gpio_descriptors = true;
> + ctrl->optimize_message = stm32_spi_optimize_message;
> ctrl->prepare_message = stm32_spi_prepare_msg;
> ctrl->transfer_one = stm32_spi_transfer_one;
> ctrl->unprepare_message = stm32_spi_unprepare_msg;
>


2024-02-24 16:51:57

by Jonathan Cameron

Subject: Re: [PATCH v2 4/5] spi: axi-spi-engine: move message compile to optimize_message

On Mon, 19 Feb 2024 16:33:21 -0600
David Lechner <[email protected]> wrote:

> In the AXI SPI Engine driver, compiling the message is an expensive
> operation. Previously, it was done per message transfer in the
> prepare_message hook. This patch moves the message compile to the
> optimize_message hook so that it is only done once per message in
> cases where the peripheral driver calls spi_optimize_message().
>
> This can be a significant performance improvement for some peripherals.
> For example, the ad7380 driver saw a 13% improvement in throughput
> when using the AXI SPI Engine driver with this patch.
>
> Since we now need two message states, one for the optimization stage
> that doesn't change for the lifetime of the message and one that is
> reset on each transfer for managing the current transfer state, the old
> msg->state is split into msg->opt_state and spi_engine->msg_state. The
> latter is included in the driver struct now since there is only one
> current message at a time that can ever use it and it is in a hot path
> so avoiding allocating a new one on each message transfer saves a few
> cpu cycles and lets us get rid of the prepare_message callback.
>
> Signed-off-by: David Lechner <[email protected]>
Whilst I'm not familiar with this driver, from a quick look at this
patch and the driver code, looks fine to me. So FWIW
Reviewed-by: Jonathan Cameron <[email protected]>
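
The split of the old msg->state into a long-lived part and a per-transfer part
can be illustrated with a small user-space sketch. This is only a model under
the assumptions stated in the commit message (one current message at a time,
program unchanged for the lifetime of the message); the sketch_* types and
functions are hypothetical and much simpler than the driver's real
struct spi_engine_program and struct spi_engine_message_state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compiled program, built once and kept for the message lifetime. */
struct sketch_program {
	unsigned int length;
	unsigned short instructions[];
};

/* Hypothetical per-transfer cursor, reset before every transfer. */
struct sketch_msg_state {
	const unsigned short *cmd_buf;
	unsigned int cmd_length;
};

/* Controller-wide data: one state slot, since only one message is current. */
struct sketch_engine {
	struct sketch_msg_state msg_state;
};

/* Optimize step: allocate and fill the program once. The caller frees it. */
struct sketch_program *sketch_compile(unsigned int n)
{
	struct sketch_program *p;
	unsigned int i;

	p = calloc(1, sizeof(*p) + n * sizeof(p->instructions[0]));
	if (!p)
		return NULL;

	for (i = 0; i < n; i++)
		p->instructions[i] = (unsigned short)i;
	p->length = n;

	return p;
}

/* Transfer step: no allocation, just rewind the embedded state. */
void sketch_start_transfer(struct sketch_engine *e,
			   const struct sketch_program *p)
{
	struct sketch_msg_state *st = &e->msg_state;

	memset(st, 0, sizeof(*st));
	st->cmd_buf = p->instructions;
	st->cmd_length = p->length;
}
```

Because the cursor lives inside the controller struct, the hot path needs
neither an allocation nor a free per transfer, which is the saving the commit
message points to.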

2024-02-24 16:57:37

by Jonathan Cameron

Subject: Re: [PATCH v2 5/5] iio: adc: ad7380: use spi_optimize_message()

On Mon, 19 Feb 2024 16:33:22 -0600
David Lechner <[email protected]> wrote:

> This modifies the ad7380 ADC driver to use spi_optimize_message() to
> optimize the SPI message for the buffered read operation. Since buffered
> reads reuse the same SPI message for each read, this can improve
> performance by reducing the overhead of setting up some parts the SPI
> message in each spi_sync() call.
>
> Signed-off-by: David Lechner <[email protected]>
Looks good to me.

As this is the driver you asked me to drop earlier this cycle,
how do we plan to merge this series?

If Mark is fine taking 1-4 with the user following along that's
fine by me, if not I guess we are in immutable tree territory for
next cycle?

Jonathan

2024-02-24 18:16:42

by Markus Elfring

Subject: Re: [PATCH v2 1/5] spi: add spi_optimize_message() APIs

> … call it. This is done to so
> that controller drivers …

I hope that such a wording will be improved for the final change description.

Regards,
Markus

2024-02-24 20:16:51

by Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] spi: add spi_optimize_message() APIs


> +++ b/drivers/spi/spi.c

> +static int __spi_optimize_message(struct spi_device *spi,
> + struct spi_message *msg)


I propose to reconsider the usage of leading underscores in such identifiers.

See also:
https://wiki.sei.cmu.edu/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier

Regards,
Markus

2024-02-26 13:49:08

by Mark Brown

Subject: Re: [PATCH v2 1/5] spi: add spi_optimize_message() APIs

On Mon, Feb 19, 2024 at 04:33:18PM -0600, David Lechner wrote:
> This adds a new spi_optimize_message() function that can be used to
> optimize SPI messages that are used more than once. Peripheral drivers
> that use the same message multiple times can use this API to perform SPI
> message validation and controller-specific optimizations once and then
> reuse the message while avoiding the overhead of revalidating the
> message on each spi_(a)sync() call.

This doesn't apply against current code, please check and resend.



2024-02-26 19:26:07

by Mark Brown

Subject: Re: (subset) [PATCH v2 0/5] spi: add support for pre-cooking messages

On Mon, 19 Feb 2024 16:33:17 -0600, David Lechner wrote:
> This is a follow-up to [1] where it was suggested to break down the
> proposed SPI offload support into smaller series.
>
> This takes on the first suggested task of introducing an API to
> "pre-cook" SPI messages. This idea was first discussed extensively in
> 2013 [2][3] and revisited more briefly 2022 [4].
>
> [...]

Applied to

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/5] spi: add spi_optimize_message() APIs
commit: 7b1d87af14d9ae902ed0c5dc5fabf4eea5abdf02
[2/5] spi: move splitting transfers to spi_optimize_message()
commit: fab53fea21a909e4e0656764a8ee7c356fe89d6f
[3/5] spi: stm32: move splitting transfers to optimize_message
commit: c2bcfe7c6edf418d5adf731a7d60a8abd81e952f
[4/5] spi: axi-spi-engine: move message compile to optimize_message
commit: 7dba2adb063bcf7a293eacb88980e0975b1fb1fd

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


2024-02-27 16:37:09

by David Lechner

Subject: Re: [PATCH v2 5/5] iio: adc: ad7380: use spi_optimize_message()

On Sat, Feb 24, 2024 at 10:57 AM Jonathan Cameron <[email protected]> wrote:
>
> On Mon, 19 Feb 2024 16:33:22 -0600
> David Lechner <[email protected]> wrote:
>
> > This modifies the ad7380 ADC driver to use spi_optimize_message() to
> > optimize the SPI message for the buffered read operation. Since buffered
> > reads reuse the same SPI message for each read, this can improve
> > performance by reducing the overhead of setting up some parts the SPI
> > message in each spi_sync() call.
> >
> > Signed-off-by: David Lechner <[email protected]>
> Looks good to me.
>
> As this is the driver you asked me to drop earlier this cycle,
> how do we plan to merge this series?
>
> If Mark is fine taking 1-4 with the user following along that's
> fine by me, if not I guess we are in immutable tree territory for
> next cycle?

I've been out sick for a week so trying to get back up to speed here.
It looks like Mark has picked up the spi changes, so that part is
resolved. I'll work on getting the ad7380 driver resubmitted, then we
can come back to this patch after 6.9-rc1 (assuming the SPI changes
make it into 6.9, of course).
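
As a rough user-space illustration of why reusing one message helps in
the buffered-read case quoted above, the sketch below models "validate
once, send many times". It is not kernel code and does not call the SPI
core: every toy_* name, the struct, and the validation counter are
hypothetical stand-ins invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reusable message; NOT struct spi_message. */
struct toy_message {
	const unsigned char *tx;
	size_t len;
	bool optimized;	/* true once validation has been done up front */
};

/* Counts how many times the (toy) validation step has run. */
static unsigned int validate_calls;

/* Toy validation: a message needs a buffer and a non-zero length. */
static int toy_validate(const struct toy_message *msg)
{
	validate_calls++;
	return (msg->tx && msg->len > 0) ? 0 : -1;
}

/* Validate once and remember the result on the message itself. */
static int toy_optimize_message(struct toy_message *msg)
{
	int ret = toy_validate(msg);

	if (ret)
		return ret;
	msg->optimized = true;
	return 0;
}

/* Drop the cached state so the message may be modified again. */
static void toy_unoptimize_message(struct toy_message *msg)
{
	msg->optimized = false;
}

/* Send path: only validates when the message was not prepared earlier. */
static int toy_sync(struct toy_message *msg)
{
	if (!msg->optimized) {
		int ret = toy_validate(msg);

		if (ret)
			return ret;
	}
	/* The actual transfer would happen here. */
	return 0;
}

/* Send the same message n times; return how many validations ran. */
static unsigned int toy_run(unsigned int n, bool pre_optimize)
{
	static const unsigned char buf[2] = { 0x12, 0x34 };
	struct toy_message msg = { .tx = buf, .len = sizeof(buf) };
	unsigned int i;

	validate_calls = 0;
	if (pre_optimize && toy_optimize_message(&msg))
		return 0;
	for (i = 0; i < n; i++)
		if (toy_sync(&msg))
			break;
	if (pre_optimize)
		toy_unoptimize_message(&msg);
	return validate_calls;
}
```

Under these assumptions, toy_run(1000, false) runs one validation pass
per send, while toy_run(1000, true) runs a single pass up front. How
much a real driver saves depends on what the SPI core and the controller
driver actually do during optimization.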