2023-05-15 14:31:28

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH 1/3][For 4.19/4.14] spi: spi-fsl-spi: automatically adapt bits-per-word in cpu mode

From: Rasmus Villemoes <[email protected]>

(cherry picked from upstream af0e6242909c3c4297392ca3e94eff1b4db71a97)

Taking one interrupt for every byte is rather slow. Since the
controller is perfectly capable of transmitting 32 bits at a time,
change t->bits_per-word to 32 when the length is divisible by 4 and
large enough that the reduced number of interrupts easily compensates
for the one or two extra fsl_spi_setup_transfer() calls this causes.

Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
---
drivers/spi/spi-fsl-spi.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 479d10dc6cb8..946b417f2d1c 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -357,12 +357,28 @@ static int fsl_spi_bufs(struct spi_device *spi, struct spi_transfer *t,
static int fsl_spi_do_one_msg(struct spi_master *master,
struct spi_message *m)
{
+ struct mpc8xxx_spi *mpc8xxx_spi = spi_master_get_devdata(master);
struct spi_device *spi = m->spi;
struct spi_transfer *t, *first;
unsigned int cs_change;
const int nsecs = 50;
int status;

+ /*
+ * In CPU mode, optimize large byte transfers to use larger
+ * bits_per_word values to reduce number of interrupts taken.
+ */
+ if (!(mpc8xxx_spi->flags & SPI_CPM_MODE)) {
+ list_for_each_entry(t, &m->transfers, transfer_list) {
+ if (t->len < 256 || t->bits_per_word != 8)
+ continue;
+ if ((t->len & 3) == 0)
+ t->bits_per_word = 32;
+ else if ((t->len & 1) == 0)
+ t->bits_per_word = 16;
+ }
+ }
+
/* Don't allow changes if CS is active */
first = list_first_entry(&m->transfers, struct spi_transfer,
transfer_list);
--
2.40.1



2023-05-15 14:32:00

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH 3/3][For 4.19/4.14] spi: fsl-cpm: Use 16 bit mode for large transfers with even size

(cherry picked from upstream fc96ec826bced75cc6b9c07a4ac44bbf651337ab)

On CPM, the RISC core is a lot more efficiant when doing transfers
in 16-bits chunks than in 8-bits chunks, but unfortunately the
words need to be byte swapped as seen in a previous commit.

So, for large tranfers with an even size, allocate a temporary tx
buffer and byte-swap data before and after transfer.

This change allows setting higher speed for transfer. For instance
on an MPC 8xx (CPM1 comms RISC processor), the documentation tells
that transfer in byte mode at 1 kbit/s uses 0.200% of CPM load
at 25 MHz while a word transfer at the same speed uses 0.032%
of CPM load. This means the speed can be 6 times higher in
word mode for the same CPM load.

For the time being, only do it on CPM1 as there must be a
trade-off between the CPM load reduction and the CPU load required
to byte swap the data.

Signed-off-by: Christophe Leroy <[email protected]>
Link: https://lore.kernel.org/r/f2e981f20f92dd28983c3949702a09248c23845c.1680371809.git.christophe.leroy@csgroup.eu
Signed-off-by: Mark Brown <[email protected]>
---
drivers/spi/spi-fsl-cpm.c | 23 +++++++++++++++++++++++
drivers/spi/spi-fsl-spi.c | 3 +++
2 files changed, 26 insertions(+)

diff --git a/drivers/spi/spi-fsl-cpm.c b/drivers/spi/spi-fsl-cpm.c
index 8f7b26ec181e..0485593dc2f5 100644
--- a/drivers/spi/spi-fsl-cpm.c
+++ b/drivers/spi/spi-fsl-cpm.c
@@ -25,6 +25,7 @@
#include <linux/spi/spi.h>
#include <linux/types.h>
#include <linux/platform_device.h>
+#include <linux/byteorder/generic.h>

#include "spi-fsl-cpm.h"
#include "spi-fsl-lib.h"
@@ -124,6 +125,21 @@ int fsl_spi_cpm_bufs(struct mpc8xxx_spi *mspi,
mspi->rx_dma = mspi->dma_dummy_rx;
mspi->map_rx_dma = 0;
}
+ if (t->bits_per_word == 16 && t->tx_buf) {
+ const u16 *src = t->tx_buf;
+ u16 *dst;
+ int i;
+
+ dst = kmalloc(t->len, GFP_KERNEL);
+ if (!dst)
+ return -ENOMEM;
+
+ for (i = 0; i < t->len >> 1; i++)
+ dst[i] = cpu_to_le16p(src + i);
+
+ mspi->tx = dst;
+ mspi->map_tx_dma = 1;
+ }

if (mspi->map_tx_dma) {
void *nonconst_tx = (void *)mspi->tx; /* shut up gcc */
@@ -177,6 +193,13 @@ void fsl_spi_cpm_bufs_complete(struct mpc8xxx_spi *mspi)
if (mspi->map_rx_dma)
dma_unmap_single(dev, mspi->rx_dma, t->len, DMA_FROM_DEVICE);
mspi->xfer_in_progress = NULL;
+
+ if (t->bits_per_word == 16 && t->rx_buf) {
+ int i;
+
+ for (i = 0; i < t->len; i += 2)
+ le16_to_cpus(t->rx_buf + i);
+ }
}
EXPORT_SYMBOL_GPL(fsl_spi_cpm_bufs_complete);

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index e08a11070c5c..5e49fed487f8 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -366,6 +366,9 @@ static int fsl_spi_do_one_msg(struct spi_master *master,
return -EINVAL;
if (t->bits_per_word == 16 || t->bits_per_word == 32)
t->bits_per_word = 8; /* pretend its 8 bits */
+ if (t->bits_per_word == 8 && t->len >= 256 &&
+ (mpc8xxx_spi->flags & SPI_CPM1))
+ t->bits_per_word = 16;
}
}

--
2.40.1