2017-02-02 04:47:10

by Anup Patel

[permalink] [raw]
Subject: [PATCH 0/6] Broadcom SBA RAID support

The Broadcom SBA RAID is a stream-based device which provides
RAID5/6 offload.

It requires a SoC specific ring manager (such as Broadcom FlexRM
ring manager) to provide ring-based programming interface. Due to
this, the Broadcom SBA RAID driver (mailbox client) implements
DMA device having one DMA channel using a set of mailbox channels
provided by Broadcom SoC specific ring manager driver (mailbox
controller).

Important limitations of Broadcom SBA RAID hardware are:
1. Requires disk position instead of disk coefficient
2. Supports only 30 PQ disk coefficients

To address limitation #1, we have added raid_gflog table which
will help driver convert disk coefficient to disk position. To
address limitation #2, we have extended Linux Async Tx APIs to
check for available PQ coefficients before doing PQ offload.

This patchset is based on Linux-4.10-rc6 and depends on patchset
"[PATCH v4 0/2] Broadcom FlexRM ring manager support"

It is also available at sba-raid-v1 branch of
https://github.com/Broadcom/arm64-linux.git

Anup Patel (6):
mailbox: Add new API mbox_channel_device() for clients
lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
async_tx: Handle DMA devices having support for fewer PQ coefficients
async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
dmaengine: Add Broadcom SBA RAID driver
dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

.../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 +
crypto/async_tx/async_pq.c | 8 +-
crypto/async_tx/async_raid6_recov.c | 12 +-
drivers/dma/Kconfig | 13 +
drivers/dma/Makefile | 1 +
drivers/dma/bcm-sba-raid.c | 1309 ++++++++++++++++++++
drivers/mailbox/mailbox.c | 21 +
include/linux/dmaengine.h | 19 +
include/linux/mailbox_client.h | 1 +
include/linux/raid/pq.h | 4 +
lib/raid6/mktables.c | 20 +
11 files changed, 1432 insertions(+), 5 deletions(-)
create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
create mode 100644 drivers/dma/bcm-sba-raid.c

--
2.7.4


2017-02-02 04:47:11

by Anup Patel

[permalink] [raw]
Subject: [PATCH 1/6] mailbox: Add new API mbox_channel_device() for clients

The remote processor can have DMAENGINE capabilities and client
can pass data to be processed via main memory. In such cases,
the client will require DMAble memory for remote processor.

This patch adds new API mbox_channel_device() which can be
used by clients to get struct device pointer of underlying
mailbox controller. This struct device pointer of mailbox
controller can be used by clients to allocate DMAble memory
for remote processor.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
Reviewed-by: Ray Jui <[email protected]>
---
drivers/mailbox/mailbox.c | 21 +++++++++++++++++++++
include/linux/mailbox_client.h | 1 +
2 files changed, 22 insertions(+)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4671f8a..d4380fc 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -281,6 +281,27 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
EXPORT_SYMBOL_GPL(mbox_send_message);

/**
+ * mbox_channel_device - Get device pointer of a mailbox channel.
+ * @chan: Mailbox channel assigned to this client.
+ *
+ * The remote processor can have DMAENGINE capabilities and client
+ * can pass data to be processed via main memory. In such cases,
+ * the client will require struct device pointer of the mailbox
+ * channel to map/unmap/allocate/free DMAble memory.
+ *
+ * Return: Pointer to the struct device of mailbox channel.
+ * ERR_PTR on failure.
+ */
+struct device *mbox_channel_device(struct mbox_chan *chan)
+{
+ if (!chan || !chan->cl)
+ return ERR_PTR(-EINVAL);
+
+ return chan->mbox->dev;
+}
+EXPORT_SYMBOL_GPL(mbox_channel_device);
+
+/**
* mbox_request_channel - Request a mailbox channel.
* @cl: Identity of the client requesting the channel.
* @index: Index of mailbox specifier in 'mboxes' property.
diff --git a/include/linux/mailbox_client.h b/include/linux/mailbox_client.h
index 4434871..3daffad 100644
--- a/include/linux/mailbox_client.h
+++ b/include/linux/mailbox_client.h
@@ -40,6 +40,7 @@ struct mbox_client {
void (*tx_done)(struct mbox_client *cl, void *mssg, int r);
};

+struct device *mbox_channel_device(struct mbox_chan *chan);
struct mbox_chan *mbox_request_channel_byname(struct mbox_client *cl,
const char *name);
struct mbox_chan *mbox_request_channel(struct mbox_client *cl, int index);
--
2.7.4

2017-02-02 04:48:16

by Anup Patel

[permalink] [raw]
Subject: [PATCH 2/6] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position

The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework pass values from raid6_gfexp as coefficients
for each source to prep_dma_pq() callback of DMA channel with PQ
capability. This creates problem for RAID6 offload engines (such as
Broadcom SBA) which take disk position (i.e. log of {2}) instead of
multiplicative cofficients from raid6_gfexp table.

This patch adds raid6_gflog table having log-of-2 value for any given
x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. The RAID6
offload engine driver can use this newly added raid6_gflog table to
get disk position from multiplicative coefficient.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
Reviewed-by: Ray Jui <[email protected]>
---
include/linux/raid/pq.h | 1 +
lib/raid6/mktables.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
extern const u8 raid6_gfexp[256] __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256] __attribute__((aligned(256)));
extern const u8 raid6_gfinv[256] __attribute__((aligned(256)));
extern const u8 raid6_gfexi[256] __attribute__((aligned(256)));

diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
printf("EXPORT_SYMBOL(raid6_gfexp);\n");
printf("#endif\n");

+ /* Compute log-of-2 table */
+ printf("\nconst u8 __attribute__((aligned(256)))\n"
+ "raid6_gflog[256] =\n" "{\n");
+ for (i = 0; i < 256; i += 8) {
+ printf("\t");
+ for (j = 0; j < 8; j++) {
+ v = 255;
+ for (k = 0; k < 256; k++)
+ if (exptbl[k] == (i + j)) {
+ v = k;
+ break;
+ }
+ printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+ }
+ }
+ printf("};\n");
+ printf("#ifdef __KERNEL__\n");
+ printf("EXPORT_SYMBOL(raid6_gflog);\n");
+ printf("#endif\n");
+
/* Compute inverse table x^-1 == x^254 */
printf("\nconst u8 __attribute__((aligned(256)))\n"
"raid6_gfinv[256] =\n" "{\n");
--
2.7.4

2017-02-02 04:48:24

by Anup Patel

[permalink] [raw]
Subject: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

The DMAENGINE framework assumes that if PQ offload is supported by a
DMA device then all 256 PQ coefficients are supported. This assumption
does not hold anymore because we now have BCM-SBA-RAID offload engine
which supports PQ offload with limited number of PQ coefficients.

This patch extends async_tx APIs to handle DMA devices with support
for fewer PQ coefficients.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
---
crypto/async_tx/async_pq.c | 3 +++
crypto/async_tx/async_raid6_recov.c | 12 ++++++++++--
include/linux/dmaengine.h | 19 +++++++++++++++++++
include/linux/raid/pq.h | 3 +++
4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..16c6526 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -187,6 +187,9 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,

BUG_ON(disks > 255 || !(P(blocks, disks) || Q(blocks, disks)));

+ if (device && dma_maxpqcoef(device) < src_cnt)
+ device = NULL;
+
if (device)
unmap = dmaengine_get_unmap_data(device->dev, disks, GFP_NOWAIT);

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 8fab627..2916f95 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -352,6 +352,7 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
{
void *scribble = submit->scribble;
int non_zero_srcs, i;
+ struct dma_chan *chan = async_dma_find_channel(DMA_PQ);

BUG_ON(faila == failb);
if (failb < faila)
@@ -359,12 +360,15 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,

pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);

+ if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF)
+ chan = NULL;
+
/* if a dma resource is not available or a scribble buffer is not
* available punt to the synchronous path. In the 'dma not
* available' case be sure to use the scribble buffer to
* preserve the content of 'blocks' as the caller intended.
*/
- if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+ if (!chan || !scribble) {
void **ptrs = scribble ? scribble : (void **) blocks;

async_tx_quiesce(&submit->depend_tx);
@@ -432,15 +436,19 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
void *scribble = submit->scribble;
int good_srcs, good, i;
struct page *srcs[2];
+ struct dma_chan *chan = async_dma_find_channel(DMA_PQ);

pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);

+ if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF)
+ chan = NULL;
+
/* if a dma resource is not available or a scribble buffer is not
* available punt to the synchronous path. In the 'dma not
* available' case be sure to use the scribble buffer to
* preserve the content of 'blocks' as the caller intended.
*/
- if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+ if (!chan || !scribble) {
void **ptrs = scribble ? scribble : (void **) blocks;

async_tx_quiesce(&submit->depend_tx);
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index feee6ec..d938a8b 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -24,6 +24,7 @@
#include <linux/scatterlist.h>
#include <linux/bitmap.h>
#include <linux/types.h>
+#include <linux/raid/pq.h>
#include <asm/page.h>

/**
@@ -668,6 +669,7 @@ struct dma_filter {
* @cap_mask: one or more dma_capability flags
* @max_xor: maximum number of xor sources, 0 if no capability
* @max_pq: maximum number of PQ sources and PQ-continue capability
+ * @max_pqcoef: maximum number of PQ coefficients, 0 if all supported
* @copy_align: alignment shift for memcpy operations
* @xor_align: alignment shift for xor operations
* @pq_align: alignment shift for pq operations
@@ -727,11 +729,13 @@ struct dma_device {
dma_cap_mask_t cap_mask;
unsigned short max_xor;
unsigned short max_pq;
+ unsigned short max_pqcoef;
enum dmaengine_alignment copy_align;
enum dmaengine_alignment xor_align;
enum dmaengine_alignment pq_align;
enum dmaengine_alignment fill_align;
#define DMA_HAS_PQ_CONTINUE (1 << 15)
+ #define DMA_HAS_FEWER_PQ_COEF (1 << 15)

int dev_id;
struct device *dev;
@@ -1122,6 +1126,21 @@ static inline int dma_maxpq(struct dma_device *dma, enum dma_ctrl_flags flags)
BUG();
}

+static inline void dma_set_maxpqcoef(struct dma_device *dma,
+ unsigned short max_pqcoef)
+{
+ if (max_pqcoef < RAID6_PQ_MAX_COEF) {
+ dma->max_pqcoef = max_pqcoef;
+ dma->max_pqcoef |= DMA_HAS_FEWER_PQ_COEF;
+ }
+}
+
+static inline unsigned short dma_maxpqcoef(struct dma_device *dma)
+{
+ return (dma->max_pqcoef & DMA_HAS_FEWER_PQ_COEF) ?
+ (dma->max_pqcoef & ~DMA_HAS_FEWER_PQ_COEF) : RAID6_PQ_MAX_COEF;
+}
+
static inline size_t dmaengine_get_icg(bool inc, bool sgl, size_t icg,
size_t dir_icg)
{
diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 30f9453..f3a04bb 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -15,6 +15,9 @@

#ifdef __KERNEL__

+/* Max number of PQ coefficients */
+#define RAID6_PQ_MAX_COEF 256
+
/* Set to 1 to use kernel-wide empty_zero_page */
#define RAID6_USE_EMPTY_ZERO_PAGE 0
#include <linux/blkdev.h>
--
2.7.4

2017-02-02 04:47:14

by Anup Patel

[permalink] [raw]
Subject: [PATCH 4/6] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()

The DMA_PREP_FENCE is to be used when preparing Tx descriptor if output
of Tx descriptor is to be used by next/dependent Tx descriptor.

The DMA_PREP_FENSE will not be set correctly in do_async_gen_syndrome()
when calling dma->device_prep_dma_pq() under following conditions:
1. ASYNC_TX_FENCE not set in submit->flags
2. DMA_PREP_FENCE not set in dma_flags
3. src_cnt (= (disks - 2)) is greater than dma_maxpq(dma, dma_flags)

This patch fixes DMA_PREP_FENCE usage in do_async_gen_syndrome() taking
inspiration from do_async_xor() implementation.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Ray Jui <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
---
crypto/async_tx/async_pq.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 16c6526..947cf35 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
dma_addr_t dma_dest[2];
int src_off = 0;

- if (submit->flags & ASYNC_TX_FENCE)
- dma_flags |= DMA_PREP_FENCE;
-
while (src_cnt > 0) {
submit->flags = flags_orig;
pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
if (cb_fn_orig)
dma_flags |= DMA_PREP_INTERRUPT;
}
+ if (submit->flags & ASYNC_TX_FENCE)
+ dma_flags |= DMA_PREP_FENCE;

/* Drivers force forward progress in case they can not provide
* a descriptor
--
2.7.4

2017-02-02 04:48:41

by Anup Patel

[permalink] [raw]
Subject: [PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver

The Broadcom stream buffer accelerator (SBA) provides offloading
capabilities for RAID operations. This SBA offload engine is
accessible via Broadcom SoC specific ring manager.

This patch adds Broadcom SBA RAID driver which provides one
DMA device with RAID capabilities using one or more Broadcom
SoC specific ring manager channels. The SBA RAID driver in its
current shape implements memcpy, xor, and pq operations.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Ray Jui <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
---
drivers/dma/Kconfig | 13 +
drivers/dma/Makefile | 1 +
drivers/dma/bcm-sba-raid.c | 1309 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 1323 insertions(+)
create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..58d0463 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,19 @@ config AXI_DMAC
controller is often used in Analog Device's reference designs for FPGA
platforms.

+config BCM_SBA_RAID
+ tristate "Broadcom SBA RAID engine support"
+ depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+ select DMA_ENGINE
+ select DMA_ENGINE_RAID
+ select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+ default ARCH_BCM_IPROC
+ help
+ Enable support for Broadcom SBA RAID Engine. The SBA RAID
+ engine is available on most of the Broadcom iProc SoCs. It
+ has the capability to offload memcpy, xor and pq computation
+ for raid5/6.
+
config COH901318
bool "ST-Ericsson COH901318 DMA support"
select DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 0000000..bf39f3f
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1309 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via Broadcom SoC specific ring manager. Two or more offload engines
+ * can share same Broadcom SoC specific ring manager due to this Broadcom
+ * SoC specific ring manager driver is implemented as a mailbox controller
+ * driver and offload engine drivers are implemented as mallbox clients.
+ *
+ * Typically, Broadcom SoC specific ring manager will implement larger
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of SBA hardware device is limited
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small size requests and executed parallely on multiple SBA
+ * hardware devices for achieving high through-put.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except submitting request to SBA hardware device via mailbox channels.
+ * This driver implements a DMA device with one DMA channel using a set
+ * of mailbox channels provided by Broadcom SoC specific ring manager
+ * driver. To exploit parallelism (as described above), all DMA request
+ * coming to SBA RAID DMA channel are broken down to smaller requests
+ * and submitted to multiple mailbox channels in round-robin fashion.
+ * For having more SBA DMA channels, we can create more SBA device nodes
+ * in Broadcom SoC specific DTS based on number of hardware rings supported
+ * by Broadcom SoC ring manager.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
+#include <linux/list.h>
+#include <linux/mailbox_client.h>
+#include <linux/mailbox/brcm-message.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/slab.h>
+#include <linux/raid/pq.h>
+
+#include "dmaengine.h"
+
+/* SBA command helper macros */
+#define SBA_DEC(_d, _s, _m) (((_d) >> (_s)) & (_m))
+#define SBA_ENC(_d, _v, _s, _m) \
+ do { \
+ (_d) &= ~((u64)(_m) << (_s)); \
+ (_d) |= (((u64)(_v) & (_m)) << (_s)); \
+ } while (0)
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT 48
+#define SBA_TYPE_MASK 0x3
+#define SBA_TYPE_A 0x0
+#define SBA_TYPE_B 0x2
+#define SBA_TYPE_C 0x3
+#define SBA_USER_DEF_SHIFT 32
+#define SBA_USER_DEF_MASK 0xffff
+#define SBA_R_MDATA_SHIFT 24
+#define SBA_R_MDATA_MASK 0xff
+#define SBA_C_MDATA_MS_SHIFT 18
+#define SBA_C_MDATA_MS_MASK 0x3
+#define SBA_INT_SHIFT 17
+#define SBA_INT_MASK 0x1
+#define SBA_RESP_SHIFT 16
+#define SBA_RESP_MASK 0x1
+#define SBA_C_MDATA_SHIFT 8
+#define SBA_C_MDATA_MASK 0xff
+#define SBA_CMD_SHIFT 0
+#define SBA_CMD_MASK 0xf
+#define SBA_CMD_ZERO_ALL_BUFFERS 0x8
+#define SBA_CMD_LOAD_BUFFER 0x9
+#define SBA_CMD_XOR 0xa
+#define SBA_CMD_GALOIS_XOR 0xb
+#define SBA_CMD_ZERO_BUFFER 0x4
+#define SBA_CMD_WRITE_BUFFER 0xc
+
+/* SBA C_MDATA helper macros */
+#define SBA_C_MDATA_LOAD_VAL(__bnum0) ((__bnum0) & 0x3)
+#define SBA_C_MDATA_WRITE_VAL(__bnum0) ((__bnum0) & 0x3)
+#define SBA_C_MDATA_XOR_VAL(__bnum1, __bnum0) \
+ ({ u32 __v = ((__bnum0) & 0x3); \
+ __v |= ((__bnum1) & 0x3) << 2; \
+ __v; \
+ })
+#define SBA_C_MDATA_PQ_VAL(__dnum, __bnum1, __bnum0) \
+ ({ u32 __v = ((__bnum0) & 0x3); \
+ __v |= ((__bnum1) & 0x3) << 2; \
+ __v |= ((__dnum) & 0x1f) << 5; \
+ __v; \
+ })
+#define SBA_C_MDATA_LS(__c_mdata_val) ((__c_mdata_val) & 0xff)
+#define SBA_C_MDATA_MS(__c_mdata_val) (((__c_mdata_val) >> 8) & 0x3)
+
+/* Driver helper macros */
+#define to_sba_request(tx) \
+ container_of(tx, struct sba_request, tx)
+#define to_sba_device(dchan) \
+ container_of(dchan, struct sba_device, dma_chan)
+
+enum sba_request_state {
+ SBA_REQUEST_STATE_FREE = 1,
+ SBA_REQUEST_STATE_ALLOCED = 2,
+ SBA_REQUEST_STATE_PENDING = 3,
+ SBA_REQUEST_STATE_ACTIVE = 4,
+ SBA_REQUEST_STATE_COMPLETED = 5,
+ SBA_REQUEST_STATE_ABORTED = 6,
+};
+
+struct sba_request {
+ struct list_head node;
+ struct sba_device *sba;
+ enum sba_request_state state;
+ bool fence;
+ void *resp;
+ dma_addr_t resp_dma;
+ struct brcm_sba_command *cmds;
+ struct brcm_message *msgs;
+ struct brcm_message bmsg;
+ atomic_t msgs_pending_count;
+ struct dma_async_tx_descriptor tx;
+};
+
+enum sba_version {
+ SBA_VER_1 = 0,
+ SBA_VER_2
+};
+
+struct sba_device {
+ /* Underlying device */
+ struct device *dev;
+ /* DT configuration parameters */
+ enum sba_version ver;
+ u32 max_req;
+ u32 req_size;
+ /* Derived configuration parameters */
+ u32 hw_buf_size;
+ u32 hw_resp_size;
+ u32 max_pq_coefs;
+ u32 max_pq_srcs;
+ u32 max_msg_per_req;
+ u32 max_cmd_per_msg;
+ u32 max_cmd_per_req;
+ u32 max_xor_srcs;
+ u32 max_resp_pool_size;
+ u32 max_cmds_pool_size;
+ /* Maibox client and Mailbox channels */
+ struct mbox_client client;
+ int mchans_count;
+ atomic_t mchans_current;
+ struct mbox_chan **mchans;
+ struct device *mbox_dev;
+ /* DMA device and DMA channel */
+ struct dma_device dma_dev;
+ struct dma_chan dma_chan;
+ /* DMA channel resources */
+ void *resp_base;
+ dma_addr_t resp_dma_base;
+ void *cmds_base;
+ dma_addr_t cmds_dma_base;
+ spinlock_t reqs_lock;
+ struct sba_request *reqs;
+ bool reqs_fence;
+ struct list_head reqs_alloc_list;
+ struct list_head reqs_pending_list;
+ struct list_head reqs_active_list;
+ struct list_head reqs_completed_list;
+ struct list_head reqs_aborted_list;
+ struct list_head reqs_free_list;
+ int reqs_free_count;
+};
+
+/* ====== Channel resource management routines ===== */
+
+static struct sba_request *sba_alloc_request(struct sba_device *sba)
+{
+ unsigned long flags;
+ struct sba_request *req = NULL;
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+
+ if (!list_empty(&sba->reqs_free_list)) {
+ req = list_first_entry(&sba->reqs_free_list,
+ struct sba_request,
+ node);
+
+ req->state = SBA_REQUEST_STATE_ALLOCED;
+ req->fence = false;
+ atomic_set(&req->msgs_pending_count, 0);
+ list_move_tail(&req->node, &sba->reqs_alloc_list);
+ sba->reqs_free_count--;
+
+ dma_async_tx_descriptor_init(&req->tx, &sba->dma_chan);
+ }
+
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+ return req;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static void _sba_pending_request(struct sba_device *sba,
+ struct sba_request *req)
+{
+ req->state = SBA_REQUEST_STATE_PENDING;
+ list_move_tail(&req->node, &sba->reqs_pending_list);
+ if (list_empty(&sba->reqs_active_list))
+ sba->reqs_fence = false;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static bool _sba_active_request(struct sba_device *sba,
+ struct sba_request *req)
+{
+ if (list_empty(&sba->reqs_active_list))
+ sba->reqs_fence = false;
+ if (sba->reqs_fence)
+ return false;
+ req->state = SBA_REQUEST_STATE_ACTIVE;
+ list_move_tail(&req->node, &sba->reqs_active_list);
+ if (req->fence)
+ sba->reqs_fence = true;
+ return true;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static void _sba_abort_request(struct sba_device *sba,
+ struct sba_request *req)
+{
+ req->state = SBA_REQUEST_STATE_ABORTED;
+ list_move_tail(&req->node, &sba->reqs_aborted_list);
+ if (list_empty(&sba->reqs_active_list))
+ sba->reqs_fence = false;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static void _sba_free_request(struct sba_device *sba,
+ struct sba_request *req)
+{
+ req->state = SBA_REQUEST_STATE_FREE;
+ list_move_tail(&req->node, &sba->reqs_free_list);
+ if (list_empty(&sba->reqs_active_list))
+ sba->reqs_fence = false;
+ sba->reqs_free_count++;
+}
+
+static void sba_complete_request(struct sba_request *req)
+{
+ unsigned long flags;
+ struct sba_device *sba = req->sba;
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+ req->state = SBA_REQUEST_STATE_COMPLETED;
+ list_move_tail(&req->node, &sba->reqs_completed_list);
+ if (list_empty(&sba->reqs_active_list))
+ sba->reqs_fence = false;
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static void sba_free_request(struct sba_request *req)
+{
+ unsigned long flags;
+ struct sba_device *sba = req->sba;
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+ _sba_free_request(sba, req);
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static int sba_free_request_count(struct sba_device *sba)
+{
+ int ret;
+ unsigned long flags;
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+ ret = sba->reqs_free_count;
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+ return ret;
+}
+
+static void sba_cleanup_inflight_requests(struct sba_device *sba)
+{
+ unsigned long flags;
+ struct sba_request *req, *req1;
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+
+ /* Freeup all alloced request */
+ list_for_each_entry_safe(req, req1, &sba->reqs_alloc_list, node) {
+ _sba_free_request(sba, req);
+ }
+
+ /* Freeup all pending request */
+ list_for_each_entry_safe(req, req1, &sba->reqs_pending_list, node) {
+ if (req->bmsg.batch.msgs_queued < req->bmsg.batch.msgs_count)
+ /* Set partially-queued request as aborted */
+ _sba_abort_request(sba, req);
+ else
+ /* Freeup rest of the pending request */
+ _sba_free_request(sba, req);
+ }
+
+ /* Freeup all completed request */
+ list_for_each_entry_safe(req, req1, &sba->reqs_completed_list, node) {
+ _sba_free_request(sba, req);
+ }
+
+ /* Set all active requests as aborted */
+ list_for_each_entry_safe(req, req1, &sba->reqs_active_list, node) {
+ _sba_abort_request(sba, req);
+ }
+
+ /*
+ * Note: We expect that aborted request will be eventually
+ * freed by sba_receive_message()
+ */
+
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+/* ====== DMAENGINE callbacks ===== */
+
+static int sba_alloc_chan_resources(struct dma_chan *dchan)
+{
+ /*
+ * We only have one channel so we have pre-alloced
+ * channel resources. Over here we just return number
+ * of free request.
+ */
+ return sba_free_request_count(to_sba_device(dchan));
+}
+
+static void sba_free_chan_resources(struct dma_chan *dchan)
+{
+ /*
+ * Channel resources are pre-alloced so we just free-up
+ * whatever we can so that we can re-use pre-alloced
+ * channel resources next time.
+ */
+ sba_cleanup_inflight_requests(to_sba_device(dchan));
+}
+
+static int sba_send_mbox_request(struct sba_device *sba,
+ struct sba_request *req)
+{
+ int mchans_idx, ret = 0;
+
+ /* Select mailbox channel in round-robin fashion */
+ mchans_idx = atomic_inc_return(&sba->mchans_current);
+ mchans_idx = mchans_idx % sba->mchans_count;
+
+ /* Send batch message for the request */
+ req->bmsg.batch.msgs_queued = 0;
+ ret = mbox_send_message(sba->mchans[mchans_idx], &req->bmsg);
+ if (ret < 0) {
+ dev_info(sba->dev, "channel %d message %d (total %d)",
+ mchans_idx, req->bmsg.batch.msgs_queued,
+ req->bmsg.batch.msgs_count);
+ dev_err(sba->dev, "send message failed with error %d", ret);
+ return ret;
+ }
+ ret = req->bmsg.error;
+ if (ret < 0) {
+ dev_info(sba->dev,
+ "mbox channel %d message %d (total %d)",
+ mchans_idx, req->bmsg.batch.msgs_queued,
+ req->bmsg.batch.msgs_count);
+ dev_err(sba->dev, "message error %d", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void sba_issue_pending(struct dma_chan *dchan)
+{
+ int ret;
+ unsigned long flags;
+ struct sba_request *req, *req1;
+ struct sba_device *sba = to_sba_device(dchan);
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+
+ /* Process all pending request */
+ list_for_each_entry_safe(req, req1, &sba->reqs_pending_list, node) {
+ /* Try to make request active */
+ if (!_sba_active_request(sba, req))
+ break;
+
+ /* Send request to mailbox channel */
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+ ret = sba_send_mbox_request(sba, req);
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+
+ /* If something went wrong then keep request pending */
+ if (ret < 0) {
+ _sba_pending_request(sba, req);
+ break;
+ }
+ }
+
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static dma_cookie_t sba_tx_submit(struct dma_async_tx_descriptor *tx)
+{
+ unsigned long flags;
+ dma_cookie_t cookie;
+ struct sba_request *req;
+ struct sba_device *sba;
+
+ if (unlikely(!tx))
+ return -EINVAL;
+
+ sba = to_sba_device(tx->chan);
+ req = to_sba_request(tx);
+
+ /* Assign cookie and mark request pending */
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+ cookie = dma_cookie_assign(tx);
+ _sba_pending_request(sba, req);
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+ /* Try to submit pending request */
+ sba_issue_pending(&sba->dma_chan);
+
+ return cookie;
+}
+
+static enum dma_status sba_tx_status(struct dma_chan *dchan,
+ dma_cookie_t cookie,
+ struct dma_tx_state *txstate)
+{
+ int mchan_idx;
+ enum dma_status ret;
+ struct sba_device *sba = to_sba_device(dchan);
+
+ ret = dma_cookie_status(dchan, cookie, txstate);
+ if (ret == DMA_COMPLETE)
+ return ret;
+
+ for (mchan_idx = 0; mchan_idx < sba->mchans_count; mchan_idx++)
+ mbox_client_peek_data(sba->mchans[mchan_idx]);
+
+ return dma_cookie_status(dchan, cookie, txstate);
+}
+
+static unsigned int sba_fillup_memcpy_msg(struct sba_request *req,
+ struct brcm_sba_command *cmds,
+ struct brcm_message *msg,
+ dma_addr_t msg_offset, size_t msg_len,
+ dma_addr_t dst, dma_addr_t src)
+{
+ u64 cmd;
+ u32 c_mdata;
+ struct brcm_sba_command *cmdsp = cmds;
+
+ /* Type-B command to load data into buf0 */
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ c_mdata = SBA_C_MDATA_LOAD_VAL(0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+ cmdsp->data = src + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+
+ /* Type-A command to write buf0 */
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+ c_mdata = SBA_C_MDATA_WRITE_VAL(0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+ if (req->sba->hw_resp_size) {
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+ cmdsp->resp = req->resp_dma;
+ cmdsp->resp_len = req->sba->hw_resp_size;
+ }
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+ cmdsp->data = dst + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+
+ /* Fillup brcm_message */
+ msg->type = BRCM_MESSAGE_SBA;
+ msg->sba.cmds = cmds;
+ msg->sba.cmds_count = cmdsp - cmds;
+ msg->ctx = req;
+ msg->error = 0;
+
+ return cmdsp - cmds;
+}
+
+static struct dma_async_tx_descriptor *
+sba_prep_dma_memcpy(struct dma_chan *dchan, dma_addr_t dst, dma_addr_t src,
+ size_t len, unsigned long flags)
+{
+ size_t msg_len;
+ dma_addr_t msg_offset = 0;
+ unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
+ struct sba_device *sba = to_sba_device(dchan);
+ struct sba_request *req = NULL;
+
+ /* Sanity checks */
+ if (unlikely(len > sba->req_size))
+ return NULL;
+
+ /* Alloc new request */
+ req = sba_alloc_request(sba);
+ if (!req)
+ return NULL;
+ req->fence = (flags & DMA_PREP_FENCE) ? true : false;
+
+ /* Fillup request messages */
+ while (len) {
+ msg_len = (len < sba->hw_buf_size) ? len : sba->hw_buf_size;
+ cmds_count = sba_fillup_memcpy_msg(req,
+ &req->cmds[cmds_idx],
+ &req->msgs[msgs_count],
+ msg_offset, msg_len, dst, src);
+ msgs_count++;
+ cmds_idx += cmds_count;
+ msg_offset += msg_len;
+ len -= msg_len;
+ }
+ req->bmsg.type = BRCM_MESSAGE_BATCH;
+ req->bmsg.batch.msgs = &req->msgs[0];
+ req->bmsg.batch.msgs_queued = 0;
+ req->bmsg.batch.msgs_count = msgs_count;
+ req->bmsg.ctx = req;
+ req->bmsg.error = 0;
+ atomic_set(&req->msgs_pending_count, msgs_count);
+
+ /* Init async_tx descriptor */
+ req->tx.flags = flags;
+ req->tx.cookie = -EBUSY;
+
+ return &req->tx;
+}
+
+static unsigned int sba_fillup_xor_msg(struct sba_request *req,
+ struct brcm_sba_command *cmds,
+ struct brcm_message *msg,
+ dma_addr_t msg_offset, size_t msg_len,
+ dma_addr_t dst, dma_addr_t *src, u32 src_cnt)
+{
+ u64 cmd;
+ u32 c_mdata;
+ unsigned int i;
+ struct brcm_sba_command *cmdsp = cmds;
+
+ /* Type-B command to load data into buf0 */
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ c_mdata = SBA_C_MDATA_LOAD_VAL(0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+ cmdsp->data = src[0] + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+
+ /* Type-B commands to xor data with buf0 and put it back in buf0 */
+ for (i = 1; i < src_cnt; i++) {
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ c_mdata = SBA_C_MDATA_XOR_VAL(0, 0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_XOR, SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+ cmdsp->data = src[i] + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+ }
+
+ /* Type-A command to write buf0 */
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+ c_mdata = SBA_C_MDATA_WRITE_VAL(0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+ if (req->sba->hw_resp_size) {
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+ cmdsp->resp = req->resp_dma;
+ cmdsp->resp_len = req->sba->hw_resp_size;
+ }
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+ cmdsp->data = dst + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+
+ /* Fillup brcm_message */
+ msg->type = BRCM_MESSAGE_SBA;
+ msg->sba.cmds = cmds;
+ msg->sba.cmds_count = cmdsp - cmds;
+ msg->ctx = req;
+ msg->error = 0;
+
+ return cmdsp - cmds;
+}
+
+static struct dma_async_tx_descriptor *
+sba_prep_dma_xor(struct dma_chan *dchan, dma_addr_t dst, dma_addr_t *src,
+ u32 src_cnt, size_t len, unsigned long flags)
+{
+ size_t msg_len;
+ dma_addr_t msg_offset = 0;
+ unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
+ struct sba_device *sba = to_sba_device(dchan);
+ struct sba_request *req = NULL;
+
+ /* Sanity checks */
+ if (unlikely(len > sba->req_size))
+ return NULL;
+ if (unlikely(src_cnt > sba->max_xor_srcs))
+ return NULL;
+
+ /* Alloc new request */
+ req = sba_alloc_request(sba);
+ if (!req)
+ return NULL;
+ req->fence = (flags & DMA_PREP_FENCE) ? true : false;
+
+ /* Fillup request messages */
+ while (len) {
+ msg_len = (len < sba->hw_buf_size) ? len : sba->hw_buf_size;
+ cmds_count = sba_fillup_xor_msg(req,
+ &req->cmds[cmds_idx],
+ &req->msgs[msgs_count],
+ msg_offset, msg_len,
+ dst, src, src_cnt);
+ msgs_count++;
+ cmds_idx += cmds_count;
+ msg_offset += msg_len;
+ len -= msg_len;
+ }
+ req->bmsg.type = BRCM_MESSAGE_BATCH;
+ req->bmsg.batch.msgs = &req->msgs[0];
+ req->bmsg.batch.msgs_queued = 0;
+ req->bmsg.batch.msgs_count = msgs_count;
+ req->bmsg.ctx = req;
+ req->bmsg.error = 0;
+ atomic_set(&req->msgs_pending_count, msgs_count);
+
+ /* Init async_tx descriptor */
+ req->tx.flags = flags;
+ req->tx.cookie = -EBUSY;
+
+ return &req->tx;
+}
+
+static unsigned int sba_fillup_pq_msg(struct sba_request *req,
+ bool pq_continue,
+ struct brcm_sba_command *cmds,
+ struct brcm_message *msg,
+ dma_addr_t msg_offset, size_t msg_len,
+ dma_addr_t *dst_p, dma_addr_t *dst_q,
+ const u8 *scf, dma_addr_t *src, u32 src_cnt)
+{
+ u64 cmd;
+ u32 c_mdata;
+ unsigned int i;
+ struct brcm_sba_command *cmdsp = cmds;
+
+ if (pq_continue) {
+ /* Type-B command to load old P into buf0 */
+ if (dst_p) {
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_B,
+ SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ c_mdata = SBA_C_MDATA_LOAD_VAL(0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+ cmdsp->data = *dst_p + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+ }
+
+ /* Type-B command to load old Q into buf1 */
+ if (dst_q) {
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_B,
+ SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ c_mdata = SBA_C_MDATA_LOAD_VAL(1);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+ cmdsp->data = *dst_q + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+ }
+ } else {
+ /* Type-A command to load data into buf0 */
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ SBA_ENC(cmd, SBA_CMD_ZERO_ALL_BUFFERS,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+ cmdsp++;
+ }
+
+ /* Type-B commands for generate P onto buf0 and Q onto buf1 */
+ for (i = 0; i < src_cnt; i++) {
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ c_mdata = SBA_C_MDATA_PQ_VAL(raid6_gflog[scf[i]], 1, 0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_C_MDATA_MS(c_mdata),
+ SBA_C_MDATA_MS_SHIFT, SBA_C_MDATA_MS_MASK);
+ SBA_ENC(cmd, SBA_CMD_GALOIS_XOR,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+ cmdsp->data = src[i] + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+ }
+
+ /* Type-A command to write buf0 */
+ if (dst_p) {
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+ c_mdata = SBA_C_MDATA_WRITE_VAL(0);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+ if (req->sba->hw_resp_size) {
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+ cmdsp->resp = req->resp_dma;
+ cmdsp->resp_len = req->sba->hw_resp_size;
+ }
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+ cmdsp->data = *dst_p + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+ }
+
+ /* Type-A command to write buf1 */
+ if (dst_q) {
+ cmd = 0;
+ SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+ SBA_ENC(cmd, msg_len,
+ SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+ SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+ c_mdata = SBA_C_MDATA_WRITE_VAL(1);
+ SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+ SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+ SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+ SBA_CMD_SHIFT, SBA_CMD_MASK);
+ cmdsp->cmd = cmd;
+ *cmdsp->cmd_dma = cpu_to_le64(cmd);
+ cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+ if (req->sba->hw_resp_size) {
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+ cmdsp->resp = req->resp_dma;
+ cmdsp->resp_len = req->sba->hw_resp_size;
+ }
+ cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+ cmdsp->data = *dst_q + msg_offset;
+ cmdsp->data_len = msg_len;
+ cmdsp++;
+ }
+
+ /* Fillup brcm_message */
+ msg->type = BRCM_MESSAGE_SBA;
+ msg->sba.cmds = cmds;
+ msg->sba.cmds_count = cmdsp - cmds;
+ msg->ctx = req;
+ msg->error = 0;
+
+ return cmdsp - cmds;
+}
+
+static struct dma_async_tx_descriptor *
+sba_prep_dma_pq(struct dma_chan *dchan, dma_addr_t *dst, dma_addr_t *src,
+ u32 src_cnt, const u8 *scf, size_t len, unsigned long flags)
+{
+ u32 i;
+ size_t dst_count, msg_len;
+ unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
+ dma_addr_t *dst_p = NULL, *dst_q = NULL;
+ dma_addr_t msg_offset = 0;
+ struct sba_device *sba = to_sba_device(dchan);
+ struct sba_request *req = NULL;
+
+ /* Sanity checks */
+ if (unlikely(len > sba->req_size))
+ return NULL;
+ if (unlikely(src_cnt > sba->max_pq_srcs))
+ return NULL;
+ for (i = 0; i < src_cnt; i++)
+ if (sba->max_pq_coefs <= raid6_gflog[scf[i]])
+ return NULL;
+
+ /* Figure-out P and Q destination addresses */
+ dst_count = 0;
+ if (!(flags & DMA_PREP_PQ_DISABLE_P))
+ dst_p = &dst[dst_count++];
+ if (!(flags & DMA_PREP_PQ_DISABLE_Q))
+ dst_q = &dst[dst_count++];
+
+ /* Alloc new request */
+ req = sba_alloc_request(sba);
+ if (!req)
+ return NULL;
+ req->fence = (flags & DMA_PREP_FENCE) ? true : false;
+
+ /* Fillup request messages */
+ while (len) {
+ msg_len = (len < sba->hw_buf_size) ? len : sba->hw_buf_size;
+ cmds_count = sba_fillup_pq_msg(req, dmaf_continue(flags),
+ &req->cmds[cmds_idx],
+ &req->msgs[msgs_count],
+ msg_offset, msg_len,
+ dst_p, dst_q, scf, src, src_cnt);
+ msgs_count++;
+ cmds_idx += cmds_count;
+ msg_offset += msg_len;
+ len -= msg_len;
+ }
+ req->bmsg.type = BRCM_MESSAGE_BATCH;
+ req->bmsg.batch.msgs = &req->msgs[0];
+ req->bmsg.batch.msgs_queued = 0;
+ req->bmsg.batch.msgs_count = msgs_count;
+ req->bmsg.ctx = req;
+ req->bmsg.error = 0;
+ atomic_set(&req->msgs_pending_count, msgs_count);
+
+ /* Init async_tx descriptor */
+ req->tx.flags = flags;
+ req->tx.cookie = -EBUSY;
+
+ return &req->tx;
+}
+
+/* ====== Mailbox callbacks ===== */
+
+static void sba_dma_tx_actions(struct sba_request *req)
+{
+ struct dma_async_tx_descriptor *tx = &req->tx;
+
+ WARN_ON(tx->cookie < 0);
+
+ if (tx->cookie > 0) {
+ dma_cookie_complete(tx);
+
+ /* call the callback (must not sleep or submit new
+ * operations to this channel)
+ */
+ if (tx->callback)
+ tx->callback(tx->callback_param);
+
+ dma_descriptor_unmap(tx);
+ }
+
+ /* run dependent operations */
+ dma_run_dependencies(tx);
+}
+
+static void sba_dma_clean(struct sba_request *req)
+{
+ /* If waiting for 'ack' then move to completed list */
+ if (!async_tx_test_ack(&req->tx))
+ sba_complete_request(req);
+ else
+ sba_free_request(req);
+}
+
+static void sba_receive_message(struct mbox_client *cl, void *msg)
+{
+ unsigned long flags;
+ struct brcm_message *m = msg;
+ struct sba_request *req = m->ctx, *req1;
+ struct sba_device *sba = req->sba;
+
+ /* error count if message has error */
+ if (m->error < 0) {
+ dev_err(sba->dev, "%s got message with error %d",
+ dma_chan_name(&sba->dma_chan), m->error);
+ }
+
+ /* Wait for all messages to be completed */
+ if (atomic_dec_return(&req->msgs_pending_count))
+ return;
+
+ /* Update request */
+ if (req->state == SBA_REQUEST_STATE_ACTIVE) {
+ sba_dma_tx_actions(req);
+ sba_dma_clean(req);
+ } else {
+ sba_free_request(req);
+ }
+
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+
+ /* Re-check all completed request waiting for 'ack' */
+ list_for_each_entry_safe(req, req1, &sba->reqs_completed_list, node) {
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+ sba_dma_tx_actions(req);
+ sba_dma_clean(req);
+ spin_lock_irqsave(&sba->reqs_lock, flags);
+ }
+
+ spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+ /* Try to submit pending request */
+ sba_issue_pending(&sba->dma_chan);
+}
+
+/* ====== Platform driver routines ===== */
+
+static int sba_prealloc_channel_resources(struct sba_device *sba)
+{
+ int i, j, p, ret = 0;
+ struct sba_request *req = NULL;
+
+ sba->resp_base = dma_alloc_coherent(sba->dma_dev.dev,
+ sba->max_resp_pool_size,
+ &sba->resp_dma_base, GFP_KERNEL);
+ if (!sba->resp_base)
+ return -ENOMEM;
+
+ sba->cmds_base = dma_alloc_coherent(sba->dma_dev.dev,
+ sba->max_cmds_pool_size,
+ &sba->cmds_dma_base, GFP_KERNEL);
+ if (!sba->cmds_base) {
+ ret = -ENOMEM;
+ goto fail_free_resp_pool;
+ }
+
+ spin_lock_init(&sba->reqs_lock);
+ sba->reqs_fence = false;
+ INIT_LIST_HEAD(&sba->reqs_alloc_list);
+ INIT_LIST_HEAD(&sba->reqs_pending_list);
+ INIT_LIST_HEAD(&sba->reqs_active_list);
+ INIT_LIST_HEAD(&sba->reqs_completed_list);
+ INIT_LIST_HEAD(&sba->reqs_aborted_list);
+ INIT_LIST_HEAD(&sba->reqs_free_list);
+
+ sba->reqs = devm_kcalloc(sba->dev, sba->max_req,
+ sizeof(*req), GFP_KERNEL);
+ if (!sba->reqs) {
+ ret = -ENOMEM;
+ goto fail_free_cmds_pool;
+ }
+
+ for (i = 0, p = 0; i < sba->max_req; i++) {
+ req = &sba->reqs[i];
+ INIT_LIST_HEAD(&req->node);
+ req->sba = sba;
+ req->state = SBA_REQUEST_STATE_FREE;
+ req->fence = false;
+ req->resp = sba->resp_base + p;
+ req->resp_dma = sba->resp_dma_base + p;
+ p += sba->hw_resp_size;
+ req->cmds = devm_kcalloc(sba->dev, sba->max_cmd_per_req,
+ sizeof(*req->cmds), GFP_KERNEL);
+ if (!req->cmds) {
+ ret = -ENOMEM;
+ goto fail_free_cmds_pool;
+ }
+ for (j = 0; j < sba->max_cmd_per_req; j++) {
+ req->cmds[j].cmd = 0;
+ req->cmds[j].cmd_dma = sba->cmds_base +
+ (i * sba->max_cmd_per_req + j) * sizeof(u64);
+ req->cmds[j].cmd_dma_addr = sba->cmds_dma_base +
+ (i * sba->max_cmd_per_req + j) * sizeof(u64);
+ req->cmds[j].flags = 0;
+ }
+ req->msgs = devm_kcalloc(sba->dev, sba->max_msg_per_req,
+ sizeof(*req->msgs), GFP_KERNEL);
+ if (!req->msgs) {
+ ret = -ENOMEM;
+ goto fail_free_cmds_pool;
+ }
+ memset(&req->bmsg, 0, sizeof(req->bmsg));
+ atomic_set(&req->msgs_pending_count, 0);
+ dma_async_tx_descriptor_init(&req->tx, &sba->dma_chan);
+ req->tx.tx_submit = sba_tx_submit;
+ req->tx.phys = req->resp_dma;
+ list_add_tail(&req->node, &sba->reqs_free_list);
+ }
+
+ sba->reqs_free_count = sba->max_req;
+
+ return 0;
+
+fail_free_cmds_pool:
+ dma_free_coherent(sba->dma_dev.dev,
+ sba->max_cmds_pool_size,
+ sba->cmds_base, sba->cmds_dma_base);
+fail_free_resp_pool:
+ dma_free_coherent(sba->dma_dev.dev,
+ sba->max_resp_pool_size,
+ sba->resp_base, sba->resp_dma_base);
+ return ret;
+}
+
+static void sba_freeup_channel_resources(struct sba_device *sba)
+{
+ dmaengine_terminate_all(&sba->dma_chan);
+ dma_free_coherent(sba->dma_dev.dev, sba->max_cmds_pool_size,
+ sba->cmds_base, sba->cmds_dma_base);
+ dma_free_coherent(sba->dma_dev.dev, sba->max_resp_pool_size,
+ sba->resp_base, sba->resp_dma_base);
+ sba->resp_base = NULL;
+ sba->resp_dma_base = 0;
+}
+
+static int sba_async_register(struct sba_device *sba)
+{
+ int ret;
+ struct dma_device *dma_dev = &sba->dma_dev;
+
+ /* Initialize DMA channel cookie */
+ sba->dma_chan.device = dma_dev;
+ dma_cookie_init(&sba->dma_chan);
+
+ /* Initialize DMA device capability mask */
+ dma_cap_zero(dma_dev->cap_mask);
+ dma_cap_set(DMA_MEMCPY, dma_dev->cap_mask);
+ dma_cap_set(DMA_XOR, dma_dev->cap_mask);
+ dma_cap_set(DMA_PQ, dma_dev->cap_mask);
+
+ /*
+ * Set mailbox channel device as the base device of
+ * our dma_device because the actual memory accesses
+ * will be done by mailbox controller
+ */
+ dma_dev->dev = sba->mbox_dev;
+
+ /* Set base prep routines */
+ dma_dev->device_alloc_chan_resources = sba_alloc_chan_resources;
+ dma_dev->device_free_chan_resources = sba_free_chan_resources;
+ dma_dev->device_issue_pending = sba_issue_pending;
+ dma_dev->device_tx_status = sba_tx_status;
+
+ /* Set memcpy routines and capability */
+ if (dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask))
+ dma_dev->device_prep_dma_memcpy = sba_prep_dma_memcpy;
+
+ /* Set xor routines and capability */
+ if (dma_has_cap(DMA_XOR, dma_dev->cap_mask)) {
+ dma_dev->device_prep_dma_xor = sba_prep_dma_xor;
+ dma_dev->max_xor = sba->max_xor_srcs;
+ }
+
+ /* Set pq routines and capability */
+ if (dma_has_cap(DMA_PQ, dma_dev->cap_mask)) {
+ dma_dev->device_prep_dma_pq = sba_prep_dma_pq;
+ dma_set_maxpq(dma_dev, sba->max_pq_srcs, 0);
+ dma_set_maxpqcoef(dma_dev, sba->max_pq_coefs);
+ }
+
+ /* Initialize DMA device channel list */
+ INIT_LIST_HEAD(&dma_dev->channels);
+ list_add_tail(&sba->dma_chan.device_node, &dma_dev->channels);
+
+ /* Register with Linux async DMA framework*/
+ ret = dma_async_device_register(dma_dev);
+ if (ret) {
+ dev_err(sba->dev, "async device register error %d", ret);
+ return ret;
+ }
+
+ dev_info(sba->dev, "%s capabilities: %s%s%s\n",
+ dma_chan_name(&sba->dma_chan),
+ dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask) ? "memcpy " : "",
+ dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
+ dma_has_cap(DMA_PQ, dma_dev->cap_mask) ? "pq " : "");
+
+ return 0;
+}
+
+static int sba_probe(struct platform_device *pdev)
+{
+ int i, ret = 0, mchans_count;
+ struct sba_device *sba;
+
+ /* Allocate main SBA struct */
+ sba = devm_kzalloc(&pdev->dev, sizeof(*sba), GFP_KERNEL);
+ if (!sba)
+ return -ENOMEM;
+
+ sba->dev = &pdev->dev;
+ platform_set_drvdata(pdev, sba);
+
+ /* Determine SBA version from DT compatible string */
+ if (of_device_is_compatible(sba->dev->of_node, "brcm,iproc-sba"))
+ sba->ver = SBA_VER_1;
+ else if (of_device_is_compatible(sba->dev->of_node,
+ "brcm,iproc-sba-v2"))
+ sba->ver = SBA_VER_2;
+ else
+ return -ENODEV;
+
+ /* Derived Configuration parameters */
+ switch (sba->ver) {
+ case SBA_VER_1:
+ sba->max_req = 128;
+ sba->req_size = PAGE_SIZE;
+ sba->hw_buf_size = 4096;
+ sba->hw_resp_size = 8;
+ sba->max_pq_coefs = 6;
+ sba->max_pq_srcs = 6;
+ break;
+ case SBA_VER_2:
+ sba->max_req = 128;
+ sba->req_size = PAGE_SIZE;
+ sba->hw_buf_size = 4096;
+ sba->hw_resp_size = 8;
+ sba->max_pq_coefs = 30;
+ /*
+ * We can support max_pq_srcs == max_pq_coefs because
+ * we are limited by number of SBA commands that we can
+ * fit in one message for underlying ring manager HW.
+ */
+ sba->max_pq_srcs = 12;
+ break;
+ default:
+ return -EINVAL;
+ }
+ sba->max_msg_per_req = sba->req_size / sba->hw_buf_size;
+ if ((sba->max_msg_per_req * sba->hw_buf_size) < sba->req_size)
+ sba->max_msg_per_req++;
+ sba->max_cmd_per_msg = sba->max_pq_srcs + 3;
+ sba->max_cmd_per_req = sba->max_msg_per_req * sba->max_cmd_per_msg;
+ sba->max_xor_srcs = sba->max_cmd_per_msg - 1;
+ sba->max_resp_pool_size = sba->max_req * sba->hw_resp_size;
+ sba->max_cmds_pool_size = sba->max_req *
+ sba->max_cmd_per_req * sizeof(u64);
+
+ /* Setup mailbox client */
+ sba->client.dev = &pdev->dev;
+ sba->client.rx_callback = sba_receive_message;
+ sba->client.tx_block = false;
+ sba->client.knows_txdone = false;
+ sba->client.tx_tout = 0;
+
+ /* Number of channels equals number of mailbox channels */
+ ret = of_count_phandle_with_args(pdev->dev.of_node,
+ "mboxes", "#mbox-cells");
+ if (ret <= 0)
+ return -ENODEV;
+ mchans_count = ret;
+ sba->mchans_count = 0;
+ atomic_set(&sba->mchans_current, 0);
+
+ /* Allocate mailbox channel array */
+ sba->mchans = devm_kcalloc(&pdev->dev, sba->mchans_count,
+ sizeof(*sba->mchans), GFP_KERNEL);
+ if (!sba->mchans)
+ return -ENOMEM;
+
+ /* Request mailbox channels */
+ for (i = 0; i < mchans_count; i++) {
+ sba->mchans[i] = mbox_request_channel(&sba->client, i);
+ if (IS_ERR(sba->mchans[i])) {
+ ret = PTR_ERR(sba->mchans[i]);
+ goto fail_free_mchans;
+ }
+ sba->mchans_count++;
+ }
+
+ /* Find-out underlying mailbox device */
+ sba->mbox_dev = mbox_channel_device(sba->mchans[0]);
+ if (IS_ERR(sba->mbox_dev)) {
+ ret = PTR_ERR(sba->mbox_dev);
+ goto fail_free_mchans;
+ }
+
+ /* All mailbox channels should be of same ring manager device */
+ for (i = 1; i < mchans_count; i++) {
+ if (mbox_channel_device(sba->mchans[i]) != sba->mbox_dev) {
+ ret = -EINVAL;
+ goto fail_free_mchans;
+ }
+ }
+
+ /* Register DMA device with linux async framework */
+ ret = sba_async_register(sba);
+ if (ret)
+ goto fail_free_mchans;
+
+ /* Prealloc channel resource */
+ ret = sba_prealloc_channel_resources(sba);
+ if (ret)
+ goto fail_async_dev_unreg;
+
+ /* Print device info */
+ dev_info(sba->dev, "%s using SBAv%d and %d mailbox channels",
+ dma_chan_name(&sba->dma_chan), sba->ver+1,
+ sba->mchans_count);
+
+ return 0;
+
+fail_async_dev_unreg:
+ dma_async_device_unregister(&sba->dma_dev);
+fail_free_mchans:
+ for (i = 0; i < sba->mchans_count; i++)
+ mbox_free_channel(sba->mchans[i]);
+ return ret;
+}
+
+static int sba_remove(struct platform_device *pdev)
+{
+ int i;
+ struct sba_device *sba = platform_get_drvdata(pdev);
+
+ sba_freeup_channel_resources(sba);
+
+ dma_async_device_unregister(&sba->dma_dev);
+
+ for (i = 0; i < sba->mchans_count; i++)
+ mbox_free_channel(sba->mchans[i]);
+
+ return 0;
+}
+
+static const struct of_device_id sba_of_match[] = {
+ { .compatible = "brcm,iproc-sba", },
+ { .compatible = "brcm,iproc-sba-v2", },
+ {},
+};
+MODULE_DEVICE_TABLE(of, sba_of_match);
+
+static struct platform_driver sba_driver = {
+ .probe = sba_probe,
+ .remove = sba_remove,
+ .driver = {
+ .name = "bcm-sba-raid",
+ .of_match_table = sba_of_match,
+ },
+};
+module_platform_driver(sba_driver);
+
+MODULE_DESCRIPTION("Broadcom SBA RAID driver");
+MODULE_AUTHOR("Anup Patel <[email protected]>");
+MODULE_LICENSE("GPL v2");
--
2.7.4

2017-02-02 06:01:14

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

On Wed, Feb 1, 2017 at 8:47 PM, Anup Patel <[email protected]> wrote:
> The DMAENGINE framework assumes that if PQ offload is supported by a
> DMA device then all 256 PQ coefficients are supported. This assumption
> does not hold anymore because we now have BCM-SBA-RAID offload engine
> which supports PQ offload with limited number of PQ coefficients.
>
> This patch extends async_tx APIs to handle DMA devices with support
> for fewer PQ coefficients.
>
> Signed-off-by: Anup Patel <[email protected]>
> Reviewed-by: Scott Branden <[email protected]>
> ---
> crypto/async_tx/async_pq.c | 3 +++
> crypto/async_tx/async_raid6_recov.c | 12 ++++++++++--
> include/linux/dmaengine.h | 19 +++++++++++++++++++
> include/linux/raid/pq.h | 3 +++
> 4 files changed, 35 insertions(+), 2 deletions(-)

So, I hate the way async_tx does these checks on each operation, and
it's ok for me to say that because it's my fault. Really it's md that
should be validating engine offload capabilities once at the beginning
of time. I'd rather we move in that direction than continue to pile
onto a bad design.

2017-02-03 11:00:05

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

On Thu, Feb 2, 2017 at 11:31 AM, Dan Williams <[email protected]> wrote:
>
> On Wed, Feb 1, 2017 at 8:47 PM, Anup Patel <[email protected]> wrote:
> > The DMAENGINE framework assumes that if PQ offload is supported by a
> > DMA device then all 256 PQ coefficients are supported. This assumption
> > does not hold anymore because we now have BCM-SBA-RAID offload engine
> > which supports PQ offload with limited number of PQ coefficients.
> >
> > This patch extends async_tx APIs to handle DMA devices with support
> > for fewer PQ coefficients.
> >
> > Signed-off-by: Anup Patel <[email protected]>
> > Reviewed-by: Scott Branden <[email protected]>
> > ---
> > crypto/async_tx/async_pq.c | 3 +++
> > crypto/async_tx/async_raid6_recov.c | 12 ++++++++++--
> > include/linux/dmaengine.h | 19 +++++++++++++++++++
> > include/linux/raid/pq.h | 3 +++
> > 4 files changed, 35 insertions(+), 2 deletions(-)
>
> So, I hate the way async_tx does these checks on each operation, and
> it's ok for me to say that because it's my fault. Really it's md that
> should be validating engine offload capabilities once at the beginning
> of time. I'd rather we move in that direction than continue to pile
> onto a bad design.

Yes, indeed. All async_tx APIs have lot of checks and for high throughput
RAID offload engine these checks can add some overhead.

I think doing checks in Linux md would be certainly better but this would
mean lot of changes in Linux md as well as remove checks in async_tx.

Also, async_tx APIs should not find DMA channel on its own instead it
should rely on Linux md to provide DMA channel pointer as parameter.

It's better to do checks cleanup in async_tx as separate patchset and
keep this patchset simple.

Regards,
Anup

2017-02-03 12:05:42

by Jassi Brar

[permalink] [raw]
Subject: Re: [PATCH 1/6] mailbox: Add new API mbox_channel_device() for clients

On Thu, Feb 2, 2017 at 10:17 AM, Anup Patel <[email protected]> wrote:
> The remote processor can have DMAENGINE capabilities and client
> can pass data to be processed via main memory. In such cases,
> the client will require DMAble memory for remote processor.
>
> This patch adds new API mbox_channel_device() which can be
> used by clients to get struct device pointer of underlying
> mailbox controller. This struct device pointer of mailbox
> controller can be used by clients to allocate DMAble memory
> for remote processor.
>
IIUC, DT already provides a way for what you need.

2017-02-03 18:42:11

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

On Fri, Feb 3, 2017 at 2:59 AM, Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/[email protected]> wrote:
>
>
> On Thu, Feb 2, 2017 at 11:31 AM, Dan Williams <[email protected]>
> wrote:
>>
>> On Wed, Feb 1, 2017 at 8:47 PM, Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/[email protected]>
>> wrote:
>> > The DMAENGINE framework assumes that if PQ offload is supported by a
>> > DMA device then all 256 PQ coefficients are supported. This assumption
>> > does not hold anymore because we now have BCM-SBA-RAID offload engine
>> > which supports PQ offload with limited number of PQ coefficients.
>> >
>> > This patch extends async_tx APIs to handle DMA devices with support
>> > for fewer PQ coefficients.
>> >
>> > Signed-off-by: Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/[email protected]>
>> > Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/[email protected]>
>> > ---
>> > crypto/async_tx/async_pq.c | 3 +++
>> > crypto/async_tx/async_raid6_recov.c | 12 ++++++++++--
>> > include/linux/dmaengine.h | 19 +++++++++++++++++++
>> > include/linux/raid/pq.h | 3 +++
>> > 4 files changed, 35 insertions(+), 2 deletions(-)
>>
>> So, I hate the way async_tx does these checks on each operation, and
>> it's ok for me to say that because it's my fault. Really it's md that
>> should be validating engine offload capabilities once at the beginning
>> of time. I'd rather we move in that direction than continue to pile
>> onto a bad design.
>
>
> Yes, indeed. All async_tx APIs have lot of checks and for high throughput
> RAID offload engine these checks can add some overhead.
>
> I think doing checks in Linux md would be certainly better but this would
> mean lot of changes in Linux md as well as remove checks in async_tx.
>
> Also, async_tx APIs should not find DMA channel on its own instead it
> should rely on Linux md to provide DMA channel pointer as parameter.
>
> It's better to do checks cleanup in async_tx as separate patchset and
> keep this patchset simple.

That's been the problem with async_tx being broken like this for
years. Once you get this "small / simple" patch upstream, that
arguably makes async_tx a little bit worse, there is no longer any
motivation to fix the underlying issues. If you care about the long
term health of raid offload and are enabling new hardware support you
should first tackle the known problems with it before adding new
features.

2017-02-05 06:06:57

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver

On Thu, Feb 02, 2017 at 10:17:15AM +0530, Anup Patel wrote:
> +config BCM_SBA_RAID
> + tristate "Broadcom SBA RAID engine support"
> + depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
> + select DMA_ENGINE
> + select DMA_ENGINE_RAID
> + select ASYNC_TX_ENABLE_CHANNEL_SWITCH
> + default ARCH_BCM_IPROC

whats with the funny alignement?

> +/* SBA command related defines */
> +#define SBA_TYPE_SHIFT 48
> +#define SBA_TYPE_MASK 0x3
> +#define SBA_TYPE_A 0x0
> +#define SBA_TYPE_B 0x2
> +#define SBA_TYPE_C 0x3
> +#define SBA_USER_DEF_SHIFT 32
> +#define SBA_USER_DEF_MASK 0xffff
> +#define SBA_R_MDATA_SHIFT 24
> +#define SBA_R_MDATA_MASK 0xff
> +#define SBA_C_MDATA_MS_SHIFT 18
> +#define SBA_C_MDATA_MS_MASK 0x3
> +#define SBA_INT_SHIFT 17
> +#define SBA_INT_MASK 0x1
> +#define SBA_RESP_SHIFT 16
> +#define SBA_RESP_MASK 0x1
> +#define SBA_C_MDATA_SHIFT 8
> +#define SBA_C_MDATA_MASK 0xff
> +#define SBA_CMD_SHIFT 0
> +#define SBA_CMD_MASK 0xf
> +#define SBA_CMD_ZERO_ALL_BUFFERS 0x8
> +#define SBA_CMD_LOAD_BUFFER 0x9
> +#define SBA_CMD_XOR 0xa
> +#define SBA_CMD_GALOIS_XOR 0xb
> +#define SBA_CMD_ZERO_BUFFER 0x4
> +#define SBA_CMD_WRITE_BUFFER 0xc

Try using BIT and GENMAST for hardware descriptions

> +
> +/* SBA C_MDATA helper macros */
> +#define SBA_C_MDATA_LOAD_VAL(__bnum0) ((__bnum0) & 0x3)
> +#define SBA_C_MDATA_WRITE_VAL(__bnum0) ((__bnum0) & 0x3)
> +#define SBA_C_MDATA_XOR_VAL(__bnum1, __bnum0) \
> + ({ u32 __v = ((__bnum0) & 0x3); \
> + __v |= ((__bnum1) & 0x3) << 2; \
> + __v; \
> + })
> +#define SBA_C_MDATA_PQ_VAL(__dnum, __bnum1, __bnum0) \
> + ({ u32 __v = ((__bnum0) & 0x3); \
> + __v |= ((__bnum1) & 0x3) << 2; \
> + __v |= ((__dnum) & 0x1f) << 5; \
> + __v; \
> + })

ah why are we usig complex macros, why can't these be simple functions..

> +#define SBA_C_MDATA_LS(__c_mdata_val) ((__c_mdata_val) & 0xff)
> +#define SBA_C_MDATA_MS(__c_mdata_val) (((__c_mdata_val) >> 8) & 0x3)
> +
> +/* Driver helper macros */
> +#define to_sba_request(tx) \
> + container_of(tx, struct sba_request, tx)
> +#define to_sba_device(dchan) \
> + container_of(dchan, struct sba_device, dma_chan)
> +
> +enum sba_request_state {
> + SBA_REQUEST_STATE_FREE = 1,
> + SBA_REQUEST_STATE_ALLOCED = 2,
> + SBA_REQUEST_STATE_PENDING = 3,
> + SBA_REQUEST_STATE_ACTIVE = 4,
> + SBA_REQUEST_STATE_COMPLETED = 5,
> + SBA_REQUEST_STATE_ABORTED = 6,

whats up with a very funny indentation setting, we use 8 chars.

Please re-read the Documentation/process/coding-style.rst

> +static int sba_alloc_chan_resources(struct dma_chan *dchan)
> +{
> + /*
> + * We only have one channel so we have pre-alloced
> + * channel resources. Over here we just return number
> + * of free request.
> + */
> + return sba_free_request_count(to_sba_device(dchan));
> +}

essentially you are not doing much, so you can skip it. Its an optional
call.

> +static void sba_free_chan_resources(struct dma_chan *dchan)
> +{
> + /*
> + * Channel resources are pre-alloced so we just free-up
> + * whatever we can so that we can re-use pre-alloced
> + * channel resources next time.
> + */
> + sba_cleanup_inflight_requests(to_sba_device(dchan));

well this one checks for pending requests as well, which shouldn't be there
when freeing a channel, something seems not quite right here..

> +static int sba_send_mbox_request(struct sba_device *sba,
> + struct sba_request *req)
> +{
> + int mchans_idx, ret = 0;
> +
> + /* Select mailbox channel in round-robin fashion */
> + mchans_idx = atomic_inc_return(&sba->mchans_current);
> + mchans_idx = mchans_idx % sba->mchans_count;
> +
> + /* Send batch message for the request */
> + req->bmsg.batch.msgs_queued = 0;
> + ret = mbox_send_message(sba->mchans[mchans_idx], &req->bmsg);
> + if (ret < 0) {
> + dev_info(sba->dev, "channel %d message %d (total %d)",
> + mchans_idx, req->bmsg.batch.msgs_queued,
> + req->bmsg.batch.msgs_count);

dev_err?

> + dev_err(sba->dev, "send message failed with error %d", ret);
> + return ret;
> + }
> + ret = req->bmsg.error;
> + if (ret < 0) {
> + dev_info(sba->dev,
> + "mbox channel %d message %d (total %d)",
> + mchans_idx, req->bmsg.batch.msgs_queued,
> + req->bmsg.batch.msgs_count);

same here

> +static dma_cookie_t sba_tx_submit(struct dma_async_tx_descriptor *tx)
> +{
> + unsigned long flags;
> + dma_cookie_t cookie;
> + struct sba_request *req;
> + struct sba_device *sba;
> +
> + if (unlikely(!tx))
> + return -EINVAL;
> +
> + sba = to_sba_device(tx->chan);
> + req = to_sba_request(tx);
> +
> + /* Assign cookie and mark request pending */
> + spin_lock_irqsave(&sba->reqs_lock, flags);
> + cookie = dma_cookie_assign(tx);
> + _sba_pending_request(sba, req);
> + spin_unlock_irqrestore(&sba->reqs_lock, flags);
> +
> + /* Try to submit pending request */
> + sba_issue_pending(&sba->dma_chan);

Nope, thats wrong, caller needs to call .issue_pending for that

> +static enum dma_status sba_tx_status(struct dma_chan *dchan,
> + dma_cookie_t cookie,
> + struct dma_tx_state *txstate)
> +{
> + int mchan_idx;
> + enum dma_status ret;
> + struct sba_device *sba = to_sba_device(dchan);
> +
> + ret = dma_cookie_status(dchan, cookie, txstate);
> + if (ret == DMA_COMPLETE)
> + return ret;
> +
> + for (mchan_idx = 0; mchan_idx < sba->mchans_count; mchan_idx++)
> + mbox_client_peek_data(sba->mchans[mchan_idx]);

what is this achieving?

> +static struct dma_async_tx_descriptor *
> +sba_prep_dma_memcpy(struct dma_chan *dchan, dma_addr_t dst, dma_addr_t src,
> + size_t len, unsigned long flags)
> +{
> + size_t msg_len;
> + dma_addr_t msg_offset = 0;
> + unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
> + struct sba_device *sba = to_sba_device(dchan);
> + struct sba_request *req = NULL;
> +
> + /* Sanity checks */
> + if (unlikely(len > sba->req_size))
> + return NULL;

why is that an error, you can create multiple txn of max length

> +static int sba_async_register(struct sba_device *sba)
> +{
> + int ret;
> + struct dma_device *dma_dev = &sba->dma_dev;
> +
> + /* Initialize DMA channel cookie */
> + sba->dma_chan.device = dma_dev;
> + dma_cookie_init(&sba->dma_chan);
> +
> + /* Initialize DMA device capability mask */
> + dma_cap_zero(dma_dev->cap_mask);
> + dma_cap_set(DMA_MEMCPY, dma_dev->cap_mask);
> + dma_cap_set(DMA_XOR, dma_dev->cap_mask);
> + dma_cap_set(DMA_PQ, dma_dev->cap_mask);
> +
> + /*
> + * Set mailbox channel device as the base device of
> + * our dma_device because the actual memory accesses
> + * will be done by mailbox controller
> + */
> + dma_dev->dev = sba->mbox_dev;
> +
> + /* Set base prep routines */
> + dma_dev->device_alloc_chan_resources = sba_alloc_chan_resources;
> + dma_dev->device_free_chan_resources = sba_free_chan_resources;
> + dma_dev->device_issue_pending = sba_issue_pending;
> + dma_dev->device_tx_status = sba_tx_status;

Please add terminate callback support, also add the capabilities, we need to
advertise that and use in clients

Also you can simplify bunch of code by using virt-chan support for managing
channels and descriptors

--
~Vinod

2017-02-06 03:55:26

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

On Sat, Feb 4, 2017 at 12:12 AM, Dan Williams <[email protected]> wrote:
> On Fri, Feb 3, 2017 at 2:59 AM, Anup Patel <[email protected]> wrote:
>>
>>
>> On Thu, Feb 2, 2017 at 11:31 AM, Dan Williams <[email protected]>
>> wrote:
>>>
>>> On Wed, Feb 1, 2017 at 8:47 PM, Anup Patel <[email protected]>
>>> wrote:
>>> > The DMAENGINE framework assumes that if PQ offload is supported by a
>>> > DMA device then all 256 PQ coefficients are supported. This assumption
>>> > does not hold anymore because we now have BCM-SBA-RAID offload engine
>>> > which supports PQ offload with limited number of PQ coefficients.
>>> >
>>> > This patch extends async_tx APIs to handle DMA devices with support
>>> > for fewer PQ coefficients.
>>> >
>>> > Signed-off-by: Anup Patel <[email protected]>
>>> > Reviewed-by: Scott Branden <[email protected]>
>>> > ---
>>> > crypto/async_tx/async_pq.c | 3 +++
>>> > crypto/async_tx/async_raid6_recov.c | 12 ++++++++++--
>>> > include/linux/dmaengine.h | 19 +++++++++++++++++++
>>> > include/linux/raid/pq.h | 3 +++
>>> > 4 files changed, 35 insertions(+), 2 deletions(-)
>>>
>>> So, I hate the way async_tx does these checks on each operation, and
>>> it's ok for me to say that because it's my fault. Really it's md that
>>> should be validating engine offload capabilities once at the beginning
>>> of time. I'd rather we move in that direction than continue to pile
>>> onto a bad design.
>>
>>
>> Yes, indeed. All async_tx APIs have lot of checks and for high throughput
>> RAID offload engine these checks can add some overhead.
>>
>> I think doing checks in Linux md would be certainly better but this would
>> mean lot of changes in Linux md as well as remove checks in async_tx.
>>
>> Also, async_tx APIs should not find DMA channel on its own instead it
>> should rely on Linux md to provide DMA channel pointer as parameter.
>>
>> It's better to do checks cleanup in async_tx as separate patchset and
>> keep this patchset simple.
>
> That's been the problem with async_tx being broken like this for
> years. Once you get this "small / simple" patch upstream, that
> arguably makes async_tx a little bit worse, there is no longer any
> motivation to fix the underlying issues. If you care about the long
> term health of raid offload and are enabling new hardware support you
> should first tackle the known problems with it before adding new
> features.

Apart from the checks related issue you pointed there are other
issues with async_tx APIs such as:

1. The mechanism to do update PQ (or RAID6 update) operation
in current async_tx APIs is to call async_gen_syndrome() twice
with ASYNC_TX_PQ_XOR_DST flag set. Also, async_gen_syndrome()
will always prefer SW approach when ASYNC_TX_PQ_XOR_DST flag
is set. This means async_tx API is forcing SW approach for update
PQ operation and in-addition we require two async_gen_syndrome()
calls to achieve update PQ. This limitations of async_gen_syndrome()
reduces performance of async_tx APIs. Instead of this we should
have a dedicated async_update_pq() API which will allow RAID
offload engine drivers (such as BCM-FS4-RAID) to implement
update PQ using HW offload and this new API will fall-back to
SW approach using async_gen_syndrome() if no DMA channel
provides update PQ HW offload.

2. In our stress testing, we have observed that dma_map_page()
and dma_unmap_page() used in various async_tx APIs are the
major cause of overhead. If we directly call DMA channel callbacks
with pre-DMA-mapped pages then we get very high throughput.
The async_tx APIs should provide a way for pre-DMA-mapped
pages so that Linux MD can exploit this fact for better performance.

3. We really don't have a test module to stress/benchmark all
async_tx APIs using multi-threading and batching large number
of request in each thread. This kind of test module is very much
required for performance benchmarking and stressing high
throughput (hundreds of Gbps) RAID offload engines (such as
BCM-FS4-RAID).

>From the above, we already have async_tx_test module to
address point3. We also plan to address point1 above but
this would also require changes in Linux MD to use new
async_update_pq() API.

As you can see, this patchset is not end of story of us if we
want best possible utilization of BCM-FS4-RAID.

Regards,
Anup

2017-02-06 04:27:12

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 1/6] mailbox: Add new API mbox_channel_device() for clients

On Fri, Feb 3, 2017 at 5:35 PM, Jassi Brar <[email protected]> wrote:
> On Thu, Feb 2, 2017 at 10:17 AM, Anup Patel <[email protected]> wrote:
>> The remote processor can have DMAENGINE capabilities and client
>> can pass data to be processed via main memory. In such cases,
>> the client will require DMAble memory for remote processor.
>>
>> This patch adds new API mbox_channel_device() which can be
>> used by clients to get struct device pointer of underlying
>> mailbox controller. This struct device pointer of mailbox
>> controller can be used by clients to allocate DMAble memory
>> for remote processor.
>>
> IIUC, DT already provides a way for what you need.

Thanks for the suggestion. I will explore in this direction and
try to avoid this patch in next revision.

Can you please have a look at FlexRM driver which I
had submitted previously?
https://lkml.org/lkml/2017/1/5/291
https://lkml.org/lkml/2017/1/5/293

Regards,
Anup

2017-02-06 12:01:15

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver

On Sun, Feb 5, 2017 at 11:36 AM, Vinod Koul <[email protected]> wrote:
> On Thu, Feb 02, 2017 at 10:17:15AM +0530, Anup Patel wrote:
>> +config BCM_SBA_RAID
>> + tristate "Broadcom SBA RAID engine support"
>> + depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>> + select DMA_ENGINE
>> + select DMA_ENGINE_RAID
>> + select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>> + default ARCH_BCM_IPROC
>
> whats with the funny alignement?

Sure, I will use tabs here.

>
>> +/* SBA command related defines */
>> +#define SBA_TYPE_SHIFT 48
>> +#define SBA_TYPE_MASK 0x3
>> +#define SBA_TYPE_A 0x0
>> +#define SBA_TYPE_B 0x2
>> +#define SBA_TYPE_C 0x3
>> +#define SBA_USER_DEF_SHIFT 32
>> +#define SBA_USER_DEF_MASK 0xffff
>> +#define SBA_R_MDATA_SHIFT 24
>> +#define SBA_R_MDATA_MASK 0xff
>> +#define SBA_C_MDATA_MS_SHIFT 18
>> +#define SBA_C_MDATA_MS_MASK 0x3
>> +#define SBA_INT_SHIFT 17
>> +#define SBA_INT_MASK 0x1
>> +#define SBA_RESP_SHIFT 16
>> +#define SBA_RESP_MASK 0x1
>> +#define SBA_C_MDATA_SHIFT 8
>> +#define SBA_C_MDATA_MASK 0xff
>> +#define SBA_CMD_SHIFT 0
>> +#define SBA_CMD_MASK 0xf
>> +#define SBA_CMD_ZERO_ALL_BUFFERS 0x8
>> +#define SBA_CMD_LOAD_BUFFER 0x9
>> +#define SBA_CMD_XOR 0xa
>> +#define SBA_CMD_GALOIS_XOR 0xb
>> +#define SBA_CMD_ZERO_BUFFER 0x4
>> +#define SBA_CMD_WRITE_BUFFER 0xc
>
> Try using BIT and GENMAST for hardware descriptions

Sure, will do.

>
>> +
>> +/* SBA C_MDATA helper macros */
>> +#define SBA_C_MDATA_LOAD_VAL(__bnum0) ((__bnum0) & 0x3)
>> +#define SBA_C_MDATA_WRITE_VAL(__bnum0) ((__bnum0) & 0x3)
>> +#define SBA_C_MDATA_XOR_VAL(__bnum1, __bnum0) \
>> + ({ u32 __v = ((__bnum0) & 0x3); \
>> + __v |= ((__bnum1) & 0x3) << 2; \
>> + __v; \
>> + })
>> +#define SBA_C_MDATA_PQ_VAL(__dnum, __bnum1, __bnum0) \
>> + ({ u32 __v = ((__bnum0) & 0x3); \
>> + __v |= ((__bnum1) & 0x3) << 2; \
>> + __v |= ((__dnum) & 0x1f) << 5; \
>> + __v; \
>> + })
>
> ah why are we usig complex macros, why can't these be simple functions..

"static inline functions" seemed too complicated here because most of
these macros are two lines of c-code.

Do you still insist on using "static inline functions"?

>
>> +#define SBA_C_MDATA_LS(__c_mdata_val) ((__c_mdata_val) & 0xff)
>> +#define SBA_C_MDATA_MS(__c_mdata_val) (((__c_mdata_val) >> 8) & 0x3)
>> +
>> +/* Driver helper macros */
>> +#define to_sba_request(tx) \
>> + container_of(tx, struct sba_request, tx)
>> +#define to_sba_device(dchan) \
>> + container_of(dchan, struct sba_device, dma_chan)
>> +
>> +enum sba_request_state {
>> + SBA_REQUEST_STATE_FREE = 1,
>> + SBA_REQUEST_STATE_ALLOCED = 2,
>> + SBA_REQUEST_STATE_PENDING = 3,
>> + SBA_REQUEST_STATE_ACTIVE = 4,
>> + SBA_REQUEST_STATE_COMPLETED = 5,
>> + SBA_REQUEST_STATE_ABORTED = 6,
>
> whats up with a very funny indentation setting, we use 8 chars.
>
> Please re-read the Documentation/process/coding-style.rst

I have double checked this enum. The indentation is fine
and as-per coding style. Am I missing anything else?

>
>> +static int sba_alloc_chan_resources(struct dma_chan *dchan)
>> +{
>> + /*
>> + * We only have one channel so we have pre-alloced
>> + * channel resources. Over here we just return number
>> + * of free request.
>> + */
>> + return sba_free_request_count(to_sba_device(dchan));
>> +}
>
> essentially you are not doing much, so you can skip it. Its an optional
> call.

Sure, will do.

>
>> +static void sba_free_chan_resources(struct dma_chan *dchan)
>> +{
>> + /*
>> + * Channel resources are pre-alloced so we just free-up
>> + * whatever we can so that we can re-use pre-alloced
>> + * channel resources next time.
>> + */
>> + sba_cleanup_inflight_requests(to_sba_device(dchan));
>
> well this one checks for pending requests as well, which shouldn't be there
> when freeing a channel, something seems not quite right here..
>
>> +static int sba_send_mbox_request(struct sba_device *sba,
>> + struct sba_request *req)
>> +{
>> + int mchans_idx, ret = 0;
>> +
>> + /* Select mailbox channel in round-robin fashion */
>> + mchans_idx = atomic_inc_return(&sba->mchans_current);
>> + mchans_idx = mchans_idx % sba->mchans_count;
>> +
>> + /* Send batch message for the request */
>> + req->bmsg.batch.msgs_queued = 0;
>> + ret = mbox_send_message(sba->mchans[mchans_idx], &req->bmsg);
>> + if (ret < 0) {
>> + dev_info(sba->dev, "channel %d message %d (total %d)",
>> + mchans_idx, req->bmsg.batch.msgs_queued,
>> + req->bmsg.batch.msgs_count);
>
> dev_err?

Sure, will use dev_err.

>
>> + dev_err(sba->dev, "send message failed with error %d", ret);
>> + return ret;
>> + }
>> + ret = req->bmsg.error;
>> + if (ret < 0) {
>> + dev_info(sba->dev,
>> + "mbox channel %d message %d (total %d)",
>> + mchans_idx, req->bmsg.batch.msgs_queued,
>> + req->bmsg.batch.msgs_count);
>
> same here

OK.

>
>> +static dma_cookie_t sba_tx_submit(struct dma_async_tx_descriptor *tx)
>> +{
>> + unsigned long flags;
>> + dma_cookie_t cookie;
>> + struct sba_request *req;
>> + struct sba_device *sba;
>> +
>> + if (unlikely(!tx))
>> + return -EINVAL;
>> +
>> + sba = to_sba_device(tx->chan);
>> + req = to_sba_request(tx);
>> +
>> + /* Assign cookie and mark request pending */
>> + spin_lock_irqsave(&sba->reqs_lock, flags);
>> + cookie = dma_cookie_assign(tx);
>> + _sba_pending_request(sba, req);
>> + spin_unlock_irqrestore(&sba->reqs_lock, flags);
>> +
>> + /* Try to submit pending request */
>> + sba_issue_pending(&sba->dma_chan);
>
> Nope, thats wrong, caller needs to call .issue_pending for that

This was giving minor performance improvement but I will
remove this since its against API usage.

>
>> +static enum dma_status sba_tx_status(struct dma_chan *dchan,
>> + dma_cookie_t cookie,
>> + struct dma_tx_state *txstate)
>> +{
>> + int mchan_idx;
>> + enum dma_status ret;
>> + struct sba_device *sba = to_sba_device(dchan);
>> +
>> + ret = dma_cookie_status(dchan, cookie, txstate);
>> + if (ret == DMA_COMPLETE)
>> + return ret;
>> +
>> + for (mchan_idx = 0; mchan_idx < sba->mchans_count; mchan_idx++)
>> + mbox_client_peek_data(sba->mchans[mchan_idx]);
>
> what is this achieving?

The mbox_client_peek_data() is a hint to mailbox controller driver
to check for available messages.

This gives good performance improvement when some DMA client
code is polling using tx_status() callback.

>
>> +static struct dma_async_tx_descriptor *
>> +sba_prep_dma_memcpy(struct dma_chan *dchan, dma_addr_t dst, dma_addr_t src,
>> + size_t len, unsigned long flags)
>> +{
>> + size_t msg_len;
>> + dma_addr_t msg_offset = 0;
>> + unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
>> + struct sba_device *sba = to_sba_device(dchan);
>> + struct sba_request *req = NULL;
>> +
>> + /* Sanity checks */
>> + if (unlikely(len > sba->req_size))
>> + return NULL;
>
> why is that an error, you can create multiple txn of max length

Sure, I will extend driver to create multiple txn when
"len > req->size"

>
>> +static int sba_async_register(struct sba_device *sba)
>> +{
>> + int ret;
>> + struct dma_device *dma_dev = &sba->dma_dev;
>> +
>> + /* Initialize DMA channel cookie */
>> + sba->dma_chan.device = dma_dev;
>> + dma_cookie_init(&sba->dma_chan);
>> +
>> + /* Initialize DMA device capability mask */
>> + dma_cap_zero(dma_dev->cap_mask);
>> + dma_cap_set(DMA_MEMCPY, dma_dev->cap_mask);
>> + dma_cap_set(DMA_XOR, dma_dev->cap_mask);
>> + dma_cap_set(DMA_PQ, dma_dev->cap_mask);
>> +
>> + /*
>> + * Set mailbox channel device as the base device of
>> + * our dma_device because the actual memory accesses
>> + * will be done by mailbox controller
>> + */
>> + dma_dev->dev = sba->mbox_dev;
>> +
>> + /* Set base prep routines */
>> + dma_dev->device_alloc_chan_resources = sba_alloc_chan_resources;
>> + dma_dev->device_free_chan_resources = sba_free_chan_resources;
>> + dma_dev->device_issue_pending = sba_issue_pending;
>> + dma_dev->device_tx_status = sba_tx_status;
>
> Please add terminate callback support, also add the capabilities, we need to
> advertise that and use in clients

OK, I will add terminate callback.

>
> Also you can simplify bunch of code by using virt-chan support for managing
> channels and descriptors

OK, I will surely explore virt-chan.

Regards,
Anup

2017-02-06 16:54:00

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver

On Mon, Feb 06, 2017 at 05:31:15PM +0530, Anup Patel wrote:

> >> +
> >> +/* SBA C_MDATA helper macros */
> >> +#define SBA_C_MDATA_LOAD_VAL(__bnum0) ((__bnum0) & 0x3)
> >> +#define SBA_C_MDATA_WRITE_VAL(__bnum0) ((__bnum0) & 0x3)
> >> +#define SBA_C_MDATA_XOR_VAL(__bnum1, __bnum0) \
> >> + ({ u32 __v = ((__bnum0) & 0x3); \
> >> + __v |= ((__bnum1) & 0x3) << 2; \
> >> + __v; \
> >> + })
> >> +#define SBA_C_MDATA_PQ_VAL(__dnum, __bnum1, __bnum0) \
> >> + ({ u32 __v = ((__bnum0) & 0x3); \
> >> + __v |= ((__bnum1) & 0x3) << 2; \
> >> + __v |= ((__dnum) & 0x1f) << 5; \
> >> + __v; \
> >> + })
> >
> > ah why are we usig complex macros, why can't these be simple functions..
>
> "static inline functions" seemed too complicated here because most of
> these macros are two lines of c-code.

and thats where I have an issue with this. Macros for simple things is fine
but not for couple of line of logic!

>
> Do you still insist on using "static inline functions"?

Yes

>
> >
> >> +#define SBA_C_MDATA_LS(__c_mdata_val) ((__c_mdata_val) & 0xff)
> >> +#define SBA_C_MDATA_MS(__c_mdata_val) (((__c_mdata_val) >> 8) & 0x3)
> >> +
> >> +/* Driver helper macros */
> >> +#define to_sba_request(tx) \
> >> + container_of(tx, struct sba_request, tx)
> >> +#define to_sba_device(dchan) \
> >> + container_of(dchan, struct sba_device, dma_chan)
> >> +
> >> +enum sba_request_state {
> >> + SBA_REQUEST_STATE_FREE = 1,
> >> + SBA_REQUEST_STATE_ALLOCED = 2,
> >> + SBA_REQUEST_STATE_PENDING = 3,
> >> + SBA_REQUEST_STATE_ACTIVE = 4,
> >> + SBA_REQUEST_STATE_COMPLETED = 5,
> >> + SBA_REQUEST_STATE_ABORTED = 6,
> >
> > whats up with a very funny indentation setting, we use 8 chars.
> >
> > Please re-read the Documentation/process/coding-style.rst
>
> I have double checked this enum. The indentation is fine
> and as-per coding style. Am I missing anything else?

Somehow the initial indent doesnt seem to be 8 chars to me.

> >> +static enum dma_status sba_tx_status(struct dma_chan *dchan,
> >> + dma_cookie_t cookie,
> >> + struct dma_tx_state *txstate)
> >> +{
> >> + int mchan_idx;
> >> + enum dma_status ret;
> >> + struct sba_device *sba = to_sba_device(dchan);
> >> +
> >> + ret = dma_cookie_status(dchan, cookie, txstate);
> >> + if (ret == DMA_COMPLETE)
> >> + return ret;
> >> +
> >> + for (mchan_idx = 0; mchan_idx < sba->mchans_count; mchan_idx++)
> >> + mbox_client_peek_data(sba->mchans[mchan_idx]);
> >
> > what is this achieving?
>
> The mbox_client_peek_data() is a hint to mailbox controller driver
> to check for available messages.
>
> This gives good performance improvement when some DMA client
> code is polling using tx_status() callback.

Then why do it before and then check status.

--
~Vinod

2017-02-07 06:02:23

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver

On Mon, Feb 6, 2017 at 10:24 PM, Vinod Koul <[email protected]> wrote:
> On Mon, Feb 06, 2017 at 05:31:15PM +0530, Anup Patel wrote:
>
>> >> +
>> >> +/* SBA C_MDATA helper macros */
>> >> +#define SBA_C_MDATA_LOAD_VAL(__bnum0) ((__bnum0) & 0x3)
>> >> +#define SBA_C_MDATA_WRITE_VAL(__bnum0) ((__bnum0) & 0x3)
>> >> +#define SBA_C_MDATA_XOR_VAL(__bnum1, __bnum0) \
>> >> + ({ u32 __v = ((__bnum0) & 0x3); \
>> >> + __v |= ((__bnum1) & 0x3) << 2; \
>> >> + __v; \
>> >> + })
>> >> +#define SBA_C_MDATA_PQ_VAL(__dnum, __bnum1, __bnum0) \
>> >> + ({ u32 __v = ((__bnum0) & 0x3); \
>> >> + __v |= ((__bnum1) & 0x3) << 2; \
>> >> + __v |= ((__dnum) & 0x1f) << 5; \
>> >> + __v; \
>> >> + })
>> >
>> > ah why are we usig complex macros, why can't these be simple functions..
>>
>> "static inline functions" seemed too complicated here because most of
>> these macros are two lines of c-code.
>
> and thats where I have an issue with this. Macros for simple things is fine
> but not for couple of line of logic!
>
>>
>> Do you still insist on using "static inline functions"?
>
> Yes

Sure, will use "static inline functions" instead these macros.

>
>>
>> >
>> >> +#define SBA_C_MDATA_LS(__c_mdata_val) ((__c_mdata_val) & 0xff)
>> >> +#define SBA_C_MDATA_MS(__c_mdata_val) (((__c_mdata_val) >> 8) & 0x3)
>> >> +
>> >> +/* Driver helper macros */
>> >> +#define to_sba_request(tx) \
>> >> + container_of(tx, struct sba_request, tx)
>> >> +#define to_sba_device(dchan) \
>> >> + container_of(dchan, struct sba_device, dma_chan)
>> >> +
>> >> +enum sba_request_state {
>> >> + SBA_REQUEST_STATE_FREE = 1,
>> >> + SBA_REQUEST_STATE_ALLOCED = 2,
>> >> + SBA_REQUEST_STATE_PENDING = 3,
>> >> + SBA_REQUEST_STATE_ACTIVE = 4,
>> >> + SBA_REQUEST_STATE_COMPLETED = 5,
>> >> + SBA_REQUEST_STATE_ABORTED = 6,
>> >
>> > whats up with a very funny indentation setting, we use 8 chars.
>> >
>> > Please re-read the Documentation/process/coding-style.rst
>>
>> I have double checked this enum. The indentation is fine
>> and as-per coding style. Am I missing anything else?
>
> Somehow the initial indent doesnt seem to be 8 chars to me.
>
>> >> +static enum dma_status sba_tx_status(struct dma_chan *dchan,
>> >> + dma_cookie_t cookie,
>> >> + struct dma_tx_state *txstate)
>> >> +{
>> >> + int mchan_idx;
>> >> + enum dma_status ret;
>> >> + struct sba_device *sba = to_sba_device(dchan);
>> >> +
>> >> + ret = dma_cookie_status(dchan, cookie, txstate);
>> >> + if (ret == DMA_COMPLETE)
>> >> + return ret;
>> >> +
>> >> + for (mchan_idx = 0; mchan_idx < sba->mchans_count; mchan_idx++)
>> >> + mbox_client_peek_data(sba->mchans[mchan_idx]);
>> >
>> > what is this achieving?
>>
>> The mbox_client_peek_data() is a hint to mailbox controller driver
>> to check for available messages.
>>
>> This gives good performance improvement when some DMA client
>> code is polling using tx_status() callback.
>
> Then why do it before and then check status.

If there was a work completed when mbox_client_peek_data()
is called then sba_receive_message() will be called immediately
by mailbox controller driver.

We are doing dma_cookie_complete() in sba_receive_message()
so if mbox_client_peek_data() is called before dma_cookie_status()
then dma_cookie_status() will see correct state of cookie.

Also, I explored virt-dma APIs for BCM-SBA-RAID driver. The virt-dma
implements tasklet based bottom-half for each virt-dma-channel. This
bottom-half is not required for BCM-FS4-RAID driver because its a
mailbox client driver and the mailbox controller driver already implements
bottom-half for each mailbox channel.

If we still go ahead and use virt-dma in BCM-FS4-RAID driver then we
will have two bottom-halfs in-action one in mailbox controller driver and
another in BCM-FS4-RAID driver which in-turn will add bottom-half
scheduling overhead thereby reducing performance of BCM-FS4-RAID
driver.

Regards,
Anup