2015-05-12 15:37:35

by Maxime Ripard

Subject: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

Hi,

This series refactors the mv_xor driver in order to support the latest
Armada 38x features, including PQ support to offload the RAID6 PQ
operations.

Not all of the PQ operations are supported by the XOR engine, so we had
to introduce new async_tx flags in the process to identify unsupported
operations.

Please note that this is currently not usable because of a possible
regression in the RAID stack in 4.1 that is being discussed at the
moment here: https://lkml.org/lkml/2015/5/7/527

Let me know what you think,
Maxime

Lior Amsalem (7):
dmaengine: mv_xor: add support for a38x command in descriptor mode
dmaengine: mv_xor: Enlarge descriptor pool size
dmaengine: mv_xor: improve descriptors list handling and reduce
locking
dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup
async_tx: adding mult and sum_product flags
dmaengine: mv_xor: add support for a38x RAID6 support
ARM: mvebu: a38x: Enable A38x XOR engine features

Maxime Ripard (1):
dmaengine: mv_xor: Rename function for consistent naming

Documentation/devicetree/bindings/dma/mv-xor.txt | 2 +-
arch/arm/boot/dts/armada-38x.dtsi | 20 +-
crypto/async_tx/async_raid6_recov.c | 4 +-
drivers/dma/mv_xor.c | 459 +++++++++++++++--------
drivers/dma/mv_xor.h | 32 +-
include/linux/dmaengine.h | 4 +
6 files changed, 326 insertions(+), 195 deletions(-)

--
2.4.0


2015-05-12 15:40:04

by Maxime Ripard

Subject: [PATCH 1/8] dmaengine: mv_xor: Rename function for consistent naming

The current function names aren't very consistent, and functions with
the same prefix might operate on either a channel or a descriptor, which
is confusing.

Rename these functions to have a consistent and clearer naming scheme.

Signed-off-by: Maxime Ripard <[email protected]>
---
drivers/dma/mv_xor.c | 83 ++++++++++++++++++++++++++--------------------------
1 file changed, 42 insertions(+), 41 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 1c56001df676..1affafa888c5 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -104,7 +104,7 @@ static u32 mv_chan_get_intr_cause(struct mv_xor_chan *chan)
return intr_cause;
}

-static void mv_xor_device_clear_eoc_cause(struct mv_xor_chan *chan)
+static void mv_chan_clear_eoc_cause(struct mv_xor_chan *chan)
{
u32 val;

@@ -114,14 +114,14 @@ static void mv_xor_device_clear_eoc_cause(struct mv_xor_chan *chan)
writel_relaxed(val, XOR_INTR_CAUSE(chan));
}

-static void mv_xor_device_clear_err_status(struct mv_xor_chan *chan)
+static void mv_chan_clear_err_status(struct mv_xor_chan *chan)
{
u32 val = 0xFFFF0000 >> (chan->idx * 16);
writel_relaxed(val, XOR_INTR_CAUSE(chan));
}

-static void mv_set_mode(struct mv_xor_chan *chan,
- enum dma_transaction_type type)
+static void mv_chan_set_mode(struct mv_xor_chan *chan,
+ enum dma_transaction_type type)
{
u32 op_mode;
u32 config = readl_relaxed(XOR_CONFIG(chan));
@@ -172,12 +172,12 @@ static char mv_chan_is_busy(struct mv_xor_chan *chan)
}

/**
- * mv_xor_free_slots - flags descriptor slots for reuse
+ * mv_chan_free_slots - flags descriptor slots for reuse
* @slot: Slot to free
* Caller must hold &mv_chan->lock while calling this function
*/
-static void mv_xor_free_slots(struct mv_xor_chan *mv_chan,
- struct mv_xor_desc_slot *slot)
+static void mv_chan_free_slots(struct mv_xor_chan *mv_chan,
+ struct mv_xor_desc_slot *slot)
{
dev_dbg(mv_chan_to_devp(mv_chan), "%s %d slot %p\n",
__func__, __LINE__, slot);
@@ -187,12 +187,12 @@ static void mv_xor_free_slots(struct mv_xor_chan *mv_chan,
}

/*
- * mv_xor_start_new_chain - program the engine to operate on new chain headed by
- * sw_desc
+ * mv_chan_start_new_chain - program the engine to operate on new
+ * chain headed by sw_desc
* Caller must hold &mv_chan->lock while calling this function
*/
-static void mv_xor_start_new_chain(struct mv_xor_chan *mv_chan,
- struct mv_xor_desc_slot *sw_desc)
+static void mv_chan_start_new_chain(struct mv_xor_chan *mv_chan,
+ struct mv_xor_desc_slot *sw_desc)
{
dev_dbg(mv_chan_to_devp(mv_chan), "%s %d: sw_desc %p\n",
__func__, __LINE__, sw_desc);
@@ -205,8 +205,9 @@ static void mv_xor_start_new_chain(struct mv_xor_chan *mv_chan,
}

static dma_cookie_t
-mv_xor_run_tx_complete_actions(struct mv_xor_desc_slot *desc,
- struct mv_xor_chan *mv_chan, dma_cookie_t cookie)
+mv_desc_run_tx_complete_actions(struct mv_xor_desc_slot *desc,
+ struct mv_xor_chan *mv_chan,
+ dma_cookie_t cookie)
{
BUG_ON(desc->async_tx.cookie < 0);

@@ -230,7 +231,7 @@ mv_xor_run_tx_complete_actions(struct mv_xor_desc_slot *desc,
}

static int
-mv_xor_clean_completed_slots(struct mv_xor_chan *mv_chan)
+mv_chan_clean_completed_slots(struct mv_xor_chan *mv_chan)
{
struct mv_xor_desc_slot *iter, *_iter;

@@ -240,15 +241,15 @@ mv_xor_clean_completed_slots(struct mv_xor_chan *mv_chan)

if (async_tx_test_ack(&iter->async_tx)) {
list_del(&iter->completed_node);
- mv_xor_free_slots(mv_chan, iter);
+ mv_chan_free_slots(mv_chan, iter);
}
}
return 0;
}

static int
-mv_xor_clean_slot(struct mv_xor_desc_slot *desc,
- struct mv_xor_chan *mv_chan)
+mv_desc_clean_slot(struct mv_xor_desc_slot *desc,
+ struct mv_xor_chan *mv_chan)
{
dev_dbg(mv_chan_to_devp(mv_chan), "%s %d: desc %p flags %d\n",
__func__, __LINE__, desc, desc->async_tx.flags);
@@ -262,12 +263,12 @@ mv_xor_clean_slot(struct mv_xor_desc_slot *desc,
return 0;
}

- mv_xor_free_slots(mv_chan, desc);
+ mv_chan_free_slots(mv_chan, desc);
return 0;
}

/* This function must be called with the mv_xor_chan spinlock held */
-static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
+static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
{
struct mv_xor_desc_slot *iter, *_iter;
dma_cookie_t cookie = 0;
@@ -277,7 +278,7 @@ static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)

dev_dbg(mv_chan_to_devp(mv_chan), "%s %d\n", __func__, __LINE__);
dev_dbg(mv_chan_to_devp(mv_chan), "current_desc %x\n", current_desc);
- mv_xor_clean_completed_slots(mv_chan);
+ mv_chan_clean_completed_slots(mv_chan);

/* free completed slots from the chain starting with
* the oldest descriptor
@@ -304,9 +305,9 @@ static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
break;
}

- cookie = mv_xor_run_tx_complete_actions(iter, mv_chan, cookie);
+ cookie = mv_desc_run_tx_complete_actions(iter, mv_chan, cookie);

- if (mv_xor_clean_slot(iter, mv_chan))
+ if (mv_desc_clean_slot(iter, mv_chan))
break;
}

@@ -316,7 +317,7 @@ static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
struct mv_xor_desc_slot,
chain_node);

- mv_xor_start_new_chain(mv_chan, chain_head);
+ mv_chan_start_new_chain(mv_chan, chain_head);
}

if (cookie > 0)
@@ -328,12 +329,12 @@ static void mv_xor_tasklet(unsigned long data)
struct mv_xor_chan *chan = (struct mv_xor_chan *) data;

spin_lock_bh(&chan->lock);
- mv_xor_slot_cleanup(chan);
+ mv_chan_slot_cleanup(chan);
spin_unlock_bh(&chan->lock);
}

static struct mv_xor_desc_slot *
-mv_xor_alloc_slot(struct mv_xor_chan *mv_chan)
+mv_chan_alloc_slot(struct mv_xor_chan *mv_chan)
{
struct mv_xor_desc_slot *iter, *_iter;
int retry = 0;
@@ -431,7 +432,7 @@ mv_xor_tx_submit(struct dma_async_tx_descriptor *tx)
}

if (new_hw_chain)
- mv_xor_start_new_chain(mv_chan, sw_desc);
+ mv_chan_start_new_chain(mv_chan, sw_desc);

spin_unlock_bh(&mv_chan->lock);

@@ -504,7 +505,7 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
__func__, src_cnt, len, &dest, flags);

spin_lock_bh(&mv_chan->lock);
- sw_desc = mv_xor_alloc_slot(mv_chan);
+ sw_desc = mv_chan_alloc_slot(mv_chan);
if (sw_desc) {
sw_desc->type = DMA_XOR;
sw_desc->async_tx.flags = flags;
@@ -556,7 +557,7 @@ static void mv_xor_free_chan_resources(struct dma_chan *chan)

spin_lock_bh(&mv_chan->lock);

- mv_xor_slot_cleanup(mv_chan);
+ mv_chan_slot_cleanup(mv_chan);

list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
chain_node) {
@@ -603,13 +604,13 @@ static enum dma_status mv_xor_status(struct dma_chan *chan,
return ret;

spin_lock_bh(&mv_chan->lock);
- mv_xor_slot_cleanup(mv_chan);
+ mv_chan_slot_cleanup(mv_chan);
spin_unlock_bh(&mv_chan->lock);

return dma_cookie_status(chan, cookie, txstate);
}

-static void mv_dump_xor_regs(struct mv_xor_chan *chan)
+static void mv_chan_dump_regs(struct mv_xor_chan *chan)
{
u32 val;

@@ -632,8 +633,8 @@ static void mv_dump_xor_regs(struct mv_xor_chan *chan)
dev_err(mv_chan_to_devp(chan), "error addr 0x%08x\n", val);
}

-static void mv_xor_err_interrupt_handler(struct mv_xor_chan *chan,
- u32 intr_cause)
+static void mv_chan_err_interrupt_handler(struct mv_xor_chan *chan,
+ u32 intr_cause)
{
if (intr_cause & XOR_INT_ERR_DECODE) {
dev_dbg(mv_chan_to_devp(chan), "ignoring address decode error\n");
@@ -643,7 +644,7 @@ static void mv_xor_err_interrupt_handler(struct mv_xor_chan *chan,
dev_err(mv_chan_to_devp(chan), "error on chan %d. intr cause 0x%08x\n",
chan->idx, intr_cause);

- mv_dump_xor_regs(chan);
+ mv_chan_dump_regs(chan);
WARN_ON(1);
}

@@ -655,11 +656,11 @@ static irqreturn_t mv_xor_interrupt_handler(int irq, void *data)
dev_dbg(mv_chan_to_devp(chan), "intr cause %x\n", intr_cause);

if (intr_cause & XOR_INTR_ERRORS)
- mv_xor_err_interrupt_handler(chan, intr_cause);
+ mv_chan_err_interrupt_handler(chan, intr_cause);

tasklet_schedule(&chan->irq_tasklet);

- mv_xor_device_clear_eoc_cause(chan);
+ mv_chan_clear_eoc_cause(chan);

return IRQ_HANDLED;
}
@@ -678,7 +679,7 @@ static void mv_xor_issue_pending(struct dma_chan *chan)
* Perform a transaction to verify the HW works.
*/

-static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan)
+static int mv_chan_memcpy_self_test(struct mv_xor_chan *mv_chan)
{
int i, ret;
void *src, *dest;
@@ -787,7 +788,7 @@ out:

#define MV_XOR_NUM_SRC_TEST 4 /* must be <= 15 */
static int
-mv_xor_xor_self_test(struct mv_xor_chan *mv_chan)
+mv_chan_xor_self_test(struct mv_xor_chan *mv_chan)
{
int i, src_idx, ret;
struct page *dest;
@@ -1014,7 +1015,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
mv_chan);

/* clear errors before enabling interrupts */
- mv_xor_device_clear_err_status(mv_chan);
+ mv_chan_clear_err_status(mv_chan);

ret = request_irq(mv_chan->irq, mv_xor_interrupt_handler,
0, dev_name(&pdev->dev), mv_chan);
@@ -1023,7 +1024,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,

mv_chan_unmask_interrupts(mv_chan);

- mv_set_mode(mv_chan, DMA_XOR);
+ mv_chan_set_mode(mv_chan, DMA_XOR);

spin_lock_init(&mv_chan->lock);
INIT_LIST_HEAD(&mv_chan->chain);
@@ -1035,14 +1036,14 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
list_add_tail(&mv_chan->dmachan.device_node, &dma_dev->channels);

if (dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask)) {
- ret = mv_xor_memcpy_self_test(mv_chan);
+ ret = mv_chan_memcpy_self_test(mv_chan);
dev_dbg(&pdev->dev, "memcpy self test returned %d\n", ret);
if (ret)
goto err_free_irq;
}

if (dma_has_cap(DMA_XOR, dma_dev->cap_mask)) {
- ret = mv_xor_xor_self_test(mv_chan);
+ ret = mv_chan_xor_self_test(mv_chan);
dev_dbg(&pdev->dev, "xor self test returned %d\n", ret);
if (ret)
goto err_free_irq;
--
2.4.0

2015-05-12 15:37:37

by Maxime Ripard

Subject: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

From: Lior Amsalem <[email protected]>

The Marvell Armada 38x SoC introduces new features to the XOR engine,
most notably the ability to set the engine mode (MEMCPY/XOR/PQ/etc.) in
each descriptor rather than through the controller registers.

This new feature allows mixing of different commands (even PQ) on the same
channel/chain without the need to stop the engine to reconfigure the engine
mode.

Refactor the driver to be able to use that new feature on the Armada 38x,
while keeping the old behaviour on the older SoCs.
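
To make the idea concrete, here is a condensed, non-buildable sketch of
what the diff below does: the channel is put into "in descriptor" mode
once, and each descriptor then carries its own operation in the upper
bits of its command word.

	/* Channel setup (done once), condensed from the diff below: */
	if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
		mv_chan_set_mode_to_desc(mv_chan);   /* op-mode field = XOR_OPERATION_MODE_IN_DESC */
	else
		mv_chan_set_mode(mv_chan, DMA_XOR);  /* legacy: op fixed in XOR_CONFIG */

	/* Per descriptor, only meaningful in XOR_MODE_IN_DESC: */
	switch (desc->type) {
	case DMA_XOR:
	case DMA_INTERRUPT:
		hw_desc->desc_command |= XOR_DESC_OPERATION_XOR;    /* 0 << 24 */
		break;
	case DMA_MEMCPY:
		hw_desc->desc_command |= XOR_DESC_OPERATION_MEMCPY; /* 2 << 24 */
		break;
	}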

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Ofer Heifetz <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
---
Documentation/devicetree/bindings/dma/mv-xor.txt | 2 +-
drivers/dma/mv_xor.c | 83 ++++++++++++++++++++----
drivers/dma/mv_xor.h | 7 ++
3 files changed, 78 insertions(+), 14 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/mv-xor.txt b/Documentation/devicetree/bindings/dma/mv-xor.txt
index 7c6cb7fcecd2..5b10f9ee0937 100644
--- a/Documentation/devicetree/bindings/dma/mv-xor.txt
+++ b/Documentation/devicetree/bindings/dma/mv-xor.txt
@@ -1,7 +1,7 @@
* Marvell XOR engines

Required properties:
-- compatible: Should be "marvell,orion-xor"
+- compatible: Should be "marvell,orion-xor" or "marvell,a38x-xor"
- reg: Should contain registers location and length (two sets)
the first set is the low registers, the second set the high
registers for the XOR engine.
diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 1affafa888c5..bcc9aec0ce02 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -19,6 +19,7 @@
#include <linux/dma-mapping.h>
#include <linux/spinlock.h>
#include <linux/interrupt.h>
+#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/memory.h>
#include <linux/clk.h>
@@ -30,6 +31,11 @@
#include "dmaengine.h"
#include "mv_xor.h"

+enum mv_xor_mode {
+ XOR_MODE_IN_REG,
+ XOR_MODE_IN_DESC,
+};
+
static void mv_xor_issue_pending(struct dma_chan *chan);

#define to_mv_xor_chan(chan) \
@@ -56,6 +62,24 @@ static void mv_desc_init(struct mv_xor_desc_slot *desc,
hw_desc->byte_count = byte_count;
}

+static void mv_desc_set_mode(struct mv_xor_desc_slot *desc)
+{
+ struct mv_xor_desc *hw_desc = desc->hw_desc;
+
+ switch (desc->type) {
+ case DMA_XOR:
+ case DMA_INTERRUPT:
+ hw_desc->desc_command |= XOR_DESC_OPERATION_XOR;
+ break;
+ case DMA_MEMCPY:
+ hw_desc->desc_command |= XOR_DESC_OPERATION_MEMCPY;
+ break;
+ default:
+ BUG();
+ return;
+ }
+}
+
static void mv_desc_set_next_desc(struct mv_xor_desc_slot *desc,
u32 next_desc_addr)
{
@@ -154,6 +178,25 @@ static void mv_chan_set_mode(struct mv_xor_chan *chan,
chan->current_type = type;
}

+static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
+{
+ u32 op_mode;
+ u32 config = readl_relaxed(XOR_CONFIG(chan));
+
+ op_mode = XOR_OPERATION_MODE_IN_DESC;
+
+ config &= ~0x7;
+ config |= op_mode;
+
+#if defined(__BIG_ENDIAN)
+ config |= XOR_DESCRIPTOR_SWAP;
+#else
+ config &= ~XOR_DESCRIPTOR_SWAP;
+#endif
+
+ writel_relaxed(config, XOR_CONFIG(chan));
+}
+
static void mv_chan_activate(struct mv_xor_chan *chan)
{
dev_dbg(mv_chan_to_devp(chan), " activate chan.\n");
@@ -510,6 +553,8 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
sw_desc->type = DMA_XOR;
sw_desc->async_tx.flags = flags;
mv_desc_init(sw_desc, dest, len, flags);
+ if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
+ mv_desc_set_mode(sw_desc);
while (src_cnt--)
mv_desc_set_src_addr(sw_desc, src_cnt, src[src_cnt]);
}
@@ -952,7 +997,7 @@ static int mv_xor_channel_remove(struct mv_xor_chan *mv_chan)
static struct mv_xor_chan *
mv_xor_channel_add(struct mv_xor_device *xordev,
struct platform_device *pdev,
- int idx, dma_cap_mask_t cap_mask, int irq)
+ int idx, dma_cap_mask_t cap_mask, int irq, int op_in_desc)
{
int ret = 0;
struct mv_xor_chan *mv_chan;
@@ -964,6 +1009,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,

mv_chan->idx = idx;
mv_chan->irq = irq;
+ mv_chan->op_in_desc = op_in_desc;

dma_dev = &mv_chan->dmadev;

@@ -1024,7 +1070,10 @@ mv_xor_channel_add(struct mv_xor_device *xordev,

mv_chan_unmask_interrupts(mv_chan);

- mv_chan_set_mode(mv_chan, DMA_XOR);
+ if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
+ mv_chan_set_mode_to_desc(mv_chan);
+ else
+ mv_chan_set_mode(mv_chan, DMA_XOR);

spin_lock_init(&mv_chan->lock);
INIT_LIST_HEAD(&mv_chan->chain);
@@ -1049,7 +1098,8 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
goto err_free_irq;
}

- dev_info(&pdev->dev, "Marvell XOR: ( %s%s%s)\n",
+ dev_info(&pdev->dev, "Marvell XOR (%s): ( %s%s%s)\n",
+ mv_chan->op_in_desc ? "Descriptor Mode" : "Registers Mode",
dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask) ? "cpy " : "",
dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask) ? "intr " : "");
@@ -1098,6 +1148,15 @@ mv_xor_conf_mbus_windows(struct mv_xor_device *xordev,
writel(0, base + WINDOW_OVERRIDE_CTRL(1));
}

+#ifdef CONFIG_OF
+static const struct of_device_id mv_xor_dt_ids[] = {
+ { .compatible = "marvell,orion-xor", .data = (void *)XOR_MODE_IN_REG },
+ { .compatible = "marvell,a38x-xor", .data = (void *)XOR_MODE_IN_DESC },
+ {},
+};
+MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
+#endif
+
static int mv_xor_probe(struct platform_device *pdev)
{
const struct mbus_dram_target_info *dram;
@@ -1105,6 +1164,7 @@ static int mv_xor_probe(struct platform_device *pdev)
struct mv_xor_platform_data *pdata = dev_get_platdata(&pdev->dev);
struct resource *res;
int i, ret;
+ int op_in_desc;

dev_notice(&pdev->dev, "Marvell shared XOR driver\n");

@@ -1149,11 +1209,15 @@ static int mv_xor_probe(struct platform_device *pdev)
if (pdev->dev.of_node) {
struct device_node *np;
int i = 0;
+ const struct of_device_id *of_id =
+ of_match_device(of_match_ptr(mv_xor_dt_ids),
+ &pdev->dev);

for_each_child_of_node(pdev->dev.of_node, np) {
struct mv_xor_chan *chan;
dma_cap_mask_t cap_mask;
int irq;
+ op_in_desc = (int)of_id->data;

dma_cap_zero(cap_mask);
if (of_property_read_bool(np, "dmacap,memcpy"))
@@ -1170,7 +1234,7 @@ static int mv_xor_probe(struct platform_device *pdev)
}

chan = mv_xor_channel_add(xordev, pdev, i,
- cap_mask, irq);
+ cap_mask, irq, op_in_desc);
if (IS_ERR(chan)) {
ret = PTR_ERR(chan);
irq_dispose_mapping(irq);
@@ -1199,7 +1263,8 @@ static int mv_xor_probe(struct platform_device *pdev)
}

chan = mv_xor_channel_add(xordev, pdev, i,
- cd->cap_mask, irq);
+ cd->cap_mask, irq,
+ XOR_MODE_IN_REG);
if (IS_ERR(chan)) {
ret = PTR_ERR(chan);
goto err_channel_add;
@@ -1245,14 +1310,6 @@ static int mv_xor_remove(struct platform_device *pdev)
return 0;
}

-#ifdef CONFIG_OF
-static const struct of_device_id mv_xor_dt_ids[] = {
- { .compatible = "marvell,orion-xor", },
- {},
-};
-MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
-#endif
-
static struct platform_driver mv_xor_driver = {
.probe = mv_xor_probe,
.remove = mv_xor_remove,
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 91958dba39a2..89a8c85539a1 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -30,8 +30,13 @@
/* Values for the XOR_CONFIG register */
#define XOR_OPERATION_MODE_XOR 0
#define XOR_OPERATION_MODE_MEMCPY 2
+#define XOR_OPERATION_MODE_IN_DESC 7
#define XOR_DESCRIPTOR_SWAP BIT(14)

+#define XOR_DESC_OPERATION_XOR (0 << 24)
+#define XOR_DESC_OPERATION_CRC32C (1 << 24)
+#define XOR_DESC_OPERATION_MEMCPY (2 << 24)
+
#define XOR_DESC_DMA_OWNED BIT(31)
#define XOR_DESC_EOD_INT_EN BIT(31)

@@ -95,6 +100,7 @@ struct mv_xor_device {
* @all_slots: complete domain of slots usable by the channel
* @slots_allocated: records the actual size of the descriptor slot pool
* @irq_tasklet: bottom half where mv_xor_slot_cleanup runs
+ * @op_in_desc: new mode of driver, each op is writen to descriptor.
*/
struct mv_xor_chan {
int pending;
@@ -115,6 +121,7 @@ struct mv_xor_chan {
struct list_head all_slots;
int slots_allocated;
struct tasklet_struct irq_tasklet;
+ int op_in_desc;
char dummy_src[MV_XOR_MIN_BYTE_COUNT];
char dummy_dst[MV_XOR_MIN_BYTE_COUNT];
dma_addr_t dummy_src_addr, dummy_dst_addr;
--
2.4.0

2015-05-12 15:40:15

by Maxime Ripard

Subject: [PATCH 6/8] async_tx: adding mult and sum_product flags

From: Lior Amsalem <[email protected]>

Some engines (like the Marvell mv_xor) do not support mult and
sum_product operations as part of their PQ support.

This patch adds new flags, DMA_PREP_PQ_MULT and DMA_PREP_PQ_SUM_PRODUCT,
which help the driver identify such operations.
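
For illustration, a hedged sketch of how these flags are meant to flow,
condensed from this patch and from the mv_xor change later in the series
(not a standalone diff): async_tx marks the request on the prep call,
and a driver whose hardware cannot honour it simply declines, so the
software fallback runs instead.

	/* async_mult() marks the request (this patch): */
	enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;

	/* ...and a ->device_prep_dma_pq() that cannot handle it bails out,
	 * as mv_xor does later in the series, letting async_tx fall back
	 * to the synchronous path:
	 */
	if ((flags & DMA_PREP_PQ_MULT) || (flags & DMA_PREP_PQ_SUM_PRODUCT))
		return NULL;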

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Ofer Heifetz <[email protected]>
Reviewed-by: Nadav Haklai <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
---
crypto/async_tx/async_raid6_recov.c | 4 ++--
include/linux/dmaengine.h | 4 ++++
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 934a84981495..2db5486fd873 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -47,7 +47,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
struct device *dev = dma->dev;
dma_addr_t pq[2];
struct dma_async_tx_descriptor *tx;
- enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
+ enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_SUM_PRODUCT;

if (submit->flags & ASYNC_TX_FENCE)
dma_flags |= DMA_PREP_FENCE;
@@ -111,7 +111,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
dma_addr_t dma_dest[2];
struct device *dev = dma->dev;
struct dma_async_tx_descriptor *tx;
- enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
+ enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;

if (submit->flags & ASYNC_TX_FENCE)
dma_flags |= DMA_PREP_FENCE;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index ad419757241f..f19ecebb4d3f 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -174,6 +174,8 @@ struct dma_interleaved_template {
* operation it continues the calculation with new sources
* @DMA_PREP_FENCE - tell the driver that subsequent operations depend
* on the result of this operation
+ * @DMA_PREP_PQ_MULT - tell the driver that this is a mult request
+ * @DMA_PREP_PQ_SUM_PRODUCT - tell the driver that this is a sum product request
*/
enum dma_ctrl_flags {
DMA_PREP_INTERRUPT = (1 << 0),
@@ -182,6 +184,8 @@ enum dma_ctrl_flags {
DMA_PREP_PQ_DISABLE_Q = (1 << 3),
DMA_PREP_CONTINUE = (1 << 4),
DMA_PREP_FENCE = (1 << 5),
+ DMA_PREP_PQ_MULT = (1 << 10),
+ DMA_PREP_PQ_SUM_PRODUCT = (1 << 11),
};

/**
--
2.4.0

2015-05-12 15:37:39

by Maxime Ripard

Subject: [PATCH 4/8] dmaengine: mv_xor: improve descriptors list handling and reduce locking

From: Lior Amsalem <[email protected]>

This patch changes the way free descriptors are tracked.

Instead of having a per-descriptor "in use" field, all the descriptors
sitting on the channel's free list are available for use.

This simplifies the allocation path and reduces the locking needed.
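
Condensed from the diff below (illustrative only), the allocation path
becomes a simple list pop under the channel lock instead of a retried
scan over all the slots:

	spin_lock_bh(&mv_chan->lock);
	if (!list_empty(&mv_chan->free_slots)) {
		iter = list_first_entry(&mv_chan->free_slots,
					struct mv_xor_desc_slot, node);
		list_move_tail(&iter->node, &mv_chan->allocated_slots);
		spin_unlock_bh(&mv_chan->lock);

		async_tx_ack(&iter->async_tx);		/* pre-ack */
		iter->async_tx.cookie = -EBUSY;
		return iter;
	}
	spin_unlock_bh(&mv_chan->lock);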

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Ofer Heifetz <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
---
drivers/dma/mv_xor.c | 131 +++++++++++++++++----------------------------------
drivers/dma/mv_xor.h | 17 +++----
2 files changed, 48 insertions(+), 100 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index bcc9aec0ce02..d1110442f0d2 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -88,12 +88,6 @@ static void mv_desc_set_next_desc(struct mv_xor_desc_slot *desc,
hw_desc->phy_next_desc = next_desc_addr;
}

-static void mv_desc_clear_next_desc(struct mv_xor_desc_slot *desc)
-{
- struct mv_xor_desc *hw_desc = desc->hw_desc;
- hw_desc->phy_next_desc = 0;
-}
-
static void mv_desc_set_src_addr(struct mv_xor_desc_slot *desc,
int index, dma_addr_t addr)
{
@@ -214,21 +208,6 @@ static char mv_chan_is_busy(struct mv_xor_chan *chan)
return (state == 1) ? 1 : 0;
}

-/**
- * mv_chan_free_slots - flags descriptor slots for reuse
- * @slot: Slot to free
- * Caller must hold &mv_chan->lock while calling this function
- */
-static void mv_chan_free_slots(struct mv_xor_chan *mv_chan,
- struct mv_xor_desc_slot *slot)
-{
- dev_dbg(mv_chan_to_devp(mv_chan), "%s %d slot %p\n",
- __func__, __LINE__, slot);
-
- slot->slot_used = 0;
-
-}
-
/*
* mv_chan_start_new_chain - program the engine to operate on new
* chain headed by sw_desc
@@ -280,12 +259,10 @@ mv_chan_clean_completed_slots(struct mv_xor_chan *mv_chan)

dev_dbg(mv_chan_to_devp(mv_chan), "%s %d\n", __func__, __LINE__);
list_for_each_entry_safe(iter, _iter, &mv_chan->completed_slots,
- completed_node) {
+ node) {

- if (async_tx_test_ack(&iter->async_tx)) {
- list_del(&iter->completed_node);
- mv_chan_free_slots(mv_chan, iter);
- }
+ if (async_tx_test_ack(&iter->async_tx))
+ list_move_tail(&iter->node, &mv_chan->free_slots);
}
return 0;
}
@@ -296,17 +273,16 @@ mv_desc_clean_slot(struct mv_xor_desc_slot *desc,
{
dev_dbg(mv_chan_to_devp(mv_chan), "%s %d: desc %p flags %d\n",
__func__, __LINE__, desc, desc->async_tx.flags);
- list_del(&desc->chain_node);
+
/* the client is allowed to attach dependent operations
* until 'ack' is set
*/
- if (!async_tx_test_ack(&desc->async_tx)) {
+ if (!async_tx_test_ack(&desc->async_tx))
/* move this slot to the completed_slots */
- list_add_tail(&desc->completed_node, &mv_chan->completed_slots);
- return 0;
- }
+ list_move_tail(&desc->node, &mv_chan->completed_slots);
+ else
+ list_move_tail(&desc->node, &mv_chan->free_slots);

- mv_chan_free_slots(mv_chan, desc);
return 0;
}

@@ -328,7 +304,7 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
*/

list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
- chain_node) {
+ node) {
prefetch(_iter);
prefetch(&_iter->async_tx);

@@ -358,7 +334,7 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
struct mv_xor_desc_slot *chain_head;
chain_head = list_entry(mv_chan->chain.next,
struct mv_xor_desc_slot,
- chain_node);
+ node);

mv_chan_start_new_chain(mv_chan, chain_head);
}
@@ -379,49 +355,28 @@ static void mv_xor_tasklet(unsigned long data)
static struct mv_xor_desc_slot *
mv_chan_alloc_slot(struct mv_xor_chan *mv_chan)
{
- struct mv_xor_desc_slot *iter, *_iter;
- int retry = 0;
+ struct mv_xor_desc_slot *iter;

- /* start search from the last allocated descrtiptor
- * if a contiguous allocation can not be found start searching
- * from the beginning of the list
- */
-retry:
- if (retry == 0)
- iter = mv_chan->last_used;
- else
- iter = list_entry(&mv_chan->all_slots,
- struct mv_xor_desc_slot,
- slot_node);
+ spin_lock_bh(&mv_chan->lock);

- list_for_each_entry_safe_continue(
- iter, _iter, &mv_chan->all_slots, slot_node) {
+ if (!list_empty(&mv_chan->free_slots)) {
+ iter = list_first_entry(&mv_chan->free_slots,
+ struct mv_xor_desc_slot,
+ node);

- prefetch(_iter);
- prefetch(&_iter->async_tx);
- if (iter->slot_used) {
- /* give up after finding the first busy slot
- * on the second pass through the list
- */
- if (retry)
- break;
- continue;
- }
+ list_move_tail(&iter->node, &mv_chan->allocated_slots);
+
+ spin_unlock_bh(&mv_chan->lock);

/* pre-ack descriptor */
async_tx_ack(&iter->async_tx);
-
- iter->slot_used = 1;
- INIT_LIST_HEAD(&iter->chain_node);
iter->async_tx.cookie = -EBUSY;
- mv_chan->last_used = iter;
- mv_desc_clear_next_desc(iter);

return iter;

}
- if (!retry++)
- goto retry;
+
+ spin_unlock_bh(&mv_chan->lock);

/* try to free some slots if the allocation fails */
tasklet_schedule(&mv_chan->irq_tasklet);
@@ -447,14 +402,14 @@ mv_xor_tx_submit(struct dma_async_tx_descriptor *tx)
cookie = dma_cookie_assign(tx);

if (list_empty(&mv_chan->chain))
- list_add_tail(&sw_desc->chain_node, &mv_chan->chain);
+ list_move_tail(&sw_desc->node, &mv_chan->chain);
else {
new_hw_chain = 0;

old_chain_tail = list_entry(mv_chan->chain.prev,
struct mv_xor_desc_slot,
- chain_node);
- list_add_tail(&sw_desc->chain_node, &mv_chan->chain);
+ node);
+ list_move_tail(&sw_desc->node, &mv_chan->chain);

dev_dbg(mv_chan_to_devp(mv_chan), "Append to last desc %pa\n",
&old_chain_tail->async_tx.phys);
@@ -507,26 +462,20 @@ static int mv_xor_alloc_chan_resources(struct dma_chan *chan)

dma_async_tx_descriptor_init(&slot->async_tx, chan);
slot->async_tx.tx_submit = mv_xor_tx_submit;
- INIT_LIST_HEAD(&slot->chain_node);
- INIT_LIST_HEAD(&slot->slot_node);
+ INIT_LIST_HEAD(&slot->node);
dma_desc = mv_chan->dma_desc_pool;
slot->async_tx.phys = dma_desc + idx * MV_XOR_SLOT_SIZE;
slot->idx = idx++;

spin_lock_bh(&mv_chan->lock);
mv_chan->slots_allocated = idx;
- list_add_tail(&slot->slot_node, &mv_chan->all_slots);
+ list_add_tail(&slot->node, &mv_chan->free_slots);
spin_unlock_bh(&mv_chan->lock);
}

- if (mv_chan->slots_allocated && !mv_chan->last_used)
- mv_chan->last_used = list_entry(mv_chan->all_slots.next,
- struct mv_xor_desc_slot,
- slot_node);
-
dev_dbg(mv_chan_to_devp(mv_chan),
- "allocated %d descriptor slots last_used: %p\n",
- mv_chan->slots_allocated, mv_chan->last_used);
+ "allocated %d descriptor slots\n",
+ mv_chan->slots_allocated);

return mv_chan->slots_allocated ? : -ENOMEM;
}
@@ -547,7 +496,6 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
"%s src_cnt: %d len: %u dest %pad flags: %ld\n",
__func__, src_cnt, len, &dest, flags);

- spin_lock_bh(&mv_chan->lock);
sw_desc = mv_chan_alloc_slot(mv_chan);
if (sw_desc) {
sw_desc->type = DMA_XOR;
@@ -558,7 +506,7 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
while (src_cnt--)
mv_desc_set_src_addr(sw_desc, src_cnt, src[src_cnt]);
}
- spin_unlock_bh(&mv_chan->lock);
+
dev_dbg(mv_chan_to_devp(mv_chan),
"%s sw_desc %p async_tx %p \n",
__func__, sw_desc, &sw_desc->async_tx);
@@ -605,22 +553,26 @@ static void mv_xor_free_chan_resources(struct dma_chan *chan)
mv_chan_slot_cleanup(mv_chan);

list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
- chain_node) {
+ node) {
in_use_descs++;
- list_del(&iter->chain_node);
+ list_move_tail(&iter->node, &mv_chan->free_slots);
}
list_for_each_entry_safe(iter, _iter, &mv_chan->completed_slots,
- completed_node) {
+ node) {
+ in_use_descs++;
+ list_move_tail(&iter->node, &mv_chan->free_slots);
+ }
+ list_for_each_entry_safe(iter, _iter, &mv_chan->allocated_slots,
+ node) {
in_use_descs++;
- list_del(&iter->completed_node);
+ list_move_tail(&iter->node, &mv_chan->free_slots);
}
list_for_each_entry_safe_reverse(
- iter, _iter, &mv_chan->all_slots, slot_node) {
- list_del(&iter->slot_node);
+ iter, _iter, &mv_chan->free_slots, node) {
+ list_del(&iter->node);
kfree(iter);
mv_chan->slots_allocated--;
}
- mv_chan->last_used = NULL;

dev_dbg(mv_chan_to_devp(mv_chan), "%s slots_allocated %d\n",
__func__, mv_chan->slots_allocated);
@@ -1078,7 +1030,8 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
spin_lock_init(&mv_chan->lock);
INIT_LIST_HEAD(&mv_chan->chain);
INIT_LIST_HEAD(&mv_chan->completed_slots);
- INIT_LIST_HEAD(&mv_chan->all_slots);
+ INIT_LIST_HEAD(&mv_chan->free_slots);
+ INIT_LIST_HEAD(&mv_chan->allocated_slots);
mv_chan->dmachan.device = dma_dev;
dma_cookie_init(&mv_chan->dmachan);

diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 145ba2f895f7..71684de37ddb 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -93,11 +93,11 @@ struct mv_xor_device {
* @mmr_base: memory mapped register base
* @idx: the index of the xor channel
* @chain: device chain view of the descriptors
+ * @free_slots: free slots usable by the channel
+ * @allocated_slots: slots allocated by the driver
* @completed_slots: slots completed by HW but still need to be acked
* @device: parent device
* @common: common dmaengine channel object members
- * @last_used: place holder for allocation to continue from where it left off
- * @all_slots: complete domain of slots usable by the channel
* @slots_allocated: records the actual size of the descriptor slot pool
* @irq_tasklet: bottom half where mv_xor_slot_cleanup runs
* @op_in_desc: new mode of driver, each op is writen to descriptor.
@@ -111,14 +111,14 @@ struct mv_xor_chan {
int irq;
enum dma_transaction_type current_type;
struct list_head chain;
+ struct list_head free_slots;
+ struct list_head allocated_slots;
struct list_head completed_slots;
dma_addr_t dma_desc_pool;
void *dma_desc_pool_virt;
size_t pool_size;
struct dma_device dmadev;
struct dma_chan dmachan;
- struct mv_xor_desc_slot *last_used;
- struct list_head all_slots;
int slots_allocated;
struct tasklet_struct irq_tasklet;
int op_in_desc;
@@ -129,9 +129,7 @@ struct mv_xor_chan {

/**
* struct mv_xor_desc_slot - software descriptor
- * @slot_node: node on the mv_xor_chan.all_slots list
- * @chain_node: node on the mv_xor_chan.chain list
- * @completed_node: node on the mv_xor_chan.completed_slots list
+ * @node: node on the mv_xor_chan lists
* @hw_desc: virtual address of the hardware descriptor chain
* @phys: hardware address of the hardware descriptor chain
* @slot_used: slot in use or not
@@ -140,12 +138,9 @@ struct mv_xor_chan {
* @async_tx: support for the async_tx api
*/
struct mv_xor_desc_slot {
- struct list_head slot_node;
- struct list_head chain_node;
- struct list_head completed_node;
+ struct list_head node;
enum dma_transaction_type type;
void *hw_desc;
- u16 slot_used;
u16 idx;
struct dma_async_tx_descriptor async_tx;
};
--
2.4.0

2015-05-12 15:37:42

by Maxime Ripard

Subject: [PATCH 7/8] dmaengine: mv_xor: add support for a38x RAID6 support

From: Lior Amsalem <[email protected]>

The offload engine in the A38x introduces RAID6 capability; this patch
adds RAID6 offload support to the mv_xor driver.
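
As a reading aid, a condensed sketch of the source-placement trick from
the diff below: the engine's GF coefficients are fixed per source slot,
so the driver walks the requested scf[] array and drops each source into
the slot whose hardcoded coefficient matches (requests that need other
coefficients, i.e. mult/sum_product, are declined via the new flags):

	static u8 mv_xor_raid6_coefs[8] = {
		0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
	};

	/* place each source in the slot with the matching coefficient */
	while (src_cnt) {
		if (scf[src_i] == mv_xor_raid6_coefs[i]) {
			mv_desc_set_src_addr(sw_desc, i, src[src_i]);
			src_i++;
			i++;
			src_cnt--;
		} else {
			i++;
		}
	}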

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Ofer Heifetz <[email protected]>
Reviewed-by: Nadav Haklai <[email protected]>
Tested-by: Nadav Haklai <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
---
drivers/dma/mv_xor.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++----
drivers/dma/mv_xor.h | 5 ++-
2 files changed, 111 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 28980483eafb..9b5753dcbf76 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -36,6 +36,11 @@ enum mv_xor_mode {
XOR_MODE_IN_DESC,
};

+/* engine coefficients */
+static u8 mv_xor_raid6_coefs[8] = {
+ 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
+};
+
static void mv_xor_issue_pending(struct dma_chan *chan);

#define to_mv_xor_chan(chan) \
@@ -48,8 +53,7 @@ static void mv_xor_issue_pending(struct dma_chan *chan);
((chan)->dmadev.dev)

static void mv_desc_init(struct mv_xor_desc_slot *desc,
- dma_addr_t addr, u32 byte_count,
- enum dma_ctrl_flags flags)
+ u32 byte_count, enum dma_ctrl_flags flags)
{
struct mv_xor_desc *hw_desc = desc->hw_desc;

@@ -58,7 +62,6 @@ static void mv_desc_init(struct mv_xor_desc_slot *desc,
/* Enable end-of-descriptor interrupts only for DMA_PREP_INTERRUPT */
hw_desc->desc_command = (flags & DMA_PREP_INTERRUPT) ?
XOR_DESC_EOD_INT_EN : 0;
- hw_desc->phy_dest_addr = addr;
hw_desc->byte_count = byte_count;
}

@@ -74,6 +77,9 @@ static void mv_desc_set_mode(struct mv_xor_desc_slot *desc)
case DMA_MEMCPY:
hw_desc->desc_command |= XOR_DESC_OPERATION_MEMCPY;
break;
+ case DMA_PQ:
+ hw_desc->desc_command |= XOR_DESC_OPERATION_PQ;
+ break;
default:
BUG();
return;
@@ -84,16 +90,38 @@ static void mv_desc_set_next_desc(struct mv_xor_desc_slot *desc,
u32 next_desc_addr)
{
struct mv_xor_desc *hw_desc = desc->hw_desc;
+
BUG_ON(hw_desc->phy_next_desc);
hw_desc->phy_next_desc = next_desc_addr;
}

+static void mv_desc_set_dest_addr(struct mv_xor_desc_slot *desc,
+ dma_addr_t addr)
+{
+ struct mv_xor_desc *hw_desc = desc->hw_desc;
+
+ hw_desc->phy_dest_addr = addr;
+ if (desc->type == DMA_PQ)
+ hw_desc->desc_command |= (1 << 8);
+}
+
+static void mv_desc_set_q_dest_addr(struct mv_xor_desc_slot *desc,
+ dma_addr_t addr)
+{
+ struct mv_xor_desc *hw_desc = desc->hw_desc;
+
+ hw_desc->phy_q_dest_addr = addr;
+ if (desc->type == DMA_PQ)
+ hw_desc->desc_command |= (1 << 9);
+}
+
static void mv_desc_set_src_addr(struct mv_xor_desc_slot *desc,
int index, dma_addr_t addr)
{
struct mv_xor_desc *hw_desc = desc->hw_desc;
+
hw_desc->phy_src_addr[mv_phy_src_idx(index)] = addr;
- if (desc->type == DMA_XOR)
+ if ((desc->type == DMA_XOR) || (desc->type == DMA_PQ))
hw_desc->desc_command |= (1 << index);
}

@@ -520,7 +548,8 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
if (sw_desc) {
sw_desc->type = DMA_XOR;
sw_desc->async_tx.flags = flags;
- mv_desc_init(sw_desc, dest, len, flags);
+ mv_desc_init(sw_desc, len, flags);
+ mv_desc_set_dest_addr(sw_desc, dest);
if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
mv_desc_set_mode(sw_desc);
while (src_cnt--)
@@ -562,6 +591,69 @@ mv_xor_prep_dma_interrupt(struct dma_chan *chan, unsigned long flags)
return mv_xor_prep_dma_xor(chan, dest, &src, 1, len, flags);
}

+static struct dma_async_tx_descriptor *
+mv_xor_prep_dma_pq(struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
+ unsigned int src_cnt, const unsigned char *scf,
+ size_t len, unsigned long flags)
+{
+ struct mv_xor_chan *mv_chan = to_mv_xor_chan(chan);
+ struct mv_xor_desc_slot *sw_desc;
+ int src_i = 0;
+ int i = 0;
+
+ if (unlikely(len < MV_XOR_MIN_BYTE_COUNT))
+ return NULL;
+
+ BUG_ON(len > MV_XOR_MAX_BYTE_COUNT);
+
+ dev_dbg(mv_chan_to_devp(mv_chan),
+ "%s src_cnt: %d len: %u flags: %ld\n",
+ __func__, src_cnt, len, flags);
+
+ /*
+ * since the coefs on Marvell engine are hardcoded, do not
+ * support mult and sum product requests
+ */
+ if ((flags & DMA_PREP_PQ_MULT) || (flags & DMA_PREP_PQ_SUM_PRODUCT))
+ return NULL;
+
+ sw_desc = mv_chan_alloc_slot(mv_chan);
+ if (sw_desc) {
+ sw_desc->type = DMA_PQ;
+ sw_desc->async_tx.flags = flags;
+ mv_desc_init(sw_desc, len, flags);
+ if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
+ mv_desc_set_mode(sw_desc);
+ if (!(flags & DMA_PREP_PQ_DISABLE_P))
+ mv_desc_set_dest_addr(sw_desc, dst[0]);
+ if (!(flags & DMA_PREP_PQ_DISABLE_Q))
+ mv_desc_set_q_dest_addr(sw_desc, dst[1]);
+ while (src_cnt) {
+ /*
+ * probably we can do better coding below,
+ * hash table maybe?
+ */
+ if (scf[src_i] == mv_xor_raid6_coefs[i]) {
+ /*
+ * coefs are hardcoded, assign the src
+ * to the right place
+ */
+ mv_desc_set_src_addr(sw_desc, i, src[src_i]);
+ src_i++;
+ i++;
+ src_cnt--;
+ } else {
+ i++;
+ }
+ }
+ }
+
+ dev_dbg(mv_chan_to_devp(mv_chan),
+ "%s sw_desc %p async_tx %p\n",
+ __func__, sw_desc, &sw_desc->async_tx);
+ return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
static void mv_xor_free_chan_resources(struct dma_chan *chan)
{
struct mv_xor_chan *mv_chan = to_mv_xor_chan(chan);
@@ -1026,6 +1118,10 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
dma_dev->max_xor = 8;
dma_dev->device_prep_dma_xor = mv_xor_prep_dma_xor;
}
+ if (dma_has_cap(DMA_PQ, dma_dev->cap_mask)) {
+ dma_set_maxpq(dma_dev, 8, 0);
+ dma_dev->device_prep_dma_pq = mv_xor_prep_dma_pq;
+ }

mv_chan->mmr_base = xordev->xor_base;
mv_chan->mmr_high_base = xordev->xor_high_base;
@@ -1071,11 +1167,13 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
goto err_free_irq;
}

- dev_info(&pdev->dev, "Marvell XOR (%s): ( %s%s%s)\n",
+ dev_info(&pdev->dev, "Marvell XOR (%s): ( %s%s%s%s)\n",
mv_chan->op_in_desc ? "Descriptor Mode" : "Registers Mode",
dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask) ? "cpy " : "",
- dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask) ? "intr " : "");
+ dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask) ? "intr " : "",
+ dma_has_cap(DMA_PQ, dma_dev->cap_mask) ? "pq " : "");
+

dma_async_device_register(dma_dev);
return mv_chan;
@@ -1199,6 +1297,8 @@ static int mv_xor_probe(struct platform_device *pdev)
dma_cap_set(DMA_XOR, cap_mask);
if (of_property_read_bool(np, "dmacap,interrupt"))
dma_cap_set(DMA_INTERRUPT, cap_mask);
+ if (of_property_read_bool(np, "dmacap,pq"))
+ dma_cap_set(DMA_PQ, cap_mask);

irq = irq_of_parse_and_map(np, 0);
if (!irq) {
@@ -1310,6 +1410,6 @@ static void __exit mv_xor_exit(void)
module_exit(mv_xor_exit);
#endif

-MODULE_AUTHOR("Saeed Bishara <[email protected]>");
+MODULE_AUTHOR("Lior Amsalem <[email protected]>, Saeed Bishara <[email protected]>");
MODULE_DESCRIPTION("DMA engine driver for Marvell's XOR engine");
MODULE_LICENSE("GPL");
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index b7455b42137b..b72e7357c5c8 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -37,6 +37,7 @@
#define XOR_DESC_OPERATION_XOR (0 << 24)
#define XOR_DESC_OPERATION_CRC32C (1 << 24)
#define XOR_DESC_OPERATION_MEMCPY (2 << 24)
+#define XOR_DESC_OPERATION_PQ (5 << 24)

#define XOR_DESC_DMA_OWNED BIT(31)
#define XOR_DESC_EOD_INT_EN BIT(31)
@@ -164,7 +165,7 @@ struct mv_xor_desc {
u32 byte_count; /* size of src/dst blocks in bytes */
u32 phy_dest_addr; /* destination block address */
u32 phy_src_addr[8]; /* source block addresses */
- u32 reserved0;
+ u32 phy_q_dest_addr;
u32 reserved1;
};
#define mv_phy_src_idx(src_idx) (src_idx)
@@ -178,7 +179,7 @@ struct mv_xor_desc {
u32 byte_count; /* size of src/dst blocks in bytes */
u32 phy_src_addr[8]; /* source block addresses */
u32 reserved1;
- u32 reserved0;
+ u32 phy_q_dest_addr;
};
#define mv_phy_src_idx(src_idx) (src_idx ^ 1)
#endif
--
2.4.0

2015-05-12 15:37:43

by Maxime Ripard

Subject: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features

From: Lior Amsalem <[email protected]>

The new XOR engine has a compatible string of its own, together with
new channel capabilities.

Use that new compatible now that we have a driver that can handle it.

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Ofer Heifetz <[email protected]>
Reviewed-by: Nadav Haklai <[email protected]>
Tested-by: Nadav Haklai <[email protected]>
---
arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/arm/boot/dts/armada-38x.dtsi b/arch/arm/boot/dts/armada-38x.dtsi
index ed2dd8ba4080..6d07b7389415 100644
--- a/arch/arm/boot/dts/armada-38x.dtsi
+++ b/arch/arm/boot/dts/armada-38x.dtsi
@@ -448,7 +448,7 @@
};

xor@60800 {
- compatible = "marvell,orion-xor";
+ compatible = "marvell,a38x-xor";
reg = <0x60800 0x100
0x60a00 0x100>;
clocks = <&gateclk 22>;
@@ -458,17 +458,13 @@
interrupts = <GIC_SPI 22 IRQ_TYPE_LEVEL_HIGH>;
dmacap,memcpy;
dmacap,xor;
- };
- xor01 {
- interrupts = <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH>;
- dmacap,memcpy;
- dmacap,xor;
- dmacap,memset;
+ dmacap,pq;
+ dmacap,interrupt;
};
};

xor@60900 {
- compatible = "marvell,orion-xor";
+ compatible = "marvell,a38x-xor";
reg = <0x60900 0x100
0x60b00 0x100>;
clocks = <&gateclk 28>;
@@ -478,12 +474,8 @@
interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>;
dmacap,memcpy;
dmacap,xor;
- };
- xor11 {
- interrupts = <GIC_SPI 66 IRQ_TYPE_LEVEL_HIGH>;
- dmacap,memcpy;
- dmacap,xor;
- dmacap,memset;
+ dmacap,pq;
+ dmacap,interrupt;
};
};

--
2.4.0

2015-05-12 15:37:38

by Maxime Ripard

Subject: [PATCH 3/8] dmaengine: mv_xor: Enlarge descriptor pool size

From: Lior Amsalem <[email protected]>

Now that we have 2 channels assigned to 2 CPUs and all requests are
chained on the same channels, we need many more descriptors available to
satisfy the async_tx workload.

3072 descriptors was found in our lab to be the number of descriptors
that allows the async_tx stack to work without waiting for free
descriptors when submitting new requests.
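
For scale (back-of-the-envelope, assuming 4 KiB pages):

	/* MV_XOR_SLOT_SIZE is 64 bytes, so:
	 *   old pool: PAGE_SIZE        =   4096 B ->   only 64 descriptors per channel
	 *   new pool: 64 * 3072        = 196608 B -> 3072 descriptors per channel (192 KiB)
	 */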

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Nadav Haklai <[email protected]>
Tested-by: Nadav Haklai <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
---
drivers/dma/mv_xor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 89a8c85539a1..145ba2f895f7 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -19,7 +19,7 @@
#include <linux/dmaengine.h>
#include <linux/interrupt.h>

-#define MV_XOR_POOL_SIZE PAGE_SIZE
+#define MV_XOR_POOL_SIZE (MV_XOR_SLOT_SIZE * 3072)
#define MV_XOR_SLOT_SIZE 64
#define MV_XOR_THRESHOLD 1
#define MV_XOR_MAX_CHANNELS 2
--
2.4.0

2015-05-12 15:37:40

by Maxime Ripard

Subject: [PATCH 5/8] dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup

From: Lior Amsalem <[email protected]>

This patch fixes a bug in the XOR driver where the cleanup function
could be called and free descriptors that had never been processed by
the engine (which resulted in data errors).

The cleanup function now frees descriptors based on the ownership bit in
the descriptors.
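
Condensed from the diff below (illustrative only): instead of walking
the chain up to the descriptor the channel is currently loaded with, the
cleanup loop only retires descriptors whose status word has the hardware
completion bit set.

	list_for_each_entry_safe(iter, _iter, &mv_chan->chain, node) {
		hw_desc = iter->hw_desc;
		if (hw_desc->status & XOR_DESC_SUCCESS) {
			cookie = mv_desc_run_tx_complete_actions(iter, mv_chan,
								 cookie);
			mv_desc_clean_slot(iter, mv_chan);
			if (iter->async_tx.phys == current_desc) {
				current_cleaned = 1;
				break;
			}
		} else if (iter->async_tx.phys == current_desc) {
			current_cleaned = 0;
			break;
		}
	}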

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Ofer Heifetz <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
---
drivers/dma/mv_xor.c | 72 +++++++++++++++++++++++++++++++++-------------------
drivers/dma/mv_xor.h | 1 +
2 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index d1110442f0d2..28980483eafb 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -293,7 +293,8 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
dma_cookie_t cookie = 0;
int busy = mv_chan_is_busy(mv_chan);
u32 current_desc = mv_chan_get_current_desc(mv_chan);
- int seen_current = 0;
+ int current_cleaned = 0;
+ struct mv_xor_desc *hw_desc;

dev_dbg(mv_chan_to_devp(mv_chan), "%s %d\n", __func__, __LINE__);
dev_dbg(mv_chan_to_devp(mv_chan), "current_desc %x\n", current_desc);
@@ -305,38 +306,57 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)

list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
node) {
- prefetch(_iter);
- prefetch(&_iter->async_tx);

- /* do not advance past the current descriptor loaded into the
- * hardware channel, subsequent descriptors are either in
- * process or have not been submitted
- */
- if (seen_current)
- break;
+ /* clean finished descriptors */
+ hw_desc = iter->hw_desc;
+ if (hw_desc->status & XOR_DESC_SUCCESS) {
+ cookie = mv_desc_run_tx_complete_actions(iter, mv_chan,
+ cookie);

- /* stop the search if we reach the current descriptor and the
- * channel is busy
- */
- if (iter->async_tx.phys == current_desc) {
- seen_current = 1;
- if (busy)
+ /* done processing desc, clean slot */
+ mv_desc_clean_slot(iter, mv_chan);
+
+ /* break if we did cleaned the current */
+ if (iter->async_tx.phys == current_desc) {
+ current_cleaned = 1;
break;
+ }
+ } else {
+ if (iter->async_tx.phys == current_desc) {
+ current_cleaned = 0;
+ break;
+ }
}
-
- cookie = mv_desc_run_tx_complete_actions(iter, mv_chan, cookie);
-
- if (mv_desc_clean_slot(iter, mv_chan))
- break;
}

if ((busy == 0) && !list_empty(&mv_chan->chain)) {
- struct mv_xor_desc_slot *chain_head;
- chain_head = list_entry(mv_chan->chain.next,
- struct mv_xor_desc_slot,
- node);
-
- mv_chan_start_new_chain(mv_chan, chain_head);
+ if (current_cleaned) {
+ /*
+ * current descriptor cleaned and removed, run
+ * from list head
+ */
+ iter = list_entry(mv_chan->chain.next,
+ struct mv_xor_desc_slot,
+ node);
+ mv_chan_start_new_chain(mv_chan, iter);
+ } else {
+ if (!list_is_last(&iter->node, &mv_chan->chain)) {
+ /*
+ * descriptors are still waiting after
+ * current, trigger them
+ */
+ iter = list_entry(iter->node.next,
+ struct mv_xor_desc_slot,
+ node);
+ mv_chan_start_new_chain(mv_chan, iter);
+ } else {
+ /*
+ * some descriptors are still waiting
+ * to be cleaned
+ */
+ tasklet_schedule(&mv_chan->irq_tasklet);
+ }
+ }
}

if (cookie > 0)
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 71684de37ddb..b7455b42137b 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -32,6 +32,7 @@
#define XOR_OPERATION_MODE_MEMCPY 2
#define XOR_OPERATION_MODE_IN_DESC 7
#define XOR_DESCRIPTOR_SWAP BIT(14)
+#define XOR_DESC_SUCCESS 0x40000000

#define XOR_DESC_OPERATION_XOR (0 << 24)
#define XOR_DESC_OPERATION_CRC32C (1 << 24)
--
2.4.0

2015-05-12 15:49:41

by Thomas Petazzoni

Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

Maxime,

On Tue, 12 May 2015 17:37:37 +0200, Maxime Ripard wrote:

> Required properties:
> -- compatible: Should be "marvell,orion-xor"
> +- compatible: Should be "marvell,orion-xor" or "marvell,a38x-xor"

I believe the new compatible string should be armada-380-xor or
armada380-xor. Wildcards in compatible strings are generally not
recommended, and we don't use the a38x- prefix anywhere in the tree, as
far as I can see:

drivers/ata/ahci_mvebu.c: { .compatible = "marvell,armada-380-ahci", },
drivers/bus/mvebu-mbus.c: { .compatible = "marvell,armada380-mbus",
drivers/mmc/host/sdhci-pxav3.c: .compatible = "marvell,armada-380-sdhci",
drivers/rtc/rtc-armada38x.c: { .compatible = "marvell,armada-380-rtc", },
drivers/thermal/armada_thermal.c: .compatible = "marvell,armada380-thermal",
drivers/usb/host/xhci-plat.c: { .compatible = "marvell,armada-380-xhci"},
drivers/watchdog/orion_wdt.c: .compatible = "marvell,armada-380-wdt",

Yes, we're not very consistent between armada380 and armada-380, but
we're not using a38x anywhere.

Best regards,

Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

2015-05-12 15:50:08

by Arnd Bergmann

Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

On Tuesday 12 May 2015 17:37:37 Maxime Ripard wrote:
> From: Lior Amsalem <[email protected]>
>
> The Marvell Armada 38x SoC introduce new features to the XOR engine,
> especially the fact that the engine mode (MEMCPY/XOR/PQ/etc) can be part of
> the descriptor and not set through the controller registers.
>
> This new feature allows mixing of different commands (even PQ) on the same
> channel/chain without the need to stop the engine to reconfigure the engine
> mode.
>
> Refactor the driver to be able to use that new feature on the Armada 38x,
> while keeping the old behaviour on the older SoCs.
>
> Signed-off-by: Lior Amsalem <[email protected]>
> Reviewed-by: Ofer Heifetz <[email protected]>
> Signed-off-by: Maxime Ripard <[email protected]>

Two minimal style comments:

> +static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
> +{
> + u32 op_mode;
> + u32 config = readl_relaxed(XOR_CONFIG(chan));
> +
> + op_mode = XOR_OPERATION_MODE_IN_DESC;
> +
> + config &= ~0x7;
> + config |= op_mode;
> +
> +#if defined(__BIG_ENDIAN)
> + config |= XOR_DESCRIPTOR_SWAP;
> +#else
> + config &= ~XOR_DESCRIPTOR_SWAP;
> +#endif
> +
> + writel_relaxed(config, XOR_CONFIG(chan));
> +}

Using

if (IS_ENABLED(__BIG_ENDIAN))

here would make it more readable by avoiding the #if. Alternatively,
you could leave the XOR_DESCRIPTOR_SWAP flag disabled and just swap
the descriptors manually like a lot of other drivers do. You have
to swap the mmio accesses anyway.
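
As a hypothetical sketch of that second alternative (not part of this
series): keep XOR_DESCRIPTOR_SWAP clear and byte-swap the descriptor
words by hand when they are written, assuming the descriptor fields were
retyped as __le32:

	hw_desc->desc_command  = cpu_to_le32(command);
	hw_desc->byte_count    = cpu_to_le32(byte_count);
	hw_desc->phy_dest_addr = cpu_to_le32(dest);
	hw_desc->phy_next_desc = cpu_to_le32(next_desc_addr);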

> }
>
> +#ifdef CONFIG_OF
> +static const struct of_device_id mv_xor_dt_ids[] = {
> + { .compatible = "marvell,orion-xor", .data = (void *)XOR_MODE_IN_REG },
> + { .compatible = "marvell,a38x-xor", .data = (void *)XOR_MODE_IN_DESC },
> + {},
> +};
> +MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
> +#endif
> +

Just leave out the #ifdef here. Almost all the mvebu machines use DT now,
so it's not worth the size benefit of leaving it out on the few machines
that don't.

You'll have to remove the of_match_ptr() invocation as well if you do that,
to avoid a warning about an unused symbol.

Arnd

2015-05-12 15:51:27

by Thomas Petazzoni

Subject: Re: [PATCH 5/8] dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup

Maxime,

On Tue, 12 May 2015 17:37:40 +0200, Maxime Ripard wrote:
> From: Lior Amsalem <[email protected]>
>
> This patch fixes a bug in the XOR driver where the cleanup function can be
> called and free descriptors that never been processed by the engine (which
> result in data errors).
>
> The cleanup function will free descriptors based on the ownership bit in
> the descriptors.
>
> Signed-off-by: Lior Amsalem <[email protected]>
> Reviewed-by: Ofer Heifetz <[email protected]>
> Signed-off-by: Maxime Ripard <[email protected]>

If this is fixing a bug, shouldn't it be Cc'ed to stable@ ? But then it
needs to be the first patch in the series.

From what I remember, this series both fixes some issues *and* adds
RAID6 support. We need to separate out the two things so that the fixes
can be propagated to stable.

Thanks,

Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

2015-05-12 15:54:35

by Thomas Petazzoni

Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

Dear Arnd Bergmann,

On Tue, 12 May 2015 17:49:08 +0200, Arnd Bergmann wrote:

> Using
>
> if (IS_ENABLED(__BIG_ENDIAN))
>
> here would make it more readable by avoiding the #if. Alternatively,
> you could leave the XOR_DESCRIPTOR_SWAP flag disabled and just swap
> the descriptors manually like a lot of other drivers do. You have
> to swap the mmio accesses anywya.

Agreed on IS_ENABLED(). However, I don't understand the comment about
the need to swap mmio accesses? We're using readl_relaxed() /
writel_relaxed(), and they do the swapping already. Am I missing
something?

Best regards,

Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

2015-05-12 16:03:27

by Andrew Lunn

Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

On Tue, May 12, 2015 at 05:49:36PM +0200, Thomas Petazzoni wrote:
> Maxime,
>
> On Tue, 12 May 2015 17:37:37 +0200, Maxime Ripard wrote:
>
> > Required properties:
> > -- compatible: Should be "marvell,orion-xor"
> > +- compatible: Should be "marvell,orion-xor" or "marvell,a38x-xor"
>
> I believe the new compatible string should be armada-380-xor or
> armada380-xor. Wildcards in compatible strings are generally not
> recommended, and we don't use the a38x- prefix anywhere in the tree, as
> far as I can see:
>
> drivers/ata/ahci_mvebu.c: { .compatible = "marvell,armada-380-ahci", },
> drivers/bus/mvebu-mbus.c: { .compatible = "marvell,armada380-mbus",
> drivers/mmc/host/sdhci-pxav3.c: .compatible = "marvell,armada-380-sdhci",
> drivers/rtc/rtc-armada38x.c: { .compatible = "marvell,armada-380-rtc", },
> drivers/thermal/armada_thermal.c: .compatible = "marvell,armada380-thermal",
> drivers/usb/host/xhci-plat.c: { .compatible = "marvell,armada-380-xhci"},
> drivers/watchdog/orion_wdt.c: .compatible = "marvell,armada-380-wdt",
>
> Yes, we're not very consistent between armada380 and armada-380, but
> we're not using a38x anywhere.

A quick grep for marvell,orion, marvell,kirkwood and marvell,dove
suggests it should have the "-" if we want to be consistent with older
Marvell devices. As you said, starting with marvell,armada it gets
pretty inconsistent :-(

Andrew

2015-05-12 16:04:30

by Arnd Bergmann

Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

On Tuesday 12 May 2015 17:54:31 Thomas Petazzoni wrote:
> Dear Arnd Bergmann,
>
> On Tue, 12 May 2015 17:49:08 +0200, Arnd Bergmann wrote:
>
> > Using
> >
> > if (IS_ENABLED(__BIG_ENDIAN))
> >
> > here would make it more readable by avoiding the #if. Alternatively,
> > you could leave the XOR_DESCRIPTOR_SWAP flag disabled and just swap
> > the descriptors manually like a lot of other drivers do. You have
> > to swap the mmio accesses anywya.
>
> Agreed on IS_ENABLED(). However, I don't understand the comment about
> the need to swap mmio accesses? We're using readl_relaxed() /
> writel_relaxed(), and they do the swapping already. Am I missing
> something?
>

Sorry for being unclear. This was exactly my point: readl_relaxed()
has to do swaps internally, and you can also do swaps when accessing
the descriptor in memory.

Arnd

2015-05-12 16:05:41

by Dan Williams

Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
<[email protected]> wrote:
> Hi,
>
> This serie refactors the mv_xor in order to support the latest Armada
> 38x features, including the PQ support in order to offload the RAID6
> PQ operations.
>
> Not all the PQ operations are supported by the XOR engine, so we had
> to introduce new async_tx flags in the process to identify
> un-supported operations.
>
> Please note that this is currently not usable because of a possible
> regression in the RAID stack in 4.1 that is being discussed at the
> moment here: https://lkml.org/lkml/2015/5/7/527

This is problematic as async_tx is a wart on the dmaengine subsystem
and needs to be deprecated; I just have yet to find the time to do
that work. It turns out it was a mistake to hide the device details
from md: it should be explicitly managing the dma channels, not
relying on an abstraction api. The async_tx api usage of the
dma-mapping api is broken in that it relies on overlapping mappings of
the same address. This happens to work on x86, but on arm it needs
explicit non-overlapping mappings. I started the work to reference
count dma-mappings in 3.13, and we need to teach md to use
dmaengine_unmap_data explicitly. Yielding dma channel management to
md also results in a more efficient implementation as we can dma_map()
the stripe cache once rather than per-io. The "async_tx_ack()"
disaster can also go away when md is explicitly handling channel
switching.
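
For readers unfamiliar with the 3.13 work Dan refers to, a rough,
hypothetical sketch of the dmaengine_unmap_data pattern (not md code,
error handling omitted): mappings are reference counted and attached to
the descriptor instead of being implicitly unmapped by each driver.

	struct dmaengine_unmap_data *unmap;

	unmap = dmaengine_get_unmap_data(dev, 2, GFP_NOWAIT);
	unmap->addr[0] = dma_map_page(dev, src, src_off, len, DMA_TO_DEVICE);
	unmap->to_cnt = 1;
	unmap->addr[1] = dma_map_page(dev, dst, dst_off, len, DMA_FROM_DEVICE);
	unmap->from_cnt = 1;
	unmap->len = len;

	tx = dmadev->device_prep_dma_memcpy(chan, unmap->addr[1],
					    unmap->addr[0], len, 0);
	if (tx)
		dma_set_unmap(tx, unmap);	/* tx takes a reference */
	dmaengine_unmap_put(unmap);		/* drop our reference */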

2015-05-12 16:05:39

by Thomas Petazzoni

Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

Andrew,

On Tue, 12 May 2015 17:58:24 +0200, Andrew Lunn wrote:

> A quick grep for marvell,orion, marvell,kirkwood and marvell,dove
> suggests it should have the "-" if we want to be consistent with older
> Marvell devices. As you said, starting with marvell,armada it gets
> pretty inconsistent :-(

Yes, with the "-" would be better IMO.

Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

2015-05-12 16:05:35

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH 6/8] async_tx: adding mult and sum_product flags

On Tue, May 12, 2015 at 05:37:41PM +0200, Maxime Ripard wrote:
> From: Lior Amsalem <[email protected]>
>
> Some engines (like Marvell mv_xor) do not support mult and sum_product
> operations as part of the pq support.
>
> This patch adds new flags: DMA_PREP_PQ_MULT & DMA_PREP_PQ_SUM_PRODUCT these
> flags helps the driver identify such operations.
>
> Signed-off-by: Lior Amsalem <[email protected]>
> Reviewed-by: Ofer Heifetz <[email protected]>
> Reviewed-by: Nadav Haklai <[email protected]>
> Signed-off-by: Maxime Ripard <[email protected]>
> ---
> crypto/async_tx/async_raid6_recov.c | 4 ++--
> include/linux/dmaengine.h | 4 ++++
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
> index 934a84981495..2db5486fd873 100644
> --- a/crypto/async_tx/async_raid6_recov.c
> +++ b/crypto/async_tx/async_raid6_recov.c
> @@ -47,7 +47,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
> struct device *dev = dma->dev;
> dma_addr_t pq[2];
> struct dma_async_tx_descriptor *tx;
> - enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> + enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_SUM_PRODUCT;
>
> if (submit->flags & ASYNC_TX_FENCE)
> dma_flags |= DMA_PREP_FENCE;
> @@ -111,7 +111,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
> dma_addr_t dma_dest[2];
> struct device *dev = dma->dev;
> struct dma_async_tx_descriptor *tx;
> - enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> + enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;
>
> if (submit->flags & ASYNC_TX_FENCE)
> dma_flags |= DMA_PREP_FENCE;
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index ad419757241f..f19ecebb4d3f 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -174,6 +174,8 @@ struct dma_interleaved_template {
> * operation it continues the calculation with new sources
> * @DMA_PREP_FENCE - tell the driver that subsequent operations depend
> * on the result of this operation
> + * @DMA_PREP_PQ_MULT - tell the driver that this is a mult request
> + * @DMA_PREP_PQ_SUM_PRODUCT - tell the driver that this is a sum product request
> */
> enum dma_ctrl_flags {
> DMA_PREP_INTERRUPT = (1 << 0),
> @@ -182,6 +184,8 @@ enum dma_ctrl_flags {
> DMA_PREP_PQ_DISABLE_Q = (1 << 3),
> DMA_PREP_CONTINUE = (1 << 4),
> DMA_PREP_FENCE = (1 << 5),
> + DMA_PREP_PQ_MULT = (1 << 10),
> + DMA_PREP_PQ_SUM_PRODUCT = (1 << 11),

Any reason for skipping 6 to 9?

Andrew

2015-05-12 16:18:11

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features

On Tue, May 12, 2015 at 05:37:43PM +0200, Maxime Ripard wrote:
> From: Lior Amsalem <[email protected]>
>
> The new XOR engine has a new compatible of its own, together with new
> channel capabilities.
>
> Use that new compatible now that we have a driver that can handle it.
>
> Signed-off-by: Lior Amsalem <[email protected]>
> Reviewed-by: Ofer Heifetz <[email protected]>
> Reviewed-by: Nadav Haklai <[email protected]>
> Tested-by: Nadav Haklai <[email protected]>
> ---
> arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
> 1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm/boot/dts/armada-38x.dtsi b/arch/arm/boot/dts/armada-38x.dtsi
> index ed2dd8ba4080..6d07b7389415 100644
> --- a/arch/arm/boot/dts/armada-38x.dtsi
> +++ b/arch/arm/boot/dts/armada-38x.dtsi
> @@ -448,7 +448,7 @@
> };
>
> xor@60800 {
> - compatible = "marvell,orion-xor";
> + compatible = "marvell,a38x-xor";
> reg = <0x60800 0x100
> 0x60a00 0x100>;
> clocks = <&gateclk 22>;
> @@ -458,17 +458,13 @@
> interrupts = <GIC_SPI 22 IRQ_TYPE_LEVEL_HIGH>;
> dmacap,memcpy;
> dmacap,xor;
> - };
> - xor01 {
> - interrupts = <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH>;
> - dmacap,memcpy;
> - dmacap,xor;
> - dmacap,memset;
> + dmacap,pq;
> + dmacap,interrupt;

Does this mean the hardware only has one channel?
And memset is no longer supported?

Andrew


> };
> };
>
> xor@60900 {
> - compatible = "marvell,orion-xor";
> + compatible = "marvell,a38x-xor";
> reg = <0x60900 0x100
> 0x60b00 0x100>;
> clocks = <&gateclk 28>;
> @@ -478,12 +474,8 @@
> interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>;
> dmacap,memcpy;
> dmacap,xor;
> - };
> - xor11 {
> - interrupts = <GIC_SPI 66 IRQ_TYPE_LEVEL_HIGH>;
> - dmacap,memcpy;
> - dmacap,xor;
> - dmacap,memset;
> + dmacap,pq;
> + dmacap,interrupt;
> };
> };
>
> --
> 2.4.0
>

2015-05-13 07:17:55

by Lior Amsalem

[permalink] [raw]
Subject: RE: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features

> From: Andrew Lunn [mailto:[email protected]]
> Sent: Tuesday, May 12, 2015 7:13 PM
>
> On Tue, May 12, 2015 at 05:37:43PM +0200, Maxime Ripard wrote:
> > From: Lior Amsalem <[email protected]>
> >
> > The new XOR engine has a new compatible of its own, together with new
> > channel capabilities.
> >
> > Use that new compatible now that we have a driver that can handle it.
> >
> > Signed-off-by: Lior Amsalem <[email protected]>
> > Reviewed-by: Ofer Heifetz <[email protected]>
> > Reviewed-by: Nadav Haklai <[email protected]>
> > Tested-by: Nadav Haklai <[email protected]>
> > ---
> > arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
> > 1 file changed, 6 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/arm/boot/dts/armada-38x.dtsi
> > b/arch/arm/boot/dts/armada-38x.dtsi
> > index ed2dd8ba4080..6d07b7389415 100644
> > --- a/arch/arm/boot/dts/armada-38x.dtsi
> > +++ b/arch/arm/boot/dts/armada-38x.dtsi
> > @@ -448,7 +448,7 @@
> > };
> >
> > xor@60800 {
> > - compatible = "marvell,orion-xor";
> > + compatible = "marvell,a38x-xor";
> > reg = <0x60800 0x100
> > 0x60a00 0x100>;
> > clocks = <&gateclk 22>;
> > @@ -458,17 +458,13 @@
> > interrupts = <GIC_SPI 22
> IRQ_TYPE_LEVEL_HIGH>;
> > dmacap,memcpy;
> > dmacap,xor;
> > - };
> > - xor01 {
> > - interrupts = <GIC_SPI 23
> IRQ_TYPE_LEVEL_HIGH>;
> > - dmacap,memcpy;
> > - dmacap,xor;
> > - dmacap,memset;
> > + dmacap,pq;
> > + dmacap,interrupt;
>
> Does this mean the hardware only has one channel?
> And memset is no longer supported?
>

The hardware has two channels per engine and two engines.
However, on both the HW side (both channels are on the same "bus port")
and the SW side (the dma subsystem will assign one channel per CPU),
we found it's better (performance wise) to use only one channel on each
engine and let the framework assign one per CPU.
This way, descriptor chaining is better (because of the dependent
descriptors problem) and the overall interrupt count is reduced.

Yes, memset is a problematic one: it can only be done via registers
(not at the descriptor level), plus no one really needs it...

> Andrew
>
>
> > };
> > };
> >
> > xor@60900 {
> > - compatible = "marvell,orion-xor";
> > + compatible = "marvell,a38x-xor";
> > reg = <0x60900 0x100
> > 0x60b00 0x100>;
> > clocks = <&gateclk 28>;
> > @@ -478,12 +474,8 @@
> > interrupts = <GIC_SPI 65
> IRQ_TYPE_LEVEL_HIGH>;
> > dmacap,memcpy;
> > dmacap,xor;
> > - };
> > - xor11 {
> > - interrupts = <GIC_SPI 66
> IRQ_TYPE_LEVEL_HIGH>;
> > - dmacap,memcpy;
> > - dmacap,xor;
> > - dmacap,memset;
> > + dmacap,pq;
> > + dmacap,interrupt;
> > };
> > };
> >
> > --
> > 2.4.0
> >


Regards,
Lior Amsalem

2015-05-13 08:15:35

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

Hi Arnd,

On Tue, May 12, 2015 at 05:49:08PM +0200, Arnd Bergmann wrote:
> On Tuesday 12 May 2015 17:37:37 Maxime Ripard wrote:
> > From: Lior Amsalem <[email protected]>
> >
> > The Marvell Armada 38x SoC introduce new features to the XOR engine,
> > especially the fact that the engine mode (MEMCPY/XOR/PQ/etc) can be part of
> > the descriptor and not set through the controller registers.
> >
> > This new feature allows mixing of different commands (even PQ) on the same
> > channel/chain without the need to stop the engine to reconfigure the engine
> > mode.
> >
> > Refactor the driver to be able to use that new feature on the Armada 38x,
> > while keeping the old behaviour on the older SoCs.
> >
> > Signed-off-by: Lior Amsalem <[email protected]>
> > Reviewed-by: Ofer Heifetz <[email protected]>
> > Signed-off-by: Maxime Ripard <[email protected]>
>
> Two minimal style comments:
>
> > +static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
> > +{
> > + u32 op_mode;
> > + u32 config = readl_relaxed(XOR_CONFIG(chan));
> > +
> > + op_mode = XOR_OPERATION_MODE_IN_DESC;
> > +
> > + config &= ~0x7;
> > + config |= op_mode;
> > +
> > +#if defined(__BIG_ENDIAN)
> > + config |= XOR_DESCRIPTOR_SWAP;
> > +#else
> > + config &= ~XOR_DESCRIPTOR_SWAP;
> > +#endif
> > +
> > + writel_relaxed(config, XOR_CONFIG(chan));
> > +}
>
> Using
>
> if (IS_ENABLED(__BIG_ENDIAN))
>
> here would make it more readable by avoiding the #if.

Indeed. I'll change that.
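
For reference, the helper would then look something like this (untested
sketch, same logic as above with the #if folded into IS_ENABLED()):

static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
{
        u32 config = readl_relaxed(XOR_CONFIG(chan));

        config &= ~0x7;
        config |= XOR_OPERATION_MODE_IN_DESC;

        /* the descriptor swap is only wanted on big-endian kernels */
        if (IS_ENABLED(__BIG_ENDIAN))
                config |= XOR_DESCRIPTOR_SWAP;
        else
                config &= ~XOR_DESCRIPTOR_SWAP;

        writel_relaxed(config, XOR_CONFIG(chan));
}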

> Alternatively, you could leave the XOR_DESCRIPTOR_SWAP flag disabled
> and just swap the descriptors manually like a lot of other drivers
> do. You have to swap the mmio accesses anyway.

That won't be easily doable however.

Not only the endianness of the individual fields in the descriptor
changes, but changing the endianness also swaps the fields themselves
in pairs of u32.

You can see that here:
http://lxr.free-electrons.com/source/drivers/dma/mv_xor.h#L159

So I'm guessing that leaving it as-is is the more readable
solution.
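
For reference, the struct at that link looks roughly like this (trimmed,
comments shortened); note how each pair of u32 fields, including the
entries of phy_src_addr[], trades places between the two variants:

#if defined(__LITTLE_ENDIAN)
struct mv_xor_desc {
        u32 status;             /* descriptor execution status */
        u32 crc32_result;       /* result of CRC-32 calculation */
        u32 desc_command;       /* type of operation to be carried out */
        u32 phy_next_desc;      /* next descriptor address pointer */
        u32 byte_count;         /* size of src/dst blocks in bytes */
        u32 phy_dest_addr;      /* destination block address */
        u32 phy_src_addr[8];    /* source block addresses */
        u32 reserved0;
        u32 reserved1;
};
#define mv_phy_src_idx(src_idx) (src_idx)
#else
struct mv_xor_desc {
        u32 crc32_result;       /* result of CRC-32 calculation */
        u32 status;             /* descriptor execution status */
        u32 phy_next_desc;      /* next descriptor address pointer */
        u32 desc_command;       /* type of operation to be carried out */
        u32 phy_dest_addr;      /* destination block address */
        u32 byte_count;         /* size of src/dst blocks in bytes */
        u32 phy_src_addr[8];    /* source block addresses */
        u32 reserved1;
        u32 reserved0;
};
#define mv_phy_src_idx(src_idx) (src_idx ^ 1)
#endif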

> > }
> >
> > +#ifdef CONFIG_OF
> > +static const struct of_device_id mv_xor_dt_ids[] = {
> > + { .compatible = "marvell,orion-xor", .data = (void *)XOR_MODE_IN_REG },
> > + { .compatible = "marvell,a38x-xor", .data = (void *)XOR_MODE_IN_DESC },
> > + {},
> > +};
> > +MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
> > +#endif
> > +
>
> Just leave out the #ifdef here. Almost all the mvebu machines use DT now,
> so it's not worth the size benefit of leaving it out on the few machines
> that don't.
>
> You'll have to remove the of_match_ptr() invocation as well if you do that,
> to avoid a warning about an unused symbol.

Will do.

Thanks!
Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-13 08:25:04

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

On Tue, May 12, 2015 at 06:05:39PM +0200, Thomas Petazzoni wrote:
> Andrew,
>
> On Tue, 12 May 2015 17:58:24 +0200, Andrew Lunn wrote:
>
> > A quick grep for marvell,orion, marvell,kirkwood and marvell,dove
> > suggests it should have the "-" if we want to be consistent with older
> > Marvell devices. As you said, starting with marvell,armada it gets
> > pretty inconsistent :-(
>
> Yes, with the "-" would be better IMO.

Changed to armada-380-xor, thanks!

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-13 08:33:28

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features

On Wed, May 13, 2015 at 07:16:34AM +0000, Lior Amsalem wrote:
> > From: Andrew Lunn [mailto:[email protected]]
> > Sent: Tuesday, May 12, 2015 7:13 PM
> >
> > On Tue, May 12, 2015 at 05:37:43PM +0200, Maxime Ripard wrote:
> > > From: Lior Amsalem <[email protected]>
> > >
> > > The new XOR engine has a new compatible of its own, together with new
> > > channel capabilities.
> > >
> > > Use that new compatible now that we have a driver that can handle it.
> > >
> > > Signed-off-by: Lior Amsalem <[email protected]>
> > > Reviewed-by: Ofer Heifetz <[email protected]>
> > > Reviewed-by: Nadav Haklai <[email protected]>
> > > Tested-by: Nadav Haklai <[email protected]>
> > > ---
> > > arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
> > > 1 file changed, 6 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/arm/boot/dts/armada-38x.dtsi
> > > b/arch/arm/boot/dts/armada-38x.dtsi
> > > index ed2dd8ba4080..6d07b7389415 100644
> > > --- a/arch/arm/boot/dts/armada-38x.dtsi
> > > +++ b/arch/arm/boot/dts/armada-38x.dtsi
> > > @@ -448,7 +448,7 @@
> > > };
> > >
> > > xor@60800 {
> > > - compatible = "marvell,orion-xor";
> > > + compatible = "marvell,a38x-xor";
> > > reg = <0x60800 0x100
> > > 0x60a00 0x100>;
> > > clocks = <&gateclk 22>;
> > > @@ -458,17 +458,13 @@
> > > interrupts = <GIC_SPI 22
> > IRQ_TYPE_LEVEL_HIGH>;
> > > dmacap,memcpy;
> > > dmacap,xor;
> > > - };
> > > - xor01 {
> > > - interrupts = <GIC_SPI 23
> > IRQ_TYPE_LEVEL_HIGH>;
> > > - dmacap,memcpy;
> > > - dmacap,xor;
> > > - dmacap,memset;
> > > + dmacap,pq;
> > > + dmacap,interrupt;
> >
> > Does this mean the hardware only has one channel?
> > And memset is no longer supported?
> >
>
> The hardware has two channels per engine and two engines.
> However, on both the HW side (both channels are on the same "bus port")
> and the SW side (the dma subsystem will assign one channel per CPU),
> we found it's better (performance wise) to use only one channel on each
> engine and let the framework assign one per CPU.
> This way, descriptor chaining is better (because of the dependent
> descriptors problem) and the overall interrupt count is reduced.
>
> Yes, memset is a problematic one: it can only be done via registers
> (not at the descriptor level), plus no one really needs it...

And memset support has been removed from dmaengine since 3.11, so it
doesn't look like anyone really needs it :)

We're talking about reintroducing it for some platforms that actually
need it, but it wasn't really used on marvell anyway...

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-13 08:47:03

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode

On Wednesday 13 May 2015 10:15:35 Maxime Ripard wrote:
> > Alternatively, you could leave the XOR_DESCRIPTOR_SWAP flag disabled
> > and just swap the descriptors manually like a lot of other drivers
> > do. You have to swap the mmio accesses anywya.
>
> That won't be easily doable however.
>
> Not only the endianness of the individual fields in the descriptor
> changes, but changing the endianness also swaps the fields themselves
> by pair of u32.
>
> You can see that here:
> http://lxr.free-electrons.com/source/drivers/dma/mv_xor.h#L159
>
> So I'm guessing that leaving it like it is was the more readable
> solution.

I see, yes that would get a bit ugly.

Arnd

2015-05-13 08:50:05

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 6/8] async_tx: adding mult and sum_product flags

On Tue, May 12, 2015 at 06:05:35PM +0200, Andrew Lunn wrote:
> On Tue, May 12, 2015 at 05:37:41PM +0200, Maxime Ripard wrote:
> > From: Lior Amsalem <[email protected]>
> >
> > Some engines (like Marvell mv_xor) do not support mult and sum_product
> > operations as part of the pq support.
> >
> > This patch adds new flags: DMA_PREP_PQ_MULT & DMA_PREP_PQ_SUM_PRODUCT these
> > flags helps the driver identify such operations.
> >
> > Signed-off-by: Lior Amsalem <[email protected]>
> > Reviewed-by: Ofer Heifetz <[email protected]>
> > Reviewed-by: Nadav Haklai <[email protected]>
> > Signed-off-by: Maxime Ripard <[email protected]>
> > ---
> > crypto/async_tx/async_raid6_recov.c | 4 ++--
> > include/linux/dmaengine.h | 4 ++++
> > 2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
> > index 934a84981495..2db5486fd873 100644
> > --- a/crypto/async_tx/async_raid6_recov.c
> > +++ b/crypto/async_tx/async_raid6_recov.c
> > @@ -47,7 +47,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
> > struct device *dev = dma->dev;
> > dma_addr_t pq[2];
> > struct dma_async_tx_descriptor *tx;
> > - enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> > + enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_SUM_PRODUCT;
> >
> > if (submit->flags & ASYNC_TX_FENCE)
> > dma_flags |= DMA_PREP_FENCE;
> > @@ -111,7 +111,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
> > dma_addr_t dma_dest[2];
> > struct device *dev = dma->dev;
> > struct dma_async_tx_descriptor *tx;
> > - enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> > + enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;
> >
> > if (submit->flags & ASYNC_TX_FENCE)
> > dma_flags |= DMA_PREP_FENCE;
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index ad419757241f..f19ecebb4d3f 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -174,6 +174,8 @@ struct dma_interleaved_template {
> > * operation it continues the calculation with new sources
> > * @DMA_PREP_FENCE - tell the driver that subsequent operations depend
> > * on the result of this operation
> > + * @DMA_PREP_PQ_MULT - tell the driver that this is a mult request
> > + * @DMA_PREP_PQ_SUM_PRODUCT - tell the driver that this is a sum product request
> > */
> > enum dma_ctrl_flags {
> > DMA_PREP_INTERRUPT = (1 << 0),
> > @@ -182,6 +184,8 @@ enum dma_ctrl_flags {
> > DMA_PREP_PQ_DISABLE_Q = (1 << 3),
> > DMA_PREP_CONTINUE = (1 << 4),
> > DMA_PREP_FENCE = (1 << 5),
> > + DMA_PREP_PQ_MULT = (1 << 10),
> > + DMA_PREP_PQ_SUM_PRODUCT = (1 << 11),

None. I just forgot to change this. Thanks!
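
(Sketch of the presumably intended renumbering for the next version, so
the new flags simply continue the existing sequence:)

        DMA_PREP_CONTINUE = (1 << 4),
        DMA_PREP_FENCE = (1 << 5),
        DMA_PREP_PQ_MULT = (1 << 6),
        DMA_PREP_PQ_SUM_PRODUCT = (1 << 7),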

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-13 09:17:33

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

Hi Dan,

On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
> <[email protected]> wrote:
> > Hi,
> >
> > This serie refactors the mv_xor in order to support the latest Armada
> > 38x features, including the PQ support in order to offload the RAID6
> > PQ operations.
> >
> > Not all the PQ operations are supported by the XOR engine, so we had
> > to introduce new async_tx flags in the process to identify
> > un-supported operations.
> >
> > Please note that this is currently not usable because of a possible
> > regression in the RAID stack in 4.1 that is being discussed at the
> > moment here: https://lkml.org/lkml/2015/5/7/527
>
> This is problematic as async_tx is a wart on the dmaengine subsystem
> and needs to be deprecated, I just have yet to find the time to do
> that work. It turns out it was a mistake to hide the device details
> from md, it should be explicitly managing the dma channels, not
> relying on a abstraction api. The async_tx api usage of the
> dma-mapping api is broken in that it relies on overlapping mappings of
> the same address. This happens to work on x86, but on arm it needs
> explicit non-overlapping mappings. I started the work to reference
> count dma-mappings in 3.13, and we need to teach md to use
> dmaengine_unmap_data explicitly. Yielding dma channel management to
> md also results in a more efficient implementation as we can dma_map()
> the stripe cache once rather than per-io. The "async_tx_ack()"
> disaster can also go away when md is explicitly handling channel
> switching.

Even though I'd be very much in favor of deprecating / removing
async_tx, is it something likely to happen soon?

I remember discussing this with Vinod at Plumbers back in October, but
haven't seen anything since then.

If not, I think that we shouldn't really hold back patches to
async_tx, even though we know that in a year from now, it's going to
be gone.

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-13 16:00:47

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
<[email protected]> wrote:
> Hi Dan,
>
> On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
>> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > This serie refactors the mv_xor in order to support the latest Armada
>> > 38x features, including the PQ support in order to offload the RAID6
>> > PQ operations.
>> >
>> > Not all the PQ operations are supported by the XOR engine, so we had
>> > to introduce new async_tx flags in the process to identify
>> > un-supported operations.
>> >
>> > Please note that this is currently not usable because of a possible
>> > regression in the RAID stack in 4.1 that is being discussed at the
>> > moment here: https://lkml.org/lkml/2015/5/7/527
>>
>> This is problematic as async_tx is a wart on the dmaengine subsystem
>> and needs to be deprecated, I just have yet to find the time to do
>> that work. It turns out it was a mistake to hide the device details
>> from md, it should be explicitly managing the dma channels, not
>> relying on a abstraction api. The async_tx api usage of the
>> dma-mapping api is broken in that it relies on overlapping mappings of
>> the same address. This happens to work on x86, but on arm it needs
>> explicit non-overlapping mappings. I started the work to reference
>> count dma-mappings in 3.13, and we need to teach md to use
>> dmaengine_unmap_data explicitly. Yielding dma channel management to
>> md also results in a more efficient implementation as we can dma_map()
>> the stripe cache once rather than per-io. The "async_tx_ack()"
>> disaster can also go away when md is explicitly handling channel
>> switching.
>
> Even though I'd be very much in favor of deprecating / removing
> async_tx, is it something likely to happen soon?

Not unless someone else takes it on, I'm actively asking for help.

> I remember discussing this with Vinod at Plumbers back in October, but
> haven't seen anything since then.

Right, "help!" :)

> If not, I think that we shouldn't really hold back patches to
> async_tx, even though we know than in a year from now, it's going to
> be gone.

We definitely should block new usages, because they make a bad
situation worse. Russell already warned that the dma_mapping api
abuse could lead to data corruption on ARM (speculative pre-fetching).
We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
BROKEN" until we can get this resolved.

2015-05-18 09:15:04

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

Hi Dan,

On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
> <[email protected]> wrote:
> > Hi Dan,
> >
> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
> >> <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > This serie refactors the mv_xor in order to support the latest Armada
> >> > 38x features, including the PQ support in order to offload the RAID6
> >> > PQ operations.
> >> >
> >> > Not all the PQ operations are supported by the XOR engine, so we had
> >> > to introduce new async_tx flags in the process to identify
> >> > un-supported operations.
> >> >
> >> > Please note that this is currently not usable because of a possible
> >> > regression in the RAID stack in 4.1 that is being discussed at the
> >> > moment here: https://lkml.org/lkml/2015/5/7/527
> >>
> >> This is problematic as async_tx is a wart on the dmaengine subsystem
> >> and needs to be deprecated, I just have yet to find the time to do
> >> that work. It turns out it was a mistake to hide the device details
> >> from md, it should be explicitly managing the dma channels, not
> >> relying on a abstraction api. The async_tx api usage of the
> >> dma-mapping api is broken in that it relies on overlapping mappings of
> >> the same address. This happens to work on x86, but on arm it needs
> >> explicit non-overlapping mappings. I started the work to reference
> >> count dma-mappings in 3.13, and we need to teach md to use
> >> dmaengine_unmap_data explicitly. Yielding dma channel management to
> >> md also results in a more efficient implementation as we can dma_map()
> >> the stripe cache once rather than per-io. The "async_tx_ack()"
> >> disaster can also go away when md is explicitly handling channel
> >> switching.
> >
> > Even though I'd be very much in favor of deprecating / removing
> > async_tx, is it something likely to happen soon?
>
> Not unless someone else takes it on, I'm actively asking for help.
>
> > I remember discussing this with Vinod at Plumbers back in October, but
> > haven't seen anything since then.
>
> Right, "help!" :)
>
> > If not, I think that we shouldn't really hold back patches to
> > async_tx, even though we know than in a year from now, it's going to
> > be gone.
>
> We definitely should block new usages, because they make a bad
> situation worse. Russell already warned that the dma_mapping api
> abuse could lead to data corruption on ARM (speculative pre-fetching).
> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
> BROKEN" until we can get this resolved.

I'm not sure exactly what the issues are with async_tx and ARM, but
these patches have been tested on ARM and are working quite well.

What I'm doing here is merely using the existing API; I'm not making
it worse, just using an API that is already used by numerous drivers.
So I'm not sure it's really reasonable to ask for such a huge rework
(with a huge potential for regressions) before merging my patches.

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-18 17:06:55

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

On Mon, May 18, 2015 at 2:14 AM, Maxime Ripard
<[email protected]> wrote:
> Hi Dan,
>
> On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
>> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
>> <[email protected]> wrote:
>> > Hi Dan,
>> >
>> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
>> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
>> >> <[email protected]> wrote:
>> >> > Hi,
>> >> >
>> >> > This serie refactors the mv_xor in order to support the latest Armada
>> >> > 38x features, including the PQ support in order to offload the RAID6
>> >> > PQ operations.
>> >> >
>> >> > Not all the PQ operations are supported by the XOR engine, so we had
>> >> > to introduce new async_tx flags in the process to identify
>> >> > un-supported operations.
>> >> >
>> >> > Please note that this is currently not usable because of a possible
>> >> > regression in the RAID stack in 4.1 that is being discussed at the
>> >> > moment here: https://lkml.org/lkml/2015/5/7/527
>> >>
>> >> This is problematic as async_tx is a wart on the dmaengine subsystem
>> >> and needs to be deprecated, I just have yet to find the time to do
>> >> that work. It turns out it was a mistake to hide the device details
>> >> from md, it should be explicitly managing the dma channels, not
>> >> relying on a abstraction api. The async_tx api usage of the
>> >> dma-mapping api is broken in that it relies on overlapping mappings of
>> >> the same address. This happens to work on x86, but on arm it needs
>> >> explicit non-overlapping mappings. I started the work to reference
>> >> count dma-mappings in 3.13, and we need to teach md to use
>> >> dmaengine_unmap_data explicitly. Yielding dma channel management to
>> >> md also results in a more efficient implementation as we can dma_map()
>> >> the stripe cache once rather than per-io. The "async_tx_ack()"
>> >> disaster can also go away when md is explicitly handling channel
>> >> switching.
>> >
>> > Even though I'd be very much in favor of deprecating / removing
>> > async_tx, is it something likely to happen soon?
>>
>> Not unless someone else takes it on, I'm actively asking for help.
>>
>> > I remember discussing this with Vinod at Plumbers back in October, but
>> > haven't seen anything since then.
>>
>> Right, "help!" :)
>>
>> > If not, I think that we shouldn't really hold back patches to
>> > async_tx, even though we know than in a year from now, it's going to
>> > be gone.
>>
>> We definitely should block new usages, because they make a bad
>> situation worse. Russell already warned that the dma_mapping api
>> abuse could lead to data corruption on ARM (speculative pre-fetching).
>> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
>> BROKEN" until we can get this resolved.
>
> I'm not sure what the issues exactly are with async_tx and ARM, but
> these patches have been tested on ARM and are working quite well.

https://lkml.org/lkml/2011/7/8/363

> What I'm doing here is merely using the existing API, I'm not making
> it worse, just using the API that is used by numerous drivers
> already. So I'm not sure this is really reasonable to ask for such a
> huge rework (with a huge potential of regressions) before merging my
> patches.

It happens.

https://lwn.net/Articles/641443/

I'm not happy about not having had the time to do this rework myself.
Linux is better off with this api deprecated.

2015-05-26 09:45:11

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

On Mon, May 18, 2015 at 10:06:55AM -0700, Dan Williams wrote:
> On Mon, May 18, 2015 at 2:14 AM, Maxime Ripard
> <[email protected]> wrote:
> > Hi Dan,
> >
> > On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
> >> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
> >> <[email protected]> wrote:
> >> > Hi Dan,
> >> >
> >> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
> >> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
> >> >> <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This serie refactors the mv_xor in order to support the latest Armada
> >> >> > 38x features, including the PQ support in order to offload the RAID6
> >> >> > PQ operations.
> >> >> >
> >> >> > Not all the PQ operations are supported by the XOR engine, so we had
> >> >> > to introduce new async_tx flags in the process to identify
> >> >> > un-supported operations.
> >> >> >
> >> >> > Please note that this is currently not usable because of a possible
> >> >> > regression in the RAID stack in 4.1 that is being discussed at the
> >> >> > moment here: https://lkml.org/lkml/2015/5/7/527
> >> >>
> >> >> This is problematic as async_tx is a wart on the dmaengine subsystem
> >> >> and needs to be deprecated, I just have yet to find the time to do
> >> >> that work. It turns out it was a mistake to hide the device details
> >> >> from md, it should be explicitly managing the dma channels, not
> >> >> relying on a abstraction api. The async_tx api usage of the
> >> >> dma-mapping api is broken in that it relies on overlapping mappings of
> >> >> the same address. This happens to work on x86, but on arm it needs
> >> >> explicit non-overlapping mappings. I started the work to reference
> >> >> count dma-mappings in 3.13, and we need to teach md to use
> >> >> dmaengine_unmap_data explicitly. Yielding dma channel management to
> >> >> md also results in a more efficient implementation as we can dma_map()
> >> >> the stripe cache once rather than per-io. The "async_tx_ack()"
> >> >> disaster can also go away when md is explicitly handling channel
> >> >> switching.
> >> >
> >> > Even though I'd be very much in favor of deprecating / removing
> >> > async_tx, is it something likely to happen soon?
> >>
> >> Not unless someone else takes it on, I'm actively asking for help.
> >>
> >> > I remember discussing this with Vinod at Plumbers back in October, but
> >> > haven't seen anything since then.
> >>
> >> Right, "help!" :)
> >>
> >> > If not, I think that we shouldn't really hold back patches to
> >> > async_tx, even though we know than in a year from now, it's going to
> >> > be gone.
> >>
> >> We definitely should block new usages, because they make a bad
> >> situation worse. Russell already warned that the dma_mapping api
> >> abuse could lead to data corruption on ARM (speculative pre-fetching).
> >> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
> >> BROKEN" until we can get this resolved.
> >
> > I'm not sure what the issues exactly are with async_tx and ARM, but
> > these patches have been tested on ARM and are working quite well.
>
> https://lkml.org/lkml/2011/7/8/363
>
> > What I'm doing here is merely using the existing API, I'm not making
> > it worse, just using the API that is used by numerous drivers
> > already. So I'm not sure this is really reasonable to ask for such a
> > huge rework (with a huge potential of regressions) before merging my
> > patches.
>
> It happens.
>
> https://lwn.net/Articles/641443/

It really depends on what you mean by "help". If you mean "undertake
the removal of async_tx all by yourself", then no, sorry, I won't,
especially when you ask to do that for a patch that just enables a
feature of an API already used on that platform.

If you mean, "give me a hand, you can start there", then yeah, I can
do that.

> I'm not happy about not having had the time to do this rework myself.
> Linux is better off with this api deprecated.

You're not talking about deprecating it, you're talking about removing
it entirely.

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



2015-05-26 16:31:03

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

[ adding Boaz as this discussion has implications for ore_raid ]

On Tue, May 26, 2015 at 2:45 AM, Maxime Ripard
<[email protected]> wrote:
> On Mon, May 18, 2015 at 10:06:55AM -0700, Dan Williams wrote:
>> On Mon, May 18, 2015 at 2:14 AM, Maxime Ripard
>> <[email protected]> wrote:
>> > Hi Dan,
>> >
>> > On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
>> >> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
>> >> <[email protected]> wrote:
>> >> > Hi Dan,
>> >> >
>> >> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
>> >> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
>> >> >> <[email protected]> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > This serie refactors the mv_xor in order to support the latest Armada
>> >> >> > 38x features, including the PQ support in order to offload the RAID6
>> >> >> > PQ operations.
>> >> >> >
>> >> >> > Not all the PQ operations are supported by the XOR engine, so we had
>> >> >> > to introduce new async_tx flags in the process to identify
>> >> >> > un-supported operations.
>> >> >> >
>> >> >> > Please note that this is currently not usable because of a possible
>> >> >> > regression in the RAID stack in 4.1 that is being discussed at the
>> >> >> > moment here: https://lkml.org/lkml/2015/5/7/527
>> >> >>
>> >> >> This is problematic as async_tx is a wart on the dmaengine subsystem
>> >> >> and needs to be deprecated, I just have yet to find the time to do
>> >> >> that work. It turns out it was a mistake to hide the device details
>> >> >> from md, it should be explicitly managing the dma channels, not
>> >> >> relying on a abstraction api. The async_tx api usage of the
>> >> >> dma-mapping api is broken in that it relies on overlapping mappings of
>> >> >> the same address. This happens to work on x86, but on arm it needs
>> >> >> explicit non-overlapping mappings. I started the work to reference
>> >> >> count dma-mappings in 3.13, and we need to teach md to use
>> >> >> dmaengine_unmap_data explicitly. Yielding dma channel management to
>> >> >> md also results in a more efficient implementation as we can dma_map()
>> >> >> the stripe cache once rather than per-io. The "async_tx_ack()"
>> >> >> disaster can also go away when md is explicitly handling channel
>> >> >> switching.
>> >> >
>> >> > Even though I'd be very much in favor of deprecating / removing
>> >> > async_tx, is it something likely to happen soon?
>> >>
>> >> Not unless someone else takes it on, I'm actively asking for help.
>> >>
>> >> > I remember discussing this with Vinod at Plumbers back in October, but
>> >> > haven't seen anything since then.
>> >>
>> >> Right, "help!" :)
>> >>
>> >> > If not, I think that we shouldn't really hold back patches to
>> >> > async_tx, even though we know than in a year from now, it's going to
>> >> > be gone.
>> >>
>> >> We definitely should block new usages, because they make a bad
>> >> situation worse. Russell already warned that the dma_mapping api
>> >> abuse could lead to data corruption on ARM (speculative pre-fetching).
>> >> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
>> >> BROKEN" until we can get this resolved.
>> >
>> > I'm not sure what the issues exactly are with async_tx and ARM, but
>> > these patches have been tested on ARM and are working quite well.
>>
>> https://lkml.org/lkml/2011/7/8/363
>>
>> > What I'm doing here is merely using the existing API, I'm not making
>> > it worse, just using the API that is used by numerous drivers
>> > already. So I'm not sure this is really reasonable to ask for such a
>> > huge rework (with a huge potential of regressions) before merging my
>> > patches.
>>
>> It happens.
>>
>> https://lwn.net/Articles/641443/
>
> It really depends on what you mean by "help". If you mean "undertake
> all by yourself the removal of async tx", then no, sorry, I won't,
> especially when you ask to do that for a patch that just enables a
> feature of an API already used on that platform.

...a potentially broken feature. Are you sure that this speculative
prefetch problem does not affect your implementation?

> If you mean, "give me a hand, you can start there", then yeah, I can
> do that.
>
>> I'm not happy about not having had the time to do this rework myself.
>> Linux is better off with this api deprecated.
>
> You're not talking about deprecating it, you're talking about removing
> it entirely.

True, and adding more users makes that removal more difficult. I'm
willing to help out on the design and review for this work, I just
can't commit to doing the implementation and testing.

I think it looks something like this:

At init time the raid456 driver probes for offload resources. It can
discover several scenarios:

1/ "the ioatdma case": raid channels that have all the necessary
operations (copy, xor/pq, xor/pq check). In this case we'll never
need to perform a channel switch. Potentially the cpu never touches
the stripe cache in this case and we can maintain a static dma mapping
for the entire lifespan of a struct stripe_head.

2/ "the channel switch case": All the necessary offload resources are
available but span multiple devices. In this case we need to wait for
channel1 to complete an operation before channel2 can start. This
case is complicated by the fact that different channels may need their
own dma mappings. In the simplest case channels can share the same
mapping and raid456 needs to wait for channel completions. I think we
can do a better job than the async_tx api here as raid456 should
probably poll for completions after each stripe processing batch.
Taking an interrupt per channel-switch event seems like excessive
overhead.

3/ "the co-op case": We have a xor/pq offload resource, but copy and
check operations require the cpu to touch the stripe cache. In this
case we need to use the dma_sync_*_for_cpu()/dma_sync_*_for_device()
to pass buffers back and forth between device and cpu ownership. This
shares some of the complexity of waiting for completions with scenario
2.
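
As a rough sketch of the buffer ownership hand-off in scenario 3 (single
buffer, "dev" and "stripe_dma" purely illustrative):

        /* give the buffer to the xor/pq engine */
        dma_sync_single_for_device(dev, stripe_dma, PAGE_SIZE,
                                   DMA_BIDIRECTIONAL);
        /* ... submit the xor/pq descriptor and wait for completion ... */

        /* hand the buffer back to the cpu for the copy / check step */
        dma_sync_single_for_cpu(dev, stripe_dma, PAGE_SIZE,
                                DMA_BIDIRECTIONAL);
        /* ... cpu reads/writes the stripe cache page ... */

        /* and back to the device before the next offloaded operation */
        dma_sync_single_for_device(dev, stripe_dma, PAGE_SIZE,
                                   DMA_BIDIRECTIONAL);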

Which scenario does your implementation fall into? Maybe we can focus
on that one and leave the other scenarios for other dmaengine
maintainers to jump in and implement?

2015-05-27 11:52:38

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

On 05/26/2015 07:31 PM, Dan Williams wrote:
> [ adding Boaz as this discussion has implications for ore_raid ]
>
<>
>> You're not talking about deprecating it, you're talking about removing
>> it entirely.
>
> True, and adding more users makes that removal more difficult. I'm
> willing to help out on the design and review for this work, I just
> can't commit to doing the implementation and testing.
>
<>

Hi

So for ore_raid, yes, it uses both the xor and pq functions, and I
expect that to keep working after the API changes.

That said, I never really cared for the HW offload engines behind these
APIs. Actually I never met any. On a modern machine I always get the
DCE/MMX implementation (or one of the other CPU variants) kicking in.
In preliminary testing of XOR I got almost memory speed for xor (read n
pages + write one), so with multi-core CPUs I fail to see how HW can do
better, memory caching and all. The PQ case was not far behind.

All I need is an abstract API that gives me the best implementation
on any ARCH / configuration. Actually the async_tx API is a pain,
and a sync API would make things simpler. I do not use the concurrent
"submit now, wait later" model at all: I submit, then wait.

So whatever you change this to is good with me, as long as you keep the
wonderful dce implementation and the code keeps running after the new
API is in place.

(The common structures between XOR and PQ were also nice, but I can
also use a union; it's always either/or in ore_raid.)

Once you make the API changes and modify the code, CC me and I'll run
tests.

good luck
Boaz

2015-06-02 14:45:05

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading

On Tue, May 26, 2015 at 09:31:03AM -0700, Dan Williams wrote:
> > If you mean, "give me a hand, you can start there", then yeah, I can
> > do that.
> >
> >> I'm not happy about not having had the time to do this rework myself.
> >> Linux is better off with this api deprecated.
> >
> > You're not talking about deprecating it, you're talking about removing
> > it entirely.
>
> True, and adding more users makes that removal more difficult. I'm
> willing to help out on the design and review for this work, I just
> can't commit to doing the implementation and testing.
>
> I think it looks something like this:
>
> At init time the raid456 driver probes for offload resources. It can
> discover several scenarios:
>
> 1/ "the ioatdma case": raid channels that have all the necessary
> operations (copy, xor/pq, xor/pq check). In this case we'll never
> need to perform a channel switch. Potentially the cpu never touches
> the stripe cache in this case and we can maintain a static dma mapping
> for the entire lifespan of a struct stripe_head.
>
> 2/ "the channel switch case": All the necessary offload resources are
> available but span multiple devices. In this case we need to wait for
> channel1 to complete an operation before channel2 can start. This
> case is complicated by the fact that different channels may need their
> own dma mappings. In the simplest case channels can share the same
> mapping and raid456 needs to wait for channel completions. I think we
> can do a better job than the async_tx api here as raid456 should
> probably poll for completions after each stripe processing batch.
> Taking an interrupt per channel-switch event seems like excessive
> overhead.
>
> 3/ "the co-op case": We have a xor/pq offload resource, but copy and
> check operations require the cpu to touch the stripe cache. In this
> case we need to use the dma_sync_*_for_cpu()/dma_sync_*_for_device()
> to pass buffers back and forth between device and cpu ownership. This
> shares some of the complexity of waiting for completions with scenario
> 2.
>
> Which scenario does your implementation fall into? Maybe we can focus
> on that one and leave the other scenarios for other dmaengine
> maintainers to jump in and implement?

From my limited understanding of RAID and PQ computations, it would be
3 with a twist.

Our hardware controller supports xor and PQ, but the check and data
recovery operations are not supported (we're not able to offload
async_mult and async_sum_product).
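
Concretely, the idea (sketched here, not the exact patch) is that the
driver's ->device_prep_dma_pq() refuses those requests, so async_tx
falls back to the synchronous implementation:

static struct dma_async_tx_descriptor *
mv_xor_prep_dma_pq(struct dma_chan *chan, dma_addr_t *dst,
                   dma_addr_t *src, unsigned int src_cnt,
                   const unsigned char *scf, size_t len,
                   unsigned long flags)
{
        struct dma_async_tx_descriptor *tx = NULL;

        /* mult / sum_product cannot be offloaded by this engine */
        if (flags & (DMA_PREP_PQ_MULT | DMA_PREP_PQ_SUM_PRODUCT))
                return NULL;

        /* ... otherwise build and return the PQ descriptor chain ... */
        return tx;
}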

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

