2015-02-20 16:21:45

by Martin Hicks

Subject: [PATCH 0/5] crypto: talitos: Add crypto async queue handling

I was testing dm-crypt performance on a Freescale P1022 board with
a recent kernel and was getting I/O errors while testing with LUKS.
Investigation showed that all hardware FIFO slots were filling and
the driver was returning -EAGAIN to the block layer, which is not an
expected response from an async crypto implementation.

The following patch series adds a few small fixes, and reworks the
submission path to use the crypto_queue mechanism to handle the
request backlog.
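
The change in submission behaviour can be modelled in a few lines of
user-space C. This is an illustrative sketch, not the driver's API: the
slot count and errno values below are stand-ins.

```c
#include <assert.h>

/* Illustrative model of the submission contract this series moves to. */
#define EINPROGRESS 115
#define EBUSY       16

enum { HW_SLOTS = 4 };

static int hw_in_flight;  /* requests occupying hardware FIFO slots */
static int backlog_len;   /* requests parked in the software queue */

/* Accept every request: hardware slot if one is free, backlog otherwise.
 * The old code instead failed outright with -EAGAIN when slots ran out. */
static int submit_model(void)
{
    if (hw_in_flight < HW_SLOTS) {
        hw_in_flight++;
        return -EINPROGRESS;  /* dispatched to hardware */
    }
    backlog_len++;
    return -EBUSY;            /* accepted, but backlogged */
}

/* A completion frees a slot and promotes one backlogged request. */
static void complete_model(void)
{
    hw_in_flight--;
    if (backlog_len > 0) {
        backlog_len--;
        hw_in_flight++;
    }
}
```

The key point is that -EBUSY here means "accepted but backlogged", which
callers such as dm-crypt understand, while -EAGAIN was simply treated as
a hard failure.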


Martin Hicks (5):
crypto: talitos: Simplify per-channel initialization
crypto: talitos: Remove MD5_BLOCK_SIZE
crypto: talitos: Fix off-by-one and use all hardware slots
crypto: talitos: Reorganize request submission data structures
crypto: talitos: Add software backlog queue handling

drivers/crypto/talitos.c | 189 ++++++++++++++++++++++++----------------------
drivers/crypto/talitos.h | 44 +++++++++--
2 files changed, 137 insertions(+), 96 deletions(-)

--
1.7.10.4


2015-02-20 16:21:51

by Martin Hicks

Subject: [PATCH 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE

This duplicates MD5_HMAC_BLOCK_SIZE, which is properly defined in the
md5 header file. Use that definition instead.
---
drivers/crypto/talitos.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index c49d977..89cf4d5 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -637,8 +637,6 @@ static void talitos_unregister_rng(struct device *dev)
#define TALITOS_MAX_KEY_SIZE 96
#define TALITOS_MAX_IV_LENGTH 16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */

-#define MD5_BLOCK_SIZE 64
-
struct talitos_ctx {
struct device *dev;
int ch;
@@ -2195,7 +2193,7 @@ static struct talitos_alg_template driver_algs[] = {
.halg.base = {
.cra_name = "md5",
.cra_driver_name = "md5-talitos",
- .cra_blocksize = MD5_BLOCK_SIZE,
+ .cra_blocksize = MD5_HMAC_BLOCK_SIZE,
.cra_flags = CRYPTO_ALG_TYPE_AHASH |
CRYPTO_ALG_ASYNC,
}
@@ -2285,7 +2283,7 @@ static struct talitos_alg_template driver_algs[] = {
.halg.base = {
.cra_name = "hmac(md5)",
.cra_driver_name = "hmac-md5-talitos",
- .cra_blocksize = MD5_BLOCK_SIZE,
+ .cra_blocksize = MD5_HMAC_BLOCK_SIZE,
.cra_flags = CRYPTO_ALG_TYPE_AHASH |
CRYPTO_ALG_ASYNC,
}
--
1.7.10.4

2015-02-20 16:21:48

by Martin Hicks

Subject: [PATCH 1/5] crypto: talitos: Simplify per-channel initialization

Channel initialization ran several loops in a row, one for each separate
setup step. Simplify them into a single loop over the channels.
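
As a side note, the roundup_pow_of_two() call that this patch moves
matters because the channel FIFO indices wrap with a mask rather than a
modulo. A quick user-space sketch of the invariant (the function name is
ours, not the driver's):

```c
#include <assert.h>

/* The driver advances FIFO indices with (i + 1) & (fifo_len - 1); this
 * only equals (i + 1) % fifo_len when fifo_len is a power of two, which
 * is why fifo_len = roundup_pow_of_two(chfifo_len).  Sketch only. */
static unsigned int next_index(unsigned int i, unsigned int fifo_len)
{
    return (i + 1) & (fifo_len - 1);
}
```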

Signed-off-by: Martin Hicks <[email protected]>
---
drivers/crypto/talitos.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 067ec21..c49d977 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2706,20 +2706,16 @@ static int talitos_probe(struct platform_device *ofdev)
goto err_out;
}

+ priv->fifo_len = roundup_pow_of_two(priv->chfifo_len);
+
for (i = 0; i < priv->num_channels; i++) {
priv->chan[i].reg = priv->reg + TALITOS_CH_STRIDE * (i + 1);
if (!priv->irq[1] || !(i & 1))
priv->chan[i].reg += TALITOS_CH_BASE_OFFSET;
- }

- for (i = 0; i < priv->num_channels; i++) {
spin_lock_init(&priv->chan[i].head_lock);
spin_lock_init(&priv->chan[i].tail_lock);
- }

- priv->fifo_len = roundup_pow_of_two(priv->chfifo_len);
-
- for (i = 0; i < priv->num_channels; i++) {
priv->chan[i].fifo = kzalloc(sizeof(struct talitos_request) *
priv->fifo_len, GFP_KERNEL);
if (!priv->chan[i].fifo) {
@@ -2727,11 +2723,10 @@ static int talitos_probe(struct platform_device *ofdev)
err = -ENOMEM;
goto err_out;
}
- }

- for (i = 0; i < priv->num_channels; i++)
atomic_set(&priv->chan[i].submit_count,
-(priv->chfifo_len - 1));
+ }

dma_set_mask(dev, DMA_BIT_MASK(36));

--
1.7.10.4

2015-02-20 16:21:53

by Martin Hicks

Subject: [PATCH 3/5] crypto: talitos: Fix off-by-one and use all hardware slots

The submission count was off by one: starting submit_count at
-(chfifo_len - 1) left the last hardware FIFO slot unused, because
atomic_inc_not_zero() refuses to claim a slot once the counter reaches
zero. Start at -chfifo_len so all hardware slots can be filled.
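
The effect of the starting value can be checked with a small user-space
model of the counting scheme (a plain int stands in for atomic_t; names
are ours, not the driver's):

```c
#include <assert.h>

/* Model of the submit_count scheme: the counter starts negative and a
 * slot is claimed by an inc-if-nonzero; once it hits zero the FIFO is
 * deemed full. */
static int inc_not_zero(int *v)
{
    if (*v == 0)
        return 0;   /* full: atomic_inc_not_zero() would fail here */
    (*v)++;
    return 1;
}

/* How many slots can be claimed for a given initial counter value? */
static int usable_slots(int init)
{
    int count = init, n = 0;

    while (inc_not_zero(&count))
        n++;
    return n;
}
```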

Signed-off-by: Martin Hicks <[email protected]>
---
drivers/crypto/talitos.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 89cf4d5..7709805 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2722,8 +2722,7 @@ static int talitos_probe(struct platform_device *ofdev)
goto err_out;
}

- atomic_set(&priv->chan[i].submit_count,
- -(priv->chfifo_len - 1));
+ atomic_set(&priv->chan[i].submit_count, -priv->chfifo_len);
}

dma_set_mask(dev, DMA_BIT_MASK(36));
--
1.7.10.4

2015-02-20 16:21:55

by Martin Hicks

Subject: [PATCH 4/5] crypto: talitos: Reorganize request submission data structures

This is preparatory work for moving to using the crypto async queue
handling code. A talitos_request structure is buried into each
talitos_edesc so that when talitos_submit() is called, everything required
to defer the submission to the hardware is contained within talitos_edesc.

Signed-off-by: Martin Hicks <[email protected]>
---
drivers/crypto/talitos.c | 93 +++++++++++++++-------------------------------
drivers/crypto/talitos.h | 41 +++++++++++++++++---
2 files changed, 65 insertions(+), 69 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 7709805..d3472be 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -186,22 +186,17 @@ static int init_device(struct device *dev)
* talitos_submit - submits a descriptor to the device for processing
* @dev: the SEC device to be used
* @ch: the SEC device channel to be used
- * @desc: the descriptor to be processed by the device
- * @callback: whom to call when processing is complete
+ * @edesc: the descriptor to be processed by the device
* @context: a handle for use by caller (optional)
*
* desc must contain valid dma-mapped (bus physical) address pointers.
* callback must check err and feedback in descriptor header
* for device processing status.
*/
-int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc,
- void (*callback)(struct device *dev,
- struct talitos_desc *desc,
- void *context, int error),
- void *context)
+int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
{
struct talitos_private *priv = dev_get_drvdata(dev);
- struct talitos_request *request;
+ struct talitos_request *request = &edesc->req;
unsigned long flags;
int head;

@@ -214,19 +209,15 @@ int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc,
}

head = priv->chan[ch].head;
- request = &priv->chan[ch].fifo[head];
-
- /* map descriptor and save caller data */
- request->dma_desc = dma_map_single(dev, desc, sizeof(*desc),
+ request->dma_desc = dma_map_single(dev, request->desc,
+ sizeof(*request->desc),
DMA_BIDIRECTIONAL);
- request->callback = callback;
- request->context = context;

/* increment fifo head */
priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);

smp_wmb();
- request->desc = desc;
+ priv->chan[ch].fifo[head] = request;

/* GO! */
wmb();
@@ -247,15 +238,16 @@ EXPORT_SYMBOL(talitos_submit);
static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
{
struct talitos_private *priv = dev_get_drvdata(dev);
- struct talitos_request *request, saved_req;
+ struct talitos_request *request;
unsigned long flags;
int tail, status;

spin_lock_irqsave(&priv->chan[ch].tail_lock, flags);

tail = priv->chan[ch].tail;
- while (priv->chan[ch].fifo[tail].desc) {
- request = &priv->chan[ch].fifo[tail];
+ while (priv->chan[ch].fifo[tail]) {
+ request = priv->chan[ch].fifo[tail];
+ status = 0;

/* descriptors with their done bits set don't get the error */
rmb();
@@ -271,14 +263,9 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
sizeof(struct talitos_desc),
DMA_BIDIRECTIONAL);

- /* copy entries so we can call callback outside lock */
- saved_req.desc = request->desc;
- saved_req.callback = request->callback;
- saved_req.context = request->context;
-
/* release request entry in fifo */
smp_wmb();
- request->desc = NULL;
+ priv->chan[ch].fifo[tail] = NULL;

/* increment fifo tail */
priv->chan[ch].tail = (tail + 1) & (priv->fifo_len - 1);
@@ -287,8 +274,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)

atomic_dec(&priv->chan[ch].submit_count);

- saved_req.callback(dev, saved_req.desc, saved_req.context,
- status);
+ request->callback(dev, request->desc, request->context, status);
+
/* channel may resume processing in single desc error case */
if (error && !reset_ch && status == error)
return;
@@ -352,7 +339,8 @@ static u32 current_desc_hdr(struct device *dev, int ch)
tail = priv->chan[ch].tail;

iter = tail;
- while (priv->chan[ch].fifo[iter].dma_desc != cur_desc) {
+ while (priv->chan[ch].fifo[iter] &&
+ priv->chan[ch].fifo[iter]->dma_desc != cur_desc) {
iter = (iter + 1) & (priv->fifo_len - 1);
if (iter == tail) {
dev_err(dev, "couldn't locate current descriptor\n");
@@ -360,7 +348,7 @@ static u32 current_desc_hdr(struct device *dev, int ch)
}
}

- return priv->chan[ch].fifo[iter].desc->hdr;
+ return priv->chan[ch].fifo[iter] ? priv->chan[ch].fifo[iter]->desc->hdr : 0;
}

/*
@@ -702,37 +690,6 @@ badkey:
return -EINVAL;
}

-/*
- * talitos_edesc - s/w-extended descriptor
- * @assoc_nents: number of segments in associated data scatterlist
- * @src_nents: number of segments in input scatterlist
- * @dst_nents: number of segments in output scatterlist
- * @assoc_chained: whether assoc is chained or not
- * @src_chained: whether src is chained or not
- * @dst_chained: whether dst is chained or not
- * @iv_dma: dma address of iv for checking continuity and link table
- * @dma_len: length of dma mapped link_tbl space
- * @dma_link_tbl: bus physical address of link_tbl
- * @desc: h/w descriptor
- * @link_tbl: input and output h/w link tables (if {src,dst}_nents > 1)
- *
- * if decrypting (with authcheck), or either one of src_nents or dst_nents
- * is greater than 1, an integrity check value is concatenated to the end
- * of link_tbl data
- */
-struct talitos_edesc {
- int assoc_nents;
- int src_nents;
- int dst_nents;
- bool assoc_chained;
- bool src_chained;
- bool dst_chained;
- dma_addr_t iv_dma;
- int dma_len;
- dma_addr_t dma_link_tbl;
- struct talitos_desc desc;
- struct talitos_ptr link_tbl[0];
-};

static int talitos_map_sg(struct device *dev, struct scatterlist *sg,
unsigned int nents, enum dma_data_direction dir,
@@ -1078,7 +1035,10 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq,
map_single_talitos_ptr(dev, &desc->ptr[6], ivsize, ctx->iv, 0,
DMA_FROM_DEVICE);

- ret = talitos_submit(dev, ctx->ch, desc, callback, areq);
+ edesc->req.callback = callback;
+ edesc->req.context = areq;
+
+ ret = talitos_submit(dev, ctx->ch, edesc);
if (ret != -EINPROGRESS) {
ipsec_esp_unmap(dev, edesc, areq);
kfree(edesc);
@@ -1209,6 +1169,7 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
edesc->dma_link_tbl = dma_map_single(dev, &edesc->link_tbl[0],
edesc->dma_len,
DMA_BIDIRECTIONAL);
+ edesc->req.desc = &edesc->desc;

return edesc;
}
@@ -1449,7 +1410,10 @@ static int common_nonsnoop(struct talitos_edesc *edesc,
to_talitos_ptr(&desc->ptr[6], 0);
desc->ptr[6].j_extent = 0;

- ret = talitos_submit(dev, ctx->ch, desc, callback, areq);
+ edesc->req.callback = callback;
+ edesc->req.context = areq;
+
+ ret = talitos_submit(dev, ctx->ch, edesc);
if (ret != -EINPROGRESS) {
common_nonsnoop_unmap(dev, edesc, areq);
kfree(edesc);
@@ -1629,7 +1593,10 @@ static int common_nonsnoop_hash(struct talitos_edesc *edesc,
/* last DWORD empty */
desc->ptr[6] = zero_entry;

- ret = talitos_submit(dev, ctx->ch, desc, callback, areq);
+ edesc->req.callback = callback;
+ edesc->req.context = areq;
+
+ ret = talitos_submit(dev, ctx->ch, edesc);
if (ret != -EINPROGRESS) {
common_nonsnoop_hash_unmap(dev, edesc, areq);
kfree(edesc);
@@ -2714,7 +2681,7 @@ static int talitos_probe(struct platform_device *ofdev)
spin_lock_init(&priv->chan[i].head_lock);
spin_lock_init(&priv->chan[i].tail_lock);

- priv->chan[i].fifo = kzalloc(sizeof(struct talitos_request) *
+ priv->chan[i].fifo = kzalloc(sizeof(struct talitos_request *) *
priv->fifo_len, GFP_KERNEL);
if (!priv->chan[i].fifo) {
dev_err(dev, "failed to allocate request fifo %d\n", i);
diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
index 61a1405..91faa76 100644
--- a/drivers/crypto/talitos.h
+++ b/drivers/crypto/talitos.h
@@ -77,7 +77,7 @@ struct talitos_channel {
void __iomem *reg;

/* request fifo */
- struct talitos_request *fifo;
+ struct talitos_request **fifo;

/* number of requests pending in channel h/w fifo */
atomic_t submit_count ____cacheline_aligned;
@@ -133,11 +133,40 @@ struct talitos_private {
struct hwrng rng;
};

-extern int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc,
- void (*callback)(struct device *dev,
- struct talitos_desc *desc,
- void *context, int error),
- void *context);
+/*
+ * talitos_edesc - s/w-extended descriptor
+ * @assoc_nents: number of segments in associated data scatterlist
+ * @src_nents: number of segments in input scatterlist
+ * @dst_nents: number of segments in output scatterlist
+ * @assoc_chained: whether assoc is chained or not
+ * @src_chained: whether src is chained or not
+ * @dst_chained: whether dst is chained or not
+ * @iv_dma: dma address of iv for checking continuity and link table
+ * @dma_len: length of dma mapped link_tbl space
+ * @dma_link_tbl: bus physical address of link_tbl
+ * @desc: h/w descriptor
+ * @link_tbl: input and output h/w link tables (if {src,dst}_nents > 1)
+ *
+ * if decrypting (with authcheck), or either one of src_nents or dst_nents
+ * is greater than 1, an integrity check value is concatenated to the end
+ * of link_tbl data
+ */
+struct talitos_edesc {
+ struct talitos_request req;
+ int assoc_nents;
+ int src_nents;
+ int dst_nents;
+ bool assoc_chained;
+ bool src_chained;
+ bool dst_chained;
+ dma_addr_t iv_dma;
+ int dma_len;
+ dma_addr_t dma_link_tbl;
+ struct talitos_desc desc;
+ struct talitos_ptr link_tbl[0];
+};
+
+extern int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc);

/* .features flag */
#define TALITOS_FTR_SRC_LINK_TBL_LEN_INCLUDES_EXTENT 0x00000001
--
1.7.10.4

2015-02-20 16:21:57

by Martin Hicks

Subject: [PATCH 5/5] crypto: talitos: Add software backlog queue handling

I was running into situations where the hardware FIFO was filling up, and
the code was returning EAGAIN to dm-crypt and just dropping the submitted
crypto request.

This adds support in talitos for a software backlog queue. When requests
can't be queued to the hardware immediately, -EBUSY is returned. The
queued requests are dispatched to the hardware, in the order received,
as hardware FIFO slots become available.
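
The intended ordering can be modelled in user space. This is a sketch
only: the fixed-size arrays, counts and error values are illustrative,
not the driver's data structures.

```c
#include <assert.h>

/* Tiny model of the dispatch order: requests that overflow the hardware
 * FIFO are backlogged and later promoted oldest-first. */
#define EINPROGRESS 115
#define EBUSY       16
enum { HW_SLOTS = 2, QLEN = 8 };

static int hw[HW_SLOTS], hw_used;
static int queue[QLEN];
static unsigned int q_head, q_tail;

/* Submit request `id`; backlog it when the hardware FIFO is full. */
static int submit(int id)
{
    if (hw_used < HW_SLOTS) {
        hw[hw_used++] = id;
        return -EINPROGRESS;
    }
    queue[q_head++ % QLEN] = id;
    return -EBUSY;
}

/* Completion path: retire the oldest in-flight request, then move the
 * oldest backlogged request into the freed hardware slot. */
static int complete_one(void)
{
    int done = hw[0];

    hw[0] = hw[1];
    hw_used--;
    if (q_tail < q_head)
        hw[hw_used++] = queue[q_tail++ % QLEN];
    return done;
}
```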

Signed-off-by: Martin Hicks <[email protected]>
---
drivers/crypto/talitos.c | 92 +++++++++++++++++++++++++++++++++++-----------
drivers/crypto/talitos.h | 3 ++
2 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index d3472be..226654c 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -183,43 +183,72 @@ static int init_device(struct device *dev)
}

/**
- * talitos_submit - submits a descriptor to the device for processing
+ * talitos_handle_queue - performs submissions either of new descriptors
+ * or ones waiting in the queue backlog.
* @dev: the SEC device to be used
* @ch: the SEC device channel to be used
- * @edesc: the descriptor to be processed by the device
- * @context: a handle for use by caller (optional)
+ * @edesc: the descriptor to be processed by the device (optional)
*
* desc must contain valid dma-mapped (bus physical) address pointers.
* callback must check err and feedback in descriptor header
- * for device processing status.
+ * for device processing status upon completion.
*/
-int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
+int talitos_handle_queue(struct device *dev, int ch, struct talitos_edesc *edesc)
{
struct talitos_private *priv = dev_get_drvdata(dev);
- struct talitos_request *request = &edesc->req;
+ struct talitos_request *request, *orig_request = NULL;
+ struct crypto_async_request *async_req;
unsigned long flags;
int head;
+ int ret = -EINPROGRESS;

spin_lock_irqsave(&priv->chan[ch].head_lock, flags);

+ if (edesc) {
+ orig_request = &edesc->req;
+ crypto_enqueue_request(&priv->chan[ch].queue, &orig_request->base);
+ }
+
+flush_another:
+ if (priv->chan[ch].queue.qlen == 0) {
+ spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
+ return 0;
+ }
+
if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) {
/* h/w fifo is full */
spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
- return -EAGAIN;
+ return -EBUSY;
}

- head = priv->chan[ch].head;
+ /* Dequeue the oldest request */
+ async_req = crypto_dequeue_request(&priv->chan[ch].queue);
+
+ request = container_of(async_req, struct talitos_request, base);
request->dma_desc = dma_map_single(dev, request->desc,
sizeof(*request->desc),
DMA_BIDIRECTIONAL);

/* increment fifo head */
+ head = priv->chan[ch].head;
priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);

- smp_wmb();
- priv->chan[ch].fifo[head] = request;
+ spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
+
+ /*
+ * Mark a backlogged request as in-progress, return EBUSY because
+ * the original request that was submitted is backlogged.
+ */
+ if (request != orig_request) {
+ struct crypto_async_request *areq = request->context;
+ areq->complete(areq, -EINPROGRESS);
+ ret = -EBUSY;
+ }
+
+ spin_lock_irqsave(&priv->chan[ch].head_lock, flags);

/* GO! */
+ priv->chan[ch].fifo[head] = request;
wmb();
out_be32(priv->chan[ch].reg + TALITOS_FF,
upper_32_bits(request->dma_desc));
@@ -228,9 +257,18 @@ int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)

spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);

- return -EINPROGRESS;
+ /*
+ * When handling the queue via the completion path, queue more
+ * requests if the hardware has room.
+ */
+ if (!edesc) {
+ spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
+ goto flush_another;
+ }
+
+ return ret;
}
-EXPORT_SYMBOL(talitos_submit);
+EXPORT_SYMBOL(talitos_handle_queue);

/*
* process what was done, notify callback of error if not
@@ -284,6 +322,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
}

spin_unlock_irqrestore(&priv->chan[ch].tail_lock, flags);
+
+ talitos_handle_queue(dev, ch, NULL);
}

/*
@@ -1038,8 +1078,8 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq,
edesc->req.callback = callback;
edesc->req.context = areq;

- ret = talitos_submit(dev, ctx->ch, edesc);
- if (ret != -EINPROGRESS) {
+ ret = talitos_handle_queue(dev, ctx->ch, edesc);
+ if (ret != -EINPROGRESS && ret != -EBUSY) {
ipsec_esp_unmap(dev, edesc, areq);
kfree(edesc);
}
@@ -1080,6 +1120,7 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
unsigned int ivsize,
int icv_stashing,
u32 cryptoflags,
+ struct crypto_async_request *areq,
bool encrypt)
{
struct talitos_edesc *edesc;
@@ -1170,6 +1211,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
edesc->dma_len,
DMA_BIDIRECTIONAL);
edesc->req.desc = &edesc->desc;
+ /* A copy of the crypto_async_request to use the crypto_queue backlog */
+ memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));

return edesc;
}
@@ -1184,7 +1227,7 @@ static struct talitos_edesc *aead_edesc_alloc(struct aead_request *areq, u8 *iv,
return talitos_edesc_alloc(ctx->dev, areq->assoc, areq->src, areq->dst,
iv, areq->assoclen, areq->cryptlen,
ctx->authsize, ivsize, icv_stashing,
- areq->base.flags, encrypt);
+ areq->base.flags, &areq->base, encrypt);
}

static int aead_encrypt(struct aead_request *req)
@@ -1413,8 +1456,8 @@ static int common_nonsnoop(struct talitos_edesc *edesc,
edesc->req.callback = callback;
edesc->req.context = areq;

- ret = talitos_submit(dev, ctx->ch, edesc);
- if (ret != -EINPROGRESS) {
+ ret = talitos_handle_queue(dev, ctx->ch, edesc);
+ if (ret != -EINPROGRESS && ret != -EBUSY) {
common_nonsnoop_unmap(dev, edesc, areq);
kfree(edesc);
}
@@ -1430,7 +1473,7 @@ static struct talitos_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request *

return talitos_edesc_alloc(ctx->dev, NULL, areq->src, areq->dst,
areq->info, 0, areq->nbytes, 0, ivsize, 0,
- areq->base.flags, encrypt);
+ areq->base.flags, &areq->base, encrypt);
}

static int ablkcipher_encrypt(struct ablkcipher_request *areq)
@@ -1596,8 +1639,8 @@ static int common_nonsnoop_hash(struct talitos_edesc *edesc,
edesc->req.callback = callback;
edesc->req.context = areq;

- ret = talitos_submit(dev, ctx->ch, edesc);
- if (ret != -EINPROGRESS) {
+ ret = talitos_handle_queue(dev, ctx->ch, edesc);
+ if (ret != -EINPROGRESS && ret != -EBUSY) {
common_nonsnoop_hash_unmap(dev, edesc, areq);
kfree(edesc);
}
@@ -1612,7 +1655,7 @@ static struct talitos_edesc *ahash_edesc_alloc(struct ahash_request *areq,
struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);

return talitos_edesc_alloc(ctx->dev, NULL, req_ctx->psrc, NULL, NULL, 0,
- nbytes, 0, 0, 0, areq->base.flags, false);
+ nbytes, 0, 0, 0, areq->base.flags, &areq->base, false);
}

static int ahash_init(struct ahash_request *areq)
@@ -2690,6 +2733,13 @@ static int talitos_probe(struct platform_device *ofdev)
}

atomic_set(&priv->chan[i].submit_count, -priv->chfifo_len);
+
+ /*
+ * The crypto_queue is used to manage the backlog only. While
+ * the hardware FIFO has space requests are dispatched
+ * immediately.
+ */
+ crypto_init_queue(&priv->chan[i].queue, 0);
}

dma_set_mask(dev, DMA_BIT_MASK(36));
diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
index 91faa76..a6f73e2 100644
--- a/drivers/crypto/talitos.h
+++ b/drivers/crypto/talitos.h
@@ -65,6 +65,7 @@ struct talitos_desc {
* @context: caller context (optional)
*/
struct talitos_request {
+ struct crypto_async_request base;
struct talitos_desc *desc;
dma_addr_t dma_desc;
void (*callback) (struct device *dev, struct talitos_desc *desc,
@@ -91,6 +92,8 @@ struct talitos_channel {
spinlock_t tail_lock ____cacheline_aligned;
/* index to next in-progress/done descriptor request */
int tail;
+
+ struct crypto_queue queue;
};

struct talitos_private {
--
1.7.10.4

2015-02-20 18:26:21

by Martin Hicks

Subject: Re: [PATCH 0/5] crypto: talitos: Add crypto async queue handling

Resending to linux-crypto in plain text.
Sorry to everyone else for the duplication.
mh

On Fri, Feb 20, 2015 at 1:23 PM, Martin Hicks <[email protected]> wrote:
>
> I've just noticed that performance is pretty terrible with maxcpus=1, so
> I'll investigate that.
> mh
>
> On Fri, Feb 20, 2015 at 11:21 AM, Martin Hicks <[email protected]> wrote:
>>
>> I was testing dm-crypt performance with a Freescale P1022 board with
>> a recent kernel and was getting IO errors while doing testing with LUKS.
>> Investigation showed that all hardware FIFO slots were filling and
>> the driver was returning EAGAIN to the block layer, which is not an
>> expected response for an async crypto implementation.
>>
>> The following patch series adds a few small fixes, and reworks the
>> submission path to use the crypto_queue mechanism to handle the
>> request backlog.
>>
>>
>> Martin Hicks (5):
>> crypto: talitos: Simplify per-channel initialization
>> crypto: talitos: Remove MD5_BLOCK_SIZE
>> crypto: talitos: Fix off-by-one and use all hardware slots
>> crypto: talitos: Reorganize request submission data structures
>> crypto: talitos: Add software backlog queue handling
>>
>> drivers/crypto/talitos.c | 189
>> ++++++++++++++++++++++++----------------------
>> drivers/crypto/talitos.h | 44 +++++++++--
>> 2 files changed, 137 insertions(+), 96 deletions(-)
>>
>> --
>> 1.7.10.4
>>
>
>
>
> --
> Martin Hicks P.Eng. | [email protected]
> Bork Consulting Inc. | +1 (613) 266-2296



--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296

2015-02-20 18:23:28

by Martin Hicks

Subject: Re: [PATCH 0/5] crypto: talitos: Add crypto async queue handling

I've just noticed that performance is pretty terrible with maxcpus=1, so
I'll investigate that.
mh

On Fri, Feb 20, 2015 at 11:21 AM, Martin Hicks <[email protected]> wrote:

> I was testing dm-crypt performance with a Freescale P1022 board with
> a recent kernel and was getting IO errors while doing testing with LUKS.
> Investigation showed that all hardware FIFO slots were filling and
> the driver was returning EAGAIN to the block layer, which is not an
> expected response for an async crypto implementation.
>
> The following patch series adds a few small fixes, and reworks the
> submission path to use the crypto_queue mechanism to handle the
> request backlog.
>
>
> Martin Hicks (5):
> crypto: talitos: Simplify per-channel initialization
> crypto: talitos: Remove MD5_BLOCK_SIZE
> crypto: talitos: Fix off-by-one and use all hardware slots
> crypto: talitos: Reorganize request submission data structures
> crypto: talitos: Add software backlog queue handling
>
> drivers/crypto/talitos.c | 189
> ++++++++++++++++++++++++----------------------
> drivers/crypto/talitos.h | 44 +++++++++--
> 2 files changed, 137 insertions(+), 96 deletions(-)
>
> --
> 1.7.10.4
>
>


--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296



2015-02-24 18:22:25

by Horia Geantă

Subject: Re: [PATCH 5/5] crypto: talitos: Add software backlog queue handling

On 2/20/2015 6:21 PM, Martin Hicks wrote:
> I was running into situations where the hardware FIFO was filling up, and
> the code was returning EAGAIN to dm-crypt and just dropping the submitted
> crypto request.
>
> This adds support in talitos for a software backlog queue. When requests
> can't be queued to the hardware immediately EBUSY is returned. The queued
> requests are dispatched to the hardware in received order as hardware FIFO
> slots become available.
>
> Signed-off-by: Martin Hicks <[email protected]>

Hi Martin,

Thanks for the effort!
Indeed we noticed that talitos (and caam) don't play nicely with
dm-crypt, lacking a backlog mechanism.

Please run checkpatch --strict and fix the errors, warnings.

> ---
> drivers/crypto/talitos.c | 92 +++++++++++++++++++++++++++++++++++-----------
> drivers/crypto/talitos.h | 3 ++
> 2 files changed, 74 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> index d3472be..226654c 100644
> --- a/drivers/crypto/talitos.c
> +++ b/drivers/crypto/talitos.c
> @@ -183,43 +183,72 @@ static int init_device(struct device *dev)
> }
>
> /**
> - * talitos_submit - submits a descriptor to the device for processing
> + * talitos_handle_queue - performs submissions either of new descriptors
> + * or ones waiting in the queue backlog.
> * @dev: the SEC device to be used
> * @ch: the SEC device channel to be used
> - * @edesc: the descriptor to be processed by the device
> - * @context: a handle for use by caller (optional)

The "context" kernel-doc should have been removed in patch 4/5.

> + * @edesc: the descriptor to be processed by the device (optional)
> *
> * desc must contain valid dma-mapped (bus physical) address pointers.
> * callback must check err and feedback in descriptor header
> - * for device processing status.
> + * for device processing status upon completion.
> */
> -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
> +int talitos_handle_queue(struct device *dev, int ch, struct talitos_edesc *edesc)
> {
> struct talitos_private *priv = dev_get_drvdata(dev);
> - struct talitos_request *request = &edesc->req;
> + struct talitos_request *request, *orig_request = NULL;
> + struct crypto_async_request *async_req;
> unsigned long flags;
> int head;
> + int ret = -EINPROGRESS;
>
> spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
>
> + if (edesc) {
> + orig_request = &edesc->req;
> + crypto_enqueue_request(&priv->chan[ch].queue, &orig_request->base);
> + }

The request goes through the SW queue even if there are empty slots in
the HW queue, doing unnecessary crypto_enqueue_request() and
crypto_dequeue_request() calls. Trying to use the HW queue first would be better.
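
The reordering suggested here could look like the following user-space
sketch (names and values are ours, not the driver's): take a hardware
slot directly when the backlog is empty, and only round-trip through the
software queue otherwise, so earlier backlogged requests keep their place.

```c
#include <assert.h>

/* Sketch of "hardware first" submission: enqueue to the software queue
 * only when the hardware FIFO is full, or when the queue is non-empty. */
static int hw_free = 2;   /* free hardware FIFO slots */
static int qlen;          /* software backlog length */

static int submit_hw_first(void)
{
    if (qlen == 0 && hw_free > 0) {
        hw_free--;
        return -115;      /* -EINPROGRESS: straight to hardware */
    }
    qlen++;
    return -16;           /* -EBUSY: backlogged */
}
```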

> +
> +flush_another:
> + if (priv->chan[ch].queue.qlen == 0) {
> + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
> + return 0;
> + }
> +
> if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) {
> /* h/w fifo is full */
> spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
> - return -EAGAIN;
> + return -EBUSY;
> }
>
> - head = priv->chan[ch].head;
> + /* Dequeue the oldest request */
> + async_req = crypto_dequeue_request(&priv->chan[ch].queue);
> +
> + request = container_of(async_req, struct talitos_request, base);
> request->dma_desc = dma_map_single(dev, request->desc,
> sizeof(*request->desc),
> DMA_BIDIRECTIONAL);
>
> /* increment fifo head */
> + head = priv->chan[ch].head;
> priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);
>
> - smp_wmb();
> - priv->chan[ch].fifo[head] = request;
> + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
> +
> + /*
> + * Mark a backlogged request as in-progress, return EBUSY because
> + * the original request that was submitted is backlogged.

s/is backlogged/is backlogged or dropped
Original request will not be enqueued by crypto_queue_enqueue() if the
CRYPTO_TFM_REQ_MAY_BACKLOG flag is not set (since SW queue is for
backlog only) - that's the case for IPsec requests.

> + */
> + if (request != orig_request) {
> + struct crypto_async_request *areq = request->context;
> + areq->complete(areq, -EINPROGRESS);
> + ret = -EBUSY;
> + }
> +
> + spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
>
> /* GO! */
> + priv->chan[ch].fifo[head] = request;
> wmb();
> out_be32(priv->chan[ch].reg + TALITOS_FF,
> upper_32_bits(request->dma_desc));
> @@ -228,9 +257,18 @@ int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
>
> spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
>
> - return -EINPROGRESS;
> + /*
> + * When handling the queue via the completion path, queue more
> + * requests if the hardware has room.
> + */
> + if (!edesc) {
> + spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
> + goto flush_another;
> + }

This is better - avoids releasing and reacquiring the lock:

if (!edesc) {
goto flush_another;
}

spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);


> +
> + return ret;
> }
> -EXPORT_SYMBOL(talitos_submit);
> +EXPORT_SYMBOL(talitos_handle_queue);
>
> /*
> * process what was done, notify callback of error if not
> @@ -284,6 +322,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
> }
>
> spin_unlock_irqrestore(&priv->chan[ch].tail_lock, flags);
> +
> + talitos_handle_queue(dev, ch, NULL);
> }
>
> /*
> @@ -1038,8 +1078,8 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq,
> edesc->req.callback = callback;
> edesc->req.context = areq;
>
> - ret = talitos_submit(dev, ctx->ch, edesc);
> - if (ret != -EINPROGRESS) {
> + ret = talitos_handle_queue(dev, ctx->ch, edesc);
> + if (ret != -EINPROGRESS && ret != -EBUSY) {

Again, factor in CRYPTO_TFM_REQ_MAY_BACKLOG.
Only when talitos_handle_queue() returns -EBUSY *and*
CRYPTO_TFM_REQ_MAY_BACKLOG is set, the request is backlogged.

Thus the 2nd condition should be:
!(ret == -EBUSY && areq->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)

Other places need similar fix.
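
Concretely, the suggested cleanup condition can be modelled like this
(the macro values below are illustrative stand-ins, not copied from the
kernel headers):

```c
#include <assert.h>

/* Model of the suggested check: a request remains in flight only on
 * -EINPROGRESS, or on -EBUSY when the caller set MAY_BACKLOG. */
#define EINPROGRESS 115
#define EBUSY       16
#define CRYPTO_TFM_REQ_MAY_BACKLOG 0x400

/* Returns 1 when the submit path must unmap and free the edesc. */
static int must_cleanup(int ret, unsigned int flags)
{
    if (ret == -EINPROGRESS)
        return 0;
    if (ret == -EBUSY && (flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
        return 0;
    return 1;
}
```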


> ipsec_esp_unmap(dev, edesc, areq);
> kfree(edesc);
> }
> @@ -1080,6 +1120,7 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
> unsigned int ivsize,
> int icv_stashing,
> u32 cryptoflags,
> + struct crypto_async_request *areq,
> bool encrypt)
> {
> struct talitos_edesc *edesc;
> @@ -1170,6 +1211,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
> edesc->dma_len,
> DMA_BIDIRECTIONAL);
> edesc->req.desc = &edesc->desc;
> + /* A copy of the crypto_async_request to use the crypto_queue backlog */
> + memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));

Why not have a
struct crypto_async_request *base;
instead of
struct crypto_async_request base;
in talitos_request structure?

This way:
1. the memcpy above is avoided
2. the talitos_request structure gets smaller - remember that
talitos_request is now embedded in the talitos_edesc structure, which
is kmalloc-ed for every request

That would be similar to the previous driver behaviour.

The caller is expected not to destroy the request if the return code
from the Crypto API / backend driver is -EINPROGRESS, or -EBUSY when
the MAY_BACKLOG flag is set.

>
> return edesc;
> }
> @@ -1184,7 +1227,7 @@ static struct talitos_edesc *aead_edesc_alloc(struct aead_request *areq, u8 *iv,
> return talitos_edesc_alloc(ctx->dev, areq->assoc, areq->src, areq->dst,
> iv, areq->assoclen, areq->cryptlen,
> ctx->authsize, ivsize, icv_stashing,
> - areq->base.flags, encrypt);
> + areq->base.flags, &areq->base, encrypt);
> }
>
> static int aead_encrypt(struct aead_request *req)
> @@ -1413,8 +1456,8 @@ static int common_nonsnoop(struct talitos_edesc *edesc,
> edesc->req.callback = callback;
> edesc->req.context = areq;
>
> - ret = talitos_submit(dev, ctx->ch, edesc);
> - if (ret != -EINPROGRESS) {
> + ret = talitos_handle_queue(dev, ctx->ch, edesc);
> + if (ret != -EINPROGRESS && ret != -EBUSY) {
> common_nonsnoop_unmap(dev, edesc, areq);
> kfree(edesc);
> }
> @@ -1430,7 +1473,7 @@ static struct talitos_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request *
>
> return talitos_edesc_alloc(ctx->dev, NULL, areq->src, areq->dst,
> areq->info, 0, areq->nbytes, 0, ivsize, 0,
> - areq->base.flags, encrypt);
> + areq->base.flags, &areq->base, encrypt);
> }
>
> static int ablkcipher_encrypt(struct ablkcipher_request *areq)
> @@ -1596,8 +1639,8 @@ static int common_nonsnoop_hash(struct talitos_edesc *edesc,
> edesc->req.callback = callback;
> edesc->req.context = areq;
>
> - ret = talitos_submit(dev, ctx->ch, edesc);
> - if (ret != -EINPROGRESS) {
> + ret = talitos_handle_queue(dev, ctx->ch, edesc);
> + if (ret != -EINPROGRESS && ret != -EBUSY) {
> common_nonsnoop_hash_unmap(dev, edesc, areq);
> kfree(edesc);
> }
> @@ -1612,7 +1655,7 @@ static struct talitos_edesc *ahash_edesc_alloc(struct ahash_request *areq,
> struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);
>
> return talitos_edesc_alloc(ctx->dev, NULL, req_ctx->psrc, NULL, NULL, 0,
> - nbytes, 0, 0, 0, areq->base.flags, false);
> + nbytes, 0, 0, 0, areq->base.flags, &areq->base, false);
> }
>
> static int ahash_init(struct ahash_request *areq)
> @@ -2690,6 +2733,13 @@ static int talitos_probe(struct platform_device *ofdev)
> }
>
> atomic_set(&priv->chan[i].submit_count, -priv->chfifo_len);
> +
> + /*
> + * The crypto_queue is used to manage the backlog only. While
> + * the hardware FIFO has space requests are dispatched
> + * immediately.
> + */
> + crypto_init_queue(&priv->chan[i].queue, 0);
> }
>
> dma_set_mask(dev, DMA_BIT_MASK(36));
> diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
> index 91faa76..a6f73e2 100644
> --- a/drivers/crypto/talitos.h
> +++ b/drivers/crypto/talitos.h
> @@ -65,6 +65,7 @@ struct talitos_desc {
> * @context: caller context (optional)
> */
> struct talitos_request {
> + struct crypto_async_request base;
> struct talitos_desc *desc;
> dma_addr_t dma_desc;
> void (*callback) (struct device *dev, struct talitos_desc *desc,
> @@ -91,6 +92,8 @@ struct talitos_channel {
> spinlock_t tail_lock ____cacheline_aligned;
> /* index to next in-progress/done descriptor request */
> int tail;
> +
> + struct crypto_queue queue;
> };
>
> struct talitos_private {
>

2015-02-26 19:22:39

by Martin Hicks

[permalink] [raw]
Subject: Re: [PATCH 5/5] crypto: talitos: Add software backlog queue handling

Hi Horia,

On Tue, Feb 24, 2015 at 1:21 PM, Horia Geantă
<[email protected]> wrote:
> On 2/20/2015 6:21 PM, Martin Hicks wrote:
>> + int ret = -EINPROGRESS;
>>
>> spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
>>
>> + if (edesc) {
>> + orig_request = &edesc->req;
>> + crypto_enqueue_request(&priv->chan[ch].queue, &orig_request->base);
>> + }
>
> The request goes through the SW queue even if there are empty slots in
> the HW queue, causing unnecessary crypto_enqueue_request() and
> crypto_dequeue_request() calls. Trying to use the HW queue first would
> be better.
>

Definitely a valid point. Trying to reorganize this really
complicates things, though. Splitting the submit path from issuing
requests off the backlog seems to make it much more straightforward.

>> @@ -1170,6 +1211,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
>> edesc->dma_len,
>> DMA_BIDIRECTIONAL);
>> edesc->req.desc = &edesc->desc;
>> + /* A copy of the crypto_async_request to use the crypto_queue backlog */
>> + memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));
>
> Why not have a
> struct crypto_async_request *base;
> instead of
> struct crypto_async_request base;
> in talitos_request structure?
>
> This way:
> 1. memcopy above is avoided
> 2. talitos_request structure gets smaller - remember that
> talitos_request is now embedded in talitos_edesc structure, which gets
> kmalloc-ed for every request

The trouble I ran into was that after a request is backlogged and put
on the crypto queue, I couldn't see how else I could go from the
crypto_async_request that is pulled from the queue back to the
talitos_request that is needed in order to put the pointer into the
hardware FIFO.

This is the one issue from your review that I didn't address in v2 of
the patch.

Thanks for the review. Not releasing / re-acquiring the spinlock
while trying to flush the backlog really improved performance.

mh

--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296

2015-02-27 12:37:41

by Horia Geantă

[permalink] [raw]
Subject: Re: [PATCH 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE

On 2/20/2015 6:21 PM, Martin Hicks wrote:
> This is properly defined in the md5 header file.
> ---

Signed-off-by tag is missing.

2015-03-01 09:32:24

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE

Martin Hicks <[email protected]> wrote:
> This is properly defined in the md5 header file.

Please resubmit with sign-off.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt