LinuxLists.cc - [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-03-15 07:48:36

[permalink] [raw]

Subject: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

Now some cipher hardware engines prefer to handle bulk block by merging requests
to increase the block size and thus increase the hardware engine processing speed.

This patchset introduces request bulk mode to help the crypto hardware drivers
improve in efficiency.

Changes since v1:
- Modify the sg_is_contiguous() function.

Baolin Wang (4):
scatterlist: Introduce some helper functions
crypto: Introduce some helper functions to help to merge requests
crypto: Introduce the bulk mode for crypto engine framework
md: dm-crypt: Initialize the sector number for one request

crypto/Kconfig | 1 +
crypto/ablk_helper.c | 135 ++++++++++++++++++++++++++++++++++++++++++
crypto/crypto_engine.c | 122 +++++++++++++++++++++++++++++++++++++-
drivers/crypto/omap-aes.c | 2 +-
drivers/md/dm-crypt.c | 1 +
include/crypto/ablk_helper.h | 3 +
include/crypto/algapi.h | 23 ++++++-
include/linux/crypto.h | 5 ++
include/linux/scatterlist.h | 33 +++++++++++
lib/scatterlist.c | 69 +++++++++++++++++++++
10 files changed, 389 insertions(+), 5 deletions(-)

--
1.7.9.5

2016-03-15 07:48:43

[permalink] [raw]

Subject: [PATCH v2 1/4] scatterlist: Introduce some helper functions

In crypto engine framework, one request can combine (copy) other requests'
scatterlists into its dynamic sg table to manage them together as a bulk
block , which can improve engine efficency with handling bulk block. Thus
we need some helper functions to manage dynamic scattertables.

This patch introduces 'sg_is_contiguous()' function to check if two
scatterlists are contiguous, 'sg_alloc_empty_table()' function to
allocate one empty dynamic sg table, 'sg_add_sg_to_table()' function
to copy one mapped scatterlist into sg table and 'sg_table_is_empty'
function to check if the sg table is empty.

Signed-off-by: Baolin Wang <[email protected]>
---
include/linux/scatterlist.h | 33 +++++++++++++++++++++
lib/scatterlist.c | 69 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 556ec1e..c1ed9f4 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -212,6 +212,20 @@ static inline void sg_unmark_end(struct scatterlist *sg)
}

/**
+ * sg_table_is_empty - Check if the sg table is empty
+ * @sgt: sg table
+ *
+ * Description:
+ * The 'orig_nents' member of one sg table is used to indicate how many
+ * scatterlists in the sg table.
+ *
+ **/
+static inline bool sg_table_is_empty(struct sg_table *sgt)
+{
+ return !sgt->orig_nents;
+}
+
+/**
* sg_phys - Return physical address of an sg entry
* @sg: SG entry
*
@@ -241,6 +255,23 @@ static inline void *sg_virt(struct scatterlist *sg)
return page_address(sg_page(sg)) + sg->offset;
}

+/**
+ * sg_is_contiguous - Check if the scatterlists are contiguous
+ * @sga: SG entry
+ * @sgb: SG entry
+ *
+ * Description:
+ * If the sga scatterlist is contiguous with the sgb scatterlist,
+ * that means they can be merged together.
+ *
+ **/
+static inline bool sg_is_contiguous(struct scatterlist *sga,
+ struct scatterlist *sgb)
+{
+ return *(unsigned long *)sg_virt(sga) + sga->length ==
+ *(unsigned long *)sg_virt(sgb);
+}
+
int sg_nents(struct scatterlist *sg);
int sg_nents_for_len(struct scatterlist *sg, u64 len);
struct scatterlist *sg_next(struct scatterlist *);
@@ -261,6 +292,8 @@ void sg_free_table(struct sg_table *);
int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
struct scatterlist *, gfp_t, sg_alloc_fn *);
int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
+int sg_alloc_empty_table(struct sg_table *, unsigned int, gfp_t);
+int sg_add_sg_to_table(struct sg_table *, struct scatterlist *);
int sg_alloc_table_from_pages(struct sg_table *sgt,
struct page **pages, unsigned int n_pages,
unsigned long offset, unsigned long size,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 004fc70..6d3f3b0 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -370,6 +370,75 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
EXPORT_SYMBOL(sg_alloc_table);

/**
+ * sg_add_sg_to_table - Add one scatterlist into sg table
+ * @sgt: The sg table header to use
+ * @src: The sg need to be added into sg table
+ *
+ * Description:
+ * The 'nents' member indicates how many mapped scatterlists has been added
+ * in the dynamic sg table. The 'orig_nents' member indicates the size of the
+ * dynamic sg table.
+ *
+ * Copy one mapped @src@ scatterlist into the dynamic sg table and increase
+ * 'nents' member.
+ *
+ **/
+int sg_add_sg_to_table(struct sg_table *sgt, struct scatterlist *src)
+{
+ unsigned int i = 0, orig_nents = sgt->orig_nents;
+ struct scatterlist *sgl = sgt->sgl;
+ struct scatterlist *sg;
+
+ /* Check if there are enough space for the new sg to be added */
+ if (sgt->nents >= sgt->orig_nents)
+ return -EINVAL;
+
+ for_each_sg(sgl, sg, orig_nents, i) {
+ if (sgt->nents > 0 && i == (sgt->nents - 1)) {
+ sg_unmark_end(sg);
+ } else if (i == sgt->nents) {
+ memcpy(sg, src, sizeof(struct scatterlist));
+ sg_mark_end(sg);
+ sgt->nents++;
+ break;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * sg_alloc_empty_table - Allocate one empty dynamic sg table
+ * @sgt: The sg table header to use
+ * @nents: Number of entries in sg list
+ * @gfp_mask: GFP allocation mask
+ *
+ * Description:
+ * Allocate and initialize one dynamic sg table. One dynamic sg table means
+ * it need allocate @nents@ empty scatterlists entries and is used to copy
+ * other mapped scatterlists into the dynamic sg table.
+ *
+ * The 'nents' member indicates how many scatterlists has been copied into
+ * the dynamic sg table. It should set 0 which means there are no mapped
+ * scatterlists added in this sg table now.
+ *
+ * The 'orig_nents' member indicates the size of the dynamic sg table.
+ *
+ **/
+int sg_alloc_empty_table(struct sg_table *sgt, unsigned int nents,
+ gfp_t gfp_mask)
+{
+ int ret;
+
+ ret = sg_alloc_table(sgt, nents, gfp_mask);
+ if (ret)
+ return ret;
+
+ sgt->nents = 0;
+ return 0;
+}
+
+/**
* sg_alloc_table_from_pages - Allocate and initialize an sg table from
* an array of pages
* @sgt: The sg table header to use
--
1.7.9.5

2016-03-15 07:49:02

[permalink] [raw]

Subject: [PATCH v2 2/4] crypto: Introduce some helper functions to help to merge requests

Usually the dm-crypt subsystem will send encryption/descryption requests to
the crypto layer one block at a time, making each request 512 bytes long,
which is a much smaller size for hardware engine, that means the hardware
engine can not play its best performance.

Now some cipher hardware engines prefer to handle bulk block rather than one
sector (512 bytes) created by dm-crypt, cause these cipher engines can handle
the intermediate values (IV) by themselves in one bulk block. This means we
can increase the size of the request by merging request rather than always 512
bytes and thus increase the hardware engine processing speed.

This patch introduces some helper functions to help to merge requests to improve
hardware engine efficiency.

Signed-off-by: Baolin Wang <[email protected]>
---
crypto/ablk_helper.c | 135 ++++++++++++++++++++++++++++++++++++++++++
include/crypto/ablk_helper.h | 3 +
include/linux/crypto.h | 5 ++
3 files changed, 143 insertions(+)

diff --git a/crypto/ablk_helper.c b/crypto/ablk_helper.c
index e1fcf53..3cf15cb 100644
--- a/crypto/ablk_helper.c
+++ b/crypto/ablk_helper.c
@@ -26,6 +26,7 @@

#include <linux/kernel.h>
#include <linux/crypto.h>
+#include <linux/device-mapper.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/hardirq.h>
@@ -34,6 +35,140 @@
#include <crypto/ablk_helper.h>
#include <asm/simd.h>

+/**
+ * ablk_link_request_if_contigous - try to link one request into previous one
+ * if the page address is contiguous.
+ * @list_req: the request from queue list
+ * @req: the new request need to be merged
+ *
+ * If the listed request and new request's pages of the scatterlists are
+ * contiguous, then merge the scatterlists of new request into the listed one.
+ *
+ * Return true on success, others means failed.
+ */
+static bool ablk_link_request_if_contigous(struct ablkcipher_request *list_req,
+ struct ablkcipher_request *req)
+{
+ struct scatterlist *last_src_sg =
+ sg_last(list_req->sgt_src.sgl, list_req->sgt_src.nents);
+ struct scatterlist *last_dst_sg =
+ sg_last(list_req->sgt_dst.sgl, list_req->sgt_dst.nents);
+ struct scatterlist *req_src_sg = req->src;
+ struct scatterlist *req_dst_sg = req->dst;
+
+ if (!last_src_sg || !last_dst_sg)
+ return false;
+
+ /* Check if the src/dst scatterlists are contiguous */
+ if (!sg_is_contiguous(last_src_sg, req_src_sg) ||
+ !sg_is_contiguous(last_dst_sg, req_dst_sg))
+ return false;
+
+ /*
+ * If the request can be merged into the listed request after the
+ * checking, then expand the listed request scatterlists' length.
+ */
+ last_src_sg->length += req_src_sg->length;
+ last_dst_sg->length += req_dst_sg->length;
+ list_req->nbytes += req->nbytes;
+
+ return true;
+}
+
+/**
+ * ablk_merge_request - try to merge one request into previous one
+ * @list_req: the request from queue list
+ * @req: the request need to be merged
+ *
+ * This function will create a dynamic scatterlist table for both source
+ * and destination if the request is the first coming in.
+ *
+ * Return true on success, others means failed.
+ */
+static bool ablk_merge_request(struct ablkcipher_request *list_req,
+ struct ablkcipher_request *req)
+{
+ struct sg_table *sgt_src = &list_req->sgt_src;
+ struct sg_table *sgt_dst = &list_req->sgt_dst;
+ unsigned int nents = SG_MAX_SINGLE_ALLOC;
+
+ if (sg_table_is_empty(sgt_src)) {
+ if (sg_alloc_empty_table(sgt_src, nents, GFP_ATOMIC))
+ return false;
+
+ if (sg_add_sg_to_table(sgt_src, list_req->src))
+ return false;
+ }
+
+ if (sg_table_is_empty(sgt_dst)) {
+ if (sg_alloc_empty_table(sgt_dst, nents, GFP_ATOMIC))
+ return false;
+
+ if (sg_add_sg_to_table(sgt_dst, list_req->dst))
+ return false;
+ }
+
+ /*
+ * Check if the new request is contiguous for the listed request,
+ * if it is contiguous then merge the new request into the listed one.
+ */
+ if (ablk_link_request_if_contigous(list_req, req))
+ return true;
+
+ if (sg_add_sg_to_table(sgt_src, req->src))
+ return false;
+
+ if (sg_add_sg_to_table(sgt_dst, req->dst))
+ return false;
+
+ list_req->nbytes += req->nbytes;
+ return true;
+}
+
+/**
+ * ablk_try_merge - try to merge one request into previous one
+ * @queue: the crypto queue list
+ * @req: the request need to be merged
+ *
+ * Note: The merging action should be under the spinlock or mutex protection.
+ *
+ * Return 0 on success and others are failed.
+ */
+int ablk_try_merge(struct crypto_queue *queue,
+ struct ablkcipher_request *req)
+{
+ struct ablkcipher_request *list_req;
+ struct crypto_async_request *async_req;
+
+ list_for_each_entry(async_req, &queue->list, list) {
+ list_req = ablkcipher_request_cast(async_req);
+
+ if (list_req->base.flags != req->base.flags)
+ continue;
+
+ /* Check that the request adds up to an even number of sectors */
+ if (!IS_ALIGNED(list_req->nbytes, (1U << SECTOR_SHIFT)))
+ continue;
+
+ if (list_req->nbytes + req->nbytes > UINT_MAX)
+ continue;
+
+ /*
+ * We first check that the sectors are adjacent so we don't
+ * mistadly coalesce something that is contigous in memory but
+ * not contigous on disk.
+ */
+ if (list_req->sector + list_req->nbytes /
+ (1U << SECTOR_SHIFT) == req->sector) {
+ if (ablk_merge_request(list_req, req))
+ return 0;
+ }
+ }
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(ablk_try_merge);
+
int ablk_set_key(struct crypto_ablkcipher *tfm, const u8 *key,
unsigned int key_len)
{
diff --git a/include/crypto/ablk_helper.h b/include/crypto/ablk_helper.h
index 4f93df5..12ae00d 100644
--- a/include/crypto/ablk_helper.h
+++ b/include/crypto/ablk_helper.h
@@ -8,6 +8,7 @@
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <crypto/cryptd.h>
+#include <crypto/algapi.h>

struct async_helper_ctx {
struct cryptd_ablkcipher *cryptd_tfm;
@@ -28,4 +29,6 @@ extern int ablk_init_common(struct crypto_tfm *tfm, const char *drv_name);

extern int ablk_init(struct crypto_tfm *tfm);

+extern int ablk_try_merge(struct crypto_queue *queue,
+ struct ablkcipher_request *req);
#endif /* _CRYPTO_ABLK_HELPER_H */
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index e71cb70..f878bb1 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -21,6 +21,7 @@
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/bug.h>
+#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/uaccess.h>
@@ -170,6 +171,10 @@ struct ablkcipher_request {
struct scatterlist *src;
struct scatterlist *dst;

+ struct sg_table sgt_src;
+ struct sg_table sgt_dst;
+ sector_t sector;
+
void *__ctx[] CRYPTO_MINALIGN_ATTR;
};

--
1.7.9.5

2016-03-15 07:49:20

[permalink] [raw]

Subject: [PATCH v2 3/4] crypto: Introduce the bulk mode for crypto engine framework

Now some cipher hardware engines prefer to handle bulk block by merging
requests to increase the block size and thus increase the hardware engine
processing speed.

This patch introduces request bulk mode to help the crypto hardware drivers
improve in efficiency, and chooses the suitable mode (SECTOR_MODE) for
initializing omap aes engine.

Signed-off-by: Baolin Wang <[email protected]>
---
crypto/Kconfig | 1 +
crypto/crypto_engine.c | 122 +++++++++++++++++++++++++++++++++++++++++++--
drivers/crypto/omap-aes.c | 2 +-
include/crypto/algapi.h | 23 ++++++++-
4 files changed, 143 insertions(+), 5 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index c844227..6a2f9a6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -229,6 +229,7 @@ config CRYPTO_GLUE_HELPER_X86

config CRYPTO_ENGINE
tristate
+ select CRYPTO_ABLK_HELPER

comment "Authenticated Encryption with Associated Data"

diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index a55c82d..0de5829 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -14,6 +14,7 @@

#include <linux/err.h>
#include <linux/delay.h>
+#include <crypto/ablk_helper.h>
#include "internal.h"

#define CRYPTO_ENGINE_MAX_QLEN 10
@@ -84,6 +85,17 @@ static void crypto_pump_requests(struct crypto_engine *engine,

req = ablkcipher_request_cast(async_req);

+ /*
+ * If the engine supports the bulk mode and the request is allocated the
+ * sg table to expand scatterlists entries, then it need to point the
+ * scatterlists from the sg table.
+ */
+ if (engine->mode == SECTOR_BULK_MODE && req->sgt_src.orig_nents &&
+ req->sgt_dst.orig_nents) {
+ req->src = req->sgt_src.sgl;
+ req->dst = req->sgt_dst.sgl;
+ }
+
engine->cur_req = req;
if (backlog)
backlog->complete(backlog, -EINPROGRESS);
@@ -137,9 +149,46 @@ static void crypto_pump_work(struct kthread_work *work)
}

/**
+ * crypto_merge_request_to_engine - try to merge one request into previous one
+ * @engine: the hardware engine
+ * @req: the request need to be merged
+ *
+ * If the crypto engine supports bulk mode, then try to merge the new request
+ * into the listed one from engine queue to handle them together.
+ *
+ * Return 0 on success and others are failed.
+ */
+static bool crypto_merge_request_to_engine(struct crypto_engine *engine,
+ struct ablkcipher_request *req)
+{
+ /*
+ * The request is allocated from memory pool in dm-crypt, here need to
+ * do initialization for sg table in case some random values.
+ */
+ req->sgt_src.orig_nents = 0;
+ req->sgt_dst.orig_nents = 0;
+
+ /*
+ * If the hardware engine can not support the bulk mode encryption,
+ * just return 1 means merging failed.
+ */
+ if (engine->mode != SECTOR_BULK_MODE)
+ return 1;
+
+ return ablk_try_merge(&engine->queue, req);
+}
+
+/**
* crypto_transfer_request - transfer the new request into the engine queue
* @engine: the hardware engine
* @req: the request need to be listed into the engine queue
+ *
+ * Firstly it will check if the new request can be merged into previous one
+ * if their secotr numbers are continuous, if not should list it into engine
+ * queue.
+ *
+ * If the new request can be merged into the previous request, then just
+ * finalize the new request.
*/
int crypto_transfer_request(struct crypto_engine *engine,
struct ablkcipher_request *req, bool need_pump)
@@ -154,6 +203,26 @@ int crypto_transfer_request(struct crypto_engine *engine,
return -ESHUTDOWN;
}

+ /*
+ * Here need to check if the request can be merged into previous
+ * request. If the hardware engine can support encryption with
+ * bulk block, we can merge the new request into previous request
+ * if their secotr numbers are continuous, which can be handled
+ * together by engine to improve the encryption efficiency.
+ * Return -EINPROGRESS means it has been merged into previous request,
+ * so just end up this request.
+ */
+ ret = crypto_merge_request_to_engine(engine, req);
+ if (!ret) {
+ spin_unlock_irqrestore(&engine->queue_lock, flags);
+ crypto_finalize_request(engine, req, 0);
+ return -EINPROGRESS;
+ }
+
+ /*
+ * If the request can not be merged into previous request, then list it
+ * into the queue of engine, and will be handled by kthread worker.
+ */
ret = ablkcipher_enqueue_request(&engine->queue, req);

if (!engine->busy && need_pump)
@@ -178,7 +247,8 @@ int crypto_transfer_request_to_engine(struct crypto_engine *engine,
EXPORT_SYMBOL_GPL(crypto_transfer_request_to_engine);

/**
- * crypto_finalize_request - finalize one request if the request is done
+ * crypto_finalize_request - finalize one request if the request is done or
+ * merged into previous request
* @engine: the hardware engine
* @req: the request need to be finalized
* @err: error number
@@ -208,9 +278,18 @@ void crypto_finalize_request(struct crypto_engine *engine,
spin_unlock_irqrestore(&engine->queue_lock, flags);
}

+ sg_free_table(&req->sgt_src);
+ sg_free_table(&req->sgt_dst);
req->base.complete(&req->base, err);

- queue_kthread_work(&engine->kworker, &engine->pump_requests);
+ /*
+ * If the request is finalized by merging into the previous request from
+ * the engine queue, then it is no need to queue the kthread work.
+ * Cause now maybe there are other requests need to be merged into the
+ * listed request from one block, just wait for merging action.
+ */
+ if (finalize_cur_req)
+ queue_kthread_work(&engine->kworker, &engine->pump_requests);
}
EXPORT_SYMBOL_GPL(crypto_finalize_request);

@@ -279,15 +358,45 @@ int crypto_engine_stop(struct crypto_engine *engine)
EXPORT_SYMBOL_GPL(crypto_engine_stop);

/**
+ * crypto_engine_change_mode - Change the mode for hardware engine
+ * @engine: the hardware engine
+ * @mode: engine mode to be set
+ *
+ * This function can change the hardware engine mode when the engine is running.
+ * Return 0 on success, else on fail.
+ */
+int crypto_engine_change_mode(struct crypto_engine *engine,
+ enum engine_mode mode)
+{
+ unsigned long flags;
+ int ret;
+
+ ret = crypto_engine_stop(engine);
+ if (ret) {
+ pr_warn("could not change engine mode now\n");
+ return ret;
+ }
+
+ spin_lock_irqsave(&engine->queue_lock, flags);
+ engine->mode = mode;
+ spin_unlock_irqrestore(&engine->queue_lock, flags);
+
+ return crypto_engine_start(engine);
+}
+EXPORT_SYMBOL_GPL(crypto_engine_change_mode);
+
+/**
* crypto_engine_alloc_init - allocate crypto hardware engine structure and
* initialize it.
* @dev: the device attached with one hardware engine
+ * @mode: crypto engine mode
* @rt: whether this queue is set to run as a realtime task
*
* This must be called from context that can sleep.
* Return: the crypto engine structure on success, else NULL.
*/
-struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
+struct crypto_engine *crypto_engine_alloc_init(struct device *dev,
+ enum engine_mode mode, bool rt)
{
struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
struct crypto_engine *engine;
@@ -299,6 +408,13 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
if (!engine)
return NULL;

+ /*
+ * If the hardware engine can handle the IV by itself, that means it
+ * just need one initial IV for multiple requests from one block. So
+ * we can merge requests from one block into one request to handle,
+ * which can improve the hardware engine efficiency.
+ */
+ engine->mode = mode;
engine->rt = rt;
engine->running = false;
engine->busy = false;
diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index d420ec7..946f11f 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -1230,7 +1230,7 @@ static int omap_aes_probe(struct platform_device *pdev)
}

/* Initialize crypto engine */
- dd->engine = crypto_engine_alloc_init(dev, 1);
+ dd->engine = crypto_engine_alloc_init(dev, SECTOR_MODE, 1);
if (!dd->engine)
goto err_algs;

diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index b09d43f..69fb43e 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -130,6 +130,22 @@ struct ablkcipher_walk {
};

#define ENGINE_NAME_LEN 30
+
+/*
+ * enum engine_mode - crypto engine mode
+ * @SECTOR_MODE: should do encryption/decryption one request by one (one
+ * request length is one sector size), and it will not coalesce requests.
+ * @SECTOR_BULK_MODE: it can coalesce the contiguous requests (one request
+ * length is one sector size) together to be one bulk request, which can be
+ * handled by crypto engine at one time.
+ * @MAX_MODE: engine mode numbers
+ */
+enum engine_mode {
+ SECTOR_MODE,
+ SECTOR_BULK_MODE,
+ MAX_MODE,
+};
+
/*
* struct crypto_engine - crypto hardware engine
* @name: the engine name
@@ -140,6 +156,7 @@ struct ablkcipher_walk {
* @list: link with the global crypto engine list
* @queue_lock: spinlock to syncronise access to request queue
* @queue: the crypto queue of the engine
+ * @mode: crypto engine mode
* @rt: whether this queue is set to run as a realtime task
* @prepare_crypt_hardware: a request will soon arrive from the queue
* so the subsystem requests the driver to prepare the hardware
@@ -167,6 +184,7 @@ struct crypto_engine {
spinlock_t queue_lock;
struct crypto_queue queue;

+ enum engine_mode mode;
bool rt;

int (*prepare_crypt_hardware)(struct crypto_engine *engine);
@@ -195,7 +213,10 @@ void crypto_finalize_request(struct crypto_engine *engine,
struct ablkcipher_request *req, int err);
int crypto_engine_start(struct crypto_engine *engine);
int crypto_engine_stop(struct crypto_engine *engine);
-struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt);
+int crypto_engine_change_mode(struct crypto_engine *engine,
+ enum engine_mode mode);
+struct crypto_engine *crypto_engine_alloc_init(struct device *dev,
+ enum engine_mode mode, bool rt);
int crypto_engine_exit(struct crypto_engine *engine);

extern const struct crypto_type crypto_ablkcipher_type;
--
1.7.9.5

2016-03-15 07:49:37

[permalink] [raw]

Subject: [PATCH v2 4/4] md: dm-crypt: Initialize the sector number for one request

If the crypto engine can support the bulk mode, that means the contiguous
requests from one block can be merged into one request to be handled by
crypto engine. If so, the crypto engine need the sector number of one request
to do merging action.

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/md/dm-crypt.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3147c8d..9e2dbfd 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -866,6 +866,7 @@ static int crypt_convert_block(struct crypt_config *cc,
return r;
}

+ req->sector = ctx->cc_sector;
ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
1 << SECTOR_SHIFT, iv);

--
1.7.9.5

2016-04-02 15:01:12

by Robert Jarzmik

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions

Baolin Wang <[email protected]> writes:

> +/**
> + * sg_is_contiguous - Check if the scatterlists are contiguous
> + * @sga: SG entry
> + * @sgb: SG entry
> + *
> + * Description:
> + * If the sga scatterlist is contiguous with the sgb scatterlist,
> + * that means they can be merged together.
> + *
> + **/
> +static inline bool sg_is_contiguous(struct scatterlist *sga,
> + struct scatterlist *sgb)
> +{
> + return *(unsigned long *)sg_virt(sga) + sga->length ==
> + *(unsigned long *)sg_virt(sgb);
As I already said, I don't like casts.

But let's take some height : you're needing this function to decide to merge
scatterlists. That means that you think the probability of having 2 scatterlist
mergeable is not meaningless, ie. 50% or more.

I suppose your scatterlists are allocated through kmalloc(). I'd like to know,
through your testing, what is the success rate of sg_is_contiguous(), ie. I'd
like to know how many times sg_is_contiguous() was called, and amongst these
calls how many times it returned true.

That will tell me how "worth" is this optimization.

> + * sg_add_sg_to_table - Add one scatterlist into sg table
> + * @sgt: The sg table header to use
> + * @src: The sg need to be added into sg table
> + *
> + * Description:
> + * The 'nents' member indicates how many mapped scatterlists has been added
> + * in the dynamic sg table. The 'orig_nents' member indicates the size of the
> + * dynamic sg table.
> + *
> + * Copy one mapped @src@ scatterlist into the dynamic sg table and increase
> + * 'nents' member.
> + *
> + **/
Okay, I still believe this one is wrong, because we don't understand each other.
So let's take an example :
sg_table = {
.sgl = {
{
.page_link = PAGE_48,
.offset = 0,
.length = 2048,
.dma_address = 0x30000,
.dma_length = 4096,
},
{
.page_link = PAGE_48 | 0x02,
.offset = 2048,
.length = 2048,
.dma_address = 0,
.dma_length = 0,
},
},
.nents = 1,
.orig_nents = 2,
};

In this example, by sheer luck the 2 scatterlist entries were physically
contiguous, and the mapping function coallesced the 2 entries into only one
(dma_address, dma_length) entry. That could also happen with an IOMMU by the
way. Therefore, sg_table.nents = 1.

If I understand your function correctly, it will erase sg_table.sgl[1], and that
looks incorrect to me. This is why I can't understand how your code can be
correct, and why I say you add a new "meaning" to sg_table->nents, which is not
consistent with the meaning I understand.

Cheers.

--
Robert

2016-04-05 07:10:26

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions

Hi Robert,

Sorry for the late reply.

On 2 April 2016 at 23:00, Robert Jarzmik <[email protected]> wrote:
> Baolin Wang <[email protected]> writes:
>
>> +/**
>> + * sg_is_contiguous - Check if the scatterlists are contiguous
>> + * @sga: SG entry
>> + * @sgb: SG entry
>> + *
>> + * Description:
>> + * If the sga scatterlist is contiguous with the sgb scatterlist,
>> + * that means they can be merged together.
>> + *
>> + **/
>> +static inline bool sg_is_contiguous(struct scatterlist *sga,
>> + struct scatterlist *sgb)
>> +{
>> + return *(unsigned long *)sg_virt(sga) + sga->length ==
>> + *(unsigned long *)sg_virt(sgb);
> As I already said, I don't like casts.

OK. Could you give me a good example. Thanks.

>
> But let's take some height : you're needing this function to decide to merge
> scatterlists. That means that you think the probability of having 2 scatterlist
> mergeable is not meaningless, ie. 50% or more.
>
> I suppose your scatterlists are allocated through kmalloc(). I'd like to know,
> through your testing, what is the success rate of sg_is_contiguous(), ie. I'd
> like to know how many times sg_is_contiguous() was called, and amongst these
> calls how many times it returned true.
>
> That will tell me how "worth" is this optimization.

I think this is just one useful API for users, If the rate is only 1%,
we also need to check if they are contiguous to decide if they can be
coalesced.

>
>> + * sg_add_sg_to_table - Add one scatterlist into sg table
>> + * @sgt: The sg table header to use
>> + * @src: The sg need to be added into sg table
>> + *
>> + * Description:
>> + * The 'nents' member indicates how many mapped scatterlists has been added
>> + * in the dynamic sg table. The 'orig_nents' member indicates the size of the
>> + * dynamic sg table.
>> + *
>> + * Copy one mapped @src@ scatterlist into the dynamic sg table and increase
>> + * 'nents' member.
>> + *
>> + **/
> Okay, I still believe this one is wrong, because we don't understand each other.
> So let's take an example :
> sg_table = {
> .sgl = {
> {
> .page_link = PAGE_48,
> .offset = 0,
> .length = 2048,
> .dma_address = 0x30000,
> .dma_length = 4096,
> },
> {
> .page_link = PAGE_48 | 0x02,
> .offset = 2048,
> .length = 2048,
> .dma_address = 0,
> .dma_length = 0,
> },
> },
> .nents = 1,
> .orig_nents = 2,
> };
>
> In this example, by sheer luck the 2 scatterlist entries were physically
> contiguous, and the mapping function coallesced the 2 entries into only one
> (dma_address, dma_length) entry. That could also happen with an IOMMU by the
> way. Therefore, sg_table.nents = 1.
>
> If I understand your function correctly, it will erase sg_table.sgl[1], and that
> looks incorrect to me. This is why I can't understand how your code can be
> correct, and why I say you add a new "meaning" to sg_table->nents, which is not
> consistent with the meaning I understand.

I think you misunderstood my point. The 'sg_add_sg_to_table()'
function is used to add one mapped scatterlist into the dynamic sg
table. For example:
1. How to add one mapped sg into dynamic sg table
(1) If we allocate one dynamic sg table with 3 (small size can be
showed easily) empty scatterlists.:
sg_table = {
.sgl = {
{
.page_link = 0,
.offset = 0,
.length = 0,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
{
.page_link = 0 | 0x02,
.offset = 0,
.length = 0,
},
},
.nents = 0,
.orig_nents = 3,
};

(2) If some module (like dm-crypt module) always sends one mapped
scatterlist (size is always 512bytes) to another module (crypto
driver) to handle the mapped scatterlist at one time. But we want to
collect the mapped scatterlist into one dynamic sg table to manage
them together, means we can send multiple mapped scatterlists (from
sg_table->sgl) to the crypto driver to handle them together to improve
its efficiency. So we add one mapped scatterlists into the dynamic sg
table.
mapped sg = {
.page_link = PAGE_20,
.offset = 0,
.length = 512,
},

Add into synamic sg table ------->
sg_table = {
.sgl = {
{
.page_link = PAGE_20 | 0x02,
.offset = 0,
.length = 512,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
},
.nents = 1,
.orig_nents = 3,
};

Another two mapped scatterlists are added into synamic sg table ------->
sg_table = {
.sgl = {
{
.page_link = PAGE_20,
.offset = 0,
.length = 512,
},
{
.page_link = PAGE_25,
.offset = 0,
.length = 512,
},
{
.page_link = PAGE_28 | 0x20,
.offset = 0,
.length = 512,
},
},
.nents = 3,
.orig_nents = 3,
};

Then we can send the dynamic sg table to the crypto engine driver to
handle them together at one time. (If the dynamic sg table size is
512, then the crypto engine driver can handle 512 scatterlists at one
time, which can improve much efficiency). That's the reason why we
want to introduce the dynamic sg table.

2. How to coalesce
(1) If one mapped scatterlsit already has been added into dynamic sg table:
sg_table = {
.sgl = {
{
.page_link = PAGE_20 | 0x02,
.offset = 0,
.length = 512,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
},
.nents = 1,
.orig_nents = 3,
};

(2) Another mapped scatterlist comes.
mapped sg = {
.page_link = PAGE_20,
.offset = 512,
.length = 512,
},

So we check the new mapped scatterlist can be coalesced into previous
one in dynamic sg table like this:
sg_table = {
.sgl = {
{
.page_link = PAGE_20 | 0x02,
.offset = 0,
.length = 1024,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
{
.page_link = 0,
.offset = 0,
.length = 0,
},
},
.nents = 1,
.orig_nents = 3,
};

It's done. We coalesced one scatterlist into antoher one to expand the
scatterlist's length.
Thanks for your comments.

>
> Cheers.
>
> --
> Robert

--
Baolin.wang
Best Regards

2016-04-15 13:49:40

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote:
> Now some cipher hardware engines prefer to handle bulk block by merging requests
> to increase the block size and thus increase the hardware engine processing speed.
>
> This patchset introduces request bulk mode to help the crypto hardware drivers
> improve in efficiency.

Could you please explain why this merging can't be done in dm-crypt
instead?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 05:31:13

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

Hi Herbert,

On 15 April 2016 at 21:48, Herbert Xu <[email protected]> wrote:
> On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote:
>> Now some cipher hardware engines prefer to handle bulk block by merging requests
>> to increase the block size and thus increase the hardware engine processing speed.
>>
>> This patchset introduces request bulk mode to help the crypto hardware drivers
>> improve in efficiency.
>
> Could you please explain why this merging can't be done in dm-crypt
> instead?

We've tried to do this in dm-crypt, but it failed.
The dm-crypt maintainer explained to me that I should optimize the
driver, not add strange hw-dependent crypto modes to dm-crypt, this is
not the first crypto accelerator that is just not suited for this kind
of use.
He thought if it can process batch of chunks of data each with own IV,
then it can work with dm-crypt, but he thought such optimized code
should be inside crypto API, not in dmcrypt.

I think his suggestion is reasonable, so we introduce the crypto
engine framework to factor out the common patterns for driving the
queue of operations. Then it will be more reasonable to do the bulk
mode optimization in crypto engine framework. Thanks.

>
> Thanks,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 05:45:58

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 01:31:09PM +0800, Baolin Wang wrote:
>
> We've tried to do this in dm-crypt, but it failed.
> The dm-crypt maintainer explained to me that I should optimize the
> driver, not add strange hw-dependent crypto modes to dm-crypt, this is
> not the first crypto accelerator that is just not suited for this kind
> of use.
> He thought if it can process batch of chunks of data each with own IV,
> then it can work with dm-crypt, but he thought such optimized code
> should be inside crypto API, not in dmcrypt.

That's a completely bogus argument. The user always has more
information available than the underlying API. So it is totally
stupid to have the API try to extract information that the user
could have provided in the first place.

I'm not taking this patch-set.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 06:02:55

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 13:45, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 01:31:09PM +0800, Baolin Wang wrote:
>>
>> We've tried to do this in dm-crypt, but it failed.
>> The dm-crypt maintainer explained to me that I should optimize the
>> driver, not add strange hw-dependent crypto modes to dm-crypt, this is
>> not the first crypto accelerator that is just not suited for this kind
>> of use.
>> He thought if it can process batch of chunks of data each with own IV,
>> then it can work with dm-crypt, but he thought such optimized code
>> should be inside crypto API, not in dmcrypt.
>
> That's a completely bogus argument. The user always has more
> information available than the underlying API. So it is totally
> stupid to have the API try to extract information that the user
> could have provided in the first place.

If the crypto hardware engine can support bulk data
encryption/decryption, so the engine driver can select bulk mode to
handle the requests. I think it is a totally driver things, not in
dmcrypt. The dmcrypt can not get the hardware engine's attributes.

>
> I'm not taking this patch-set.
>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 07:04:54

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 02:02:51PM +0800, Baolin Wang wrote:
>
> If the crypto hardware engine can support bulk data
> encryption/decryption, so the engine driver can select bulk mode to
> handle the requests. I think it is a totally driver things, not in
> dmcrypt. The dmcrypt can not get the hardware engine's attributes.

It has nothing to do with the hardware attributes. dm-crypt should
be sending maximal requests in the first place.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 07:21:19

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 15:04, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 02:02:51PM +0800, Baolin Wang wrote:
>>
>> If the crypto hardware engine can support bulk data
>> encryption/decryption, so the engine driver can select bulk mode to
>> handle the requests. I think it is a totally driver things, not in
>> dmcrypt. The dmcrypt can not get the hardware engine's attributes.
>
> It has nothing to do with the hardware attributes. dm-crypt should
> be sending maximal requests in the first place.

I don't think so, the dm-crypt can not send maximal requests at some
situations. For example, the 'cbc(aes)' cipher, it must be handled
sector by sector (IV is dependency for each sector), so the dm-crypt
can not send maximal requests, but must sector by sector in this
situation.

>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 07:25:15

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 03:21:16PM +0800, Baolin Wang wrote:
>
> I don't think so, the dm-crypt can not send maximal requests at some
> situations. For example, the 'cbc(aes)' cipher, it must be handled
> sector by sector (IV is dependency for each sector), so the dm-crypt
> can not send maximal requests, but must sector by sector in this
> situation.

If you can't merge them as requests, then how is the hardware going
to benefit from batching them? Or are you talking about parallel
processing similar to sha-mb?

Even with batching we should be involving the user because only the
user knows (if anyone does) whether more data will be forthcoming.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 07:59:04

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 15:24, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 03:21:16PM +0800, Baolin Wang wrote:
>>
>> I don't think so, the dm-crypt can not send maximal requests at some
>> situations. For example, the 'cbc(aes)' cipher, it must be handled
>> sector by sector (IV is dependency for each sector), so the dm-crypt
>> can not send maximal requests, but must sector by sector in this
>> situation.
>
> If you can't merge them as requests, then how is the hardware going
> to benefit from batching them? Or are you talking about parallel

That depends on the hardware engine. Some cipher hardware engines
(like xts(aes) engine) can handle the intermediate values (IV) by
themselves in one bulk block, which means we can increase the size of
the request by merging request rather than always 512 bytes and thus
increase the hardware engine processing speed. But for some other
hardware engines (like cbc(aes) engine), they can not support bulk
block, must sector by sector. So the engine drivers can select the
suitable mode to do encryption/decryption.

> processing similar to sha-mb?
>
> Even with batching we should be involving the user because only the
> user knows (if anyone does) whether more data will be forthcoming.

If this cipher engine can support bulk block encryption, the crypto
engine framework can merge requests if they are eligible
automatically. Don't need to worry about how many data will be
forthcoming.

>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 08:05:41

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote:
>
> That depends on the hardware engine. Some cipher hardware engines
> (like xts(aes) engine) can handle the intermediate values (IV) by
> themselves in one bulk block, which means we can increase the size of
> the request by merging request rather than always 512 bytes and thus
> increase the hardware engine processing speed. But for some other
> hardware engines (like cbc(aes) engine), they can not support bulk
> block, must sector by sector. So the engine drivers can select the
> suitable mode to do encryption/decryption.

So what is this supposed to handle, xts or cbc?

> > Even with batching we should be involving the user because only the
> > user knows (if anyone does) whether more data will be forthcoming.
>
> If this cipher engine can support bulk block encryption, the crypto
> engine framework can merge requests if they are eligible
> automatically. Don't need to worry about how many data will be
> forthcoming.

Merging is simply wrong when the data is coming in as one piece
and you've just artifically broken it up, only to merge it later.

If the data can be merged then it should have stayed as one piece
rather than being fragmented.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 08:14:52

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 16:04, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote:
>>
>> That depends on the hardware engine. Some cipher hardware engines
>> (like xts(aes) engine) can handle the intermediate values (IV) by
>> themselves in one bulk block, which means we can increase the size of
>> the request by merging request rather than always 512 bytes and thus
>> increase the hardware engine processing speed. But for some other
>> hardware engines (like cbc(aes) engine), they can not support bulk
>> block, must sector by sector. So the engine drivers can select the
>> suitable mode to do encryption/decryption.
>
> So what is this supposed to handle, xts or cbc?

As I know, now cbc engine also need to handle requests sector by
sector, but for xts/ecb engine can support bulk block, which means can
merge requests.

>
>> > Even with batching we should be involving the user because only the
>> > user knows (if anyone does) whether more data will be forthcoming.
>>
>> If this cipher engine can support bulk block encryption, the crypto
>> engine framework can merge requests if they are eligible
>> automatically. Don't need to worry about how many data will be
>> forthcoming.
>
> Merging is simply wrong when the data is coming in as one piece
> and you've just artifically broken it up, only to merge it later.

It will not broke it up, and it will check if the requests coming
from dm-crypt can be merged together.

>
> If the data can be merged then it should have stayed as one piece
> rather than being fragmented.

Yes, usually one whole block can be merged into one request as the latency.

>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 08:20:16

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 04:14:48PM +0800, Baolin Wang wrote:
> On 18 April 2016 at 16:04, Herbert Xu <[email protected]> wrote:
> > On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote:
> >>
> >> That depends on the hardware engine. Some cipher hardware engines
> >> (like xts(aes) engine) can handle the intermediate values (IV) by
> >> themselves in one bulk block, which means we can increase the size of
> >> the request by merging request rather than always 512 bytes and thus
> >> increase the hardware engine processing speed. But for some other
> >> hardware engines (like cbc(aes) engine), they can not support bulk
> >> block, must sector by sector. So the engine drivers can select the
> >> suitable mode to do encryption/decryption.
> >
> > So what is this supposed to handle, xts or cbc?
>
> As I know, now cbc engine also need to handle requests sector by
> sector, but for xts/ecb engine can support bulk block, which means can
> merge requests.

If it's just xts then why can't dm-crypt merge it and send a single
request?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 08:28:50

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 16:17, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 04:14:48PM +0800, Baolin Wang wrote:
>> On 18 April 2016 at 16:04, Herbert Xu <[email protected]> wrote:
>> > On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote:
>> >>
>> >> That depends on the hardware engine. Some cipher hardware engines
>> >> (like xts(aes) engine) can handle the intermediate values (IV) by
>> >> themselves in one bulk block, which means we can increase the size of
>> >> the request by merging request rather than always 512 bytes and thus
>> >> increase the hardware engine processing speed. But for some other
>> >> hardware engines (like cbc(aes) engine), they can not support bulk
>> >> block, must sector by sector. So the engine drivers can select the
>> >> suitable mode to do encryption/decryption.
>> >
>> > So what is this supposed to handle, xts or cbc?
>>
>> As I know, now cbc engine also need to handle requests sector by
>> sector, but for xts/ecb engine can support bulk block, which means can
>> merge requests.
>
> If it's just xts then why can't dm-crypt merge it and send a single
> request?

What I meaning is if the xts engine can support bulk block, then the
engine driver can select bulk mode to do encryption, but if their xts
engine can not support bulk mode, which depends on hardware design,
the engine driver can not select bulk mode. So the dm-crypt can not
know what will be selected by the engine driver, it can not send one
bulk block each time.

>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 08:31:57

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 04:28:46PM +0800, Baolin Wang wrote:
>
> What I meaning is if the xts engine can support bulk block, then the
> engine driver can select bulk mode to do encryption, but if their xts
> engine can not support bulk mode, which depends on hardware design,
> the engine driver can not select bulk mode. So the dm-crypt can not
> know what will be selected by the engine driver, it can not send one
> bulk block each time.

Why can't the xts code just break it up if it can't handle it?

You want to postpone splitting as much as possible. Even if the
underlying xts code couldn't handle it, it would still make sense
for the crypto API to see the request in one piece.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 08:40:40

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 16:31, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 04:28:46PM +0800, Baolin Wang wrote:
>>
>> What I meaning is if the xts engine can support bulk block, then the
>> engine driver can select bulk mode to do encryption, but if their xts
>> engine can not support bulk mode, which depends on hardware design,
>> the engine driver can not select bulk mode. So the dm-crypt can not
>> know what will be selected by the engine driver, it can not send one
>> bulk block each time.
>
> Why can't the xts code just break it up if it can't handle it?

Simply to say, now there are many different hardware engines for
different vendors, some engines can support bulk block but some can
not (or no cipher hardware engine), then the dm-crypt can not know
your hardware engine features. If the dm-crypt send one bulk block to
low level, but the engine driver can not support bulk block, then it
will crash. So we did the merging action in driver level not dm-crypt
level.

>
> You want to postpone splitting as much as possible. Even if the
> underlying xts code couldn't handle it, it would still make sense
> for the crypto API to see the request in one piece.
>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 08:42:44

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18, 2016 at 04:40:36PM +0800, Baolin Wang wrote:
>
> Simply to say, now there are many different hardware engines for
> different vendors, some engines can support bulk block but some can
> not (or no cipher hardware engine), then the dm-crypt can not know
> your hardware engine features. If the dm-crypt send one bulk block to
> low level, but the engine driver can not support bulk block, then it
> will crash. So we did the merging action in driver level not dm-crypt
> level.

Surely we can handle it in the crypto API layer, just as we do GSO
in the network stack for drivers that can't handle TSO?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-04-18 08:51:14

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 18 April 2016 at 16:41, Herbert Xu <[email protected]> wrote:
> On Mon, Apr 18, 2016 at 04:40:36PM +0800, Baolin Wang wrote:
>>
>> Simply to say, now there are many different hardware engines for
>> different vendors, some engines can support bulk block but some can
>> not (or no cipher hardware engine), then the dm-crypt can not know
>> your hardware engine features. If the dm-crypt send one bulk block to
>> low level, but the engine driver can not support bulk block, then it
>> will crash. So we did the merging action in driver level not dm-crypt
>> level.
>
> Surely we can handle it in the crypto API layer, just as we do GSO
> in the network stack for drivers that can't handle TSO?

Yes, it is similar. Thanks.

>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Baolin.wang
Best Regards

2016-04-18 13:36:19

by Mike Snitzer

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On Mon, Apr 18 2016 at 1:31am -0400,
Baolin Wang <[email protected]> wrote:

> Hi Herbert,
>
> On 15 April 2016 at 21:48, Herbert Xu <[email protected]> wrote:
> > On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote:
> >> Now some cipher hardware engines prefer to handle bulk block by merging requests
> >> to increase the block size and thus increase the hardware engine processing speed.
> >>
> >> This patchset introduces request bulk mode to help the crypto hardware drivers
> >> improve in efficiency.
> >
> > Could you please explain why this merging can't be done in dm-crypt
> > instead?
>
> We've tried to do this in dm-crypt, but it failed.
> The dm-crypt maintainer explained to me that I should optimize the
> driver, not add strange hw-dependent crypto modes to dm-crypt, this is
> not the first crypto accelerator that is just not suited for this kind
> of use.

As a DM mainatiner my only contribution to this line of discussion was
relative to your proposal to train the dm-crypt target (which is
bio-based) to also provide request-based support, see:
https://www.redhat.com/archives/dm-devel/2015-November/msg00112.html

But follow-up discussion occured, primarily with Milan Broz, which led
to this bulk mode support in the crypto layer. Pretty strange Milan
wasn't cc'd on your patchset posting (I've now cc'd him).

Mike

2016-04-18 21:22:44

[permalink] [raw]

Subject: Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

On 04/18/2016 03:36 PM, Mike Snitzer wrote:
> On Mon, Apr 18 2016 at 1:31am -0400,
> Baolin Wang <[email protected]> wrote:
>
>> Hi Herbert,
>>
>> On 15 April 2016 at 21:48, Herbert Xu <[email protected]> wrote:
>>> On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote:
>>>> Now some cipher hardware engines prefer to handle bulk block by merging requests
>>>> to increase the block size and thus increase the hardware engine processing speed.
>>>>
>>>> This patchset introduces request bulk mode to help the crypto hardware drivers
>>>> improve in efficiency.
>>>
>>> Could you please explain why this merging can't be done in dm-crypt
>>> instead?
>>
>> We've tried to do this in dm-crypt, but it failed.
>> The dm-crypt maintainer explained to me that I should optimize the
>> driver, not add strange hw-dependent crypto modes to dm-crypt, this is
>> not the first crypto accelerator that is just not suited for this kind
>> of use.
>
> As a DM mainatiner my only contribution to this line of discussion was
> relative to your proposal to train the dm-crypt target (which is
> bio-based) to also provide request-based support, see:
> https://www.redhat.com/archives/dm-devel/2015-November/msg00112.html
>
> But follow-up discussion occured, primarily with Milan Broz, which led
> to this bulk mode support in the crypto layer. Pretty strange Milan
> wasn't cc'd on your patchset posting (I've now cc'd him).

My complaint was mainly that the proposed dmcrypt based version just did
not work properly.
https://lkml.org/lkml/2016/1/2/109

(I did not test the new version we are replying here. I wonder how the problem
I mentioned is fixed though.
Also see Mikulas' comments https://www.redhat.com/archives/dm-devel/2016-January/msg00145.html)

Anyway, all this seems to optimize case for specific crypto hw, that
is not able to perform optimally with pattern dmcrypt produces.

I do not think dmcrypt should do optimizations for specific hw.

But if we decide that it is needed there, it should not cause any performance
or compatibility problems elsewhere...

Milan

2016-04-20 07:34:53

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions

Hi Robert,

On 5 April 2016 at 15:10, Baolin Wang <[email protected]> wrote:
> Hi Robert,
>
> Sorry for the late reply.
>
> On 2 April 2016 at 23:00, Robert Jarzmik <[email protected]> wrote:
>> Baolin Wang <[email protected]> writes:
>>
>>> +/**
>>> + * sg_is_contiguous - Check if the scatterlists are contiguous
>>> + * @sga: SG entry
>>> + * @sgb: SG entry
>>> + *
>>> + * Description:
>>> + * If the sga scatterlist is contiguous with the sgb scatterlist,
>>> + * that means they can be merged together.
>>> + *
>>> + **/
>>> +static inline bool sg_is_contiguous(struct scatterlist *sga,
>>> + struct scatterlist *sgb)
>>> +{
>>> + return *(unsigned long *)sg_virt(sga) + sga->length ==
>>> + *(unsigned long *)sg_virt(sgb);
>> As I already said, I don't like casts.
>
> OK. Could you give me a good example. Thanks.
>
>>
>> But let's take some height : you're needing this function to decide to merge
>> scatterlists. That means that you think the probability of having 2 scatterlist
>> mergeable is not meaningless, ie. 50% or more.
>>
>> I suppose your scatterlists are allocated through kmalloc(). I'd like to know,
>> through your testing, what is the success rate of sg_is_contiguous(), ie. I'd
>> like to know how many times sg_is_contiguous() was called, and amongst these
>> calls how many times it returned true.
>>
>> That will tell me how "worth" is this optimization.
>
> I think this is just one useful API for users, If the rate is only 1%,
> we also need to check if they are contiguous to decide if they can be
> coalesced.
>
>>
>>> + * sg_add_sg_to_table - Add one scatterlist into sg table
>>> + * @sgt: The sg table header to use
>>> + * @src: The sg need to be added into sg table
>>> + *
>>> + * Description:
>>> + * The 'nents' member indicates how many mapped scatterlists has been added
>>> + * in the dynamic sg table. The 'orig_nents' member indicates the size of the
>>> + * dynamic sg table.
>>> + *
>>> + * Copy one mapped @src@ scatterlist into the dynamic sg table and increase
>>> + * 'nents' member.
>>> + *
>>> + **/
>> Okay, I still believe this one is wrong, because we don't understand each other.
>> So let's take an example :
>> sg_table = {
>> .sgl = {
>> {
>> .page_link = PAGE_48,
>> .offset = 0,
>> .length = 2048,
>> .dma_address = 0x30000,
>> .dma_length = 4096,
>> },
>> {
>> .page_link = PAGE_48 | 0x02,
>> .offset = 2048,
>> .length = 2048,
>> .dma_address = 0,
>> .dma_length = 0,
>> },
>> },
>> .nents = 1,
>> .orig_nents = 2,
>> };
>>
>> In this example, by sheer luck the 2 scatterlist entries were physically
>> contiguous, and the mapping function coallesced the 2 entries into only one
>> (dma_address, dma_length) entry. That could also happen with an IOMMU by the
>> way. Therefore, sg_table.nents = 1.
>>
>> If I understand your function correctly, it will erase sg_table.sgl[1], and that
>> looks incorrect to me. This is why I can't understand how your code can be
>> correct, and why I say you add a new "meaning" to sg_table->nents, which is not
>> consistent with the meaning I understand.
>
> I think you misunderstood my point. The 'sg_add_sg_to_table()'
> function is used to add one mapped scatterlist into the dynamic sg
> table. For example:
> 1. How to add one mapped sg into dynamic sg table
> (1) If we allocate one dynamic sg table with 3 (small size can be
> showed easily) empty scatterlists.:
> sg_table = {
> .sgl = {
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> {
> .page_link = 0 | 0x02,
> .offset = 0,
> .length = 0,
> },
> },
> .nents = 0,
> .orig_nents = 3,
> };
>
> (2) If some module (like dm-crypt module) always sends one mapped
> scatterlist (size is always 512bytes) to another module (crypto
> driver) to handle the mapped scatterlist at one time. But we want to
> collect the mapped scatterlist into one dynamic sg table to manage
> them together, means we can send multiple mapped scatterlists (from
> sg_table->sgl) to the crypto driver to handle them together to improve
> its efficiency. So we add one mapped scatterlists into the dynamic sg
> table.
> mapped sg = {
> .page_link = PAGE_20,
> .offset = 0,
> .length = 512,
> },
>
> Add into synamic sg table ------->
> sg_table = {
> .sgl = {
> {
> .page_link = PAGE_20 | 0x02,
> .offset = 0,
> .length = 512,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> },
> .nents = 1,
> .orig_nents = 3,
> };
>
> Another two mapped scatterlists are added into synamic sg table ------->
> sg_table = {
> .sgl = {
> {
> .page_link = PAGE_20,
> .offset = 0,
> .length = 512,
> },
> {
> .page_link = PAGE_25,
> .offset = 0,
> .length = 512,
> },
> {
> .page_link = PAGE_28 | 0x20,
> .offset = 0,
> .length = 512,
> },
> },
> .nents = 3,
> .orig_nents = 3,
> };
>
> Then we can send the dynamic sg table to the crypto engine driver to
> handle them together at one time. (If the dynamic sg table size is
> 512, then the crypto engine driver can handle 512 scatterlists at one
> time, which can improve much efficiency). That's the reason why we
> want to introduce the dynamic sg table.
>
> 2. How to coalesce
> (1) If one mapped scatterlsit already has been added into dynamic sg table:
> sg_table = {
> .sgl = {
> {
> .page_link = PAGE_20 | 0x02,
> .offset = 0,
> .length = 512,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> },
> .nents = 1,
> .orig_nents = 3,
> };
>
> (2) Another mapped scatterlist comes.
> mapped sg = {
> .page_link = PAGE_20,
> .offset = 512,
> .length = 512,
> },
>
> So we check the new mapped scatterlist can be coalesced into previous
> one in dynamic sg table like this:
> sg_table = {
> .sgl = {
> {
> .page_link = PAGE_20 | 0x02,
> .offset = 0,
> .length = 1024,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> {
> .page_link = 0,
> .offset = 0,
> .length = 0,
> },
> },
> .nents = 1,
> .orig_nents = 3,
> };
>
> It's done. We coalesced one scatterlist into antoher one to expand the
> scatterlist's length.
> Thanks for your comments.
>

It seems there are no more comments about this patchset? We really
want to these APIs to coalesce scatterlists to improve the crypto
engine efficiency. Thanks.

--
Baolin.wang
Best Regards

2016-04-20 08:42:36

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions

On Wed, Apr 20, 2016 at 03:34:47PM +0800, Baolin Wang wrote:
> It seems there are no more comments about this patchset? We really
> want to these APIs to coalesce scatterlists to improve the crypto
> engine efficiency. Thanks.

Your patch-set makes no sense. If the data can be coalesced then
it shouldn't have been split up in the first place. I'm nacking
the whole series.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt