From: Iuliana Prodan <[email protected]>
Add the option to allocate the crypto request object plus any extra
space needed by the driver in DMA-able memory.

Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
to indicate to the crypto API the need to allocate GFP_DMA memory
for the private contexts of crypto requests.
For IPsec use cases, the CRYPTO_TFM_REQ_DMA flag is also checked in
the esp_alloc_tmp() function for IPv4 and IPv6.

This series includes an example of how a driver can use the
CRYPTO_TFM_REQ_DMA flag while setting reqsize to a larger value,
to avoid allocating memory at crypto request runtime.
The extra size needed by the driver is added to the reqsize field,
which indicates how much memory may be needed per request.
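For illustration, the driver-side usage boils down to something like
the sketch below, done once at tfm init time (the example_* names and
sizes are placeholders, not part of this series; patches 3-4 contain
the actual CAAM code):

static int example_aead_init(struct crypto_aead *tfm)
{
	/* Worst-case extra space wanted per request; placeholder
	 * sizes, computed by the real driver from its descriptor
	 * and link table limits. */
	int extra_reqsize = sizeof(struct example_edesc) +
			    EXAMPLE_MAX_DESC_BYTES +
			    EXAMPLE_MAX_SG_BYTES;

	/* Ask the crypto API to allocate the request object (and
	 * the extra space) from GFP_DMA memory. */
	crypto_aead_set_flags(tfm, CRYPTO_TFM_REQ_DMA);

	crypto_aead_set_reqsize(tfm, sizeof(struct example_req_ctx) +
				     extra_reqsize);

	return 0;
}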
Iuliana Prodan (4):
crypto: add CRYPTO_TFM_REQ_DMA flag
net: esp: check CRYPTO_TFM_REQ_DMA flag when allocating crypto request
crypto: caam - avoid allocating memory at crypto request runtime for
skcipher
crypto: caam - avoid allocating memory at crypto request runtime for
aead
drivers/crypto/caam/caamalg.c | 130 +++++++++++++++++++++++++---------
include/crypto/aead.h | 4 ++
include/crypto/akcipher.h | 21 ++++++
include/crypto/hash.h | 4 ++
include/crypto/skcipher.h | 4 ++
include/linux/crypto.h | 1 +
net/ipv4/esp4.c | 7 +-
net/ipv6/esp6.c | 7 +-
8 files changed, 144 insertions(+), 34 deletions(-)
--
2.17.1
From: Iuliana Prodan <[email protected]>
Some crypto backends might require the requests' private contexts
to be allocated in DMA-able memory.
Signed-off-by: Horia Geanta <[email protected]>
---
net/ipv4/esp4.c | 7 ++++++-
net/ipv6/esp6.c | 7 ++++++-
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 8b07f3a4f2db..9edfb1012c3d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -46,6 +46,7 @@ struct esp_output_extra {
static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int extralen)
{
unsigned int len;
+ gfp_t gfp = GFP_ATOMIC;
len = extralen;
@@ -62,7 +63,11 @@ static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int extralen)
len += sizeof(struct scatterlist) * nfrags;
- return kmalloc(len, GFP_ATOMIC);
+ if (crypto_aead_reqsize(aead) &&
+ (crypto_aead_get_flags(aead) & CRYPTO_TFM_REQ_DMA))
+ gfp |= GFP_DMA;
+
+ return kmalloc(len, gfp);
}
static inline void *esp_tmp_extra(void *tmp)
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 52c2f063529f..e9125e1234b5 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -63,6 +63,7 @@ struct esp_output_extra {
static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int seqihlen)
{
unsigned int len;
+ gfp_t gfp = GFP_ATOMIC;
len = seqihlen;
@@ -79,7 +80,11 @@ static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int seqihlen)
len += sizeof(struct scatterlist) * nfrags;
- return kmalloc(len, GFP_ATOMIC);
+ if (crypto_aead_reqsize(aead) &&
+ (crypto_aead_get_flags(aead) & CRYPTO_TFM_REQ_DMA))
+ gfp |= GFP_DMA;
+
+ return kmalloc(len, gfp);
}
static inline void *esp_tmp_extra(void *tmp)
--
2.17.1
On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
<[email protected]> wrote:
>
> From: Iuliana Prodan <[email protected]>
>
> Add the option to allocate the crypto request object plus any extra
> space needed by the driver in DMA-able memory.
>
> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
> to indicate to the crypto API the need to allocate GFP_DMA memory
> for the private contexts of crypto requests.
>
These are always directional DMA mappings, right? So why can't we use
bounce buffering here?
From: Iuliana Prodan <[email protected]>
Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the memory
needed by the driver to fulfil a request within the crypto
request object.

The extra size needed for the base extended descriptor, hw
descriptor commands and link tables is computed at frontend
driver (caamalg) initialization time and saved in the reqsize field,
which indicates how much memory may be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is relied upon only by
dm-crypt use cases, which appear to use at most 4 scatterlist
entries. Therefore, reqsize reserves memory for a maximum of
4 entries for src and 4 for dst, aligned.
If the driver needs more than this maximum, the memory is
allocated dynamically at runtime.
Signed-off-by: Iuliana Prodan <[email protected]>
---
drivers/crypto/caam/caamalg.c | 59 +++++++++++++++++++++++++++--------
1 file changed, 46 insertions(+), 13 deletions(-)
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 6ace8545faec..7038394c41c0 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -880,6 +880,7 @@ static int xts_skcipher_setkey(struct crypto_skcipher *skcipher, const u8 *key,
* @mapped_dst_nents: number of segments in output h/w link table
* @sec4_sg_bytes: length of dma mapped sec4_sg space
* @bklog: stored to determine if the request needs backlog
+ * @free: stored to determine if aead_edesc needs to be freed
* @sec4_sg_dma: bus physical mapped address of h/w link table
* @sec4_sg: pointer to h/w link table
* @hw_desc: the h/w job descriptor followed by any referenced link tables
@@ -891,6 +892,7 @@ struct aead_edesc {
int mapped_dst_nents;
int sec4_sg_bytes;
bool bklog;
+ bool free;
dma_addr_t sec4_sg_dma;
struct sec4_sg_entry *sec4_sg;
u32 hw_desc[];
@@ -987,8 +989,8 @@ static void aead_crypt_done(struct device *jrdev, u32 *desc, u32 err,
ecode = caam_jr_strstatus(jrdev, err);
aead_unmap(jrdev, edesc, req);
-
- kfree(edesc);
+ if (edesc->free)
+ kfree(edesc);
/*
* If no backlog flag, the completion of the request is done
@@ -1301,7 +1303,7 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
int src_len, dst_len = 0;
struct aead_edesc *edesc;
- int sec4_sg_index, sec4_sg_len, sec4_sg_bytes;
+ int sec4_sg_index, sec4_sg_len, sec4_sg_bytes, edesc_size = 0;
unsigned int authsize = ctx->authsize;
if (unlikely(req->dst != req->src)) {
@@ -1381,13 +1383,30 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
sec4_sg_bytes = sec4_sg_len * sizeof(struct sec4_sg_entry);
- /* allocate space for base edesc and hw desc commands, link tables */
- edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
- GFP_DMA | flags);
- if (!edesc) {
- caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents, 0,
- 0, 0, 0);
- return ERR_PTR(-ENOMEM);
+ /* Check if there's enough space for edesc saved in req */
+ edesc_size = sizeof(*edesc) + desc_bytes + sec4_sg_bytes;
+ if (edesc_size > (crypto_aead_reqsize(aead) -
+ sizeof(struct caam_aead_req_ctx))) {
+ /*
+ * allocate space for base edesc and
+ * hw desc commands, link tables
+ */
+ edesc = kzalloc(edesc_size, GFP_DMA | flags);
+ if (!edesc) {
+ caam_unmap(jrdev, req->src, req->dst, src_nents,
+ dst_nents, 0, 0, 0, 0);
+ return ERR_PTR(-ENOMEM);
+ }
+ edesc->free = true;
+ } else {
+ /*
+ * get address for base edesc and
+ * hw desc commands, link tables
+ */
+ edesc = (struct aead_edesc *)((u8 *)rctx +
+ sizeof(struct caam_aead_req_ctx));
+ /* clear memory */
+ memset(edesc, 0, sizeof(*edesc));
}
edesc->src_nents = src_nents;
@@ -1538,7 +1557,8 @@ static int aead_do_one_req(struct crypto_engine *engine, void *areq)
if (ret != -EINPROGRESS) {
aead_unmap(ctx->jrdev, rctx->edesc, req);
- kfree(rctx->edesc);
+ if (rctx->edesc->free)
+ kfree(rctx->edesc);
} else {
ret = 0;
}
@@ -3463,6 +3483,20 @@ static int caam_aead_init(struct crypto_aead *tfm)
struct caam_aead_alg *caam_alg =
container_of(alg, struct caam_aead_alg, aead);
struct caam_ctx *ctx = crypto_aead_ctx(tfm);
+ int extra_reqsize = 0;
+
+ /*
+ * Compute extra space needed for base edesc and
+ * hw desc commands, link tables, IV
+ */
+ extra_reqsize = sizeof(struct aead_edesc) +
+ /* max size for hw desc commands */
+ (AEAD_DESC_JOB_IO_LEN + CAAM_CMD_SZ * 6) +
+ /* link tables for src and dst, 4 entries max, aligned */
+ (8 * sizeof(struct sec4_sg_entry));
+
+ /* Need GFP_DMA for extra request size */
+ crypto_aead_set_flags(tfm, CRYPTO_TFM_REQ_DMA);
crypto_aead_set_reqsize(tfm, sizeof(struct caam_aead_req_ctx));
@@ -3533,8 +3567,7 @@ static void caam_aead_alg_init(struct caam_aead_alg *t_alg)
alg->base.cra_module = THIS_MODULE;
alg->base.cra_priority = CAAM_CRA_PRIORITY;
alg->base.cra_ctxsize = sizeof(struct caam_ctx);
- alg->base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY |
- CRYPTO_ALG_KERN_DRIVER_ONLY;
+ alg->base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_KERN_DRIVER_ONLY;
alg->init = caam_aead_init;
alg->exit = caam_aead_exit;
--
2.17.1
On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
> <[email protected]> wrote:
>>
>> From: Iuliana Prodan <[email protected]>
>>
>> Add the option to allocate the crypto request object plus any extra
>> space needed by the driver in DMA-able memory.
>>
>> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
>> to indicate to the crypto API the need to allocate GFP_DMA memory
>> for the private contexts of crypto requests.
>>
>
> These are always directional DMA mappings, right? So why can't we use
> bounce buffering here?
>
The idea was to avoid allocating any memory in crypto drivers.
We want to be able to use dm-crypt with CAAM, which needs DMA-able
memory and increasing reqsize is not enough.
It started from here
https://lore.kernel.org/linux-crypto/[email protected]/T/#m39684173a2f0f4b83d8bcbec223e98169273d1e4
On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <[email protected]> wrote:
>
> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
> > On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
> > <[email protected]> wrote:
> >>
> >> From: Iuliana Prodan <[email protected]>
> >>
> >> Add the option to allocate the crypto request object plus any extra
> >> space needed by the driver in DMA-able memory.
> >>
> >> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
> >> to indicate to the crypto API the need to allocate GFP_DMA memory
> >> for the private contexts of crypto requests.
> >>
> >
> > These are always directional DMA mappings, right? So why can't we use
> > bounce buffering here?
> >
> The idea was to avoid allocating any memory in crypto drivers.
> We want to be able to use dm-crypt with CAAM, which needs DMA-able
> memory and increasing reqsize is not enough.
But what does 'needs DMA-able memory' mean? DMA operations are
asynchronous by definition, and so the DMA layer should be able to
allocate bounce buffers when needed. This will cost some performance
in cases where the hardware cannot address all of memory directly, but
this is a consequence of the design, and I don't think we should
burden the generic API with this.
On 11/26/2020 9:09 AM, Ard Biesheuvel wrote:
> On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <[email protected]> wrote:
>>
>> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
>>> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
>>> <[email protected]> wrote:
>>>>
>>>> From: Iuliana Prodan <[email protected]>
>>>>
>>>> Add the option to allocate the crypto request object plus any extra
>>>> space needed by the driver in DMA-able memory.
>>>>
>>>> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
>>>> to indicate to the crypto API the need to allocate GFP_DMA memory
>>>> for the private contexts of crypto requests.
>>>>
>>>
>>> These are always directional DMA mappings, right? So why can't we use
>>> bounce buffering here?
>>>
>> The idea was to avoid allocating any memory in crypto drivers.
>> We want to be able to use dm-crypt with CAAM, which needs DMA-able
>> memory and increasing reqsize is not enough.
>
> But what does 'needs DMA-able memory' mean? DMA operations are
> asynchronous by definition, and so the DMA layer should be able to
> allocate bounce buffers when needed. This will cost some performance
> in cases where the hardware cannot address all of memory directly, but
> this is a consequence of the design, and I don't think we should
> burden the generic API with this.
>
Ard, I believe you're right.
In CAAM, for req->src and req->dst, which come from the crypto request,
we use DMA mappings without knowing whether the memory is DMA-able or
not.

We should do the same for CAAM's hw descriptor commands and link
tables. That's the extra memory allocated by increasing reqsize.
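I.e., something along these lines, just a sketch with illustrative
variable names:

	dma_addr_t desc_dma;

	/*
	 * Build the descriptor in ordinary kernel memory, then hand
	 * it to CAAM with a directional streaming mapping; the DMA
	 * layer takes care of bounce buffering if the device cannot
	 * address the buffer directly.
	 */
	desc_dma = dma_map_single(jrdev, edesc->hw_desc, desc_bytes,
				  DMA_TO_DEVICE);
	if (dma_mapping_error(jrdev, desc_dma))
		return -ENOMEM;

	/* ... enqueue the job; then, on completion ... */
	dma_unmap_single(jrdev, desc_dma, desc_bytes, DMA_TO_DEVICE);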
Horia, do you see any limitations in CAAM that would prevent using the
above approach?
On Thu, 26 Nov 2020 at 17:00, Iuliana Prodan <[email protected]> wrote:
>
> On 11/26/2020 9:09 AM, Ard Biesheuvel wrote:
> > On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <[email protected]> wrote:
> >>
> >> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
> >>> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
> >>> <[email protected]> wrote:
> >>>>
> >>>> From: Iuliana Prodan <[email protected]>
> >>>>
> >>>> Add the option to allocate the crypto request object plus any extra
> >>>> space needed by the driver in DMA-able memory.
> >>>>
> >>>> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
> >>>> to indicate to the crypto API the need to allocate GFP_DMA memory
> >>>> for the private contexts of crypto requests.
> >>>>
> >>>
> >>> These are always directional DMA mappings, right? So why can't we use
> >>> bounce buffering here?
> >>>
> >> The idea was to avoid allocating any memory in crypto drivers.
> >> We want to be able to use dm-crypt with CAAM, which needs DMA-able
> >> memory and increasing reqsize is not enough.
> >
> > But what does 'needs DMA-able memory' mean? DMA operations are
> > asynchronous by definition, and so the DMA layer should be able to
> > allocate bounce buffers when needed. This will cost some performance
> > in cases where the hardware cannot address all of memory directly, but
> > this is a consequence of the design, and I don't think we should
> > burden the generic API with this.
> >
> Ard, I believe you're right.
>
> In CAAM, for req->src and req->dst, which come from the crypto request,
> we use DMA mappings without knowing whether the memory is DMA-able or
> not.
>
> We should do the same for CAAM's hw descriptor commands and link
> tables. That's the extra memory allocated by increasing reqsize.
>
It depends on whether any such mappings are non-directional. But I
would not expect per-request mappings to be modifiable by both the CPU
and the device at the same time.
On 11/26/2020 7:12 PM, Ard Biesheuvel wrote:
> On Thu, 26 Nov 2020 at 17:00, Iuliana Prodan <[email protected]> wrote:
>>
>> On 11/26/2020 9:09 AM, Ard Biesheuvel wrote:
>>> On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <[email protected]> wrote:
>>>>
>>>> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
>>>>> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> From: Iuliana Prodan <[email protected]>
>>>>>>
>>>>>> Add the option to allocate the crypto request object plus any extra
>>>>>> space needed by the driver in DMA-able memory.
>>>>>>
>>>>>> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
>>>>>> to indicate to the crypto API the need to allocate GFP_DMA memory
>>>>>> for the private contexts of crypto requests.
>>>>>>
>>>>>
>>>>> These are always directional DMA mappings, right? So why can't we use
>>>>> bounce buffering here?
>>>>>
>>>> The idea was to avoid allocating any memory in crypto drivers.
>>>> We want to be able to use dm-crypt with CAAM, which needs DMA-able
>>>> memory and increasing reqsize is not enough.
>>>
>>> But what does 'needs DMA-able memory' mean? DMA operations are
>>> asynchronous by definition, and so the DMA layer should be able to
>>> allocate bounce buffers when needed. This will cost some performance
>>> in cases where the hardware cannot address all of memory directly, but
>>> this is a consequence of the design, and I don't think we should
>>> burden the generic API with this.
>>>
>> Ard, I believe you're right.
>>
>> In CAAM, for req->src and req->dst, which come from the crypto request,
>> we use DMA mappings without knowing whether the memory is DMA-able or
>> not.
>>
>> We should do the same for CAAM's hw descriptor commands and link
>> tables. That's the extra memory allocated by increasing reqsize.
>>
>
> It depends on whether any such mappings are non-directional. But I
> would not expect per-request mappings to be modifiable by both the CPU
> and the device at the same time.
>
There are bidirectional mappings for req->src (if it's also used for
output) and the IV (if one exists).
But these are not modified by the CPU and CAAM at the same time.
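Roughly this pattern (sketch only):

	dma_addr_t iv_dma;

	/*
	 * The CPU writes the IV, maps it bidirectionally and then
	 * leaves it alone; CAAM owns the buffer while the job runs.
	 */
	iv_dma = dma_map_single(jrdev, iv, ivsize, DMA_BIDIRECTIONAL);
	if (dma_mapping_error(jrdev, iv_dma))
		return -ENOMEM;

	/* ... job executes ... */
	dma_unmap_single(jrdev, iv_dma, ivsize, DMA_BIDIRECTIONAL);

	/* only now may the CPU read the updated IV */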
On 11/26/2020 9:09 AM, Ard Biesheuvel wrote:
> On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <[email protected]> wrote:
>>
>> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
>>> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
>>> <[email protected]> wrote:
>>>>
>>>> From: Iuliana Prodan <[email protected]>
>>>>
>>>> Add the option to allocate the crypto request object plus any extra
>>>> space needed by the driver in DMA-able memory.
>>>>
>>>> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
>>>> to indicate to the crypto API the need to allocate GFP_DMA memory
>>>> for the private contexts of crypto requests.
>>>>
>>>
>>> These are always directional DMA mappings, right? So why can't we use
>>> bounce buffering here?
>>>
>> The idea was to avoid allocating any memory in crypto drivers.
>> We want to be able to use dm-crypt with CAAM, which needs DMA-able
>> memory and increasing reqsize is not enough.
>
> But what does 'needs DMA-able memory' mean? DMA operations are
> asynchronous by definition, and so the DMA layer should be able to
> allocate bounce buffers when needed. This will cost some performance
> in cases where the hardware cannot address all of memory directly, but
> this is a consequence of the design, and I don't think we should
> burden the generic API with this.
>
The performance loss due to bounce buffering is non-negligible.
Previous experiments we did showed a 35% gain (when forcing all data,
including I/O buffers, into ZONE_DMA32).

I don't have the exact numbers for the bounce buffering introduced by
allowing only the control data structures (descriptors etc.) in high
memory, but I don't think it's fair to easily dismiss this topic,
given the big performance drop and the relatively low impact
on the generic API.
Thanks,
Horia
On Mon, 7 Dec 2020 at 14:50, Horia Geantă <[email protected]> wrote:
>
> On 11/26/2020 9:09 AM, Ard Biesheuvel wrote:
> > On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <[email protected]> wrote:
> >>
> >> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
> >>> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
> >>> <[email protected]> wrote:
> >>>>
> >>>> From: Iuliana Prodan <[email protected]>
> >>>>
> >>>> Add the option to allocate the crypto request object plus any extra
> >>>> space needed by the driver in DMA-able memory.
> >>>>
> >>>> Add a CRYPTO_TFM_REQ_DMA flag to be used by backend implementations
> >>>> to indicate to the crypto API the need to allocate GFP_DMA memory
> >>>> for the private contexts of crypto requests.
> >>>>
> >>>
> >>> These are always directional DMA mappings, right? So why can't we use
> >>> bounce buffering here?
> >>>
> >> The idea was to avoid allocating any memory in crypto drivers.
> >> We want to be able to use dm-crypt with CAAM, which needs DMA-able
> >> memory and increasing reqsize is not enough.
> >
> > But what does 'needs DMA-able memory' mean? DMA operations are
> > asynchronous by definition, and so the DMA layer should be able to
> > allocate bounce buffers when needed. This will cost some performance
> > in cases where the hardware cannot address all of memory directly, but
> > this is a consequence of the design, and I don't think we should
> > burden the generic API with this.
> >
> The performance loss due to bounce buffering is non-negligible.
> Previous experiments we did showed a 35% gain (when forcing all data,
> including I/O buffers, into ZONE_DMA32).
>
> I don't have the exact numbers for the bounce buffering introduced by
> allowing only the control data structures (descriptors etc.) in high
> memory, but I don't think it's fair to easily dismiss this topic,
> given the big performance drop and the relatively low impact
> on the generic API.
>
It is not about the impact on the API. It is about the layering
violation: all masters in a system will be affected by DMA addressing
limitations, and all will be affected by the performance impact of
bounce buffering when it is needed. DMA-accessible memory is generally
'owned' by the DMA layer so it can be used for bounce buffering for
all masters. If one device starts claiming DMA-able memory for its own
use, other masters could be adversely affected, given that they may
not be able to do DMA at all (not even via bounce buffers) once a
single master uses up all DMA-able memory.