2020-09-07 09:34:51

by Van Leeuwen, Pascal

Subject: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

A customer of ours (Rambus) reported corruption issues running the driver
on their SoC platform, which turned out to be not fully coherent.
This caused problems with the DMA mapping of the state and cache buffers
stored inside the safexcel_ahash_req struct, as these buffers would not
start and/or end on a cacheline boundary, hence they could share a
cacheline with data still being used by the CPU. This could cause the CPU
to read stale data from its cache (loaded from memory *before* the
accelerator updated it).

Fix this by determining the system cacheline size and dynamically moving
these two buffers to a cacheline-aligned boundary, also ensuring that the
last cacheline of the last buffer is not shared with anything else (by
overallocating).
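
To illustrate the alignment arithmetic (with a hypothetical 64 byte
cacheline, i.e. req_align = cache_line_size() - 1 = 0x3f):

	/*
	 * __ALIGN_MASK(x, mask) evaluates to ((x) + (mask)) & ~(mask),
	 * i.e. it rounds x up to the next multiple of (mask + 1):
	 *
	 *   __ALIGN_MASK(0x1010, 0x3f) == (0x1010 + 0x3f) & ~0x3f
	 *                              == 0x1040
	 *
	 * so a buffer placed there starts on a fresh cacheline and does
	 * not share its first line with whatever precedes it.
	 */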

This was tested by the customer and confirmed to solve their coherence
problems. It was also tested by me on a VCU118 board with an EIP-197c,
and on a Macchiatobin.

Signed-off-by: Pascal van Leeuwen <[email protected]>
---
drivers/crypto/inside-secure/safexcel_hash.c | 97 +++++++++++++++++++---------
1 file changed, 65 insertions(+), 32 deletions(-)

diff --git a/drivers/crypto/inside-secure/safexcel_hash.c b/drivers/crypto/inside-secure/safexcel_hash.c
index 16a4679..e350f39 100644
--- a/drivers/crypto/inside-secure/safexcel_hash.c
+++ b/drivers/crypto/inside-secure/safexcel_hash.c
@@ -22,6 +22,8 @@ struct safexcel_ahash_ctx {
struct safexcel_context base;
struct safexcel_crypto_priv *priv;

+ int extra_req_bytes;
+ int req_align;
u32 alg;
u8 key_sz;
bool cbcmac;
@@ -56,17 +58,19 @@ struct safexcel_ahash_req {
u8 state_sz; /* expected state size, only set once */
u8 block_sz; /* block size, only set once */
u8 digest_sz; /* output digest size, only set once */
- __le32 state[SHA3_512_BLOCK_SIZE /
- sizeof(__le32)] __aligned(sizeof(__le32));
+ __le32 *state; /* pointer to DMA safe state buffer */

u64 len;
u64 processed;

- u8 cache[HASH_CACHE_SIZE] __aligned(sizeof(u32));
+ u8 *cache; /* pointer to DMA safe cache buffer */
dma_addr_t cache_dma;
unsigned int cache_sz;

u8 cache_next[HASH_CACHE_SIZE] __aligned(sizeof(u32));
+
+ /* this is where the DMA buffers for state & cache end up */
+ u8 dma_buf_area[];
};

static inline u64 safexcel_queued_len(struct safexcel_ahash_req *req)
@@ -613,7 +617,6 @@ static int safexcel_ahash_send(struct crypto_async_request *async,
ret = safexcel_ahash_send_inv(async, ring, commands, results);
else
ret = safexcel_ahash_send_req(async, ring, commands, results);
-
return ret;
}

@@ -889,6 +892,25 @@ static int safexcel_ahash_export(struct ahash_request *areq, void *out)
return 0;
}

+static void safexcel_hash_req_init(struct safexcel_ahash_req *req,
+ int req_align)
+{
+ memset(req, 0, sizeof(*req));
+
+ /*
+ * put cache buffer at first cacheline aligned address at end of
+ * struct safexcel_ahash_req
+ */
+ req->cache = (u8 *)__ALIGN_MASK((uintptr_t)req->dma_buf_area,
+ (uintptr_t)req_align);
+ /*
+ * put state buffer at first cacheline aligned address behind
+ * the cache buffer
+ */
+ req->state = (__le32 *)__ALIGN_MASK((uintptr_t)req->cache +
+ HASH_CACHE_SIZE, (uintptr_t)req_align);
+}
+
static int safexcel_ahash_import(struct ahash_request *areq, const void *in)
{
struct safexcel_ahash_req *req = ahash_request_ctx(areq);
@@ -921,9 +943,20 @@ static int safexcel_ahash_cra_init(struct crypto_tfm *tfm)
ctx->base.send = safexcel_ahash_send;
ctx->base.handle_result = safexcel_handle_result;
ctx->fb_do_setkey = false;
+ ctx->req_align = cache_line_size() - 1;
+
+ /*
+ * compute how many bytes we need, worst case, to store cache
+ * aligned buffers for cache and state, padding to the next
+ * cacheline as well to avoid anything else ending up there
+ */
+ ctx->extra_req_bytes = ctx->req_align; /* worst case to next line */
+ ctx->extra_req_bytes += __ALIGN_MASK(SHA3_512_BLOCK_SIZE, ctx->req_align);
+ ctx->extra_req_bytes += __ALIGN_MASK(HASH_CACHE_SIZE, ctx->req_align);

crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
- sizeof(struct safexcel_ahash_req));
+ sizeof(struct safexcel_ahash_req) +
+ ctx->extra_req_bytes);
return 0;
}

@@ -932,7 +965,7 @@ static int safexcel_sha1_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA1;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -1009,7 +1042,7 @@ static int safexcel_hmac_sha1_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, SHA1_DIGEST_SIZE);
@@ -1106,7 +1139,7 @@ static int safexcel_hmac_init_iv(struct ahash_request *areq,
unsigned int blocksize, u8 *pad, void *state)
{
struct safexcel_ahash_result result;
- struct safexcel_ahash_req *req;
+ struct safexcel_ahash_req *req = ahash_request_ctx(areq);
struct scatterlist sg;
int ret;

@@ -1120,7 +1153,6 @@ static int safexcel_hmac_init_iv(struct ahash_request *areq,
if (ret)
return ret;

- req = ahash_request_ctx(areq);
req->hmac = true;
req->last_req = true;

@@ -1253,7 +1285,7 @@ static int safexcel_sha256_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA256;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -1310,7 +1342,7 @@ static int safexcel_sha224_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA224;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -1374,7 +1406,7 @@ static int safexcel_hmac_sha224_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, SHA256_DIGEST_SIZE);
@@ -1446,7 +1478,7 @@ static int safexcel_hmac_sha256_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, SHA256_DIGEST_SIZE);
@@ -1511,7 +1543,7 @@ static int safexcel_sha512_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA512;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -1568,7 +1600,7 @@ static int safexcel_sha384_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA384;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -1632,7 +1664,7 @@ static int safexcel_hmac_sha512_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, SHA512_DIGEST_SIZE);
@@ -1704,7 +1736,7 @@ static int safexcel_hmac_sha384_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, SHA512_DIGEST_SIZE);
@@ -1769,7 +1801,7 @@ static int safexcel_md5_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_MD5;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -1826,7 +1858,7 @@ static int safexcel_hmac_md5_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, MD5_DIGEST_SIZE);
@@ -1909,7 +1941,7 @@ static int safexcel_crc32_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from loaded key */
req->state[0] = (__force __le32)le32_to_cpu(~ctx->ipad[0]);
@@ -1981,7 +2013,7 @@ static int safexcel_cbcmac_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from loaded keys */
memcpy(req->state, ctx->ipad, ctx->key_sz);
@@ -2264,7 +2296,7 @@ static int safexcel_sm3_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SM3;
req->digest = CONTEXT_CONTROL_DIGEST_PRECOMPUTED;
@@ -2328,7 +2360,7 @@ static int safexcel_hmac_sm3_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(areq));
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Start from ipad precompute */
memcpy(req->state, ctx->ipad, SM3_DIGEST_SIZE);
@@ -2394,7 +2426,7 @@ static int safexcel_sha3_224_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA3_224;
req->digest = CONTEXT_CONTROL_DIGEST_INITIAL;
@@ -2536,7 +2568,8 @@ static int safexcel_sha3_cra_init(struct crypto_tfm *tfm)
/* Update statesize from fallback algorithm! */
crypto_hash_alg_common(ahash)->statesize =
crypto_ahash_statesize(ctx->fback);
- crypto_ahash_set_reqsize(ahash, max(sizeof(struct safexcel_ahash_req),
+ crypto_ahash_set_reqsize(ahash, max(sizeof(struct safexcel_ahash_req) +
+ ctx->extra_req_bytes,
sizeof(struct ahash_request) +
crypto_ahash_reqsize(ctx->fback)));
return 0;
@@ -2587,7 +2620,7 @@ static int safexcel_sha3_256_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA3_256;
req->digest = CONTEXT_CONTROL_DIGEST_INITIAL;
@@ -2645,7 +2678,7 @@ static int safexcel_sha3_384_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA3_384;
req->digest = CONTEXT_CONTROL_DIGEST_INITIAL;
@@ -2703,7 +2736,7 @@ static int safexcel_sha3_512_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

ctx->alg = CONTEXT_CONTROL_CRYPTO_ALG_SHA3_512;
req->digest = CONTEXT_CONTROL_DIGEST_INITIAL;
@@ -2853,7 +2886,7 @@ static int safexcel_hmac_sha3_224_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Copy (half of) the key */
memcpy(req->state, ctx->ipad, SHA3_224_BLOCK_SIZE / 2);
@@ -2924,7 +2957,7 @@ static int safexcel_hmac_sha3_256_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Copy (half of) the key */
memcpy(req->state, ctx->ipad, SHA3_256_BLOCK_SIZE / 2);
@@ -2995,7 +3028,7 @@ static int safexcel_hmac_sha3_384_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Copy (half of) the key */
memcpy(req->state, ctx->ipad, SHA3_384_BLOCK_SIZE / 2);
@@ -3066,7 +3099,7 @@ static int safexcel_hmac_sha3_512_init(struct ahash_request *areq)
struct safexcel_ahash_ctx *ctx = crypto_ahash_ctx(tfm);
struct safexcel_ahash_req *req = ahash_request_ctx(areq);

- memset(req, 0, sizeof(*req));
+ safexcel_hash_req_init(req, ctx->req_align);

/* Copy (half of) the key */
memcpy(req->state, ctx->ipad, SHA3_512_BLOCK_SIZE / 2);
--
1.8.3.1


2020-09-18 06:59:08

by Herbert Xu

Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

On Mon, Sep 07, 2020 at 10:19:44AM +0200, Pascal van Leeuwen wrote:
>
> @@ -921,9 +943,20 @@ static int safexcel_ahash_cra_init(struct crypto_tfm *tfm)
> ctx->base.send = safexcel_ahash_send;
> ctx->base.handle_result = safexcel_handle_result;
> ctx->fb_do_setkey = false;
> + ctx->req_align = cache_line_size() - 1;

So the alignment is just L1_CACHE_BYTES, which is a constant.
Why don't you just put that into the struct and then simply align
the whole struct? To get the aligned ctx, you can make a wrapper
around ahash_request_ctx that does the aligning for you.
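
Roughly like this (untested sketch; the wrapper name is made up):

	static inline struct safexcel_ahash_req *
	safexcel_ahash_req_ctx(struct ahash_request *areq)
	{
		/* reqsize includes L1_CACHE_BYTES - 1 bytes of slack
		 * so the aligned pointer stays within the ctx */
		return PTR_ALIGN((struct safexcel_ahash_req *)
				 ahash_request_ctx(areq), L1_CACHE_BYTES);
	}

with the reqsize grown by L1_CACHE_BYTES - 1 to leave room for the
alignment.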

Have a look at drivers/crypto/padlock-aes.c which does something
similar for the tfm ctx.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2020-09-18 07:53:00

by Van Leeuwen, Pascal

Subject: RE: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems


> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Friday, September 18, 2020 8:58 AM
> To: Van Leeuwen, Pascal <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems
>
> On Mon, Sep 07, 2020 at 10:19:44AM +0200, Pascal van Leeuwen wrote:
> >
> > @@ -921,9 +943,20 @@ static int safexcel_ahash_cra_init(struct crypto_tfm *tfm)
> > ctx->base.send = safexcel_ahash_send;
> > ctx->base.handle_result = safexcel_handle_result;
> > ctx->fb_do_setkey = false;
> > + ctx->req_align = cache_line_size() - 1;
>
> So the alignment is just L1_CACHE_BYTES, which is a constant.
> Why don't you just put that into the struct and then simply align
> the whole struct? To get the aligned ctx, you can make a wrapper
> around ahash_request_ctx that does the aligning for you.
>
Actually, that is what we did as a _quick hack_ initially, but:

First of all, it's not only about the L1 cacheline size. It's about the worst case cache
line size in the path all the way from the CPU to the actual memory interface.

Second, cacheline sizes may differ from system to system, so it's not actually
a constant at all (unless you compile the driver specifically for one target system).
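
(For example, on arm64 the value is derived at runtime from the CWG field
of the CTR_EL0 cache type register; roughly, from
arch/arm64/include/asm/cache.h:

	static inline int cache_line_size_of_cpu(void)
	{
		u32 cwg = cache_type_cwg();	/* CWG from CTR_EL0 */

		return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
	}

so two systems running the same kernel binary can report different sizes.)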

> Have a look at drivers/crypto/padlock-aes.c which does something
> similar for the tfm ctx.
>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.



2020-09-18 08:04:19

by Herbert Xu

Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

On Fri, Sep 18, 2020 at 07:42:35AM +0000, Van Leeuwen, Pascal wrote:
>
> Actually, that is what we did as a _quick hack_ initially, but:
>
> First of all, it's not only about the L1 cacheline size. It's about the worst case cache
> line size in the path all the way from the CPU to the actual memory interface.
>
> Second, cache line sizes may differ from system to system. So it's not actually
> a constant at all (unless you compile the driver specifically for 1 target system).

Can this alignment exceed ARCH_DMA_MINALIGN? If not then the
macro CRYPTO_MINALIGN should cover it.
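
For reference, this is roughly how those macros chain together in
include/linux/crypto.h and include/linux/slab.h:

	#define CRYPTO_MINALIGN ARCH_KMALLOC_MINALIGN

	#ifdef ARCH_DMA_MINALIGN
	#define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
	#else
	#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
	#endif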

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2020-09-18 08:22:20

by Van Leeuwen, Pascal

Subject: RE: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Friday, September 18, 2020 10:01 AM
> To: Van Leeuwen, Pascal <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; Ard Biesheuvel <[email protected]>
> Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems
>
> On Fri, Sep 18, 2020 at 07:42:35AM +0000, Van Leeuwen, Pascal wrote:
> >
> > Actually, that is what we did as a _quick hack_ initially, but:
> >
> > First of all, it's not only about the L1 cacheline size. It's about the worst case cache
> > line size in the path all the way from the CPU to the actual memory interface.
> >
> > Second, cache line sizes may differ from system to system. So it's not actually
> > a constant at all (unless you compile the driver specifically for 1 target system).
>
> Can this alignment exceed ARCH_DMA_MINALIGN? If not then the
> macro CRYPTO_MINALIGN should cover it.
>
I don't know. I'm not familiar with that macro and I have not been able to dig up
any clear description of what it should convey.

Based on the name, I might be inclined to think yes, but based on many definitions
I've seen in header files, I would say no, because it's often just an alias for the
L1 cacheline size, which may not be the largest cacheline on _some_ systems.

In any case, aligning to the worst-case cacheline for a CPU architecture may mean
you end up wasting a lot of space on a system with a much smaller cacheline.

> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.



2020-09-24 03:12:35

by Herbert Xu

Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

On Fri, Sep 18, 2020 at 08:21:44AM +0000, Van Leeuwen, Pascal wrote:
>
> > Can this alignment exceed ARCH_DMA_MINALIGN? If not then the
> > macro CRYPTO_MINALIGN should cover it.
>
> I don't know. I'm not familiar with that macro and I have not been able to dig up any
> clear description on what it should convey.

I'm pretty sure it is, because that's the reason kmalloc uses it as its
minimum: otherwise, memory returned by kmalloc may cross cache-lines.

> In any case, aligning to the worst-case cacheline for a CPU architecture may mean
> you end up wasting a lot of space on a system with a much smaller cacheline.

It won't waste any memory because kmalloc is already using it as
a minimum.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2020-09-24 08:08:56

by Van Leeuwen, Pascal

Subject: RE: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Thursday, September 24, 2020 5:12 AM
> To: Van Leeuwen, Pascal <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; Ard Biesheuvel <[email protected]>
> Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems
>
> On Fri, Sep 18, 2020 at 08:21:44AM +0000, Van Leeuwen, Pascal wrote:
> >
> > > Can this alignment exceed ARCH_DMA_MINALIGN? If not then the
> > > macro CRYPTO_MINALIGN should cover it.
> >
> > I don't know. I'm not familiar with that macro and I have not been able to dig up any
> > clear description on what it should convey.
>
> I'm pretty sure it is because that's the reason kmalloc uses it
> as its minimum as otherwise memory returned by kmalloc may cross
> cache-lines.
>
If that is indeed what kmalloc uses for alignment, good point ...
I suppose if that is guaranteed, it is a possible alternative solution to at least
the coherence problem I needed to solve.

But why use some fixed worst-case value if you can be smarter about it?
(That applies to kmalloc as well, by the way ... why does it use some fixed define
for that and not the dynamically discovered system cacheline size?)

Also, there is some benefit to aligning these buffers for systems that ARE fully
coherent and therefore do not (seem to) define ARCH_DMA_MINALIGN.
Although that would also apply to any kmalloc'd buffers supplied externally ...

> > In any case, aligning to the worst-case cacheline for a CPU architecture may mean
> > you end up wasting a lot of space on a system with a much smaller cacheline.
>
> It won't waste any memory because kmalloc is already using it as
> a minimum.
>
The fact that kmalloc uses it does _not_ mean it doesn't waste memory ...
As long as you use kmalloc for fairly large data structures, it shouldn't matter
much, but here I need a couple of fairly small buffers.

> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.



2020-09-24 12:36:43

by Herbert Xu

Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

On Thu, Sep 24, 2020 at 08:08:12AM +0000, Van Leeuwen, Pascal wrote:
>
> The fact that kmalloc uses it does _not_ rule out the fact that it wastes memory ...
> And as long as you use kmalloc for fairly large data structures, it shouldn't matter much.
> But here I need a couple of fairly small buffers.

Those small buffers are embedded in a structure that's already
aligned by kmalloc. So just put your buffers at the start of
the struct to minimise holes.
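
I.e. something along these lines (just a sketch):

	struct safexcel_ahash_req {
		/* the ctx starts kmalloc-aligned, so the DMA-safe
		 * buffers go first, back to back */
		__le32 state[SHA3_512_BLOCK_SIZE / sizeof(__le32)];
		u8 cache[HASH_CACHE_SIZE];
		/* ... all the small CPU-only fields after this ... */
	};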

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2020-09-24 12:52:04

by Van Leeuwen, Pascal

Subject: RE: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems


> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Thursday, September 24, 2020 2:36 PM
> To: Van Leeuwen, Pascal <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; Ard Biesheuvel <[email protected]>
> Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems
>
> On Thu, Sep 24, 2020 at 08:08:12AM +0000, Van Leeuwen, Pascal wrote:
> >
> > The fact that kmalloc uses it does _not_ rule out the fact that it wastes memory ...
> > And as long as you use kmalloc for fairly large data structures, it shouldn't matter much.
> > But here I need a couple of fairly small buffers.
>
> Those small buffers are embedded in a structure that's already
> aligned by kmalloc. So just put your buffers at the start of
> the struct to minimise holes.
>
If you would make them fixed in size, then putting them at the start instead of the
end would indeed be a bit more efficient (but obviously, that doesn't work if you
dynamically scale them); I'll remember that.

But you still have two potential gaps (from buffer 1 to buffer 2 and from buffer 2
to the other items in the struct) that are larger than they may need to be.
If everyone can live with the wasted space, it's fine by me. (Frankly, I don't know
where these structs may end up - I guess not on the minimal kernel stack then?)

I only did it this way because I anticipated that it would be accepted ... guess I
could've saved myself some trouble there :-)

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.

> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



2020-09-24 12:58:54

by Herbert Xu

Subject: Re: [PATCH] crypto: inside-secure - Fix corruption on not fully coherent systems

On Thu, Sep 24, 2020 at 12:51:11PM +0000, Van Leeuwen, Pascal wrote:
>
> But you still have two potential gaps (from buffer 1 to buffer 2 and from buffer 2
> to the other items in the struct) that are larger than they may need to be.

So put some of the rest of your struct in the middle, up to 128
bytes.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt