2020-03-16 15:01:26

by Andrey Smirnov

Subject: [PATCH v8 0/8] enable CAAM's HWRNG as default

Everyone:

This series is a continuation of the original [discussion]. I don't know
if what's in the series is enough to use CAAM's HWRNG system-wide, but
I am hoping that with enough iterations and feedback it will be.

Changes since [v1]:

- Original hw_random replaced with one using the output of the TRNG directly

- SEC4 DRNG IP block exposed via crypto API

- Small fix regarding use of GFP_DMA added to the series

Changes since [v2]:

- Added msleep() in the polling loop to avoid wasting CPU cycles

- caam_trng_read() bails out early if 'wait' is set to 'false'

- fixed typo in ZII's name

Changes since [v3]:

- DRNG's .cra_name is now "stdrng"

- collected Reviewed-by tag from Lucas

- typo fixes in commit messages of the series

Changes since [v4]:

- Dropped "crypto: caam - RNG4 TRNG errata" and "crypto: caam -
enable prediction resistance in HRWNG" to limit the scope of the
series. Those two patches are not yet ready and can be submitted
separately later.

- Collected Tested-by from Chris

Changes since [v5]:

- Series is converted back to implementing HWRNG using a job ring
as per feedback from Horia.

Changes since [v6]:

- "crypto: caam - drop global context pointer and init_done"
changed to use devres group to allow freeing HWRNG resource
independently of the parent device lifecycle. Code to deal with
circular deallocation dependency is added as well

- Removed worker self-scheduling in "crypto: caam - simplify RNG
implementation". It didn't bring much value, but it meant that a
simple cleanup with just a call to flush_work() wasn't good
enough.

- Added a simple function with a FIXME item for the MC firmware check
in "crypto: caam - enable prediction resistance in HWRNG"

- "crypto: caam - limit single JD RNG output to maximum of 16
bytes" now shrinks async FIFO size from 32K to 64 bytes, since
having a buffer that big doesn't seem to do any good given that
througput of TRNG

Changes since [v7]:

- Collected Reviewed-by tags from Horia

- updated "crypto: caam - simplify RNG implementation" to drop
custom type and fix comments

- updated "crypto: caam - enable prediction resistance in HRWNG"
to integrate code from Andrei Botila

- updated "crypto: caam - drop global context pointer and
init_done" to use .priv instead of container_of for private data
pointer

Feedback is welcome!

Thanks,
Andrey Smirnov

[discussion] https://patchwork.kernel.org/patch/9850669/
[v1] https://lore.kernel.org/lkml/[email protected]
[v2] https://lore.kernel.org/lkml/[email protected]
[v3] https://lore.kernel.org/lkml/[email protected]
[v4] https://lore.kernel.org/lkml/[email protected]
[v5] https://lore.kernel.org/lkml/[email protected]
[v6] https://lore.kernel.org/lkml/[email protected]
[v7] https://lore.kernel.org/lkml/[email protected]

Andrey Smirnov (8):
crypto: caam - allocate RNG instantiation descriptor with GFP_DMA
crypto: caam - use struct hwrng's .init for initialization
crypto: caam - drop global context pointer and init_done
crypto: caam - simplify RNG implementation
crypto: caam - check if RNG job failed
crypto: caam - invalidate entropy register during RNG initialization
crypto: caam - enable prediction resistance in HWRNG
crypto: caam - limit single JD RNG output to maximum of 16 bytes

drivers/crypto/caam/Kconfig | 1 +
drivers/crypto/caam/caamrng.c | 405 +++++++++++++---------------------
drivers/crypto/caam/ctrl.c | 88 ++++++--
drivers/crypto/caam/desc.h | 2 +
drivers/crypto/caam/intern.h | 7 +-
drivers/crypto/caam/jr.c | 13 +-
drivers/crypto/caam/regs.h | 7 +-
7 files changed, 243 insertions(+), 280 deletions(-)

--
2.21.0


2020-03-16 15:01:44

by Andrey Smirnov

Subject: [PATCH v8 7/8] crypto: caam - enable prediction resistance in HWRNG

Instantiate the CAAM RNG with prediction resistance enabled to improve its
quality (with PR on, the DRNG is forced to reseed from the TRNG every time
random data is generated).

Management Complex firmware versions lower than 10.20.0 don't provide
prediction resistance support. Take this into account and only instantiate
the RNG from the kernel when the MC f/w version is lower than that.

Signed-off-by: Andrey Smirnov <[email protected]>
Signed-off-by: Andrei Botila <[email protected]>
Cc: Chris Healy <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Horia Geantă <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Iuliana Prodan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/crypto/caam/Kconfig | 1 +
drivers/crypto/caam/caamrng.c | 3 +-
drivers/crypto/caam/ctrl.c | 73 ++++++++++++++++++++++++++++-------
drivers/crypto/caam/desc.h | 2 +
drivers/crypto/caam/regs.h | 4 +-
5 files changed, 68 insertions(+), 15 deletions(-)

diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig
index 64f82263b20e..a62f228be6da 100644
--- a/drivers/crypto/caam/Kconfig
+++ b/drivers/crypto/caam/Kconfig
@@ -13,6 +13,7 @@ config CRYPTO_DEV_FSL_CAAM
depends on FSL_SOC || ARCH_MXC || ARCH_LAYERSCAPE
select SOC_BUS
select CRYPTO_DEV_FSL_CAAM_COMMON
+ imply FSL_MC_BUS
help
Enables the driver module for Freescale's Cryptographic Accelerator
and Assurance Module (CAAM), also known as the SEC version 4 (SEC4).
diff --git a/drivers/crypto/caam/caamrng.c b/drivers/crypto/caam/caamrng.c
index ffbdc912f1be..cffa6604f726 100644
--- a/drivers/crypto/caam/caamrng.c
+++ b/drivers/crypto/caam/caamrng.c
@@ -69,7 +69,8 @@ static u32 *caam_init_desc(u32 *desc, dma_addr_t dst_dma, int len)
{
init_job_desc(desc, 0); /* + 1 cmd_sz */
/* Generate random bytes: + 1 cmd_sz */
- append_operation(desc, OP_ALG_ALGSEL_RNG | OP_TYPE_CLASS1_ALG);
+ append_operation(desc, OP_ALG_ALGSEL_RNG | OP_TYPE_CLASS1_ALG |
+ OP_ALG_PR_ON);
/* Store bytes: + 1 cmd_sz + caam_ptr_sz */
append_fifo_store(desc, dst_dma, len, FIFOST_TYPE_RNGSTORE);

diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index b278471f4013..47521b6294ed 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -10,6 +10,7 @@
#include <linux/of_address.h>
#include <linux/of_irq.h>
#include <linux/sys_soc.h>
+#include <linux/fsl/mc.h>

#include "compat.h"
#include "regs.h"
@@ -36,7 +37,8 @@ static void build_instantiation_desc(u32 *desc, int handle, int do_sk)
init_job_desc(desc, 0);

op_flags = OP_TYPE_CLASS1_ALG | OP_ALG_ALGSEL_RNG |
- (handle << OP_ALG_AAI_SHIFT) | OP_ALG_AS_INIT;
+ (handle << OP_ALG_AAI_SHIFT) | OP_ALG_AS_INIT |
+ OP_ALG_PR_ON;

/* INIT RNG in non-test mode */
append_operation(desc, op_flags);
@@ -278,12 +280,25 @@ static int instantiate_rng(struct device *ctrldev, int state_handle_mask,
return -ENOMEM;

for (sh_idx = 0; sh_idx < RNG4_MAX_HANDLES; sh_idx++) {
+ const u32 rdsta_if = RDSTA_IF0 << sh_idx;
+ const u32 rdsta_pr = RDSTA_PR0 << sh_idx;
+ const u32 rdsta_mask = rdsta_if | rdsta_pr;
/*
* If the corresponding bit is set, this state handle
* was initialized by somebody else, so it's left alone.
*/
- if ((1 << sh_idx) & state_handle_mask)
- continue;
+ if (rdsta_if & state_handle_mask) {
+ if (rdsta_pr & state_handle_mask)
+ continue;
+
+ dev_info(ctrldev,
+ "RNG4 SH%d was previously instantiated without prediction resistance. Tearing it down\n",
+ sh_idx);
+
+ ret = deinstantiate_rng(ctrldev, rdsta_if);
+ if (ret)
+ break;
+ }

/* Create the descriptor for instantiating RNG State Handle */
build_instantiation_desc(desc, sh_idx, gen_sk);
@@ -303,9 +318,9 @@ static int instantiate_rng(struct device *ctrldev, int state_handle_mask,
if (ret)
break;

- rdsta_val = rd_reg32(&ctrl->r4tst[0].rdsta) & RDSTA_IFMASK;
+ rdsta_val = rd_reg32(&ctrl->r4tst[0].rdsta) & RDSTA_MASK;
if ((status && status != JRSTA_SSRC_JUMP_HALT_CC) ||
- !(rdsta_val & (1 << sh_idx))) {
+ (rdsta_val & rdsta_mask) != rdsta_mask) {
ret = -EAGAIN;
break;
}
@@ -564,6 +579,26 @@ static void caam_remove_debugfs(void *root)
}
#endif

+#ifdef CONFIG_FSL_MC_BUS
+static bool check_version(struct fsl_mc_version *mc_version, u32 major,
+ u32 minor, u32 revision)
+{
+ if (mc_version->major > major)
+ return true;
+
+ if (mc_version->major == major) {
+ if (mc_version->minor > minor)
+ return true;
+
+ if (mc_version->minor == minor && mc_version->revision > 0)
+ return true;
+ }
+
+ return false;
+}
+#endif
+
+
/* Probe routine for CAAM top (controller) level */
static int caam_probe(struct platform_device *pdev)
{
@@ -582,6 +617,7 @@ static int caam_probe(struct platform_device *pdev)
u8 rng_vid;
int pg_size;
int BLOCK_OFFSET = 0;
+ bool pr_support = false;

ctrlpriv = devm_kzalloc(&pdev->dev, sizeof(*ctrlpriv), GFP_KERNEL);
if (!ctrlpriv)
@@ -667,6 +703,21 @@ static int caam_probe(struct platform_device *pdev)

/* Get the IRQ of the controller (for security violations only) */
ctrlpriv->secvio_irq = irq_of_parse_and_map(nprop, 0);
+ np = of_find_compatible_node(NULL, NULL, "fsl,qoriq-mc");
+ ctrlpriv->mc_en = !!np;
+ of_node_put(np);
+
+#ifdef CONFIG_FSL_MC_BUS
+ if (ctrlpriv->mc_en) {
+ struct fsl_mc_version *mc_version;
+
+ mc_version = fsl_mc_get_version();
+ if (mc_version)
+ pr_support = check_version(mc_version, 10, 20, 0);
+ else
+ return -EPROBE_DEFER;
+ }
+#endif

/*
* Enable DECO watchdogs and, if this is a PHYS_ADDR_T_64BIT kernel,
@@ -674,10 +725,6 @@ static int caam_probe(struct platform_device *pdev)
* In case of SoCs with Management Complex, MC f/w performs
* the configuration.
*/
- np = of_find_compatible_node(NULL, NULL, "fsl,qoriq-mc");
- ctrlpriv->mc_en = !!np;
- of_node_put(np);
-
if (!ctrlpriv->mc_en)
clrsetbits_32(&ctrl->mcr, MCFGR_AWCACHE_MASK,
MCFGR_AWCACHE_CACH | MCFGR_AWCACHE_BUFF |
@@ -784,7 +831,7 @@ static int caam_probe(struct platform_device *pdev)
* already instantiated, do RNG instantiation
* In case of SoCs with Management Complex, RNG is managed by MC f/w.
*/
- if (!ctrlpriv->mc_en && rng_vid >= 4) {
+ if (!(ctrlpriv->mc_en && pr_support) && rng_vid >= 4) {
ctrlpriv->rng4_sh_init =
rd_reg32(&ctrl->r4tst[0].rdsta);
/*
@@ -794,11 +841,11 @@ static int caam_probe(struct platform_device *pdev)
* to regenerate these keys before the next POR.
*/
gen_sk = ctrlpriv->rng4_sh_init & RDSTA_SKVN ? 0 : 1;
- ctrlpriv->rng4_sh_init &= RDSTA_IFMASK;
+ ctrlpriv->rng4_sh_init &= RDSTA_MASK;
do {
int inst_handles =
rd_reg32(&ctrl->r4tst[0].rdsta) &
- RDSTA_IFMASK;
+ RDSTA_MASK;
/*
* If either SH were instantiated by somebody else
* (e.g. u-boot) then it is assumed that the entropy
@@ -838,7 +885,7 @@ static int caam_probe(struct platform_device *pdev)
* Set handles init'ed by this module as the complement of the
* already initialized ones
*/
- ctrlpriv->rng4_sh_init = ~ctrlpriv->rng4_sh_init & RDSTA_IFMASK;
+ ctrlpriv->rng4_sh_init = ~ctrlpriv->rng4_sh_init & RDSTA_MASK;

/* Enable RDB bit so that RNG works faster */
clrsetbits_32(&ctrl->scfgr, 0, SCFGR_RDBENABLE);
diff --git a/drivers/crypto/caam/desc.h b/drivers/crypto/caam/desc.h
index 4b6854bf896a..e796d3cb9be8 100644
--- a/drivers/crypto/caam/desc.h
+++ b/drivers/crypto/caam/desc.h
@@ -1254,6 +1254,8 @@
#define OP_ALG_ICV_OFF (0 << OP_ALG_ICV_SHIFT)
#define OP_ALG_ICV_ON (1 << OP_ALG_ICV_SHIFT)

+#define OP_ALG_PR_ON BIT(1)
+
#define OP_ALG_DIR_SHIFT 0
#define OP_ALG_DIR_MASK 1
#define OP_ALG_DECRYPT 0
diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index c191e8fd0fa7..0f810bc13b2b 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -524,9 +524,11 @@ struct rng4tst {
u32 rsvd1[40];
#define RDSTA_SKVT 0x80000000
#define RDSTA_SKVN 0x40000000
+#define RDSTA_PR0 BIT(4)
+#define RDSTA_PR1 BIT(5)
#define RDSTA_IF0 0x00000001
#define RDSTA_IF1 0x00000002
-#define RDSTA_IFMASK (RDSTA_IF1 | RDSTA_IF0)
+#define RDSTA_MASK (RDSTA_PR1 | RDSTA_PR0 | RDSTA_IF1 | RDSTA_IF0)
u32 rdsta;
u32 rsvd2[15];
};
--
2.21.0

2020-03-16 15:01:43

by Andrey Smirnov

Subject: [PATCH v8 4/8] crypto: caam - simplify RNG implementation

Rework the CAAM RNG implementation as follows:

- Make use of the fact that the HWRNG core supports partial reads and
handles them gracefully, and remove the recursion in caam_read()

- Convert the blocking caam_read() codepath to do a single blocking job
read directly into the requested buffer, bypassing any intermediary
buffers

- Convert the async caam_read() codepath into a simple
single-reader/single-writer FIFO use case, thus simplifying concurrency
handling and delegating buffer read/write position management to the
KFIFO subsystem (a minimal sketch of this pattern follows the notes
below).

- Leverage the same low-level RNG data extraction code for both the
async and blocking caam_read() scenarios, get rid of the shared job
descriptor and make the non-shared one as simple as possible (just
HEADER + ALGORITHM OPERATION + FIFO STORE)

- Split the private context from DMA-related memory, so that the former
can be allocated without GFP_DMA.

NOTE: At face value this commit decreases the throughput numbers
reported by

dd if=/dev/hwrng of=/dev/null bs=1 count=100K [iflag=nonblock]

by about 15%; however, the commits that enable prediction resistance and
limit the single JD RNG output impact the performance and shift the
bottleneck so much that this regression becomes irrelevant.

NOTE: On the bright side, this commit reduces the RNG's in-kernel DMA
buffer memory usage from 2 x RN_BUF_SIZE (~256K) to 32K.
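
As a side note, here is a minimal sketch of the single-reader/single-writer
kfifo pattern mentioned above. It is an illustration only, not driver code:
the example_* names are made up, and the actual driver uses
kfifo_dma_in_prepare()/kfifo_dma_in_finish() so the job ring output can land
directly in the FIFO buffer instead of going through kfifo_in().

#include <linux/kfifo.h>
#include <linux/sizes.h>
#include <linux/slab.h>

/* byte FIFO; with one producer and one consumer no extra locking is needed */
static struct kfifo fifo;

static int example_setup(void)
{
	/* kfifo_alloc() rounds the size up to a power of two */
	return kfifo_alloc(&fifo, SZ_32K, GFP_KERNEL);
}

/* producer side (the refill worker in the driver) */
static unsigned int example_fill(const void *src, unsigned int len)
{
	/* returns the number of bytes actually copied in */
	return kfifo_in(&fifo, src, len);
}

/* consumer side (the hwrng .read callback in the driver) */
static unsigned int example_read(void *dst, unsigned int max)
{
	return kfifo_out(&fifo, dst, max);
}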

Signed-off-by: Andrey Smirnov <[email protected]>
Cc: Chris Healy <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Horia Geantă <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Iuliana Prodan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/crypto/caam/caamrng.c | 323 ++++++++++++----------------------
1 file changed, 108 insertions(+), 215 deletions(-)

diff --git a/drivers/crypto/caam/caamrng.c b/drivers/crypto/caam/caamrng.c
index 753625f2b2c0..b3b92e5944cd 100644
--- a/drivers/crypto/caam/caamrng.c
+++ b/drivers/crypto/caam/caamrng.c
@@ -7,35 +7,12 @@
*
* Based on caamalg.c crypto API driver.
*
- * relationship between job descriptors to shared descriptors:
- *
- * --------------- --------------
- * | JobDesc #0 |-------------------->| ShareDesc |
- * | *(buffer 0) | |------------->| (generate) |
- * --------------- | | (move) |
- * | | (store) |
- * --------------- | --------------
- * | JobDesc #1 |------|
- * | *(buffer 1) |
- * ---------------
- *
- * A job desc looks like this:
- *
- * ---------------------
- * | Header |
- * | ShareDesc Pointer |
- * | SEQ_OUT_PTR |
- * | (output buffer) |
- * ---------------------
- *
- * The SharedDesc never changes, and each job descriptor points to one of two
- * buffers for each device, from which the data will be copied into the
- * requested destination
*/

#include <linux/hw_random.h>
#include <linux/completion.h>
#include <linux/atomic.h>
+#include <linux/kfifo.h>

#include "compat.h"

@@ -45,38 +22,26 @@
#include "jr.h"
#include "error.h"

+#define CAAM_RNG_MAX_FIFO_STORE_SIZE U16_MAX
+
+#define CAAM_RNG_FIFO_LEN SZ_32K /* Must be a multiple of 2 */
+
/*
- * Maximum buffer size: maximum number of random, cache-aligned bytes that
- * will be generated and moved to seq out ptr (extlen not allowed)
+ * Length of used descriptors, see caam_init_desc()
*/
-#define RN_BUF_SIZE (0xffff / L1_CACHE_BYTES * \
- L1_CACHE_BYTES)
-
-/* length of descriptors */
-#define DESC_JOB_O_LEN (CAAM_CMD_SZ * 2 + CAAM_PTR_SZ_MAX * 2)
-#define DESC_RNG_LEN (3 * CAAM_CMD_SZ)
-
-/* Buffer, its dma address and lock */
-struct buf_data {
- u8 buf[RN_BUF_SIZE] ____cacheline_aligned;
- dma_addr_t addr;
- struct completion filled;
- u32 hw_desc[DESC_JOB_O_LEN];
-#define BUF_NOT_EMPTY 0
-#define BUF_EMPTY 1
-#define BUF_PENDING 2 /* Empty, but with job pending --don't submit another */
- atomic_t empty;
-};
+#define CAAM_RNG_DESC_LEN (CAAM_CMD_SZ + \
+ CAAM_CMD_SZ + \
+ CAAM_CMD_SZ + CAAM_PTR_SZ_MAX)

/* rng per-device context */
struct caam_rng_ctx {
struct hwrng rng;
struct device *jrdev;
- dma_addr_t sh_desc_dma;
- u32 sh_desc[DESC_RNG_LEN];
- unsigned int cur_buf_idx;
- int current_buf;
- struct buf_data bufs[2];
+ struct device *ctrldev;
+ void *desc_async;
+ void *desc_sync;
+ struct work_struct worker;
+ struct kfifo fifo;
};

static struct caam_rng_ctx *to_caam_rng_ctx(struct hwrng *r)
@@ -84,228 +49,153 @@ static struct caam_rng_ctx *to_caam_rng_ctx(struct hwrng *r)
return (struct caam_rng_ctx *)r->priv;
}

-static inline void rng_unmap_buf(struct device *jrdev, struct buf_data *bd)
+static void caam_rng_done(struct device *jrdev, u32 *desc, u32 err,
+ void *context)
{
- if (bd->addr)
- dma_unmap_single(jrdev, bd->addr, RN_BUF_SIZE,
- DMA_FROM_DEVICE);
-}
-
-static inline void rng_unmap_ctx(struct caam_rng_ctx *ctx)
-{
- struct device *jrdev = ctx->jrdev;
-
- if (ctx->sh_desc_dma)
- dma_unmap_single(jrdev, ctx->sh_desc_dma,
- desc_bytes(ctx->sh_desc), DMA_TO_DEVICE);
- rng_unmap_buf(jrdev, &ctx->bufs[0]);
- rng_unmap_buf(jrdev, &ctx->bufs[1]);
-}
-
-static void rng_done(struct device *jrdev, u32 *desc, u32 err, void *context)
-{
- struct buf_data *bd;
-
- bd = container_of(desc, struct buf_data, hw_desc[0]);
+ struct completion *done = context;

if (err)
caam_jr_strstatus(jrdev, err);

- atomic_set(&bd->empty, BUF_NOT_EMPTY);
- complete(&bd->filled);
-
- /* Buffer refilled, invalidate cache */
- dma_sync_single_for_cpu(jrdev, bd->addr, RN_BUF_SIZE, DMA_FROM_DEVICE);
-
- print_hex_dump_debug("rng refreshed buf@: ", DUMP_PREFIX_ADDRESS, 16, 4,
- bd->buf, RN_BUF_SIZE, 1);
+ complete(done);
}

-static inline int submit_job(struct caam_rng_ctx *ctx, int to_current)
+static u32 *caam_init_desc(u32 *desc, dma_addr_t dst_dma, int len)
{
- struct buf_data *bd = &ctx->bufs[!(to_current ^ ctx->current_buf)];
- struct device *jrdev = ctx->jrdev;
- u32 *desc = bd->hw_desc;
- int err;
+ init_job_desc(desc, 0); /* + 1 cmd_sz */
+ /* Generate random bytes: + 1 cmd_sz */
+ append_operation(desc, OP_ALG_ALGSEL_RNG | OP_TYPE_CLASS1_ALG);
+ /* Store bytes: + 1 cmd_sz + caam_ptr_sz */
+ append_fifo_store(desc, dst_dma, len, FIFOST_TYPE_RNGSTORE);

- dev_dbg(jrdev, "submitting job %d\n", !(to_current ^ ctx->current_buf));
- init_completion(&bd->filled);
- err = caam_jr_enqueue(jrdev, desc, rng_done, ctx);
- if (err != -EINPROGRESS)
- complete(&bd->filled); /* don't wait on failed job*/
- else
- atomic_inc(&bd->empty); /* note if pending */
+ print_hex_dump_debug("rng job desc@: ", DUMP_PREFIX_ADDRESS,
+ 16, 4, desc, desc_bytes(desc), 1);

- return err;
+ return desc;
}

-static int caam_read(struct hwrng *rng, void *data, size_t max, bool wait)
+static int caam_rng_read_one(struct device *jrdev,
+ void *dst, int len,
+ void *desc,
+ struct completion *done)
{
- struct caam_rng_ctx *ctx = to_caam_rng_ctx(rng);
- struct buf_data *bd = &ctx->bufs[ctx->current_buf];
- int next_buf_idx, copied_idx;
+ dma_addr_t dst_dma;
int err;

- if (atomic_read(&bd->empty)) {
- /* try to submit job if there wasn't one */
- if (atomic_read(&bd->empty) == BUF_EMPTY) {
- err = submit_job(ctx, 1);
- /* if can't submit job, can't even wait */
- if (err != -EINPROGRESS)
- return 0;
- }
- /* no immediate data, so exit if not waiting */
- if (!wait)
- return 0;
-
- /* waiting for pending job */
- if (atomic_read(&bd->empty))
- wait_for_completion(&bd->filled);
- }
-
- next_buf_idx = ctx->cur_buf_idx + max;
- dev_dbg(ctx->jrdev, "%s: start reading at buffer %d, idx %d\n",
- __func__, ctx->current_buf, ctx->cur_buf_idx);
+ len = min_t(int, len, CAAM_RNG_MAX_FIFO_STORE_SIZE);

- /* if enough data in current buffer */
- if (next_buf_idx < RN_BUF_SIZE) {
- memcpy(data, bd->buf + ctx->cur_buf_idx, max);
- ctx->cur_buf_idx = next_buf_idx;
- return max;
+ dst_dma = dma_map_single(jrdev, dst, len, DMA_FROM_DEVICE);
+ if (dma_mapping_error(jrdev, dst_dma)) {
+ dev_err(jrdev, "unable to map destination memory\n");
+ return -ENOMEM;
}

- /* else, copy what's left... */
- copied_idx = RN_BUF_SIZE - ctx->cur_buf_idx;
- memcpy(data, bd->buf + ctx->cur_buf_idx, copied_idx);
- ctx->cur_buf_idx = 0;
- atomic_set(&bd->empty, BUF_EMPTY);
-
- /* ...refill... */
- submit_job(ctx, 1);
+ init_completion(done);
+ err = caam_jr_enqueue(jrdev,
+ caam_init_desc(desc, dst_dma, len),
+ caam_rng_done, done);
+ if (err == -EINPROGRESS) {
+ wait_for_completion(done);
+ err = 0;
+ }

- /* and use next buffer */
- ctx->current_buf = !ctx->current_buf;
- dev_dbg(ctx->jrdev, "switched to buffer %d\n", ctx->current_buf);
+ dma_unmap_single(jrdev, dst_dma, len, DMA_FROM_DEVICE);

- /* since there already is some data read, don't wait */
- return copied_idx + caam_read(rng, data + copied_idx,
- max - copied_idx, false);
+ return err ?: len;
}

-static inline int rng_create_sh_desc(struct caam_rng_ctx *ctx)
+static void caam_rng_fill_async(struct caam_rng_ctx *ctx)
{
- struct device *jrdev = ctx->jrdev;
- u32 *desc = ctx->sh_desc;
-
- init_sh_desc(desc, HDR_SHARE_SERIAL);
-
- /* Generate random bytes */
- append_operation(desc, OP_ALG_ALGSEL_RNG | OP_TYPE_CLASS1_ALG);
-
- /* Store bytes */
- append_seq_fifo_store(desc, RN_BUF_SIZE, FIFOST_TYPE_RNGSTORE);
-
- ctx->sh_desc_dma = dma_map_single(jrdev, desc, desc_bytes(desc),
- DMA_TO_DEVICE);
- if (dma_mapping_error(jrdev, ctx->sh_desc_dma)) {
- dev_err(jrdev, "unable to map shared descriptor\n");
- return -ENOMEM;
- }
-
- print_hex_dump_debug("rng shdesc@: ", DUMP_PREFIX_ADDRESS, 16, 4,
- desc, desc_bytes(desc), 1);
+ struct scatterlist sg[1];
+ struct completion done;
+ int len, nents;
+
+ sg_init_table(sg, ARRAY_SIZE(sg));
+ nents = kfifo_dma_in_prepare(&ctx->fifo, sg, ARRAY_SIZE(sg),
+ CAAM_RNG_FIFO_LEN);
+ if (!nents)
+ return;
+
+ len = caam_rng_read_one(ctx->jrdev, sg_virt(&sg[0]),
+ sg[0].length,
+ ctx->desc_async,
+ &done);
+ if (len < 0)
+ return;
+
+ kfifo_dma_in_finish(&ctx->fifo, len);
+}

- return 0;
+static void caam_rng_worker(struct work_struct *work)
+{
+ struct caam_rng_ctx *ctx = container_of(work, struct caam_rng_ctx,
+ worker);
+ caam_rng_fill_async(ctx);
}

-static inline int rng_create_job_desc(struct caam_rng_ctx *ctx, int buf_id)
+static int caam_read(struct hwrng *rng, void *dst, size_t max, bool wait)
{
- struct device *jrdev = ctx->jrdev;
- struct buf_data *bd = &ctx->bufs[buf_id];
- u32 *desc = bd->hw_desc;
- int sh_len = desc_len(ctx->sh_desc);
+ struct caam_rng_ctx *ctx = to_caam_rng_ctx(rng);
+ int out;

- init_job_desc_shared(desc, ctx->sh_desc_dma, sh_len, HDR_SHARE_DEFER |
- HDR_REVERSE);
+ if (wait) {
+ struct completion done;

- bd->addr = dma_map_single(jrdev, bd->buf, RN_BUF_SIZE, DMA_FROM_DEVICE);
- if (dma_mapping_error(jrdev, bd->addr)) {
- dev_err(jrdev, "unable to map dst\n");
- return -ENOMEM;
+ return caam_rng_read_one(ctx->jrdev, dst, max,
+ ctx->desc_sync, &done);
}

- append_seq_out_ptr_intlen(desc, bd->addr, RN_BUF_SIZE, 0);
+ out = kfifo_out(&ctx->fifo, dst, max);
+ if (kfifo_len(&ctx->fifo) <= CAAM_RNG_FIFO_LEN / 2)
+ schedule_work(&ctx->worker);

- print_hex_dump_debug("rng job desc@: ", DUMP_PREFIX_ADDRESS, 16, 4,
- desc, desc_bytes(desc), 1);
-
- return 0;
+ return out;
}

static void caam_cleanup(struct hwrng *rng)
{
struct caam_rng_ctx *ctx = to_caam_rng_ctx(rng);
- int i;
- struct buf_data *bd;

- for (i = 0; i < 2; i++) {
- bd = &ctx->bufs[i];
- if (atomic_read(&bd->empty) == BUF_PENDING)
- wait_for_completion(&bd->filled);
- }
-
- rng_unmap_ctx(ctx);
+ flush_work(&ctx->worker);
caam_jr_free(ctx->jrdev);
+ kfifo_free(&ctx->fifo);
}

-static int caam_init_buf(struct caam_rng_ctx *ctx, int buf_id)
+static int caam_init(struct hwrng *rng)
{
- struct buf_data *bd = &ctx->bufs[buf_id];
+ struct caam_rng_ctx *ctx = to_caam_rng_ctx(rng);
int err;

- err = rng_create_job_desc(ctx, buf_id);
- if (err)
- return err;
+ ctx->desc_sync = devm_kzalloc(ctx->ctrldev, CAAM_RNG_DESC_LEN,
+ GFP_DMA | GFP_KERNEL);
+ if (!ctx->desc_sync)
+ return -ENOMEM;

- atomic_set(&bd->empty, BUF_EMPTY);
- submit_job(ctx, buf_id == ctx->current_buf);
- wait_for_completion(&bd->filled);
+ ctx->desc_async = devm_kzalloc(ctx->ctrldev, CAAM_RNG_DESC_LEN,
+ GFP_DMA | GFP_KERNEL);
+ if (!ctx->desc_async)
+ return -ENOMEM;

- return 0;
-}
+ if (kfifo_alloc(&ctx->fifo, CAAM_RNG_FIFO_LEN, GFP_DMA | GFP_KERNEL))
+ return -ENOMEM;

-static int caam_init(struct hwrng *rng)
-{
- struct caam_rng_ctx *ctx = to_caam_rng_ctx(rng);
- int err;
+ INIT_WORK(&ctx->worker, caam_rng_worker);

ctx->jrdev = caam_jr_alloc();
err = PTR_ERR_OR_ZERO(ctx->jrdev);
if (err) {
+ kfifo_free(&ctx->fifo);
pr_err("Job Ring Device allocation for transform failed\n");
return err;
}

- err = rng_create_sh_desc(ctx);
- if (err)
- goto free_jrdev;
-
- ctx->current_buf = 0;
- ctx->cur_buf_idx = 0;
-
- err = caam_init_buf(ctx, 0);
- if (err)
- goto free_jrdev;
-
- err = caam_init_buf(ctx, 1);
- if (err)
- goto free_jrdev;
+ /*
+ * Fill async buffer to have early randomness data for
+ * hw_random
+ */
+ caam_rng_fill_async(ctx);

return 0;
-
-free_jrdev:
- caam_jr_free(ctx->jrdev);
- return err;
}

int caam_rng_init(struct device *ctrldev);
@@ -335,15 +225,18 @@ int caam_rng_init(struct device *ctrldev)
if (!devres_open_group(ctrldev, caam_rng_init, GFP_KERNEL))
return -ENOMEM;

- ctx = devm_kzalloc(ctrldev, sizeof(*ctx), GFP_DMA | GFP_KERNEL);
+ ctx = devm_kzalloc(ctrldev, sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;

+ ctx->ctrldev = ctrldev;
+
ctx->rng.name = "rng-caam";
ctx->rng.init = caam_init;
ctx->rng.cleanup = caam_cleanup;
ctx->rng.read = caam_read;
ctx->rng.priv = (unsigned long)ctx;
+ ctx->rng.quality = 1024;

dev_info(ctrldev, "registering rng-caam\n");

--
2.21.0

2020-03-16 15:02:05

by Andrey Smirnov

Subject: [PATCH v8 8/8] crypto: caam - limit single JD RNG output to maximum of 16 bytes

In order to follow the recommendation in SP800-90C (section "9.4 The
Oversampling-NRBG Construction"), limit the output of the "generate" JD
submitted to CAAM. See
https://lore.kernel.org/linux-crypto/VI1PR0402MB3485EF10976A4A69F90E5B0F98580@VI1PR0402MB3485.eurprd04.prod.outlook.com/
for more details.

This change should make CAAM's hwrng driver good enough to have a 1024
quality rating.

Signed-off-by: Andrey Smirnov <[email protected]>
Cc: Chris Healy <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Horia Geantă <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Iuliana Prodan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/crypto/caam/caamrng.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/caam/caamrng.c b/drivers/crypto/caam/caamrng.c
index cffa6604f726..77d048dfe5d0 100644
--- a/drivers/crypto/caam/caamrng.c
+++ b/drivers/crypto/caam/caamrng.c
@@ -22,9 +22,7 @@
#include "jr.h"
#include "error.h"

-#define CAAM_RNG_MAX_FIFO_STORE_SIZE U16_MAX
-
-#define CAAM_RNG_FIFO_LEN SZ_32K /* Must be a multiple of 2 */
+#define CAAM_RNG_MAX_FIFO_STORE_SIZE 16

/*
* Length of used descriptors, see caam_init_desc()
@@ -65,14 +63,15 @@ static void caam_rng_done(struct device *jrdev, u32 *desc, u32 err,
complete(jctx->done);
}

-static u32 *caam_init_desc(u32 *desc, dma_addr_t dst_dma, int len)
+static u32 *caam_init_desc(u32 *desc, dma_addr_t dst_dma)
{
init_job_desc(desc, 0); /* + 1 cmd_sz */
/* Generate random bytes: + 1 cmd_sz */
append_operation(desc, OP_ALG_ALGSEL_RNG | OP_TYPE_CLASS1_ALG |
OP_ALG_PR_ON);
/* Store bytes: + 1 cmd_sz + caam_ptr_sz */
- append_fifo_store(desc, dst_dma, len, FIFOST_TYPE_RNGSTORE);
+ append_fifo_store(desc, dst_dma,
+ CAAM_RNG_MAX_FIFO_STORE_SIZE, FIFOST_TYPE_RNGSTORE);

print_hex_dump_debug("rng job desc@: ", DUMP_PREFIX_ADDRESS,
16, 4, desc, desc_bytes(desc), 1);
@@ -92,7 +91,7 @@ static int caam_rng_read_one(struct device *jrdev,
.err = &ret,
};

- len = min_t(int, len, CAAM_RNG_MAX_FIFO_STORE_SIZE);
+ len = CAAM_RNG_MAX_FIFO_STORE_SIZE;

dst_dma = dma_map_single(jrdev, dst, len, DMA_FROM_DEVICE);
if (dma_mapping_error(jrdev, dst_dma)) {
@@ -102,7 +101,7 @@ static int caam_rng_read_one(struct device *jrdev,

init_completion(done);
err = caam_jr_enqueue(jrdev,
- caam_init_desc(desc, dst_dma, len),
+ caam_init_desc(desc, dst_dma),
caam_rng_done, &jctx);
if (err == -EINPROGRESS) {
wait_for_completion(done);
@@ -122,7 +121,7 @@ static void caam_rng_fill_async(struct caam_rng_ctx *ctx)

sg_init_table(sg, ARRAY_SIZE(sg));
nents = kfifo_dma_in_prepare(&ctx->fifo, sg, ARRAY_SIZE(sg),
- CAAM_RNG_FIFO_LEN);
+ CAAM_RNG_MAX_FIFO_STORE_SIZE);
if (!nents)
return;

@@ -156,7 +155,7 @@ static int caam_read(struct hwrng *rng, void *dst, size_t max, bool wait)
}

out = kfifo_out(&ctx->fifo, dst, max);
- if (kfifo_len(&ctx->fifo) <= CAAM_RNG_FIFO_LEN / 2)
+ if (kfifo_is_empty(&ctx->fifo))
schedule_work(&ctx->worker);

return out;
@@ -186,7 +185,8 @@ static int caam_init(struct hwrng *rng)
if (!ctx->desc_async)
return -ENOMEM;

- if (kfifo_alloc(&ctx->fifo, CAAM_RNG_FIFO_LEN, GFP_DMA | GFP_KERNEL))
+ if (kfifo_alloc(&ctx->fifo, CAAM_RNG_MAX_FIFO_STORE_SIZE,
+ GFP_DMA | GFP_KERNEL))
return -ENOMEM;

INIT_WORK(&ctx->worker, caam_rng_worker);
--
2.21.0

2020-03-17 17:42:11

by Horia Geanta

Subject: Re: [PATCH v8 7/8] crypto: caam - enable prediction resistance in HWRNG

On 3/16/2020 5:01 PM, Andrey Smirnov wrote:
> Instantiate the CAAM RNG with prediction resistance enabled to improve its
> quality (with PR on, the DRNG is forced to reseed from the TRNG every time
> random data is generated).
>
> Management Complex firmware versions lower than 10.20.0 don't provide
> prediction resistance support. Take this into account and only instantiate
> the RNG from the kernel when the MC f/w version is lower than that.
>
As mentioned previously, this does not compile.

Please include the dependency
"bus: fsl-mc: add api to retrieve mc version"
https://patchwork.kernel.org/patch/11352493/
in the patch set.

> @@ -564,6 +579,26 @@ static void caam_remove_debugfs(void *root)
> }
> #endif
>
> +#ifdef CONFIG_FSL_MC_BUS
> +static bool check_version(struct fsl_mc_version *mc_version, u32 major,
> + u32 minor, u32 revision)
> +{
> + if (mc_version->major > major)
> + return true;
> +
> + if (mc_version->major == major) {
> + if (mc_version->minor > minor)
> + return true;
> +
> + if (mc_version->minor == minor && mc_version->revision > 0)
There's a typo in the code we provided, sorry for this.
Should be:
if (mc_version->minor == minor && mc_version->revision > revision)
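
For reference, the helper with that comparison corrected would look as
follows (based on the code quoted above; only the last comparison changes):

static bool check_version(struct fsl_mc_version *mc_version, u32 major,
			  u32 minor, u32 revision)
{
	if (mc_version->major > major)
		return true;

	if (mc_version->major == major) {
		if (mc_version->minor > minor)
			return true;

		/* compare against the requested revision, not 0 */
		if (mc_version->minor == minor &&
		    mc_version->revision > revision)
			return true;
	}

	return false;
}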

Thanks,
Horia

2020-03-17 18:00:28

by Horia Geanta

Subject: Re: [PATCH v8 7/8] crypto: caam - enable prediction resistance in HWRNG

On 3/16/2020 5:01 PM, Andrey Smirnov wrote:
> @@ -564,6 +579,26 @@ static void caam_remove_debugfs(void *root)
> }
> #endif
>
> +#ifdef CONFIG_FSL_MC_BUS
> +static bool check_version(struct fsl_mc_version *mc_version, u32 major,
> + u32 minor, u32 revision)
> +{
> + if (mc_version->major > major)
> + return true;
> +
> + if (mc_version->major == major) {
> + if (mc_version->minor > minor)
> + return true;
> +
> + if (mc_version->minor == minor && mc_version->revision > 0)
> + return true;
> + }
> +
> + return false;
> +}
> +#endif
> +
> +
Nitpick - checkpatch complains here:
CHECK: Please don't use multiple blank lines

Horia

2020-03-17 21:38:10

by Horia Geanta

Subject: Re: [PATCH v8 4/8] crypto: caam - simplify RNG implementation

On 3/16/2020 5:01 PM, Andrey Smirnov wrote:
> Rework the CAAM RNG implementation as follows:
>
> - Make use of the fact that the HWRNG core supports partial reads and
> handles them gracefully, and remove the recursion in caam_read()
>
> - Convert the blocking caam_read() codepath to do a single blocking job
> read directly into the requested buffer, bypassing any intermediary
> buffers
>
> - Convert the async caam_read() codepath into a simple
> single-reader/single-writer FIFO use case, thus simplifying concurrency
> handling and delegating buffer read/write position management to the
> KFIFO subsystem.
>
> - Leverage the same low-level RNG data extraction code for both the
> async and blocking caam_read() scenarios, get rid of the shared job
> descriptor and make the non-shared one as simple as possible (just
> HEADER + ALGORITHM OPERATION + FIFO STORE)
>
> - Split the private context from DMA-related memory, so that the former
> can be allocated without GFP_DMA.
>
> NOTE: At face value this commit decreases the throughput numbers
> reported by
>
> dd if=/dev/hwrng of=/dev/null bs=1 count=100K [iflag=nonblock]
>
> by about 15%; however, the commits that enable prediction resistance and
> limit the single JD RNG output impact the performance and shift the
> bottleneck so much that this regression becomes irrelevant.
>
> NOTE: On the bright side, this commit reduces the RNG's in-kernel DMA
> buffer memory usage from 2 x RN_BUF_SIZE (~256K) to 32K.
>
> Signed-off-by: Andrey Smirnov <[email protected]>
> Cc: Chris Healy <[email protected]>
> Cc: Lucas Stach <[email protected]>
> Cc: Horia Geantă <[email protected]>
> Cc: Herbert Xu <[email protected]>
> Cc: Iuliana Prodan <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
Reviewed-by: Horia Geantă <[email protected]>

Thanks,
Horia

2020-03-17 21:57:18

by Horia Geanta

Subject: Re: [PATCH v8 8/8] crypto: caam - limit single JD RNG output to maximum of 16 bytes

On 3/16/2020 5:01 PM, Andrey Smirnov wrote:
> In order to follow the recommendation in SP800-90C (section "9.4 The
> Oversampling-NRBG Construction"), limit the output of the "generate" JD
> submitted to CAAM. See
> https://lore.kernel.org/linux-crypto/VI1PR0402MB3485EF10976A4A69F90E5B0F98580@VI1PR0402MB3485.eurprd04.prod.outlook.com/
> for more details.
>
> This change should make CAAM's hwrng driver good enough to have a 1024
> quality rating.
>
> Signed-off-by: Andrey Smirnov <[email protected]>
> Cc: Chris Healy <[email protected]>
> Cc: Lucas Stach <[email protected]>
> Cc: Horia Geantă <[email protected]>
> Cc: Herbert Xu <[email protected]>
> Cc: Iuliana Prodan <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
Reviewed-by: Horia Geantă <[email protected]>

Thanks,
Horia

2020-03-17 21:59:06

by Horia Geanta

Subject: Re: [PATCH v8 4/8] crypto: caam - simplify RNG implementation

On 3/16/2020 5:01 PM, Andrey Smirnov wrote:
> @@ -335,15 +225,18 @@ int caam_rng_init(struct device *ctrldev)
> if (!devres_open_group(ctrldev, caam_rng_init, GFP_KERNEL))
> return -ENOMEM;
>
> - ctx = devm_kzalloc(ctrldev, sizeof(*ctx), GFP_DMA | GFP_KERNEL);
> + ctx = devm_kzalloc(ctrldev, sizeof(*ctx), GFP_KERNEL);
> if (!ctx)
> return -ENOMEM;
>
> + ctx->ctrldev = ctrldev;
> +
> ctx->rng.name = "rng-caam";
> ctx->rng.init = caam_init;
> ctx->rng.cleanup = caam_cleanup;
> ctx->rng.read = caam_read;
> ctx->rng.priv = (unsigned long)ctx;
> + ctx->rng.quality = 1024;
>
Nitpick: setting the quality should be moved to patch
"crypto: caam - limit single JD RNG output to maximum of 16 bytes"

Horia