2022-12-31 16:27:31

by Markus Stockhausen

Subject: [PATCH v3 0/6] crypto/realtek: add new driver

This driver adds support for the Realtek crypto engine. It provides hardware
accelerated AES, SHA1 & MD5 algorithms. It is included in SoCs of the RTL838x
series, such as RTL8380, RTL8381, RTL8382, as well as SoCs from the RTL930x
series, such as RTL9301, RTL9302 and RTL9303. Some little-endian and ARM-based
Realtek SoCs seem to have this engine too. Nevertheless, this patch was only
developed and verified on MIPS big-endian devices.

Changes since v2:
- Use dma_map_single() & dma_unmap_single() calls
- Add missing dma_unmap_sg() calls
- Use dma_sync_single_for_device() only for mapped regions

Changes since v1:
- use macros to allow unaligned access during hash state import/export

Module has been successfully tested with
- lots of module loads/unloads with crypto manager extra tests enabled.
- openssl devcrypto benchmarking
- tcrypt.ko benchmarking

Benchmarks from tcrypt.ko mode=600, 402, 403 sec=5 on an 800 MHz RTL9301 SoC can
be summed up as follows:
- with the smallest block sizes the engine is 8-10 times slower than software
- the break-even point (hardware speed = software speed) is at about 256 byte blocks
- with large blocks the engine is roughly 2 times faster than software
- md5 performance is always worse than software

op/s with default software algorithms:
                            16 B    64 B   256 B  1024 B  1472 B  8192 B
ecb(aes) 128 bit encrypt  513593  165651   44233   11264    7846    1411
ecb(aes) 128 bit decrypt  514819  165792   44259   11268    7851    1411
ecb(aes) 192 bit encrypt  455136  142680   37761    9579    6673    1198
ecb(aes) 192 bit decrypt  456524  142836   37790    9584    6675    1200
ecb(aes) 256 bit encrypt  412102  125771   33038    8361    5825    1048
ecb(aes) 256 bit decrypt  412321  125800   33056    8368    5827    1048

                            16 B    64 B   256 B  1024 B  1472 B  8192 B
cbc(aes) 128 bit encrypt  476081  154228   41307   10520    7331    1318
cbc(aes) 128 bit decrypt  462068  152934   41228   10516    7326    1315
cbc(aes) 192 bit encrypt  426126  133894   35598    9041    6297    1132
cbc(aes) 192 bit decrypt  416446  133116   35542    9040    6296    1131
cbc(aes) 256 bit encrypt  386841  118950   31382    7953    5539     996
cbc(aes) 256 bit decrypt  379032  118209   31324    7952    5537     995

                            16 B    64 B   256 B  1024 B  1472 B  8192 B
ctr(aes) 128 bit encrypt  475435  152852   40825   10372    7225    1299
ctr(aes) 128 bit decrypt  475804  152852   40862   10374    7227    1299
ctr(aes) 192 bit encrypt  426900  133025   35230    8940    6228    1120
ctr(aes) 192 bit decrypt  427377  133030   35235    8942    6228    1120
ctr(aes) 256 bit encrypt  388872  118259   31086    7875    5484     985
ctr(aes) 256 bit decrypt  388862  118260   31100    7875    5483     985

                            16 B    64 B   256 B  1024 B  2048 B  4096 B  8192 B
md5                       600185  365210  166293   52399   27389   14011    7068
sha1                      230154  124734   52979   16055    8322    4237    2137

op/s with module and hardware offloading enabled:
                            16 B    64 B   256 B  1024 B  1472 B  8192 B
ecb(aes) 128 bit encrypt   65062   58964   41380   19433   14884    2712
ecb(aes) 128 bit decrypt   65288   58507   40417   18854   14400    2627
ecb(aes) 192 bit encrypt   65233   57798   39236   17849   13534    2468
ecb(aes) 192 bit decrypt   65377   57100   38444   17336   13147    2406
ecb(aes) 256 bit encrypt   65064   56928   37400   16496   12432    2270
ecb(aes) 256 bit decrypt   64932   56115   36833   16064   12097    2219

                            16 B    64 B   256 B  1024 B  1472 B  8192 B
cbc(aes) 128 bit encrypt   64246   58073   40720   19361   14878    2718
cbc(aes) 128 bit decrypt   60969   55128   38904   18630   14184    2614
cbc(aes) 192 bit encrypt   64211   56854   38787   17793   13571    2468
cbc(aes) 192 bit decrypt   60948   53947   37209   17097   12955    2390
cbc(aes) 256 bit encrypt   63920   55889   37128   16502   12430    2267
cbc(aes) 256 bit decrypt   60680   53174   35787   15819   11961    2200

                            16 B    64 B   256 B  1024 B  1472 B  8192 B
ctr(aes) 128 bit encrypt   64452   58387   40897   19401   14921    2710
ctr(aes) 128 bit decrypt   64425   58244   41016   19433   14747    2710
ctr(aes) 192 bit encrypt   64513   57115   38884   17860   13547    2468
ctr(aes) 192 bit decrypt   64531   57116   39088   17785   13510    2468
ctr(aes) 256 bit encrypt   64284   56094   37254   16524   12411    2267
ctr(aes) 256 bit decrypt   64272   56321   37296   16436   12411    2265

                            16 B    64 B   256 B  1024 B  2048 B  4096 B  8192 B
md5                        47224   44513   39175   25264   17199   10548    5874
sha1                       46389   43578   36878   22501   14890    8796    4835
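
The benchmark summary can be checked against the figures with a few lines of
C. The sketch below is only an illustration and not part of the series: the
helper name ratio_permille() and the table names are made up, and the numbers
are the "ecb(aes) 128 bit encrypt" rows copied from the two tables above.

```c
#include <assert.h>
#include <stdio.h>

/* Hardware throughput relative to software in permille (1000 = equal). */
unsigned int ratio_permille(unsigned int hw_ops, unsigned int sw_ops)
{
	assert(sw_ops != 0);
	return (unsigned int)(((unsigned long long)hw_ops * 1000ULL) / sw_ops);
}

/* "ecb(aes) 128 bit encrypt" rows from the software and hardware tables */
const unsigned int blk_size[] = { 16, 64, 256, 1024, 1472, 8192 };
const unsigned int sw_tbl[] = { 513593, 165651, 44233, 11264, 7846, 1411 };
const unsigned int hw_tbl[] = { 65062, 58964, 41380, 19433, 14884, 2712 };

/* Print one hw/sw ratio per block size with three decimals. */
void print_ratios(void)
{
	for (size_t i = 0; i < sizeof(blk_size) / sizeof(blk_size[0]); i++) {
		unsigned int r = ratio_permille(hw_tbl[i], sw_tbl[i]);

		printf("%5u B: hw/sw = %u.%03u\n", blk_size[i], r / 1000, r % 1000);
	}
}
```

For this row the ratio is 0.126 at 16 bytes (about 8 times slower), 0.935 at
256 bytes (close to break-even) and 1.922 at 8192 bytes (roughly 2 times
faster), which matches the summary.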

Markus Stockhausen (6)
crypto/realtek: header definitions
crypto/realtek: core functions
crypto/realtek: hash algorithms
crypto/realtek: skcipher algorithms
crypto/realtek: enable module
crypto/realtek: add devicetree documentation

/devicetree/bindings/crypto/realtek,realtek-crypto.yaml| 51 +
drivers/crypto/Kconfig | 13
drivers/crypto/Makefile | 1
drivers/crypto/realtek/Makefile | 5
drivers/crypto/realtek/realtek_crypto.c | 475 ++++++++++
drivers/crypto/realtek/realtek_crypto.h | 328 ++++++
drivers/crypto/realtek/realtek_crypto_ahash.c | 412 ++++++++
drivers/crypto/realtek/realtek_crypto_skcipher.c | 376 +++++++
8 files changed, 1661 insertions(+)




2022-12-31 16:32:29

by Markus Stockhausen

Subject: [PATCH v3 1/6] crypto/realtek: header definitions

Add header definitions for new Realtek crypto device.

Signed-off-by: Markus Stockhausen <[email protected]>
---
drivers/crypto/realtek/realtek_crypto.h | 328 ++++++++++++++++++++++++
1 file changed, 328 insertions(+)
create mode 100644 drivers/crypto/realtek/realtek_crypto.h

diff --git a/drivers/crypto/realtek/realtek_crypto.h b/drivers/crypto/realtek/realtek_crypto.h
new file mode 100644
index 000000000000..66b977c082fd
--- /dev/null
+++ b/drivers/crypto/realtek/realtek_crypto.h
@@ -0,0 +1,328 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Crypto acceleration support for Realtek crypto engine. Based on ideas from
+ * Rockchip & SafeXcel driver plus Realtek OpenWrt RTK.
+ *
+ * Copyright (c) 2022, Markus Stockhausen <[email protected]>
+ */
+
+#ifndef __REALTEK_CRYPTO_H__
+#define __REALTEK_CRYPTO_H__
+
+#include <linux/interrupt.h>
+#include <crypto/aes.h>
+#include <crypto/md5.h>
+#include <crypto/hash.h>
+#include <crypto/sha1.h>
+#include <crypto/skcipher.h>
+
+/*
+ * The four engine registers for instrumentation of the hardware.
+ */
+#define RTCR_REG_SRC 0x0 /* Source descriptor starting address */
+#define RTCR_REG_DST 0x4 /* Destination Descriptor starting address */
+#define RTCR_REG_CMD 0x8 /* Command/Status Register */
+#define RTCR_REG_CTR 0xC /* Control Register */
+/*
+ * Engine Command/Status Register.
+ */
+#define RTCR_CMD_SDUEIP BIT(15) /* Src desc unavail error interrupt pending */
+#define RTCR_CMD_SDLEIP BIT(14) /* Src desc length error interrupt pending */
+#define RTCR_CMD_DDUEIP BIT(13) /* Dst desc unavail error interrupt pending */
+#define RTCR_CMD_DDOKIP BIT(12) /* Dst desc ok interrupt pending */
+#define RTCR_CMD_DABFIP BIT(11) /* Data address buffer interrupt pending */
+#define RTCR_CMD_POLL BIT(1) /* Descriptor polling. Set to kick engine */
+#define RTCR_CMD_SRST BIT(0) /* Software reset, write 1 to reset */
+/*
+ * Engine Control Register
+ */
+#define RTCR_CTR_SDUEIE BIT(15) /* Src desc unavail error interrupt enable */
+#define RTCR_CTR_SDLEIE BIT(14) /* Src desc length error interrupt enable */
+#define RTCR_CTR_DDUEIE BIT(13) /* Dst desc unavail error interrupt enable */
+#define RTCR_CTR_DDOKIE BIT(12) /* Dst desc ok interrupt enable */
+#define RTCR_CTR_DABFIE BIT(11) /* Data address buffer interrupt enable */
+#define RTCR_CTR_LBKM BIT(8) /* Loopback mode enable */
+#define RTCR_CTR_SAWB BIT(7) /* Source address write back = work inplace */
+#define RTCR_CTR_CKE BIT(6) /* Clock enable */
+#define RTCR_CTR_DDMMSK 0x38 /* Destination DMA max burst size mask */
+#define RTCR_CTR_DDM16 0x00 /* Destination DMA max burst size 16 bytes */
+#define RTCR_CTR_DDM32 0x08 /* Destination DMA max burst size 32 bytes */
+#define RTCR_CTR_DDM64 0x10 /* Destination DMA max burst size 64 bytes */
+#define RTCR_CTR_DDM128 0x18 /* Destination DMA max burst size 128 bytes */
+#define RTCR_CTR_SDMMSK 0x07 /* Source DMA max burst size mask */
+#define RTCR_CTR_SDM16 0x00 /* Source DMA max burst size 16 bytes */
+#define RTCR_CTR_SDM32 0x01 /* Source DMA max burst size 32 bytes */
+#define RTCR_CTR_SDM64 0x02 /* Source DMA max burst size 64 bytes */
+#define RTCR_CTR_SDM128 0x03 /* Source DMA max burst size 128 bytes */
+
+/*
+ * Module settings and constants. Some of the limiter values have been chosen
+ * based on testing (e.g. ring sizes). Others are based on real hardware
+ * limits (e.g. scatter, request size, hash size).
+ */
+#define RTCR_SRC_RING_SIZE 64
+#define RTCR_DST_RING_SIZE 16
+#define RTCR_BUF_RING_SIZE 32768
+#define RTCR_MAX_REQ_SIZE 8192
+#define RTCR_MAX_SG 8
+#define RTCR_MAX_SG_AHASH (RTCR_MAX_SG - 1)
+#define RTCR_MAX_SG_SKCIPHER (RTCR_MAX_SG - 3)
+#define RTCR_HASH_VECTOR_SIZE SHA1_DIGEST_SIZE
+
+#define RTCR_ALG_AHASH 0
+#define RTCR_ALG_SKCIPHER 1
+
+#define RTCR_HASH_UPDATE BIT(0)
+#define RTCR_HASH_FINAL BIT(1)
+#define RTCR_HASH_BUF_SIZE SHA1_BLOCK_SIZE
+#define RTCR_HASH_PAD_SIZE ((SHA1_BLOCK_SIZE + 8) / sizeof(u64))
+
+#define RTCR_REQ_SG_MASK 0xff
+#define RTCR_REQ_MD5 BIT(8)
+#define RTCR_REQ_SHA1 BIT(9)
+#define RTCR_REQ_FB_ACT BIT(10)
+#define RTCR_REQ_FB_RDY BIT(11)
+
+/*
+ * Crypto ring source data descriptor. This data is fed into the engine. It
+ * takes all information about the input data and the type of cipher/hash
+ * algorithm that we want to apply. Each request consists of several source
+ * descriptors.
+ */
+struct rtcr_src_desc {
+ u32 opmode;
+ u32 len;
+ u32 dummy;
+ phys_addr_t paddr;
+};
+
+#define RTCR_SRC_DESC_SIZE (sizeof(struct rtcr_src_desc))
+/*
+ * Owner: This flag identifies the owner of the block. When we send the
+ * descriptor to the ring set this flag to 1. Once the crypto engine has
+ * finished processing this will be reset to 0.
+ */
+#define RTCR_SRC_OP_OWN_ASIC BIT(31)
+#define RTCR_SRC_OP_OWN_CPU 0
+/*
+ * End of ring: Setting this flag to 1 tells the crypto engine that this is
+ * the last descriptor of the whole ring (not the request). If set the engine
+ * will not increase the processing pointer afterwards but will jump back to
+ * the first descriptor address it was initialized with.
+ */
+#define RTCR_SRC_OP_EOR BIT(30)
+#define RTCR_SRC_OP_CALC_EOR(idx) ((idx == RTCR_SRC_RING_SIZE - 1) ? \
+ RTCR_SRC_OP_EOR : 0)
+/*
+ * First segment: If set to 1 this is the first descriptor of a request. All
+ * descriptors that follow with this flag set to 0 belong to the same
+ * request.
+ */
+#define RTCR_SRC_OP_FS BIT(29)
+/*
+ * Mode select: Set to 00b for crypto only, set to 01b for hash only, 10b for
+ * hash then crypto or 11b for crypto then hash.
+ */
+#define RTCR_SRC_OP_MS_CRYPTO 0
+#define RTCR_SRC_OP_MS_HASH BIT(26)
+#define RTCR_SRC_OP_MS_HASH_CRYPTO BIT(27)
+#define RTCR_SRC_OP_MS_CRYPTO_HASH GENMASK(27, 26)
+/*
+ * Key application management: Only relevant for cipher (AES/3DES/DES) mode. If
+ * using AES or DES it has to be set to 0 (000b) for decryption and 7 (111b) for
+ * encryption. For 3DES it has to be set to 2 (010b = decrypt, encrypt, decrypt)
+ * for decryption and 5 (101b = encrypt, decrypt, encrypt) for encryption.
+ */
+#define RTCR_SRC_OP_KAM_DEC 0
+#define RTCR_SRC_OP_KAM_ENC GENMASK(25, 23)
+#define RTCR_SRC_OP_KAM_3DES_DEC BIT(24)
+#define RTCR_SRC_OP_KAM_3DES_ENC (BIT(23) | BIT(25))
+/*
+ * AES/3DES/DES mode & key length: Upper two bits for AES mode. If set to values
+ * other than 0 we want to encrypt/decrypt with AES. The values are 01b for 128
+ * bit key length, 10b for 192 bit key length and 11b for 256 bit key length.
+ * If AES is disabled (upper two bits 00b) then the lowest bit determines if we
+ * want to use 3DES (1) or DES (0).
+ */
+#define RTCR_SRC_OP_CIPHER_FROM_KEY(k) ((k - 8) << 18)
+#define RTCR_SRC_OP_CIPHER_AES_128 BIT(21)
+#define RTCR_SRC_OP_CIPHER_AES_192 BIT(22)
+#define RTCR_SRC_OP_CIPHER_AES_256 GENMASK(22, 21)
+#define RTCR_SRC_OP_CIPHER_3DES BIT(20)
+#define RTCR_SRC_OP_CIPHER_DES 0
+#define RTCR_SRC_OP_CIPHER_MASK GENMASK(22, 20)
+/*
+ * Cipher block mode: Determines the block mode of a cipher request. Set to 00b
+ * for ECB, 01b for CTR and 10b for CBC.
+ */
+#define RTCR_SRC_OP_CRYPT_ECB 0
+#define RTCR_SRC_OP_CRYPT_CTR BIT(18)
+#define RTCR_SRC_OP_CRYPT_CBC BIT(19)
+/*
+ * Hash mode: Set to 1 for MD5 or 0 for SHA1 calculation.
+ */
+#define RTCR_SRC_OP_HASH_MD5 BIT(16)
+#define RTCR_SRC_OP_HASH_SHA1 0
+
+#define RTCR_SRC_OP_DUMMY_LEN 128
+
+/*
+ * Crypto ring destination data descriptor. Data inside will be fed to the
+ * engine and if we process a hash request we get the resulting hash from here.
+ * Each request consists of exactly one destination descriptor.
+ */
+struct rtcr_dst_desc {
+ u32 opmode;
+ phys_addr_t paddr;
+ u32 dummy;
+ u32 vector[RTCR_HASH_VECTOR_SIZE / sizeof(u32)];
+};
+
+#define RTCR_DST_DESC_SIZE (sizeof(struct rtcr_dst_desc))
+/*
+ * Owner: This flag identifies the owner of the block. When we send the
+ * descriptor to the ring set this flag to 1. Once the crypto engine has
+ * finished processing this will be reset to 0.
+ */
+#define RTCR_DST_OP_OWN_ASIC BIT(31)
+#define RTCR_DST_OP_OWN_CPU 0
+/*
+ * End of ring: Setting this flag to 1 tells the crypto engine that this is
+ * the last descriptor of the whole ring (not the request). If set the engine
+ * will not increase the processing pointer afterwards but will jump back to
+ * the first descriptor address it was initialized with.
+ */
+#define RTCR_DST_OP_EOR BIT(30)
+#define RTCR_DST_OP_CALC_EOR(idx) ((idx == RTCR_DST_RING_SIZE - 1) ? \
+ RTCR_DST_OP_EOR : 0)
+
+/*
+ * Writeback descriptor. This descriptor maintains additional data per request
+ * about post processing. E.g. the hash result or a cipher that was written to
+ * the internal buffer only.
+ */
+struct rtcr_wbk_desc {
+ void *dst;
+ void *src;
+ int off;
+ int len;
+};
+/*
+ * To keep the size of the descriptor a power of 2 (cache line aligned) the
+ * length field can denote special writeback requests that need another type of
+ * postprocessing.
+ */
+#define RTCR_WB_LEN_DONE (0)
+#define RTCR_WB_LEN_HASH (-1)
+#define RTCR_WB_LEN_SG_DIRECT (-2)
+
+struct rtcr_crypto_dev {
+ char buf_ring[RTCR_BUF_RING_SIZE];
+ struct rtcr_src_desc src_ring[RTCR_SRC_RING_SIZE];
+ struct rtcr_dst_desc dst_ring[RTCR_DST_RING_SIZE];
+ struct rtcr_wbk_desc wbk_ring[RTCR_DST_RING_SIZE];
+
+ /* modified under ring lock */
+ int cpu_src_idx;
+ int cpu_dst_idx;
+ int cpu_buf_idx;
+
+ /* modified in (serialized) tasklet */
+ int pp_src_idx;
+ int pp_dst_idx;
+ int pp_buf_idx;
+
+ /* modified under asic lock */
+ int asic_dst_idx;
+ int asic_src_idx;
+ bool busy;
+
+ int irq;
+ spinlock_t asiclock;
+ spinlock_t ringlock;
+ struct tasklet_struct done_task;
+ wait_queue_head_t done_queue;
+
+ void __iomem *base;
+ dma_addr_t src_dma;
+ dma_addr_t dst_dma;
+
+ struct platform_device *pdev;
+ struct device *dev;
+};
+
+struct rtcr_alg_template {
+ struct rtcr_crypto_dev *cdev;
+ int type;
+ int opmode;
+ union {
+ struct skcipher_alg skcipher;
+ struct ahash_alg ahash;
+ } alg;
+};
+
+struct rtcr_ahash_ctx {
+ struct rtcr_crypto_dev *cdev;
+ struct crypto_ahash *fback;
+ int opmode;
+};
+
+struct rtcr_ahash_req {
+ int state;
+ /* Data from here is lost if fallback switch happens */
+ u32 vector[RTCR_HASH_VECTOR_SIZE];
+ u64 totallen;
+ char buf[RTCR_HASH_BUF_SIZE];
+ int buflen;
+};
+
+union rtcr_fallback_state {
+ struct md5_state md5;
+ struct sha1_state sha1;
+};
+
+struct rtcr_skcipher_ctx {
+ struct rtcr_crypto_dev *cdev;
+ int opmode;
+ int keylen;
+ dma_addr_t keydma;
+ u32 keyenc[AES_KEYSIZE_256 / sizeof(u32)];
+ u32 keydec[AES_KEYSIZE_256 / sizeof(u32)];
+};
+
+extern struct rtcr_alg_template rtcr_ahash_md5;
+extern struct rtcr_alg_template rtcr_ahash_sha1;
+extern struct rtcr_alg_template rtcr_skcipher_ecb_aes;
+extern struct rtcr_alg_template rtcr_skcipher_cbc_aes;
+extern struct rtcr_alg_template rtcr_skcipher_ctr_aes;
+
+extern void rtcr_lock_ring(struct rtcr_crypto_dev *cdev);
+extern void rtcr_unlock_ring(struct rtcr_crypto_dev *cdev);
+
+extern int rtcr_alloc_ring(struct rtcr_crypto_dev *cdev, int srclen,
+ int *srcidx, int *dstidx, int buflen, char **buf);
+extern void rtcr_add_src_ahash_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ int opmode, int totallen);
+extern void rtcr_add_src_pad_to_ring(struct rtcr_crypto_dev *cdev,
+ int idx, int len);
+extern void rtcr_add_src_skcipher_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ int opmode, int totallen,
+ struct rtcr_skcipher_ctx *sctx);
+extern void rtcr_add_src_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ void *vaddr, int blocklen, int totallen);
+extern void rtcr_add_wbk_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ void *dst, int off);
+extern void rtcr_add_dst_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ void *reqdst, int reqlen, void *wbkdst,
+ int wbkoff);
+
+extern void rtcr_kick_engine(struct rtcr_crypto_dev *cdev);
+
+extern void rtcr_prepare_request(struct rtcr_crypto_dev *cdev);
+extern void rtcr_finish_request(struct rtcr_crypto_dev *cdev, int opmode,
+ int totallen);
+extern int rtcr_wait_for_request(struct rtcr_crypto_dev *cdev, int idx);
+
+extern inline int rtcr_inc_src_idx(int idx, int cnt);
+extern inline int rtcr_inc_dst_idx(int idx, int cnt);
+#endif
--
2.38.1
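
To illustrate how the opmode bit fields from the header above combine, here is
a small userspace sketch. It is not taken from the driver: BIT() and GENMASK()
are simple stand-ins for the kernel helpers, example_cbc_enc_opmode() is a
hypothetical function, and only the RTCR_* values are copied from
realtek_crypto.h. It also shows that RTCR_SRC_OP_CIPHER_FROM_KEY() maps the AES
key length in bytes onto the RTCR_SRC_OP_CIPHER_AES_* values.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK() helpers */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

/* Values copied from realtek_crypto.h */
#define RTCR_SRC_RING_SIZE		64
#define RTCR_SRC_OP_OWN_ASIC		BIT(31)
#define RTCR_SRC_OP_EOR			BIT(30)
#define RTCR_SRC_OP_FS			BIT(29)
#define RTCR_SRC_OP_MS_CRYPTO		0
#define RTCR_SRC_OP_KAM_ENC		GENMASK(25, 23)
#define RTCR_SRC_OP_CIPHER_FROM_KEY(k)	(((k) - 8) << 18)
#define RTCR_SRC_OP_CIPHER_AES_128	BIT(21)
#define RTCR_SRC_OP_CIPHER_AES_192	BIT(22)
#define RTCR_SRC_OP_CIPHER_AES_256	GENMASK(22, 21)
#define RTCR_SRC_OP_CRYPT_CBC		BIT(19)

/*
 * Hypothetical helper: opmode word for the first source descriptor of an
 * AES-CBC encryption request placed at ring position idx.
 */
uint32_t example_cbc_enc_opmode(uint32_t keylen, unsigned int idx)
{
	uint32_t eor = (idx == RTCR_SRC_RING_SIZE - 1) ? RTCR_SRC_OP_EOR : 0;

	/* AES key length in bytes */
	assert(keylen == 16 || keylen == 24 || keylen == 32);

	return RTCR_SRC_OP_OWN_ASIC | eor | RTCR_SRC_OP_FS |
	       RTCR_SRC_OP_MS_CRYPTO | RTCR_SRC_OP_KAM_ENC |
	       RTCR_SRC_OP_CIPHER_FROM_KEY(keylen) | RTCR_SRC_OP_CRYPT_CBC;
}
```

With a 32 byte key at ring index 0 this gives 0xA3E80000. With a 16 byte key
at the last ring index (63) the end-of-ring bit is added as well, giving
0xE3A80000.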

2022-12-31 16:32:32

by Markus Stockhausen

Subject: [PATCH v3 2/6] crypto/realtek: core functions

Add core functions for new Realtek crypto device.

Signed-off-by: Markus Stockhausen <[email protected]>
---
drivers/crypto/realtek/realtek_crypto.c | 475 ++++++++++++++++++++++++
1 file changed, 475 insertions(+)
create mode 100644 drivers/crypto/realtek/realtek_crypto.c

diff --git a/drivers/crypto/realtek/realtek_crypto.c b/drivers/crypto/realtek/realtek_crypto.c
new file mode 100644
index 000000000000..c325625d8e11
--- /dev/null
+++ b/drivers/crypto/realtek/realtek_crypto.c
@@ -0,0 +1,475 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Crypto acceleration support for Realtek crypto engine. Based on ideas from
+ * Rockchip & SafeXcel driver plus Realtek OpenWrt RTK.
+ *
+ * Copyright (c) 2022, Markus Stockhausen <[email protected]>
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/of_irq.h>
+#include <linux/platform_device.h>
+
+#include "realtek_crypto.h"
+
+inline int rtcr_inc_src_idx(int idx, int cnt)
+{
+ return (idx + cnt) & (RTCR_SRC_RING_SIZE - 1);
+}
+
+inline int rtcr_inc_dst_idx(int idx, int cnt)
+{
+ return (idx + cnt) & (RTCR_DST_RING_SIZE - 1);
+}
+
+inline int rtcr_inc_buf_idx(int idx, int cnt)
+{
+ return (idx + cnt) & (RTCR_BUF_RING_SIZE - 1);
+}
+
+inline int rtcr_space_plus_pad(int len)
+{
+ return (len + 31) & ~31;
+}
+
+int rtcr_alloc_ring(struct rtcr_crypto_dev *cdev, int srclen, int *srcidx,
+ int *dstidx, int buflen, char **buf)
+{
+ int srcfree, dstfree, buffree, bufidx;
+ int srcalloc = (srclen + 1) & ~1, bufalloc = 0;
+ int ret = -ENOSPC;
+
+ spin_lock(&cdev->ringlock);
+
+ bufidx = cdev->cpu_buf_idx;
+ if (buflen > 0) {
+ bufalloc = rtcr_space_plus_pad(buflen);
+ if (bufidx + bufalloc > RTCR_BUF_RING_SIZE) {
+ if (unlikely(cdev->cpu_buf_idx > bufidx)) {
+ dev_err(cdev->dev, "buffer ring full\n");
+ goto err_nospace;
+ }
+ /* end of buffer is free but too small, skip it */
+ bufidx = 0;
+ }
+ }
+
+ srcfree = rtcr_inc_src_idx(cdev->pp_src_idx - cdev->cpu_src_idx, -1);
+ dstfree = rtcr_inc_dst_idx(cdev->pp_dst_idx - cdev->cpu_dst_idx, -1);
+ buffree = rtcr_inc_buf_idx(cdev->pp_buf_idx - bufidx, -1);
+
+ if (unlikely(srcfree < srcalloc)) {
+ dev_err(cdev->dev, "source ring full\n");
+ goto err_nospace;
+ }
+ if (unlikely(dstfree < 1)) {
+ dev_err(cdev->dev, "destination ring full\n");
+ goto err_nospace;
+ }
+ if (unlikely(buffree < bufalloc)) {
+ dev_err(cdev->dev, "buffer ring full\n");
+ goto err_nospace;
+ }
+
+ *srcidx = cdev->cpu_src_idx;
+ cdev->cpu_src_idx = rtcr_inc_src_idx(cdev->cpu_src_idx, srcalloc);
+
+ *dstidx = cdev->cpu_dst_idx;
+ cdev->cpu_dst_idx = rtcr_inc_dst_idx(cdev->cpu_dst_idx, 1);
+
+ ret = 0;
+ cdev->wbk_ring[*dstidx].len = buflen;
+ if (buflen > 0) {
+ *buf = &cdev->buf_ring[bufidx];
+ cdev->wbk_ring[*dstidx].src = *buf;
+ cdev->cpu_buf_idx = rtcr_inc_buf_idx(bufidx, bufalloc);
+ }
+
+err_nospace:
+ spin_unlock(&cdev->ringlock);
+
+ return ret;
+}
+
+static inline void rtcr_ack_irq(struct rtcr_crypto_dev *cdev)
+{
+ int v = ioread32(cdev->base + RTCR_REG_CMD);
+
+ if (unlikely((v != RTCR_CMD_DDOKIP) && v))
+ dev_err(cdev->dev, "unexpected IRQ result 0x%08x\n", v);
+ v = RTCR_CMD_SDUEIP | RTCR_CMD_SDLEIP | RTCR_CMD_DDUEIP |
+ RTCR_CMD_DDOKIP | RTCR_CMD_DABFIP;
+
+ iowrite32(v, cdev->base + RTCR_REG_CMD);
+}
+
+static void rtcr_done_task(unsigned long data)
+{
+ struct rtcr_crypto_dev *cdev = (struct rtcr_crypto_dev *)data;
+ int stop_src_idx, stop_dst_idx, idx, len;
+ struct scatterlist *sg;
+ unsigned long flags;
+
+ spin_lock_irqsave(&cdev->asiclock, flags);
+ stop_src_idx = cdev->asic_src_idx;
+ stop_dst_idx = cdev->asic_dst_idx;
+ spin_unlock_irqrestore(&cdev->asiclock, flags);
+
+ idx = cdev->pp_dst_idx;
+
+ while (idx != stop_dst_idx) {
+ len = cdev->wbk_ring[idx].len;
+ switch (len) {
+ case RTCR_WB_LEN_SG_DIRECT:
+ /* already written to the destination by the engine */
+ break;
+ case RTCR_WB_LEN_HASH:
+ /* write back hash from destination ring */
+ memcpy(cdev->wbk_ring[idx].dst,
+ cdev->dst_ring[idx].vector,
+ RTCR_HASH_VECTOR_SIZE);
+ break;
+ default:
+ /* write back data from buffer */
+ sg = (struct scatterlist *)cdev->wbk_ring[idx].dst;
+ sg_pcopy_from_buffer(sg, sg_nents(sg),
+ cdev->wbk_ring[idx].src,
+ len, cdev->wbk_ring[idx].off);
+ len = rtcr_space_plus_pad(len);
+ cdev->pp_buf_idx = ((char *)cdev->wbk_ring[idx].src - cdev->buf_ring) + len;
+ }
+
+ cdev->wbk_ring[idx].len = RTCR_WB_LEN_DONE;
+ idx = rtcr_inc_dst_idx(idx, 1);
+ }
+
+ wake_up_all(&cdev->done_queue);
+ cdev->pp_src_idx = stop_src_idx;
+ cdev->pp_dst_idx = stop_dst_idx;
+}
+
+static irqreturn_t rtcr_handle_irq(int irq, void *dev_id)
+{
+ struct rtcr_crypto_dev *cdev = dev_id;
+ u32 p;
+
+ spin_lock(&cdev->asiclock);
+
+ rtcr_ack_irq(cdev);
+ cdev->busy = false;
+
+ p = (u32)phys_to_virt((u32)ioread32(cdev->base + RTCR_REG_SRC));
+ cdev->asic_src_idx = (p - (u32)cdev->src_ring) / RTCR_SRC_DESC_SIZE;
+
+ p = (u32)phys_to_virt((u32)ioread32(cdev->base + RTCR_REG_DST));
+ cdev->asic_dst_idx = (p - (u32)cdev->dst_ring) / RTCR_DST_DESC_SIZE;
+
+ tasklet_schedule(&cdev->done_task);
+ spin_unlock(&cdev->asiclock);
+
+ return IRQ_HANDLED;
+}
+
+void rtcr_add_src_ahash_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ int opmode, int totallen)
+{
+ dma_addr_t dma = cdev->src_dma + idx * RTCR_SRC_DESC_SIZE;
+ struct rtcr_src_desc *src = &cdev->src_ring[idx];
+
+ src->len = totallen;
+ src->opmode = opmode | RTCR_SRC_OP_FS |
+ RTCR_SRC_OP_DUMMY_LEN | RTCR_SRC_OP_OWN_ASIC |
+ RTCR_SRC_OP_CALC_EOR(idx);
+
+ dma_sync_single_for_device(cdev->dev, dma, RTCR_SRC_DESC_SIZE, DMA_TO_DEVICE);
+}
+
+void rtcr_add_src_skcipher_to_ring(struct rtcr_crypto_dev *cdev, int idx,
+ int opmode, int totallen,
+ struct rtcr_skcipher_ctx *sctx)
+{
+ dma_addr_t dma = cdev->src_dma + idx * RTCR_SRC_DESC_SIZE;
+ struct rtcr_src_desc *src = &cdev->src_ring[idx];
+
+ src->len = totallen;
+ if (opmode & RTCR_SRC_OP_KAM_ENC)
+ src->paddr = sctx->keydma;
+ else
+ src->paddr = sctx->keydma + AES_KEYSIZE_256;
+
+ src->opmode = RTCR_SRC_OP_FS | RTCR_SRC_OP_OWN_ASIC |
+ RTCR_SRC_OP_MS_CRYPTO | RTCR_SRC_OP_CRYPT_ECB |
+ RTCR_SRC_OP_CALC_EOR(idx) | opmode | sctx->keylen;
+
+ dma_sync_single_for_device(cdev->dev, dma, RTCR_SRC_DESC_SIZE, DMA_TO_DEVICE);
+}
+
+void rtcr_add_src_to_ring(struct rtcr_crypto_dev *cdev, int idx, void *vaddr,
+ int blocklen, int totallen)
+{
+ dma_addr_t dma = cdev->src_dma + idx * RTCR_SRC_DESC_SIZE;
+ struct rtcr_src_desc *src = &cdev->src_ring[idx];
+
+ src->len = totallen;
+ src->paddr = virt_to_phys(vaddr);
+ src->opmode = RTCR_SRC_OP_OWN_ASIC | RTCR_SRC_OP_CALC_EOR(idx) | blocklen;
+
+ dma_sync_single_for_device(cdev->dev, dma, RTCR_SRC_DESC_SIZE, DMA_BIDIRECTIONAL);
+}
+
+inline void rtcr_add_src_pad_to_ring(struct rtcr_crypto_dev *cdev, int idx, int len)
+{
+ /* align 16 byte source descriptors with 32 byte cache lines */
+ if (!(idx & 1))
+ rtcr_add_src_to_ring(cdev, idx + 1, NULL, 0, len);
+}
+
+void rtcr_add_dst_to_ring(struct rtcr_crypto_dev *cdev, int idx, void *reqdst,
+ int reqlen, void *wbkdst, int wbkoff)
+{
+ dma_addr_t dma = cdev->dst_dma + idx * RTCR_DST_DESC_SIZE;
+ struct rtcr_dst_desc *dst = &cdev->dst_ring[idx];
+ struct rtcr_wbk_desc *wbk = &cdev->wbk_ring[idx];
+
+ dst->paddr = virt_to_phys(reqdst);
+ dst->opmode = RTCR_DST_OP_OWN_ASIC | RTCR_DST_OP_CALC_EOR(idx) | reqlen;
+
+ wbk->dst = wbkdst;
+ wbk->off = wbkoff;
+
+ dma_sync_single_for_device(cdev->dev, dma, RTCR_DST_DESC_SIZE, DMA_BIDIRECTIONAL);
+}
+
+inline int rtcr_wait_for_request(struct rtcr_crypto_dev *cdev, int idx)
+{
+ int *len = &cdev->wbk_ring[idx].len;
+
+ wait_event(cdev->done_queue, *len == RTCR_WB_LEN_DONE);
+ return 0;
+}
+
+void rtcr_kick_engine(struct rtcr_crypto_dev *cdev)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&cdev->asiclock, flags);
+
+ if (!cdev->busy) {
+ cdev->busy = true;
+ /* engine needs up to 5us to reset poll bit */
+ iowrite32(RTCR_CMD_POLL, cdev->base + RTCR_REG_CMD);
+ }
+
+ spin_unlock_irqrestore(&cdev->asiclock, flags);
+}
+
+static struct rtcr_alg_template *rtcr_algs[] = {
+ &rtcr_ahash_md5,
+ &rtcr_ahash_sha1,
+ &rtcr_skcipher_ecb_aes,
+ &rtcr_skcipher_cbc_aes,
+ &rtcr_skcipher_ctr_aes,
+};
+
+static void rtcr_unregister_algorithms(int end)
+{
+ int i;
+
+ for (i = 0; i < end; i++) {
+ if (rtcr_algs[i]->type == RTCR_ALG_SKCIPHER)
+ crypto_unregister_skcipher(&rtcr_algs[i]->alg.skcipher);
+ else
+ crypto_unregister_ahash(&rtcr_algs[i]->alg.ahash);
+ }
+}
+
+static int rtcr_register_algorithms(struct rtcr_crypto_dev *cdev)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < ARRAY_SIZE(rtcr_algs); i++) {
+ rtcr_algs[i]->cdev = cdev;
+ if (rtcr_algs[i]->type == RTCR_ALG_SKCIPHER)
+ ret = crypto_register_skcipher(&rtcr_algs[i]->alg.skcipher);
+ else {
+ rtcr_algs[i]->alg.ahash.halg.statesize =
+ max(sizeof(struct rtcr_ahash_req),
+ offsetof(struct rtcr_ahash_req, vector) +
+ sizeof(union rtcr_fallback_state));
+ ret = crypto_register_ahash(&rtcr_algs[i]->alg.ahash);
+ }
+ if (ret)
+ goto err_cipher_algs;
+ }
+
+ return 0;
+
+err_cipher_algs:
+ rtcr_unregister_algorithms(i);
+
+ return ret;
+}
+
+static void rtcr_init_engine(struct rtcr_crypto_dev *cdev)
+{
+ int v;
+
+ v = ioread32(cdev->base + RTCR_REG_CMD);
+ v |= RTCR_CMD_SRST;
+ iowrite32(v, cdev->base + RTCR_REG_CMD);
+
+ usleep_range(10000, 20000);
+
+ iowrite32(RTCR_CTR_CKE | RTCR_CTR_SDM16 | RTCR_CTR_DDM16 |
+ RTCR_CTR_SDUEIE | RTCR_CTR_SDLEIE | RTCR_CTR_DDUEIE |
+ RTCR_CTR_DDOKIE | RTCR_CTR_DABFIE, cdev->base + RTCR_REG_CTR);
+
+ rtcr_ack_irq(cdev);
+ usleep_range(10000, 20000);
+}
+
+static void rtcr_exit_engine(struct rtcr_crypto_dev *cdev)
+{
+ iowrite32(0, cdev->base + RTCR_REG_CTR);
+}
+
+static void rtcr_init_rings(struct rtcr_crypto_dev *cdev)
+{
+ iowrite32(cdev->src_dma, cdev->base + RTCR_REG_SRC);
+ iowrite32(cdev->dst_dma, cdev->base + RTCR_REG_DST);
+
+ cdev->asic_dst_idx = cdev->asic_src_idx = 0;
+ cdev->cpu_src_idx = cdev->cpu_dst_idx = cdev->cpu_buf_idx = 0;
+ cdev->pp_src_idx = cdev->pp_dst_idx = cdev->pp_buf_idx = 0;
+}
+
+static int rtcr_crypto_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct rtcr_crypto_dev *cdev;
+ unsigned long flags = 0;
+ struct resource *res;
+ void __iomem *base;
+ int irq, ret;
+
+#ifdef CONFIG_MIPS
+ if ((cpu_dcache_line_size() != 16) && (cpu_dcache_line_size() != 32)) {
+ dev_err(dev, "cache line size not 16 or 32 bytes\n");
+ ret = -EINVAL;
+ goto err_map;
+ }
+#endif
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(dev, "no IO address given\n");
+ ret = -ENODEV;
+ goto err_map;
+ }
+
+ base = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR_OR_NULL(base)) {
+ dev_err(dev, "failed to map IO address\n");
+ ret = -EINVAL;
+ goto err_map;
+ }
+
+ cdev = devm_kzalloc(dev, sizeof(*cdev), GFP_KERNEL);
+ if (!cdev) {
+ dev_err(dev, "failed to allocate device memory\n");
+ ret = -ENOMEM;
+ goto err_mem;
+ }
+
+ irq = irq_of_parse_and_map(pdev->dev.of_node, 0);
+ if (!irq) {
+ dev_err(dev, "failed to determine device interrupt\n");
+ ret = -EINVAL;
+ goto err_of_irq;
+ }
+
+ if (devm_request_irq(dev, irq, rtcr_handle_irq, flags, "realtek-crypto", cdev)) {
+ dev_err(dev, "failed to request device interrupt\n");
+ ret = -ENXIO;
+ goto err_request_irq;
+ }
+
+ platform_set_drvdata(pdev, cdev);
+ cdev->base = base;
+ cdev->dev = dev;
+ cdev->irq = irq;
+ cdev->pdev = pdev;
+
+ cdev->src_dma = dma_map_single(cdev->dev, cdev->src_ring,
+ RTCR_SRC_DESC_SIZE * RTCR_SRC_RING_SIZE, DMA_BIDIRECTIONAL);
+ cdev->dst_dma = dma_map_single(cdev->dev, cdev->dst_ring,
+ RTCR_DST_DESC_SIZE * RTCR_DST_RING_SIZE, DMA_BIDIRECTIONAL);
+
+ dma_map_single(dev, (void *)empty_zero_page, PAGE_SIZE, DMA_TO_DEVICE);
+
+ init_waitqueue_head(&cdev->done_queue);
+ tasklet_init(&cdev->done_task, rtcr_done_task, (unsigned long)cdev);
+ spin_lock_init(&cdev->ringlock);
+ spin_lock_init(&cdev->asiclock);
+
+ /* Init engine first as it resets the ring pointers */
+ rtcr_init_engine(cdev);
+ rtcr_init_rings(cdev);
+ rtcr_register_algorithms(cdev);
+
+ dev_info(dev, "%d KB buffer, max %d requests of up to %d bytes\n",
+ RTCR_BUF_RING_SIZE / 1024, RTCR_DST_RING_SIZE,
+ RTCR_MAX_REQ_SIZE);
+ dev_info(dev, "ready for AES/SHA1/MD5 crypto acceleration\n");
+
+ return 0;
+
+err_request_irq:
+ irq_dispose_mapping(irq);
+err_of_irq:
+ kfree(cdev);
+err_mem:
+ iounmap(base);
+err_map:
+ return ret;
+}
+
+static int rtcr_crypto_remove(struct platform_device *pdev)
+{
+ struct rtcr_crypto_dev *cdev = platform_get_drvdata(pdev);
+
+ dma_unmap_single(cdev->dev, cdev->src_dma,
+ RTCR_SRC_DESC_SIZE * RTCR_SRC_RING_SIZE, DMA_BIDIRECTIONAL);
+ dma_unmap_single(cdev->dev, cdev->dst_dma,
+ RTCR_DST_DESC_SIZE * RTCR_DST_RING_SIZE, DMA_BIDIRECTIONAL);
+
+ rtcr_exit_engine(cdev);
+ rtcr_unregister_algorithms(ARRAY_SIZE(rtcr_algs));
+ tasklet_kill(&cdev->done_task);
+ return 0;
+}
+
+static const struct of_device_id rtcr_id_table[] = {
+ { .compatible = "realtek,realtek-crypto" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, rtcr_id_table);
+
+static struct platform_driver rtcr_driver = {
+ .probe = rtcr_crypto_probe,
+ .remove = rtcr_crypto_remove,
+ .driver = {
+ .name = "realtek-crypto",
+ .of_match_table = rtcr_id_table,
+ },
+};
+
+module_platform_driver(rtcr_driver);
+
+MODULE_AUTHOR("Markus Stockhausen <[email protected]>");
+MODULE_DESCRIPTION("Support for Realtek's cryptographic engine");
+MODULE_LICENSE("GPL");
--
2.38.1
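
The ring bookkeeping in rtcr_alloc_ring() relies on power-of-two ring sizes, so
indices wrap with a mask and the free space follows from a single subtraction.
A minimal userspace sketch of that arithmetic, assuming a 64 entry ring like
RTCR_SRC_RING_SIZE; the names ring_inc(), ring_free() and pad_to_32() are made
up for illustration, and unsigned types are used here instead of the driver's
int so that wrap-around is well defined in plain C.

```c
#include <assert.h>

#define RING_SIZE 64u	/* same value as RTCR_SRC_RING_SIZE, power of two */

/* Advance an index by cnt slots, wrapping with a mask instead of a modulo. */
unsigned int ring_inc(unsigned int idx, unsigned int cnt)
{
	return (idx + cnt) & (RING_SIZE - 1);
}

/*
 * Slots the producer may still claim. One slot always stays unused so that
 * producer == consumer unambiguously means "ring empty".
 */
unsigned int ring_free(unsigned int consumer, unsigned int producer)
{
	assert(consumer < RING_SIZE && producer < RING_SIZE);
	return (consumer - producer - 1u) & (RING_SIZE - 1);
}

/* Round a buffer length up to the next multiple of 32 bytes. */
unsigned int pad_to_32(unsigned int len)
{
	return (len + 31u) & ~31u;
}
```

In the driver the consumer side corresponds to the post-processing indices
(pp_*_idx) and the producer side to the cpu_*_idx values. An empty 64 entry
ring therefore reports 63 free slots, and a 33 byte buffer request occupies 64
bytes of the buffer ring.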

2022-12-31 16:32:36

by Markus Stockhausen

Subject: [PATCH v3 6/6] crypto/realtek: add devicetree documentation

Add devicetree documentation of new Realtek crypto device.

Signed-off-by: Markus Stockhausen <[email protected]>
---
.../crypto/realtek,realtek-crypto.yaml | 51 +++++++++++++++++++
1 file changed, 51 insertions(+)
create mode 100644 Documentation/devicetree/bindings/crypto/realtek,realtek-crypto.yaml

diff --git a/Documentation/devicetree/bindings/crypto/realtek,realtek-crypto.yaml b/Documentation/devicetree/bindings/crypto/realtek,realtek-crypto.yaml
new file mode 100644
index 000000000000..443195e2d850
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/realtek,realtek-crypto.yaml
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/crypto/realtek,realtek-crypto.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek crypto engine bindings
+
+maintainers:
+ - Markus Stockhausen <[email protected]>
+
+description: |
+ The Realtek crypto engine provides hardware accelerated AES, SHA1 & MD5
+ algorithms. It is included in SoCs of the RTL838x series, such as RTL8380,
+ RTL8381, RTL8382, as well as SoCs from the RTL930x series, such as RTL9301,
+ RTL9302 and RTL9303.
+
+properties:
+ compatible:
+ const: realtek,realtek-crypto
+
+ reg:
+ minItems: 1
+ maxItems: 1
+
+ interrupt-parent:
+ minItems: 1
+ maxItems: 1
+
+ interrupts:
+ minItems: 1
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupt-parent
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ crypto0: crypto@c000 {
+ compatible = "realtek,realtek-crypto";
+ reg = <0xc000 0x10>;
+ interrupt-parent = <&intc>;
+ interrupts = <22 3>;
+ };
+
+...
--
2.38.1

2022-12-31 16:32:36

by Markus Stockhausen

Subject: [PATCH v3 5/6] crypto/realtek: enable module

Add the new Realtek crypto device to the kernel configuration.

Signed-off-by: Markus Stockhausen <[email protected]>
---
drivers/crypto/Kconfig | 13 +++++++++++++
drivers/crypto/Makefile | 1 +
drivers/crypto/realtek/Makefile | 5 +++++
3 files changed, 19 insertions(+)
create mode 100644 drivers/crypto/realtek/Makefile

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 55e75fbb658e..990a74f7ad97 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -666,6 +666,19 @@ config CRYPTO_DEV_IMGTEC_HASH
hardware hash accelerator. Supporting MD5/SHA1/SHA224/SHA256
hashing algorithms.

+config CRYPTO_DEV_REALTEK
+ tristate "Realtek's Cryptographic Engine driver"
+ depends on OF && MIPS && CPU_BIG_ENDIAN
+ select CRYPTO_MD5
+ select CRYPTO_SHA1
+ select CRYPTO_AES
+ help
+ This driver adds support for the Realtek crypto engine. It provides
+ hardware accelerated AES, SHA1 & MD5 algorithms. It is included in
+ SoCs of the RTL838x series, such as RTL8380, RTL8381, RTL8382, as
+ well as SoCs from the RTL930x series, such as RTL9301, RTL9302 and
+ RTL9303.
+
config CRYPTO_DEV_ROCKCHIP
tristate "Rockchip's Cryptographic Engine driver"
depends on OF && ARCH_ROCKCHIP
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 116de173a66c..df4b4b7d7302 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -36,6 +36,7 @@ obj-$(CONFIG_CRYPTO_DEV_PPC4XX) += amcc/
obj-$(CONFIG_CRYPTO_DEV_QAT) += qat/
obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
obj-$(CONFIG_CRYPTO_DEV_QCOM_RNG) += qcom-rng.o
+obj-$(CONFIG_CRYPTO_DEV_REALTEK) += realtek/
obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
obj-$(CONFIG_CRYPTO_DEV_S5P) += s5p-sss.o
obj-$(CONFIG_CRYPTO_DEV_SA2UL) += sa2ul.o
diff --git a/drivers/crypto/realtek/Makefile b/drivers/crypto/realtek/Makefile
new file mode 100644
index 000000000000..8d973bf1d520
--- /dev/null
+++ b/drivers/crypto/realtek/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_CRYPTO_DEV_REALTEK) += rtl_crypto.o
+rtl_crypto-objs := realtek_crypto.o \
+ realtek_crypto_skcipher.o \
+ realtek_crypto_ahash.o
--
2.38.1

2022-12-31 16:32:36

by Markus Stockhausen

Subject: [PATCH v3 3/6] crypto/realtek: hash algorithms

Add md5/sha1 hash algorithms for the new Realtek crypto device.

Signed-off-by: Markus Stockhausen <[email protected]>
---
drivers/crypto/realtek/realtek_crypto_ahash.c | 412 ++++++++++++++++++
1 file changed, 412 insertions(+)
create mode 100644 drivers/crypto/realtek/realtek_crypto_ahash.c

diff --git a/drivers/crypto/realtek/realtek_crypto_ahash.c b/drivers/crypto/realtek/realtek_crypto_ahash.c
new file mode 100644
index 000000000000..445c13dedaf7
--- /dev/null
+++ b/drivers/crypto/realtek/realtek_crypto_ahash.c
@@ -0,0 +1,412 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Crypto acceleration support for Realtek crypto engine. Based on ideas from
+ * Rockchip & SafeXcel driver plus Realtek OpenWrt RTK.
+ *
+ * Copyright (c) 2022, Markus Stockhausen <[email protected]>
+ */
+
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <linux/dma-mapping.h>
+
+#include "realtek_crypto.h"
+
+static inline struct ahash_request *fallback_request_ctx(struct ahash_request *areq)
+{
+ char *p = (char *)ahash_request_ctx(areq);
+
+ return (struct ahash_request *)(p + offsetof(struct rtcr_ahash_req, vector));
+}
+
+static inline void *fallback_export_state(void *export)
+{
+ char *p = (char *)export;
+
+ return (void *)(p + offsetof(struct rtcr_ahash_req, vector));
+}
+
+static int rtcr_process_hash(struct ahash_request *areq, int opmode)
+{
+ unsigned int len, nextbuflen, datalen, padlen, reqlen, sgmap = 0;
+ struct rtcr_ahash_req *hreq = ahash_request_ctx(areq);
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
+ struct rtcr_ahash_ctx *hctx = crypto_ahash_ctx(tfm);
+ int sgcnt = hreq->state & RTCR_REQ_SG_MASK;
+ struct rtcr_crypto_dev *cdev = hctx->cdev;
+ struct scatterlist *sg = areq->src;
+ int idx, srcidx, dstidx, ret;
+ u64 pad[RTCR_HASH_PAD_SIZE];
+ dma_addr_t paddma, bufdma;
+ char *ppad;
+
+ /* Quick checks if processing is really needed */
+ if (unlikely(!areq->nbytes) && !(opmode & RTCR_HASH_FINAL))
+ return 0;
+
+ if (hreq->buflen + areq->nbytes < 64 && !(opmode & RTCR_HASH_FINAL)) {
+ hreq->buflen += sg_pcopy_to_buffer(areq->src, sg_nents(areq->src),
+ hreq->buf + hreq->buflen,
+ areq->nbytes, 0);
+ return 0;
+ }
+
+ /* calculate required parts of the request */
+ datalen = (opmode & RTCR_HASH_UPDATE) ? areq->nbytes : 0;
+ if (opmode & RTCR_HASH_FINAL) {
+ nextbuflen = 0;
+ padlen = 64 - ((hreq->buflen + datalen) & 63);
+ if (padlen < 9)
+ padlen += 64;
+ hreq->totallen += hreq->buflen + datalen;
+
+ memset(pad, 0, sizeof(pad) - sizeof(u64));
+ ppad = (char *)&pad[RTCR_HASH_PAD_SIZE] - padlen;
+ *ppad = 0x80;
+ pad[RTCR_HASH_PAD_SIZE - 1] = hreq->state & RTCR_REQ_MD5 ?
+ cpu_to_le64(hreq->totallen << 3) :
+ cpu_to_be64(hreq->totallen << 3);
+ } else {
+ nextbuflen = (hreq->buflen + datalen) & 63;
+ padlen = 0;
+ datalen -= nextbuflen;
+ hreq->totallen += hreq->buflen + datalen;
+ }
+ reqlen = hreq->buflen + datalen + padlen;
+
+ /* Write back any uncommitted data to memory. */
+ if (hreq->buflen)
+ bufdma = dma_map_single(cdev->dev, hreq->buf, hreq->buflen, DMA_TO_DEVICE);
+ if (padlen)
+ paddma = dma_map_single(cdev->dev, ppad, padlen, DMA_TO_DEVICE);
+ if (datalen)
+ sgmap = dma_map_sg(cdev->dev, sg, sgcnt, DMA_TO_DEVICE);
+
+ /* Get free space in the ring */
+ sgcnt = 1 + (hreq->buflen ? 1 : 0) + (datalen ? sgcnt : 0) + (padlen ? 1 : 0);
+
+ ret = rtcr_alloc_ring(cdev, sgcnt, &srcidx, &dstidx, RTCR_WB_LEN_HASH, NULL);
+ if (ret)
+ return ret;
+ /*
+ * Feed input data into the rings. Start with destination ring and fill
+ * source ring afterwards. Ensure that the owner flag of the first source
+ * ring is the last that becomes visible to the engine.
+ */
+ rtcr_add_dst_to_ring(cdev, dstidx, NULL, 0, hreq->vector, 0);
+
+ idx = srcidx;
+ if (hreq->buflen) {
+ idx = rtcr_inc_src_idx(idx, 1);
+ rtcr_add_src_to_ring(cdev, idx, hreq->buf, hreq->buflen, reqlen);
+ }
+
+ while (datalen) {
+ len = min(sg_dma_len(sg), datalen);
+
+ idx = rtcr_inc_src_idx(idx, 1);
+ rtcr_add_src_to_ring(cdev, idx, sg_virt(sg), len, reqlen);
+
+ datalen -= len;
+ if (datalen)
+ sg = sg_next(sg);
+ }
+
+ if (padlen) {
+ idx = rtcr_inc_src_idx(idx, 1);
+ rtcr_add_src_to_ring(cdev, idx, ppad, padlen, reqlen);
+ }
+
+ rtcr_add_src_pad_to_ring(cdev, idx, reqlen);
+ rtcr_add_src_ahash_to_ring(cdev, srcidx, hctx->opmode, reqlen);
+
+ /* Off we go */
+ rtcr_kick_engine(cdev);
+ if (rtcr_wait_for_request(cdev, dstidx))
+ return -EINVAL;
+
+ if (sgmap)
+ dma_unmap_sg(cdev->dev, sg, sgcnt, DMA_TO_DEVICE);
+ if (hreq->buflen)
+ dma_unmap_single(cdev->dev, bufdma, hreq->buflen, DMA_TO_DEVICE);
+ if (nextbuflen)
+ sg_pcopy_to_buffer(sg, sg_nents(sg), hreq->buf, nextbuflen, len);
+ if (padlen) {
+ dma_unmap_single(cdev->dev, paddma, padlen, DMA_TO_DEVICE);
+ memcpy(areq->result, hreq->vector, crypto_ahash_digestsize(tfm));
+ }
+
+ hreq->state |= RTCR_REQ_FB_ACT;
+ hreq->buflen = nextbuflen;
+
+ return 0;
+}
+
+static void rtcr_check_request(struct ahash_request *areq, int opmode)
+{
+ struct rtcr_ahash_req *hreq = ahash_request_ctx(areq);
+ struct scatterlist *sg = areq->src;
+ int reqlen, sgcnt, sgmax;
+
+ if (hreq->state & RTCR_REQ_FB_ACT)
+ return;
+
+ if (areq->nbytes > RTCR_MAX_REQ_SIZE) {
+ hreq->state |= RTCR_REQ_FB_ACT;
+ return;
+ }
+
+ sgcnt = 0;
+ sgmax = RTCR_MAX_SG_AHASH - (hreq->buflen ? 1 : 0);
+ reqlen = areq->nbytes;
+ if (!(opmode & RTCR_HASH_FINAL)) {
+ reqlen -= (hreq->buflen + reqlen) & 63;
+ sgmax--;
+ }
+
+ while (reqlen > 0) {
+ reqlen -= sg_dma_len(sg);
+ sgcnt++;
+ sg = sg_next(sg);
+ }
+
+ if (sgcnt > sgmax)
+ hreq->state |= RTCR_REQ_FB_ACT;
+ else
+ hreq->state = (hreq->state & ~RTCR_REQ_SG_MASK) | sgcnt;
+}
+
+static bool rtcr_check_fallback(struct ahash_request *areq)
+{
+ struct ahash_request *freq = fallback_request_ctx(areq);
+ struct rtcr_ahash_req *hreq = ahash_request_ctx(areq);
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
+ struct rtcr_ahash_ctx *hctx = crypto_ahash_ctx(tfm);
+ union rtcr_fallback_state state;
+
+ if (!(hreq->state & RTCR_REQ_FB_ACT))
+ return false;
+
+ if (!(hreq->state & RTCR_REQ_FB_RDY)) {
+ /* Convert state to generic fallback state */
+ if (hreq->state & RTCR_REQ_MD5) {
+ memcpy(state.md5.hash, hreq->vector, MD5_DIGEST_SIZE);
+ if (hreq->totallen)
+ cpu_to_le32_array(state.md5.hash, 4);
+ memcpy(state.md5.block, hreq->buf, SHA1_BLOCK_SIZE);
+ state.md5.byte_count = hreq->totallen + (u64)hreq->buflen;
+ } else {
+ memcpy(state.sha1.state, hreq->vector, SHA1_DIGEST_SIZE);
+ memcpy(state.sha1.buffer, &hreq->buf, SHA1_BLOCK_SIZE);
+ state.sha1.count = hreq->totallen + (u64)hreq->buflen;
+ }
+ }
+
+ ahash_request_set_tfm(freq, hctx->fback);
+ ahash_request_set_crypt(freq, areq->src, areq->result, areq->nbytes);
+
+ if (!(hreq->state & RTCR_REQ_FB_RDY)) {
+ crypto_ahash_import(freq, &state);
+ hreq->state |= RTCR_REQ_FB_RDY;
+ }
+
+ return true;
+}
+
+static int rtcr_ahash_init(struct ahash_request *areq)
+{
+ struct rtcr_ahash_req *hreq = ahash_request_ctx(areq);
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
+ int ds = crypto_ahash_digestsize(tfm);
+
+ memset(hreq, 0, sizeof(*hreq));
+
+ hreq->vector[0] = SHA1_H0;
+ hreq->vector[1] = SHA1_H1;
+ hreq->vector[2] = SHA1_H2;
+ hreq->vector[3] = SHA1_H3;
+ hreq->vector[4] = SHA1_H4;
+
+ hreq->state |= (ds == MD5_DIGEST_SIZE) ? RTCR_REQ_MD5 : RTCR_REQ_SHA1;
+
+ return 0;
+}
+
+static int rtcr_ahash_update(struct ahash_request *areq)
+{
+ struct ahash_request *freq = fallback_request_ctx(areq);
+
+ rtcr_check_request(areq, RTCR_HASH_UPDATE);
+ if (rtcr_check_fallback(areq))
+ return crypto_ahash_update(freq);
+ return rtcr_process_hash(areq, RTCR_HASH_UPDATE);
+}
+
+static int rtcr_ahash_final(struct ahash_request *areq)
+{
+ struct ahash_request *freq = fallback_request_ctx(areq);
+
+ if (rtcr_check_fallback(areq))
+ return crypto_ahash_final(freq);
+
+ return rtcr_process_hash(areq, RTCR_HASH_FINAL);
+}
+
+static int rtcr_ahash_finup(struct ahash_request *areq)
+{
+ struct ahash_request *freq = fallback_request_ctx(areq);
+
+ rtcr_check_request(areq, RTCR_HASH_FINAL | RTCR_HASH_UPDATE);
+ if (rtcr_check_fallback(areq))
+ return crypto_ahash_finup(freq);
+
+ return rtcr_process_hash(areq, RTCR_HASH_FINAL | RTCR_HASH_UPDATE);
+}
+
+static int rtcr_ahash_digest(struct ahash_request *areq)
+{
+ struct ahash_request *freq = fallback_request_ctx(areq);
+ int ret;
+
+ ret = rtcr_ahash_init(areq);
+ if (ret)
+ return ret;
+
+ rtcr_check_request(areq, RTCR_HASH_FINAL | RTCR_HASH_UPDATE);
+ if (rtcr_check_fallback(areq))
+ return crypto_ahash_digest(freq);
+
+ return rtcr_process_hash(areq, RTCR_HASH_FINAL | RTCR_HASH_UPDATE);
+}
+
+static int rtcr_ahash_import(struct ahash_request *areq, const void *in)
+{
+ const void *fexp = (const void *)fallback_export_state((void *)in);
+ struct ahash_request *freq = fallback_request_ctx(areq);
+ struct rtcr_ahash_req *hreq = ahash_request_ctx(areq);
+ const struct rtcr_ahash_req *hexp = in;
+
+ hreq->state = get_unaligned(&hexp->state);
+ if (hreq->state & RTCR_REQ_FB_ACT)
+ hreq->state |= RTCR_REQ_FB_RDY;
+
+ if (rtcr_check_fallback(areq))
+ return crypto_ahash_import(freq, fexp);
+
+ memcpy(hreq, hexp, sizeof(struct rtcr_ahash_req));
+
+ return 0;
+}
+
+static int rtcr_ahash_export(struct ahash_request *areq, void *out)
+{
+ struct ahash_request *freq = fallback_request_ctx(areq);
+ struct rtcr_ahash_req *hreq = ahash_request_ctx(areq);
+ void *fexp = fallback_export_state(out);
+ struct rtcr_ahash_req *hexp = out;
+
+ if (rtcr_check_fallback(areq)) {
+ put_unaligned(hreq->state, &hexp->state);
+ return crypto_ahash_export(freq, fexp);
+ }
+
+ memcpy(hexp, hreq, sizeof(struct rtcr_ahash_req));
+
+ return 0;
+}
+
+static int rtcr_ahash_cra_init(struct crypto_tfm *tfm)
+{
+ struct crypto_ahash *ahash = __crypto_ahash_cast(tfm);
+ struct rtcr_ahash_ctx *hctx = crypto_tfm_ctx(tfm);
+ struct rtcr_crypto_dev *cdev = hctx->cdev;
+ struct rtcr_alg_template *tmpl;
+
+ tmpl = container_of(__crypto_ahash_alg(tfm->__crt_alg),
+ struct rtcr_alg_template, alg.ahash);
+
+ hctx->cdev = tmpl->cdev;
+ hctx->opmode = tmpl->opmode;
+ hctx->fback = crypto_alloc_ahash(crypto_tfm_alg_name(tfm), 0,
+ CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
+
+ if (IS_ERR(hctx->fback)) {
+ dev_err(cdev->dev, "could not allocate fallback for %s\n",
+ crypto_tfm_alg_name(tfm));
+ return PTR_ERR(hctx->fback);
+ }
+
+ crypto_ahash_set_reqsize(ahash, max(sizeof(struct rtcr_ahash_req),
+ offsetof(struct rtcr_ahash_req, vector) +
+ sizeof(struct ahash_request) +
+ crypto_ahash_reqsize(hctx->fback)));
+
+ return 0;
+}
+
+static void rtcr_ahash_cra_exit(struct crypto_tfm *tfm)
+{
+ struct rtcr_ahash_ctx *hctx = crypto_tfm_ctx(tfm);
+
+ crypto_free_ahash(hctx->fback);
+}
+
+struct rtcr_alg_template rtcr_ahash_md5 = {
+ .type = RTCR_ALG_AHASH,
+ .opmode = RTCR_SRC_OP_MS_HASH | RTCR_SRC_OP_HASH_MD5,
+ .alg.ahash = {
+ .init = rtcr_ahash_init,
+ .update = rtcr_ahash_update,
+ .final = rtcr_ahash_final,
+ .finup = rtcr_ahash_finup,
+ .export = rtcr_ahash_export,
+ .import = rtcr_ahash_import,
+ .digest = rtcr_ahash_digest,
+ .halg = {
+ .digestsize = MD5_DIGEST_SIZE,
+ /* statesize calculated during initialization */
+ .base = {
+ .cra_name = "md5",
+ .cra_driver_name = "realtek-md5",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK,
+ .cra_blocksize = SHA1_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct rtcr_ahash_ctx),
+ .cra_alignmask = 0,
+ .cra_init = rtcr_ahash_cra_init,
+ .cra_exit = rtcr_ahash_cra_exit,
+ .cra_module = THIS_MODULE,
+ }
+ }
+ }
+};
+
+struct rtcr_alg_template rtcr_ahash_sha1 = {
+ .type = RTCR_ALG_AHASH,
+ .opmode = RTCR_SRC_OP_MS_HASH | RTCR_SRC_OP_HASH_SHA1,
+ .alg.ahash = {
+ .init = rtcr_ahash_init,
+ .update = rtcr_ahash_update,
+ .final = rtcr_ahash_final,
+ .finup = rtcr_ahash_finup,
+ .export = rtcr_ahash_export,
+ .import = rtcr_ahash_import,
+ .digest = rtcr_ahash_digest,
+ .halg = {
+ .digestsize = SHA1_DIGEST_SIZE,
+ /* statesize calculated during initialization */
+ .base = {
+ .cra_name = "sha1",
+ .cra_driver_name = "realtek-sha1",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK,
+ .cra_blocksize = SHA1_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct rtcr_ahash_ctx),
+ .cra_alignmask = 0,
+ .cra_init = rtcr_ahash_cra_init,
+ .cra_exit = rtcr_ahash_cra_exit,
+ .cra_module = THIS_MODULE,
+ }
+ }
+ }
+};
--
2.38.1
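
The padding arithmetic in rtcr_process_hash() can be tried out in user space. What follows is a minimal sketch and not driver code: hash_padlen() and hash_build_pad() are hypothetical helper names, and the 72-byte buffer is an assumption derived from the largest possible pad (9 to 72 bytes). As in the patch, the pad is a 0x80 byte, zero fill, and the message length in bits as a 64-bit value, little endian for MD5 and big endian for SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u	/* MD5 and SHA-1 both hash 64-byte blocks */

/*
 * Bytes of padding needed after msglen bytes of message so that the total
 * is a multiple of 64 and leaves room for 0x80 plus the 8-byte bit count.
 */
static size_t hash_padlen(uint64_t msglen)
{
	size_t padlen = BLOCK_SIZE - (size_t)(msglen & (BLOCK_SIZE - 1));

	if (padlen < 9)
		padlen += BLOCK_SIZE;
	return padlen;
}

/*
 * Build the pad into pad[] (hypothetical helper) and return its length.
 * little_endian != 0 selects the MD5 length encoding, 0 the SHA-1 one.
 */
static size_t hash_build_pad(uint8_t pad[72], uint64_t msglen, int little_endian)
{
	size_t padlen = hash_padlen(msglen);
	uint64_t bits = msglen << 3;
	size_t i;

	assert(pad != NULL);
	memset(pad, 0, padlen);
	pad[0] = 0x80;
	for (i = 0; i < 8; i++) {
		/* byte i of the bit count, least significant byte first */
		size_t pos = little_endian ? padlen - 8 + i : padlen - 1 - i;

		pad[pos] = (uint8_t)(bits >> (8 * i));
	}
	return padlen;
}
```

For a 3-byte message hash_padlen(3) is 61, so data plus padding fill exactly one 64-byte block. A 56-byte message leaves only 8 bytes in its block, which is why the second branch adds another 64.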

2022-12-31 16:32:36

by Markus Stockhausen

Subject: [PATCH v3 4/6] crypto/realtek: skcipher algorithms

Add ecb(aes), cbc(aes) and ctr(aes) skcipher algorithms for
the new Realtek crypto device.

Signed-off-by: Markus Stockhausen <[email protected]>
---
.../crypto/realtek/realtek_crypto_skcipher.c | 376 ++++++++++++++++++
1 file changed, 376 insertions(+)
create mode 100644 drivers/crypto/realtek/realtek_crypto_skcipher.c

diff --git a/drivers/crypto/realtek/realtek_crypto_skcipher.c b/drivers/crypto/realtek/realtek_crypto_skcipher.c
new file mode 100644
index 000000000000..8efc41485716
--- /dev/null
+++ b/drivers/crypto/realtek/realtek_crypto_skcipher.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Crypto acceleration support for Realtek crypto engine. Based on ideas from
+ * Rockchip & SafeXcel driver plus Realtek OpenWrt RTK.
+ *
+ * Copyright (c) 2022, Markus Stockhausen <[email protected]>
+ */
+
+#include <crypto/internal/skcipher.h>
+#include <linux/dma-mapping.h>
+
+#include "realtek_crypto.h"
+
+static inline void rtcr_inc_iv(u8 *iv, int cnt)
+{
+ u32 *ctr = (u32 *)iv + 4;
+ u32 old, new, carry = cnt;
+
+ /* avoid looping with crypto_inc() */
+ do {
+ old = be32_to_cpu(*--ctr);
+ new = old + carry;
+ *ctr = cpu_to_be32(new);
+ carry = (new < old) && (ctr > (u32 *)iv) ? 1 : 0;
+ } while (carry);
+}
+
+static inline void rtcr_cut_skcipher_len(int *reqlen, int opmode, u8 *iv)
+{
+ int len = min(*reqlen, RTCR_MAX_REQ_SIZE);
+
+ if (opmode & RTCR_SRC_OP_CRYPT_CTR) {
+ /* limit data as engine does not wrap around cleanly */
+ u32 ctr = be32_to_cpu(*((u32 *)iv + 3));
+ int blocks = min(~ctr, 0x3fffu) + 1;
+
+ len = min(blocks * AES_BLOCK_SIZE, len);
+ }
+
+ *reqlen = len;
+}
+
+static inline void rtcr_max_skcipher_len(int *reqlen, struct scatterlist **sg,
+ int *sgoff, int *sgcnt)
+{
+ int len, cnt, sgnoff, sgmax = RTCR_MAX_SG_SKCIPHER, datalen, maxlen = *reqlen;
+ struct scatterlist *sgn;
+
+redo:
+ datalen = cnt = 0;
+ sgnoff = *sgoff;
+ sgn = *sg;
+
+ while (sgn && (datalen < maxlen) && (cnt < sgmax)) {
+ cnt++;
+ len = min((int)sg_dma_len(sgn) - sgnoff, maxlen - datalen);
+ datalen += len;
+ if (len + sgnoff < sg_dma_len(sgn)) {
+ sgnoff = sgnoff + len;
+ break;
+ }
+ sgn = sg_next(sgn);
+ sgnoff = 0;
+ if (unlikely((cnt == sgmax) && (datalen < AES_BLOCK_SIZE))) {
+ /* expand search to get at least one block */
+ sgmax = AES_BLOCK_SIZE;
+ maxlen = min(maxlen, AES_BLOCK_SIZE);
+ }
+ }
+
+ if (unlikely((datalen < maxlen) && (datalen & (AES_BLOCK_SIZE - 1)))) {
+ /* recalculate to get aligned size */
+ maxlen = datalen & ~(AES_BLOCK_SIZE - 1);
+ goto redo;
+ }
+
+ *sg = sgn;
+ *sgoff = sgnoff;
+ *sgcnt = cnt;
+ *reqlen = datalen;
+}
+
+static int rtcr_process_skcipher(struct skcipher_request *sreq, int opmode)
+{
+ char *dataout, *iv, ivbk[AES_BLOCK_SIZE], datain[AES_BLOCK_SIZE];
+ int padlen, sgnoff, sgcnt, reqlen, ret, fblen, sgmap, sgdir;
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(sreq);
+ struct rtcr_skcipher_ctx *sctx = crypto_skcipher_ctx(tfm);
+ int totallen = sreq->cryptlen, sgoff = 0, dgoff = 0;
+ struct rtcr_crypto_dev *cdev = sctx->cdev;
+ struct scatterlist *sg = sreq->src, *sgn;
+ int idx, srcidx, dstidx, len, datalen;
+ dma_addr_t ivdma, outdma, indma;
+
+ if (!totallen)
+ return 0;
+
+ if ((totallen & (AES_BLOCK_SIZE - 1)) && (!(opmode & RTCR_SRC_OP_CRYPT_CTR)))
+ return -EINVAL;
+
+redo:
+ indma = outdma = 0;
+ sgmap = 0;
+ sgnoff = sgoff;
+ sgn = sg;
+ datalen = totallen;
+
+ /* limit input so that engine can process it */
+ rtcr_cut_skcipher_len(&datalen, opmode, sreq->iv);
+ rtcr_max_skcipher_len(&datalen, &sgn, &sgnoff, &sgcnt);
+
+ /* CTR padding */
+ padlen = (AES_BLOCK_SIZE - datalen) & (AES_BLOCK_SIZE - 1);
+ reqlen = datalen + padlen;
+
+ fblen = 0;
+ if (sgcnt > RTCR_MAX_SG_SKCIPHER) {
+ /* single AES block with too many SGs */
+ fblen = datalen;
+ sg_pcopy_to_buffer(sg, sgcnt, datain, datalen, sgoff);
+ }
+
+ if ((opmode & RTCR_SRC_OP_CRYPT_CBC) &&
+ (!(opmode & RTCR_SRC_OP_KAM_ENC))) {
+ /* CBC decryption IV might get overwritten */
+ sg_pcopy_to_buffer(sg, sgcnt, ivbk, AES_BLOCK_SIZE,
+ sgoff + datalen - AES_BLOCK_SIZE);
+ }
+
+ /* Get free space in the ring */
+ if (padlen || (datalen + dgoff > sg_dma_len(sreq->dst))) {
+ len = datalen;
+ } else {
+ len = RTCR_WB_LEN_SG_DIRECT;
+ dataout = sg_virt(sreq->dst) + dgoff;
+ }
+
+ ret = rtcr_alloc_ring(cdev, 2 + (fblen ? 1 : sgcnt) + (padlen ? 1 : 0),
+ &srcidx, &dstidx, len, &dataout);
+ if (ret)
+ return ret;
+
+ /* Write back any uncommitted data to memory */
+ if (dataout == sg_virt(sreq->src) + sgoff) {
+ sgdir = DMA_BIDIRECTIONAL;
+ sgmap = dma_map_sg(cdev->dev, sg, sgcnt, sgdir);
+ } else {
+ outdma = dma_map_single(cdev->dev, dataout, reqlen, DMA_BIDIRECTIONAL);
+ if (fblen)
+ indma = dma_map_single(cdev->dev, datain, reqlen, DMA_TO_DEVICE);
+ else {
+ sgdir = DMA_TO_DEVICE;
+ sgmap = dma_map_sg(cdev->dev, sg, sgcnt, sgdir);
+ }
+ }
+
+ if (sreq->iv)
+ ivdma = dma_map_single(cdev->dev, sreq->iv, AES_BLOCK_SIZE, DMA_TO_DEVICE);
+ /*
+ * Feed input data into the rings. Start with destination ring and fill
+ * source ring afterwards. Ensure that the owner flag of the first source
+ * ring is the last that becomes visible to the engine.
+ */
+ rtcr_add_dst_to_ring(cdev, dstidx, dataout, reqlen, sreq->dst, dgoff);
+
+ idx = rtcr_inc_src_idx(srcidx, 1);
+ rtcr_add_src_to_ring(cdev, idx, sreq->iv, AES_BLOCK_SIZE, reqlen);
+
+ if (fblen) {
+ idx = rtcr_inc_src_idx(idx, 1);
+ rtcr_add_src_to_ring(cdev, idx, (void *)datain, fblen, reqlen);
+ }
+
+ datalen -= fblen;
+ while (datalen) {
+ len = min((int)sg_dma_len(sg) - sgoff, datalen);
+
+ idx = rtcr_inc_src_idx(idx, 1);
+ rtcr_add_src_to_ring(cdev, idx, sg_virt(sg) + sgoff, len, reqlen);
+
+ datalen -= len;
+ sg = sg_next(sg);
+ sgoff = 0;
+ }
+
+ if (padlen) {
+ idx = rtcr_inc_src_idx(idx, 1);
+ rtcr_add_src_to_ring(cdev, idx, (void *)empty_zero_page, padlen, reqlen);
+ }
+
+ rtcr_add_src_pad_to_ring(cdev, idx, reqlen);
+ rtcr_add_src_skcipher_to_ring(cdev, srcidx, opmode, reqlen, sctx);
+
+ /* Off we go */
+ rtcr_kick_engine(cdev);
+ if (rtcr_wait_for_request(cdev, dstidx))
+ return -EINVAL;
+
+ if (sreq->iv)
+ dma_unmap_single(cdev->dev, ivdma, AES_BLOCK_SIZE, DMA_TO_DEVICE);
+ if (outdma)
+ dma_unmap_single(cdev->dev, outdma, reqlen, DMA_BIDIRECTIONAL);
+ if (indma)
+ dma_unmap_single(cdev->dev, indma, reqlen, DMA_TO_DEVICE);
+ if (sgmap)
+ dma_unmap_sg(cdev->dev, sg, sgcnt, sgdir);
+
+ /* Handle IV feedback as engine does not provide it */
+ if (opmode & RTCR_SRC_OP_CRYPT_CTR) {
+ rtcr_inc_iv(sreq->iv, reqlen / AES_BLOCK_SIZE);
+ } else if (opmode & RTCR_SRC_OP_CRYPT_CBC) {
+ iv = opmode & RTCR_SRC_OP_KAM_ENC ?
+ dataout + reqlen - AES_BLOCK_SIZE : ivbk;
+ memcpy(sreq->iv, iv, AES_BLOCK_SIZE);
+ }
+
+ sg = sgn;
+ sgoff = sgnoff;
+ dgoff += reqlen;
+ totallen -= min(reqlen, totallen);
+
+ if (totallen)
+ goto redo;
+
+ return 0;
+}
+
+static int rtcr_skcipher_encrypt(struct skcipher_request *sreq)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(sreq);
+ struct rtcr_skcipher_ctx *sctx = crypto_skcipher_ctx(tfm);
+ int opmode = sctx->opmode | RTCR_SRC_OP_KAM_ENC;
+
+ return rtcr_process_skcipher(sreq, opmode);
+}
+
+static int rtcr_skcipher_decrypt(struct skcipher_request *sreq)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(sreq);
+ struct rtcr_skcipher_ctx *sctx = crypto_skcipher_ctx(tfm);
+ int opmode = sctx->opmode;
+
+ opmode |= sctx->opmode & RTCR_SRC_OP_CRYPT_CTR ?
+ RTCR_SRC_OP_KAM_ENC : RTCR_SRC_OP_KAM_DEC;
+
+ return rtcr_process_skcipher(sreq, opmode);
+}
+
+static int rtcr_skcipher_setkey(struct crypto_skcipher *cipher,
+ const u8 *key, unsigned int keylen)
+{
+ struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
+ struct rtcr_skcipher_ctx *sctx = crypto_tfm_ctx(tfm);
+ struct rtcr_crypto_dev *cdev = sctx->cdev;
+ struct crypto_aes_ctx kctx;
+ int p, i;
+
+ if (aes_expandkey(&kctx, key, keylen))
+ return -EINVAL;
+
+ sctx->keylen = keylen;
+ sctx->opmode = (sctx->opmode & ~RTCR_SRC_OP_CIPHER_MASK) |
+ RTCR_SRC_OP_CIPHER_FROM_KEY(keylen);
+
+ memcpy(sctx->keyenc, key, keylen);
+ /* decryption key is derived from expanded key */
+ p = ((keylen / 4) + 6) * 4;
+ for (i = 0; i < 8; i++) {
+ sctx->keydec[i] = cpu_to_le32(kctx.key_enc[p + i]);
+ if (i == 3)
+ p -= keylen == AES_KEYSIZE_256 ? 8 : 6;
+ }
+
+ dma_sync_single_for_device(cdev->dev, sctx->keydma, 2 * AES_KEYSIZE_256, DMA_TO_DEVICE);
+
+ return 0;
+}
+
+static int rtcr_skcipher_cra_init(struct crypto_tfm *tfm)
+{
+ struct rtcr_skcipher_ctx *sctx = crypto_tfm_ctx(tfm);
+ struct rtcr_alg_template *tmpl;
+
+ tmpl = container_of(tfm->__crt_alg, struct rtcr_alg_template,
+ alg.skcipher.base);
+
+ sctx->cdev = tmpl->cdev;
+ sctx->opmode = tmpl->opmode;
+ sctx->keydma = dma_map_single(sctx->cdev->dev, sctx->keyenc,
+ 2 * AES_KEYSIZE_256, DMA_TO_DEVICE);
+
+ return 0;
+}
+
+static void rtcr_skcipher_cra_exit(struct crypto_tfm *tfm)
+{
+ struct rtcr_skcipher_ctx *sctx = crypto_tfm_ctx(tfm);
+ struct rtcr_crypto_dev *cdev = sctx->cdev;
+
+ dma_unmap_single(cdev->dev, sctx->keydma, 2 * AES_KEYSIZE_256, DMA_TO_DEVICE);
+ memzero_explicit(sctx, tfm->__crt_alg->cra_ctxsize);
+}
+
+struct rtcr_alg_template rtcr_skcipher_ecb_aes = {
+ .type = RTCR_ALG_SKCIPHER,
+ .opmode = RTCR_SRC_OP_MS_CRYPTO | RTCR_SRC_OP_CRYPT_ECB,
+ .alg.skcipher = {
+ .setkey = rtcr_skcipher_setkey,
+ .encrypt = rtcr_skcipher_encrypt,
+ .decrypt = rtcr_skcipher_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .base = {
+ .cra_name = "ecb(aes)",
+ .cra_driver_name = "realtek-ecb-aes",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_ASYNC,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct rtcr_skcipher_ctx),
+ .cra_alignmask = 0,
+ .cra_init = rtcr_skcipher_cra_init,
+ .cra_exit = rtcr_skcipher_cra_exit,
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
+
+struct rtcr_alg_template rtcr_skcipher_cbc_aes = {
+ .type = RTCR_ALG_SKCIPHER,
+ .opmode = RTCR_SRC_OP_MS_CRYPTO | RTCR_SRC_OP_CRYPT_CBC,
+ .alg.skcipher = {
+ .setkey = rtcr_skcipher_setkey,
+ .encrypt = rtcr_skcipher_encrypt,
+ .decrypt = rtcr_skcipher_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .base = {
+ .cra_name = "cbc(aes)",
+ .cra_driver_name = "realtek-cbc-aes",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_ASYNC,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct rtcr_skcipher_ctx),
+ .cra_alignmask = 0,
+ .cra_init = rtcr_skcipher_cra_init,
+ .cra_exit = rtcr_skcipher_cra_exit,
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
+
+struct rtcr_alg_template rtcr_skcipher_ctr_aes = {
+ .type = RTCR_ALG_SKCIPHER,
+ .opmode = RTCR_SRC_OP_MS_CRYPTO | RTCR_SRC_OP_CRYPT_CTR,
+ .alg.skcipher = {
+ .setkey = rtcr_skcipher_setkey,
+ .encrypt = rtcr_skcipher_encrypt,
+ .decrypt = rtcr_skcipher_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .base = {
+ .cra_name = "ctr(aes)",
+ .cra_driver_name = "realtek-ctr-aes",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_ASYNC,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct rtcr_skcipher_ctx),
+ .cra_alignmask = 0,
+ .cra_init = rtcr_skcipher_cra_init,
+ .cra_exit = rtcr_skcipher_cra_exit,
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
--
2.38.1
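
The CTR handling in rtcr_cut_skcipher_len() and rtcr_inc_iv() comes down to two steps: never hand the engine more blocks than fit before the low 32 bits of the counter wrap, then advance the full 128-bit big-endian counter in software. The user-space sketch below models just that arithmetic. The function names are hypothetical, the 0x3fff cap is copied from the patch, and the byte-wise carry loop is a simplification of the word-wise loop in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_SIZE 16

/* Low 32 bits of the big-endian 128-bit counter block. */
static uint32_t ctr_low32(const uint8_t iv[AES_BLOCK_SIZE])
{
	return ((uint32_t)iv[12] << 24) | ((uint32_t)iv[13] << 16) |
	       ((uint32_t)iv[14] << 8) | (uint32_t)iv[15];
}

/*
 * Number of blocks one request may cover: at most 0x4000, and never past
 * the point where the low counter word would wrap to zero.
 */
static uint32_t ctr_blocks_before_wrap(const uint8_t iv[AES_BLOCK_SIZE])
{
	uint32_t room = ~ctr_low32(iv);	/* blocks left before wrap, minus one */

	return (room < 0x3fffu ? room : 0x3fffu) + 1;
}

/* Add cnt to the big-endian counter, propagating the carry upwards. */
static void ctr_add(uint8_t iv[AES_BLOCK_SIZE], uint32_t cnt)
{
	uint64_t carry = cnt;
	int i;

	assert(iv != NULL);
	for (i = AES_BLOCK_SIZE - 1; i >= 0 && carry; i--) {
		carry += iv[i];
		iv[i] = (uint8_t)carry;
		carry >>= 8;
	}
}
```

With the low word at 0xfffffffe only two blocks are allowed in the first request. After ctr_add(iv, 2) the carry has moved into byte 11, and the next request may again cover the full 0x4000 blocks.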

2023-01-06 09:09:32

by Herbert Xu

Subject: Re: [PATCH v3 3/6] crypto/realtek: hash algorithms

Markus Stockhausen <[email protected]> wrote:
>
> + while (datalen) {
> + len = min(sg_dma_len(sg), datalen);
> +
> + idx = rtcr_inc_src_idx(idx, 1);
> + rtcr_add_src_to_ring(cdev, idx, sg_virt(sg), len, reqlen);

You cannot use sg_virt because the SG might not even be mapped
into kernel address space. Since the code then simply converts
it to physical you should simply pass along the DMA address you
obtained from the DMA API.

> +
> + datalen -= len;
> + if (datalen)
> + sg = sg_next(sg);
> + }
> +
> + if (padlen) {
> + idx = rtcr_inc_src_idx(idx, 1);
> + rtcr_add_src_to_ring(cdev, idx, ppad, padlen, reqlen);
> + }
> +
> + rtcr_add_src_pad_to_ring(cdev, idx, reqlen);
> + rtcr_add_src_ahash_to_ring(cdev, srcidx, hctx->opmode, reqlen);
> +
> + /* Off we go */
> + rtcr_kick_engine(cdev);
> + if (rtcr_wait_for_request(cdev, dstidx))
> + return -EINVAL;

You cannot sleep in this function because it may be called from
softirq context. Instead you should use asynchronous completion.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
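
The first point above, about filling the source ring from DMA addresses instead of sg_virt(), can be pictured with a small user-space model. Everything here is a stand-in: struct model_sg plays the role of a scatterlist entry after dma_map_sg() (in the kernel the values would come from sg_dma_address() and sg_dma_len()), struct model_src_desc is a simplified descriptor, and model_fill_ring() is a hypothetical name. It is a sketch of the loop shape only, not of the driver's ring layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one mapped scatterlist entry. */
struct model_sg {
	uint64_t dma_addr;	/* address as seen by the device */
	uint32_t dma_len;	/* length of the mapped segment */
};

/* Simplified source descriptor: the device only ever sees bus addresses. */
struct model_src_desc {
	uint64_t paddr;
	uint32_t len;
};

/*
 * Copy DMA address and length of each mapped entry into the ring until
 * datalen bytes are covered. Returns the number of descriptors used, or
 * -1 if the ring is too small or the entries do not cover datalen.
 */
static int model_fill_ring(struct model_src_desc *ring, size_t ring_size,
			   const struct model_sg *sg, size_t nents,
			   uint32_t datalen)
{
	size_t i, used = 0;

	assert(ring != NULL && sg != NULL);
	for (i = 0; i < nents && datalen; i++) {
		uint32_t len = sg[i].dma_len < datalen ? sg[i].dma_len : datalen;

		if (used == ring_size)
			return -1;
		ring[used].paddr = sg[i].dma_addr;	/* no virtual address involved */
		ring[used].len = len;
		used++;
		datalen -= len;
	}
	return datalen ? -1 : (int)used;
}
```

The loop never dereferences or converts a kernel virtual address, so it would also work for pages that have no kernel mapping.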

2023-01-06 09:11:56

by Herbert Xu

Subject: Re: [PATCH v3 2/6] crypto/realtek: core functions

Markus Stockhausen <[email protected]> wrote:
>
> +void rtcr_add_src_to_ring(struct rtcr_crypto_dev *cdev, int idx, void *vaddr,
> + int blocklen, int totallen)
> +{
> + dma_addr_t dma = cdev->src_dma + idx * RTCR_SRC_DESC_SIZE;
> + struct rtcr_src_desc *src = &cdev->src_ring[idx];
> +
> + src->len = totallen;
> + src->paddr = virt_to_phys(vaddr);
> + src->opmode = RTCR_SRC_OP_OWN_ASIC | RTCR_SRC_OP_CALC_EOR(idx) | blocklen;
> +
> + dma_sync_single_for_device(cdev->dev, dma, RTCR_SRC_DESC_SIZE, DMA_BIDIRECTIONAL);

Why aren't there any calls to dma_sync_single_for_cpu if this is
truly bidirectional?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2023-01-06 16:10:20

by Markus Stockhausen

Subject: Re: [PATCH v3 3/6] crypto/realtek: hash algorithms

On Friday, 2023-01-06 at 17:01 +0800, Herbert Xu wrote:
> Markus Stockhausen <[email protected]> wrote:
> >
> > +
> > +       /* Off we go */
> > +       rtcr_kick_engine(cdev);
> > +       if (rtcr_wait_for_request(cdev, dstidx))
> > +               return -EINVAL;
>
> You cannot sleep in this function because it may be called from
> softirq context.  Instead you should use asynchronous completion.
>
> Thanks,

Hm,

I thought that using wait_event() inside the above function should be
sufficient to handle that. Any good example of how to achieve that type
of completion?

Markus

2023-01-06 16:27:03

by Markus Stockhausen

Subject: Re: [PATCH v3 2/6] crypto/realtek: core functions

On Friday, 2023-01-06 at 17:02 +0800, Herbert Xu wrote:
> Markus Stockhausen <[email protected]> wrote:
> >
> > +void rtcr_add_src_to_ring(struct rtcr_crypto_dev *cdev, int idx, void *vaddr,
> > +                         int blocklen, int totallen)
> > +{
> > +       dma_addr_t dma = cdev->src_dma + idx * RTCR_SRC_DESC_SIZE;
> > +       struct rtcr_src_desc *src = &cdev->src_ring[idx];
> > +
> > +       src->len = totallen;
> > +       src->paddr = virt_to_phys(vaddr);
> > +       src->opmode = RTCR_SRC_OP_OWN_ASIC | RTCR_SRC_OP_CALC_EOR(idx) | blocklen;
> > +
> > +       dma_sync_single_for_device(cdev->dev, dma, RTCR_SRC_DESC_SIZE, DMA_BIDIRECTIONAL);
>
> Why aren't there any calls to dma_sync_single_for_cpu if this is
> truly bidirectional?
>
> Cheers,

Thanks, I need to check this again. CPU sets ownership bit in that
descriptor to OWNED_BY_ASIC and after processing we expect that engine
has set it back to OWNED_BY_CPU. So bidirectional operation is somehow
needed.

Markus

2023-01-10 07:45:46

by Herbert Xu

Subject: Re: [PATCH v3 3/6] crypto/realtek: hash algorithms

On Fri, Jan 06, 2023 at 05:09:25PM +0100, Markus Stockhausen wrote:
>
> I thought that using wait_event() inside the above function should be
> sufficient to handle that. Any good example of how to achieve that type
> of completion?

I think most other drivers in drivers/crypto are async.

Essentially, you return -EINPROGRESS in the update/final functions
if the request was successfully queued to the hardware, and
once it completes you invoke the completion function.

If you don't have your own queueing mechanism, you should use
the crypto_engine API to queue the requests before they are
processed.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
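
The control flow described above (queue the request, return -EINPROGRESS, call the completion function later) can be shown with a toy user-space model. None of the names below are kernel API: struct model_request, struct model_engine, model_submit() and model_irq_done() are hypothetical, the one-slot "queue" stands in for a real queue or the crypto_engine, and model_irq_done() plays the part of the interrupt handler or tasklet.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* A request carries its own completion callback, as async requests do. */
struct model_request {
	void (*complete)(struct model_request *req, int err);
	int result;
	int done;
};

/* Toy engine with room for a single in-flight request. */
struct model_engine {
	struct model_request *inflight;
};

/* Submit path: hand the request to the "hardware" and return at once. */
static int model_submit(struct model_engine *eng, struct model_request *req)
{
	assert(req->complete != NULL);
	if (eng->inflight)
		return -EBUSY;		/* no free slot, caller must retry */
	eng->inflight = req;
	return -EINPROGRESS;		/* queued, result arrives via callback */
}

/* Completion path: runs later, never in the submitter's call chain. */
static void model_irq_done(struct model_engine *eng, int err)
{
	struct model_request *req = eng->inflight;

	if (!req)
		return;
	eng->inflight = NULL;
	req->complete(req, err);
}

/* Example completion callback: just record the outcome. */
static void model_store_result(struct model_request *req, int err)
{
	req->result = err;
	req->done = 1;
}
```

The submit path never waits, so it is safe to call from a context that must not sleep. The caller learns the result only when its callback runs.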

2023-01-10 07:54:02

by Herbert Xu

Subject: Re: [PATCH v3 2/6] crypto/realtek: core functions

On Fri, Jan 06, 2023 at 05:17:16PM +0100, Markus Stockhausen wrote:
>
> Thanks, I need to check this again. CPU sets ownership bit in that
> descriptor to OWNED_BY_ASIC and after processing we expect that engine
> has set it back to OWNED_BY_CPU. So bidirectional operation is somehow
> needed.

For each bi-directional mapping, you must call

dma_sync_single_for_cpu

before you read what the hardware has just written. Then you make
your changes, and once you are done you do

dma_sync_single_for_device

Note that you must ensure that the hardware does not modify the
memory area after you have called

dma_sync_single_for_cpu

and before the next

dma_sync_single_for_device

call.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
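
The ordering rule above can be modelled in user space to make the ownership hand-over explicit. This is only a sketch under stated assumptions: model_sync_for_cpu() and model_sync_for_device() are stand-ins for dma_sync_single_for_cpu() and dma_sync_single_for_device() that merely track who may touch the descriptor, MODEL_OWN_ASIC is a hypothetical owner bit, and model_engine_complete() imitates the engine handing the descriptor back.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_OWN_ASIC 0x80000000u	/* hypothetical "owned by engine" bit */

enum sync_state { SYNCED_FOR_CPU, SYNCED_FOR_DEVICE };

struct model_desc {
	uint32_t opmode;		/* word shared between CPU and engine */
	enum sync_state state;		/* bookkeeping for the model only */
};

static void model_sync_for_cpu(struct model_desc *d)
{
	d->state = SYNCED_FOR_CPU;
}

static void model_sync_for_device(struct model_desc *d)
{
	d->state = SYNCED_FOR_DEVICE;
}

/* CPU accesses are only legal between sync_for_cpu and sync_for_device. */
static uint32_t model_cpu_read(const struct model_desc *d)
{
	assert(d->state == SYNCED_FOR_CPU);
	return d->opmode;
}

static void model_cpu_write(struct model_desc *d, uint32_t val)
{
	assert(d->state == SYNCED_FOR_CPU);
	d->opmode = val;
}

/* The engine may only write while the buffer is synced for the device. */
static void model_engine_complete(struct model_desc *d)
{
	assert(d->state == SYNCED_FOR_DEVICE);
	d->opmode &= ~MODEL_OWN_ASIC;
}

/* One full round trip; returns 1 if the engine gave the descriptor back. */
static int model_round_trip(struct model_desc *d, uint32_t blocklen)
{
	model_sync_for_cpu(d);
	model_cpu_write(d, MODEL_OWN_ASIC | blocklen);
	model_sync_for_device(d);	/* from here on the CPU keeps its hands off */
	model_engine_complete(d);
	model_sync_for_cpu(d);		/* required before reading the owner bit */
	return !(model_cpu_read(d) & MODEL_OWN_ASIC);
}
```

Dropping the final model_sync_for_cpu() call makes the assertion in model_cpu_read() fire. On a real system with non-coherent caches the equivalent mistake would show up as the CPU reading a stale owner bit.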