This engine implements AES in CBC mode, using 128-bit keys only. It is
present on both the Wii and the Wii U, and is apparently identical in
both consoles.
The hardware is capable of firing an interrupt when the operation is
done, but this driver currently uses a busy loop, I’m not too sure
whether it would be preferable to switch, nor how to achieve that.
It also supports a mode where no operation is done, and thus could be
used as a DMA copy engine, but I don’t know how to expose that to the
kernel or whether it would even be useful.
In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
speedup.
This driver was written based on reversed documentation, see:
https://wiibrew.org/wiki/Hardware/AES
Emmanuel Gil Peyrot (4):
crypto: nintendo-aes - add a new AES driver
dt-bindings: nintendo-aes: Document the Wii and Wii U AES support
powerpc: wii.dts: Expose the AES engine on this platform
powerpc: wii_defconfig: Enable AES by default
.../bindings/crypto/nintendo-aes.yaml | 34 +++
arch/powerpc/boot/dts/wii.dts | 7 +
arch/powerpc/configs/wii_defconfig | 4 +-
drivers/crypto/Kconfig | 11 +
drivers/crypto/Makefile | 1 +
drivers/crypto/nintendo-aes.c | 273 ++++++++++++++++++
6 files changed, 329 insertions(+), 1 deletion(-)
create mode 100644 Documentation/devicetree/bindings/crypto/nintendo-aes.yaml
create mode 100644 drivers/crypto/nintendo-aes.c
--
2.33.0
This engine implements AES in CBC mode, using 128-bit keys only. It is
present on both the Wii and the Wii U, and is apparently identical in
both consoles.
The hardware is capable of firing an interrupt when the operation is
done, but this driver currently uses a busy loop, I’m not too sure
whether it would be preferable to switch, nor how to achieve that.
It also supports a mode where no operation is done, and thus could be
used as a DMA copy engine, but I don’t know how to expose that to the
kernel or whether it would even be useful.
In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
speedup.
This driver was written based on reversed documentation, see:
https://wiibrew.org/wiki/Hardware/AES
Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
Tested-by: Emmanuel Gil Peyrot <[email protected]> # on Wii U
---
drivers/crypto/Kconfig | 11 ++
drivers/crypto/Makefile | 1 +
drivers/crypto/nintendo-aes.c | 273 ++++++++++++++++++++++++++++++++++
3 files changed, 285 insertions(+)
create mode 100644 drivers/crypto/nintendo-aes.c
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 9a4c275a1335..adc94ad7462d 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -871,4 +871,15 @@ config CRYPTO_DEV_SA2UL
source "drivers/crypto/keembay/Kconfig"
+config CRYPTO_DEV_NINTENDO
+ tristate "Support for the Nintendo Wii U AES engine"
+ depends on WII || WIIU || COMPILE_TEST
+ select CRYPTO_AES
+ help
+ Say 'Y' here to use the Nintendo Wii or Wii U on-board AES
+ engine for the CryptoAPI AES algorithm.
+
+ To compile this driver as a module, choose M here: the module
+ will be called nintendo-aes.
+
endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index fa22cb19e242..004dae7bbf39 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_CRYPTO_DEV_MARVELL) += marvell/
obj-$(CONFIG_CRYPTO_DEV_MXS_DCP) += mxs-dcp.o
obj-$(CONFIG_CRYPTO_DEV_NIAGARA2) += n2_crypto.o
n2_crypto-y := n2_core.o n2_asm.o
+obj-$(CONFIG_CRYPTO_DEV_NINTENDO) += nintendo-aes.o
obj-$(CONFIG_CRYPTO_DEV_NX) += nx/
obj-$(CONFIG_CRYPTO_DEV_OMAP) += omap-crypto.o
obj-$(CONFIG_CRYPTO_DEV_OMAP_AES) += omap-aes-driver.o
diff --git a/drivers/crypto/nintendo-aes.c b/drivers/crypto/nintendo-aes.c
new file mode 100644
index 000000000000..79ae77500999
--- /dev/null
+++ b/drivers/crypto/nintendo-aes.c
@@ -0,0 +1,273 @@
+/*
+ * Copyright (C) 2021 Emmanuel Gil Peyrot <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/mod_devicetable.h>
+#include <linux/crypto.h>
+#include <linux/spinlock.h>
+#include <linux/platform_device.h>
+#include <crypto/aes.h>
+#include <crypto/internal/skcipher.h>
+
+#include <linux/io.h>
+#include <linux/delay.h>
+
+/* Addresses of the registers */
+#define AES_CTRL 0
+#define AES_SRC 4
+#define AES_DEST 8
+#define AES_KEY 12
+#define AES_IV 16
+
+#define AES_CTRL_EXEC 0x80000000
+#define AES_CTRL_EXEC_RESET 0x00000000
+#define AES_CTRL_EXEC_INIT 0x80000000
+#define AES_CTRL_IRQ 0x40000000
+#define AES_CTRL_ERR 0x20000000
+#define AES_CTRL_ENA 0x10000000
+#define AES_CTRL_DEC 0x08000000
+#define AES_CTRL_IV 0x00001000
+#define AES_CTRL_BLOCK 0x00000fff
+
+#define OP_TIMEOUT 0x1000
+
+#define AES_DIR_DECRYPT 0
+#define AES_DIR_ENCRYPT 1
+
+static void __iomem *base;
+static spinlock_t lock;
+
+/* Write a 128 bit field (either a writable key or IV) */
+static inline void
+writefield(u32 reg, const void *_value)
+{
+ const u32 *value = _value;
+ int i;
+
+ for (i = 0; i < 4; i++)
+ iowrite32be(value[i], base + reg);
+}
+
+static int
+do_crypt(const void *src, void *dst, u32 len, u32 flags)
+{
+ u32 blocks = ((len >> 4) - 1) & AES_CTRL_BLOCK;
+ u32 status;
+ u32 counter = OP_TIMEOUT;
+ u32 i;
+
+ /* Flush out all of src, we can’t know whether any of it is in cache */
+ for (i = 0; i < len; i += 32)
+ __asm__("dcbf 0, %0" : : "r" (src + i));
+ __asm__("sync" : : : "memory");
+
+ /* Set the addresses for DMA */
+ iowrite32be(virt_to_phys((void *)src), base + AES_SRC);
+ iowrite32be(virt_to_phys(dst), base + AES_DEST);
+
+ /* Start the operation */
+ iowrite32be(flags | blocks, base + AES_CTRL);
+
+ /* TODO: figure out how to use interrupts here, this will probably
+ * lower throughput but let the CPU do other things while the AES
+ * engine is doing its work. */
+ do {
+ status = ioread32be(base + AES_CTRL);
+ cpu_relax();
+ } while ((status & AES_CTRL_EXEC) && --counter);
+
+ /* Do we ever get called with dst ≠ src? If so we have to invalidate
+ * dst in addition to the earlier flush of src. */
+ if (unlikely(dst != src)) {
+ for (i = 0; i < len; i += 32)
+ __asm__("dcbi 0, %0" : : "r" (dst + i));
+ __asm__("sync" : : : "memory");
+ }
+
+ return counter ? 0 : 1;
+}
+
+static void
+nintendo_aes_crypt(const void *src, void *dst, u32 len, u8 *iv, int dir,
+ bool firstchunk)
+{
+ u32 flags = 0;
+ unsigned long iflags;
+ int ret;
+
+ flags |= AES_CTRL_EXEC_INIT /* | AES_CTRL_IRQ */ | AES_CTRL_ENA;
+
+ if (dir == AES_DIR_DECRYPT)
+ flags |= AES_CTRL_DEC;
+
+ if (!firstchunk)
+ flags |= AES_CTRL_IV;
+
+ /* Start the critical section */
+ spin_lock_irqsave(&lock, iflags);
+
+ if (firstchunk)
+ writefield(AES_IV, iv);
+
+ ret = do_crypt(src, dst, len, flags);
+ BUG_ON(ret);
+
+ spin_unlock_irqrestore(&lock, iflags);
+}
+
+static int nintendo_setkey_skcipher(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int len)
+{
+ /* The hardware only supports AES-128 */
+ if (len != AES_KEYSIZE_128)
+ return -EINVAL;
+
+ writefield(AES_KEY, key);
+ return 0;
+}
+
+static int nintendo_skcipher_crypt(struct skcipher_request *req, int dir)
+{
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+ char ivbuf[AES_BLOCK_SIZE];
+ unsigned int ivsize;
+
+ bool firstchunk = true;
+
+ /* Reset the engine */
+ iowrite32be(0, base + AES_CTRL);
+
+ err = skcipher_walk_virt(&walk, req, false);
+ ivsize = min(sizeof(ivbuf), walk.ivsize);
+
+ while ((nbytes = walk.nbytes) != 0) {
+ unsigned int chunkbytes = round_down(nbytes, AES_BLOCK_SIZE);
+ unsigned int ret = nbytes % AES_BLOCK_SIZE;
+
+ if (walk.total == chunkbytes && dir == AES_DIR_DECRYPT) {
+ /* If this is the last chunk and we're decrypting, take
+ * note of the IV (which is the last ciphertext block)
+ */
+ memcpy(ivbuf, walk.src.virt.addr + walk.total - ivsize,
+ ivsize);
+ }
+
+ nintendo_aes_crypt(walk.src.virt.addr, walk.dst.virt.addr,
+ chunkbytes, walk.iv, dir, firstchunk);
+
+ if (walk.total == chunkbytes && dir == AES_DIR_ENCRYPT) {
+ /* If this is the last chunk and we're encrypting, take
+ * note of the IV (which is the last ciphertext block)
+ */
+ memcpy(walk.iv,
+ walk.dst.virt.addr + walk.total - ivsize,
+ ivsize);
+ } else if (walk.total == chunkbytes && dir == AES_DIR_DECRYPT) {
+ memcpy(walk.iv, ivbuf, ivsize);
+ }
+
+ err = skcipher_walk_done(&walk, ret);
+ firstchunk = false;
+ }
+
+ return err;
+}
+
+static int nintendo_cbc_encrypt(struct skcipher_request *req)
+{
+ return nintendo_skcipher_crypt(req, AES_DIR_ENCRYPT);
+}
+
+static int nintendo_cbc_decrypt(struct skcipher_request *req)
+{
+ return nintendo_skcipher_crypt(req, AES_DIR_DECRYPT);
+}
+
+static struct skcipher_alg nintendo_alg = {
+ .base.cra_name = "cbc(aes)",
+ .base.cra_driver_name = "cbc-aes-nintendo",
+ .base.cra_priority = 400,
+ .base.cra_flags = CRYPTO_ALG_KERN_DRIVER_ONLY,
+ .base.cra_blocksize = AES_BLOCK_SIZE,
+ .base.cra_alignmask = 15,
+ .base.cra_module = THIS_MODULE,
+ .setkey = nintendo_setkey_skcipher,
+ .encrypt = nintendo_cbc_encrypt,
+ .decrypt = nintendo_cbc_decrypt,
+ .min_keysize = AES_KEYSIZE_128,
+ .max_keysize = AES_KEYSIZE_128,
+ .ivsize = AES_BLOCK_SIZE,
+};
+
+static int nintendo_aes_remove(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+
+ crypto_unregister_skcipher(&nintendo_alg);
+ devm_iounmap(dev, base);
+ base = NULL;
+
+ return 0;
+}
+
+static int nintendo_aes_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct resource *res;
+ int ret;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ base = devm_ioremap_resource(dev, res);
+ if (IS_ERR(base))
+ return PTR_ERR(base);
+
+ spin_lock_init(&lock);
+
+ ret = crypto_register_skcipher(&nintendo_alg);
+ if (ret)
+ goto eiomap;
+
+ dev_notice(dev, "Nintendo Wii and Wii U AES engine enabled\n");
+ return 0;
+
+ eiomap:
+ devm_iounmap(dev, base);
+
+ dev_err(dev, "Nintendo Wii and Wii U AES initialization failed\n");
+ return ret;
+}
+
+static const struct of_device_id nintendo_aes_of_match[] = {
+ { .compatible = "nintendo,hollywood-aes", },
+ { .compatible = "nintendo,latte-aes", },
+ {/* sentinel */},
+};
+MODULE_DEVICE_TABLE(of, nintendo_aes_of_match);
+
+static struct platform_driver nintendo_aes_driver = {
+ .driver = {
+ .name = "nintendo-aes",
+ .of_match_table = nintendo_aes_of_match,
+ },
+ .probe = nintendo_aes_probe,
+ .remove = nintendo_aes_remove,
+};
+
+module_platform_driver(nintendo_aes_driver);
+
+MODULE_AUTHOR("Emmanuel Gil Peyrot <[email protected]>");
+MODULE_DESCRIPTION("Nintendo Wii and Wii U Hardware AES driver");
+MODULE_LICENSE("GPL");
--
2.33.0
This selects the nintendo-aes module when building for this platform.
Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
---
arch/powerpc/configs/wii_defconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/configs/wii_defconfig b/arch/powerpc/configs/wii_defconfig
index 379c171f3ddd..752e081d28d0 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -123,4 +123,6 @@ CONFIG_SCHED_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_DMA_API_DEBUG=y
CONFIG_PPC_EARLY_DEBUG=y
-# CONFIG_CRYPTO_HW is not set
+CONFIG_CRYPTO_HW=y
+CONFIG_CRYPTO_DEV_NINTENDO=y
+CONFIG_CRYPTO_USER_API_SKCIPHER=y
--
2.33.0
On Tue, Sep 21, 2021 at 11:39:26PM +0200, Emmanuel Gil Peyrot wrote:
> This engine implements AES in CBC mode, using 128-bit keys only. It is
> present on both the Wii and the Wii U, and is apparently identical in
> both consoles.
>
> The hardware is capable of firing an interrupt when the operation is
> done, but this driver currently uses a busy loop, I’m not too sure
> whether it would be preferable to switch, nor how to achieve that.
>
> It also supports a mode where no operation is done, and thus could be
> used as a DMA copy engine, but I don’t know how to expose that to the
> kernel or whether it would even be useful.
>
> In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> speedup.
>
> This driver was written based on reversed documentation, see:
> https://wiibrew.org/wiki/Hardware/AES
>
> Emmanuel Gil Peyrot (4):
> crypto: nintendo-aes - add a new AES driver
> dt-bindings: nintendo-aes: Document the Wii and Wii U AES support
> powerpc: wii.dts: Expose the AES engine on this platform
> powerpc: wii_defconfig: Enable AES by default
Does this pass the self-tests, including the fuzz tests which are enabled by
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
- Eric
On Tue, Sep 21, 2021 at 02:59:37PM -0700, Eric Biggers wrote:
> On Tue, Sep 21, 2021 at 11:39:26PM +0200, Emmanuel Gil Peyrot wrote:
> > This engine implements AES in CBC mode, using 128-bit keys only. It is
> > present on both the Wii and the Wii U, and is apparently identical in
> > both consoles.
> >
> > The hardware is capable of firing an interrupt when the operation is
> > done, but this driver currently uses a busy loop, I’m not too sure
> > whether it would be preferable to switch, nor how to achieve that.
> >
> > It also supports a mode where no operation is done, and thus could be
> > used as a DMA copy engine, but I don’t know how to expose that to the
> > kernel or whether it would even be useful.
> >
> > In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> > aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> > speedup.
> >
> > This driver was written based on reversed documentation, see:
> > https://wiibrew.org/wiki/Hardware/AES
> >
> > Emmanuel Gil Peyrot (4):
> > crypto: nintendo-aes - add a new AES driver
> > dt-bindings: nintendo-aes: Document the Wii and Wii U AES support
> > powerpc: wii.dts: Expose the AES engine on this platform
> > powerpc: wii_defconfig: Enable AES by default
>
> Does this pass the self-tests, including the fuzz tests which are enabled by
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
I wasn’t aware of those, and indeed it doesn’t pass them yet:
[ 0.680164] alg: skcipher: cbc-aes-nintendo encryption overran dst buffer on test vector 0, cfg="out-of-place"
[ 0.680201] fbcon: Taking over console
[ 0.680219] ------------[ cut here ]------------
[ 0.680222] alg: self-tests for cbc-aes-nintendo (cbc(aes)) failed (rc=-75)
I’ll try to figure out how to debug this and I’ll send a v2, thanks for
the hint!
>
> - Eric
--
Emmanuel Gil Peyrot
On Tue, 21 Sept 2021 at 21:47, Emmanuel Gil Peyrot
<[email protected]> wrote:
>
> This engine implements AES in CBC mode, using 128-bit keys only. It is
> present on both the Wii and the Wii U, and is apparently identical in
> both consoles.
>
> The hardware is capable of firing an interrupt when the operation is
> done, but this driver currently uses a busy loop, I’m not too sure
> whether it would be preferable to switch, nor how to achieve that.
>
> It also supports a mode where no operation is done, and thus could be
> used as a DMA copy engine, but I don’t know how to expose that to the
> kernel or whether it would even be useful.
>
> In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> speedup.
>
> This driver was written based on reversed documentation, see:
> https://wiibrew.org/wiki/Hardware/AES
>
> Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
> Tested-by: Emmanuel Gil Peyrot <[email protected]> # on Wii U
> ---
> drivers/crypto/Kconfig | 11 ++
> drivers/crypto/Makefile | 1 +
> drivers/crypto/nintendo-aes.c | 273 ++++++++++++++++++++++++++++++++++
> 3 files changed, 285 insertions(+)
> create mode 100644 drivers/crypto/nintendo-aes.c
>
> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> index 9a4c275a1335..adc94ad7462d 100644
> --- a/drivers/crypto/Kconfig
> +++ b/drivers/crypto/Kconfig
> @@ -871,4 +871,15 @@ config CRYPTO_DEV_SA2UL
>
> source "drivers/crypto/keembay/Kconfig"
>
> +config CRYPTO_DEV_NINTENDO
> + tristate "Support for the Nintendo Wii U AES engine"
> + depends on WII || WIIU || COMPILE_TEST
This current seteup will allow the driver to be compile tested for
non-powerpc, which will fail on the dcbf instructions.
Perhaps use this instead:
depends on WII || WIIU || (COMPILE_TEST && PPC)
> + select CRYPTO_AES
> + help
> + Say 'Y' here to use the Nintendo Wii or Wii U on-board AES
> + engine for the CryptoAPI AES algorithm.
> +
> + To compile this driver as a module, choose M here: the module
> + will be called nintendo-aes.
> +
> endif # CRYPTO_HW
> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
> index fa22cb19e242..004dae7bbf39 100644
> --- a/drivers/crypto/Makefile
> +++ b/drivers/crypto/Makefile
> @@ -22,6 +22,7 @@ obj-$(CONFIG_CRYPTO_DEV_MARVELL) += marvell/
> obj-$(CONFIG_CRYPTO_DEV_MXS_DCP) += mxs-dcp.o
> obj-$(CONFIG_CRYPTO_DEV_NIAGARA2) += n2_crypto.o
> n2_crypto-y := n2_core.o n2_asm.o
> +obj-$(CONFIG_CRYPTO_DEV_NINTENDO) += nintendo-aes.o
> obj-$(CONFIG_CRYPTO_DEV_NX) += nx/
> obj-$(CONFIG_CRYPTO_DEV_OMAP) += omap-crypto.o
> obj-$(CONFIG_CRYPTO_DEV_OMAP_AES) += omap-aes-driver.o
> diff --git a/drivers/crypto/nintendo-aes.c b/drivers/crypto/nintendo-aes.c
> new file mode 100644
> index 000000000000..79ae77500999
> --- /dev/null
> +++ b/drivers/crypto/nintendo-aes.c
> @@ -0,0 +1,273 @@
> +/*
> + * Copyright (C) 2021 Emmanuel Gil Peyrot <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
The kernel uses the SDPX header instead of pasting the text.
> +static int
> +do_crypt(const void *src, void *dst, u32 len, u32 flags)
> +{
> + u32 blocks = ((len >> 4) - 1) & AES_CTRL_BLOCK;
> + u32 status;
> + u32 counter = OP_TIMEOUT;
> + u32 i;
> +
> + /* Flush out all of src, we can’t know whether any of it is in cache */
> + for (i = 0; i < len; i += 32)
> + __asm__("dcbf 0, %0" : : "r" (src + i));
> + __asm__("sync" : : : "memory");
This could be flush_dcache_range, from asm/cacheflush.h
> +
> + /* Set the addresses for DMA */
> + iowrite32be(virt_to_phys((void *)src), base + AES_SRC);
> + iowrite32be(virt_to_phys(dst), base + AES_DEST);
> +
> + /* Start the operation */
> + iowrite32be(flags | blocks, base + AES_CTRL);
> +
> + /* TODO: figure out how to use interrupts here, this will probably
> + * lower throughput but let the CPU do other things while the AES
> + * engine is doing its work. */
> + do {
> + status = ioread32be(base + AES_CTRL);
> + cpu_relax();
> + } while ((status & AES_CTRL_EXEC) && --counter);
You could add a msleep in here?
Consider using readl_poll_timeout().
Cheers,
Joel
> +
> + /* Do we ever get called with dst ≠ src? If so we have to invalidate
> + * dst in addition to the earlier flush of src. */
> + if (unlikely(dst != src)) {
> + for (i = 0; i < len; i += 32)
> + __asm__("dcbi 0, %0" : : "r" (dst + i));
> + __asm__("sync" : : : "memory");
> + }
> +
> + return counter ? 0 : 1;
> +}
Le Tue, Sep 21, 2021 at 11:39:27PM +0200, Emmanuel Gil Peyrot a écrit :
> This engine implements AES in CBC mode, using 128-bit keys only. It is
> present on both the Wii and the Wii U, and is apparently identical in
> both consoles.
>
> The hardware is capable of firing an interrupt when the operation is
> done, but this driver currently uses a busy loop, I’m not too sure
> whether it would be preferable to switch, nor how to achieve that.
>
> It also supports a mode where no operation is done, and thus could be
> used as a DMA copy engine, but I don’t know how to expose that to the
> kernel or whether it would even be useful.
>
> In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> speedup.
>
> This driver was written based on reversed documentation, see:
> https://wiibrew.org/wiki/Hardware/AES
>
> Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
> Tested-by: Emmanuel Gil Peyrot <[email protected]> # on Wii U
[...]
> +static int
> +do_crypt(const void *src, void *dst, u32 len, u32 flags)
> +{
> + u32 blocks = ((len >> 4) - 1) & AES_CTRL_BLOCK;
> + u32 status;
> + u32 counter = OP_TIMEOUT;
> + u32 i;
> +
> + /* Flush out all of src, we can’t know whether any of it is in cache */
> + for (i = 0; i < len; i += 32)
> + __asm__("dcbf 0, %0" : : "r" (src + i));
> + __asm__("sync" : : : "memory");
> +
> + /* Set the addresses for DMA */
> + iowrite32be(virt_to_phys((void *)src), base + AES_SRC);
> + iowrite32be(virt_to_phys(dst), base + AES_DEST);
Hello
Since you do DMA operation, I think you should use the DMA-API and call dma_map_xxx()
This will prevent the use of __asm__ and virt_to_phys().
Regards
On Tue, 21 Sept 2021 at 23:49, Emmanuel Gil Peyrot
<[email protected]> wrote:
>
> This engine implements AES in CBC mode, using 128-bit keys only. It is
> present on both the Wii and the Wii U, and is apparently identical in
> both consoles.
>
> The hardware is capable of firing an interrupt when the operation is
> done, but this driver currently uses a busy loop, I’m not too sure
> whether it would be preferable to switch, nor how to achieve that.
>
> It also supports a mode where no operation is done, and thus could be
> used as a DMA copy engine, but I don’t know how to expose that to the
> kernel or whether it would even be useful.
>
> In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> speedup.
>
> This driver was written based on reversed documentation, see:
> https://wiibrew.org/wiki/Hardware/AES
>
> Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
> Tested-by: Emmanuel Gil Peyrot <[email protected]> # on Wii U
This is redundant - everybody should test the code they submit.
...
> + /* TODO: figure out how to use interrupts here, this will probably
> + * lower throughput but let the CPU do other things while the AES
> + * engine is doing its work. */
So is it worthwhile like this? How much faster is it to use this
accelerator rather than the CPU?
> + do {
> + status = ioread32be(base + AES_CTRL);
> + cpu_relax();
> + } while ((status & AES_CTRL_EXEC) && --counter);
> +
> + /* Do we ever get called with dst ≠ src? If so we have to invalidate
> + * dst in addition to the earlier flush of src. */
> + if (unlikely(dst != src)) {
> + for (i = 0; i < len; i += 32)
> + __asm__("dcbi 0, %0" : : "r" (dst + i));
> + __asm__("sync" : : : "memory");
> + }
> +
> + return counter ? 0 : 1;
> +}
> +
> +static void
> +nintendo_aes_crypt(const void *src, void *dst, u32 len, u8 *iv, int dir,
> + bool firstchunk)
> +{
> + u32 flags = 0;
> + unsigned long iflags;
> + int ret;
> +
> + flags |= AES_CTRL_EXEC_INIT /* | AES_CTRL_IRQ */ | AES_CTRL_ENA;
> +
> + if (dir == AES_DIR_DECRYPT)
> + flags |= AES_CTRL_DEC;
> +
> + if (!firstchunk)
> + flags |= AES_CTRL_IV;
> +
> + /* Start the critical section */
> + spin_lock_irqsave(&lock, iflags);
> +
> + if (firstchunk)
> + writefield(AES_IV, iv);
> +
> + ret = do_crypt(src, dst, len, flags);
> + BUG_ON(ret);
> +
> + spin_unlock_irqrestore(&lock, iflags);
> +}
> +
> +static int nintendo_setkey_skcipher(struct crypto_skcipher *tfm, const u8 *key,
> + unsigned int len)
> +{
> + /* The hardware only supports AES-128 */
> + if (len != AES_KEYSIZE_128)
> + return -EINVAL;
> +
> + writefield(AES_KEY, key);
> + return 0;
> +}
> +
> +static int nintendo_skcipher_crypt(struct skcipher_request *req, int dir)
> +{
> + struct skcipher_walk walk;
> + unsigned int nbytes;
> + int err;
> + char ivbuf[AES_BLOCK_SIZE];
> + unsigned int ivsize;
> +
> + bool firstchunk = true;
> +
> + /* Reset the engine */
> + iowrite32be(0, base + AES_CTRL);
> +
> + err = skcipher_walk_virt(&walk, req, false);
> + ivsize = min(sizeof(ivbuf), walk.ivsize);
> +
> + while ((nbytes = walk.nbytes) != 0) {
> + unsigned int chunkbytes = round_down(nbytes, AES_BLOCK_SIZE);
> + unsigned int ret = nbytes % AES_BLOCK_SIZE;
> +
> + if (walk.total == chunkbytes && dir == AES_DIR_DECRYPT) {
> + /* If this is the last chunk and we're decrypting, take
> + * note of the IV (which is the last ciphertext block)
> + */
> + memcpy(ivbuf, walk.src.virt.addr + walk.total - ivsize,
> + ivsize);
> + }
> +
> + nintendo_aes_crypt(walk.src.virt.addr, walk.dst.virt.addr,
> + chunkbytes, walk.iv, dir, firstchunk);
> +
> + if (walk.total == chunkbytes && dir == AES_DIR_ENCRYPT) {
> + /* If this is the last chunk and we're encrypting, take
> + * note of the IV (which is the last ciphertext block)
> + */
> + memcpy(walk.iv,
> + walk.dst.virt.addr + walk.total - ivsize,
> + ivsize);
> + } else if (walk.total == chunkbytes && dir == AES_DIR_DECRYPT) {
> + memcpy(walk.iv, ivbuf, ivsize);
> + }
> +
> + err = skcipher_walk_done(&walk, ret);
> + firstchunk = false;
> + }
> +
> + return err;
> +}
> +
> +static int nintendo_cbc_encrypt(struct skcipher_request *req)
> +{
> + return nintendo_skcipher_crypt(req, AES_DIR_ENCRYPT);
> +}
> +
> +static int nintendo_cbc_decrypt(struct skcipher_request *req)
> +{
> + return nintendo_skcipher_crypt(req, AES_DIR_DECRYPT);
> +}
> +
> +static struct skcipher_alg nintendo_alg = {
> + .base.cra_name = "cbc(aes)",
> + .base.cra_driver_name = "cbc-aes-nintendo",
> + .base.cra_priority = 400,
> + .base.cra_flags = CRYPTO_ALG_KERN_DRIVER_ONLY,
> + .base.cra_blocksize = AES_BLOCK_SIZE,
> + .base.cra_alignmask = 15,
> + .base.cra_module = THIS_MODULE,
> + .setkey = nintendo_setkey_skcipher,
> + .encrypt = nintendo_cbc_encrypt,
> + .decrypt = nintendo_cbc_decrypt,
> + .min_keysize = AES_KEYSIZE_128,
> + .max_keysize = AES_KEYSIZE_128,
> + .ivsize = AES_BLOCK_SIZE,
> +};
> +
> +static int nintendo_aes_remove(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> +
> + crypto_unregister_skcipher(&nintendo_alg);
> + devm_iounmap(dev, base);
> + base = NULL;
> +
> + return 0;
> +}
> +
> +static int nintendo_aes_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct resource *res;
> + int ret;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + base = devm_ioremap_resource(dev, res);
> + if (IS_ERR(base))
> + return PTR_ERR(base);
> +
> + spin_lock_init(&lock);
> +
> + ret = crypto_register_skcipher(&nintendo_alg);
> + if (ret)
> + goto eiomap;
> +
> + dev_notice(dev, "Nintendo Wii and Wii U AES engine enabled\n");
> + return 0;
> +
> + eiomap:
> + devm_iounmap(dev, base);
> +
> + dev_err(dev, "Nintendo Wii and Wii U AES initialization failed\n");
> + return ret;
> +}
> +
> +static const struct of_device_id nintendo_aes_of_match[] = {
> + { .compatible = "nintendo,hollywood-aes", },
> + { .compatible = "nintendo,latte-aes", },
> + {/* sentinel */},
> +};
> +MODULE_DEVICE_TABLE(of, nintendo_aes_of_match);
> +
> +static struct platform_driver nintendo_aes_driver = {
> + .driver = {
> + .name = "nintendo-aes",
> + .of_match_table = nintendo_aes_of_match,
> + },
> + .probe = nintendo_aes_probe,
> + .remove = nintendo_aes_remove,
> +};
> +
> +module_platform_driver(nintendo_aes_driver);
> +
> +MODULE_AUTHOR("Emmanuel Gil Peyrot <[email protected]>");
> +MODULE_DESCRIPTION("Nintendo Wii and Wii U Hardware AES driver");
> +MODULE_LICENSE("GPL");
> --
> 2.33.0
>
On Wed, 22 Sept 2021 at 12:43, Emmanuel Gil Peyrot
<[email protected]> wrote:
>
> On Wed, Sep 22, 2021 at 12:10:41PM +0200, Ard Biesheuvel wrote:
> > On Tue, 21 Sept 2021 at 23:49, Emmanuel Gil Peyrot
> > <[email protected]> wrote:
> > >
> > > This engine implements AES in CBC mode, using 128-bit keys only. It is
> > > present on both the Wii and the Wii U, and is apparently identical in
> > > both consoles.
> > >
> > > The hardware is capable of firing an interrupt when the operation is
> > > done, but this driver currently uses a busy loop, I’m not too sure
> > > whether it would be preferable to switch, nor how to achieve that.
> > >
> > > It also supports a mode where no operation is done, and thus could be
> > > used as a DMA copy engine, but I don’t know how to expose that to the
> > > kernel or whether it would even be useful.
> > >
> > > In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> > > aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> > > speedup.
> > >
> > > This driver was written based on reversed documentation, see:
> > > https://wiibrew.org/wiki/Hardware/AES
> > >
> > > Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
> > > Tested-by: Emmanuel Gil Peyrot <[email protected]> # on Wii U
> >
> > This is redundant - everybody should test the code they submit.
>
> Indeed, except for the comment, as I haven’t been able to test on the
> Wii just yet and that’s kind of a call for doing exactly that. :)
>
> >
> > ...
> > > + /* TODO: figure out how to use interrupts here, this will probably
> > > + * lower throughput but let the CPU do other things while the AES
> > > + * engine is doing its work. */
> >
> > So is it worthwhile like this? How much faster is it to use this
> > accelerator rather than the CPU?
>
> As I mentioned above, on my hardware it reaches 80.7 MiB/s using this
> busy loop instead of 30.9 MiB/s using aes-generic, measured using
> `cryptsetup benchmark --cipher=aes --key-size=128`. I expect the
> difference would be even more pronounced on the Wii, with its CPU being
> clocked lower.
>
Ah apologies for not spotting that. This is a nice speedup.
> I will give a try at using the interrupt, but I fully expect a lower
> throughput alongside a lower CPU usage (for large requests).
>
You should consider latency as well. Is it really necessary to disable
interrupts as well? A scheduling blackout of ~1ms (for the worst case
of 64k of input @ 80 MB/s) may be tolerable but keeping interrupts
disabled for that long is probably not a great idea. (Just make sure
you use spin_lock_bh() to prevent deadlocks that could occur if your
code is called from softirq context)
But using the interrupt is obviously preferred. What's wrong with it?
Btw the crypto API does not permit AES-128 only - you will need to add
a fallback for other key sizes as well.
> >
> > > + do {
> > > + status = ioread32be(base + AES_CTRL);
> > > + cpu_relax();
> > > + } while ((status & AES_CTRL_EXEC) && --counter);
> > > +
> > > + /* Do we ever get called with dst ≠ src? If so we have to invalidate
> > > + * dst in addition to the earlier flush of src. */
> > > + if (unlikely(dst != src)) {
> > > + for (i = 0; i < len; i += 32)
> > > + __asm__("dcbi 0, %0" : : "r" (dst + i));
> > > + __asm__("sync" : : : "memory");
> > > + }
> > > +
> > > + return counter ? 0 : 1;
> > > +}
> > > +
> > > +static void
> > > +nintendo_aes_crypt(const void *src, void *dst, u32 len, u8 *iv, int dir,
> > > + bool firstchunk)
> > > +{
> > > + u32 flags = 0;
> > > + unsigned long iflags;
> > > + int ret;
> > > +
> > > + flags |= AES_CTRL_EXEC_INIT /* | AES_CTRL_IRQ */ | AES_CTRL_ENA;
> > > +
> > > + if (dir == AES_DIR_DECRYPT)
> > > + flags |= AES_CTRL_DEC;
> > > +
> > > + if (!firstchunk)
> > > + flags |= AES_CTRL_IV;
> > > +
> > > + /* Start the critical section */
> > > + spin_lock_irqsave(&lock, iflags);
> > > +
> > > + if (firstchunk)
> > > + writefield(AES_IV, iv);
> > > +
> > > + ret = do_crypt(src, dst, len, flags);
> > > + BUG_ON(ret);
> > > +
> > > + spin_unlock_irqrestore(&lock, iflags);
> > > +}
> > > +
> > > +static int nintendo_setkey_skcipher(struct crypto_skcipher *tfm, const u8 *key,
> > > + unsigned int len)
> > > +{
> > > + /* The hardware only supports AES-128 */
> > > + if (len != AES_KEYSIZE_128)
> > > + return -EINVAL;
> > > +
> > > + writefield(AES_KEY, key);
> > > + return 0;
> > > +}
> > > +
> > > +static int nintendo_skcipher_crypt(struct skcipher_request *req, int dir)
> > > +{
> > > + struct skcipher_walk walk;
> > > + unsigned int nbytes;
> > > + int err;
> > > + char ivbuf[AES_BLOCK_SIZE];
> > > + unsigned int ivsize;
> > > +
> > > + bool firstchunk = true;
> > > +
> > > + /* Reset the engine */
> > > + iowrite32be(0, base + AES_CTRL);
> > > +
> > > + err = skcipher_walk_virt(&walk, req, false);
> > > + ivsize = min(sizeof(ivbuf), walk.ivsize);
> > > +
> > > + while ((nbytes = walk.nbytes) != 0) {
> > > + unsigned int chunkbytes = round_down(nbytes, AES_BLOCK_SIZE);
> > > + unsigned int ret = nbytes % AES_BLOCK_SIZE;
> > > +
> > > + if (walk.total == chunkbytes && dir == AES_DIR_DECRYPT) {
> > > + /* If this is the last chunk and we're decrypting, take
> > > + * note of the IV (which is the last ciphertext block)
> > > + */
> > > + memcpy(ivbuf, walk.src.virt.addr + walk.total - ivsize,
> > > + ivsize);
> > > + }
> > > +
> > > + nintendo_aes_crypt(walk.src.virt.addr, walk.dst.virt.addr,
> > > + chunkbytes, walk.iv, dir, firstchunk);
> > > +
> > > + if (walk.total == chunkbytes && dir == AES_DIR_ENCRYPT) {
> > > + /* If this is the last chunk and we're encrypting, take
> > > + * note of the IV (which is the last ciphertext block)
> > > + */
> > > + memcpy(walk.iv,
> > > + walk.dst.virt.addr + walk.total - ivsize,
> > > + ivsize);
> > > + } else if (walk.total == chunkbytes && dir == AES_DIR_DECRYPT) {
> > > + memcpy(walk.iv, ivbuf, ivsize);
> > > + }
> > > +
> > > + err = skcipher_walk_done(&walk, ret);
> > > + firstchunk = false;
> > > + }
> > > +
> > > + return err;
> > > +}
> > > +
> > > +static int nintendo_cbc_encrypt(struct skcipher_request *req)
> > > +{
> > > + return nintendo_skcipher_crypt(req, AES_DIR_ENCRYPT);
> > > +}
> > > +
> > > +static int nintendo_cbc_decrypt(struct skcipher_request *req)
> > > +{
> > > + return nintendo_skcipher_crypt(req, AES_DIR_DECRYPT);
> > > +}
> > > +
> > > +static struct skcipher_alg nintendo_alg = {
> > > + .base.cra_name = "cbc(aes)",
> > > + .base.cra_driver_name = "cbc-aes-nintendo",
> > > + .base.cra_priority = 400,
> > > + .base.cra_flags = CRYPTO_ALG_KERN_DRIVER_ONLY,
> > > + .base.cra_blocksize = AES_BLOCK_SIZE,
> > > + .base.cra_alignmask = 15,
> > > + .base.cra_module = THIS_MODULE,
> > > + .setkey = nintendo_setkey_skcipher,
> > > + .encrypt = nintendo_cbc_encrypt,
> > > + .decrypt = nintendo_cbc_decrypt,
> > > + .min_keysize = AES_KEYSIZE_128,
> > > + .max_keysize = AES_KEYSIZE_128,
> > > + .ivsize = AES_BLOCK_SIZE,
> > > +};
> > > +
> > > +static int nintendo_aes_remove(struct platform_device *pdev)
> > > +{
> > > + struct device *dev = &pdev->dev;
> > > +
> > > + crypto_unregister_skcipher(&nintendo_alg);
> > > + devm_iounmap(dev, base);
> > > + base = NULL;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int nintendo_aes_probe(struct platform_device *pdev)
> > > +{
> > > + struct device *dev = &pdev->dev;
> > > + struct resource *res;
> > > + int ret;
> > > +
> > > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > + base = devm_ioremap_resource(dev, res);
> > > + if (IS_ERR(base))
> > > + return PTR_ERR(base);
> > > +
> > > + spin_lock_init(&lock);
> > > +
> > > + ret = crypto_register_skcipher(&nintendo_alg);
> > > + if (ret)
> > > + goto eiomap;
> > > +
> > > + dev_notice(dev, "Nintendo Wii and Wii U AES engine enabled\n");
> > > + return 0;
> > > +
> > > + eiomap:
> > > + devm_iounmap(dev, base);
> > > +
> > > + dev_err(dev, "Nintendo Wii and Wii U AES initialization failed\n");
> > > + return ret;
> > > +}
> > > +
> > > +static const struct of_device_id nintendo_aes_of_match[] = {
> > > + { .compatible = "nintendo,hollywood-aes", },
> > > + { .compatible = "nintendo,latte-aes", },
> > > + {/* sentinel */},
> > > +};
> > > +MODULE_DEVICE_TABLE(of, nintendo_aes_of_match);
> > > +
> > > +static struct platform_driver nintendo_aes_driver = {
> > > + .driver = {
> > > + .name = "nintendo-aes",
> > > + .of_match_table = nintendo_aes_of_match,
> > > + },
> > > + .probe = nintendo_aes_probe,
> > > + .remove = nintendo_aes_remove,
> > > +};
> > > +
> > > +module_platform_driver(nintendo_aes_driver);
> > > +
> > > +MODULE_AUTHOR("Emmanuel Gil Peyrot <[email protected]>");
> > > +MODULE_DESCRIPTION("Nintendo Wii and Wii U Hardware AES driver");
> > > +MODULE_LICENSE("GPL");
> > > --
> > > 2.33.0
> > >
>
> --
> Emmanuel Gil Peyrot
On Wed, Sep 22, 2021 at 4:12 AM Joel Stanley <[email protected]> wrote:
> On Tue, 21 Sept 2021 at 21:47, Emmanuel Gil Peyrot
> <[email protected]> wrote:
> >
> > This engine implements AES in CBC mode, using 128-bit keys only. It is
> > present on both the Wii and the Wii U, and is apparently identical in
> > both consoles.
> >
> > The hardware is capable of firing an interrupt when the operation is
> > done, but this driver currently uses a busy loop, I’m not too sure
> > whether it would be preferable to switch, nor how to achieve that.
> >
> > It also supports a mode where no operation is done, and thus could be
> > used as a DMA copy engine, but I don’t know how to expose that to the
> > kernel or whether it would even be useful.
> >
> > In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> > aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> > speedup.
> >
> > This driver was written based on reversed documentation, see:
> > https://wiibrew.org/wiki/Hardware/AES
> >
> > Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
> > Tested-by: Emmanuel Gil Peyrot <[email protected]> # on Wii U
> > ---
> > drivers/crypto/Kconfig | 11 ++
> > drivers/crypto/Makefile | 1 +
> > drivers/crypto/nintendo-aes.c | 273 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 285 insertions(+)
> > create mode 100644 drivers/crypto/nintendo-aes.c
> >
> > diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> > index 9a4c275a1335..adc94ad7462d 100644
> > --- a/drivers/crypto/Kconfig
> > +++ b/drivers/crypto/Kconfig
> > @@ -871,4 +871,15 @@ config CRYPTO_DEV_SA2UL
> >
> > source "drivers/crypto/keembay/Kconfig"
> >
> > +config CRYPTO_DEV_NINTENDO
> > + tristate "Support for the Nintendo Wii U AES engine"
> > + depends on WII || WIIU || COMPILE_TEST
>
> This current seteup will allow the driver to be compile tested for
> non-powerpc, which will fail on the dcbf instructions.
>
> Perhaps use this instead:
>
> depends on WII || WIIU || (COMPILE_TEST && PPC)
Or:
depends on PPC
depends on WII || WIIU || COMPILE_TEST
to distinguish between hard and soft dependencies.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds