2023-05-24 17:12:12

by Chang S. Bae

Subject: [PATCH v7 00/12] x86: Support Key Locker

Hi all,

The previous posting [v6] primarily delivered the hardware performance
update by revamping the old series. It then received multiple rounds of
feedback, and this revision incorporates changes to address it. So, the
primary goal here is to meet those reviewers' expectations.

The series has two parts -- changes to the x86 core and to its crypto
library. In this revision, most of the changes were made to the latter.
The changelogs were also revisited overall to be more reviewable.

Here is an overview of the feedback and changes (some trivial ones are skipped):

Part 1. Crypto Library (PATCH10-12):

PATCH12: AES-KL driver
- Merge all the AES-KL patches. (Eric Biggers)
- Make the driver for the 64-bit mode only. (Eric Biggers)
- Rework the key-size check code (Eric Biggers)
- Adjust the Kconfig change (Robert Elliott)
- Update the changelog. (Eric Biggers)
- Adjust the ASM code to return a proper error code. (Eric Biggers)

PATCH11: AES-NI rework
- Inline the helper code to avoid the indirect call. (Eric Biggers)
- Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
- Don't export symbols yet here. Instead, do it when needed later.
- Improve the coding style (Eric Biggers)

PATCH10: the AESNI-XTS field type fix
- Add as a new patch. (Eric Biggers)

Part 2. The X86 Core (PATCH1-9):

PATCH09: a chicken bit and a Kconfig option
- Rebase on the upstream: commit a894a8a56b57 ("Documentation:
kernel-parameters: sort all "no..." parameters")

PATCH08: the wrapping key recovery
- Limit the symbol export only when needed.

PATCH07: the wrapping key load at boot-time
- Use memzero_explicit() instead of memset(). (Robert Elliott)
- Improve the function prototype. (Eric Biggers and Dave Hansen)

PATCH01: documentation
- Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
documentation into Documentation/arch/"). (Nathan Huckleberry)

This version can also be found here:
git://github.com/intel-staging/keylocker.git kl-v7

[v6] -- the last posting:
https://lore.kernel.org/lkml/[email protected]/

Thanks,
Chang

Chang S. Bae (12):
Documentation/x86: Document Key Locker
x86/cpufeature: Enumerate Key Locker feature
x86/insn: Add Key Locker instructions to the opcode map
x86/asm: Add a wrapper function for the LOADIWKEY instruction
x86/msr-index: Add MSRs for Key Locker wrapping key
x86/keylocker: Define Key Locker CPUID leaf
x86/cpu/keylocker: Load a wrapping key at boot-time
x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
S3/4
x86/cpu: Add a configuration and command line option for Key Locker
crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
crypto: x86/aes - Prepare for a new AES implementation
crypto: x86/aes-kl - Implement the AES-XTS algorithm

.../admin-guide/kernel-parameters.txt | 2 +
Documentation/arch/x86/index.rst | 1 +
Documentation/arch/x86/keylocker.rst | 97 +++
arch/x86/Kconfig | 3 +
arch/x86/crypto/Kconfig | 22 +
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/aes-helper_asm.S | 22 +
arch/x86/crypto/aes-helper_glue.h | 161 +++++
arch/x86/crypto/aeskl-intel_asm.S | 580 ++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 216 +++++++
arch/x86/crypto/aesni-intel_asm.S | 55 +-
arch/x86/crypto/aesni-intel_glue.c | 242 +++-----
arch/x86/crypto/aesni-intel_glue.h | 17 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/keylocker.h | 45 ++
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/include/asm/special_insns.h | 32 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 21 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/keylocker.c | 212 +++++++
arch/x86/kernel/smpboot.c | 2 +
arch/x86/lib/x86-opcode-map.txt | 11 +-
arch/x86/power/cpu.c | 2 +
tools/arch/x86/lib/x86-opcode-map.txt | 11 +-
27 files changed, 1573 insertions(+), 203 deletions(-)
create mode 100644 Documentation/arch/x86/keylocker.rst
create mode 100644 arch/x86/crypto/aes-helper_asm.S
create mode 100644 arch/x86/crypto/aes-helper_glue.h
create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
create mode 100644 arch/x86/crypto/aesni-intel_glue.h
create mode 100644 arch/x86/include/asm/keylocker.h
create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 3a128547bd4425cdef27c606efc88e1eb03a2dba
--
2.17.1



2023-05-24 17:12:19

by Chang S. Bae

Subject: [PATCH v7 01/12] Documentation/x86: Document Key Locker

Document an overview of the feature along with relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
to process a 192-bit key.'
* Update the text for clarity and readability:
- Clarify the error code and exemplify the backup failure
- Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
Documentation/arch/x86/index.rst | 1 +
Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
2 files changed, 98 insertions(+)
create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index c73d133fd37c..256359c24669 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -42,3 +42,4 @@ x86-specific Documentation
features
elf_auxvec
xstate
+ keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..5557b8d0659a
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So a key handle that
+was encoded by an old wrapping key is no longer usable across a system
+shutdown or reboot.
+
+The key may also be lost in the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of a wrapping key restore failure upon resume from suspend,
+all established key handles become invalid. In-flight dm-crypt operations
+receive error results from pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem, the recovery is identical to a
+storage controller failing to resume from suspend: reboot. If the volume
+impacted by a wrapping key restore failure is a data volume, then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not at any later point such
+as resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails and sets RFLAGS.ZF. The
+  crypto library implementation checks this flag and returns -EINVAL.
+  Note that this flag is also set if the wrapping key is changed, e.g.,
+  because of a backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message when given a 192-bit key and then
+  falls back to AES-NI. So, this 192-bit key-size limitation is only
+  documented, not enforced. It means the key will remain in clear text in
+  memory. This is to meet the Linux crypto-cipher expectation that each
+  implementation must support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
+
--
2.17.1


2023-05-24 17:12:30

by Chang S. Bae

Subject: [PATCH v7 03/12] x86/insn: Add Key Locker instructions to the opcode map

The x86 instruction decoder needs to know about these new instructions,
which are going to be used in the crypto library as well as in the x86
core code. Add the following:

LOADIWKEY:
Load a CPU-internal wrapping key.

ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.

AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

The details can be found in the Intel Software Developer's Manual.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Massage the changelog -- add the reason a bit.

Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
Locker module is built.
---
arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
--
2.17.1


2023-05-24 17:12:44

by Chang S. Bae

Subject: [PATCH v7 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction

Key Locker introduces a CPU-internal wrapping key to encode a user key
into a key handle. The key handle is then referenced instead of the
plain-text key.

LOADIWKEY loads a wrapping key into the software-inaccessible CPU state.
It operates only in kernel mode.

The kernel will use this to load a new key at boot time. Establish a
wrapper to prepare for this use. Also, define struct iwkey to pass the
key value to it.
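
A minimal usage sketch of the wrapper, mirroring the boot-time load added
later in this series (the flow below is illustrative; the exact call sites
land in the boot-time patch):

	struct iwkey key;

	/* Fill the key storage with random bytes before loading it. */
	get_random_bytes(&key, sizeof(key));

	kernel_fpu_begin();
	load_xmm_iwkey(&key);	/* LOADIWKEY via XMM0-2, then flush the XMMs */
	kernel_fpu_end();

	/* Wipe the in-memory copy once the CPU state holds the key. */
	memzero_explicit(&key, sizeof(key));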

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Massage the changelog -- clarify the reason and the changes a bit.

Changes from v5:
* Fix a typo: kernel_cpu_begin() -> kernel_fpu_begin()

Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
Williams)

Note, Dan wondered if:
WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
---
arch/x86/include/asm/keylocker.h | 25 ++++++++++++++++++++++
arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
2 files changed, 57 insertions(+)
create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..9b3bec452b31
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary wrapping key storage.
+ * @integrity_key: A 128-bit key to check that key handles have not
+ * been tampered with.
+ * @encryption_key: A 256-bit encryption key used in
+ * wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after being loaded.
+ */
+struct iwkey {
+ struct reg_128_bit integrity_key;
+ struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index de48d1389936..dd2d8b40fce3 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
#include <asm/processor-flags.h>
#include <linux/irqflags.h>
#include <linux/jump_label.h>
+#include <asm/keylocker.h>

/*
* The compiler should not reorder volatile asm statements with respect to each
@@ -283,6 +284,37 @@ static __always_inline void tile_release(void)
asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
}

+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key: A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. Caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+ struct reg_128_bit zeros = { 0 };
+
+ asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+ :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+ "m"(key->encryption_key[1]));
+
+ /*
+ * LOADIWKEY %xmm1,%xmm2
+ *
+ * EAX and XMM0 are implicit operands. Load a key value
+ * from XMM0-2 to a software-invisible CPU state. With zero
+ * in EAX, CPU does not do hardware randomization and the key
+ * backup is allowed.
+ *
+ * This instruction is supported by binutils >= 2.36.
+ */
+ asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+ asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+ :: "m"(zeros));
+}
+
#endif /* __KERNEL__ */

#endif /* _ASM_X86_SPECIAL_INSNS_H */
--
2.17.1


2023-05-24 17:12:55

by Chang S. Bae

Subject: [PATCH v7 07/12] x86/cpu/keylocker: Load a wrapping key at boot-time

The wrapping key is the entity used to encode a clear-text key into a
key handle. This key is the pivot in protecting user keys, so its value
has to be randomized before being loaded into the software-invisible
CPU state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled when the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds maintenance burden without demonstrable benefit outside of
minimizing the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time. Ensure
the dynamic CPU bit (CPUID.AESKLE) is set before the load and flush
out these bytes after that.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt consumption, and that switching wrapping key per VM is
untenable, explicitly skip this setup in the X86_FEATURE_HYPERVISOR
case.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Switch to use 'static inline' for the empty functions, instead of
macro that disallows type checks. (Eric Biggers and Dave Hansen)
* Use memzero_explicit() to wipe out the key data instead of writing
the poison value over there. (Robert Elliott)
* Massage the changelog for the better readability.

Changes from v5:
* Call out the disabling when the feature is available on a virtual
machine. Then, it will turn off the feature flag

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
(Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note: Dan wondered whether, given that the only proposed Linux use case
for Key Locker is dm-crypt, the feature could be lazily enabled when the
first dm-crypt user arrives. But, as Dave notes, there is no precedent
for late enabling of CPU features, and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.
---
arch/x86/include/asm/keylocker.h | 9 ++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 5 +-
arch/x86/kernel/keylocker.c | 82 ++++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 2 +
5 files changed, 98 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1717e0866254..9040e10ab648 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__

+#include <asm/processor.h>
#include <linux/bits.h>
#include <asm/fpu/types.h>

@@ -28,5 +29,13 @@ struct iwkey {
#define KEYLOCKER_CPUID_EBX_WIDE BIT(2)
#define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)

+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
+static inline void destroy_keylocker_data(void) { }
+#endif
+
#endif /*__ASSEMBLY__ */
#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4070a01c11b7..9e2647320dc6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -134,6 +134,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
obj-$(CONFIG_X86_UMIP) += umip.o
+obj-$(CONFIG_X86_KEYLOCKER) += keylocker.o

obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o
obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f284e185aea..5882ff6e3c6b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,8 @@
#include <asm/microcode_intel.h>
#include <asm/intel-family.h>
#include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
#include <asm/uv/uv.h>
#include <asm/sigframe.h>
#include <asm/traps.h>
@@ -1834,10 +1836,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);

- /* Set up SMEP/SMAP/UMIP */
+ /* Setup various Intel-specific CPU security features */
setup_smep(c);
setup_smap(c);
setup_umip(c);
+ setup_keylocker(c);

/* Enable FSGSBASE instructions if available. */
if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..038e5268e6b8
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+ struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+ get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
+ get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+ memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+ kernel_fpu_begin();
+ load_xmm_iwkey(&kl_setup.key);
+ kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c: A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ goto out;
+
+ if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+ pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+ goto disable;
+ }
+
+ cr4_set_bits(X86_CR4_KEYLOCKER);
+
+ if (c == &boot_cpu_data) {
+ u32 eax, ebx, ecx, edx;
+
+ cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ /*
+ * Check the feature readiness via CPUID. Note that the
+ * CPUID AESKLE bit is conditionally set only when CR4.KL
+ * is set.
+ */
+ if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+ !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+ pr_debug("x86/keylocker: Not fully supported.\n");
+ goto disable;
+ }
+
+ generate_keylocker_data();
+ }
+
+ load_keylocker();
+
+ pr_info_once("x86/keylocker: Enabled.\n");
+ return;
+
+disable:
+ setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+ pr_info_once("x86/keylocker: Disabled.\n");
+out:
+ /* Make sure the feature is disabled for kexec-reboot. */
+ cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index e82be82cda55..41f24ab74fb9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -86,6 +86,7 @@
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
#include <asm/sev.h>
+#include <asm/keylocker.h>

/* representing HT siblings of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1386,6 +1387,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
nmi_selftest();
impress_friends();
cache_aps_init();
+ destroy_keylocker_data();
}

static int __initdata setup_possible_cpus = -1;
--
2.17.1


2023-05-24 17:13:26

by Chang S. Bae

Subject: [PATCH v7 05/12] x86/msr-index: Add MSRs for Key Locker wrapping key

The CPU state that contains the wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the
cache (like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs is provided as an abstract interface to save and
restore the wrapping key, and to check the key status. The wrapping
key is saved in a platform-scoped state of non-volatile media. The
backup itself and its path from the CPU are encrypted and integrity
protected.

Define those MSRs to be used to save and restore the key for S3/4
sleep states.

But the backup storage's non-volatility is not architecturally
guaranteed across off-states, such as S5 and G3. In that case, the
kernel may generate a new key on the next boot.
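
As a rough sketch of the intended use (the actual handling comes in a
later patch of this series), the backup is requested once after the key
is loaded, and copied back with a status check on wakeup:

	u64 status;

	/* Back up the loaded wrapping key into platform storage. */
	wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

	/* On wakeup, copy the backup into this CPU's state ... */
	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);

	/* ... and check that the copy succeeded (bit 0 of the status MSR). */
	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
	if (!(status & BIT(0)))
		pr_err("x86/keylocker: wrapping key copy failed.\n");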

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Tweak the changelog -- put the last for those about other sleep
states

Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
arch/x86/include/asm/msr-index.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..cd8555c0f3c2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1117,4 +1117,10 @@
* a #GP
*/

+/* MSRs for managing a CPU-internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS 0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS 0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM 0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL 0x00000d92
+
#endif /* _ASM_X86_MSR_INDEX_H */
--
2.17.1


2023-05-24 17:13:42

by Chang S. Bae

Subject: [PATCH v7 06/12] x86/keylocker: Define Key Locker CPUID leaf

Both the Key Locker enabling code in the x86 core and the AES Key
Locker code in the crypto library will need to reference some CPUID
bits. Define the feature-specific CPUID leaf and bits.
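
For reference, the boot code added later in this series uses these
defines roughly as follows to confirm that the feature is fully usable:

	u32 eax, ebx, ecx, edx;

	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
	if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
	    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
		/* Not fully supported -- disable the feature. */
	}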

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Tweak the changelog -- comment the reason first and then brief the
change.

Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/include/asm/keylocker.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9b3bec452b31..1717e0866254 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__

+#include <linux/bits.h>
#include <asm/fpu/types.h>

/**
@@ -21,5 +22,11 @@ struct iwkey {
struct reg_128_bit encryption_key[2];
};

+#define KEYLOCKER_CPUID 0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)
+
#endif /*__ASSEMBLY__ */
#endif /* _ASM_KEYLOCKER_H */
--
2.17.1


2023-05-24 17:13:47

by Chang S. Bae

Subject: [PATCH v7 08/12] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4

The primary use case for the feature is bare metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is used to unlock
the volume where the hibernation image is stored, dm-crypt still
expects to reuse the key handles within the hibernation image once it
is loaded.

== Wrapping-key Restore ==

So the motivation is to meet dm-crypt's expectation that the key
handles in the suspend image remain valid after resume from an S-state.
But, when the system enters the ACPI S3 or S4 sleep states, the
wrapping key is discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. So, request a backup right after the key is
loaded at boot time, and copy it back to each CPU upon wakeup. If the
backup mechanism is not available, the entirety of Key Locker has to
be disabled, unless CONFIG_SUSPEND=n.

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with an
initialized wrapping key state. This has the effect of invalidating
any key handles that might be present in a suspend-image. When this
happens, dm-crypt will see I/O errors resulting from error returns
from crypto_skcipher_encrypt()/decrypt().

While this will disrupt operations in the current boot, data is not at
risk and access is restored with new handles created by the new
wrapping key at the next boot. Also, manage a feature-specific flag to
communicate with the crypto implementation. This ensures that the
AES-KL instructions are no longer used after the key restore failure,
without turning off the feature entirely.

== Off-states ==

While the backup may be maintained in non-volatile media across the S5
and G3 "off" states, that is neither architecturally guaranteed nor
expected by dm-crypt, which prompts for the key whenever the volume is
started. A reboot with a new wrapping key covers this case.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Limit the symbol export only when needed.
* Improve the coding style -- reduce an indent after
'if () { ... return; }', and tweak the comment along with that.
(Eric Biggers)
* Improve the function prototype, instead of using a macro. (Eric
Biggers and Dave Hansen)
* Update the documentation:
- Massage the changelog to clarify the problem-and-solution by
sections
- Clarify the comment about the key restore failure.

Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
(Reported by Marvin Hsu [email protected]) Add the function
comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
fall through the end that disables the feature. Otherwise, all the
successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
arch/x86/include/asm/keylocker.h | 4 +
arch/x86/kernel/keylocker.c | 136 ++++++++++++++++++++++++++++++-
arch/x86/power/cpu.c | 2 +
3 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9040e10ab648..8621234c5897 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
#ifdef CONFIG_X86_KEYLOCKER
void setup_keylocker(struct cpuinfo_x86 *c);
void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
#else
static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
static inline void destroy_keylocker_data(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
#endif

#endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 038e5268e6b8..8f3acc34b6da 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,20 +11,48 @@
#include <asm/fpu/api.h>
#include <asm/keylocker.h>
#include <asm/tlbflush.h>
+#include <asm/msr.h>

static __initdata struct keylocker_setup_data {
+ bool initialized;
struct iwkey key;
} kl_setup;

+/*
+ * This flag is set when the wrapping key is loaded. It is cleared
+ * when the key restore fails. This state is exported to the crypto
+ * library, so that Key Locker will not be used there. In effect, the
+ * feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+ return valid_kl;
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(valid_keylocker);
+#endif
+
static void __init generate_keylocker_data(void)
{
get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
}

+/*
+ * This is invoked when boot-up is finished, which means the wrapping
+ * key is loaded. The 'valid_kl' flag is then set here as the feature
+ * is enabled.
+ */
void __init destroy_keylocker_data(void)
{
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ return;
+
memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+ kl_setup.initialized = true;
+ valid_kl = true;
}

static void __init load_keylocker(void)
@@ -34,6 +62,27 @@ static void __init load_keylocker(void)
kernel_fpu_end();
}

+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns: -EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+ u64 status;
+
+ wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+ rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+ if (status & BIT(0))
+ return 0;
+ else
+ return -EBUSY;
+}
+
/**
* setup_keylocker - Enable the feature.
* @c: A pointer to struct cpuinfo_x86
@@ -52,6 +101,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)

if (c == &boot_cpu_data) {
u32 eax, ebx, ecx, edx;
+ bool backup_available;

cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
/*
@@ -65,13 +115,53 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
goto disable;
}

+ backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+ /*
+ * The wrapping key in CPU state is volatile in S3/4
+ * states. So ensure the backup capability along with
+ * S-states.
+ */
+ if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+ pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+ goto disable;
+ }
+
generate_keylocker_data();
+ load_keylocker();
+
+ /* Backup a wrapping key in non-volatile media. */
+ if (backup_available)
+ wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+
+ pr_info("x86/keylocker: Enabled.\n");
+ return;
}

- load_keylocker();
+ /*
+ * At boot time, the key is loaded directly from the memory.
+ * Otherwise, this path performs the key recovery on each CPU
+ * wake-up. Then, the backup in the platform-scoped state is
+ * copied to the CPU state.
+ */
+ if (!kl_setup.initialized) {
+ load_keylocker();
+ return;
+ } else if (valid_kl) {
+ int rc;

- pr_info_once("x86/keylocker: Enabled.\n");
- return;
+ rc = copy_keylocker();
+ if (!rc)
+ return;
+
+ /*
+ * The key copy failed here. The subsequent feature
+ * use will have inconsistent keys and failures. So,
+ * invalidate the feature via the flag like with the
+ * backup failure.
+ */
+ valid_kl = false;
+ pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+ }

disable:
setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
@@ -80,3 +170,43 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
/* Make sure the feature disabled for kexec-reboot. */
cr4_clear_bits(X86_CR4_KEYLOCKER);
}
+
+/**
+ * restore_keylocker - Restore the wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the
+ * setup function.
+ */
+void restore_keylocker(void)
+{
+ u64 backup_status;
+ int rc;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+ return;
+
+ /*
+ * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that
+ * indicates an invalid backup if bit 0 is set and a read (or
+ * write) error if bit 2 is set.
+ */
+ rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+ if (backup_status & BIT(0)) {
+ rc = copy_keylocker();
+ if (rc)
+ pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+ else
+ return;
+ } else {
+ pr_err("x86/keylocker: The key backup access failed with %s.\n",
+ (backup_status & BIT(2)) ? "read error" : "invalid status");
+ }
+
+ /*
+ * Now the backup key is not available. Invalidate the feature
+ * via the flag to avoid any subsequent use. But keep the
+ * feature with zeroed wrapping key instead of disabling it.
+ */
+ pr_err("x86/keylocker: Failed to restore wrapping key.\n");
+ valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
#include <asm/mmu_context.h>
#include <asm/cpu_device_id.h>
#include <asm/microcode.h>
+#include <asm/keylocker.h>

#ifdef CONFIG_X86_32
__visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
x86_platform.restore_sched_clock_state();
cache_bp_restore();
perf_restore_debug_store();
+ restore_keylocker();

c = &cpu_data(smp_processor_id());
if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
--
2.17.1


2023-05-24 17:14:23

by Chang S. Bae

Subject: [PATCH v7 09/12] x86/cpu: Add a configuration and command line option for Key Locker

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at
boot. The option is selected by the Key Locker cipher module
CRYPTO_AES_KL (to be added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Rebase on the upstream: commit a894a8a56b57 ("Documentation:
kernel-parameters: sort all "no..." parameters")

Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
Documentation/admin-guide/kernel-parameters.txt | 2 ++
arch/x86/Kconfig | 3 +++
arch/x86/kernel/cpu/common.c | 16 ++++++++++++++++
3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c1247ec4589a..b42fc53cbcf9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3749,6 +3749,8 @@
kernel and module base offset ASLR (Address Space
Layout Randomization).

+ nokeylocker [X86] Disable Key Locker hardware feature.
+
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a98c5f82be48..f9788b477db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1879,6 +1879,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS

If unsure, say y.

+config X86_KEYLOCKER
+ bool
+
choice
prompt "TSX enable mode"
depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5882ff6e3c6b..718ff1b1d6dd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -402,6 +402,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
static const unsigned long cr4_pinned_mask =
X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
X86_CR4_FSGSBASE | X86_CR4_CET;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+ /* Expect an exact match without trailing characters. */
+ if (strlen(arg))
+ return 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+ pr_info("x86/keylocker: Disabled by kernel command line.\n");
+ return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

--
2.17.1


2023-05-24 17:14:35

by Chang S. Bae

Subject: [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

Every field in struct aesni_xts_ctx is a byte array sized to hold a
struct crypto_aes_ctx. Each field can therefore be redefined with that
struct type instead of the obscure byte array.

Subsequently, the address of struct aesni_xts_ctx should be aligned up
front. This also simplifies the runtime alignment code.

Redefine struct aesni_xts_ctx, and align its address up front. This
involves refactoring the common alignment code, and then cleaning up
the alignment of the old fields.

Suggested-by: Eric Biggers <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Add as a new patch. (Eric Biggers)

This fix was considered to be better addressed before the preparatory
AES-NI code rework.
---
arch/x86/crypto/aesni-intel_glue.c | 38 +++++++++++++++++-------------
1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..97a1629b84c4 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -61,8 +61,8 @@ struct generic_gcmaes_ctx {
};

struct aesni_xts_ctx {
- u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
- u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+ struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
+ struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
};

#define GCM_BLOCK_LEN 16
@@ -219,14 +219,20 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
}
#endif

+static inline unsigned long aes_align_addr(unsigned long addr)
+{
+ return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
+ ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
+}
+
static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
{
- unsigned long addr = (unsigned long)raw_ctx;
- unsigned long align = AESNI_ALIGN;
+ return (struct crypto_aes_ctx *)aes_align_addr((unsigned long)raw_ctx);
+}

- if (align <= crypto_tfm_ctx_alignment())
- align = 1;
- return (struct crypto_aes_ctx *)ALIGN(addr, align);
+static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+ return (struct aesni_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
}

static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
@@ -883,7 +889,7 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
- struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
int err;

err = xts_verify_key(tfm, key, keylen);
@@ -893,20 +899,20 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
keylen /= 2;

/* first half of xts-key is for crypt */
- err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
+ err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
key, keylen);
if (err)
return err;

/* second half of xts-key is for tweak */
- return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
+ return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
key + keylen, keylen);
}

static int xts_crypt(struct skcipher_request *req, bool encrypt)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
int tail = req->cryptlen % AES_BLOCK_SIZE;
struct skcipher_request subreq;
struct skcipher_walk walk;
@@ -942,7 +948,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
kernel_fpu_begin();

/* calculate first value of T */
- aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+ aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);

while (walk.nbytes > 0) {
int nbytes = walk.nbytes;
@@ -951,11 +957,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
nbytes &= ~(AES_BLOCK_SIZE - 1);

if (encrypt)
- aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_encrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
nbytes, walk.iv);
else
- aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_decrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
nbytes, walk.iv);
kernel_fpu_end();
@@ -983,11 +989,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)

kernel_fpu_begin();
if (encrypt)
- aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_encrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes, walk.iv);
else
- aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_decrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes, walk.iv);
kernel_fpu_end();
--
2.17.1


2023-05-24 17:14:56

by Chang S. Bae

Subject: [PATCH v7 11/12] crypto: x86/aes - Prepare for a new AES implementation

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI. The internal ABI in the assembly code will have the
same prototype as AES-NI's. The glue code will then be the same as
AES-NI's.

The new AES code will support the XTS mode alone as disk encryption is
the only intended use case.

Refactor the XTS-related code to avoid code duplication, and also move
some constant values so they can be shared. Introduce wrappers for the
data transformation functions that return an error code, as AES-KL may
populate one.

The refactored code needs to invoke implementation-specific functions.
To remain reusable while avoiding the possible overhead of an indirect
call, the code has to be inlined into its caller.

Neither functional change nor performance regression is intended.
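
For instance, an implementation's glue code is expected to wrap the
inlined helper with its own callbacks, roughly like the sketch below
(the callback names here are illustrative placeholders, not the exact
symbols used in this series):

	static int xts_encrypt(struct skcipher_request *req)
	{
		/* The callbacks are compile-time constants, so no indirect call. */
		return xts_crypt_common(req, xts_encrypt_fn, encrypt_iv_fn);
	}

Since xts_crypt_common() is a static inline that takes the callbacks as
constant arguments, the compiler can resolve them directly, which is the
point of inlining the helper.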

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
- Follow the symbol convention: '_' -> '__' (Eric Biggers)
- Fix a style issue -- 'dst = src = ...' caught by checkpatch.pl:
"CHECK: multiple assignments should be avoided"
* Cleanup: move some define back to AES-NI code as not used by AES-KL

Changes from v5:
* Clean up the staled function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
arch/x86/crypto/aes-helper_asm.S | 22 +++
arch/x86/crypto/aes-helper_glue.h | 161 +++++++++++++++++++++
arch/x86/crypto/aesni-intel_asm.S | 47 +++----
arch/x86/crypto/aesni-intel_glue.c | 218 ++++++++---------------------
4 files changed, 257 insertions(+), 191 deletions(-)
create mode 100644 arch/x86/crypto/aes-helper_asm.S
create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+ .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section .rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+ .octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..af422e2c9263
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for performance reasons. With mitigations
+ * for speculative execution, such as retpoline, indirect calls become
+ * very expensive, adding measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN 16
+#define AES_ALIGN_ATTR __attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA ((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE (sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+ struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+ struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline unsigned long aes_align_addr(unsigned long addr)
+{
+ return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? ALIGN(addr, 1) : ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+ return (struct aes_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+ int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+ unsigned int key_len))
+{
+ struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ /* first half of xts-key is for crypt */
+ err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+ if (err)
+ return err;
+
+ /* second half of xts-key is for tweak */
+ return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+ int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv),
+ int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+ int tail = req->cryptlen % AES_BLOCK_SIZE;
+ struct skcipher_request subreq;
+ struct skcipher_walk walk;
+ int err;
+
+ if (req->cryptlen < AES_BLOCK_SIZE)
+ return -EINVAL;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ if (!walk.nbytes)
+ return err;
+
+ if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+ int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+ skcipher_walk_abort(&walk);
+
+ skcipher_request_set_tfm(&subreq, tfm);
+ skcipher_request_set_callback(&subreq,
+ skcipher_request_flags(req),
+ NULL, NULL);
+ skcipher_request_set_crypt(&subreq, req->src, req->dst,
+ blocks * AES_BLOCK_SIZE, req->iv);
+ req = &subreq;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ if (!walk.nbytes)
+ return err;
+ } else {
+ tail = 0;
+ }
+
+ kernel_fpu_begin();
+
+ /* calculate first value of T */
+ err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+ if (err) {
+ kernel_fpu_end();
+ return err;
+ }
+
+ while (walk.nbytes > 0) {
+ int nbytes = walk.nbytes;
+
+ if (nbytes < walk.total)
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+ err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes, walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+ if (walk.nbytes > 0)
+ kernel_fpu_begin();
+ }
+
+ if (unlikely(tail > 0 && !err)) {
+ struct scatterlist sg_src[2], sg_dst[2];
+ struct scatterlist *src, *dst;
+
+ src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+ else
+ dst = src;
+
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
+
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
+
+ kernel_fpu_begin();
+ err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes, walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, 0);
+ }
+ return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3ac7487ecad2..3922d24cae2b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
#include <linux/linkage.h>
#include <asm/frame.h>
#include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"

/*
* The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1935,9 +1936,9 @@ SYM_FUNC_START(aesni_set_key)
SYM_FUNC_END(aesni_set_key)

/*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
*/
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
#endif
FRAME_END
RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)

/*
* _aesni_enc1: internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
SYM_FUNC_END(_aesni_enc4)

/*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
*/
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
#endif
FRAME_END
RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)

/*
* _aesni_dec1: internal ABI
@@ -2689,22 +2690,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
RET
SYM_FUNC_END(aesni_cts_cbc_dec)

+#ifdef __x86_64__
+
.pushsection .rodata
.align 16
-.Lcts_permute_table:
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
- .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
.Lbswap_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
.popsection

-#ifdef __x86_64__
/*
* _aesni_inc_init: internal ABI
* setup registers used by _aesni_inc
@@ -2819,12 +2812,6 @@ SYM_FUNC_END(aesni_ctr_enc)

#endif

-.section .rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
- .octa 0x00000000000000010000000000000087
-.previous
-
/*
* _aesni_gf128mul_x_ble: internal ABI
* Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2831,10 @@ SYM_FUNC_END(aesni_ctr_enc)
pxor KEY, IV;

/*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- * const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
*/
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -2996,13 +2983,13 @@ SYM_FUNC_START(aesni_xts_encrypt)

movups STATE, (OUTP)
jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)

/*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- * const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
*/
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -3158,4 +3145,4 @@ SYM_FUNC_START(aesni_xts_decrypt)

movups STATE, (OUTP)
jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 97a1629b84c4..857cff484bb3 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
#include <linux/spinlock.h>
#include <linux/static_call.h>

+#include "aes-helper_glue.h"

-#define AESNI_ALIGN 16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
#define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)

/* This data is stored at the end of the crypto_tfm struct.
* It's a type of per "session" data storage location.
* This needs to be 16 byte aligned.
*/
struct aesni_rfc4106_gcm_ctx {
- u8 hash_subkey[16] AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+ u8 hash_subkey[16] AES_ALIGN_ATTR;
+ struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
u8 nonce[4];
};

struct generic_gcmaes_ctx {
- u8 hash_subkey[16] AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
- struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+ u8 hash_subkey[16] AES_ALIGN_ATTR;
+ struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
};

#define GCM_BLOCK_LEN 16
@@ -82,8 +74,8 @@ struct gcm_context_data {

asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len);
asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +89,40 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);

+static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+ __aesni_enc(ctx, out, in);
+ return 0;
+}
+
+static int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+ __aesni_dec(ctx, out, in);
+ return 0;
+}
+
#define AVX_GEN2_OPTSIZE 640
#define AVX_GEN4_OPTSIZE 4096

-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv);

-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv);
+
+static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ __aesni_xts_encrypt(ctx, out, in, len, iv);
+ return 0;
+}
+
+static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ __aesni_xts_decrypt(ctx, out, in, len, iv);
+ return 0;
+}

#ifdef CONFIG_X86_64

@@ -201,7 +219,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{
- unsigned long align = AESNI_ALIGN;
+ unsigned long align = AES_ALIGN;

if (align <= crypto_tfm_ctx_alignment())
align = 1;
@@ -211,7 +229,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
static inline struct
generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
{
- unsigned long align = AESNI_ALIGN;
+ unsigned long align = AES_ALIGN;

if (align <= crypto_tfm_ctx_alignment())
align = 1;
@@ -219,22 +237,11 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
}
#endif

-static inline unsigned long aes_align_addr(unsigned long addr)
-{
- return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
- ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
-}
-
static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
{
return (struct crypto_aes_ctx *)aes_align_addr((unsigned long)raw_ctx);
}

-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
- return (struct aesni_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
-}
-
static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
const u8 *in_key, unsigned int key_len)
{
@@ -270,7 +277,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
aes_encrypt(ctx, dst, src);
} else {
kernel_fpu_begin();
- aesni_enc(ctx, dst, src);
+ __aesni_enc(ctx, dst, src);
kernel_fpu_end();
}
}
@@ -283,7 +290,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
aes_decrypt(ctx, dst, src);
} else {
kernel_fpu_begin();
- aesni_dec(ctx, dst, src);
+ __aesni_dec(ctx, dst, src);
kernel_fpu_end();
}
}
@@ -534,7 +541,7 @@ static int ctr_crypt(struct skcipher_request *req)
nbytes &= ~AES_BLOCK_MASK;

if (walk.nbytes == walk.total && nbytes > 0) {
- aesni_enc(ctx, keystream, walk.iv);
+ __aesni_enc(ctx, keystream, walk.iv);
crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
walk.src.virt.addr + walk.nbytes - nbytes,
keystream, nbytes);
@@ -679,8 +686,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
u8 *iv, void *aes_ctx, u8 *auth_tag,
unsigned long auth_tag_len)
{
- u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
- struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+ u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+ struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
unsigned long left = req->cryptlen;
struct scatter_walk assoc_sg_walk;
struct skcipher_walk walk;
@@ -835,8 +842,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
unsigned int i;
__be32 counter = cpu_to_be32(1);

@@ -863,8 +870,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
unsigned int i;

if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -889,128 +896,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
- struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
- int err;
-
- err = xts_verify_key(tfm, key, keylen);
- if (err)
- return err;
-
- keylen /= 2;
-
- /* first half of xts-key is for crypt */
- err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
- key, keylen);
- if (err)
- return err;
-
- /* second half of xts-key is for tweak */
- return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
- key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
- int tail = req->cryptlen % AES_BLOCK_SIZE;
- struct skcipher_request subreq;
- struct skcipher_walk walk;
- int err;
-
- if (req->cryptlen < AES_BLOCK_SIZE)
- return -EINVAL;
-
- err = skcipher_walk_virt(&walk, req, false);
- if (!walk.nbytes)
- return err;
-
- if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
- int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
- skcipher_walk_abort(&walk);
-
- skcipher_request_set_tfm(&subreq, tfm);
- skcipher_request_set_callback(&subreq,
- skcipher_request_flags(req),
- NULL, NULL);
- skcipher_request_set_crypt(&subreq, req->src, req->dst,
- blocks * AES_BLOCK_SIZE, req->iv);
- req = &subreq;
-
- err = skcipher_walk_virt(&walk, req, false);
- if (!walk.nbytes)
- return err;
- } else {
- tail = 0;
- }
-
- kernel_fpu_begin();
-
- /* calculate first value of T */
- aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
- while (walk.nbytes > 0) {
- int nbytes = walk.nbytes;
-
- if (nbytes < walk.total)
- nbytes &= ~(AES_BLOCK_SIZE - 1);
-
- if (encrypt)
- aesni_xts_encrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- nbytes, walk.iv);
- else
- aesni_xts_decrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- nbytes, walk.iv);
- kernel_fpu_end();
-
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
- if (walk.nbytes > 0)
- kernel_fpu_begin();
- }
-
- if (unlikely(tail > 0 && !err)) {
- struct scatterlist sg_src[2], sg_dst[2];
- struct scatterlist *src, *dst;
-
- dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
- skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
- req->iv);
-
- err = skcipher_walk_virt(&walk, &subreq, false);
- if (err)
- return err;
-
- kernel_fpu_begin();
- if (encrypt)
- aesni_xts_encrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- walk.nbytes, walk.iv);
- else
- aesni_xts_decrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- walk.nbytes, walk.iv);
- kernel_fpu_end();
-
- err = skcipher_walk_done(&walk, 0);
- }
- return err;
+ return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
}

static int xts_encrypt(struct skcipher_request *req)
{
- return xts_crypt(req, true);
+ return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
}

static int xts_decrypt(struct skcipher_request *req)
{
- return xts_crypt(req, false);
+ return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
}

static struct crypto_alg aesni_cipher_alg = {
@@ -1166,8 +1062,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
__be32 counter = cpu_to_be32(1);

memcpy(iv, req->iv, 12);
@@ -1183,8 +1079,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);

memcpy(iv, req->iv, 12);
*((__be32 *)(iv+12)) = counter;
--
2.17.1


2023-05-24 17:16:02

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called a 'key handle',
to reduce the exposure of private key material in memory.

This key conversion, as well as all subsequent data transformations,
is provided by new AES instructions ('AES-KL'). AES-KL is analogous
to AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The
implementation has some details worth mentioning -- ways in which it
differs from other AES implementations -- that users may need to be
aware of:

== Key Handle Restriction ==

A key handle may be encoded with usage restrictions. Restrict every
handle created via setkey() to kernel-mode use only.

A key handle may later become corrupted, or an operation may violate
the handle's restrictions. In either case, encrypt()/decrypt() returns
-EINVAL.
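
As a rough illustration -- not part of this patch -- a kernel user
keeps going through the regular skcipher interface; the function and
variable names below are a hypothetical sketch, and the only AES-KL
specific aspect is that the context populated by setkey() holds a key
handle rather than an expanded key:

/*
 * Hypothetical caller sketch, not from this series: plain kernel
 * crypto API usage against "xts(aes)". Whichever driver backs the
 * tfm, setkey() fills the driver context; with AES-KL it stores a
 * key handle instead of an expanded key and may fail as described
 * above.
 */
#include <crypto/skcipher.h>
#include <linux/err.h>

static int example_xts_setkey(const u8 *key, unsigned int keylen)
{
	struct crypto_skcipher *tfm;
	int err;

	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	/* keylen is twice the AES key size for XTS, e.g. 64 bytes */
	err = crypto_skcipher_setkey(tfm, key, keylen);

	crypto_free_skcipher(tfm);
	return err;
}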

== API Limitation ==

The setkey() function transforms an AES key into a handle. But, an
extended key is a usual outcome of setkey() in other AES cipher
implementations. For this reason, a setkey() failure does not fall
back to the other. So, expose AES-KL methods via synchronous
interfaces only.

== AES Compliance ==

Key Locker is not AES compliant, as it lacks 192-bit key support.
However, Linux crypto-cipher implementations are expected to support
all the AES-compliant key sizes.

The AES-KL cipher implementation meets this expectation by logging a
warning and falling back to AES-NI. In other words, the 192-bit
key-size limitation for what can be converted into a key handle is
only documented, not enforced.
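
Condensed from the glue code later in this patch (sketch only, not
the literal hunk), the fallback is a key-length dispatch in the
setkey path:

/*
 * Condensed sketch of the dispatch done by aeskl_setkey_common()
 * below: a 192-bit key falls back to AES-NI, everything else goes
 * through AES-KL. The real code additionally guards this with
 * valid_keylocker() and kernel_fpu_begin()/end().
 */
static int setkey_dispatch_sketch(struct crypto_aes_ctx *ctx,
				  const u8 *in_key, unsigned int keylen)
{
	if (unlikely(keylen == AES_KEYSIZE_192)) {
		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
		return aesni_set_key(ctx, in_key, keylen);
	}

	return aeskl_setkey(ctx, in_key, keylen);
}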

== Wrapping Key Restore Failure ==

setkey() as well as encrypt()/decrypt() can also fail when the
wrapping key is unavailable. In the rare event of hardware failure,
the wrapping key is lost after waking from a deep sleep state. Those
functions then return -ENODEV, as the feature is disabled.
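
From the caller's side, a minimal hypothetical sketch of handling
those error codes on the synchronous request path looks as follows;
crypto_req_done() and crypto_wait_req() are the stock helpers for
driving a request to completion, and 'tfm' is assumed to be set up as
in the earlier sketch:

/*
 * Hypothetical caller sketch, not from this series: submit one XTS
 * request and note the error codes described above.
 */
#include <crypto/skcipher.h>
#include <linux/scatterlist.h>

static int example_xts_encrypt(struct crypto_skcipher *tfm,
			       struct scatterlist *src,
			       struct scatterlist *dst,
			       unsigned int len, u8 *iv)
{
	DECLARE_CRYPTO_WAIT(wait);
	struct skcipher_request *req;
	int err;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, src, dst, len, iv);

	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	/*
	 * -EINVAL: e.g. a corrupted or restricted key handle
	 * -ENODEV: the wrapping key was lost; the feature is disabled
	 */
	skcipher_request_free(req);
	return err;
}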

== Userspace Exposition ==

Some hardware implementations may have a performance penalty. E.g.,
the cryptsetup benchmark indicates the raw throughput is measurably
slower than AES-NI. But, for disk encryption, storage bandwidth may
become the bottleneck before encryption bandwidth does.

This, along with the points above, is an end-user consideration when
selecting AES-KL over AES-NI. Thus, advertise it under the unique name
'xts-aes-aeskl' in /proc/crypto, and register it under the generic
name 'xts(aes)' with a lower priority so that it does not replace
AES-NI.
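
For reference, a hypothetical userspace check for the advertised
names (the driver strings below are assumed from this changelog and
the existing AES-NI driver):

/*
 * Hypothetical userspace sketch: scan /proc/crypto and print the
 * lines naming the XTS driver entries.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/crypto", "r");

	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f))
		if (strstr(line, "xts-aes-aeskl") ||
		    strstr(line, "xts-aes-aesni"))
			fputs(line, stdout);

	fclose(f);
	return 0;
}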

== 64-bit Only ==

AES-KL provides wide instructions that process eight blocks at once,
which can boost AES performance. To leverage them, the code needs to
clobber more than eight 128-bit registers.

But, 32-bit mode does not have enough wide registers. Its performance
is therefore unlikely to be better than 64-bit mode, which already has
a gap versus AES-NI. So, simply make it 64-bit only for the moment.

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Milan Broz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
- Trim unnecessary checks. (Eric Biggers)
- Document the reason
- Make sure both XTS keys have the same size
* Adjust the Kconfig change:
- Move the location. (Robert Elliott)
- Trim the description to follow others such as AES-NI.
* Update the changelog:
- Explain the priority value for the common name under 'User
Exposition' (renamed from 'Performance'). (Eric Biggers)
- Trim the introduction
- Switch to more imperative mood for those explaining the code
change
- Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
- Remove unused one.
- Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
arch/x86/crypto/Kconfig | 22 ++
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/aeskl-intel_asm.S | 580 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 216 +++++++++++
arch/x86/crypto/aesni-intel_asm.S | 8 +-
arch/x86/crypto/aesni-intel_glue.c | 40 +-
arch/x86/crypto/aesni-intel_glue.h | 17 +
7 files changed, 873 insertions(+), 13 deletions(-)
create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..658adfd7aebf 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -2,6 +2,11 @@

menu "Accelerated Cryptographic Algorithms for CPU (x86)"

+config AS_HAS_KEYLOCKER
+ def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+ help
+ Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
config CRYPTO_CURVE25519_X86
tristate "Public key crypto: Curve25519 (ADX)"
depends on X86 && 64BIT
@@ -29,6 +34,23 @@ config CRYPTO_AES_NI_INTEL
Architecture: x86 (32-bit and 64-bit) using:
- AES-NI (AES new instructions)

+config CRYPTO_AES_KL
+ tristate "Ciphers: AES, modes: XTS (AES-KL)"
+ depends on X86 && 64BIT
+ depends on AS_HAS_KEYLOCKER
+ depends on CRYPTO_AES_NI_INTEL
+ select X86_KEYLOCKER
+
+ help
+ Block cipher: AES cipher algorithms
+ Length-preserving ciphers: AES with XTS
+
+ Architecture: x86 (64-bit) using:
+ - AES-KL (AES Key Locker)
+ - AES-NI for a 192-bit key
+
+ See Documentation/arch/x86/keylocker.rst for more details.
+
config CRYPTO_BLOWFISH_X86_64
tristate "Ciphers: Blowfish, modes: ECB, CBC"
depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o

+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..402dd7796375
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,580 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1 %xmm0
+#define STATE2 %xmm1
+#define STATE3 %xmm2
+#define STATE4 %xmm3
+#define STATE5 %xmm4
+#define STATE6 %xmm5
+#define STATE7 %xmm6
+#define STATE8 %xmm7
+#define STATE STATE1
+
+#define IV %xmm9
+#define KEY %xmm10
+#define INC %xmm13
+
+#define IN1 %xmm8
+#define IN IN1
+
+#define AREG %rax
+#define HANDLEP %rdi
+#define OUTP %rsi
+#define KLEN %r9d
+#define INP %rdx
+#define T1 %r10
+#define LEN %rcx
+#define IVP %r8
+
+#define UKEYP OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+ FRAME_BEGIN
+ movl %edx, 480(HANDLEP)
+ movdqu (UKEYP), STATE1
+ mov $1, %eax
+ cmp $16, %dl
+ je .Lsetkey_128
+
+ movdqu 0x10(UKEYP), STATE2
+ encodekey256 %eax, %eax
+ movdqu STATE4, 0x30(HANDLEP)
+ jmp .Lsetkey_end
+.Lsetkey_128:
+ encodekey128 %eax, %eax
+
+.Lsetkey_end:
+ movdqu STATE1, (HANDLEP)
+ movdqu STATE2, 0x10(HANDLEP)
+ movdqu STATE3, 0x20(HANDLEP)
+
+ xor AREG, AREG
+ FRAME_END
+ RET
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_enc)
+ FRAME_BEGIN
+ movdqu (INP), STATE
+ movl 480(HANDLEP), KLEN
+
+ cmp $16, KLEN
+ je .Lenc_128
+ aesenc256kl (HANDLEP), STATE
+ jz .Lenc_err
+ jmp .Lenc_noerr
+.Lenc_128:
+ aesenc128kl (HANDLEP), STATE
+ jz .Lenc_err
+
+.Lenc_noerr:
+ xor AREG, AREG
+ jmp .Lenc_end
+.Lenc_err:
+ mov $(-EINVAL), AREG
+.Lenc_end:
+ movdqu STATE, (OUTP)
+ FRAME_END
+ RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * int __aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_dec)
+ FRAME_BEGIN
+ movdqu (INP), STATE
+ mov 480(HANDLEP), KLEN
+
+ cmp $16, KLEN
+ je .Ldec_128
+ aesdec256kl (HANDLEP), STATE
+ jz .Ldec_err
+ jmp .Ldec_noerr
+.Ldec_128:
+ aesdec128kl (HANDLEP), STATE
+ jz .Ldec_err
+
+.Ldec_noerr:
+ xor AREG, AREG
+ jmp .Ldec_end
+.Ldec_err:
+ mov $(-EINVAL), AREG
+.Ldec_end:
+ movdqu STATE, (OUTP)
+ FRAME_END
+ RET
+SYM_FUNC_END(__aeskl_dec)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: internal ABI
+ * Multiply in GF(2^128) for XTS IVs
+ * input:
+ * IV: current IV
+ * GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ * IV: next IV
+ * changed:
+ * CTR: == temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+ pshufd $0x13, IV, KEY; \
+ paddq IV, IV; \
+ psrad $31, KEY; \
+ pand GF128MUL_MASK, KEY; \
+ pxor KEY, IV;
+
+/*
+ * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+ FRAME_BEGIN
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+ movups (IVP), IV
+
+ mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+ sub $128, LEN
+ jl .Lxts_enc1_pre
+
+ movdqa IV, STATE1
+ movdqu (INP), INC
+ pxor INC, STATE1
+ movdqu IV, (OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE2
+ movdqu 0x10(INP), INC
+ pxor INC, STATE2
+ movdqu IV, 0x10(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE3
+ movdqu 0x20(INP), INC
+ pxor INC, STATE3
+ movdqu IV, 0x20(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE4
+ movdqu 0x30(INP), INC
+ pxor INC, STATE4
+ movdqu IV, 0x30(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE5
+ movdqu 0x40(INP), INC
+ pxor INC, STATE5
+ movdqu IV, 0x40(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE6
+ movdqu 0x50(INP), INC
+ pxor INC, STATE6
+ movdqu IV, 0x50(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE7
+ movdqu 0x60(INP), INC
+ pxor INC, STATE7
+ movdqu IV, 0x60(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE8
+ movdqu 0x70(INP), INC
+ pxor INC, STATE8
+ movdqu IV, 0x70(OUTP)
+
+ cmp $16, KLEN
+ je .Lxts_enc8_128
+ aesencwide256kl (%rdi)
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+ aesencwide128kl (%rdi)
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+ movdqu 0x00(OUTP), INC
+ pxor INC, STATE1
+ movdqu STATE1, 0x00(OUTP)
+
+ movdqu 0x10(OUTP), INC
+ pxor INC, STATE2
+ movdqu STATE2, 0x10(OUTP)
+
+ movdqu 0x20(OUTP), INC
+ pxor INC, STATE3
+ movdqu STATE3, 0x20(OUTP)
+
+ movdqu 0x30(OUTP), INC
+ pxor INC, STATE4
+ movdqu STATE4, 0x30(OUTP)
+
+ movdqu 0x40(OUTP), INC
+ pxor INC, STATE5
+ movdqu STATE5, 0x40(OUTP)
+
+ movdqu 0x50(OUTP), INC
+ pxor INC, STATE6
+ movdqu STATE6, 0x50(OUTP)
+
+ movdqu 0x60(OUTP), INC
+ pxor INC, STATE7
+ movdqu STATE7, 0x60(OUTP)
+
+ movdqu 0x70(OUTP), INC
+ pxor INC, STATE8
+ movdqu STATE8, 0x70(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+
+ add $128, INP
+ add $128, OUTP
+ test LEN, LEN
+ jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+ movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+ xor AREG, AREG
+ jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+ mov $(-EINVAL), AREG
+.Lxts_enc_ret:
+ FRAME_END
+ RET
+
+.Lxts_enc1_pre:
+ add $128, LEN
+ jz .Lxts_enc_ret_iv
+ sub $16, LEN
+ jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+ movdqu (INP), STATE1
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_enc1_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+ pxor IV, STATE1
+ _aeskl_gf128mul_x_ble()
+
+ test LEN, LEN
+ jz .Lxts_enc1_out
+
+ add $16, INP
+ sub $16, LEN
+ jl .Lxts_enc_cts1
+
+ movdqu STATE1, (OUTP)
+ add $16, OUTP
+ jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+ movdqu STATE1, (OUTP)
+ jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+ movdqu STATE8, STATE1
+ sub $16, OUTP
+
+.Lxts_enc_cts1:
+ lea .Lcts_permute_table(%rip), T1
+ add LEN, INP /* rewind input pointer */
+ add $16, LEN /* # bytes in final block */
+ movups (INP), IN1
+
+ mov T1, IVP
+ add $32, IVP
+ add LEN, T1
+ sub LEN, IVP
+ add OUTP, LEN
+
+ movups (T1), STATE2
+ movaps STATE1, STATE3
+ pshufb STATE2, STATE1
+ movups STATE1, (LEN)
+
+ movups (IVP), STATE1
+ pshufb STATE1, IN1
+ pblendvb STATE3, IN1
+ movaps IN1, STATE1
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_enc1_cts_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+ pxor IV, STATE1
+ movups STATE1, (OUTP)
+ jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+ FRAME_BEGIN
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+ movups (IVP), IV
+
+ mov 480(HANDLEP), KLEN
+
+ test $15, LEN
+ jz .Lxts_dec8
+ sub $16, LEN
+
+.Lxts_dec8:
+ sub $128, LEN
+ jl .Lxts_dec1_pre
+
+ movdqa IV, STATE1
+ movdqu (INP), INC
+ pxor INC, STATE1
+ movdqu IV, (OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE2
+ movdqu 0x10(INP), INC
+ pxor INC, STATE2
+ movdqu IV, 0x10(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE3
+ movdqu 0x20(INP), INC
+ pxor INC, STATE3
+ movdqu IV, 0x20(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE4
+ movdqu 0x30(INP), INC
+ pxor INC, STATE4
+ movdqu IV, 0x30(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE5
+ movdqu 0x40(INP), INC
+ pxor INC, STATE5
+ movdqu IV, 0x40(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE6
+ movdqu 0x50(INP), INC
+ pxor INC, STATE6
+ movdqu IV, 0x50(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE7
+ movdqu 0x60(INP), INC
+ pxor INC, STATE7
+ movdqu IV, 0x60(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE8
+ movdqu 0x70(INP), INC
+ pxor INC, STATE8
+ movdqu IV, 0x70(OUTP)
+
+ cmp $16, KLEN
+ je .Lxts_dec8_128
+ aesdecwide256kl (%rdi)
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+ aesdecwide128kl (%rdi)
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+ movdqu 0x00(OUTP), INC
+ pxor INC, STATE1
+ movdqu STATE1, 0x00(OUTP)
+
+ movdqu 0x10(OUTP), INC
+ pxor INC, STATE2
+ movdqu STATE2, 0x10(OUTP)
+
+ movdqu 0x20(OUTP), INC
+ pxor INC, STATE3
+ movdqu STATE3, 0x20(OUTP)
+
+ movdqu 0x30(OUTP), INC
+ pxor INC, STATE4
+ movdqu STATE4, 0x30(OUTP)
+
+ movdqu 0x40(OUTP), INC
+ pxor INC, STATE5
+ movdqu STATE5, 0x40(OUTP)
+
+ movdqu 0x50(OUTP), INC
+ pxor INC, STATE6
+ movdqu STATE6, 0x50(OUTP)
+
+ movdqu 0x60(OUTP), INC
+ pxor INC, STATE7
+ movdqu STATE7, 0x60(OUTP)
+
+ movdqu 0x70(OUTP), INC
+ pxor INC, STATE8
+ movdqu STATE8, 0x70(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+
+ add $128, INP
+ add $128, OUTP
+ test LEN, LEN
+ jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+ movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+ xor AREG, AREG
+ jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+ mov $(-EINVAL), AREG
+.Lxts_dec_ret:
+ FRAME_END
+ RET
+
+.Lxts_dec1_pre:
+ add $128, LEN
+ jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+ movdqu (INP), STATE1
+
+ add $16, INP
+ sub $16, LEN
+ jl .Lxts_dec_cts1
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+ pxor IV, STATE1
+ _aeskl_gf128mul_x_ble()
+
+ test LEN, LEN
+ jz .Lxts_dec1_out
+
+ movdqu STATE1, (OUTP)
+ add $16, OUTP
+ jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+ movdqu STATE1, (OUTP)
+ jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+ movdqa IV, STATE5
+ _aeskl_gf128mul_x_ble()
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_cts_pre_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+ pxor IV, STATE1
+
+ lea .Lcts_permute_table(%rip), T1
+ add LEN, INP /* rewind input pointer */
+ add $16, LEN /* # bytes in final block */
+ movups (INP), IN1
+
+ mov T1, IVP
+ add $32, IVP
+ add LEN, T1
+ sub LEN, IVP
+ add OUTP, LEN
+
+ movups (T1), STATE2
+ movaps STATE1, STATE3
+ pshufb STATE2, STATE1
+ movups STATE1, (LEN)
+
+ movups (IVP), STATE1
+ pshufb STATE1, IN1
+ pblendvb STATE3, IN1
+ movaps IN1, STATE1
+
+ pxor STATE5, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_cts_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+ pxor STATE5, STATE1
+
+ movups STATE1, (OUTP)
+ jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..c6824c50fc72
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int __aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+
+static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+ unsigned int keylen)
+{
+ /* raw_ctx is an aligned address via xts_setkey_common() */
+ struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
+ int err;
+
+ if (!crypto_simd_usable())
+ return -EBUSY;
+
+ if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
+ keylen != AES_KEYSIZE_256)
+ return -EINVAL;
+
+ kernel_fpu_begin();
+ if (unlikely(keylen == AES_KEYSIZE_192)) {
+ pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+ err = aesni_set_key(ctx, in_key, keylen);
+ } else {
+ if (!valid_keylocker())
+ err = -ENODEV;
+ else
+ err = aeskl_setkey(ctx, in_key, keylen);
+ }
+ kernel_fpu_end();
+
+ return err;
+}
+
+/*
+ * The below wrappers for the encryption/decryption functions
+ * incorporate the feature availability check:
+ *
+ * In the rare event of hardware failure, the wrapping key can be lost
+ * after wake-up from a deep sleep state. Then, this check helps to
+ * avoid any subsequent misuse with populating a proper error code.
+ */
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_dec(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_xts_encrypt(ctx, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_xts_decrypt(ctx, out, in, len, iv);
+}
+
+static int aeskl_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ return xts_setkey_common(tfm, key, keylen, aeskl_setkey_common);
+}
+
+static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
+{
+ struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+ if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
+ return -EINVAL;
+
+ *keylen = ctx->crypt_ctx.key_length;
+ return 0;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+ u32 keylen;
+ int err;
+
+ err = xts_keylen(req, &keylen);
+ if (err)
+ return err;
+
+ if (likely(keylen != AES_KEYSIZE_192))
+ return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+ else
+ return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ u32 keylen;
+ int rc;
+
+ rc = xts_keylen(req, &keylen);
+ if (rc)
+ return rc;
+
+ if (likely(keylen != AES_KEYSIZE_192))
+ return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+ else
+ return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+ {
+ .base = {
+ .cra_name = "__xts(aes)",
+ .cra_driver_name = "__xts-aes-aeskl",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = XTS_AES_CTX_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = 2 * AES_MIN_KEY_SIZE,
+ .max_keysize = 2 * AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .walksize = 2 * AES_BLOCK_SIZE,
+ .setkey = aeskl_xts_setkey,
+ .encrypt = xts_encrypt,
+ .decrypt = xts_decrypt,
+ }
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+ u32 eax, ebx, ecx, edx;
+ int err;
+
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+ return -ENODEV;
+
+ /*
+ * AES-KL itself does not depend on AES-NI. But AES-KL does not
+ * support 192-bit keys. To make itself AES-compliant, it falls
+ * back to AES-NI.
+ */
+ if (!boot_cpu_has(X86_FEATURE_AES))
+ return -ENODEV;
+
+ err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+ aeskl_simd_skciphers);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+ simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+ aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3922d24cae2b..d38abcc69d9e 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1821,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
SYM_FUNC_END(_key_expansion_256b)

/*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- * unsigned int key_len)
+ * int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ * unsigned int key_len)
*/
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(__aesni_set_key)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -1933,7 +1933,7 @@ SYM_FUNC_START(aesni_set_key)
#endif
FRAME_END
RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(__aesni_set_key)

/*
* void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 857cff484bb3..3aaf5504e349 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
#include <linux/static_call.h>

#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"

#define RFC4106_HASH_SUBKEY_SIZE 16
#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,8 +73,8 @@ struct gcm_context_data {
u8 hash_keys[GCM_BLOCK_LEN * 16];
};

-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- unsigned int key_len);
+asmlinkage int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ unsigned int key_len);
asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -89,17 +90,32 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);

-static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ unsigned int key_len)
+{
+ return __aesni_set_key(ctx, in_key, key_len);
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_set_key);
+#endif
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
{
__aesni_enc(ctx, out, in);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_enc);
+#endif

-static int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
{
__aesni_dec(ctx, out, in);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_dec);
+#endif

#define AVX_GEN2_OPTSIZE 640
#define AVX_GEN4_OPTSIZE 4096
@@ -110,19 +126,25 @@ asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);

-static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
- unsigned int len, u8 *iv)
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
{
__aesni_xts_encrypt(ctx, out, in, len, iv);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+#endif

-static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
- unsigned int len, u8 *iv)
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
{
__aesni_xts_decrypt(ctx, out, in, len, iv);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
+#endif

#ifdef CONFIG_X86_64

@@ -256,7 +278,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
err = aes_expandkey(ctx, in_key, key_len);
else {
kernel_fpu_begin();
- err = aesni_set_key(ctx, in_key, key_len);
+ err = __aesni_set_key(ctx, in_key, key_len);
kernel_fpu_end();
}

diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations.
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+
--
2.17.1


2023-05-26 06:58:49

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

On Wed, May 24, 2023 at 09:57:15AM -0700, Chang S. Bae wrote:
> +static inline unsigned long aes_align_addr(unsigned long addr)
> +{
> + return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
> + ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
> +}

Wouldn't it be simpler to make this take and return 'void *'? Then the callers
wouldn't need to cast to/from 'unsigned long'.

Also, aligning to a 1-byte boundary is a no-op.

So, please consider the following:

static inline void *aes_align_addr(void *addr)
{
if (crypto_tfm_ctx_alignment() >= AES_ALIGN)
return addr;
return PTR_ALIGN(addr, AES_ALIGN);
}

Also, aesni_rfc4106_gcm_ctx_get() and generic_gcmaes_ctx_get() should call this
too, rather than duplicating similar code.
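
For example (sketch only, assuming the void * helper above and
crypto_aead_ctx() as the raw context accessor), those getters could
reduce to something like:

static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{
	return aes_align_addr(crypto_aead_ctx(tfm));
}

static inline struct
generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
{
	return aes_align_addr(crypto_aead_ctx(tfm));
}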

- Eric

2023-05-26 07:34:22

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm

On Wed, May 24, 2023 at 09:57:17AM -0700, Chang S. Bae wrote:
> == API Limitation ==
>
> The setkey() function transforms an AES key into a handle. But, an
> extended key is a usual outcome of setkey() in other AES cipher
> implementations. For this reason, a setkey() failure does not fall
> back to the other. So, expose AES-KL methods via synchronous
> interfaces only.

I don't understand what this paragraph is trying to say.

> +/*
> + * The below wrappers for the encryption/decryption functions
> + * incorporate the feature availability check:
> + *
> + * In the rare event of hardware failure, the wrapping key can be lost
> + * after wake-up from a deep sleep state. Then, this check helps to
> + * avoid any subsequent misuse with populating a proper error code.
> + */
> +
> +static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
> +{
> + if (!valid_keylocker())
> + return -ENODEV;
> +
> + return __aeskl_enc(ctx, out, in);
> +}

Is it not sufficient for the valid_keylocker() check to occur at the top level
(in xts_encrypt() and xts_decrypt()), which would seem to be a better place to
do it? Is this because valid_keylocker() needs to be checked in every
kernel_fpu_begin() section separately, to avoid a race condition? If that's
indeed the reason, can you explain that in the comment?

> +static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
> +{
> + struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
> +
> + if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
> + return -EINVAL;
> +
> + *keylen = ctx->crypt_ctx.key_length;
> + return 0;
> +}

This is odd. Why would the key lengths be different here?

> + err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
> + aeskl_simd_skciphers);
> + if (err)
> + return err;
> +
> + return 0;

This can be simplified to:

return simd_register_skciphers_compat(aeskl_skciphers,
ARRAY_SIZE(aeskl_skciphers),
aeskl_simd_skciphers);

- Eric

2023-05-30 20:52:46

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm

On 5/26/2023 12:23 AM, Eric Biggers wrote:
> On Wed, May 24, 2023 at 09:57:17AM -0700, Chang S. Bae wrote:
>> == API Limitation ==
>>
>> The setkey() function transforms an AES key into a handle. But, an
>> extended key is a usual outcome of setkey() in other AES cipher
>> implementations. For this reason, a setkey() failure does not fall
>> back to the other. So, expose AES-KL methods via synchronous
>> interfaces only.
>
> I don't understand what this paragraph is trying to say.

Looking back, this text was written in response to this particular comment:

> This basically implies that we cannot expose the cipher interface at
> all, and so AES-KL can only be used by callers that use the
> asynchronous interface, which rules out 802.11, s/w kTLS, macsec and
> kerberos.

https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/

Then, I realize that at that moment the dm-crypt use model was not
clearly spelled out yet.

This text seems to have been carried over across the versions. But
now the series has XTS only, so this becomes less relevant, which I
guess causes confusion.

I think this can go away now that the intended usage is claimed
clearly.

>
>> +/*
>> + * The below wrappers for the encryption/decryption functions
>> + * incorporate the feature availability check:
>> + *
>> + * In the rare event of hardware failure, the wrapping key can be lost
>> + * after wake-up from a deep sleep state. Then, this check helps to
>> + * avoid any subsequent misuse with populating a proper error code.
>> + */
>> +
>> +static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
>> +{
>> + if (!valid_keylocker())
>> + return -ENODEV;
>> +
>> + return __aeskl_enc(ctx, out, in);
>> +}
>
> Is it not sufficient for the valid_keylocker() check to occur at the top level
> (in xts_encrypt() and xts_decrypt()), which would seem to be a better place to
> do it? Is this because valid_keylocker() needs to be checked in every
> kernel_fpu_begin() section separately, to avoid a race condition? If that's
> indeed the reason, can you explain that in the comment?

Maybe something like this:

/*
* In the event of hardware failure, the wrapping key can be lost
* from a sleep state. Then, the feature is not usable anymore. This
* feature state can be found via valid_keylocker().
*
* Such disabling could be anywhere preemptible, outside
* kernel_fpu_begin()/end(). So, to avoid the race condition, check
* the feature availability on every use in the below wrappers.
*/

>
>> +static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
>> +{
>> + struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
>> +
>> + if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
>> + return -EINVAL;
>> +
>> + *keylen = ctx->crypt_ctx.key_length;
>> + return 0;
>> +}
>
> This is odd. Why would the key lengths be different here?

I thought it was logical to do such a sanity check. But, in practice,
they are the same.

It looks like this entire crypto code path is treated as
performance-critical, more or less.

>
>> + err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
>> + aeskl_simd_skciphers);
>> + if (err)
>> + return err;
>> +
>> + return 0;
>
> This can be simplified to:
>
> return simd_register_skciphers_compat(aeskl_skciphers,
> ARRAY_SIZE(aeskl_skciphers),
> aeskl_simd_skciphers);

Oh, obviously!

Thanks,
Chang


2023-05-30 20:53:04

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

On 5/25/2023 11:54 PM, Eric Biggers wrote:
> On Wed, May 24, 2023 at 09:57:15AM -0700, Chang S. Bae wrote:
>> +static inline unsigned long aes_align_addr(unsigned long addr)
>> +{
>> + return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
>> + ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
>> +}
>
> Wouldn't it be simpler to make this take and return 'void *'? Then the callers
> wouldn't need to cast to/from 'unsigned long'.
>
> Also, aligning to a 1-byte boundary is a no-op.
>
> So, please consider the following:
>
> static inline void *aes_align_addr(void *addr)
> {
> if (crypto_tfm_ctx_alignment() >= AES_ALIGN)
> return addr;
> return PTR_ALIGN(addr, AES_ALIGN);
> }
>
> Also, aesni_rfc4106_gcm_ctx_get() and generic_gcmaes_ctx_get() should call this
> too, rather than duplicating similar code.

Indeed, your suggestion can improve the overall code there.

Thanks!
Chang


2023-06-03 15:35:45

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 00/12] x86: Support Key Locker

Hi all,

Posting V8 here, with a brief status of this enabling work:

The last two revisions have mainly updated the crypto code. The
existing AES-XTS code was improved further before being shared with
the new code. Then, the new implementation was tuned for the AES-XTS
mode, which aligns with the claimed use case -- dm-crypt.

But, I'd say some additional changes might still be needed.

The overall changes addressing the last review:

* PATCH12:
- Clarify some documentation (Eric Biggers)
- Simplify (Eric Biggers) and cleanup code

* PATCH11:
- Remove dead code.

* PATCH10:
- Deduplicate the alignment code (Eric Biggers)

The series can be found in this repo:
git://github.com/intel-staging/keylocker.git kl-v8

The overall diff was populated in:
https://raw.githubusercontent.com/intel-staging/keylocker/diff/kl-v8-vs-v7.diff

The feature is already available on recent Intel client systems. The
V3 cover letter covered the usage, the threat model and other details:
https://lore.kernel.org/lkml/[email protected]/

And the V6 cover followed up with updating the performance data:
https://lore.kernel.org/lkml/[email protected]/

V7 posting:
https://lore.kernel.org/lkml/[email protected]/

Thanks,
Chang

Chang S. Bae (12):
Documentation/x86: Document Key Locker
x86/cpufeature: Enumerate Key Locker feature
x86/insn: Add Key Locker instructions to the opcode map
x86/asm: Add a wrapper function for the LOADIWKEY instruction
x86/msr-index: Add MSRs for Key Locker wrapping key
x86/keylocker: Define Key Locker CPUID leaf
x86/cpu/keylocker: Load a wrapping key at boot-time
x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
S3/4
x86/cpu: Add a configuration and command line option for Key Locker
crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
crypto: x86/aes - Prepare for a new AES-XTS implementation
crypto: x86/aes-kl - Implement the AES-XTS algorithm

.../admin-guide/kernel-parameters.txt | 2 +
Documentation/arch/x86/index.rst | 1 +
Documentation/arch/x86/keylocker.rst | 97 +++
arch/x86/Kconfig | 3 +
arch/x86/crypto/Kconfig | 22 +
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/aes-helper_asm.S | 22 +
arch/x86/crypto/aes-helper_glue.h | 161 +++++
arch/x86/crypto/aeskl-intel_asm.S | 552 ++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 188 ++++++
arch/x86/crypto/aesni-intel_asm.S | 55 +-
arch/x86/crypto/aesni-intel_glue.c | 241 +++-----
arch/x86/crypto/aesni-intel_glue.h | 16 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/keylocker.h | 45 ++
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/include/asm/special_insns.h | 32 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 21 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/keylocker.c | 212 +++++++
arch/x86/kernel/smpboot.c | 2 +
arch/x86/lib/x86-opcode-map.txt | 11 +-
arch/x86/power/cpu.c | 2 +
tools/arch/x86/lib/x86-opcode-map.txt | 11 +-
27 files changed, 1507 insertions(+), 211 deletions(-)
create mode 100644 Documentation/arch/x86/keylocker.rst
create mode 100644 arch/x86/crypto/aes-helper_asm.S
create mode 100644 arch/x86/crypto/aes-helper_glue.h
create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
create mode 100644 arch/x86/crypto/aesni-intel_glue.h
create mode 100644 arch/x86/include/asm/keylocker.h
create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 054377e4774eee812b7930933d7a354ed5a7ddd6
--
2.17.1


2023-06-03 15:35:45

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 03/12] x86/insn: Add Key Locker instructions to the opcode map

The x86 instruction decoder needs to know about these new
instructions, which are going to be used in the crypto library as
well as in the x86 core code. Add the following:

LOADIWKEY:
Load a CPU-internal wrapping key.

ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.

AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

The details can be found in the Intel Software Developer's Manual.
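
Note that LOADIWKEY and AESENC128KL share the F3 0F 38 DC opcode and are
distinguished by the ModRM mod field: the register form is LOADIWKEY and
the memory form is AESENC128KL. As a hedged illustration (not part of
this patch), the memory form can be emitted as raw bytes on toolchains
that do not know the mnemonic yet, similar to the LOADIWKEY wrapper
elsewhere in the series; operand constraints and clobbers are omitted
for brevity:

	/* AESENC128KL (%rdi), %xmm0 -- encrypt XMM0 with the key handle at (%rdi) */
	asm volatile (".byte 0xf3, 0x0f, 0x38, 0xdc, 0x07");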

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Massage the changelog -- add the reason a bit.

Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
Locker module is built.
---
arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
--
2.17.1


2023-06-03 15:35:45

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction

Key Locker introduces a CPU-internal wrapping key to encode a user key
into a key handle. The key handle is then referenced instead of the
plain-text key.

LOADIWKEY loads a wrapping key into the software-inaccessible CPU
state. It operates only in kernel mode.

The kernel will use this to load a new key at boot time. Establish a
wrapper to prepare for this use. Also, define struct iwkey to pass the
key value to it.
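
For illustration, a minimal caller sketch could look as below. This is
hypothetical and not part of this patch; the real boot-time caller
arrives later in the series:

	/* Hypothetical example caller of the new wrapper. */
	static void example_load_wrapping_key(struct iwkey *key)
	{
		kernel_fpu_begin();	/* XMM0-2 get clobbered by the load */
		load_xmm_iwkey(key);	/* load, then flush XMM0-2 */
		kernel_fpu_end();
	}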

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Massage the changelog -- clarify the reason and the changes a bit.

Changes from v5:
* Fix a typo: kernel_cpu_begin() -> kernel_fpu_begin()

Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
Williams)

Note, Dan wondered if:
WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
---
arch/x86/include/asm/keylocker.h | 25 ++++++++++++++++++++++
arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
2 files changed, 57 insertions(+)
create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..9b3bec452b31
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary wrapping key storage.
+ * @integrity_key: A 128-bit key to check that key handles have not
+ * been tampered with.
+ * @encryption_key: A 256-bit encryption key used in
+ * wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after loaded.
+ */
+struct iwkey {
+ struct reg_128_bit integrity_key;
+ struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index de48d1389936..dd2d8b40fce3 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
#include <asm/processor-flags.h>
#include <linux/irqflags.h>
#include <linux/jump_label.h>
+#include <asm/keylocker.h>

/*
* The compiler should not reorder volatile asm statements with respect to each
@@ -283,6 +284,37 @@ static __always_inline void tile_release(void)
asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
}

+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key: A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. Caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+ struct reg_128_bit zeros = { 0 };
+
+ asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+ :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+ "m"(key->encryption_key[1]));
+
+ /*
+ * LOADIWKEY %xmm1,%xmm2
+ *
+ * EAX and XMM0 are implicit operands. Load a key value
+ * from XMM0-2 to a software-invisible CPU state. With zero
+ * in EAX, CPU does not do hardware randomization and the key
+ * backup is allowed.
+ *
+ * This instruction is supported by binutils >= 2.36.
+ */
+ asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+ asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+ :: "m"(zeros));
+}
+
#endif /* __KERNEL__ */

#endif /* _ASM_X86_SPECIAL_INSNS_H */
--
2.17.1


2023-06-03 15:35:45

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 02/12] x86/cpufeature: Enumerate Key Locker feature

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used
to transform a user key into a key handle. On a rare, unexpected
hardware failure, the key could be lost.

Enumerate this hardware capability here. It will not show up in
/proc/cpuinfo, as userspace usage is not supported. This is because
there is no ABI to coordinate a wrapping-key failure with userspace.

The feature supports the Advanced Encryption Standard (AES) cipher
algorithm with a new SIMD instruction set, like its predecessor
(AES-NI). Mark the feature as depending on XMM2. The new AES
implementation will be in the crypto library.

Add X86_FEATURE_KEYLOCKER to the disabled features mask at the moment.
It will be enabled under a new config option.
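
With this, kernel code can gate on the capability in the usual way; a
minimal illustration (not part of this patch):

	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
		return;		/* compiled out, disabled, or not enumerated */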

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Massage the changelog -- re-organize the change descriptions

Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..4a12673df7e9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -389,6 +389,7 @@
#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */
#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER (16*32+23) /* "" Key Locker */
#define X86_FEATURE_BUS_LOCK_DETECT (16*32+24) /* Bus Lock detect */
#define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */
#define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..eb841d694ed9 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
# define DISABLE_OSPKE (1<<(X86_FEATURE_OSPKE & 31))
#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */

+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER 0
+#else
+# define DISABLE_KEYLOCKER (1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
#ifdef CONFIG_X86_5LEVEL
# define DISABLE_LA57 0
#else
@@ -126,7 +132,7 @@
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
- DISABLE_ENQCMD)
+ DISABLE_ENQCMD|DISABLE_KEYLOCKER)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 0
#define DISABLED_MASK19 0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..262348aeaad1 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
#define X86_CR4_PCIDE _BITUL(X86_CR4_PCIDE_BIT)
#define X86_CR4_OSXSAVE_BIT 18 /* enable xsave and xrestore */
#define X86_CR4_OSXSAVE _BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT 19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER _BITUL(X86_CR4_KEYLOCKER_BIT)
#define X86_CR4_SMEP_BIT 20 /* enable SMEP support */
#define X86_CR4_SMEP _BITUL(X86_CR4_SMEP_BIT)
#define X86_CR4_SMAP_BIT 21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..200c5e69f78c 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
+ { X86_FEATURE_KEYLOCKER, X86_FEATURE_XMM2 },
{}
};

--
2.17.1


2023-06-03 15:35:45

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 01/12] Documentation/x86: Document Key Locker

Document the overview of the feature along with relevant consideration
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
to process a 192-bit key.'
* Update the text for clarity and readability:
- Clarify the error code and exemplify the backup failure
- Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
Documentation/arch/x86/index.rst | 1 +
Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
2 files changed, 98 insertions(+)
create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index c73d133fd37c..256359c24669 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -42,3 +42,4 @@ x86-specific Documentation
features
elf_auxvec
xstate
+ keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..5557b8d0659a
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable after a system
+shutdown or reboot.
+
+And the key may be lost in the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of a wrapping key restore failure upon resume from suspend,
+all established key handles become invalid. In-flight dm-crypt operations
+receive error results from pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem, the recovery is the same as if a
+storage controller had failed to resume from suspend: reboot. If the volume
+impacted by a wrapping key restore failure is a data volume, then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not any time after like
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+ restriction failure, the instruction fails with setting RFLAGS.ZF. The
+ crypto library implementation includes the flag check to return -EINVAL.
+ Note that this flag is also set if the wrapping key is changed, e.g.,
+ because of the backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+ AES-KL instruction to process a 192-bit key. The AES-KL cipher
+ implementation logs a warning message with a 192-bit key and then falls
+ back to AES-NI. So, this 192-bit key-size limitation is only documented,
+ not enforced. It means the key will remain in clear-text in memory. This
+ is to meet Linux crypto-cipher expectation that each implementation must
+ support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+ overhead when compared with AES-NI instructions.
+
--
2.17.1


2023-06-03 15:35:45

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 05/12] x86/msr-index: Add MSRs for Key Locker wrapping key

The CPU state that contains the wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the
cache (like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs is provided as an abstract interface to save and
restore the wrapping key, and to check the key status. The wrapping
key is saved to a platform-scoped state in non-volatile media. The
backup itself and its path from the CPU are encrypted and integrity
protected.

Define those MSRs to be used to save and restore the key for S3/4
sleep states.

But the backup storage's non-volatility is not architecturally
guaranteed across off-states, such as S5 and G3. In that case, the
kernel may generate a new key on the next boot.
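
A rough sketch of the intended MSR usage, mirroring the boot-time and
resume code added later in the series (illustrative only):

	u64 status;

	/* Request a backup of the wrapping key into the platform state. */
	wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

	/* On wakeup, copy the backup back into this CPU and check it. */
	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
	if (!(status & BIT(0)))
		pr_err("wrapping key copy failed\n");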

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Tweak the changelog -- move the part about other sleep states to
the end

Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
arch/x86/include/asm/msr-index.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..cd8555c0f3c2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1117,4 +1117,10 @@
* a #GP
*/

+/* MSRs for managing a CPU-internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS 0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS 0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM 0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL 0x00000d92
+
#endif /* _ASM_X86_MSR_INDEX_H */
--
2.17.1


2023-06-03 15:35:57

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 08/12] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4

The primary use case for the feature is bare metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is used to unlock
the volume where the hibernation image is stored, dm-crypt still
expects to reuse the key handles within the hibernation image once it
is loaded.

== Wrapping-key Restore ==

So the goal is to meet dm-crypt's expectation that the key handles in
the suspend image remain valid after resume from an S-state. But when
the system enters the ACPI S3 or S4 sleep states, the wrapping key is
discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. So, request a backup right after the key is
loaded at boot time, and copy it back to each CPU upon wakeup.
Consequently, Key Locker has to be disabled entirely if the backup
mechanism is not available, unless CONFIG_SUSPEND=n.

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with an
initialized wrapping key state. This has the effect of invalidating
any key handles that might be present in a suspend-image. When this
happens, dm-crypt will see I/O errors resulting from error returns
from crypto_skcipher_encrypt()/decrypt().

While this will disrupt operations in the current boot, data is not at
risk and access is restored with new handles created by the new
wrapping key at the next boot. Also, manage a feature-specific flag to
communicate with the crypto implementation. This ensures that the
AES-KL instructions are no longer used after a key restore failure,
without turning off the feature entirely.

== Off-states ==

While the backup may be maintained in non-volatile media across the S5
and G3 "off" states, this is neither architecturally guaranteed nor
expected by dm-crypt, which prompts for the key whenever the volume is
started. So a reboot with a new wrapping key covers this case.
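
The crypto side is expected to consult the exported flag before issuing
AES-KL instructions, roughly as below (an illustrative sketch; the
AES-KL glue patch carries the real code):

	if (!valid_keylocker())
		return -ENODEV;	/* the wrapping key was lost; stop using AES-KL */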

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Limit the symbol export only when needed.
* Improve the coding style -- reduce an indent after
'if () { ... return; }', and tweak the comment along with that.
(Eric Biggers)
* Improve the function prototype, instead of using a macro. (Eric
Biggers and Dave Hansen)
* Update the documentation:
- Massage the changelog to clarify the problem-and-solution by
sections
- Clarify the comment about the key restore failure.

Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
(Reported by Marvin Hsu [email protected]) Add the function
comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
fall through the end that disables the feature. Otherwise, all the
successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
arch/x86/include/asm/keylocker.h | 4 +
arch/x86/kernel/keylocker.c | 136 ++++++++++++++++++++++++++++++-
arch/x86/power/cpu.c | 2 +
3 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9040e10ab648..8621234c5897 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
#ifdef CONFIG_X86_KEYLOCKER
void setup_keylocker(struct cpuinfo_x86 *c);
void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
#else
static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
static inline void destroy_keylocker_data(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
#endif

#endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 038e5268e6b8..8f3acc34b6da 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,20 +11,48 @@
#include <asm/fpu/api.h>
#include <asm/keylocker.h>
#include <asm/tlbflush.h>
+#include <asm/msr.h>

static __initdata struct keylocker_setup_data {
+ bool initialized;
struct iwkey key;
} kl_setup;

+/*
+ * This flag is set when the wrapping key is loaded. It is cleared when
+ * the key restore fails. This state is exported to the crypto library,
+ * so Key Locker will not be used there after a failure. Thus, the
+ * feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+ return valid_kl;
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(valid_keylocker);
+#endif
+
static void __init generate_keylocker_data(void)
{
get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
}

+/*
+ * This is invoked when boot-up is finished, which means the wrapping
+ * key has been loaded. Then, the 'valid_kl' flag is set here as the
+ * feature is enabled.
+ */
void __init destroy_keylocker_data(void)
{
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ return;
+
memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+ kl_setup.initialized = true;
+ valid_kl = true;
}

static void __init load_keylocker(void)
@@ -34,6 +62,27 @@ static void __init load_keylocker(void)
kernel_fpu_end();
}

+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns: -EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+ u64 status;
+
+ wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+ rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+ if (status & BIT(0))
+ return 0;
+ else
+ return -EBUSY;
+}
+
/**
* setup_keylocker - Enable the feature.
* @c: A pointer to struct cpuinfo_x86
@@ -52,6 +101,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)

if (c == &boot_cpu_data) {
u32 eax, ebx, ecx, edx;
+ bool backup_available;

cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
/*
@@ -65,13 +115,53 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
goto disable;
}

+ backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+ /*
+ * The wrapping key in CPU state is volatile in S3/4
+ * states. So ensure the backup capability along with
+ * S-states.
+ */
+ if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+ pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+ goto disable;
+ }
+
generate_keylocker_data();
+ load_keylocker();
+
+ /* Backup a wrapping key in non-volatile media. */
+ if (backup_available)
+ wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+
+ pr_info("x86/keylocker: Enabled.\n");
+ return;
}

- load_keylocker();
+ /*
+ * At boot time, the key is loaded directly from the memory.
+ * Otherwise, this path performs the key recovery on each CPU
+ * wake-up. Then, the backup in the platform-scoped state is
+ * copied to the CPU state.
+ */
+ if (!kl_setup.initialized) {
+ load_keylocker();
+ return;
+ } else if (valid_kl) {
+ int rc;

- pr_info_once("x86/keylocker: Enabled.\n");
- return;
+ rc = copy_keylocker();
+ if (!rc)
+ return;
+
+ /*
+ * The key copy failed here. The subsequent feature
+ * use will have inconsistent keys and failures. So,
+ * invalidate the feature via the flag like with the
+ * backup failure.
+ */
+ valid_kl = false;
+ pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+ }

disable:
setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
@@ -80,3 +170,43 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
/* Make sure the feature disabled for kexec-reboot. */
cr4_clear_bits(X86_CR4_KEYLOCKER);
}
+
+/**
+ * restore_keylocker - Restore the wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the
+ * setup function.
+ */
+void restore_keylocker(void)
+{
+ u64 backup_status;
+ int rc;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+ return;
+
+ /*
+ * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that
+ * indicates an invalid backup if bit 0 is set and a read (or
+ * write) error if bit 2 is set.
+ */
+ rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+ if (backup_status & BIT(0)) {
+ rc = copy_keylocker();
+ if (rc)
+ pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+ else
+ return;
+ } else {
+ pr_err("x86/keylocker: The key backup access failed with %s.\n",
+ (backup_status & BIT(2)) ? "read error" : "invalid status");
+ }
+
+ /*
+ * Now the backup key is not available. Invalidate the feature
+ * via the flag to avoid any subsequent use. But keep the
+ * feature with zeroed wrapping key instead of disabling it.
+ */
+ pr_err("x86/keylocker: Failed to restore wrapping key.\n");
+ valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
#include <asm/mmu_context.h>
#include <asm/cpu_device_id.h>
#include <asm/microcode.h>
+#include <asm/keylocker.h>

#ifdef CONFIG_X86_32
__visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
x86_platform.restore_sched_clock_state();
cache_bp_restore();
perf_restore_debug_store();
+ restore_keylocker();

c = &cpu_data(smp_processor_id());
if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
--
2.17.1


2023-06-03 15:36:14

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 11/12] crypto: x86/aes - Prepare for a new AES-XTS implementation

Key Locker's AES instruction set ('AES-KL') has a similar programming
interface to AES-NI. The internal ABI in the assembly code will have
the same prototype as AES-NI's, so the glue code can be largely the
same as AES-NI's.

The new AES code will support only the XTS mode, as disk encryption is
the only intended use case.

Refactor the XTS-related code to avoid code duplication, and also move
some constant values so that they can be shared. Introduce wrappers for
the data transformation functions that return an error code, as AES-KL
may produce one.

The refactored code needs to invoke implementation-specific functions.
So, while reusable, it has to be inlined into the caller to avoid the
possible overhead of indirect calls.

Neither functional change nor performance regression is intended.
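
With the shared helpers, an AES implementation only needs to plug its
own transform routines into them. A sketch with a hypothetical
implementation "foo" (the AES-NI hunks below do the same, and AES-KL
follows in the next patch):

	static int foo_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
				  unsigned int keylen)
	{
		return xts_setkey_common(tfm, key, keylen, foo_set_key);
	}

	static int foo_xts_encrypt(struct skcipher_request *req)
	{
		return xts_crypt_common(req, foo_xts_enc, foo_enc1);
	}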

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v7:
* Remove aesni_dec() as it is not referenced by the refactored helpers.
But, keep the ASM symbol '__aesni_dec' to make it appear consistent
with its counterpart '__aesni_enc'.
* Call out 'AES-XTS' in the subject.

Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
- Follow the symbol convention: '_' -> '__' (Eric Biggers)
- Fix a style issue -- 'dst = src = ...' caught by checkpatch.pl:
"CHECK: multiple assignments should be avoided"
* Cleanup: move some define back to AES-NI code as not used by AES-KL

Changes from v5:
* Clean up the staled function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
arch/x86/crypto/aes-helper_asm.S | 22 +++
arch/x86/crypto/aes-helper_glue.h | 161 ++++++++++++++++++++++
arch/x86/crypto/aesni-intel_asm.S | 47 +++----
arch/x86/crypto/aesni-intel_glue.c | 209 +++++++----------------------
4 files changed, 249 insertions(+), 190 deletions(-)
create mode 100644 arch/x86/crypto/aes-helper_asm.S
create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+ .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section .rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+ .octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..fbb5a70f1e1b
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for a performance reason. With the mitigation
+ * for speculative executions like retpoline, indirect calls become very
+ * expensive at a cost of measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN 16
+#define AES_ALIGN_ATTR __attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA ((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE (sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+ struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+ struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline void *aes_align_addr(void *addr)
+{
+ return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? addr : PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+ return (struct aes_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+ int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+ unsigned int key_len))
+{
+ struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ /* first half of xts-key is for crypt */
+ err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+ if (err)
+ return err;
+
+ /* second half of xts-key is for tweak */
+ return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+ int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv),
+ int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+ int tail = req->cryptlen % AES_BLOCK_SIZE;
+ struct skcipher_request subreq;
+ struct skcipher_walk walk;
+ int err;
+
+ if (req->cryptlen < AES_BLOCK_SIZE)
+ return -EINVAL;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ if (!walk.nbytes)
+ return err;
+
+ if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+ int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+ skcipher_walk_abort(&walk);
+
+ skcipher_request_set_tfm(&subreq, tfm);
+ skcipher_request_set_callback(&subreq,
+ skcipher_request_flags(req),
+ NULL, NULL);
+ skcipher_request_set_crypt(&subreq, req->src, req->dst,
+ blocks * AES_BLOCK_SIZE, req->iv);
+ req = &subreq;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ if (!walk.nbytes)
+ return err;
+ } else {
+ tail = 0;
+ }
+
+ kernel_fpu_begin();
+
+ /* calculate first value of T */
+ err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+ if (err) {
+ kernel_fpu_end();
+ return err;
+ }
+
+ while (walk.nbytes > 0) {
+ int nbytes = walk.nbytes;
+
+ if (nbytes < walk.total)
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+ err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes, walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+ if (walk.nbytes > 0)
+ kernel_fpu_begin();
+ }
+
+ if (unlikely(tail > 0 && !err)) {
+ struct scatterlist sg_src[2], sg_dst[2];
+ struct scatterlist *src, *dst;
+
+ src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+ else
+ dst = src;
+
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
+
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
+
+ kernel_fpu_begin();
+ err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes, walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, 0);
+ }
+ return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3ac7487ecad2..3922d24cae2b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
#include <linux/linkage.h>
#include <asm/frame.h>
#include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"

/*
* The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1935,9 +1936,9 @@ SYM_FUNC_START(aesni_set_key)
SYM_FUNC_END(aesni_set_key)

/*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
*/
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
#endif
FRAME_END
RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)

/*
* _aesni_enc1: internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
SYM_FUNC_END(_aesni_enc4)

/*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
*/
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
#endif
FRAME_END
RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)

/*
* _aesni_dec1: internal ABI
@@ -2689,22 +2690,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
RET
SYM_FUNC_END(aesni_cts_cbc_dec)

+#ifdef __x86_64__
+
.pushsection .rodata
.align 16
-.Lcts_permute_table:
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
- .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
.Lbswap_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
.popsection

-#ifdef __x86_64__
/*
* _aesni_inc_init: internal ABI
* setup registers used by _aesni_inc
@@ -2819,12 +2812,6 @@ SYM_FUNC_END(aesni_ctr_enc)

#endif

-.section .rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
- .octa 0x00000000000000010000000000000087
-.previous
-
/*
* _aesni_gf128mul_x_ble: internal ABI
* Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2831,10 @@ SYM_FUNC_END(aesni_ctr_enc)
pxor KEY, IV;

/*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- * const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
*/
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -2996,13 +2983,13 @@ SYM_FUNC_START(aesni_xts_encrypt)

movups STATE, (OUTP)
jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)

/*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- * const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
*/
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -3158,4 +3145,4 @@ SYM_FUNC_START(aesni_xts_decrypt)

movups STATE, (OUTP)
jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 589648142c17..518f48f3bd6b 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
#include <linux/spinlock.h>
#include <linux/static_call.h>

+#include "aes-helper_glue.h"

-#define AESNI_ALIGN 16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
#define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)

/* This data is stored at the end of the crypto_tfm struct.
* It's a type of per "session" data storage location.
* This needs to be 16 byte aligned.
*/
struct aesni_rfc4106_gcm_ctx {
- u8 hash_subkey[16] AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+ u8 hash_subkey[16] AES_ALIGN_ATTR;
+ struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
u8 nonce[4];
};

struct generic_gcmaes_ctx {
- u8 hash_subkey[16] AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
- struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+ u8 hash_subkey[16] AES_ALIGN_ATTR;
+ struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
};

#define GCM_BLOCK_LEN 16
@@ -80,17 +72,10 @@ struct gcm_context_data {
u8 hash_keys[GCM_BLOCK_LEN * 16];
};

-static inline void *aes_align_addr(void *addr)
-{
- if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
- return addr;
- return PTR_ALIGN(addr, AESNI_ALIGN);
-}
-
asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len);
asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -104,14 +89,34 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);

+static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+ __aesni_enc(ctx, out, in);
+ return 0;
+}
+
#define AVX_GEN2_OPTSIZE 640
#define AVX_GEN4_OPTSIZE 4096

-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv);

-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv);
+
+static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ __aesni_xts_encrypt(ctx, out, in, len, iv);
+ return 0;
+}
+
+static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ __aesni_xts_decrypt(ctx, out, in, len, iv);
+ return 0;
+}

#ifdef CONFIG_X86_64

@@ -223,11 +228,6 @@ static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
return (struct crypto_aes_ctx *)aes_align_addr(raw_ctx);
}

-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
- return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
-}
-
static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
const u8 *in_key, unsigned int key_len)
{
@@ -263,7 +263,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
aes_encrypt(ctx, dst, src);
} else {
kernel_fpu_begin();
- aesni_enc(ctx, dst, src);
+ __aesni_enc(ctx, dst, src);
kernel_fpu_end();
}
}
@@ -276,7 +276,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
aes_decrypt(ctx, dst, src);
} else {
kernel_fpu_begin();
- aesni_dec(ctx, dst, src);
+ __aesni_dec(ctx, dst, src);
kernel_fpu_end();
}
}
@@ -527,7 +527,7 @@ static int ctr_crypt(struct skcipher_request *req)
nbytes &= ~AES_BLOCK_MASK;

if (walk.nbytes == walk.total && nbytes > 0) {
- aesni_enc(ctx, keystream, walk.iv);
+ __aesni_enc(ctx, keystream, walk.iv);
crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
walk.src.virt.addr + walk.nbytes - nbytes,
keystream, nbytes);
@@ -672,8 +672,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
u8 *iv, void *aes_ctx, u8 *auth_tag,
unsigned long auth_tag_len)
{
- u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
- struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+ u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+ struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
unsigned long left = req->cryptlen;
struct scatter_walk assoc_sg_walk;
struct skcipher_walk walk;
@@ -828,8 +828,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
unsigned int i;
__be32 counter = cpu_to_be32(1);

@@ -856,8 +856,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
unsigned int i;

if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -882,128 +882,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
- struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
- int err;
-
- err = xts_verify_key(tfm, key, keylen);
- if (err)
- return err;
-
- keylen /= 2;
-
- /* first half of xts-key is for crypt */
- err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
- key, keylen);
- if (err)
- return err;
-
- /* second half of xts-key is for tweak */
- return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
- key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
- int tail = req->cryptlen % AES_BLOCK_SIZE;
- struct skcipher_request subreq;
- struct skcipher_walk walk;
- int err;
-
- if (req->cryptlen < AES_BLOCK_SIZE)
- return -EINVAL;
-
- err = skcipher_walk_virt(&walk, req, false);
- if (!walk.nbytes)
- return err;
-
- if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
- int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
- skcipher_walk_abort(&walk);
-
- skcipher_request_set_tfm(&subreq, tfm);
- skcipher_request_set_callback(&subreq,
- skcipher_request_flags(req),
- NULL, NULL);
- skcipher_request_set_crypt(&subreq, req->src, req->dst,
- blocks * AES_BLOCK_SIZE, req->iv);
- req = &subreq;
-
- err = skcipher_walk_virt(&walk, req, false);
- if (!walk.nbytes)
- return err;
- } else {
- tail = 0;
- }
-
- kernel_fpu_begin();
-
- /* calculate first value of T */
- aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
- while (walk.nbytes > 0) {
- int nbytes = walk.nbytes;
-
- if (nbytes < walk.total)
- nbytes &= ~(AES_BLOCK_SIZE - 1);
-
- if (encrypt)
- aesni_xts_encrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- nbytes, walk.iv);
- else
- aesni_xts_decrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- nbytes, walk.iv);
- kernel_fpu_end();
-
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
- if (walk.nbytes > 0)
- kernel_fpu_begin();
- }
-
- if (unlikely(tail > 0 && !err)) {
- struct scatterlist sg_src[2], sg_dst[2];
- struct scatterlist *src, *dst;
-
- dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
- skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
- req->iv);
-
- err = skcipher_walk_virt(&walk, &subreq, false);
- if (err)
- return err;
-
- kernel_fpu_begin();
- if (encrypt)
- aesni_xts_encrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- walk.nbytes, walk.iv);
- else
- aesni_xts_decrypt(&ctx->crypt_ctx,
- walk.dst.virt.addr, walk.src.virt.addr,
- walk.nbytes, walk.iv);
- kernel_fpu_end();
-
- err = skcipher_walk_done(&walk, 0);
- }
- return err;
+ return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
}

static int xts_encrypt(struct skcipher_request *req)
{
- return xts_crypt(req, true);
+ return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
}

static int xts_decrypt(struct skcipher_request *req)
{
- return xts_crypt(req, false);
+ return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
}

static struct crypto_alg aesni_cipher_alg = {
@@ -1159,8 +1048,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
__be32 counter = cpu_to_be32(1);

memcpy(iv, req->iv, 12);
@@ -1176,8 +1065,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);

memcpy(iv, req->iv, 12);
*((__be32 *)(iv+12)) = counter;
--
2.17.1


2023-06-03 15:36:22

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at
boot. The option is selected by the Key Locker cipher module
CRYPTO_AES_KL (to be added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Rebase on the upstream: commit a894a8a56b57 ("Documentation:
kernel-parameters: sort all "no..." parameters")

Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
Documentation/admin-guide/kernel-parameters.txt | 2 ++
arch/x86/Kconfig | 3 +++
arch/x86/kernel/cpu/common.c | 16 ++++++++++++++++
3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c1247ec4589a..b42fc53cbcf9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3749,6 +3749,8 @@
kernel and module base offset ASLR (Address Space
Layout Randomization).

+ nokeylocker [X86] Disable Key Locker hardware feature.
+
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a98c5f82be48..f9788b477db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1879,6 +1879,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS

If unsure, say y.

+config X86_KEYLOCKER
+ bool
+
choice
prompt "TSX enable mode"
depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5882ff6e3c6b..718ff1b1d6dd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -402,6 +402,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
static const unsigned long cr4_pinned_mask =
X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
X86_CR4_FSGSBASE | X86_CR4_CET;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+ /* Expect an exact match without trailing characters. */
+ if (strlen(arg))
+ return 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+ pr_info("x86/keylocker: Disabled by kernel command line.\n");
+ return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

--
2.17.1


2023-06-03 15:36:23

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 07/12] x86/cpu/keylocker: Load a wrapping key at boot-time

The wrapping key is used to encode a clear-text key into a key
handle. It is central to protecting user keys, so its value has to be
randomized before being loaded into the software-invisible CPU state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled when the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds a maintenance burden without demonstrable benefit beyond
minimizing the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them as the wrapping key at
boot time. Ensure the dynamic CPU bit (CPUID.AESKLE) is set before the
load, and flush the random bytes from memory afterwards.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt consumption, and that switching the wrapping key per VM
is untenable, explicitly skip this setup in the X86_FEATURE_HYPERVISOR
case.
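
In outline, the boot-time flow added by this patch is the following
(condensed sketch, not a literal copy of the code below;
keylocker_boot_setup() and keylocker_cpuid_ok() are illustrative
stand-ins for the actual function names and CPUID checks):

	static void __init keylocker_boot_setup(void)
	{
		/* Randomize the wrapping key material. */
		get_random_bytes(&kl_setup.key, sizeof(kl_setup.key));

		/* CR4.KL must be set before CPUID reports AESKLE. */
		cr4_set_bits(X86_CR4_KEYLOCKER);
		if (!keylocker_cpuid_ok()) {
			setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
			cr4_clear_bits(X86_CR4_KEYLOCKER);
			return;
		}

		/* Load the key into the CPU via XMM registers (LOADIWKEY). */
		kernel_fpu_begin();
		load_xmm_iwkey(&kl_setup.key);
		kernel_fpu_end();

		/*
		 * Flush the random bytes from memory. The real code defers
		 * this until every CPU has loaded the key.
		 */
		memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
	}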

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Switch to use 'static inline' for the empty functions, instead of
macro that disallows type checks. (Eric Biggers and Dave Hansen)
* Use memzero_explicit() to wipe out the key data instead of writing
the poison value over there. (Robert Elliott)
* Massage the changelog for the better readability.

Changes from v5:
* Call out the disabling when the feature is available on a virtual
machine. Then, it will turn off the feature flag

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
(Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note: Dan wondered whether, given that the only proposed Linux use
case for Key Locker is dm-crypt, the feature could be lazily enabled
when the first dm-crypt user arrives. But, as Dave notes, there is no
precedent for late enabling of CPU features, and it adds a maintenance
burden without demonstrable benefit beyond minimizing the visibility
of Key Locker to userspace.
---
arch/x86/include/asm/keylocker.h | 9 ++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 5 +-
arch/x86/kernel/keylocker.c | 82 ++++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 2 +
5 files changed, 98 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1717e0866254..9040e10ab648 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__

+#include <asm/processor.h>
#include <linux/bits.h>
#include <asm/fpu/types.h>

@@ -28,5 +29,13 @@ struct iwkey {
#define KEYLOCKER_CPUID_EBX_WIDE BIT(2)
#define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)

+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
+static inline void destroy_keylocker_data(void) { }
+#endif
+
#endif /*__ASSEMBLY__ */
#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4070a01c11b7..9e2647320dc6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -134,6 +134,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
obj-$(CONFIG_X86_UMIP) += umip.o
+obj-$(CONFIG_X86_KEYLOCKER) += keylocker.o

obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o
obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f284e185aea..5882ff6e3c6b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,8 @@
#include <asm/microcode_intel.h>
#include <asm/intel-family.h>
#include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
#include <asm/uv/uv.h>
#include <asm/sigframe.h>
#include <asm/traps.h>
@@ -1834,10 +1836,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);

- /* Set up SMEP/SMAP/UMIP */
+ /* Setup various Intel-specific CPU security features */
setup_smep(c);
setup_smap(c);
setup_umip(c);
+ setup_keylocker(c);

/* Enable FSGSBASE instructions if available. */
if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..038e5268e6b8
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+ struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+ get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
+ get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+ memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+ kernel_fpu_begin();
+ load_xmm_iwkey(&kl_setup.key);
+ kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c: A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ goto out;
+
+ if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+ pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+ goto disable;
+ }
+
+ cr4_set_bits(X86_CR4_KEYLOCKER);
+
+ if (c == &boot_cpu_data) {
+ u32 eax, ebx, ecx, edx;
+
+ cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ /*
+ * Check the feature readiness via CPUID. Note that the
+ * CPUID AESKLE bit is conditionally set only when CR4.KL
+ * is set.
+ */
+ if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+ !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+ pr_debug("x86/keylocker: Not fully supported.\n");
+ goto disable;
+ }
+
+ generate_keylocker_data();
+ }
+
+ load_keylocker();
+
+ pr_info_once("x86/keylocker: Enabled.\n");
+ return;
+
+disable:
+ setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+ pr_info_once("x86/keylocker: Disabled.\n");
+out:
+ /* Make sure the feature disabled for kexec-reboot. */
+ cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ea4f4c1c74cd..13ebdb5fb6c4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -86,6 +86,7 @@
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
#include <asm/sev.h>
+#include <asm/keylocker.h>

/* representing HT siblings of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1371,6 +1372,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
nmi_selftest();
impress_friends();
cache_aps_init();
+ destroy_keylocker_data();
}

static int __initdata setup_possible_cpus = -1;
--
2.17.1


2023-06-03 15:36:25

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

Every field in struct aesni_xts_ctx is a byte array sized to hold a
struct crypto_aes_ctx. Each field can therefore be redefined with that
struct type instead of an opaque byte array.

The address of struct aesni_xts_ctx should then be aligned once, up
front, rather than on every access to a field.

Thus, redefine struct aesni_xts_ctx and align its address up front.
This requires refactoring the common alignment code into a helper.

The refactored helper also turns out to be useful for simplifying the
other runtime alignment code, so use it for the other modes as well.
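
For illustration, the intended before/after shape of a field access
(a sketch of what the hunks below do, not additional code; 'ctx' and
'walk' are the usual local variables in xts_crypt()):

	/* Before: ctx is the raw tfm context; re-align on every access. */
	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);

	/* After: align once via aes_xts_ctx()/aes_align_addr(), then use fields directly. */
	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);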

Suggested-by: Eric Biggers <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v7:
* Massage the helper function to be usable for other alignment code
such as aesni_rfc4106_gcm_ctx_get() and generic_gcmaes_ctx_get().
(Eric Biggers)

Changes from v6:
* Add as a new patch. (Eric Biggers)

This fix was considered to be better addressed before the preparatory
AES-NI code rework.
---
arch/x86/crypto/aesni-intel_glue.c | 51 +++++++++++++++---------------
1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..589648142c17 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -61,8 +61,8 @@ struct generic_gcmaes_ctx {
};

struct aesni_xts_ctx {
- u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
- u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+ struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
+ struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
};

#define GCM_BLOCK_LEN 16
@@ -80,6 +80,13 @@ struct gcm_context_data {
u8 hash_keys[GCM_BLOCK_LEN * 16];
};

+static inline void *aes_align_addr(void *addr)
+{
+ if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
+ return addr;
+ return PTR_ALIGN(addr, AESNI_ALIGN);
+}
+
asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
unsigned int key_len);
asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
@@ -201,32 +208,24 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{
- unsigned long align = AESNI_ALIGN;
-
- if (align <= crypto_tfm_ctx_alignment())
- align = 1;
- return PTR_ALIGN(crypto_aead_ctx(tfm), align);
+ return (struct aesni_rfc4106_gcm_ctx *)aes_align_addr(crypto_aead_ctx(tfm));
}

static inline struct
generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
{
- unsigned long align = AESNI_ALIGN;
-
- if (align <= crypto_tfm_ctx_alignment())
- align = 1;
- return PTR_ALIGN(crypto_aead_ctx(tfm), align);
+ return (struct generic_gcmaes_ctx *)aes_align_addr(crypto_aead_ctx(tfm));
}
#endif

static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
{
- unsigned long addr = (unsigned long)raw_ctx;
- unsigned long align = AESNI_ALIGN;
+ return (struct crypto_aes_ctx *)aes_align_addr(raw_ctx);
+}

- if (align <= crypto_tfm_ctx_alignment())
- align = 1;
- return (struct crypto_aes_ctx *)ALIGN(addr, align);
+static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+ return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
}

static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
@@ -883,7 +882,7 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
- struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
int err;

err = xts_verify_key(tfm, key, keylen);
@@ -893,20 +892,20 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
keylen /= 2;

/* first half of xts-key is for crypt */
- err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
+ err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
key, keylen);
if (err)
return err;

/* second half of xts-key is for tweak */
- return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
+ return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
key + keylen, keylen);
}

static int xts_crypt(struct skcipher_request *req, bool encrypt)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
int tail = req->cryptlen % AES_BLOCK_SIZE;
struct skcipher_request subreq;
struct skcipher_walk walk;
@@ -942,7 +941,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
kernel_fpu_begin();

/* calculate first value of T */
- aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+ aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);

while (walk.nbytes > 0) {
int nbytes = walk.nbytes;
@@ -951,11 +950,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
nbytes &= ~(AES_BLOCK_SIZE - 1);

if (encrypt)
- aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_encrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
nbytes, walk.iv);
else
- aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_decrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
nbytes, walk.iv);
kernel_fpu_end();
@@ -983,11 +982,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)

kernel_fpu_begin();
if (encrypt)
- aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_encrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes, walk.iv);
else
- aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+ aesni_xts_decrypt(&ctx->crypt_ctx,
walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes, walk.iv);
kernel_fpu_end();
--
2.17.1


2023-06-03 15:36:33

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion as well as all subsequent data transformation are
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
that of AES-NI as maintains a similar programming interface.

Support the XTS mode as the primary use case is dm-crypt. The
implementation has some details worth mentioning, which differentiate
itself from others, that users may need to be aware of:

== Key Handle Restriction ==

A key handle may be encoded with usage restrictions. Restrict every
handle created via setkey() to kernel-mode use only.

A key handle may later become corrupted, or an operation may fail due
to those handle restrictions. In that case, encrypt()/decrypt()
returns -EINVAL.
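
From the caller's point of view this looks roughly like the following
(sketch of the glue/assembly pattern further below; 'err', 'ctx',
'dst' and 'src' are placeholders):

	/*
	 * The AES-KL instruction sets ZF for a corrupted or restricted
	 * key handle; the assembly maps that to -EINVAL and the glue
	 * code propagates it.
	 */
	err = __aeskl_enc(ctx, dst, src);
	if (err)
		return err;	/* -EINVAL: unusable key handle */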

=== AES Compliance ===

Key Locker is not fully AES compliant, as it lacks 192-bit key
support. However, the Linux crypto API expects a software cipher
implementation to support all the AES-compliant key sizes.

The AES-KL cipher implementation meets this expectation by logging a
warning and falling back to AES-NI. In other words, the 192-bit
key-size limitation for what can be converted into a key handle is
only documented, not enforced.
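
Condensed from the glue code later in this patch, the fallback looks
like this (sketch, not additional code):

	if (unlikely(keylen == AES_KEYSIZE_192)) {
		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
		err = aesni_set_key(ctx, in_key, keylen);
	} else {
		err = __aeskl_setkey(ctx, in_key, keylen);
	}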

== Wrapping Key Restore Failure ==

setkey() as well as encrypt()/decrypt() can also fail due to a
wrapping key failure. In the event of a hardware failure, the wrapping
key is lost across deep sleep states. Those functions then return
-ENODEV, as the feature has been disabled.

== Userspace Exposition ==

Some hardware implementations may come with a performance penalty.
E.g., the cryptsetup benchmark indicates that the raw throughput is
measurably slower than AES-NI. But, for disk encryption, storage
bandwidth may be the bottleneck before encryption bandwidth.

This, along with the points above, is an end-user consideration when
selecting AES-KL over AES-NI. Thus, advertise the implementation with
the unique driver name 'xts-aes-aeskl' in /proc/crypto, and register
it under the generic name 'xts(aes)' with a lower priority so that it
does not replace AES-NI by default.
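
As an illustration of the priority scheme (not from the patch; assumes
both the AES-NI and AES-KL modules are available):

	struct crypto_skcipher *tfm;

	/* The generic name resolves to the highest-priority driver, i.e. AES-NI. */
	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);

	/* AES-KL has to be requested explicitly by its unique driver name. */
	tfm = crypto_alloc_skcipher("xts-aes-aeskl", 0, 0);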

== 64-bit Only ==

AES-KL provides wide instructions that process eight blocks at once,
which can boost AES performance. Leveraging those, the code needs to
clobber more than eight 128-bit registers.

But 32-bit mode does not have enough wide registers. Its performance
would thus be unlikely to beat 64-bit mode, which already has a gap
vs. AES-NI. So, simply support the 64-bit mode only for now.

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Milan Broz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v7:
* Update the changelog -- remove 'API Limitation'. (Eric Biggers)
* Update the comment for valid_keylocker(). (Eric Biggers)
* Improve the code:
- Remove the key-length check and simplify the code. (Eric Biggers)
- Remove aeskl_dec() and __aeskl_dec() as not needed.
- Simplify the register-function return handling. (Eric)
- Rename setkey functions for coherent naming:
aeskl_setkey() -> __aeskl_setkey(),
aeskl_setkey_common() -> aeskl_setkey(),
aeskl_xts_setkey() -> xts_setkey()
- Revert an unnecessary comment.

Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
- Trim unnecessary checks. (Eric Biggers)
- Document the reason
- Make sure both XTS keys with the same size
* Adjust the Kconfig change:
- Move the location. (Robert Elliott)
- Trim the description to follow others such as AES-NI.
* Update the changelog:
- Explain the priority value for the common name under 'User
Exposition' (renamed from 'Performance'). (Eric Biggers)
- Trim the introduction
- Switch to more imperative mood for those explaining the code
change
- Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
- Remove unused one.
- Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlsta)
* Made the build depend on the binutils version to support new
instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
arch/x86/crypto/Kconfig | 22 ++
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/aeskl-intel_asm.S | 552 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 188 ++++++++++
arch/x86/crypto/aesni-intel_asm.S | 8 +-
arch/x86/crypto/aesni-intel_glue.c | 35 +-
arch/x86/crypto/aesni-intel_glue.h | 16 +
7 files changed, 812 insertions(+), 12 deletions(-)
create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..658adfd7aebf 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -2,6 +2,11 @@

menu "Accelerated Cryptographic Algorithms for CPU (x86)"

+config AS_HAS_KEYLOCKER
+ def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+ help
+ Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
config CRYPTO_CURVE25519_X86
tristate "Public key crypto: Curve25519 (ADX)"
depends on X86 && 64BIT
@@ -29,6 +34,23 @@ config CRYPTO_AES_NI_INTEL
Architecture: x86 (32-bit and 64-bit) using:
- AES-NI (AES new instructions)

+config CRYPTO_AES_KL
+ tristate "Ciphers: AES, modes: XTS (AES-KL)"
+ depends on X86 && 64BIT
+ depends on AS_HAS_KEYLOCKER
+ depends on CRYPTO_AES_NI_INTEL
+ select X86_KEYLOCKER
+
+ help
+ Block cipher: AES cipher algorithms
+ Length-preserving ciphers: AES with XTS
+
+ Architecture: x86 (64-bit) using:
+ - AES-KL (AES Key Locker)
+ - AES-NI for a 192-bit key
+
+ See Documentation/arch/x86/keylocker.rst for more details.
+
config CRYPTO_BLOWFISH_X86_64
tristate "Ciphers: Blowfish, modes: ECB, CBC"
depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o

+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..61addc61dd4e
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,552 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based from the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1 %xmm0
+#define STATE2 %xmm1
+#define STATE3 %xmm2
+#define STATE4 %xmm3
+#define STATE5 %xmm4
+#define STATE6 %xmm5
+#define STATE7 %xmm6
+#define STATE8 %xmm7
+#define STATE STATE1
+
+#define IV %xmm9
+#define KEY %xmm10
+#define INC %xmm13
+
+#define IN1 %xmm8
+#define IN IN1
+
+#define AREG %rax
+#define HANDLEP %rdi
+#define OUTP %rsi
+#define KLEN %r9d
+#define INP %rdx
+#define T1 %r10
+#define LEN %rcx
+#define IVP %r8
+
+#define UKEYP OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+ FRAME_BEGIN
+ movl %edx, 480(HANDLEP)
+ movdqu (UKEYP), STATE1
+ mov $1, %eax
+ cmp $16, %dl
+ je .Lsetkey_128
+
+ movdqu 0x10(UKEYP), STATE2
+ encodekey256 %eax, %eax
+ movdqu STATE4, 0x30(HANDLEP)
+ jmp .Lsetkey_end
+.Lsetkey_128:
+ encodekey128 %eax, %eax
+
+.Lsetkey_end:
+ movdqu STATE1, (HANDLEP)
+ movdqu STATE2, 0x10(HANDLEP)
+ movdqu STATE3, 0x20(HANDLEP)
+
+ xor AREG, AREG
+ FRAME_END
+ RET
+SYM_FUNC_END(__aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_enc)
+ FRAME_BEGIN
+ movdqu (INP), STATE
+ movl 480(HANDLEP), KLEN
+
+ cmp $16, KLEN
+ je .Lenc_128
+ aesenc256kl (HANDLEP), STATE
+ jz .Lenc_err
+ jmp .Lenc_noerr
+.Lenc_128:
+ aesenc128kl (HANDLEP), STATE
+ jz .Lenc_err
+
+.Lenc_noerr:
+ xor AREG, AREG
+ jmp .Lenc_end
+.Lenc_err:
+ mov $(-EINVAL), AREG
+.Lenc_end:
+ movdqu STATE, (OUTP)
+ FRAME_END
+ RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: internal ABI
+ * Multiply in GF(2^128) for XTS IVs
+ * input:
+ * IV: current IV
+ * GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ * IV: next IV
+ * changed:
+ * CTR: == temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+ pshufd $0x13, IV, KEY; \
+ paddq IV, IV; \
+ psrad $31, KEY; \
+ pand GF128MUL_MASK, KEY; \
+ pxor KEY, IV;
+
+/*
+ * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+ FRAME_BEGIN
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+ movups (IVP), IV
+
+ mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+ sub $128, LEN
+ jl .Lxts_enc1_pre
+
+ movdqa IV, STATE1
+ movdqu (INP), INC
+ pxor INC, STATE1
+ movdqu IV, (OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE2
+ movdqu 0x10(INP), INC
+ pxor INC, STATE2
+ movdqu IV, 0x10(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE3
+ movdqu 0x20(INP), INC
+ pxor INC, STATE3
+ movdqu IV, 0x20(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE4
+ movdqu 0x30(INP), INC
+ pxor INC, STATE4
+ movdqu IV, 0x30(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE5
+ movdqu 0x40(INP), INC
+ pxor INC, STATE5
+ movdqu IV, 0x40(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE6
+ movdqu 0x50(INP), INC
+ pxor INC, STATE6
+ movdqu IV, 0x50(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE7
+ movdqu 0x60(INP), INC
+ pxor INC, STATE7
+ movdqu IV, 0x60(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE8
+ movdqu 0x70(INP), INC
+ pxor INC, STATE8
+ movdqu IV, 0x70(OUTP)
+
+ cmp $16, KLEN
+ je .Lxts_enc8_128
+ aesencwide256kl (%rdi)
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+ aesencwide128kl (%rdi)
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+ movdqu 0x00(OUTP), INC
+ pxor INC, STATE1
+ movdqu STATE1, 0x00(OUTP)
+
+ movdqu 0x10(OUTP), INC
+ pxor INC, STATE2
+ movdqu STATE2, 0x10(OUTP)
+
+ movdqu 0x20(OUTP), INC
+ pxor INC, STATE3
+ movdqu STATE3, 0x20(OUTP)
+
+ movdqu 0x30(OUTP), INC
+ pxor INC, STATE4
+ movdqu STATE4, 0x30(OUTP)
+
+ movdqu 0x40(OUTP), INC
+ pxor INC, STATE5
+ movdqu STATE5, 0x40(OUTP)
+
+ movdqu 0x50(OUTP), INC
+ pxor INC, STATE6
+ movdqu STATE6, 0x50(OUTP)
+
+ movdqu 0x60(OUTP), INC
+ pxor INC, STATE7
+ movdqu STATE7, 0x60(OUTP)
+
+ movdqu 0x70(OUTP), INC
+ pxor INC, STATE8
+ movdqu STATE8, 0x70(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+
+ add $128, INP
+ add $128, OUTP
+ test LEN, LEN
+ jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+ movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+ xor AREG, AREG
+ jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+ mov $(-EINVAL), AREG
+.Lxts_enc_ret:
+ FRAME_END
+ RET
+
+.Lxts_enc1_pre:
+ add $128, LEN
+ jz .Lxts_enc_ret_iv
+ sub $16, LEN
+ jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+ movdqu (INP), STATE1
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_enc1_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+ pxor IV, STATE1
+ _aeskl_gf128mul_x_ble()
+
+ test LEN, LEN
+ jz .Lxts_enc1_out
+
+ add $16, INP
+ sub $16, LEN
+ jl .Lxts_enc_cts1
+
+ movdqu STATE1, (OUTP)
+ add $16, OUTP
+ jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+ movdqu STATE1, (OUTP)
+ jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+ movdqu STATE8, STATE1
+ sub $16, OUTP
+
+.Lxts_enc_cts1:
+ lea .Lcts_permute_table(%rip), T1
+ add LEN, INP /* rewind input pointer */
+ add $16, LEN /* # bytes in final block */
+ movups (INP), IN1
+
+ mov T1, IVP
+ add $32, IVP
+ add LEN, T1
+ sub LEN, IVP
+ add OUTP, LEN
+
+ movups (T1), STATE2
+ movaps STATE1, STATE3
+ pshufb STATE2, STATE1
+ movups STATE1, (LEN)
+
+ movups (IVP), STATE1
+ pshufb STATE1, IN1
+ pblendvb STATE3, IN1
+ movaps IN1, STATE1
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_enc1_cts_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+ pxor IV, STATE1
+ movups STATE1, (OUTP)
+ jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+ FRAME_BEGIN
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+ movups (IVP), IV
+
+ mov 480(HANDLEP), KLEN
+
+ test $15, LEN
+ jz .Lxts_dec8
+ sub $16, LEN
+
+.Lxts_dec8:
+ sub $128, LEN
+ jl .Lxts_dec1_pre
+
+ movdqa IV, STATE1
+ movdqu (INP), INC
+ pxor INC, STATE1
+ movdqu IV, (OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE2
+ movdqu 0x10(INP), INC
+ pxor INC, STATE2
+ movdqu IV, 0x10(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE3
+ movdqu 0x20(INP), INC
+ pxor INC, STATE3
+ movdqu IV, 0x20(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE4
+ movdqu 0x30(INP), INC
+ pxor INC, STATE4
+ movdqu IV, 0x30(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE5
+ movdqu 0x40(INP), INC
+ pxor INC, STATE5
+ movdqu IV, 0x40(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE6
+ movdqu 0x50(INP), INC
+ pxor INC, STATE6
+ movdqu IV, 0x50(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE7
+ movdqu 0x60(INP), INC
+ pxor INC, STATE7
+ movdqu IV, 0x60(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE8
+ movdqu 0x70(INP), INC
+ pxor INC, STATE8
+ movdqu IV, 0x70(OUTP)
+
+ cmp $16, KLEN
+ je .Lxts_dec8_128
+ aesdecwide256kl (%rdi)
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+ aesdecwide128kl (%rdi)
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+ movdqu 0x00(OUTP), INC
+ pxor INC, STATE1
+ movdqu STATE1, 0x00(OUTP)
+
+ movdqu 0x10(OUTP), INC
+ pxor INC, STATE2
+ movdqu STATE2, 0x10(OUTP)
+
+ movdqu 0x20(OUTP), INC
+ pxor INC, STATE3
+ movdqu STATE3, 0x20(OUTP)
+
+ movdqu 0x30(OUTP), INC
+ pxor INC, STATE4
+ movdqu STATE4, 0x30(OUTP)
+
+ movdqu 0x40(OUTP), INC
+ pxor INC, STATE5
+ movdqu STATE5, 0x40(OUTP)
+
+ movdqu 0x50(OUTP), INC
+ pxor INC, STATE6
+ movdqu STATE6, 0x50(OUTP)
+
+ movdqu 0x60(OUTP), INC
+ pxor INC, STATE7
+ movdqu STATE7, 0x60(OUTP)
+
+ movdqu 0x70(OUTP), INC
+ pxor INC, STATE8
+ movdqu STATE8, 0x70(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+
+ add $128, INP
+ add $128, OUTP
+ test LEN, LEN
+ jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+ movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+ xor AREG, AREG
+ jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+ mov $(-EINVAL), AREG
+.Lxts_dec_ret:
+ FRAME_END
+ RET
+
+.Lxts_dec1_pre:
+ add $128, LEN
+ jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+ movdqu (INP), STATE1
+
+ add $16, INP
+ sub $16, LEN
+ jl .Lxts_dec_cts1
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+ pxor IV, STATE1
+ _aeskl_gf128mul_x_ble()
+
+ test LEN, LEN
+ jz .Lxts_dec1_out
+
+ movdqu STATE1, (OUTP)
+ add $16, OUTP
+ jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+ movdqu STATE1, (OUTP)
+ jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+ movdqa IV, STATE5
+ _aeskl_gf128mul_x_ble()
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_cts_pre_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+ pxor IV, STATE1
+
+ lea .Lcts_permute_table(%rip), T1
+ add LEN, INP /* rewind input pointer */
+ add $16, LEN /* # bytes in final block */
+ movups (INP), IN1
+
+ mov T1, IVP
+ add $32, IVP
+ add LEN, T1
+ sub LEN, IVP
+ add OUTP, LEN
+
+ movups (T1), STATE2
+ movaps STATE1, STATE3
+ pshufb STATE2, STATE1
+ movups STATE1, (LEN)
+
+ movups (IVP), STATE1
+ pshufb STATE1, IN1
+ pblendvb STATE3, IN1
+ movaps IN1, STATE1
+
+ pxor STATE5, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_cts_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+ pxor STATE5, STATE1
+
+ movups STATE1, (OUTP)
+ jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..193a3a96eb09
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+
+/*
+ * In the event of hardware failure, the wrapping key can be lost
+ * from a sleep state. Then, it is not usable anymore. The feature
+ * state can be found via valid_keylocker().
+ *
+ * Such disabling can happen anywhere preemptible. So, to avoid the
+ * race condition, check the availability on every use along with
+ * kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+ unsigned int keylen)
+{
+ struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
+ int err;
+
+ if (!crypto_simd_usable())
+ return -EBUSY;
+
+ if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
+ keylen != AES_KEYSIZE_256)
+ return -EINVAL;
+
+ kernel_fpu_begin();
+ if (unlikely(keylen == AES_KEYSIZE_192)) {
+ pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+ err = aesni_set_key(ctx, in_key, keylen);
+ } else {
+ if (!valid_keylocker())
+ err = -ENODEV;
+ else
+ err = __aeskl_setkey(ctx, in_key, keylen);
+ }
+ kernel_fpu_end();
+
+ return err;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_xts_encrypt(ctx, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ return __aeskl_xts_decrypt(ctx, out, in, len, iv);
+}
+
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+ struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+ return ctx->crypt_ctx.key_length;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+ u32 keylen = xts_keylen(req);
+
+ if (likely(keylen != AES_KEYSIZE_192))
+ return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+ else
+ return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ u32 keylen = xts_keylen(req);
+
+ if (likely(keylen != AES_KEYSIZE_192))
+ return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+ else
+ return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+ {
+ .base = {
+ .cra_name = "__xts(aes)",
+ .cra_driver_name = "__xts-aes-aeskl",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = XTS_AES_CTX_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = 2 * AES_MIN_KEY_SIZE,
+ .max_keysize = 2 * AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .walksize = 2 * AES_BLOCK_SIZE,
+ .setkey = xts_setkey,
+ .encrypt = xts_encrypt,
+ .decrypt = xts_decrypt,
+ }
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+ u32 eax, ebx, ecx, edx;
+
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+ return -ENODEV;
+
+ /*
+ * AES-KL itself does not depend on AES-NI. But AES-KL does not
+ * support 192-bit keys. To make itself AES-compliant, it falls
+ * back to AES-NI.
+ */
+ if (!boot_cpu_has(X86_FEATURE_AES))
+ return -ENODEV;
+
+ return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+ aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+ simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+ aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3922d24cae2b..d38abcc69d9e 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1821,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
SYM_FUNC_END(_key_expansion_256b)

/*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- * unsigned int key_len)
+ * int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ * unsigned int key_len)
*/
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(__aesni_set_key)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -1933,7 +1933,7 @@ SYM_FUNC_START(aesni_set_key)
#endif
FRAME_END
RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(__aesni_set_key)

/*
* void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 518f48f3bd6b..774e3a78b662 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
#include <linux/static_call.h>

#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"

#define RFC4106_HASH_SUBKEY_SIZE 16
#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,8 +73,8 @@ struct gcm_context_data {
u8 hash_keys[GCM_BLOCK_LEN * 16];
};

-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- unsigned int key_len);
+asmlinkage int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ unsigned int key_len);
asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -89,11 +90,23 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);

-static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ unsigned int key_len)
+{
+ return __aesni_set_key(ctx, in_key, key_len);
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_set_key);
+#endif
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
{
__aesni_enc(ctx, out, in);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_enc);
+#endif

#define AVX_GEN2_OPTSIZE 640
#define AVX_GEN4_OPTSIZE 4096
@@ -104,19 +117,25 @@ asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);

-static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
- unsigned int len, u8 *iv)
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
{
__aesni_xts_encrypt(ctx, out, in, len, iv);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+#endif

-static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
- unsigned int len, u8 *iv)
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
{
__aesni_xts_decrypt(ctx, out, in, len, iv);
return 0;
}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
+#endif

#ifdef CONFIG_X86_64

@@ -242,7 +261,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
err = aes_expandkey(ctx, in_key, key_len);
else {
kernel_fpu_begin();
- err = aesni_set_key(ctx, in_key, key_len);
+ err = __aesni_set_key(ctx, in_key, key_len);
kernel_fpu_end();
}

diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..5b1919f49efe
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced for other AES implementations
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+
--
2.17.1


2023-06-03 15:36:44

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v8 06/12] x86/keylocker: Define Key Locker CPUID leaf

Both the Key Locker enabling code in the x86 core and the AES Key
Locker code in the crypto library will need to reference some CPUID
bits. Define the feature-specific CPUID leaf and bits.
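
For example, the boot code elsewhere in the series consumes these
definitions roughly as follows (sketch; keylocker_usable() is an
illustrative helper name, not part of this patch):

	static bool keylocker_usable(void)
	{
		u32 eax, ebx, ecx, edx;

		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);

		/* Feature readiness checks as done by the setup code. */
		return (ebx & KEYLOCKER_CPUID_EBX_AESKLE) &&
		       (eax & KEYLOCKER_CPUID_EAX_SUPERVISOR);
	}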

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v6:
* Tweak the changelog -- comment the reason first and then brief the
change.

Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/include/asm/keylocker.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9b3bec452b31..1717e0866254 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__

+#include <linux/bits.h>
#include <asm/fpu/types.h>

/**
@@ -21,5 +22,11 @@ struct iwkey {
struct reg_128_bit encryption_key[2];
};

+#define KEYLOCKER_CPUID 0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)
+
#endif /*__ASSEMBLY__ */
#endif /* _ASM_KEYLOCKER_H */
--
2.17.1


2023-06-03 16:42:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker

On Sat, Jun 03, 2023 at 08:22:24AM -0700, Chang S. Bae wrote:
> +static __init int x86_nokeylocker_setup(char *arg)
> +{
> + /* Expect an exact match without trailing characters. */
> + if (strlen(arg))
> + return 0;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
> + return 1;
> +
> + setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
> + pr_info("x86/keylocker: Disabled by kernel command line.\n");
> + return 1;
> +}
> +__setup("nokeylocker", x86_nokeylocker_setup);

Can we stop adding those just to remove them at some point later but
simply do:

clearcpuid=keylocker

?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-04 22:05:23

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

On 6/4/2023 8:34 AM, Eric Biggers wrote:
>
> To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
> should happen when translating the raw crypto_skcipher_ctx() into the pointer to
> the aes_xts_ctx. It should not happen when accessing each individual field in
> the aes_xts_ctx.
>
> Yet, this code is still doing runtime alignment when accessing each individual
> field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
> aes_set_key_common() runtime-aligns to crypto_aes_ctx.
>
> We should keep everything consistent, which means making aes_set_key_common()
> take a pointer to crypto_aes_ctx and not do the runtime alignment.

Let me clarify what is the problem this patch tried to solve here. The
current struct aesni_xts_ctx is ugly. So, the main story is let's fix it
before using the code for AES-KL.

Then, the rework part may be applicable for code re-usability. That
seems to be okay to do here.

Fixing the runtime alignment entirely seems to be touching other code
than AES-XTS. Yes, that's ideal cleanup for consistency. But, it seems
to be less relevant in this series. I'd be happy to follow up on that
improvement though.

Thanks,
Chang

2023-06-04 22:57:33

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker

On 6/3/2023 9:37 AM, Borislav Petkov wrote:
> On Sat, Jun 03, 2023 at 08:22:24AM -0700, Chang S. Bae wrote:
>> +static __init int x86_nokeylocker_setup(char *arg)
>> +{
>> + /* Expect an exact match without trailing characters. */
>> + if (strlen(arg))
>> + return 0;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
>> + return 1;
>> +
>> + setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
>> + pr_info("x86/keylocker: Disabled by kernel command line.\n");
>> + return 1;
>> +}
>> +__setup("nokeylocker", x86_nokeylocker_setup);
>
> Can we stop adding those just to remove them at some point later but
> simply do:
>
> clearcpuid=keylocker
>
> ?

Oh, I was not sure about this policy. Thanks, now I'm glad that I have
confidence in removing this.

Chang

2023-06-05 02:53:11

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

On Sun, Jun 04, 2023 at 03:02:32PM -0700, Chang S. Bae wrote:
> On 6/4/2023 8:34 AM, Eric Biggers wrote:
> >
> > To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
> > should happen when translating the raw crypto_skcipher_ctx() into the pointer to
> > the aes_xts_ctx. It should not happen when accessing each individual field in
> > the aes_xts_ctx.
> >
> > Yet, this code is still doing runtime alignment when accessing each individual
> > field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
> > aes_set_key_common() runtime-aligns to crypto_aes_ctx.
> >
> > We should keep everything consistent, which means making aes_set_key_common()
> > take a pointer to crypto_aes_ctx and not do the runtime alignment.
>
> Let me clarify what is the problem this patch tried to solve here. The
> current struct aesni_xts_ctx is ugly. So, the main story is let's fix it
> before using the code for AES-KL.
>
> Then, the rework part may be applicable for code re-usability. That seems to
> be okay to do here.
>
> Fixing the runtime alignment entirely seems to be touching other code than
> AES-XTS. Yes, that's ideal cleanup for consistency. But, it seems to be less
> relevant in this series. I'd be happy to follow up on that improvement
> though.

IMO the issue is that your patch makes the code (including the XTS code)
inconsistent because it makes it use a mix of both approaches: it aligns each
field individually, *and* it aligns the ctx up-front. I was hoping to switch
fully from the former approach to the latter approach, instead of switching from
the former approach to a mix of the two approaches as you are proposing.

The following on top of this patch is what I am asking for. I think it would be
appropriate to fold into this patch.

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 589648142c173..ad1ae7a88b59d 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -228,10 +228,10 @@ static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
}

-static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
+static int aes_set_key_common(struct crypto_tfm *tfm,
+ struct crypto_aes_ctx *ctx,
const u8 *in_key, unsigned int key_len)
{
- struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
int err;

if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
@@ -252,7 +252,8 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len)
{
- return aes_set_key_common(tfm, crypto_tfm_ctx(tfm), in_key, key_len);
+ return aes_set_key_common(tfm, aes_ctx(crypto_tfm_ctx(tfm)),
+ in_key, key_len);
}

static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
@@ -285,7 +286,7 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int len)
{
return aes_set_key_common(crypto_skcipher_tfm(tfm),
- crypto_skcipher_ctx(tfm), key, len);
+ aes_ctx(crypto_skcipher_ctx(tfm)), key, len);
}

static int ecb_encrypt(struct skcipher_request *req)

2023-06-05 04:51:30

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx

On 6/4/2023 7:46 PM, Eric Biggers wrote:
>
> IMO the issue is that your patch makes the code (including the XTS code)
> inconsistent because it makes it use a mix of both approaches: it aligns each
> field individually, *and* it aligns the ctx up-front.

But, the code did this already:

common_rfc4106_set_key()
-> aesni_rfc4106_gcm_ctx_get(aead) // align ->ctx
-> aes_set_key_common()
-> aes_ctx() // here, this aligns again

And generic_gcmaes_set_key() as well.

Hmm, maybe a separate patch can spell out this issue more clearly for
the record, no?

Thanks,
Chang

2023-06-05 11:08:40

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH v8 01/12] Documentation/x86: Document Key Locker

On 6/3/23 22:22, Chang S. Bae wrote:
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of a wrapping key restore failure upon resume from suspend, all
> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
"... resume from suspend or reboot."
> +impacted by a wrapping key restore failure is a data-volume then it is
> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> + restriction failure, the instruction fails with setting RFLAGS.ZF. The
> + crypto library implementation includes the flag check to return -EINVAL.
> + Note that this flag is also set if the wrapping key is changed, e.g.,
> + because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> + AES-KL instruction to process an 192-bit key. The AES-KL cipher
> + implementation logs a warning message with a 192-bit key and then falls
> + back to AES-NI. So, this 192-bit key-size limitation is only documented,
> + not enforced. It means the key will remain in clear-text in memory. This
> + is to meet Linux crypto-cipher expectation that each implementation must
> + support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> + overhead when compared with AES-NI instructions.
> +

The rest is LGTM, thanks!

Anyway,

Reviewed-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara


2023-06-06 02:42:18

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v8 01/12] Documentation/x86: Document Key Locker



On 6/3/23 08:22, Chang S. Bae wrote:
> Document the overview of the feature along with relevant consideration
> when provisioning dm-crypt volumes with AES-KL instead of AES-NI.
>
> ---
> ---
> Documentation/arch/x86/index.rst | 1 +
> Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
> 2 files changed, 98 insertions(+)
> create mode 100644 Documentation/arch/x86/keylocker.rst
>

> diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
> new file mode 100644
> index 000000000000..5557b8d0659a
> --- /dev/null
> +++ b/Documentation/arch/x86/keylocker.rst
> @@ -0,0 +1,97 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of a wrapping key restore failure upon resume from suspend, all

Upon the event of a ...

> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
> +impacted by a wrapping key restore failure is a data-volume then it is

data volume

> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> + restriction failure, the instruction fails with setting RFLAGS.ZF. The
> + crypto library implementation includes the flag check to return -EINVAL.
> + Note that this flag is also set if the wrapping key is changed, e.g.,
> + because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> + AES-KL instruction to process an 192-bit key. The AES-KL cipher
> + implementation logs a warning message with a 192-bit key and then falls
> + back to AES-NI. So, this 192-bit key-size limitation is only documented,

Is it logged anywhere? i.e., a kernel log message?

> + not enforced. It means the key will remain in clear-text in memory. This
> + is to meet Linux crypto-cipher expectation that each implementation must
> + support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> + overhead when compared with AES-NI instructions.
> +

--
~Randy

2023-06-06 04:29:28

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v8 01/12] Documentation/x86: Document Key Locker

On 6/5/2023 7:17 PM, Randy Dunlap wrote:
> On 6/3/23 08:22, Chang S. Bae wrote:
>> +
>> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
>> + AES-KL instruction to process an 192-bit key. The AES-KL cipher
>> + implementation logs a warning message with a 192-bit key and then falls
>> + back to AES-NI. So, this 192-bit key-size limitation is only documented,
>
> Is it logged anywhere? i.e., a kernel log message?

Yes, this is the relevant change in the last patch:

> +static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
> +			 unsigned int keylen)
> +{
...
> + if (unlikely(keylen == AES_KEYSIZE_192)) {
> + pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
...
> +}

Thanks,
Chang

2023-06-07 05:39:25

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm

On Sat, Jun 03, 2023 at 08:22:27AM -0700, Chang S. Bae wrote:
> == Key Handle Restriction ==
>
> A key handle may be encoded with some restrictions.

It's unclear what this means. Please avoid passive voice and the word "may"
like this. I think you mean something like "The AES-KL instruction set supports
selecting key usage restrictions at key handle creation time."

> Restrict every handle only available in kernel mode via setkey().

I think you mean something like "Restrict all key handles created by the kernel
to kernel mode use only."

Can you also mention why you are doing this? I suppose it might as well be
done, but I'm not seeing how it would actually matter.

What other sorts of key usage restrictions does AES-KL support? Are any other
ones useful here?

> Subsequently the key handle could be corrupted or fail with handle
> restrictions. Then, encrypt()/decrypt() returns -EINVAL.

Aren't these scenarios actually impossible? At least without memory corruption.

> == Userspace Exposition ==
>
> Some hardware implementations may have some performance penalties.

Likewise, please avoid vague statements like this. This makes it unclear
whether this is something that happens in the real world or whether it's just
theoretical. You indeed have actual benchmark results that show that AES-KL is
much slower than AES-NI on current CPUs, right?

> But, for disk encryption, storage bandwidth may be the bottleneck before
> encryption bandwidth.

Again, please try to be less vague. E.g. "With a slow storage device, storage
bandwidth is the bottleneck, even if disk encryption is enabled..."

> Thus, advertise it with a unique name 'xts-aes-aeskl' in /proc/crypto while
> not replacing AES-NI under the generic name 'xts(aes)' with a lower priority.

The above sentence seems to say that xts-aes-aeskl does *not* have a lower
priority than xts-aes-aesni. But actually it does.

> Then, the performance is unlikely better than 64-bit which has already a gap
> vs. AES-NI.

I don't understand what this sentence is trying to say.

> diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
> index 9bbfd01cfa2f..658adfd7aebf 100644
> --- a/arch/x86/crypto/Kconfig
> +++ b/arch/x86/crypto/Kconfig
> @@ -2,6 +2,11 @@
>
> menu "Accelerated Cryptographic Algorithms for CPU (x86)"
>
> +config AS_HAS_KEYLOCKER
> + def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
> + help
> + Supported by binutils >= 2.36 and LLVM integrated assembler >= V12

It looks like arch/x86/Kconfig.assembler would be a better place for this.
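
i.e., something like this over there (sketch only, keeping the entry as
posted in the patch):

config AS_HAS_KEYLOCKER
	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
	help
	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12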

> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
> new file mode 100644
> index 000000000000..61addc61dd4e
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl-intel_asm.S
> @@ -0,0 +1,552 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Implement AES algorithm using AES Key Locker instructions.
> + *
> + * Most code is based from the AES-NI implementation, aesni-intel_asm.S
> + *
> + */
> +
> +#include <linux/linkage.h>
> +#include <linux/cfi_types.h>
> +#include <asm/errno.h>
> +#include <asm/inst.h>
> +#include <asm/frame.h>
> +#include "aes-helper_asm.S"
> +
> +.text
> +
> +#define STATE1 %xmm0
> +#define STATE2 %xmm1
> +#define STATE3 %xmm2
> +#define STATE4 %xmm3
> +#define STATE5 %xmm4
> +#define STATE6 %xmm5
> +#define STATE7 %xmm6
> +#define STATE8 %xmm7
> +#define STATE STATE1
> +
> +#define IV %xmm9
> +#define KEY %xmm10
> +#define INC %xmm13
> +
> +#define IN1 %xmm8
> +#define IN IN1

Why do both IN1 and IN exist? Shouldn't there just be IN?

> +
> +#define AREG %rax

Shouldn't %rax just be hardcoded?

> +#define HANDLEP %rdi

This should be called CTX, to match the function prototypes.

> +#define UKEYP OUTP

This should be called IN_KEY, to match the function prototypes.

> +#define GF128MUL_MASK %xmm11
> +
> +/*
> + * int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
> + */
> +SYM_FUNC_START(__aeskl_setkey)
> + FRAME_BEGIN
> + movl %edx, 480(HANDLEP)
> + movdqu (UKEYP), STATE1
> + mov $1, %eax
> + cmp $16, %dl
> + je .Lsetkey_128
> +
> + movdqu 0x10(UKEYP), STATE2
> + encodekey256 %eax, %eax
> + movdqu STATE4, 0x30(HANDLEP)
> + jmp .Lsetkey_end
> +.Lsetkey_128:
> + encodekey128 %eax, %eax
> +
> +.Lsetkey_end:
> + movdqu STATE1, (HANDLEP)
> + movdqu STATE2, 0x10(HANDLEP)
> + movdqu STATE3, 0x20(HANDLEP)

The moves to the ctx should use movdqa, since it is aligned.

> +
> + xor AREG, AREG
> + FRAME_END
> + RET
> +SYM_FUNC_END(__aeskl_setkey)

This function always returns 0, so it really should return void.

> +/*
> + * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
> + */
> +SYM_FUNC_START(__aeskl_enc)
> + FRAME_BEGIN
> + movdqu (INP), STATE
> + movl 480(HANDLEP), KLEN
> +
> + cmp $16, KLEN
> + je .Lenc_128
> + aesenc256kl (HANDLEP), STATE
> + jz .Lenc_err
> + jmp .Lenc_noerr
> +.Lenc_128:
> + aesenc128kl (HANDLEP), STATE
> + jz .Lenc_err
> +
> +.Lenc_noerr:
> + xor AREG, AREG
> + jmp .Lenc_end
> +.Lenc_err:
> + mov $(-EINVAL), AREG
> +.Lenc_end:
> + movdqu STATE, (OUTP)
> + FRAME_END
> + RET
> +SYM_FUNC_END(__aeskl_enc)

In the common case (successful AES-256 encryption) this is executing 'jmp'
twice. I think the code should be rearranged to eliminate these jmps.
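
For example (untested sketch, reusing the register macros from this file),
the common AES-256 success path could fall straight through:

SYM_FUNC_START(__aeskl_enc)
	FRAME_BEGIN
	movdqu	(INP), STATE
	movl	480(HANDLEP), KLEN

	cmp	$16, KLEN
	je	.Lenc_128
	aesenc256kl (HANDLEP), STATE
	jz	.Lenc_err
.Lenc_done:
	xor	AREG, AREG
.Lenc_out:
	movdqu	STATE, (OUTP)
	FRAME_END
	RET
.Lenc_128:
	aesenc128kl (HANDLEP), STATE
	jnz	.Lenc_done
.Lenc_err:
	mov	$(-EINVAL), AREG
	jmp	.Lenc_out
SYM_FUNC_END(__aeskl_enc)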

> +/*
> + * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
> + * const u8 *src, unsigned int len, le128 *iv)
> + */
> +SYM_FUNC_START(__aeskl_xts_encrypt)

__aeskl_xts_encrypt() and __aeskl_xts_decrypt() are very similar. To reduce
code duplication, can you consider generating them from a macro that takes an
argument that indicates whether it is encrypt or decrypt?
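
Something like the following skeleton, perhaps (just an untested sketch of
the shape):

.macro AESKL_XTS_CRYPT crypt
SYM_FUNC_START(__aeskl_xts_\crypt\()rypt)
	FRAME_BEGIN
	/*
	 * The shared tweak setup and block loop would be written once
	 * here; only the mnemonics differ and can be selected via
	 * \crypt, e.g. aes\crypt\()wide256kl vs. aes\crypt\()wide128kl.
	 */
	FRAME_END
	RET
SYM_FUNC_END(__aeskl_xts_\crypt\()rypt)
.endm

AESKL_XTS_CRYPT enc	/* emits __aeskl_xts_encrypt */
AESKL_XTS_CRYPT dec	/* emits __aeskl_xts_decrypt */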

> +static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
> + unsigned int keylen)
> +{
> + struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
> + int err;
> +
> + if (!crypto_simd_usable())
> + return -EBUSY;
> +
> + if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
> + keylen != AES_KEYSIZE_256)
> + return -EINVAL;
> +
> + kernel_fpu_begin();
> + if (unlikely(keylen == AES_KEYSIZE_192)) {
> + pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
> + err = aesni_set_key(ctx, in_key, keylen);
> + } else {
> + if (!valid_keylocker())
> + err = -ENODEV;
> + else
> + err = __aeskl_setkey(ctx, in_key, keylen);
> + }
> + kernel_fpu_end();
> +
> + return err;
> +}
[...]
> + .cra_ctxsize = XTS_AES_CTX_SIZE,
[...]

Something that your AES-KL code does that's a bit ugly is that it abuses
'struct crypto_aes_ctx' to store a Keylocker key handle instead
of the actual AES key schedule which the struct is supposed to be for.

The proper way to represent that would be to make the tfm context for
xts-aes-aeskl be a union of crypto_aes_ctx and a Keylocker specific context.
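
For example, something along these lines (rough sketch; the names are made
up, not from the series; types come from <linux/types.h> and <crypto/aes.h>):

/* Key Locker per-key state: the wrapped handle, not a key schedule */
struct aeskl_handle_ctx {
	u8 handle[64];	/* ENCODEKEY128: 384 bits, ENCODEKEY256: 512 bits */
	u32 key_length;
};

/* Each half of the XTS context is a real key schedule *or* a handle */
union aeskl_ctx {
	struct crypto_aes_ctx aes;	/* 192-bit fallback via AES-NI */
	struct aeskl_handle_ctx kl;	/* 128/256-bit AES-KL handle */
};

struct aeskl_xts_ctx {
	union aeskl_ctx crypt_ctx;
	union aeskl_ctx tweak_ctx;
};

Then setkey would explicitly pick which member it fills in based on the key
length, instead of overloading crypto_aes_ctx for both cases.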

If you don't do that and instead keep the proposed workaround, then please add a
comment somewhere that very clearly explains how the struct is being used.
Above aeskl_setkey() or above .cra_ctxsize might be a good place.

> diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
> new file mode 100644
> index 000000000000..5b1919f49efe
> --- /dev/null
> +++ b/arch/x86/crypto/aesni-intel_glue.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Support for Intel AES-NI instructions. This file contains function
> + * prototypes to be referenced for other AES implementations
> + */

It would be helpful if this comment was more concrete, like "These are AES-NI
functions that are used by the AES-KL code as a fallback when it is given a
192-bit key. Key Locker does not support 192-bit keys."

- Eric

2023-06-07 22:17:55

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm

On 6/6/2023 10:35 PM, Eric Biggers wrote:
> On Sat, Jun 03, 2023 at 08:22:27AM -0700, Chang S. Bae wrote:
>
> Can you also mention why you are doing this? I suppose it might as well be
> done, but I'm not seeing how it would actually matter.

While this crypto implementation lives in the kernel, userspace can
still invoke it through the crypto user-space API:
https://docs.kernel.org/crypto/userspace-if.html

And the AES-KL instructions themselves are executable in userspace.

So, say someone takes a key handle out of the kernel and then tries to
decrypt a disk image from userspace. Restricting the handle to kernel
mode at least enforces that it cannot be used that way.

> What other sorts of key usage restrictions does AES-KL support? Are any other
> ones useful here?

Besides this, there are additional bits to restrict encryption and
decryption use, respectively.

This is described in Section 1.1.1.1 'Handle Restrictions' of the Key
Locker whitepaper:

https://www.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html
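
Roughly (illustrative only -- these macro names are invented, and the bit
positions are from my reading of that spec):

/* Handle restriction bits passed in EAX to ENCODEKEY128/256 */
#define AESKL_RESTRICT_CPL0		BIT(0)	/* usable at ring 0 only */
#define AESKL_RESTRICT_NO_ENCRYPT	BIT(1)	/* handle cannot encrypt */
#define AESKL_RESTRICT_NO_DECRYPT	BIT(2)	/* handle cannot decrypt */

The setkey code sets only bit 0 ('mov $1, %eax' before encodekey*), i.e.
the kernel-mode-only restriction. The encrypt-only/decrypt-only bits do
not obviously help for XTS, where the data-key handle is used in both
directions.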

>> Subsequently the key handle could be corrupted or fail with handle
>> restrictions. Then, encrypt()/decrypt() returns -EINVAL.
>
> Aren't these scenarios actually impossible? At least without memory corruption.

Yes, in the dm-crypt path, I think. But the key handle can be tainted
along the userspace -> API path.

I think this may help users: the hardware does an integrity check up
front, so the driver can surface an error right away if something is
wrong.

>> Thus, advertise it with a unique name 'xts-aes-aeskl' in /proc/crypto while
>> not replacing AES-NI under the generic name 'xts(aes)' with a lower priority.
>
> The above sentence seems to say that xts-aes-aeskl does *not* have a lower
> priority than xts-aes-aesni. But actually it does.

No, it does not say that. This needs to call out the latter part more
clearly.

>> Then, the performance is unlikely better than 64-bit which has already a gap
>> vs. AES-NI.
>
> I don't understand what this sentence is trying to say.

This is in another section explaining why the driver is 64-bit only. I
added it as one more reason to avoid 32-bit code. But anyway, since
32-bit kernel support is known to be on its way out, the 128-bit
register argument alone seems to be enough there.

>> +config AS_HAS_KEYLOCKER
>> + def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
>> + help
>> + Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
>
> It looks like arch/x86/Kconfig.assembler would be a better place for this.

Yeah, the commit 5e8ebd841a44 ("x86: probe assembler capabilities via
kconfig instead of makefile") moved those over there.

>> +
>> +#define IN1 %xmm8
>> +#define IN IN1
>
> Why do both IN1 and IN exist? Shouldn't there just be IN?

Oh, this is a silly leftover from the CBC code, which has multiple inputs.

#define IN %xmm8, then s/IN1/IN/g

>> +
>> +#define AREG %rax
>
> Shouldn't %rax just be hardcoded?

I thought this (or any other) renaming helps readability. Maybe I'm
missing something. Could you share your thoughts on this?

>> +#define HANDLEP %rdi
>
> This should be called CTX, to match the function prototypes.
>
>> +#define UKEYP OUTP
>
> This should be called IN_KEY, to match the function prototypes.

Okay. But, OTOH, the prototype itself is somewhat generic, so its
argument names do not always match what is actually meant in the code.
That is why AES-NI renamed them, like

ctx -> KEYP
in_key -> UKEY
...

So, another option would be to leave comments there, e.g. '# ctx is
renamed to KEYP'.

>> +
>> +.Lsetkey_end:
>> + movdqu STATE1, (HANDLEP)
>> + movdqu STATE2, 0x10(HANDLEP)
>> + movdqu STATE3, 0x20(HANDLEP)
>
> The moves to the ctx should use movdqa, since it is aligned.

Reading the manual, the difference is whether a #GP is raised when a
misaligned memory operand comes in. So using MOVDQA everywhere here
amounts to asking for an alignment check every time.

But HANDLEP is known to hold an aligned address. So the plain move seems
to be enough and is coherent with the glue code -- it avoids unnecessary
sanity checks.

>> +
>> + xor AREG, AREG
>> + FRAME_END
>> + RET
>> +SYM_FUNC_END(__aeskl_setkey)
>
> This function always returns 0, so it really should return void.

Yeah, fair enough.

> In the common case (successful AES-256 encryption) this is executing 'jmp'
> twice. I think the code should be rearranged to eliminate these jmps.

Ah, right. Good point! Let me tweak this for the most likely cases.

> __aeskl_xts_encrypt() and __aeskl_xts_decrypt() are very similar. To reduce
> code duplication, can you consider generating them from a macro that takes an
> argument that indicates whether it is encrypt or decrypt?

Yeah, I can see that the operand-preparation code is common between
them. But I'm not sure folding them together would make it more
readable.

> Something that your AES-KL code does that's a bit ugly is that it abuses
> 'struct crypto_aes_ctx' to store a Keylocker key handle instead
> of the actual AES key schedule which the struct is supposed to be for.
>
> The proper way to represent that would be to make the tfm context for
> xts-aes-aeskl be a union of crypto_aes_ctx and a Keylocker specific context.

Agreed. I think this is likely fallout from that struct aesni_xts_ctx
fix. Previously, the field was a byte array, which did not necessarily
represent the expanded-key format. Now the fix has made it more
specific, so Key Locker has to specify its own context accordingly.

Thanks,
Chang


2024-03-11 21:48:25

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void

The aesni_set_key() implementation has no error case, yet its prototype
specifies to return an error code.

Modify the function prototype to return void and adjust the related code.

Signed-off-by: Chang S. Bae <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Previously, Eric identified a similar case in my AES-KL code [1], which
is how this parallel issue was noticed.

[1]: https://lore.kernel.org/lkml/[email protected]/
---
arch/x86/crypto/aesni-intel_asm.S | 5 ++---
arch/x86/crypto/aesni-intel_glue.c | 6 +++---
2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 411d8c83e88a..7ecb55cae3d6 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1820,8 +1820,8 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
SYM_FUNC_END(_key_expansion_256b)

/*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- * unsigned int key_len)
+ * void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ * unsigned int key_len)
*/
SYM_FUNC_START(aesni_set_key)
FRAME_BEGIN
@@ -1926,7 +1926,6 @@ SYM_FUNC_START(aesni_set_key)
sub $0x10, UKEYP
cmp TKEYP, KEYP
jb .Ldec_key_loop
- xor AREG, AREG
#ifndef __x86_64__
popl KEYP
#endif
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index b1d90c25975a..c807b2f48ea3 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -87,8 +87,8 @@ static inline void *aes_align_addr(void *addr)
return PTR_ALIGN(addr, AESNI_ALIGN);
}

-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- unsigned int key_len);
+asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ unsigned int key_len);
asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
err = aes_expandkey(ctx, in_key, key_len);
else {
kernel_fpu_begin();
- err = aesni_set_key(ctx, in_key, key_len);
+ aesni_set_key(ctx, in_key, key_len);
kernel_fpu_end();
}

--
2.34.1


2024-03-12 07:47:18

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void

On Mon, 11 Mar 2024 at 22:48, Chang S. Bae <[email protected]> wrote:
>
> The aesni_set_key() implementation has no error case, yet its prototype
> specifies to return an error code.
>
> Modify the function prototype to return void and adjust the related code.
>
> Signed-off-by: Chang S. Bae <[email protected]>
> Cc: Eric Biggers <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> Previously, Eric identified a similar case in my AES-KL code [1]. Then,
> this parallel issue was realized.
>
> [1]: https://lore.kernel.org/lkml/[email protected]/
> ---
> arch/x86/crypto/aesni-intel_asm.S | 5 ++---
> arch/x86/crypto/aesni-intel_glue.c | 6 +++---
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
> index 411d8c83e88a..7ecb55cae3d6 100644
> --- a/arch/x86/crypto/aesni-intel_asm.S
> +++ b/arch/x86/crypto/aesni-intel_asm.S
> @@ -1820,8 +1820,8 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
> SYM_FUNC_END(_key_expansion_256b)
>
> /*
> - * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> - * unsigned int key_len)
> + * void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> + * unsigned int key_len)
> */
> SYM_FUNC_START(aesni_set_key)
> FRAME_BEGIN
> @@ -1926,7 +1926,6 @@ SYM_FUNC_START(aesni_set_key)
> sub $0x10, UKEYP
> cmp TKEYP, KEYP
> jb .Ldec_key_loop
> - xor AREG, AREG
> #ifndef __x86_64__
> popl KEYP
> #endif
> diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> index b1d90c25975a..c807b2f48ea3 100644
> --- a/arch/x86/crypto/aesni-intel_glue.c
> +++ b/arch/x86/crypto/aesni-intel_glue.c
> @@ -87,8 +87,8 @@ static inline void *aes_align_addr(void *addr)
> return PTR_ALIGN(addr, AESNI_ALIGN);
> }
>
> -asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> - unsigned int key_len);
> +asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> + unsigned int key_len);
> asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
> asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
> asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
> @@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
> err = aes_expandkey(ctx, in_key, key_len);
> else {
> kernel_fpu_begin();
> - err = aesni_set_key(ctx, in_key, key_len);
> + aesni_set_key(ctx, in_key, key_len);

This will leave 'err' uninitialized.
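
One minimal fix would be something like (sketch only):

	if (!crypto_simd_usable()) {
		err = aes_expandkey(ctx, in_key, key_len);
	} else {
		kernel_fpu_begin();
		aesni_set_key(ctx, in_key, key_len);
		kernel_fpu_end();
		err = 0;	/* aesni_set_key() no longer reports errors */
	}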

> kernel_fpu_end();
> }
>
> --
> 2.34.1
>
>

2024-03-12 15:18:25

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void

On Tue, 12 Mar 2024 at 16:03, Chang S. Bae <[email protected]> wrote:
>
> On 3/12/2024 12:46 AM, Ard Biesheuvel wrote:
> > On Mon, 11 Mar 2024 at 22:48, Chang S. Bae <[email protected]> wrote:
> >>
> >> @@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
> >> err = aes_expandkey(ctx, in_key, key_len);
> >> else {
> >> kernel_fpu_begin();
> >> - err = aesni_set_key(ctx, in_key, key_len);
> >> + aesni_set_key(ctx, in_key, key_len);
> >
> > This will leave 'err' uninitialized.
>
> Ah, right. Thanks for catching it.
>
> Also, upon reviewing aes_expandkey(), I noticed there's no error case,
> except for the key length sanity check.
>
> While addressing this, perhaps some additional cleanup is worth considering, like:
>
> @@ -233,19 +233,20 @@ static int aes_set_key_common(struct
> crypto_aes_ctx *ctx,
> {
> int err;
>
> - if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
> - key_len != AES_KEYSIZE_256)
> - return -EINVAL;
> + err = aes_check_keylen(key_len);
> + if (err)
> + return err;
>
> if (!crypto_simd_usable())
> - err = aes_expandkey(ctx, in_key, key_len);
> + /* no error with a valid key length */
> + aes_expandkey(ctx, in_key, key_len);
> else {
> kernel_fpu_begin();
> - err = aesni_set_key(ctx, in_key, key_len);
> + aesni_set_key(ctx, in_key, key_len);
> kernel_fpu_end();
> }
>
> - return err;
> + return 0;
> }
>

I wonder whether we need aesni_set_key() at all.

2024-03-12 15:38:24

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void

On 3/12/2024 8:18 AM, Ard Biesheuvel wrote:
>
> I wonder whether we need aesni_set_key() at all.

The following from the AES-NI whitepaper [1] seems relevant:

The Relative Cost of the Key Expansion
...
Some less frequent applications require frequent key scheduling. For
example, some random number generators may rekey frequently to
achieve forward secrecy. One extreme example is a Davies-Meyer
hashing construction, which uses a block cipher primitive as a
compression function, and the cipher is re-keyed for each processed
data block.

Although these are not the mainstream usage models of the AES
instructions, we point out that the AESKEYGENASSIST and AESIMC
instructions facilitate Key Expansion procedure which is lookup
tables free, and faster than software only key expansion. In
addition, we point out that unrolling of the key expansion code,
which is provided in the previous sections, improves the key
expansion performance. The AES256 case can also utilize the
instruction AESENCLAST, for the sbox transformation, that is faster
than using AESKEYGENASSIST.

[1]
https://www.intel.com/content/dam/doc/white-paper/advanced-encryption-standard-new-instructions-set-paper.pdf

Thanks,
Chang