2021-11-24 20:14:20

by Chang S. Bae

Subject: [PATCH v3 00/15] x86: Support Key Locker

Recall that AES-NI [1] is a set of CPU instructions to perform encryption
operations. Like all other software encryption implementations, it relies on
OS memory protections to keep clear-text key material from being
exfiltrated across security domains. However, there are demonstrated
methods for exfiltrating data across security domains.

Key Locker [2] is a CPU feature to reduce key exfiltration opportunities
while maintaining a programming interface similar to AES-NI. It converts
the AES key into an encoded form, called the 'key handle' [3]. The key
handle is a wrapped version of the clear-text key where the wrapping key
has limited exposure. Once converted, all subsequent data encryption using
new AES instructions (AES-KL) uses this key handle, reducing the exposure
of private key material in memory.

As mentioned, Key Locker introduces a CPU-internal wrapping key. This key
is loaded into a CPU state that software cannot access, and then it is used
to convert the AES keys. At every boot, a new randomized key is
generated and loaded. Thus, the key handle is revoked upon system reboot
(including kexec-reboot).

== Threat Model and Mitigation Description ==

The targeted threat is an attacker with brief physical access [4] to the
victim device, either in an unlocked state or with the screen locked but
keys still in memory. For example, the attacker might use a malicious USB
peripheral to exploit a kernel bug or perform a cold boot attack [4], and
then extract the disk encryption master key. Using this master key, the
attacker can extract the contents of the disk, including data
yet-to-be-written if the device is acquired from the victim user at a later
point in time for forensic analysis.

Key Locker reduces opportunities for long-lived keys to be exfiltrated from
system RAM. Once the key is converted to a key handle, the clear text key
is no longer available to a class of data exfiltration techniques.

== Disk Encryption Use Case ==

Disk encryption uses Key Locker to mitigate key exfiltration as follows:

1. Configuration for Key Locker: AES-KL shows up in /proc/crypto as a
distinct cipher option. From there, tools like cryptsetup [5] can select
AES-KL vs AES-NI. For example,

$ cryptsetup luksFormat --cipher="capi:xts-aes-aeskl-plain" <device>

Note: AES-KL has a performance tradeoff. See details in 'Performance'
below.

2. Disk encryption flow with key protection:

* The cryptsetup utility is responsible for loading the volume key into the
kernel's keyring and passing a reference to the key. Once dm-crypt [6]
has set up the volume, user space is responsible for invalidating the key
material so that only the key handle remains in memory. Cryptsetup does
this, e.g. via crypt_free_volume_key() and crypt_safe_free().

* The AES-KL code in the kernel's crypto library uses the key handle
instead of the actual clear text key.

== Non Use Cases ==

Bare metal disk encryption is the only use case intended by these patches.
Userspace usage is not supported because there is no ABI provided to
communicate and coordinate wrapping-key restore failures to userspace.
For now, the key restore failures are only coordinated with kernel users.
For this reason a "keylocker" indicator is not published in /proc/cpuinfo.
At the same time, the kernel cannot prevent userspace from using the
AES-KL instructions when Key Locker support has been enabled, so the lack
of userspace support is only documented, not actively enforced. Key Locker
support is not enumerated to VM guests.

== Performance ==

This feature comes with some performance penalty versus AES-NI. The
cryptsetup utility [5] has been used to measure Key Locker performance. The
results below were measured [8] on an 11th-gen Intel Core processor
(Tiger Lake mobile part [9]).

Below are the average encryption and decryption rates with a 256-bit key.

Commands (cryptsetup version 2.3.4):
$ cryptsetup benchmark -c aes-cbc -s 256
$ cryptsetup benchmark -c aes-xts -s 256

Tests are approximate using memory only (no storage IO).

+-----------+---------------+---------------+
| Cipher | Encryption | Decryption |
| (AES-NI) | (MiB/s) | (MiB/s) |
+-----------+---------------+---------------+
| AES-CBC | 1242.6 | 4446.5 |
| AES-XTS | 4233.3 | 4359.7 |
+-----------+---------------+---------------+

+-----------+---------------+---------------+
| Cipher | Encryption | Decryption |
| (AES-KL) | (MiB/s) | (MiB/s) |
+-----------+---------------+---------------+
| AES-CBC | 505.3 | 2097.8 |
| AES-XTS | 1130.0 | 696.4 |
+-----------+---------------+---------------+

The cryptsetup benchmark indicates Key Locker raw throughput can be ~5x
slower than AES-NI. For disk encryption, storage bandwidth may be the
bottleneck before encryption bandwidth, but the potential performance
difference is why AES-KL is advertised as a distinct cipher in /proc/crypto
rather than the kernel transparently replacing AES-NI usage with AES-KL.

== Patch Series ==

The series touches two areas -- the x86 core and the x86 crypto library:

* PATCH01-09: Implement Key Locker support in the x86 core. A new internal
wrapping key is loaded at boot time and restored on resume from deep
sleep states. The implication is that, e.g., a dm-crypt user re-enters
the private key at every power-on, per typical expectations, but is not
asked to re-enter it across suspend events. The AES-KL code in the
kernel's crypto library depends on this key support, which is built up
via helpers in a feature-dedicated .c file. Documentation is included.

* PATCH10-15: In the x86 crypto library, first prepare the AES-NI code to
accommodate the new AES implementation, then incrementally add the base
functions and support for the ECB, CBC, CTR, and XTS modes. The code
passes the crypto tests.

Changes from RFC v2 [7]:
* Clarify the use case -- bare metal disk encryption. (Andy Lutomirski)
* Change the backup key failure handling. (Dan Williams) (PATCH8)
* Do not publish 'keylocker' indicator to userspace via /proc/cpuinfo.
(PATCH2)
* Make CONFIG_X86_KEYLOCKER selected by CONFIG_CRYPTO_AES_KL.
(Dan Williams) (PATCH9)
* Clarify AES-KL limitation and tradeoff. (Andy Lutomirski) (PATCH11)
* Polish the changelog, and refactor patches.
* Drop the self-test, as userspace use is not recommended. (Dan Williams)
* Drop the hardware randomization option.
* Limit to support bare metal only. (PATCH7)
* Rename MSRs. (Dan Williams) (PATCH5)
* Clean up code. (Dan Williams) (PATCH7)
* Add documentation. (PATCH1)

Thanks to Dan Williams, Charishma Gairuboyina, and Kumar Dwarakanath for
help with the cover letter.

== Reference ==

[1] Intel Advanced Encryption Standard Instructions (AES-NI):
https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-encryption-standard-instructions-aes-ni.html
[2] Intel Key Locker Specification:
https://software.intel.com/content/dam/develop/external/us/en/documents/343965-intel-key-locker-specification.pdf
[3] This encoded form contains the ciphertext of the AES key, Additional
Authentication Data, and integrity tag information. Section 1.4 Handle
Format [2] describes the format.
[4] Key Locker cannot protect user data in the event of a full system
compromise, or against scenarios where the attacker can observe the
creation of the key handle from the original key.
[5] cryptsetup: https://gitlab.com/cryptsetup/cryptsetup
[6] DM-crypt:
https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-crypt.html
[7] RFC V2: https://lore.kernel.org/lkml/[email protected]/
[8] Intel publishes information about product performance at
http://www.Intel.com/PerformanceIndex.
[9] Tigerlake:
https://www.intel.com/content/www/us/en/products/docs/processors/embedded/11th-gen-product-brief.html

Chang S. Bae (15):
Documentation/x86: Document Key Locker
x86/cpufeature: Enumerate Key Locker feature
x86/insn: Add Key Locker instructions to the opcode map
x86/asm: Add a wrapper function for the LOADIWKEY instruction
x86/msr-index: Add MSRs for Key Locker internal wrapping key
x86/keylocker: Define Key Locker CPUID leaf
x86/cpu/keylocker: Load an internal wrapping key at boot-time
x86/power/keylocker: Restore internal wrapping key from the ACPI S3/4
sleep states
x86/cpu: Add a configuration and command line option for Key Locker
crypto: x86/aes - Prepare for a new AES implementation
crypto: x86/aes-kl - Support AES algorithm using Key Locker
instructions
crypto: x86/aes-kl - Support ECB mode
crypto: x86/aes-kl - Support CBC mode
crypto: x86/aes-kl - Support CTR mode
crypto: x86/aes-kl - Support XTS mode

.../admin-guide/kernel-parameters.txt | 2 +
Documentation/x86/index.rst | 1 +
Documentation/x86/keylocker.rst | 98 ++
arch/x86/Kconfig | 3 +
arch/x86/crypto/Makefile | 5 +-
arch/x86/crypto/aes-intel_asm.S | 26 +
arch/x86/crypto/aes-intel_glue.c | 219 +++
arch/x86/crypto/aes-intel_glue.h | 61 +
arch/x86/crypto/aeskl-intel_asm.S | 1186 +++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 401 ++++++
arch/x86/crypto/aesni-intel_asm.S | 90 +-
arch/x86/crypto/aesni-intel_glue.c | 313 +----
arch/x86/crypto/aesni-intel_glue.h | 88 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/keylocker.h | 45 +
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/include/asm/special_insns.h | 33 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 21 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/keylocker.c | 199 +++
arch/x86/kernel/smpboot.c | 2 +
arch/x86/lib/x86-opcode-map.txt | 11 +-
arch/x86/power/cpu.c | 2 +
crypto/Kconfig | 44 +
tools/arch/x86/lib/x86-opcode-map.txt | 11 +-
28 files changed, 2534 insertions(+), 346 deletions(-)
create mode 100644 Documentation/x86/keylocker.rst
create mode 100644 arch/x86/crypto/aes-intel_asm.S
create mode 100644 arch/x86/crypto/aes-intel_glue.c
create mode 100644 arch/x86/crypto/aes-intel_glue.h
create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
create mode 100644 arch/x86/crypto/aesni-intel_glue.h
create mode 100644 arch/x86/include/asm/keylocker.h
create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 7284bd9822f33a3be80ac6d92b4540d6dcfb5219
--
2.17.1



2021-11-24 20:14:24

by Chang S. Bae

Subject: [PATCH v3 01/15] Documentation/x86: Document Key Locker

Document the overview of the feature along with relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Add as a new patch.
---
Documentation/x86/index.rst | 1 +
Documentation/x86/keylocker.rst | 98 +++++++++++++++++++++++++++++++++
2 files changed, 99 insertions(+)
create mode 100644 Documentation/x86/keylocker.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..bbea47ea10f6 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -38,3 +38,4 @@ x86-specific Documentation
features
elf_auxvec
xstate
+ keylocker
diff --git a/Documentation/x86/keylocker.rst b/Documentation/x86/keylocker.rst
new file mode 100644
index 000000000000..e65d936ef199
--- /dev/null
+++ b/Documentation/x86/keylocker.rst
@@ -0,0 +1,98 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration
+opportunities while maintaining a programming interface similar to AES-NI.
+It converts the AES key into an encoded form, called the 'key handle'. The
+key handle is a wrapped version of the clear-text key where the wrapping
+key has limited exposure. Once converted, all subsequent data encryption
+using new AES instructions (AES-KL) uses this key handle, reducing the
+exposure of private key material in memory.
+
+Internal Wrapping Key (IWKey)
+=============================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded, so a key handle that
+was encoded by the old wrapping key becomes unusable after a system
+shutdown or reboot.
+
+The key may also be lost in the following exceptional situation upon wakeup:
+
+IWKey Restore Failure
+---------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of an IWKey restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive error results from pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem, the recovery is identical to a
+storage controller failing to resume from suspend: reboot. If the volume
+impacted by an IWKey restore failure is a data volume, then it is possible
+that I/O errors on that volume do not bring down the rest of the system.
+However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new IWKey at initial boot, not at any later point such as
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private IWKey state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per VM IWKeys in memory, defeats the purpose of Key Locker. The
+backup / restore facility is also not performant enough to be suitable for
+guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or one that
+ fails a handle restriction check, the instruction fails and sets
+ RFLAGS.ZF. The crypto library implementation checks the flag to return
+ an error code. Note that the flag is also set when the internal wrapping
+ key is changed because of a missing backup.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+ AES-KL instruction to process a 192-bit key. The AES-KL cipher
+ implementation logs a warning message when given a 192-bit key and then
+ falls back to AES-NI. So, this 192-bit key-size limitation is only
+ documented, not enforced. It means that such a key will remain in clear
+ text in memory. This is to meet the Linux crypto-cipher expectation that
+ each implementation must support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+ overhead when compared with AES-NI instructions.
+
--
2.17.1


2021-11-24 20:14:28

by Chang S. Bae

Subject: [PATCH v3 02/15] x86/cpufeature: Enumerate Key Locker feature

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used to
transform a user key into a key handle.

It supports the Advanced Encryption Standard (AES) cipher algorithm with a
new SIMD instruction set, like its predecessor (AES-NI). A new AES
implementation will follow in the kernel's crypto library.

Add the feature bit to enumerate the hardware capability, but do not show
it in /proc/cpuinfo, as userspace usage is not supported.

Make the feature depend on XMM2 as it comes with AES SIMD instructions.

Add X86_FEATURE_KEYLOCKER to the disabled features mask. It will be
enabled under a new config option.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Do not publish
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d5b5f2ab87a0..e1964446bbe5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -361,6 +361,7 @@
#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */
#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER (16*32+23) /* "" Key Locker */
#define X86_FEATURE_BUS_LOCK_DETECT (16*32+24) /* Bus Lock detect */
#define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */
#define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..75e1e87640d4 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
# define DISABLE_OSPKE (1<<(X86_FEATURE_OSPKE & 31))
#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */

+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER 0
+#else
+# define DISABLE_KEYLOCKER (1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
#ifdef CONFIG_X86_5LEVEL
# define DISABLE_LA57 0
#else
@@ -85,7 +91,7 @@
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
- DISABLE_ENQCMD)
+ DISABLE_ENQCMD|DISABLE_KEYLOCKER)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 0
#define DISABLED_MASK19 0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..b958a95a0908 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -124,6 +124,8 @@
#define X86_CR4_PCIDE _BITUL(X86_CR4_PCIDE_BIT)
#define X86_CR4_OSXSAVE_BIT 18 /* enable xsave and xrestore */
#define X86_CR4_OSXSAVE _BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT 19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER _BITUL(X86_CR4_KEYLOCKER_BIT)
#define X86_CR4_SMEP_BIT 20 /* enable SMEP support */
#define X86_CR4_SMEP _BITUL(X86_CR4_SMEP_BIT)
#define X86_CR4_SMAP_BIT 21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..abe7e04b27d9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
+ { X86_FEATURE_KEYLOCKER, X86_FEATURE_XMM2 },
{}
};

--
2.17.1


2021-11-24 20:14:32

by Chang S. Bae

Subject: [PATCH v3 03/15] x86/insn: Add Key Locker instructions to the opcode map

Add the following Key Locker instructions to the opcode map:

LOADIWKEY:
Load a CPU-internal wrapping key.

ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.

AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key indicated
by a key handle.

AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key indicated
by a key handle.

AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key indicated
by a key handle.

AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key indicated
by a key handle.

AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key indicated
by a key handle.

AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key indicated
by a key handle.

AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key indicated
by a key handle.

AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key indicated
by a key handle.

Details of these instructions can be found in Intel Software Developer's
Manual.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
Locker module is built.
---
arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..eb702fc6572e 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -794,11 +794,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -808,6 +809,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..eb702fc6572e 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -794,11 +794,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -808,6 +809,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
--
2.17.1


2021-11-24 20:14:35

by Chang S. Bae

Subject: [PATCH v3 04/15] x86/asm: Add a wrapper function for the LOADIWKEY instruction

Key Locker introduces a CPU-internal wrapping key to encode a user key
into a key handle. The key handle is then referenced instead of the plain
text key.

The new LOADIWKEY instruction loads an internal wrapping key into the
software-inaccessible CPU state. It operates only in kernel mode.

Define struct iwkey to pass the key value.

The kernel will use this function to load a new key at boot time when the
feature is enabled.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan Williams)

Note, Dan wondered if:
WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
---
arch/x86/include/asm/keylocker.h | 25 +++++++++++++++++++++
arch/x86/include/asm/special_insns.h | 33 ++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)
create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..df84c83228a1
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary internal wrapping key storage.
+ * @integrity_key: A 128-bit key to check that key handles have not
+ * been tampered with.
+ * @encryption_key: A 256-bit encryption key used in
+ * wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after loaded.
+ */
+struct iwkey {
+ struct reg_128_bit integrity_key;
+ struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..e6469b05facf 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
#include <asm/processor-flags.h>
#include <linux/irqflags.h>
#include <linux/jump_label.h>
+#include <asm/keylocker.h>

/*
* The compiler should not reorder volatile asm statements with respect to each
@@ -294,6 +295,38 @@ static inline int enqcmds(void __iomem *dst, const void *src)
return 0;
}

+
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key: A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. Caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+ struct reg_128_bit zeros = { 0 };
+
+ asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+ :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+ "m"(key->encryption_key[1]));
+
+ /*
+ * LOADIWKEY %xmm1,%xmm2
+ *
+ * EAX and XMM0 are implicit operands. Load a key value
+ * from XMM0-2 to a software-invisible CPU state. With zero
+ * in EAX, CPU does not do hardware randomization and the key
+ * backup is allowed.
+ *
+ * This instruction is supported by binutils >= 2.36.
+ */
+ asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+ asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+ :: "m"(zeros));
+}
+
#endif /* __KERNEL__ */

#endif /* _ASM_X86_SPECIAL_INSNS_H */
--
2.17.1


2021-11-24 20:14:38

by Chang S. Bae

Subject: [PATCH v3 05/15] x86/msr-index: Add MSRs for Key Locker internal wrapping key

The CPU state that contains the internal wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the cache
(like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs are provided as an abstract interface to save and restore
the wrapping key, and to check the key status.

The wrapping key is saved in a platform-scoped state of non-volatile media.
The backup itself and its path from the CPU are encrypted and integrity
protected.

But this storage's non-volatility is not architecturally guaranteed across
off states, such as S5 and G3.

The kernel will use these MSRs to back up the key for the S3/4 sleep
states, and then write one of them to restore the key on each CPU.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
arch/x86/include/asm/msr-index.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 01e2650b9585..7f11a3b3a75b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -941,4 +941,10 @@
#define MSR_VM_IGNNE 0xc0010115
#define MSR_VM_HSAVE_PA 0xc0010117

+/* MSRs for managing an internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS 0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS 0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM 0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL 0x00000d92
+
#endif /* _ASM_X86_MSR_INDEX_H */
--
2.17.1


2021-11-24 20:14:42

by Chang S. Bae

Subject: [PATCH v3 06/15] x86/keylocker: Define Key Locker CPUID leaf

Define the feature-specific CPUID leaf and bits. Both Key Locker enabling
code in the x86 core and AES Key Locker code in the crypto library will
reference them.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/include/asm/keylocker.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index df84c83228a1..e85dfb6c1524 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__

+#include <linux/bits.h>
#include <asm/fpu/types.h>

/**
@@ -21,5 +22,11 @@ struct iwkey {
struct reg_128_bit encryption_key[2];
};

+#define KEYLOCKER_CPUID 0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)
+
#endif /*__ASSEMBLY__ */
#endif /* _ASM_KEYLOCKER_H */
--
2.17.1


2021-11-24 20:14:45

by Chang S. Bae

Subject: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

Key Locker is a CPU feature to reduce key exfiltration opportunities while
maintaining a programming interface similar to AES-NI. It converts the AES
key into an encoded form, called the 'key handle'.

The key handle is a wrapped version of the clear-text key where the
wrapping key has limited exposure. Once converted via setkey(), all
subsequent data encryption using new AES instructions ('AES-KL') uses this
key handle, reducing the exposure of private key material in memory.

The AES-KL instruction set is analogous to AES-NI. Most of the assembly
code was adapted from the AES-NI implementation, and like AES-NI it is
operational in both 32-bit and 64-bit modes. However, users need to be
aware of the following differences:

== Key Handle ==

AES-KL instructions may fail when given an invalid key handle, either
because the handle is corrupted or because it violates a restriction
encoded into the handle. This implementation encodes every handle so that
it is usable only in kernel mode, via setkey().

== AES Compliance ==

Key Locker is not AES compliant in that it lacks support for 192-bit keys.
However, per the expectations of Linux crypto-cipher implementations, a
software cipher must support all the AES-compliant key sizes. The AES-KL
cipher implementation satisfies this constraint by logging a warning and
falling back to AES-NI. In other words, the 192-bit key-size limitation on
what can be converted into a Key Locker key handle is only documented, not
enforced. This, along with the performance and failure-mode implications
below, is an end-user consideration when selecting AES-KL vs AES-NI.

== API Limitation ==

The setkey() function transforms an AES key into a handle, whereas in
other AES cipher implementations setkey() produces an expanded key.
Because the two outcomes are incompatible, a setkey() failure cannot fall
back to another implementation. As a result, AES-KL is exposed via
synchronous interfaces only.

== Wrapping-key Restore Failure ==

A setkey() failure is also possible when the wrapping key cannot be
restored. In the event of a hardware failure, the wrapping key is lost on
exit from deep sleep states; setkey() then fails with -ENODEV. A suspended
data-transforming task may resume, but it will fail because the wrapping
key has changed.

== Performance ==

This feature comes with some performance penalties vs AES-NI. The
cryptsetup benchmark indicates Key Locker raw throughput can be ~5x slower
than AES-NI. For disk encryption, storage bandwidth may be the bottleneck
before encryption bandwidth, but the potential performance difference is
why AES-KL is advertised as a distinct cipher in /proc/crypto rather than
the kernel transparently replacing AES-NI usage with AES-KL.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff clearly.
(Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlsta)
* Made the build depend on the binutils version to support new
instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/aeskl-intel_asm.S | 184 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 115 ++++++++++++++++++
crypto/Kconfig | 44 +++++++
4 files changed, 346 insertions(+)
create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
create mode 100644 arch/x86/crypto/aeskl-intel_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index ef6c0b9f69c6..f696b037faa5 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o

+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aesni-intel_asm.o aeskl-intel_glue.o aes-intel_glue.o
+
obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..d56ec8dd6644
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using Intel AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-intel_asm.S"
+
+.text
+
+#define STATE1 %xmm0
+#define STATE2 %xmm1
+#define STATE3 %xmm2
+#define STATE4 %xmm3
+#define STATE5 %xmm4
+#define STATE6 %xmm5
+#define STATE7 %xmm6
+#define STATE8 %xmm7
+#define STATE STATE1
+
+#define IV %xmm9
+#define KEY %xmm10
+#define BSWAP_MASK %xmm11
+#define CTR %xmm12
+#define INC %xmm13
+
+#ifdef __x86_64__
+#define IN1 %xmm8
+#define IN2 %xmm9
+#define IN3 %xmm10
+#define IN4 %xmm11
+#define IN5 %xmm12
+#define IN6 %xmm13
+#define IN7 %xmm14
+#define IN8 %xmm15
+#define IN IN1
+#define TCTR_LOW %r11
+#else
+#define IN %xmm1
+#endif
+
+#ifdef __x86_64__
+#define AREG %rax
+#define HANDLEP %rdi
+#define OUTP %rsi
+#define KLEN %r9d
+#define INP %rdx
+#define T1 %r10
+#define LEN %rcx
+#define IVP %r8
+#else
+#define AREG %eax
+#define HANDLEP %edi
+#define OUTP AREG
+#define KLEN %ebx
+#define INP %edx
+#define T1 %ecx
+#define LEN %esi
+#define IVP %ebp
+#endif
+
+#define UKEYP OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ push HANDLEP
+ movl (FRAME_OFFSET+8)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+12)(%esp), UKEYP # in_key
+ movl (FRAME_OFFSET+16)(%esp), %edx # key_len
+#endif
+ movl %edx, 480(HANDLEP)
+ movdqu (UKEYP), STATE1
+ mov $1, %eax
+ cmp $16, %dl
+ je .Lsetkey_128
+
+ movdqu 0x10(UKEYP), STATE2
+ encodekey256 %eax, %eax
+ movdqu STATE4, 0x30(HANDLEP)
+ jmp .Lsetkey_end
+.Lsetkey_128:
+ encodekey128 %eax, %eax
+
+.Lsetkey_end:
+ movdqu STATE1, (HANDLEP)
+ movdqu STATE2, 0x10(HANDLEP)
+ movdqu STATE3, 0x20(HANDLEP)
+
+ xor AREG, AREG
+#ifndef __x86_64__
+ popl HANDLEP
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int _aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_enc)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ pushl HANDLEP
+ pushl KLEN
+ movl (FRAME_OFFSET+12)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+16)(%esp), OUTP # dst
+ movl (FRAME_OFFSET+20)(%esp), INP # src
+#endif
+ movdqu (INP), STATE
+ movl 480(HANDLEP), KLEN
+
+ cmp $16, KLEN
+ je .Lenc_128
+ aesenc256kl (HANDLEP), STATE
+ jz .Lenc_err
+ jmp .Lenc_noerr
+.Lenc_128:
+ aesenc128kl (HANDLEP), STATE
+ jz .Lenc_err
+
+.Lenc_noerr:
+ xor AREG, AREG
+ jmp .Lenc_end
+.Lenc_err:
+ mov $1, AREG
+.Lenc_end:
+ movdqu STATE, (OUTP)
+#ifndef __x86_64__
+ popl KLEN
+ popl HANDLEP
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_enc)
+
+/*
+ * int _aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_dec)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ pushl HANDLEP
+ pushl KLEN
+ movl (FRAME_OFFSET+12)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+16)(%esp), OUTP # dst
+ movl (FRAME_OFFSET+20)(%esp), INP # src
+#endif
+ movdqu (INP), STATE
+ mov 480(HANDLEP), KLEN
+
+ cmp $16, KLEN
+ je .Ldec_128
+ aesdec256kl (HANDLEP), STATE
+ jz .Ldec_err
+ jmp .Ldec_noerr
+.Ldec_128:
+ aesdec128kl (HANDLEP), STATE
+ jz .Ldec_err
+
+.Ldec_noerr:
+ xor AREG, AREG
+ jmp .Ldec_end
+.Ldec_err:
+ mov $1, AREG
+.Ldec_end:
+ movdqu STATE, (OUTP)
+#ifndef __x86_64__
+ popl KLEN
+ popl HANDLEP
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_dec)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..3e3a8b6eccb4
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
+ int err;
+
+ if (!crypto_simd_usable())
+ return -EBUSY;
+
+ if ((key_len != AES_KEYSIZE_128) && (key_len != AES_KEYSIZE_192) &&
+ (key_len != AES_KEYSIZE_256))
+ return -EINVAL;
+
+ kernel_fpu_begin();
+ if (unlikely(key_len == AES_KEYSIZE_192)) {
+ pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+ err = aesni_set_key(ctx, in_key, key_len);
+ } else {
+ if (!valid_keylocker())
+ err = -ENODEV;
+ else
+ err = aeskl_setkey(ctx, in_key, key_len);
+ }
+ kernel_fpu_end();
+
+ return err;
+}
+
+static inline u32 keylength(const void *raw_ctx)
+{
+ struct crypto_aes_ctx *ctx = aes_ctx((void *)raw_ctx);
+
+ return ctx->key_length;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+ if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_enc(ctx, out, in))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+ if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_dec(ctx, out, in))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+static int __init aeskl_init(void)
+{
+ if (!valid_keylocker())
+ return -ENODEV;
+
+ /*
+ * AES-KL itself does not depend on AES-NI. But AES-KL does not
+ * support 192-bit keys. To make itself AES-compliant, it falls
+ * back to AES-NI.
+ */
+ if (!boot_cpu_has(X86_FEATURE_AES))
+ return -ENODEV;
+
+ return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+ return;
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 285f82647d2b..784a04433549 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1113,6 +1113,50 @@ config CRYPTO_AES_NI_INTEL
ECB, CBC, LRW, XTS. The 64 bit version has additional
acceleration for CTR.

+config CRYPTO_AES_KL
+ tristate "AES cipher algorithms (AES-KL)"
+ depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
+ depends on DM_CRYPT
+ select X86_KEYLOCKER
+ select CRYPTO_AES_NI_INTEL
+
+ help
+ Key Locker provides AES SIMD instructions (AES-KL) for secure
+ data encryption and decryption. While this new instruction
+ set is analogous to AES-NI, AES-KL converts an AES key into
+ an encoded form ('key handle') and uses the handle to transform
+ data instead of accessing the raw AES key.
+
+ The setkey() function transforms an AES key into a key handle;
+ the AES key is then no longer needed for data transformation, so
+ users can remove clear-text keys from possible exposure.
+
+ This key encryption is done by the CPU-internal wrapping key. The
+ x86 core code loads a new random key at every boot time and
+ restores it from deep sleep states. This wrapping key support is
+ provided with X86_KEYLOCKER.
+
+ AES-KL supports 128-bit and 256-bit keys only. Supplying a
+ 192-bit key does not return an error, as AES-NI is chosen to
+ process it, but the claimed security property is then unavailable.
+
+ GNU binutils version 2.36 or above and LLVM version 12 or above
+ are assemblers that support AES-KL instructions.
+
+ Bare metal disk encryption is the preferred use case. Make it
+ depend on DM_CRYPT.
+
+ This selection enables an alternative crypto cipher for
+ cryptsetup, e.g. "capi:xts-aes-aeskl-plain", to use with dm-crypt
+ volumes. It trades off raw performance for reduced clear-text key
+ exposure and has an additional failure mode compared to AES-NI.
+ See Documentation/x86/keylocker.rst for more details. Since Key
+ Locker usage requires explicit opt-in at cryptsetup time, it is
+ safe to say Y here if unsure.
+
+ See also the CRYPTO_AES_NI_INTEL description for more about the
+ AES cipher algorithm.
+
config CRYPTO_AES_SPARC64
tristate "AES cipher algorithms (SPARC64)"
depends on SPARC64
--
2.17.1


2021-11-24 20:14:47

by Chang S. Bae

Subject: [PATCH v3 07/15] x86/cpu/keylocker: Load an internal wrapping key at boot-time

The Internal Wrapping Key (IWKey) is the entity that Key Locker uses to
encode a clear-text key into a key handle. This key is the pivot in
protecting user keys, so its value has to be randomized before being
loaded into the software-invisible CPU state.

IWKey needs to be established before the first user. Given that the only
proposed Linux use case for Key Locker is dm-crypt, the feature could be
lazily enabled when the first dm-crypt user arrives, but there is no
precedent for late enabling of CPU features and it adds maintenance burden
without demonstrative benefit outside of minimizing the visibility of
Key Locker to userspace.

The kernel generates random bytes and loads them at boot time, then
immediately flushes them from memory.

Setting the CR4.KL bit does not always enable the feature, so ensure the
dynamic CPU bit (CPUID.AESKLE) is set before loading the key.

Given that the Linux Key Locker support is only intended for bare metal
dm-crypt consumption, and that switching IWKey per VM is untenable,
explicitly skip Key Locker setup in the X86_FEATURE_HYPERVISOR case.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
(Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note, Dan wondered whether, given that the only proposed Linux use case
for Key Locker is dm-crypt, the feature could be lazily enabled when the
first dm-crypt user arrives; but as Dave notes, there is no precedent
for late enabling of CPU features, and it adds maintenance burden
without demonstrative benefit beyond minimizing the visibility of
Key Locker to userspace.
---
arch/x86/include/asm/keylocker.h | 9 ++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 5 +-
arch/x86/kernel/keylocker.c | 79 ++++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 2 +
5 files changed, 95 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index e85dfb6c1524..820ac29c06d9 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__

+#include <asm/processor.h>
#include <linux/bits.h>
#include <asm/fpu/types.h>

@@ -28,5 +29,13 @@ struct iwkey {
#define KEYLOCKER_CPUID_EBX_WIDE BIT(2)
#define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)

+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+#define setup_keylocker(c) do { } while (0)
+#define destroy_keylocker_data() do { } while (0)
+#endif
+
#endif /*__ASSEMBLY__ */
#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..e15efa238497 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -144,6 +144,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
obj-$(CONFIG_X86_UMIP) += umip.o
+obj-$(CONFIG_X86_KEYLOCKER) += keylocker.o

obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o
obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0083464de5e3..23b4aa437c1e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -57,6 +57,8 @@
#include <asm/microcode_intel.h>
#include <asm/intel-family.h>
#include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
#include <asm/uv/uv.h>
#include <asm/sigframe.h>

@@ -1595,10 +1597,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);

- /* Set up SMEP/SMAP/UMIP */
+ /* Setup various Intel-specific CPU security features */
setup_smep(c);
setup_smap(c);
setup_umip(c);
+ setup_keylocker(c);

/* Enable FSGSBASE instructions if available. */
if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..87d775a65716
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support internal wrapping key
+ * management.
+ */
+
+#include <linux/random.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+ struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+ get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
+ get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+ memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+ kernel_fpu_begin();
+ load_xmm_iwkey(&kl_setup.key);
+ kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c: A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+ /* This feature is not compatible with a hypervisor. */
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) ||
+ cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+ goto out;
+
+ cr4_set_bits(X86_CR4_KEYLOCKER);
+
+ if (c == &boot_cpu_data) {
+ u32 eax, ebx, ecx, edx;
+
+ cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ /*
+ * Check the feature readiness via CPUID. Note that the
+ * CPUID AESKLE bit is conditionally set only when CR4.KL
+ * is set.
+ */
+ if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+ !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+ pr_debug("x86/keylocker: Not fully supported.\n");
+ goto disable;
+ }
+
+ generate_keylocker_data();
+ }
+
+ load_keylocker();
+
+ pr_info_once("x86/keylocker: Enabled.\n");
+ return;
+
+disable:
+ setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+ pr_info_once("x86/keylocker: Disabled.\n");
+out:
+ /* Make sure the feature is disabled for kexec-reboot. */
+ cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ac2909f0cab3..3b81fd643784 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -82,6 +82,7 @@
#include <asm/spec-ctrl.h>
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
+#include <asm/keylocker.h>

#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>
@@ -1475,6 +1476,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
nmi_selftest();
impress_friends();
mtrr_aps_init();
+ destroy_keylocker_data();
}

static int __initdata setup_possible_cpus = -1;
--
2.17.1


2021-11-24 20:14:50

by Chang S. Bae

Subject: [PATCH v3 09/15] x86/cpu: Add a configuration and command line option for Key Locker

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at boot.
The option is selected by the Key Locker cipher module CRYPTO_AES_KL (to be
added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
Documentation/admin-guide/kernel-parameters.txt | 2 ++
arch/x86/Kconfig | 3 +++
arch/x86/kernel/cpu/common.c | 16 ++++++++++++++++
3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..336495722b12 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3399,6 +3399,8 @@

nohugevmalloc [PPC] Disable kernel huge vmalloc mappings.

+ nokeylocker [X86] Disable Key Locker hardware feature.
+
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b9281fab4e3e..b0050e8cf723 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1864,6 +1864,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS

If unsure, say y.

+config X86_KEYLOCKER
+ bool
+
choice
prompt "TSX enable mode"
depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 23b4aa437c1e..db1fc9ff0fe3 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -364,6 +364,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
/* These bits should not change their value after CPU init is finished. */
static const unsigned long cr4_pinned_mask =
X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+ /* Expect an exact match without trailing characters. */
+ if (strlen(arg))
+ return 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+ pr_info("x86/keylocker: Disabled by kernel command line.\n");
+ return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

--
2.17.1


2021-11-24 20:14:52

by Chang S. Bae

Subject: [PATCH v3 08/15] x86/power/keylocker: Restore internal wrapping key from the ACPI S3/4 sleep states

When the system state switches to these sleep states, the internal
wrapping key gets reset in the CPU state.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, as dm-crypt does not prompt for the key
on resume from suspend. Even though dm-crypt does prompt to unlock the
volume where the hibernation image is stored, it still expects to reuse
the key handles within the hibernation image once it is loaded. So this
change is motivated by dm-crypt's expectation that the key handles in the
suspend image remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.

While the backup may be maintained in NVM across the S5 and G3 "off"
states, this is not architecturally guaranteed, nor is it expected by
dm-crypt, which prompts for the key each time the volume is started.

Unless CONFIG_SUSPEND=n, the entirety of Key Locker needs to be disabled
when the backup mechanism is not available, because dm-crypt requires the
backup to survive the sleep states.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES-KL instructions are no longer
used after a key restore failure, without turning the feature off
entirely.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
arch/x86/include/asm/keylocker.h | 4 +
arch/x86/kernel/keylocker.c | 124 ++++++++++++++++++++++++++++++-
arch/x86/power/cpu.c | 2 +
3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..4000a5eed2c2 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
#ifdef CONFIG_X86_KEYLOCKER
void setup_keylocker(struct cpuinfo_x86 *c);
void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
#else
#define setup_keylocker(c) do { } while (0)
#define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker(void) { return false; }
#endif

#endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 87d775a65716..ff0e012e3dd5 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,11 +11,26 @@
#include <asm/fpu/api.h>
#include <asm/keylocker.h>
#include <asm/tlbflush.h>
+#include <asm/msr.h>

static __initdata struct keylocker_setup_data {
+ bool initialized;
struct iwkey key;
} kl_setup;

+/*
+ * This flag is set with IWKey load. When the key restore fails, it is
+ * reset. This restore state is exported to the crypto library, then AES-KL
+ * will not be used there. So, the feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+ return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
static void __init generate_keylocker_data(void)
{
get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
@@ -25,6 +40,8 @@ static void __init generate_keylocker_data(void)
void __init destroy_keylocker_data(void)
{
memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+ kl_setup.initialized = true;
+ valid_kl = true;
}

static void __init load_keylocker(void)
@@ -34,6 +51,27 @@ static void __init load_keylocker(void)
kernel_fpu_end();
}

+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns: -EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+ u64 status;
+
+ wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+ rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+ if (status & BIT(0))
+ return 0;
+ else
+ return -EBUSY;
+}
+
/**
* setup_keylocker - Enable the feature.
* @c: A pointer to struct cpuinfo_x86
@@ -49,6 +87,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)

if (c == &boot_cpu_data) {
u32 eax, ebx, ecx, edx;
+ bool backup_available;

cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
/*
@@ -62,10 +101,49 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
goto disable;
}

+ backup_available = (ebx & KEYLOCKER_CPUID_EBX_BACKUP) ? true : false;
+ /*
+ * The internal wrapping key in CPU state is volatile in
+ * S3/4 states. So ensure the backup capability along with
+ * S-states.
+ */
+ if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+ pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+ goto disable;
+ }
+
generate_keylocker_data();
- }
+ load_keylocker();

- load_keylocker();
+ /* Backup an internal wrapping key in non-volatile media. */
+ if (backup_available)
+ wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+ } else {
+ int rc;
+
+ /*
+ * Load the internal wrapping key directly when available
+ * in memory, which is only possible at boot-time.
+ *
+ * NB: When system wakes up, this path also recovers the
+ * internal wrapping key.
+ */
+ if (!kl_setup.initialized) {
+ load_keylocker();
+ } else if (valid_kl) {
+ rc = copy_keylocker();
+ /*
+ * The boot CPU succeeded, but the key copy
+ * failed here. Subsequent feature use would
+ * then see inconsistent keys and failures. So,
+ * invalidate the feature via the flag.
+ */
+ if (rc) {
+ valid_kl = false;
+ pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+ }
+ }
+ }

pr_info_once("x86/keylocker: Enabled.\n");
return;
@@ -77,3 +155,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
+ /* Make sure the feature is disabled for kexec-reboot. */
cr4_clear_bits(X86_CR4_KEYLOCKER);
}
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+ u64 backup_status;
+ int rc;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+ return;
+
+ /*
+ * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that indicates
+ * an invalid backup if bit 0 is set and a read (or write) error if
+ * bit 2 is set.
+ */
+ rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+ if (backup_status & BIT(0)) {
+ rc = copy_keylocker();
+ if (rc)
+ pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+ else
+ return;
+ } else {
+ pr_err("x86/keylocker: The key backup access failed with %s.\n",
+ (backup_status & BIT(2)) ? "read error" : "invalid status");
+ }
+
+ /*
+ * Now the backup key is not available. Invalidate the feature via
+ * the flag to avoid any subsequent use. But keep the feature with
+ * zero IWKeys instead of disabling it. The current users will see
+ * key handle integrity failure but that's because of the internal
+ * key change.
+ */
+ pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+ valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 9f2b251e83c5..1a290f529c73 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -25,6 +25,7 @@
#include <asm/cpu.h>
#include <asm/mmu_context.h>
#include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>

#ifdef CONFIG_X86_32
__visible unsigned long saved_context_ebx;
@@ -262,6 +263,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
mtrr_bp_restore();
perf_restore_debug_store();
msr_restore_context(ctxt);
+ restore_keylocker();

c = &cpu_data(smp_processor_id());
if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
--
2.17.1


2021-11-24 20:14:56

by Chang S. Bae

Subject: [PATCH v3 12/15] crypto: x86/aes-kl - Support ECB mode

Implement ECB mode using AES-KL. Export the methods with a lower priority
than AES-NI to avoid being selected by default.

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/crypto/aeskl-intel_asm.S | 193 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 92 +++++++++++++-
2 files changed, 284 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
index d56ec8dd6644..833bb39ae903 100644
--- a/arch/x86/crypto/aeskl-intel_asm.S
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -182,3 +182,196 @@ SYM_FUNC_START(_aeskl_dec)
ret
SYM_FUNC_END(_aeskl_dec)

+/*
+ * int _aeskl_ecb_enc(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src,
+ * unsigned int len)
+ */
+SYM_FUNC_START(_aeskl_ecb_enc)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ pushl LEN
+ pushl HANDLEP
+ pushl KLEN
+ movl (FRAME_OFFSET+16)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+20)(%esp), OUTP # dst
+ movl (FRAME_OFFSET+24)(%esp), INP # src
+ movl (FRAME_OFFSET+28)(%esp), LEN # len
+#endif
+ test LEN, LEN
+ jz .Lecb_enc_noerr
+ mov 480(HANDLEP), KLEN
+ cmp $16, LEN
+ jb .Lecb_enc_noerr
+ cmp $128, LEN
+ jb .Lecb_enc1
+
+.align 4
+.Lecb_enc8:
+ movdqu (INP), STATE1
+ movdqu 0x10(INP), STATE2
+ movdqu 0x20(INP), STATE3
+ movdqu 0x30(INP), STATE4
+ movdqu 0x40(INP), STATE5
+ movdqu 0x50(INP), STATE6
+ movdqu 0x60(INP), STATE7
+ movdqu 0x70(INP), STATE8
+
+ cmp $16, KLEN
+ je .Lecb_enc8_128
+ aesencwide256kl (HANDLEP)
+ jz .Lecb_enc_err
+ jmp .Lecb_enc8_end
+.Lecb_enc8_128:
+ aesencwide128kl (HANDLEP)
+ jz .Lecb_enc_err
+
+.Lecb_enc8_end:
+ movdqu STATE1, (OUTP)
+ movdqu STATE2, 0x10(OUTP)
+ movdqu STATE3, 0x20(OUTP)
+ movdqu STATE4, 0x30(OUTP)
+ movdqu STATE5, 0x40(OUTP)
+ movdqu STATE6, 0x50(OUTP)
+ movdqu STATE7, 0x60(OUTP)
+ movdqu STATE8, 0x70(OUTP)
+
+ sub $128, LEN
+ add $128, INP
+ add $128, OUTP
+ cmp $128, LEN
+ jge .Lecb_enc8
+ cmp $16, LEN
+ jb .Lecb_enc_noerr
+
+.align 4
+.Lecb_enc1:
+ movdqu (INP), STATE1
+ cmp $16, KLEN
+ je .Lecb_enc1_128
+ aesenc256kl (HANDLEP), STATE
+ jz .Lecb_enc_err
+ jmp .Lecb_enc1_end
+.Lecb_enc1_128:
+ aesenc128kl (HANDLEP), STATE
+ jz .Lecb_enc_err
+
+.Lecb_enc1_end:
+ movdqu STATE1, (OUTP)
+ sub $16, LEN
+ add $16, INP
+ add $16, OUTP
+ cmp $16, LEN
+ jge .Lecb_enc1
+
+.Lecb_enc_noerr:
+ xor AREG, AREG
+ jmp .Lecb_enc_end
+.Lecb_enc_err:
+ mov $1, AREG
+.Lecb_enc_end:
+#ifndef __x86_64__
+ popl KLEN
+ popl HANDLEP
+ popl LEN
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_ecb_enc)
+
+/*
+ * int _aeskl_ecb_dec(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src,
+ * unsigned int len)
+ */
+SYM_FUNC_START(_aeskl_ecb_dec)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ pushl LEN
+ pushl HANDLEP
+ pushl KLEN
+ movl (FRAME_OFFSET+16)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+20)(%esp), OUTP # dst
+ movl (FRAME_OFFSET+24)(%esp), INP # src
+ movl (FRAME_OFFSET+28)(%esp), LEN # len
+#endif
+
+ test LEN, LEN
+ jz .Lecb_dec_noerr
+ mov 480(HANDLEP), KLEN
+ cmp $16, LEN
+ jb .Lecb_dec_noerr
+ cmp $128, LEN
+ jb .Lecb_dec1
+
+.align 4
+.Lecb_dec8:
+ movdqu (INP), STATE1
+ movdqu 0x10(INP), STATE2
+ movdqu 0x20(INP), STATE3
+ movdqu 0x30(INP), STATE4
+ movdqu 0x40(INP), STATE5
+ movdqu 0x50(INP), STATE6
+ movdqu 0x60(INP), STATE7
+ movdqu 0x70(INP), STATE8
+
+ cmp $16, KLEN
+ je .Lecb_dec8_128
+ aesdecwide256kl (HANDLEP)
+ jz .Lecb_dec_err
+ jmp .Lecb_dec8_end
+.Lecb_dec8_128:
+ aesdecwide128kl (HANDLEP)
+ jz .Lecb_dec_err
+
+.Lecb_dec8_end:
+ movdqu STATE1, (OUTP)
+ movdqu STATE2, 0x10(OUTP)
+ movdqu STATE3, 0x20(OUTP)
+ movdqu STATE4, 0x30(OUTP)
+ movdqu STATE5, 0x40(OUTP)
+ movdqu STATE6, 0x50(OUTP)
+ movdqu STATE7, 0x60(OUTP)
+ movdqu STATE8, 0x70(OUTP)
+
+ sub $128, LEN
+ add $128, INP
+ add $128, OUTP
+ cmp $128, LEN
+ jge .Lecb_dec8
+ cmp $16, LEN
+ jb .Lecb_dec_noerr
+
+.align 4
+.Lecb_dec1:
+ movdqu (INP), STATE1
+ cmp $16, KLEN
+ je .Lecb_dec1_128
+ aesdec256kl (HANDLEP), STATE
+ jz .Lecb_dec_err
+ jmp .Lecb_dec1_end
+.Lecb_dec1_128:
+ aesdec128kl (HANDLEP), STATE
+ jz .Lecb_dec_err
+
+.Lecb_dec1_end:
+ movdqu STATE1, (OUTP)
+ sub $16, LEN
+ add $16, INP
+ add $16, OUTP
+ cmp $16, LEN
+ jge .Lecb_dec1
+
+.Lecb_dec_noerr:
+ xor AREG, AREG
+ jmp .Lecb_dec_end
+.Lecb_dec_err:
+ mov $1, AREG
+.Lecb_dec_end:
+#ifndef __x86_64__
+ popl KLEN
+ popl HANDLEP
+ popl LEN
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_ecb_dec)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
index 3e3a8b6eccb4..7c9794a0969d 100644
--- a/arch/x86/crypto/aeskl-intel_glue.c
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -27,6 +27,9 @@ asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsign
asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);

+asmlinkage int _aeskl_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len);
+asmlinkage int _aeskl_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len);
+
static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
unsigned int key_len)
{
@@ -86,11 +89,92 @@ static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
return 0;
}

+static int aeskl_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_ecb_enc(ctx, out, in, len))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+static int aeskl_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_ecb_dec(ctx, out, in, len))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+static int aeskl_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int len)
+{
+ struct crypto_tfm *crypto_tfm = crypto_skcipher_tfm(tfm);
+ void *raw_ctx = crypto_skcipher_ctx(tfm);
+
+ return aeskl_setkey_common(crypto_tfm, raw_ctx, key, len);
+}
+
+static int ecb_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return ecb_crypt_common(req, aeskl_ecb_enc);
+ else
+ return ecb_crypt_common(req, aesni_ecb_enc);
+}
+
+static int ecb_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return ecb_crypt_common(req, aeskl_ecb_dec);
+ else
+ return ecb_crypt_common(req, aesni_ecb_dec);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+ {
+ .base = {
+ .cra_name = "__ecb(aes)",
+ .cra_driver_name = "__ecb-aes-aeskl",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = CRYPTO_AES_CTX_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .setkey = aeskl_skcipher_setkey,
+ .encrypt = ecb_encrypt,
+ .decrypt = ecb_decrypt,
+ }
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
static int __init aeskl_init(void)
{
+ u32 eax, ebx, ecx, edx;
+ int err;
+
if (!valid_keylocker())
return -ENODEV;

+ cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+ return -ENODEV;
+
/*
* AES-KL itself does not depend on AES-NI. But AES-KL does not
* support 192-bit keys. To make itself AES-compliant, it falls
@@ -99,12 +183,18 @@ static int __init aeskl_init(void)
if (!boot_cpu_has(X86_FEATURE_AES))
return -ENODEV;

+ err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+ aeskl_simd_skciphers);
+ if (err)
+ return err;
+
return 0;
}

static void __exit aeskl_exit(void)
{
- return;
+ simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+ aeskl_simd_skciphers);
}

late_initcall(aeskl_init);
--
2.17.1


2021-11-24 20:15:08

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v3 13/15] crypto: x86/aes-kl - Support CBC mode

Implement CBC mode using AES-KL. Register the methods with a lower priority
than AES-NI so they are not selected by default.
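CBC encryption is inherently serial (each ciphertext block feeds the next
block's chaining value), while decryption must save the ciphertext before
transforming it so it can serve as the next IV — the same trick the assembly
below uses. A C sketch of both directions, with a trivial self-inverse XOR
transform standing in for the AES-KL instructions (an assumption for
illustration only):

```c
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Placeholders for aesenc128kl/aesdec128kl on a single block. */
static void blk_enc(unsigned char *b) { for (int i = 0; i < BLK; i++) b[i] ^= 0xa5; }
static void blk_dec(unsigned char *b) { for (int i = 0; i < BLK; i++) b[i] ^= 0xa5; }

static void xor_blk(unsigned char *d, const unsigned char *s)
{
	for (int i = 0; i < BLK; i++)
		d[i] ^= s[i];
}

/* CBC encrypt: serial, one block at a time, as in _aeskl_cbc_enc. */
static void cbc_enc(unsigned char *dst, const unsigned char *src,
		    size_t len, unsigned char *iv)
{
	for (; len >= BLK; len -= BLK, src += BLK, dst += BLK) {
		memcpy(dst, src, BLK);
		xor_blk(dst, iv);	/* chain in the previous ciphertext */
		blk_enc(dst);
		memcpy(iv, dst, BLK);	/* ciphertext becomes the next IV */
	}
}

/* CBC decrypt: save the ciphertext first so it can be the next IV. */
static void cbc_dec(unsigned char *dst, const unsigned char *src,
		    size_t len, unsigned char *iv)
{
	unsigned char save[BLK];

	for (; len >= BLK; len -= BLK, src += BLK, dst += BLK) {
		memcpy(save, src, BLK);
		memcpy(dst, src, BLK);
		blk_dec(dst);
		xor_blk(dst, iv);
		memcpy(iv, save, BLK);
	}
}
```

Because decryption has no serial data dependency between blocks, the real
assembly can decrypt eight blocks at once on 64-bit, whereas encryption cannot.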

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/crypto/aeskl-intel_asm.S | 188 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 67 ++++++++++
2 files changed, 255 insertions(+)

diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
index 833bb39ae903..5ee7b24ee3c8 100644
--- a/arch/x86/crypto/aeskl-intel_asm.S
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -375,3 +375,191 @@ SYM_FUNC_START(_aeskl_ecb_dec)
ret
SYM_FUNC_END(_aeskl_ecb_dec)

+/*
+ * int _aeskl_cbc_enc(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src,
+ * unsigned int len, u8 *iv)
+ */
+SYM_FUNC_START(_aeskl_cbc_enc)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ pushl IVP
+ pushl LEN
+ pushl HANDLEP
+ pushl KLEN
+ movl (FRAME_OFFSET+20)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+24)(%esp), OUTP # dst
+ movl (FRAME_OFFSET+28)(%esp), INP # src
+ movl (FRAME_OFFSET+32)(%esp), LEN # len
+ movl (FRAME_OFFSET+36)(%esp), IVP # iv
+#endif
+
+ cmp $16, LEN
+ jb .Lcbc_enc_noerr
+ mov 480(HANDLEP), KLEN
+ movdqu (IVP), STATE
+
+.align 4
+.Lcbc_enc1:
+ movdqu (INP), IN
+ pxor IN, STATE
+
+ cmp $16, KLEN
+ je .Lcbc_enc1_128
+ aesenc256kl (HANDLEP), STATE
+ jz .Lcbc_enc_err
+ jmp .Lcbc_enc1_end
+.Lcbc_enc1_128:
+ aesenc128kl (HANDLEP), STATE
+ jz .Lcbc_enc_err
+
+.Lcbc_enc1_end:
+ movdqu STATE, (OUTP)
+ sub $16, LEN
+ add $16, INP
+ add $16, OUTP
+ cmp $16, LEN
+ jge .Lcbc_enc1
+ movdqu STATE, (IVP)
+
+.Lcbc_enc_noerr:
+ xor AREG, AREG
+ jmp .Lcbc_enc_end
+.Lcbc_enc_err:
+ mov $1, AREG
+.Lcbc_enc_end:
+#ifndef __x86_64__
+ popl KLEN
+ popl HANDLEP
+ popl LEN
+ popl IVP
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_cbc_enc)
+
+/*
+ * int _aeskl_cbc_dec(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src,
+ * unsigned int len, u8 *iv)
+ */
+SYM_FUNC_START(_aeskl_cbc_dec)
+ FRAME_BEGIN
+#ifndef __x86_64__
+ pushl IVP
+ pushl LEN
+ pushl HANDLEP
+ pushl KLEN
+ movl (FRAME_OFFSET+20)(%esp), HANDLEP # ctx
+ movl (FRAME_OFFSET+24)(%esp), OUTP # dst
+ movl (FRAME_OFFSET+28)(%esp), INP # src
+ movl (FRAME_OFFSET+32)(%esp), LEN # len
+ movl (FRAME_OFFSET+36)(%esp), IVP # iv
+#endif
+
+ cmp $16, LEN
+ jb .Lcbc_dec_noerr
+ mov 480(HANDLEP), KLEN
+#ifdef __x86_64__
+ cmp $128, LEN
+ jb .Lcbc_dec1_pre
+
+.align 4
+.Lcbc_dec8:
+ movdqu 0x0(INP), STATE1
+ movdqu 0x10(INP), STATE2
+ movdqu 0x20(INP), STATE3
+ movdqu 0x30(INP), STATE4
+ movdqu 0x40(INP), STATE5
+ movdqu 0x50(INP), STATE6
+ movdqu 0x60(INP), STATE7
+ movdqu 0x70(INP), STATE8
+
+ movdqu (IVP), IN1
+ movdqa STATE1, IN2
+ movdqa STATE2, IN3
+ movdqa STATE3, IN4
+ movdqa STATE4, IN5
+ movdqa STATE5, IN6
+ movdqa STATE6, IN7
+ movdqa STATE7, IN8
+ movdqu STATE8, (IVP)
+
+ cmp $16, KLEN
+ je .Lcbc_dec8_128
+ aesdecwide256kl (HANDLEP)
+ jz .Lcbc_dec_err
+ jmp .Lcbc_dec8_end
+.Lcbc_dec8_128:
+ aesdecwide128kl (HANDLEP)
+ jz .Lcbc_dec_err
+
+.Lcbc_dec8_end:
+ pxor IN1, STATE1
+ pxor IN2, STATE2
+ pxor IN3, STATE3
+ pxor IN4, STATE4
+ pxor IN5, STATE5
+ pxor IN6, STATE6
+ pxor IN7, STATE7
+ pxor IN8, STATE8
+
+ movdqu STATE1, 0x0(OUTP)
+ movdqu STATE2, 0x10(OUTP)
+ movdqu STATE3, 0x20(OUTP)
+ movdqu STATE4, 0x30(OUTP)
+ movdqu STATE5, 0x40(OUTP)
+ movdqu STATE6, 0x50(OUTP)
+ movdqu STATE7, 0x60(OUTP)
+ movdqu STATE8, 0x70(OUTP)
+
+ sub $128, LEN
+ add $128, INP
+ add $128, OUTP
+ cmp $128, LEN
+ jge .Lcbc_dec8
+ cmp $16, LEN
+ jb .Lcbc_dec_noerr
+#endif
+
+.align 4
+.Lcbc_dec1_pre:
+ movdqu (IVP), STATE3
+.Lcbc_dec1:
+ movdqu (INP), STATE2
+ movdqa STATE2, STATE1
+
+ cmp $16, KLEN
+ je .Lcbc_dec1_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lcbc_dec_err
+ jmp .Lcbc_dec1_end
+.Lcbc_dec1_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lcbc_dec_err
+
+.Lcbc_dec1_end:
+ pxor STATE3, STATE1
+ movdqu STATE1, (OUTP)
+ movdqa STATE2, STATE3
+ sub $16, LEN
+ add $16, INP
+ add $16, OUTP
+ cmp $16, LEN
+ jge .Lcbc_dec1
+ movdqu STATE3, (IVP)
+
+.Lcbc_dec_noerr:
+ xor AREG, AREG
+ jmp .Lcbc_dec_end
+.Lcbc_dec_err:
+ mov $1, AREG
+.Lcbc_dec_end:
+#ifndef __x86_64__
+ popl KLEN
+ popl HANDLEP
+ popl LEN
+ popl IVP
+#endif
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_cbc_dec)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
index 7c9794a0969d..742576ae0481 100644
--- a/arch/x86/crypto/aeskl-intel_glue.c
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -30,6 +30,11 @@ asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
asmlinkage int _aeskl_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len);
asmlinkage int _aeskl_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len);

+asmlinkage int _aeskl_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv);
+asmlinkage int _aeskl_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv);
+
static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
unsigned int key_len)
{
@@ -113,6 +118,32 @@ static int aeskl_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsi
return 0;
}

+static int aeskl_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_cbc_enc(ctx, out, in, len, iv))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+static int aeskl_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_cbc_dec(ctx, out, in, len, iv))
+ return -EINVAL;
+ else
+ return 0;
+}
+
static int aeskl_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int len)
{
@@ -142,6 +173,26 @@ static int ecb_decrypt(struct skcipher_request *req)
return ecb_crypt_common(req, aesni_ecb_dec);
}

+static int cbc_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return cbc_crypt_common(req, aeskl_cbc_enc);
+ else
+ return cbc_crypt_common(req, aesni_cbc_enc);
+}
+
+static int cbc_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return cbc_crypt_common(req, aeskl_cbc_dec);
+ else
+ return cbc_crypt_common(req, aesni_cbc_dec);
+}
+
static struct skcipher_alg aeskl_skciphers[] = {
{
.base = {
@@ -158,6 +209,22 @@ static struct skcipher_alg aeskl_skciphers[] = {
.setkey = aeskl_skcipher_setkey,
.encrypt = ecb_encrypt,
.decrypt = ecb_decrypt,
+ }, {
+ .base = {
+ .cra_name = "__cbc(aes)",
+ .cra_driver_name = "__cbc-aes-aeskl",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = CRYPTO_AES_CTX_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .setkey = aeskl_skcipher_setkey,
+ .encrypt = cbc_encrypt,
+ .decrypt = cbc_decrypt,
}
};

--
2.17.1


2021-11-24 20:15:07

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v3 10/15] crypto: x86/aes - Prepare for a new AES implementation

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI's. The internal ABI in the assembly code will keep the same
prototypes as AES-NI's, so the glue code can be shared between the two.

Refactor the common C code to avoid code duplication. The AES-NI code uses
it with a function pointer argument to call back the AES-NI assembly code.
So will the AES-KL code.

Introduce wrappers for the data transformation functions so they can return an
error value, since AES-KL may report an error during data transformation.

Also refactor some constant values in the AES-NI's assembly code.

No functional change intended.
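The refactoring described above boils down to one pattern: a shared walk loop
that takes a per-implementation function pointer, where the callback may fail
and the first error aborts the request. A minimal sketch of that dispatch shape
— the names, the `void *` context, and the fixed 16-byte "walk step" are
illustrative assumptions, not the kernel's skcipher walk API:

```c
#include <stddef.h>

/* Backend signature shared by AES-NI (always succeeds) and AES-KL (may fail). */
typedef int (*blkcipher_fn)(void *ctx, unsigned char *out,
			    const unsigned char *in, unsigned int len);

/* Shared loop, shaped like ecb_crypt_common(): the first backend error
 * aborts the whole request and is propagated to the caller. */
static int crypt_common(void *ctx, unsigned char *out, const unsigned char *in,
			unsigned int len, blkcipher_fn fn)
{
	const unsigned int chunk = 16;	/* stand-in for one walk step */

	while (len >= chunk) {
		int err = fn(ctx, out, in, chunk);
		if (err)
			return err;
		in += chunk;
		out += chunk;
		len -= chunk;
	}
	return 0;
}

/* AES-NI-like backend: cannot fail (here it just copies the data). */
static int ok_backend(void *ctx, unsigned char *out, const unsigned char *in,
		      unsigned int len)
{
	for (unsigned int i = 0; i < len; i++)
		out[i] = in[i];
	return 0;
}

/* AES-KL-like backend: fails, e.g. when the key handle was revoked. */
static int failing_backend(void *ctx, unsigned char *out,
			   const unsigned char *in, unsigned int len)
{
	return -22;	/* -EINVAL-style failure */
}
```

The design choice is that the shared loop, not the backend, owns the walk
state, so AES-NI keeps its void-returning assembly behind a trivial wrapper
while AES-KL can surface instruction-level failures.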

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
arch/x86/crypto/Makefile | 2 +-
arch/x86/crypto/aes-intel_asm.S | 26 +++
arch/x86/crypto/aes-intel_glue.c | 219 ++++++++++++++++++++
arch/x86/crypto/aes-intel_glue.h | 61 ++++++
arch/x86/crypto/aesni-intel_asm.S | 90 ++++-----
arch/x86/crypto/aesni-intel_glue.c | 313 +++--------------------------
arch/x86/crypto/aesni-intel_glue.h | 88 ++++++++
7 files changed, 463 insertions(+), 336 deletions(-)
create mode 100644 arch/x86/crypto/aes-intel_asm.S
create mode 100644 arch/x86/crypto/aes-intel_glue.c
create mode 100644 arch/x86/crypto/aes-intel_glue.h
create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..ef6c0b9f69c6 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -47,7 +47,7 @@ chacha-x86_64-y := chacha-avx2-x86_64.o chacha-ssse3-x86_64.o chacha_glue.o
chacha-x86_64-$(CONFIG_AS_AVX512) += chacha-avx512vl-x86_64.o

obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
-aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
+aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o

obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
diff --git a/arch/x86/crypto/aes-intel_asm.S b/arch/x86/crypto/aes-intel_asm.S
new file mode 100644
index 000000000000..98abf875af79
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_asm.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+ .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+ .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+#ifdef __x86_64__
+.Lbswap_mask:
+ .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+#endif
+.popsection
+
+.section .rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+ .octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-intel_glue.c b/arch/x86/crypto/aes-intel_glue.c
new file mode 100644
index 000000000000..3ff5f85bbf28
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+#include "aes-intel_glue.h"
+
+int ecb_crypt_common(struct skcipher_request *req,
+ int (*fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+
+ while ((nbytes = walk.nbytes)) {
+ kernel_fpu_begin();
+ err = fn(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+
+ return err;
+}
+
+int cbc_crypt_common(struct skcipher_request *req,
+ int (*fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+
+ while ((nbytes = walk.nbytes)) {
+ kernel_fpu_begin();
+ err = fn(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK,
+ walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+
+ return err;
+}
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+ int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+ const u8 *in_key, unsigned int key_len))
+{
+ struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ /* first half of xts-key is for crypt */
+ err = fn(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx, key, keylen);
+ if (err)
+ return err;
+
+ /* second half of xts-key is for tweak */
+ return fn(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx, key + keylen, keylen);
+}
+
+int xts_crypt_common(struct skcipher_request *req,
+ int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv),
+ int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int tail = req->cryptlen % AES_BLOCK_SIZE;
+ struct skcipher_request subreq;
+ struct skcipher_walk walk;
+ int err;
+
+ if (req->cryptlen < AES_BLOCK_SIZE)
+ return -EINVAL;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ if (!walk.nbytes)
+ return err;
+
+ if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+ int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+ skcipher_walk_abort(&walk);
+
+ skcipher_request_set_tfm(&subreq, tfm);
+ skcipher_request_set_callback(&subreq,
+ skcipher_request_flags(req),
+ NULL, NULL);
+ skcipher_request_set_crypt(&subreq, req->src, req->dst,
+ blocks * AES_BLOCK_SIZE, req->iv);
+ req = &subreq;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ if (!walk.nbytes)
+ return err;
+ } else {
+ tail = 0;
+ }
+
+ kernel_fpu_begin();
+
+ /* calculate first value of T */
+ err = crypt1_fn(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+ if (err) {
+ kernel_fpu_end();
+ return err;
+ }
+
+ while (walk.nbytes > 0) {
+ int nbytes = walk.nbytes;
+
+ if (nbytes < walk.total)
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+ err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes, walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+ if (walk.nbytes > 0)
+ kernel_fpu_begin();
+ }
+
+ if (unlikely(tail > 0 && !err)) {
+ struct scatterlist sg_src[2], sg_dst[2];
+ struct scatterlist *src, *dst;
+
+ dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
+
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
+
+ kernel_fpu_begin();
+ err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes, walk.iv);
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, 0);
+ }
+ return err;
+}
+
+#ifdef CONFIG_X86_64
+
+int ctr_crypt_common(struct skcipher_request *req,
+ int (*crypt_fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv),
+ int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+ u8 keystream[AES_BLOCK_SIZE];
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+
+ while ((nbytes = walk.nbytes) > 0) {
+ kernel_fpu_begin();
+ if (nbytes & AES_BLOCK_MASK) {
+ err = crypt_fn(ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes & AES_BLOCK_MASK, walk.iv);
+ }
+ nbytes &= ~AES_BLOCK_MASK;
+
+ if (walk.nbytes == walk.total && nbytes > 0) {
+ err = crypt1_fn(ctx, keystream, walk.iv);
+ crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
+ walk.src.virt.addr + walk.nbytes - nbytes,
+ keystream, nbytes);
+ crypto_inc(walk.iv, AES_BLOCK_SIZE);
+ nbytes = 0;
+ }
+ kernel_fpu_end();
+ if (err)
+ return err;
+
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+ return err;
+}
+
+#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
new file mode 100644
index 000000000000..41e93c0f7932
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored out of the
+ * AES-NI implementation.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+
+#define AES_ALIGN 16
+#define AES_ALIGN_ATTR __attribute__((__aligned__(AES_ALIGN)))
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define AES_ALIGN_EXTRA ((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
+#define XTS_AES_CTX_SIZE (sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+ u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+ u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+};
+
+static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+{
+ unsigned long addr = (unsigned long)raw_ctx;
+ unsigned long align = AES_ALIGN;
+
+ if (align <= crypto_tfm_ctx_alignment())
+ align = 1;
+
+ return (struct crypto_aes_ctx *)ALIGN(addr, align);
+}
+
+int ecb_crypt_common(struct skcipher_request *req,
+ int (*fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len));
+
+int cbc_crypt_common(struct skcipher_request *req,
+ int (*fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv));
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+ int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+ const u8 *in_key, unsigned int key_len));
+
+int xts_crypt_common(struct skcipher_request *req,
+ int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv),
+ int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in));
+
+#ifdef CONFIG_X86_64
+
+int ctr_crypt_common(struct skcipher_request *req,
+ int (*crypt_fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv),
+ int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in));
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 4e3972570916..06dfd71ffdf4 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
#include <linux/linkage.h>
#include <asm/frame.h>
#include <asm/nospec-branch.h>
+#include "aes-intel_asm.S"

/*
* The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1937,9 +1938,9 @@ SYM_FUNC_START(aesni_set_key)
SYM_FUNC_END(aesni_set_key)

/*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_enc(const void *ctx, u8 *dst, const u8 *src)
*/
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(_aesni_enc)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -1958,7 +1959,7 @@ SYM_FUNC_START(aesni_enc)
#endif
FRAME_END
ret
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(_aesni_enc)

/*
* _aesni_enc1: internal ABI
@@ -2126,9 +2127,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
SYM_FUNC_END(_aesni_enc4)

/*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_dec (const void *ctx, u8 *dst, const u8 *src)
*/
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(_aesni_dec)
FRAME_BEGIN
#ifndef __x86_64__
pushl KEYP
@@ -2148,7 +2149,7 @@ SYM_FUNC_START(aesni_dec)
#endif
FRAME_END
ret
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(_aesni_dec)

/*
* _aesni_dec1: internal ABI
@@ -2316,10 +2317,10 @@ SYM_FUNC_START_LOCAL(_aesni_dec4)
SYM_FUNC_END(_aesni_dec4)

/*
- * void aesni_ecb_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
- * size_t len)
+ * void _aesni_ecb_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ * size_t len)
*/
-SYM_FUNC_START(aesni_ecb_enc)
+SYM_FUNC_START(_aesni_ecb_enc)
FRAME_BEGIN
#ifndef __x86_64__
pushl LEN
@@ -2373,13 +2374,13 @@ SYM_FUNC_START(aesni_ecb_enc)
#endif
FRAME_END
ret
-SYM_FUNC_END(aesni_ecb_enc)
+SYM_FUNC_END(_aesni_ecb_enc)

/*
- * void aesni_ecb_dec(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
- * size_t len);
+ * void _aesni_ecb_dec(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ * size_t len);
*/
-SYM_FUNC_START(aesni_ecb_dec)
+SYM_FUNC_START(_aesni_ecb_dec)
FRAME_BEGIN
#ifndef __x86_64__
pushl LEN
@@ -2434,13 +2435,13 @@ SYM_FUNC_START(aesni_ecb_dec)
#endif
FRAME_END
ret
-SYM_FUNC_END(aesni_ecb_dec)
+SYM_FUNC_END(_aesni_ecb_dec)

/*
- * void aesni_cbc_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
- * size_t len, u8 *iv)
+ * void _aesni_cbc_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ * size_t len, u8 *iv)
*/
-SYM_FUNC_START(aesni_cbc_enc)
+SYM_FUNC_START(_aesni_cbc_enc)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -2478,13 +2479,13 @@ SYM_FUNC_START(aesni_cbc_enc)
#endif
FRAME_END
ret
-SYM_FUNC_END(aesni_cbc_enc)
+SYM_FUNC_END(_aesni_cbc_enc)

/*
- * void aesni_cbc_dec(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
- * size_t len, u8 *iv)
+ * void _aesni_cbc_dec(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ * size_t len, u8 *iv)
*/
-SYM_FUNC_START(aesni_cbc_dec)
+SYM_FUNC_START(_aesni_cbc_dec)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -2571,7 +2572,7 @@ SYM_FUNC_START(aesni_cbc_dec)
#endif
FRAME_END
ret
-SYM_FUNC_END(aesni_cbc_dec)
+SYM_FUNC_END(_aesni_cbc_dec)

/*
* void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
@@ -2691,21 +2692,6 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
ret
SYM_FUNC_END(aesni_cts_cbc_dec)

-.pushsection .rodata
-.align 16
-.Lcts_permute_table:
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
- .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
- .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
-.Lbswap_mask:
- .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
-.popsection
-
#ifdef __x86_64__
/*
* _aesni_inc_init: internal ABI
@@ -2757,10 +2743,10 @@ SYM_FUNC_START_LOCAL(_aesni_inc)
SYM_FUNC_END(_aesni_inc)

/*
- * void aesni_ctr_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
- * size_t len, u8 *iv)
+ * void _aesni_ctr_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ * size_t len, u8 *iv)
*/
-SYM_FUNC_START(aesni_ctr_enc)
+SYM_FUNC_START(_aesni_ctr_enc)
FRAME_BEGIN
cmp $16, LEN
jb .Lctr_enc_just_ret
@@ -2817,16 +2803,10 @@ SYM_FUNC_START(aesni_ctr_enc)
.Lctr_enc_just_ret:
FRAME_END
ret
-SYM_FUNC_END(aesni_ctr_enc)
+SYM_FUNC_END(_aesni_ctr_enc)

#endif

-.section .rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
- .octa 0x00000000000000010000000000000087
-.previous
-
/*
* _aesni_gf128mul_x_ble: internal ABI
* Multiply in GF(2^128) for XTS IVs
@@ -2846,10 +2826,10 @@ SYM_FUNC_END(aesni_ctr_enc)
pxor KEY, IV;

/*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- * const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
*/
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(_aesni_xts_encrypt)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -2998,13 +2978,13 @@ SYM_FUNC_START(aesni_xts_encrypt)

movups STATE, (OUTP)
jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(_aesni_xts_encrypt)

/*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- * const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
*/
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(_aesni_xts_decrypt)
FRAME_BEGIN
#ifndef __x86_64__
pushl IVP
@@ -3160,4 +3140,4 @@ SYM_FUNC_START(aesni_xts_decrypt)

movups STATE, (OUTP)
jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(_aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index e09f4672dd38..b15a476e94c1 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -35,34 +35,24 @@
#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/static_call.h>
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"

-
-#define AESNI_ALIGN 16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
#define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)

/* This data is stored at the end of the crypto_tfm struct.
* It's a type of per "session" data storage location.
* This needs to be 16 byte aligned.
*/
struct aesni_rfc4106_gcm_ctx {
- u8 hash_subkey[16] AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+ u8 hash_subkey[16] AES_ALIGN_ATTR;
+ struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
u8 nonce[4];
};

struct generic_gcmaes_ctx {
- u8 hash_subkey[16] AESNI_ALIGN_ATTR;
- struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
- u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
- u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+ u8 hash_subkey[16] AES_ALIGN_ATTR;
+ struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
};

#define GCM_BLOCK_LEN 16
@@ -80,18 +70,6 @@ struct gcm_context_data {
u8 hash_keys[GCM_BLOCK_LEN * 16];
};

-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len);
-asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len);
-asmlinkage void aesni_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
-asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);
asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -100,16 +78,8 @@ asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
#define AVX_GEN2_OPTSIZE 640
#define AVX_GEN4_OPTSIZE 4096

-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
-
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
-
#ifdef CONFIG_X86_64

-asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv);
DEFINE_STATIC_CALL(aesni_ctr_enc_tfm, aesni_ctr_enc);

/* Scatter / Gather routines, with args similar to above */
@@ -187,7 +157,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{
- unsigned long align = AESNI_ALIGN;
+ unsigned long align = AES_ALIGN;

if (align <= crypto_tfm_ctx_alignment())
align = 1;
@@ -197,7 +167,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
static inline struct
generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
{
- unsigned long align = AESNI_ALIGN;
+ unsigned long align = AES_ALIGN;

if (align <= crypto_tfm_ctx_alignment())
align = 1;
@@ -205,16 +175,6 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
}
#endif

-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
-{
- unsigned long addr = (unsigned long)raw_ctx;
- unsigned long align = AESNI_ALIGN;
-
- if (align <= crypto_tfm_ctx_alignment())
- align = 1;
- return (struct crypto_aes_ctx *)ALIGN(addr, align);
-}
-
static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
const u8 *in_key, unsigned int key_len)
{
@@ -250,7 +210,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
aes_encrypt(ctx, dst, src);
} else {
kernel_fpu_begin();
- aesni_enc(ctx, dst, src);
+ _aesni_enc(ctx, dst, src);
kernel_fpu_end();
}
}
@@ -263,7 +223,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
aes_decrypt(ctx, dst, src);
} else {
kernel_fpu_begin();
- aesni_dec(ctx, dst, src);
+ _aesni_dec(ctx, dst, src);
kernel_fpu_end();
}
}
@@ -277,90 +237,22 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,

static int ecb_encrypt(struct skcipher_request *req)
{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- struct skcipher_walk walk;
- unsigned int nbytes;
- int err;
-
- err = skcipher_walk_virt(&walk, req, false);
-
- while ((nbytes = walk.nbytes)) {
- kernel_fpu_begin();
- aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
- nbytes & AES_BLOCK_MASK);
- kernel_fpu_end();
- nbytes &= AES_BLOCK_SIZE - 1;
- err = skcipher_walk_done(&walk, nbytes);
- }
-
- return err;
+ return ecb_crypt_common(req, aesni_ecb_enc);
}

static int ecb_decrypt(struct skcipher_request *req)
{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- struct skcipher_walk walk;
- unsigned int nbytes;
- int err;
-
- err = skcipher_walk_virt(&walk, req, false);
-
- while ((nbytes = walk.nbytes)) {
- kernel_fpu_begin();
- aesni_ecb_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
- nbytes & AES_BLOCK_MASK);
- kernel_fpu_end();
- nbytes &= AES_BLOCK_SIZE - 1;
- err = skcipher_walk_done(&walk, nbytes);
- }
-
- return err;
+ return ecb_crypt_common(req, aesni_ecb_dec);
}

static int cbc_encrypt(struct skcipher_request *req)
{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- struct skcipher_walk walk;
- unsigned int nbytes;
- int err;
-
- err = skcipher_walk_virt(&walk, req, false);
-
- while ((nbytes = walk.nbytes)) {
- kernel_fpu_begin();
- aesni_cbc_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
- nbytes & AES_BLOCK_MASK, walk.iv);
- kernel_fpu_end();
- nbytes &= AES_BLOCK_SIZE - 1;
- err = skcipher_walk_done(&walk, nbytes);
- }
-
- return err;
+ return cbc_crypt_common(req, aesni_cbc_enc);
}

static int cbc_decrypt(struct skcipher_request *req)
{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- struct skcipher_walk walk;
- unsigned int nbytes;
- int err;
-
- err = skcipher_walk_virt(&walk, req, false);
-
- while ((nbytes = walk.nbytes)) {
- kernel_fpu_begin();
- aesni_cbc_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
- nbytes & AES_BLOCK_MASK, walk.iv);
- kernel_fpu_end();
- nbytes &= AES_BLOCK_SIZE - 1;
- err = skcipher_walk_done(&walk, nbytes);
- }
-
- return err;
+ return cbc_crypt_common(req, aesni_cbc_dec);
}

static int cts_cbc_encrypt(struct skcipher_request *req)
@@ -476,8 +368,8 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
}

#ifdef CONFIG_X86_64
-static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
- const u8 *in, unsigned int len, u8 *iv)
+static int aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv)
{
/*
* based on key length, override with the by8 version
@@ -491,40 +383,12 @@ static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len);
else
aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
+ return 0;
}

static int ctr_crypt(struct skcipher_request *req)
{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- u8 keystream[AES_BLOCK_SIZE];
- struct skcipher_walk walk;
- unsigned int nbytes;
- int err;
-
- err = skcipher_walk_virt(&walk, req, false);
-
- while ((nbytes = walk.nbytes) > 0) {
- kernel_fpu_begin();
- if (nbytes & AES_BLOCK_MASK)
- static_call(aesni_ctr_enc_tfm)(ctx, walk.dst.virt.addr,
- walk.src.virt.addr,
- nbytes & AES_BLOCK_MASK,
- walk.iv);
- nbytes &= ~AES_BLOCK_MASK;
-
- if (walk.nbytes == walk.total && nbytes > 0) {
- aesni_enc(ctx, keystream, walk.iv);
- crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
- walk.src.virt.addr + walk.nbytes - nbytes,
- keystream, nbytes);
- crypto_inc(walk.iv, AES_BLOCK_SIZE);
- nbytes = 0;
- }
- kernel_fpu_end();
- err = skcipher_walk_done(&walk, nbytes);
- }
- return err;
+ return ctr_crypt_common(req, static_call(aesni_ctr_enc_tfm), aesni_enc);
}

static int
@@ -606,8 +470,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
u8 *iv, void *aes_ctx, u8 *auth_tag,
unsigned long auth_tag_len)
{
- u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
- struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+ u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+ struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
unsigned long left = req->cryptlen;
struct scatter_walk assoc_sg_walk;
struct skcipher_walk walk;
@@ -762,8 +626,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
unsigned int i;
__be32 counter = cpu_to_be32(1);

@@ -790,8 +654,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
unsigned int i;

if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -816,128 +680,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
- struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
- int err;
-
- err = xts_verify_key(tfm, key, keylen);
- if (err)
- return err;
-
- keylen /= 2;
-
- /* first half of xts-key is for crypt */
- err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
- key, keylen);
- if (err)
- return err;
-
- /* second half of xts-key is for tweak */
- return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
- key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
- struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
- int tail = req->cryptlen % AES_BLOCK_SIZE;
- struct skcipher_request subreq;
- struct skcipher_walk walk;
- int err;
-
- if (req->cryptlen < AES_BLOCK_SIZE)
- return -EINVAL;
-
- err = skcipher_walk_virt(&walk, req, false);
- if (!walk.nbytes)
- return err;
-
- if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
- int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
- skcipher_walk_abort(&walk);
-
- skcipher_request_set_tfm(&subreq, tfm);
- skcipher_request_set_callback(&subreq,
- skcipher_request_flags(req),
- NULL, NULL);
- skcipher_request_set_crypt(&subreq, req->src, req->dst,
- blocks * AES_BLOCK_SIZE, req->iv);
- req = &subreq;
-
- err = skcipher_walk_virt(&walk, req, false);
- if (!walk.nbytes)
- return err;
- } else {
- tail = 0;
- }
-
- kernel_fpu_begin();
-
- /* calculate first value of T */
- aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
-
- while (walk.nbytes > 0) {
- int nbytes = walk.nbytes;
-
- if (nbytes < walk.total)
- nbytes &= ~(AES_BLOCK_SIZE - 1);
-
- if (encrypt)
- aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
- walk.dst.virt.addr, walk.src.virt.addr,
- nbytes, walk.iv);
- else
- aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
- walk.dst.virt.addr, walk.src.virt.addr,
- nbytes, walk.iv);
- kernel_fpu_end();
-
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
- if (walk.nbytes > 0)
- kernel_fpu_begin();
- }
-
- if (unlikely(tail > 0 && !err)) {
- struct scatterlist sg_src[2], sg_dst[2];
- struct scatterlist *src, *dst;
-
- dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
- skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
- req->iv);
-
- err = skcipher_walk_virt(&walk, &subreq, false);
- if (err)
- return err;
-
- kernel_fpu_begin();
- if (encrypt)
- aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
- walk.dst.virt.addr, walk.src.virt.addr,
- walk.nbytes, walk.iv);
- else
- aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
- walk.dst.virt.addr, walk.src.virt.addr,
- walk.nbytes, walk.iv);
- kernel_fpu_end();
-
- err = skcipher_walk_done(&walk, 0);
- }
- return err;
+ return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
}

static int xts_encrypt(struct skcipher_request *req)
{
- return xts_crypt(req, true);
+ return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
}

static int xts_decrypt(struct skcipher_request *req)
{
- return xts_crypt(req, false);
+ return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
}

static struct crypto_alg aesni_cipher_alg = {
@@ -1066,8 +819,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
__be32 counter = cpu_to_be32(1);

memcpy(iv, req->iv, 12);
@@ -1083,8 +836,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
void *aes_ctx = &(ctx->aes_key_expanded);
- u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
- u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+ u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+ u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);

memcpy(iv, req->iv, 12);
*((__be32 *)(iv+12)) = counter;
@@ -1107,7 +860,7 @@ static struct aead_alg aesni_aeads[] = { {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aesni_rfc4106_gcm_ctx),
- .cra_alignmask = AESNI_ALIGN - 1,
+ .cra_alignmask = AES_ALIGN - 1,
.cra_module = THIS_MODULE,
},
}, {
@@ -1124,7 +877,7 @@ static struct aead_alg aesni_aeads[] = { {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct generic_gcmaes_ctx),
- .cra_alignmask = AESNI_ALIGN - 1,
+ .cra_alignmask = AES_ALIGN - 1,
.cra_module = THIS_MODULE,
},
} };
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..51049f86d78e
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced for other AES implementations
+ */
+
+asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ unsigned int key_len);
+asmlinkage void _aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len);
+asmlinkage void _aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len);
+asmlinkage void _aesni_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv);
+asmlinkage void _aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv);
+asmlinkage void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+
+static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+ _aesni_enc(ctx, out, in);
+ return 0;
+}
+
+static inline int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+ _aesni_dec(ctx, out, in);
+ return 0;
+}
+
+static inline int aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len)
+{
+ _aesni_ecb_enc(ctx, out, in, len);
+ return 0;
+}
+
+static inline int aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len)
+{
+ _aesni_ecb_dec(ctx, out, in, len);
+ return 0;
+}
+
+static inline int aesni_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv)
+{
+ _aesni_cbc_enc(ctx, out, in, len, iv);
+ return 0;
+}
+
+static inline int aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv)
+{
+ _aesni_cbc_dec(ctx, out, in, len, iv);
+ return 0;
+}
+
+static inline int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ _aesni_xts_encrypt(ctx, out, in, len, iv);
+ return 0;
+}
+
+static inline int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ _aesni_xts_decrypt(ctx, out, in, len, iv);
+ return 0;
+}
+
+#ifdef CONFIG_X86_64
+
+asmlinkage void _aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv);
+
+static inline int aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv)
+{
+ _aesni_ctr_enc(ctx, out, in, len, iv);
+ return 0;
+}
+
+#endif /* CONFIG_X86_64 */
+
--
2.17.1


2021-11-24 20:15:12

by Chang S. Bae

Subject: [PATCH v3 14/15] crypto: x86/aes-kl - Support CTR mode

Implement CTR mode using AES-KL. Export the methods with a lower priority
than AES-NI so that they are not selected by default.
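For reference, the CTR construction that these routines implement can be
sketched in portable C. This is a toy illustration only: the big-endian
128-bit counter increment mirrors what the assembly does in SSE registers,
but a trivial stand-in byte permutation replaces the AES-KL block
encryption (the real code encrypts through the key handle with
AESENC128KL/AESENC256KL). All names below are illustrative, not kernel
APIs.

```c
#include <stdint.h>
#include <stddef.h>

#define BLK 16

/* Stand-in for one-block encryption through the key handle.
 * A real implementation would run AESENC128KL/AESENC256KL here. */
static void toy_blk_enc(uint8_t out[BLK], const uint8_t in[BLK])
{
	int i;

	for (i = 0; i < BLK; i++)
		out[i] = (uint8_t)((in[i] ^ 0x5a) + 1); /* arbitrary byte permutation */
}

/* Increment a 128-bit big-endian counter block by one. */
static void ctr_inc_be(uint8_t ctr[BLK])
{
	int i;

	for (i = BLK - 1; i >= 0; i--)
		if (++ctr[i])
			break;
}

/* CTR mode: keystream = E(counter); out = in XOR keystream.
 * Encryption and decryption are the same operation, and a partial
 * final block simply uses a truncated keystream block. */
static void toy_ctr_crypt(uint8_t *out, const uint8_t *in, size_t len,
			  uint8_t iv[BLK])
{
	uint8_t ks[BLK];
	size_t i, n;

	while (len) {
		n = len < BLK ? len : BLK;
		toy_blk_enc(ks, iv);
		for (i = 0; i < n; i++)
			out[i] = in[i] ^ ks[i];
		ctr_inc_be(iv);
		out += n;
		in += n;
		len -= n;
	}
}
```

Because encryption and decryption are identical, running the function twice
with the same starting IV round-trips the data.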

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/crypto/aeskl-intel_asm.S | 174 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 55 +++++++++
2 files changed, 229 insertions(+)

diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
index 5ee7b24ee3c8..ffde0cd3dd42 100644
--- a/arch/x86/crypto/aeskl-intel_asm.S
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -563,3 +563,177 @@ SYM_FUNC_START(_aeskl_cbc_dec)
ret
SYM_FUNC_END(_aeskl_cbc_dec)

+#ifdef __x86_64__
+
+/*
+ * _aeskl_ctr_inc_init: internal ABI
+ * setup registers used by _aeskl_ctr_inc
+ * input:
+ * IV
+ * output:
+ * CTR: == IV, in little endian
+ * TCTR_LOW: == lower qword of CTR
+ * INC: == 1, in little endian
+ * BSWAP_MASK == endian swapping mask
+ */
+SYM_FUNC_START_LOCAL(_aeskl_ctr_inc_init)
+ movaps .Lbswap_mask, BSWAP_MASK
+ movaps IV, CTR
+ pshufb BSWAP_MASK, CTR
+ mov $1, TCTR_LOW
+ movq TCTR_LOW, INC
+ movq CTR, TCTR_LOW
+ ret
+SYM_FUNC_END(_aeskl_ctr_inc_init)
+
+/*
+ * _aeskl_ctr_inc: internal ABI
+ * Increase IV by 1, IV is in big endian
+ * input:
+ * IV
+ * CTR: == IV, in little endian
+ * TCTR_LOW: == lower qword of CTR
+ * INC: == 1, in little endian
+ * BSWAP_MASK == endian swapping mask
+ * output:
+ * IV: Increase by 1
+ * changed:
+ * CTR: == output IV, in little endian
+ * TCTR_LOW: == lower qword of CTR
+ */
+SYM_FUNC_START_LOCAL(_aeskl_ctr_inc)
+ paddq INC, CTR
+ add $1, TCTR_LOW
+ jnc .Linc_low
+ pslldq $8, INC
+ paddq INC, CTR
+ psrldq $8, INC
+.Linc_low:
+ movaps CTR, IV
+ pshufb BSWAP_MASK, IV
+ ret
+SYM_FUNC_END(_aeskl_ctr_inc)
+
+/*
+ * CTR implementations
+ */
+
+/*
+ * int _aeskl_ctr_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ * size_t len, u8 *iv)
+ */
+SYM_FUNC_START(_aeskl_ctr_enc)
+ FRAME_BEGIN
+ cmp $16, LEN
+ jb .Lctr_enc_noerr
+ mov 480(HANDLEP), KLEN
+ movdqu (IVP), IV
+ call _aeskl_ctr_inc_init
+ cmp $128, LEN
+ jb .Lctr_enc1
+
+.align 4
+.Lctr_enc8:
+ movaps IV, STATE1
+ call _aeskl_ctr_inc
+ movaps IV, STATE2
+ call _aeskl_ctr_inc
+ movaps IV, STATE3
+ call _aeskl_ctr_inc
+ movaps IV, STATE4
+ call _aeskl_ctr_inc
+ movaps IV, STATE5
+ call _aeskl_ctr_inc
+ movaps IV, STATE6
+ call _aeskl_ctr_inc
+ movaps IV, STATE7
+ call _aeskl_ctr_inc
+ movaps IV, STATE8
+ call _aeskl_ctr_inc
+
+ cmp $16, KLEN
+ je .Lctr_enc8_128
+ aesencwide256kl (%rdi)
+ jz .Lctr_enc_err
+ jmp .Lctr_enc8_end
+.Lctr_enc8_128:
+ aesencwide128kl (%rdi)
+ jz .Lctr_enc_err
+.Lctr_enc8_end:
+
+ movups (INP), IN1
+ pxor IN1, STATE1
+ movups STATE1, (OUTP)
+
+ movups 0x10(INP), IN1
+ pxor IN1, STATE2
+ movups STATE2, 0x10(OUTP)
+
+ movups 0x20(INP), IN1
+ pxor IN1, STATE3
+ movups STATE3, 0x20(OUTP)
+
+ movups 0x30(INP), IN1
+ pxor IN1, STATE4
+ movups STATE4, 0x30(OUTP)
+
+ movups 0x40(INP), IN1
+ pxor IN1, STATE5
+ movups STATE5, 0x40(OUTP)
+
+ movups 0x50(INP), IN1
+ pxor IN1, STATE6
+ movups STATE6, 0x50(OUTP)
+
+ movups 0x60(INP), IN1
+ pxor IN1, STATE7
+ movups STATE7, 0x60(OUTP)
+
+ movups 0x70(INP), IN1
+ pxor IN1, STATE8
+ movups STATE8, 0x70(OUTP)
+
+ sub $128, LEN
+ add $128, INP
+ add $128, OUTP
+ cmp $128, LEN
+ jge .Lctr_enc8
+ cmp $16, LEN
+ jb .Lctr_enc_end
+
+.align 4
+.Lctr_enc1:
+ movaps IV, STATE1
+ call _aeskl_ctr_inc
+
+ cmp $16, KLEN
+ je .Lctr_enc1_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lctr_enc_err
+ jmp .Lctr_enc1_end
+.Lctr_enc1_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lctr_enc_err
+
+.Lctr_enc1_end:
+ movups (INP), IN1
+ pxor IN1, STATE1
+ movups STATE1, (OUTP)
+ sub $16, LEN
+ add $16, INP
+ add $16, OUTP
+ cmp $16, LEN
+ jge .Lctr_enc1
+
+.Lctr_enc_end:
+ movdqu IV, (IVP)
+.Lctr_enc_noerr:
+ xor AREG, AREG
+ jmp .Lctr_enc_ret
+.Lctr_enc_err:
+ mov $1, AREG
+.Lctr_enc_ret:
+ FRAME_END
+ ret
+SYM_FUNC_END(_aeskl_ctr_enc)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
index 742576ae0481..f99dfa4a052f 100644
--- a/arch/x86/crypto/aeskl-intel_glue.c
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -35,6 +35,11 @@ asmlinkage int _aeskl_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
asmlinkage int _aeskl_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
u8 *iv);

+#ifdef CONFIG_X86_64
+asmlinkage int _aeskl_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv);
+#endif
+
static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
unsigned int key_len)
{
@@ -144,6 +149,23 @@ static int aeskl_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsi
return 0;
}

+#ifdef CONFIG_X86_64
+
+static int aeskl_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
+ u8 *iv)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_ctr_enc(ctx, out, in, len, iv))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+#endif /* CONFIG_X86_64 */
+
static int aeskl_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int len)
{
@@ -193,6 +215,20 @@ static int cbc_decrypt(struct skcipher_request *req)
return cbc_crypt_common(req, aesni_cbc_dec);
}

+#ifdef CONFIG_X86_64
+
+static int ctr_crypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return ctr_crypt_common(req, aeskl_ctr_enc, aeskl_enc);
+ else
+ return ctr_crypt_common(req, aesni_ctr_enc, aesni_enc);
+}
+
+#endif /* CONFIG_X86_64 */
+
static struct skcipher_alg aeskl_skciphers[] = {
{
.base = {
@@ -225,6 +261,25 @@ static struct skcipher_alg aeskl_skciphers[] = {
.setkey = aeskl_skcipher_setkey,
.encrypt = cbc_encrypt,
.decrypt = cbc_decrypt,
+#ifdef CONFIG_X86_64
+ }, {
+ .base = {
+ .cra_name = "__ctr(aes)",
+ .cra_driver_name = "__ctr-aes-aeskl",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = 1,
+ .cra_ctxsize = CRYPTO_AES_CTX_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .setkey = aeskl_skcipher_setkey,
+ .encrypt = ctr_crypt,
+ .decrypt = ctr_crypt,
+#endif
}
};

--
2.17.1


2021-11-24 20:15:15

by Chang S. Bae

Subject: [PATCH v3 15/15] crypto: x86/aes-kl - Support XTS mode

Implement XTS mode using AES-KL. Export the methods with a lower priority
than AES-NI so that they are not selected by default.

The assembly code clobbers more than eight 128-bit registers so it can use
the performant wide instructions that process eight 128-bit blocks at once.
Since that many 128-bit registers are not available in 32-bit mode, only
64-bit mode is supported.
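The XTS tweak update that the assembly performs with SSE shuffles
(_aeskl_gf128mul_x_ble) can be written out in plain C as a left shift of
the 128-bit tweak, stored little-endian, with the carried-out top bit
folded back in via the reducing constant 0x87. A minimal sketch, with an
illustrative name:

```c
#include <stdint.h>

/* Multiply a 128-bit XTS tweak by x in GF(2^128).
 * The tweak is a 16-byte little-endian value; when the top bit shifts
 * out of byte 15, the reduction constant 0x87 is XORed into byte 0. */
static void toy_gf128mul_x_ble(uint8_t t[16])
{
	unsigned int carry = 0;
	unsigned int next;
	int i;

	for (i = 0; i < 16; i++) {
		next = t[i] >> 7;               /* bit that shifts into the next byte */
		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)                              /* bit 127 fell out: reduce */
		t[0] ^= 0x87;
}
```

Each data block in the loop XORs with the current tweak before and after
the block cipher, and the tweak is doubled this way between blocks.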

Signed-off-by: Chang S. Bae <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
arch/x86/crypto/aeskl-intel_asm.S | 447 +++++++++++++++++++++++++++++
arch/x86/crypto/aeskl-intel_glue.c | 74 +++++
2 files changed, 521 insertions(+)

diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
index ffde0cd3dd42..d542324071d4 100644
--- a/arch/x86/crypto/aeskl-intel_asm.S
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -737,3 +737,450 @@ SYM_FUNC_START(_aeskl_ctr_enc)
ret
SYM_FUNC_END(_aeskl_ctr_enc)

+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: internal ABI
+ * Multiply in GF(2^128) for XTS IVs
+ * input:
+ * IV: current IV
+ * GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ * IV: next IV
+ * changed:
+ * CTR: == temporary value
+ */
+#define _aeskl_gf128mul_x_ble() \
+ pshufd $0x13, IV, KEY; \
+ paddq IV, IV; \
+ psrad $31, KEY; \
+ pand GF128MUL_MASK, KEY; \
+ pxor KEY, IV;
+
+/*
+ * int _aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(_aeskl_xts_encrypt)
+ FRAME_BEGIN
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+ movups (IVP), IV
+
+ mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+ sub $128, LEN
+ jl .Lxts_enc1_pre
+
+ movdqa IV, STATE1
+ movdqu (INP), INC
+ pxor INC, STATE1
+ movdqu IV, (OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE2
+ movdqu 0x10(INP), INC
+ pxor INC, STATE2
+ movdqu IV, 0x10(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE3
+ movdqu 0x20(INP), INC
+ pxor INC, STATE3
+ movdqu IV, 0x20(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE4
+ movdqu 0x30(INP), INC
+ pxor INC, STATE4
+ movdqu IV, 0x30(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE5
+ movdqu 0x40(INP), INC
+ pxor INC, STATE5
+ movdqu IV, 0x40(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE6
+ movdqu 0x50(INP), INC
+ pxor INC, STATE6
+ movdqu IV, 0x50(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE7
+ movdqu 0x60(INP), INC
+ pxor INC, STATE7
+ movdqu IV, 0x60(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE8
+ movdqu 0x70(INP), INC
+ pxor INC, STATE8
+ movdqu IV, 0x70(OUTP)
+
+ cmp $16, KLEN
+ je .Lxts_enc8_128
+ aesencwide256kl (%rdi)
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+ aesencwide128kl (%rdi)
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+ movdqu 0x00(OUTP), INC
+ pxor INC, STATE1
+ movdqu STATE1, 0x00(OUTP)
+
+ movdqu 0x10(OUTP), INC
+ pxor INC, STATE2
+ movdqu STATE2, 0x10(OUTP)
+
+ movdqu 0x20(OUTP), INC
+ pxor INC, STATE3
+ movdqu STATE3, 0x20(OUTP)
+
+ movdqu 0x30(OUTP), INC
+ pxor INC, STATE4
+ movdqu STATE4, 0x30(OUTP)
+
+ movdqu 0x40(OUTP), INC
+ pxor INC, STATE5
+ movdqu STATE5, 0x40(OUTP)
+
+ movdqu 0x50(OUTP), INC
+ pxor INC, STATE6
+ movdqu STATE6, 0x50(OUTP)
+
+ movdqu 0x60(OUTP), INC
+ pxor INC, STATE7
+ movdqu STATE7, 0x60(OUTP)
+
+ movdqu 0x70(OUTP), INC
+ pxor INC, STATE8
+ movdqu STATE8, 0x70(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+
+ add $128, INP
+ add $128, OUTP
+ test LEN, LEN
+ jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+ movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+ xor AREG, AREG
+ jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+ mov $1, AREG
+.Lxts_enc_ret:
+ FRAME_END
+ ret
+
+.Lxts_enc1_pre:
+ add $128, LEN
+ jz .Lxts_enc_ret_iv
+ sub $16, LEN
+ jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+ movdqu (INP), STATE1
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_enc1_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+ pxor IV, STATE1
+ _aeskl_gf128mul_x_ble()
+
+ test LEN, LEN
+ jz .Lxts_enc1_out
+
+ add $16, INP
+ sub $16, LEN
+ jl .Lxts_enc_cts1
+
+ movdqu STATE1, (OUTP)
+ add $16, OUTP
+ jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+ movdqu STATE1, (OUTP)
+ jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+ movdqu STATE8, STATE1
+ sub $16, OUTP
+
+.Lxts_enc_cts1:
+ lea .Lcts_permute_table(%rip), T1
+ add LEN, INP /* rewind input pointer */
+ add $16, LEN /* # bytes in final block */
+ movups (INP), IN1
+
+ mov T1, IVP
+ add $32, IVP
+ add LEN, T1
+ sub LEN, IVP
+ add OUTP, LEN
+
+ movups (T1), STATE2
+ movaps STATE1, STATE3
+ pshufb STATE2, STATE1
+ movups STATE1, (LEN)
+
+ movups (IVP), STATE1
+ pshufb STATE1, IN1
+ pblendvb STATE3, IN1
+ movaps IN1, STATE1
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_enc1_cts_128
+ aesenc256kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+ jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+ aesenc128kl (HANDLEP), STATE1
+ jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+ pxor IV, STATE1
+ movups STATE1, (OUTP)
+ jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(_aeskl_xts_encrypt)
+
+/*
+ * int _aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ * const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(_aeskl_xts_decrypt)
+ FRAME_BEGIN
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+ movups (IVP), IV
+
+ mov 480(HANDLEP), KLEN
+
+ test $15, LEN
+ jz .Lxts_dec8
+ sub $16, LEN
+
+.Lxts_dec8:
+ sub $128, LEN
+ jl .Lxts_dec1_pre
+
+ movdqa IV, STATE1
+ movdqu (INP), INC
+ pxor INC, STATE1
+ movdqu IV, (OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE2
+ movdqu 0x10(INP), INC
+ pxor INC, STATE2
+ movdqu IV, 0x10(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE3
+ movdqu 0x20(INP), INC
+ pxor INC, STATE3
+ movdqu IV, 0x20(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE4
+ movdqu 0x30(INP), INC
+ pxor INC, STATE4
+ movdqu IV, 0x30(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE5
+ movdqu 0x40(INP), INC
+ pxor INC, STATE5
+ movdqu IV, 0x40(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE6
+ movdqu 0x50(INP), INC
+ pxor INC, STATE6
+ movdqu IV, 0x50(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE7
+ movdqu 0x60(INP), INC
+ pxor INC, STATE7
+ movdqu IV, 0x60(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+ movdqa IV, STATE8
+ movdqu 0x70(INP), INC
+ pxor INC, STATE8
+ movdqu IV, 0x70(OUTP)
+
+ cmp $16, KLEN
+ je .Lxts_dec8_128
+ aesdecwide256kl (%rdi)
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+ aesdecwide128kl (%rdi)
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+ movdqu 0x00(OUTP), INC
+ pxor INC, STATE1
+ movdqu STATE1, 0x00(OUTP)
+
+ movdqu 0x10(OUTP), INC
+ pxor INC, STATE2
+ movdqu STATE2, 0x10(OUTP)
+
+ movdqu 0x20(OUTP), INC
+ pxor INC, STATE3
+ movdqu STATE3, 0x20(OUTP)
+
+ movdqu 0x30(OUTP), INC
+ pxor INC, STATE4
+ movdqu STATE4, 0x30(OUTP)
+
+ movdqu 0x40(OUTP), INC
+ pxor INC, STATE5
+ movdqu STATE5, 0x40(OUTP)
+
+ movdqu 0x50(OUTP), INC
+ pxor INC, STATE6
+ movdqu STATE6, 0x50(OUTP)
+
+ movdqu 0x60(OUTP), INC
+ pxor INC, STATE7
+ movdqu STATE7, 0x60(OUTP)
+
+ movdqu 0x70(OUTP), INC
+ pxor INC, STATE8
+ movdqu STATE8, 0x70(OUTP)
+
+ _aeskl_gf128mul_x_ble()
+
+ add $128, INP
+ add $128, OUTP
+ test LEN, LEN
+ jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+ movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+ xor AREG, AREG
+ jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+ mov $1, AREG
+.Lxts_dec_ret:
+ FRAME_END
+ ret
+
+.Lxts_dec1_pre:
+ add $128, LEN
+ jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+ movdqu (INP), STATE1
+
+ add $16, INP
+ sub $16, LEN
+ jl .Lxts_dec_cts1
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+ pxor IV, STATE1
+ _aeskl_gf128mul_x_ble()
+
+ test LEN, LEN
+ jz .Lxts_dec1_out
+
+ movdqu STATE1, (OUTP)
+ add $16, OUTP
+ jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+ movdqu STATE1, (OUTP)
+ jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+ movdqa IV, STATE5
+ _aeskl_gf128mul_x_ble()
+
+ pxor IV, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_cts_pre_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+ pxor IV, STATE1
+
+ lea .Lcts_permute_table(%rip), T1
+ add LEN, INP /* rewind input pointer */
+ add $16, LEN /* # bytes in final block */
+ movups (INP), IN1
+
+ mov T1, IVP
+ add $32, IVP
+ add LEN, T1
+ sub LEN, IVP
+ add OUTP, LEN
+
+ movups (T1), STATE2
+ movaps STATE1, STATE3
+ pshufb STATE2, STATE1
+ movups STATE1, (LEN)
+
+ movups (IVP), STATE1
+ pshufb STATE1, IN1
+ pblendvb STATE3, IN1
+ movaps IN1, STATE1
+
+ pxor STATE5, STATE1
+
+ cmp $16, KLEN
+ je .Lxts_dec1_cts_128
+ aesdec256kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+ jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+ aesdec128kl (HANDLEP), STATE1
+ jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+ pxor STATE5, STATE1
+
+ movups STATE1, (OUTP)
+ jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(_aeskl_xts_decrypt)
+
+#endif
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
index f99dfa4a052f..bfac7452709d 100644
--- a/arch/x86/crypto/aeskl-intel_glue.c
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -38,6 +38,11 @@ asmlinkage int _aeskl_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
#ifdef CONFIG_X86_64
asmlinkage int _aeskl_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len,
u8 *iv);
+
+asmlinkage int _aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
+asmlinkage int _aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv);
#endif

static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
@@ -164,6 +169,32 @@ static int aeskl_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsi
return 0;
}

+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_xts_encrypt(ctx, out, in, len, iv))
+ return -EINVAL;
+ else
+ return 0;
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+ unsigned int len, u8 *iv)
+{
+ if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+ return -EINVAL;
+ else if (!valid_keylocker())
+ return -ENODEV;
+ else if (_aeskl_xts_decrypt(ctx, out, in, len, iv))
+ return -EINVAL;
+ else
+ return 0;
+}
+
#endif /* CONFIG_X86_64 */

static int aeskl_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
@@ -227,6 +258,32 @@ static int ctr_crypt(struct skcipher_request *req)
return ctr_crypt_common(req, aesni_ctr_enc, aesni_enc);
}

+static int aeskl_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ return xts_setkey_common(tfm, key, keylen, aeskl_setkey_common);
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+ else
+ return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+ if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+ return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+ else
+ return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
#endif /* CONFIG_X86_64 */

static struct skcipher_alg aeskl_skciphers[] = {
@@ -279,6 +336,23 @@ static struct skcipher_alg aeskl_skciphers[] = {
.setkey = aeskl_skcipher_setkey,
.encrypt = ctr_crypt,
.decrypt = ctr_crypt,
+ }, {
+ .base = {
+ .cra_name = "__xts(aes)",
+ .cra_driver_name = "__xts-aes-aeskl",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = XTS_AES_CTX_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = 2 * AES_MIN_KEY_SIZE,
+ .max_keysize = 2 * AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .walksize = 2 * AES_BLOCK_SIZE,
+ .setkey = aeskl_xts_setkey,
+ .encrypt = xts_encrypt,
+ .decrypt = xts_decrypt,
#endif
}
};
--
2.17.1


2021-11-30 03:27:30

by Eric Biggers

Subject: Re: [PATCH v3 00/15] x86: Support Key Locker

On Wed, Nov 24, 2021 at 12:06:45PM -0800, Chang S. Bae wrote:
>
> == Non Use Cases ==
>
> Bare metal disk encryption is the only use case intended by these patches.

If that's the case, why are so many encryption modes being added (ECB, CTR, CBC,
and XTS)? Wouldn't just XTS be sufficient?

> * PATCH10-15: For the x86 crypto library, it first prepares the AES-NI code
> to accommodate the new AES implementation. Then incrementally add base
> functions and various modes support -- ECB, CBC, CTR, and XTS. The code
> was found to pass the crypto test.

Did you test with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?

- Eric

2021-11-30 03:30:43

by Eric Biggers

Subject: Re: [PATCH v3 08/15] x86/power/keylocker: Restore internal wrapping key from the ACPI S3/4 sleep states

On Wed, Nov 24, 2021 at 12:06:53PM -0800, Chang S. Bae wrote:
> diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
> index 820ac29c06d9..4000a5eed2c2 100644
> --- a/arch/x86/include/asm/keylocker.h
> +++ b/arch/x86/include/asm/keylocker.h
> @@ -32,9 +32,13 @@ struct iwkey {
> #ifdef CONFIG_X86_KEYLOCKER
> void setup_keylocker(struct cpuinfo_x86 *c);
> void destroy_keylocker_data(void);
> +void restore_keylocker(void);
> +extern bool valid_keylocker(void);
> #else
> #define setup_keylocker(c) do { } while (0)
> #define destroy_keylocker_data() do { } while (0)
> +#define restore_keylocker() do { } while (0)
> +static inline bool valid_keylocker { return false; }
> #endif

This breaks the build when !CONFIG_X86_KEYLOCKER.

- Eric

2021-11-30 03:49:09

by Eric Biggers

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Wed, Nov 24, 2021 at 12:06:56PM -0800, Chang S. Bae wrote:
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index ef6c0b9f69c6..f696b037faa5 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
> aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
> aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
>
> +obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
> +aeskl-intel-y := aeskl-intel_asm.o aesni-intel_asm.o aeskl-intel_glue.o aes-intel_glue.o

This makes the object files aesni-intel_asm.o and aes-intel_glue.o each be built
into two separate kernel modules. My understanding is that duplicating code
like that is generally frowned upon. These files should either be built into a
separate module, which both aesni-intel.ko and aeskl-intel.ko would depend on,
or aeskl-intel.ko should depend on aesni-intel.ko.
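To illustrate the first option described above, a shared-module layout might look roughly like the sketch below. The CONFIG_CRYPTO_AES_INTEL_COMMON symbol and the aes-intel-common module name are hypothetical, invented purely for illustration; both CRYPTO_AES_NI_INTEL and CRYPTO_AES_KL would select (or depend on) the common symbol in Kconfig:

```make
# Hypothetical sketch: build the objects shared by AES-NI and AES-KL into
# their own module, so neither aesni-intel.ko nor aeskl-intel.ko carries a
# duplicate copy of aesni-intel_asm.o or aes-intel_glue.o.
obj-$(CONFIG_CRYPTO_AES_INTEL_COMMON) += aes-intel-common.o
aes-intel-common-y := aesni-intel_asm.o aes-intel_glue.o

obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
aesni-intel-y := aesni-intel_glue.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o

obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
```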

> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
> new file mode 100644
> index 000000000000..d56ec8dd6644
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl-intel_asm.S

This file gets very long after all the modes are added (> 1100 lines). Is there
really no feasible way to share code between this and aesni-intel_asm.S, similar
to how the arm64 AES implementations work? Surely most of the logic is the
same, and it's just the actual AES instructions that differ?

> +config CRYPTO_AES_KL
> + tristate "AES cipher algorithms (AES-KL)"
> + depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
> + depends on DM_CRYPT

'depends on DM_CRYPT' doesn't really make sense here, since there is no actual
dependency on dm-crypt in the code.

- Eric

2021-11-30 06:36:23

by Chang S. Bae

Subject: Re: [PATCH v3 00/15] x86: Support Key Locker

On Nov 29, 2021, at 19:27, Eric Biggers <[email protected]> wrote:
> On Wed, Nov 24, 2021 at 12:06:45PM -0800, Chang S. Bae wrote:
>>
>> == Non Use Cases ==
>>
>> Bare metal disk encryption is the only use case intended by these patches.
>
> If that's the case, why are so many encryption modes being added (ECB, CTR, CBC,
> and XTS)? Wouldn't just XTS be sufficient?

Right, it would reduce the crypto library changes significantly. But it is
unclear whether XTS alone is sufficient to support dm-crypt, because a user
may select any cipher from the kernel's crypto API via the 'capi:' prefix [1].

>> * PATCH10-15: For the x86 crypto library, it first prepares the AES-NI code
>> to accommodate the new AES implementation. Then incrementally add base
>> functions and various modes support -- ECB, CBC, CTR, and XTS. The code
>> was found to pass the crypto test.
>
> Did you test with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?

Yes.

Thanks,
Chang

[1] https://gitlab.com/cryptsetup/cryptsetup/-/wikis/DMCrypt
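For context, the 'capi:' mechanism referenced above lets cryptsetup pass a kernel crypto API driver string directly to dm-crypt. A hedged usage sketch follows; "capi:xts-aes-aeskl-plain" matches the cra_driver_name convention mentioned later in this thread, while the device path, key file, and mapping name are placeholders:

```shell
# Hypothetical example: map a plain dm-crypt volume, forcing the kernel
# crypto API driver string rather than the usual "aes-xts-plain64" alias.
# /dev/sdX1, /path/to/keyfile, and "secretdisk" are made-up placeholders.
cryptsetup open --type plain --cipher capi:xts-aes-aeskl-plain \
        --key-size 512 --key-file /path/to/keyfile /dev/sdX1 secretdisk
```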

2021-11-30 06:38:38

by Chang S. Bae

Subject: [PATCH v3-fix 08/15] x86/power/keylocker: Restore internal wrapping key from the ACPI S3/4 sleep states

When the system state switches to these sleep states, the internal
wrapping key gets reset in the CPU state.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, as dm-crypt does not prompt for the key
on resume from suspend. Even for the prompt it does perform to unlock the
volume where the hibernation image is stored, it still expects to reuse the
key handles within the hibernation image once that image is loaded. So the
goal is to meet dm-crypt's expectation that the key handles in the suspend
image remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.

While the backup may be maintained in NVM across S5 and G3 "off"
states it is not architecturally guaranteed, nor is it expected by dm-crypt
which expects to prompt for the key each time the volume is started.

Key Locker needs to be disabled entirely if the backup mechanism is not
available, unless CONFIG_SUSPEND=n; otherwise dm-crypt requires the backup
to be available.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES instructions are no longer used
after a key restore failure, while not turning off the feature entirely.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes from v3:
* Fix the build error. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
arch/x86/include/asm/keylocker.h | 4 +
arch/x86/kernel/keylocker.c | 124 ++++++++++++++++++++++++++++++-
arch/x86/power/cpu.c | 2 +
3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..4000a5eed2c2 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
#ifdef CONFIG_X86_KEYLOCKER
void setup_keylocker(struct cpuinfo_x86 *c);
void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
#else
#define setup_keylocker(c) do { } while (0)
#define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker() { return false; }
#endif

#endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 87d775a65716..ff0e012e3dd5 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,11 +11,26 @@
#include <asm/fpu/api.h>
#include <asm/keylocker.h>
#include <asm/tlbflush.h>
+#include <asm/msr.h>

static __initdata struct keylocker_setup_data {
+ bool initialized;
struct iwkey key;
} kl_setup;

+/*
+ * This flag is set with IWKey load. When the key restore fails, it is
+ * reset. This restore state is exported to the crypto library, then AES-KL
+ * will not be used there. So, the feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+ return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
static void __init generate_keylocker_data(void)
{
get_random_bytes(&kl_setup.key.integrity_key, sizeof(kl_setup.key.integrity_key));
@@ -25,6 +40,8 @@ static void __init generate_keylocker_data(void)
void __init destroy_keylocker_data(void)
{
memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+ kl_setup.initialized = true;
+ valid_kl = true;
}

static void __init load_keylocker(void)
@@ -34,6 +51,27 @@ static void __init load_keylocker(void)
kernel_fpu_end();
}

+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns: -EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+ u64 status;
+
+ wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+ rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+ if (status & BIT(0))
+ return 0;
+ else
+ return -EBUSY;
+}
+
/**
* setup_keylocker - Enable the feature.
* @c: A pointer to struct cpuinfo_x86
@@ -49,6 +87,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)

if (c == &boot_cpu_data) {
u32 eax, ebx, ecx, edx;
+ bool backup_available;

cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
/*
@@ -62,10 +101,49 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
goto disable;
}

+ backup_available = (ebx & KEYLOCKER_CPUID_EBX_BACKUP) ? true : false;
+ /*
+ * The internal wrapping key in CPU state is volatile in
+ * S3/4 states. So ensure the backup capability along with
+ * S-states.
+ */
+ if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+ pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+ goto disable;
+ }
+
generate_keylocker_data();
- }
+ load_keylocker();

- load_keylocker();
+ /* Backup an internal wrapping key in non-volatile media. */
+ if (backup_available)
+ wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+ } else {
+ int rc;
+
+ /*
+ * Load the internal wrapping key directly when available
+ * in memory, which is only possible at boot-time.
+ *
+ * NB: When system wakes up, this path also recovers the
+ * internal wrapping key.
+ */
+ if (!kl_setup.initialized) {
+ load_keylocker();
+ } else if (valid_kl) {
+ rc = copy_keylocker();
+ /*
+ * The boot CPU was successful but the key copy
+ * fails here. Then, the subsequent feature use
+ * will have inconsistent keys and failures. So,
+ * invalidate the feature via the flag.
+ */
+ if (rc) {
+ valid_kl = false;
+ pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+ }
+ }
+ }

pr_info_once("x86/keylocker: Enabled.\n");
return;
@@ -77,3 +155,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
/* Make sure the feature disabled for kexec-reboot. */
cr4_clear_bits(X86_CR4_KEYLOCKER);
}
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+ u64 backup_status;
+ int rc;
+
+ if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+ return;
+
+ /*
+ * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap: bit 0 set
+ * indicates a valid backup, and bit 2 set indicates a read (or
+ * write) error.
+ */
+ rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+ if (backup_status & BIT(0)) {
+ rc = copy_keylocker();
+ if (rc)
+ pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+ else
+ return;
+ } else {
+ pr_err("x86/keylocker: The key backup access failed with %s.\n",
+ (backup_status & BIT(2)) ? "read error" : "invalid status");
+ }
+
+ /*
+ * Now the backup key is not available. Invalidate the feature via
+ * the flag to avoid any subsequent use. But keep the feature with
+ * zero IWKeys instead of disabling it. The current users will see
+ * key handle integrity failure but that's because of the internal
+ * key change.
+ */
+ pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+ valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 9f2b251e83c5..1a290f529c73 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -25,6 +25,7 @@
#include <asm/cpu.h>
#include <asm/mmu_context.h>
#include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>

#ifdef CONFIG_X86_32
__visible unsigned long saved_context_ebx;
@@ -262,6 +263,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
mtrr_bp_restore();
perf_restore_debug_store();
msr_restore_context(ctxt);
+ restore_keylocker();

c = &cpu_data(smp_processor_id());
if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
--
2.17.1


2021-11-30 06:56:39

by Chang S. Bae

Subject: Re: [PATCH v3 08/15] x86/power/keylocker: Restore internal wrapping key from the ACPI S3/4 sleep states

On Nov 29, 2021, at 19:30, Eric Biggers <[email protected]> wrote:
> On Wed, Nov 24, 2021 at 12:06:53PM -0800, Chang S. Bae wrote:
>> diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
>> index 820ac29c06d9..4000a5eed2c2 100644
>> --- a/arch/x86/include/asm/keylocker.h
>> +++ b/arch/x86/include/asm/keylocker.h
>> @@ -32,9 +32,13 @@ struct iwkey {
>> #ifdef CONFIG_X86_KEYLOCKER
>> void setup_keylocker(struct cpuinfo_x86 *c);
>> void destroy_keylocker_data(void);
>> +void restore_keylocker(void);
>> +extern bool valid_keylocker(void);
>> #else
>> #define setup_keylocker(c) do { } while (0)
>> #define destroy_keylocker_data() do { } while (0)
>> +#define restore_keylocker() do { } while (0)
>> +static inline bool valid_keylocker { return false; }
>> #endif
>
> This breaks the build when !CONFIG_X86_KEYLOCKER.

Ah, sorry. My bad on the last-minute change.

Thanks,
Chang

2021-11-30 06:57:30

by Chang S. Bae

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Nov 29, 2021, at 19:48, Eric Biggers <[email protected]> wrote:
> On Wed, Nov 24, 2021 at 12:06:56PM -0800, Chang S. Bae wrote:
>> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
>> index ef6c0b9f69c6..f696b037faa5 100644
>> --- a/arch/x86/crypto/Makefile
>> +++ b/arch/x86/crypto/Makefile
>> @@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
>> aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
>> aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
>>
>> +obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
>> +aeskl-intel-y := aeskl-intel_asm.o aesni-intel_asm.o aeskl-intel_glue.o aes-intel_glue.o
>
> This makes the object files aesni-intel_asm.o and aes-intel_glue.o each be built
> into two separate kernel modules. My understanding is that duplicating code
> like that is generally frowned upon. These files should either be built into a
> separate module, which both aesni-intel.ko and aeskl-intel.ko would depend on,
> or aeskl-intel.ko should depend on aesni-intel.ko.

The only reason to include the AES-NI object here is that AES-KL does not
support the 192-bit key.

Maybe the fallback can be the aes-generic driver [1] instead of AES-NI here.

>> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
>> new file mode 100644
>> index 000000000000..d56ec8dd6644
>> --- /dev/null
>> +++ b/arch/x86/crypto/aeskl-intel_asm.S
>
> This file gets very long after all the modes are added (> 1100 lines). Is there
> really no feasible way to share code between this and aesni-intel_asm.S, similar
> to how the arm64 AES implementations work? Surely most of the logic is the
> same, and it's just the actual AES instructions that differ?

No, these two instruction sets are separate. So I think there is no room to
share the ASM code.

>> +config CRYPTO_AES_KL
>> + tristate "AES cipher algorithms (AES-KL)"
>> + depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
>> + depends on DM_CRYPT
>
> 'depends on DM_CRYPT' doesn't really make sense here, since there is no actual
> dependency on dm-crypt in the code.

I think the intention here is to build a policy that the library is available
only when there is a clear use case.

But maybe putting such a restriction here is too much.

Thanks
Chang

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/crypto/aes_generic.c



2021-11-30 07:03:38

by Dan Williams

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Mon, Nov 29, 2021 at 10:57 PM Bae, Chang Seok
<[email protected]> wrote:
>
> On Nov 29, 2021, at 19:48, Eric Biggers <[email protected]> wrote:
> > On Wed, Nov 24, 2021 at 12:06:56PM -0800, Chang S. Bae wrote:
> >> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> >> index ef6c0b9f69c6..f696b037faa5 100644
> >> --- a/arch/x86/crypto/Makefile
> >> +++ b/arch/x86/crypto/Makefile
> >> @@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
> >> aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
> >> aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
> >>
> >> +obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
> >> +aeskl-intel-y := aeskl-intel_asm.o aesni-intel_asm.o aeskl-intel_glue.o aes-intel_glue.o
> >
> > This makes the object files aesni-intel_asm.o and aes-intel_glue.o each be built
> > into two separate kernel modules. My understanding is that duplicating code
> > like that is generally frowned upon. These files should either be built into a
> > separate module, which both aesni-intel.ko and aeskl-intel.ko would depend on,
> > or aeskl-intel.ko should depend on aesni-intel.ko.
>
> The only reason to include the AES-NI object here is that AES-KL does not
> support the 192-bit key.
>
> Maybe the fallback can be the aes-generic driver [1] instead of AES-NI here.
>
> >> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
> >> new file mode 100644
> >> index 000000000000..d56ec8dd6644
> >> --- /dev/null
> >> +++ b/arch/x86/crypto/aeskl-intel_asm.S
> >
> > This file gets very long after all the modes are added (> 1100 lines). Is there
> > really no feasible way to share code between this and aesni-intel_asm.S, similar
> > to how the arm64 AES implementations work? Surely most of the logic is the
> > same, and it's just the actual AES instructions that differ?
>
> No, these two instruction sets are separate. So I think no room to share the
> ASM code.
>
> >> +config CRYPTO_AES_KL
> >> + tristate "AES cipher algorithms (AES-KL)"
> >> + depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
> >> + depends on DM_CRYPT
> >
> > 'depends on DM_CRYPT' doesn't really make sense here, since there is no actual
> > dependency on dm-crypt in the code.
>
> I think the intention here is to build a policy that the library is available
> only when there is a clear use case.
>
> But maybe putting such restriction is too much here.
>

Yeah, my bad, the "depends on DM_CRYPT" can go. Even though Key
Locker support has no real pressing reason to be built without it,
there is still no actual code dependency.

2021-11-30 07:23:49

by Eric Biggers

Subject: Re: [PATCH v3 00/15] x86: Support Key Locker

On Tue, Nov 30, 2021 at 06:36:15AM +0000, Bae, Chang Seok wrote:
> On Nov 29, 2021, at 19:27, Eric Biggers <[email protected]> wrote:
> > On Wed, Nov 24, 2021 at 12:06:45PM -0800, Chang S. Bae wrote:
> >>
> >> == Non Use Cases ==
> >>
> >> Bare metal disk encryption is the only use case intended by these patches.
> >
> > If that's the case, why are so many encryption modes being added (ECB, CTR, CBC,
> > and XTS)? Wouldn't just XTS be sufficient?
>
> Right, it would reduce the crypt library changes significantly. But it is
> clueless whether XTS is sufficient to support DM-crypt, because a user may
> select the kernel’s crypto API via ‘capi:', [1].
>

Just because dm-crypt allows you to create a ECB or CTR encrypted disk does not
mean that it is a good idea.

- Eric

2021-11-30 07:34:12

by Chang S. Bae

Subject: Re: [PATCH v3 00/15] x86: Support Key Locker

On Nov 29, 2021, at 23:23, Eric Biggers <[email protected]> wrote:
> On Tue, Nov 30, 2021 at 06:36:15AM +0000, Bae, Chang Seok wrote:
>> On Nov 29, 2021, at 19:27, Eric Biggers <[email protected]> wrote:
>>> On Wed, Nov 24, 2021 at 12:06:45PM -0800, Chang S. Bae wrote:
>>>>
>>>> == Non Use Cases ==
>>>>
>>>> Bare metal disk encryption is the only use case intended by these patches.
>>>
>>> If that's the case, why are so many encryption modes being added (ECB, CTR, CBC,
>>> and XTS)? Wouldn't just XTS be sufficient?
>>
>> Right, it would reduce the crypt library changes significantly. But it is
>> clueless whether XTS is sufficient to support DM-crypt, because a user may
>> select the kernel’s crypto API via ‘capi:', [1].
>
> Just because dm-crypt allows you to create a ECB or CTR encrypted disk does not
> mean that it is a good idea.

Okay, it sounds like at least ECB and CTR support is not practically needed.

I have no problem trimming down to exclude them.

Thanks,
Chang

2021-12-02 17:31:14

by Peter Zijlstra

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Wed, Nov 24, 2021 at 12:06:56PM -0800, Chang S. Bae wrote:
> + encodekey256 %eax, %eax

So this thing uses the fancy new keylocker instructions, however:

> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 285f82647d2b..784a04433549 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -1113,6 +1113,50 @@ config CRYPTO_AES_NI_INTEL
> ECB, CBC, LRW, XTS. The 64 bit version has additional
> acceleration for CTR.
>
> +config CRYPTO_AES_KL
> + tristate "AES cipher algorithms (AES-KL)"
> + depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
> + depends on DM_CRYPT
> + select X86_KEYLOCKER
> + select CRYPTO_AES_NI_INTEL


There is no dependency on the compiler actually supporting them..

config AS_HAS_KEYLOCKER
def_bool $(as-instr,encodekey256)

depends on AS_HAS_KEYLOCKER

Hmm?


> +
> + help
> + Key Locker provides AES SIMD instructions (AES-KL) for secure
> + data encryption and decryption. While this new instruction
> + set is analogous to AES-NI, AES-KL supports to encode an AES
> + key to an encoded form ('key handle') and uses it to transform
> + data instead of accessing the AES key.
> +
> + The setkey() transforms an AES key to a key handle, then the AES
> + key is no longer needed for data transformation. A user may
> + displace their keys from possible exposition.
> +
> + This key encryption is done by the CPU-internal wrapping key. The
> + x86 core code loads a new random key at every boot time and
> + restores it from deep sleep states. This wrapping key support is
> + provided with X86_KEYLOCKER.
> +
> + AES-KL supports 128-/256-bit keys only. While giving a 192-bit
> + key does not return an error, as AES-NI is chosen to process it,
> + the claimed security property is not available with that.
> +
> + GNU binutils version 2.36 or above and LLVM version 12 or above
> + are assemblers that support AES-KL instructions.
> +
> + Bare metal disk encryption is the preferred use case. Make it
> + depend on DM_CRYPT.
> +
> + This selection enables an alternative crypto cipher for
> + cryptsetup, e.g. "capi:xts-aes-aeskl-plain", to use with dm-crypt
> + volumes. It trades off raw performance for reduced clear-text key
> + exposure and has an additional failure mode compared to AES-NI.
> + See Documentation/x86/keylocker.rst for more details. Key Locker
> + usage requires explicit opt-in at cryptsetup time. So, select it
> + if unsure.
> +
> + See also the CRYPTO_AES_NI_INTEL description for more about the
> + AES cipher algorithm.
> +
> config CRYPTO_AES_SPARC64
> tristate "AES cipher algorithms (SPARC64)"
> depends on SPARC64
> --
> 2.17.1
>

2021-12-06 21:32:45

by Chang S. Bae

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Dec 2, 2021, at 06:21, Peter Zijlstra <[email protected]> wrote:
> On Wed, Nov 24, 2021 at 12:06:56PM -0800, Chang S. Bae wrote:
>> + encodekey256 %eax, %eax
>
> So this thing uses the fancy new keylocker instructions, however:
>
>> diff --git a/crypto/Kconfig b/crypto/Kconfig
>> index 285f82647d2b..784a04433549 100644
>> --- a/crypto/Kconfig
>> +++ b/crypto/Kconfig
>> @@ -1113,6 +1113,50 @@ config CRYPTO_AES_NI_INTEL
>> ECB, CBC, LRW, XTS. The 64 bit version has additional
>> acceleration for CTR.
>>
>> +config CRYPTO_AES_KL
>> + tristate "AES cipher algorithms (AES-KL)"
>> + depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
>> + depends on DM_CRYPT
>> + select X86_KEYLOCKER
>> + select CRYPTO_AES_NI_INTEL
>
>
> There is no dependency on the compiler actually supporting them..
>
> config AS_HAS_KEYLOCKER
> def_bool $(as-instr,encodekey256)
>
> depends on AS_HAS_KEYLOCKER
>
> Hmm?

Well, LD_VERSION reflects the binutils version.

But yes, the as-instr macro looks useful here:

+config AS_HAS_KEYLOCKER
+ def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+ help
+ Supported by binutils >=2.36 and LLVM integrated assembler >= V12
+
config CRYPTO_AES_KL
tristate "AES cipher algorithms (AES-KL)"
- depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
+ depends on AS_HAS_KEYLOCKER

Thanks,
Chang

2021-12-06 22:14:45

by Ard Biesheuvel

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Tue, 30 Nov 2021 at 07:57, Bae, Chang Seok <[email protected]> wrote:
>
> On Nov 29, 2021, at 19:48, Eric Biggers <[email protected]> wrote:
> > On Wed, Nov 24, 2021 at 12:06:56PM -0800, Chang S. Bae wrote:
...
> >> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
> >> new file mode 100644
> >> index 000000000000..d56ec8dd6644
> >> --- /dev/null
> >> +++ b/arch/x86/crypto/aeskl-intel_asm.S
> >
> > This file gets very long after all the modes are added (> 1100 lines). Is there
> > really no feasible way to share code between this and aesni-intel_asm.S, similar
> > to how the arm64 AES implementations work? Surely most of the logic is the
> > same, and it's just the actual AES instructions that differ?
>
> No, these two instruction sets are separate. So I think there is no
> room to share the ASM code.
>

On arm64, we have

aes-ce.S, which uses AES instructions to implement the AES core transforms

aes-neon.S, which uses plain NEON instructions to implement the AES
core transforms

aes-modes.S, which can be combined with either of the above, and
implements the various chaining modes (ECB, CBC, CTR, XTS, and a
helper for CMAC, CBCMAC and XMAC)

If you have two different primitives for performing AES transforms
(the original round by round one, and the KL one that does 10 or 14
rounds at a time), you should still be able to reuse most of the code
that implements the non-trivial handling of the chaining modes.
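[Editor's note: the arm64 sharing pattern Ard describes can be sketched as follows. This is an illustrative Python toy, not real AES — the "core transforms" below are stand-ins, and only the structure (one shared chaining-mode routine parameterized over the block primitive) mirrors what aes-modes.S does.]

```python
# Toy sketch of the aes-modes.S pattern: the chaining-mode code is
# written once and works with any core transform that maps
# (block, round_keys) -> block, whether the primitive runs round by
# round (AES-NI style) or all rounds in one call (AES-KL style).

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def core_round_by_round(block, round_keys):
    # "AES-NI style": the code drives one (toy) round per step.
    for rk in round_keys:
        block = xor_bytes(block, rk)  # stand-in for one AES round
    return block

def core_all_rounds(block, round_keys):
    # "AES-KL style": a single call performs all 10/14 rounds
    # internally; the caller never sees intermediate round state.
    acc = block
    for rk in round_keys:
        acc = xor_bytes(acc, rk)
    return acc

def cbc_encrypt(blocks, iv, round_keys, core):
    # Shared chaining-mode code (the "aes-modes.S" part): identical
    # for either primitive, so only the core differs per backend.
    out, prev = [], iv
    for blk in blocks:
        prev = core(xor_bytes(blk, prev), round_keys)
        out.append(prev)
    return out
```

Since both cores satisfy the same interface, `cbc_encrypt` produces identical ciphertext with either one — which is the property that lets arm64 keep one copy of the mode logic.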


> >> +config CRYPTO_AES_KL
> >> + tristate "AES cipher algorithms (AES-KL)"
> >> + depends on (LD_VERSION >= 23600) || (LLD_VERSION >= 120000)
> >> + depends on DM_CRYPT
> >
> > 'depends on DM_CRYPT' doesn't really make sense here, since there is no actual
> > dependency on dm-crypt in the code.
>
> I think the intention here is to build a policy that the library is available
> only when there is a clear use case.
>
> But maybe putting such restriction is too much here.
>
> Thanks
> Chang
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/crypto/aes_generic.c
>
>

2021-12-06 22:59:30

by Chang S. Bae

Subject: Re: [PATCH v3 11/15] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

On Dec 6, 2021, at 14:14, Ard Biesheuvel <[email protected]> wrote:
> On Tue, 30 Nov 2021 at 07:57, Bae, Chang Seok <[email protected]> wrote:
>>
>>
>> No, these two instruction sets are separate. So I think there is no
>> room to share the ASM code.
>
> On arm64, we have
>
> aes-ce.S, which uses AES instructions to implement the AES core transforms
>
> aes-neon.S, which uses plain NEON instructions to implement the AES
> core transforms
>
> aes-modes.S, which can be combined with either of the above, and
> implements the various chaining modes (ECB, CBC, CTR, XTS, and a
> helper for CMAC, CBCMAC and XMAC)
>
> If you have two different primitives for performing AES transforms
> (the original round by round one, and the KL one that does 10 or 14
> rounds at a time), you should still be able to reuse most of the code
> that implements the non-trivial handling of the chaining modes.

Yes, no question about this from a maintainability standpoint.

However, besides the fact that a KL instruction performs multiple rounds,
some AES-KL instructions have register constraints. E.g. AESENCWIDE256KL
always uses XMM0-7 for the input blocks.

Today, the AES-NI code maintains 32-bit compatibility, e.g. clobbering
XMM2-3 for the key and input vector, so sharing the code would make the
AES-KL code inefficient and, I think, even ugly due to the register
constraint. E.g. the AES-KL code uses XMM9-10 for the key and an input
vector, but it would have to move them around just for code sharing.
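[Editor's note: the register-shuffling cost Chang describes can be sketched like this. A Python toy, with dict keys standing in for XMM registers; the "wide" primitive below is a stand-in for AESENCWIDE256KL, which only operates on the fixed registers XMM0-7, so mode code that keeps its working state elsewhere must copy data in and out around every call.]

```python
# Toy model of a fixed-register wide primitive: like AESENCWIDE256KL,
# wide_encrypt transforms exactly the slots "xmm0".."xmm7" in place.

def wide_encrypt(regs, handle):
    # Stand-in for AESENCWIDE256KL: XOR each byte with the (toy)
    # handle value; only xmm0-7 are touched, by definition.
    for i in range(8):
        regs[f"xmm{i}"] = bytes(b ^ handle for b in regs[f"xmm{i}"])

def encrypt_blocks_shuffling(blocks, handle):
    # Shared-mode code that keeps blocks outside xmm0-7 must shuffle
    # them into the fixed registers before the wide call and collect
    # the results afterwards -- the extra moves that make code
    # sharing with the AES-NI register layout costly for AES-KL.
    regs = {f"xmm{i}": bytes(16) for i in range(16)}
    out = []
    for start in range(0, len(blocks), 8):
        chunk = blocks[start:start + 8]
        for i, blk in enumerate(chunk):      # shuffle in: -> xmm0..7
            regs[f"xmm{i}"] = blk
        wide_encrypt(regs, handle)           # fixed-register primitive
        out.extend(regs[f"xmm{i}"] for i in range(len(chunk)))
    return out                               # shuffle out: <- xmm0..7
```

The two copy loops around `wide_encrypt` are pure overhead imposed by the fixed-register constraint; a primitive free to use any registers (as AES-NI is) would not need them.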

Thanks,
Chang