LUKS is the standard for Linux disk encryption. Many users choose LUKS,
and in some use cases, such as confidential VMs, it is mandated. With
kdump enabled, when the 1st kernel crashes, the system can boot into the
kdump/crash kernel and dump the memory image, i.e. /proc/vmcore, to a
specified target. Currently, dumping vmcore to a LUKS-encrypted device
has two problems:
 - The kdump kernel may not be able to decrypt the LUKS partition. On some
   machines, a system administrator may not have a chance to enter the
   password to decrypt the device in the kdump initramfs after the 1st
   kernel crashes. For cloud confidential VMs, depending on the policy,
   the kdump kernel may not be able to unseal the keys with the TPM, and
   the console virtual keyboard is untrusted.
 - LUKS2 by default uses the memory-hard Argon2 key derivation function,
   which is quite memory-consuming compared to the limited memory reserved
   for kdump. Take Fedora as an example: by default, only 256M is reserved
   for systems with 4G-64G of memory. With LUKS enabled, ~1300M needs
   to be reserved for kdump. Note that the memory reserved for kdump can't
   be used by the 1st kernel, i.e. a user sees ~1300M of memory missing in
   the 1st kernel.
Besides, users (at least on Fedora) usually expect kdump to work out of
the box, i.e. with no manual password input. It also doesn't make sense
to derive the keys again in the kdump kernel, which would be redundant
work.
This patch set addresses the above issues by making the LUKS volume keys
persistent for the kdump kernel with the help of cryptsetup's new APIs
(--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of
the kdump copy of the LUKS volume keys:
 1. After the 1st kernel loads the initramfs during boot, systemd
    uses a user-input passphrase or a TPM-sealed key to decrypt the LUKS
    volume keys and then saves the volume keys to the specified keyring
    (using the --link-vk-to-keyring API). The keys expire within the
    specified time.
 2. While building the kdump initramfs, a user space tool (the kdump
    initramfs builder) writes a key description to
    /sys/kernel/crash_dm_crypt_keys to inform the 1st kernel to record
    the key.
 3. The kexec_file_load syscall reads the volume keys by the recorded key
    descriptions, saves them to kdump reserved memory and wipes the
    copy.
 4. When the 1st kernel crashes and the kdump initramfs is booted, the
    kdump initramfs asks the kdump kernel to create a user key using the
    key stored in kdump reserved memory by writing to
    /sys/kernel/crash_dm_crypt_keys. Then the LUKS-encrypted device is
    unlocked with libcryptsetup's --volume-key-keyring API.
 5. The system reboots into the 1st kernel after dumping vmcore to
    the LUKS-encrypted device has finished.
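Steps 2 and 4 both come down to user space writing a short string to
/sys/kernel/crash_dm_crypt_keys. A minimal user-space sketch of such a
write is shown here. The helper name write_sysfs_command is hypothetical,
and the command strings accepted by the file are not shown here; they are
defined by Documentation/ABI/testing/crash_dm_crypt_keys in this series.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one command string to a sysfs-style file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_sysfs_command(const char *path, const char *cmd)
{
	size_t len = strlen(cmd);
	ssize_t written;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* sysfs store handlers see each write() as one complete command */
	written = write(fd, cmd, len);
	if (written < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	close(fd);

	/* treat a short write as an I/O error rather than retrying */
	return (size_t)written == len ? 0 : -EIO;
}
```

A kdump initramfs builder would call this once per key description with
the path /sys/kernel/crash_dm_crypt_keys, and check the return value
before building the initramfs.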
After libcryptsetup saves the LUKS volume keys to the specified keyring,
whoever takes them is responsible for the safety of these copies
of the keys. The keys are saved in the memory area exclusively reserved
for kdump, where even the 1st kernel has no direct access. Furthermore,
two additional protections are added:
 - save the copy at a random location in kdump reserved memory, as
   suggested by Jan
 - clear the _PAGE_PRESENT flag of the page that stores the copy, as
   suggested by Pingfan
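The first protection, random placement, can be pictured as choosing an
aligned start address somewhere inside the reserved region. The sketch
below is a user-space illustration only: pick_random_slot, its parameters
and the use of rand() are assumptions made for this example, while the
kernel code in this series works on struct kexec_buf instead.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: choose a random start address, aligned to `align`
 * (a power of two), for a buffer of `size` bytes inside the inclusive
 * range [start, end]. Stores the address in *out and returns 0, or
 * returns -1 if the buffer does not fit.
 */
static int pick_random_slot(uint64_t start, uint64_t end, uint64_t size,
			    uint64_t align, uint64_t *out)
{
	uint64_t first, last, slots;

	if (size == 0 || end < start || end - start + 1 < size)
		return -1;

	/* lowest and highest aligned addresses where the buffer still fits */
	first = (start + align - 1) & ~(align - 1);
	last = (end - size + 1) & ~(align - 1);
	if (first > last)
		return -1;

	/* rand() is only a stand-in; it is not suitable for real key placement */
	slots = (last - first) / align + 1;
	*out = first + ((uint64_t)rand() % slots) * align;
	return 0;
}
```

Every candidate slot is equally likely up to the modulo bias of rand(),
so an attacker who can read reserved memory cannot rely on a fixed offset.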
This patch set only supports x86. There will be patches to support other
architectures once this patch set gets merged.
v5
- Baoquan
  - limit the feature of placing kexec_buf randomly to kdump (CONFIG_CRASH_DUMP)
  - add documentation for added sysfs API
  - allow to re-send init command to support the case of user switching to
    a different LUKS-encrypted target
  - make CONFIG_CRASH_DM_CRYPT depend on CONFIG_DM_CRYPT
  - check if the number of keys exceeds KEY_NUM_MAX
  - rename (struct keys_header).key_count as (struct
    keys_header).total_keys to improve code readability
  - improve commit message
  - fix the failure of calling crash_exclude_mem_range (there is a split
    of mem_range)
  - use ret instead of r as return code
- Greg
  - add documentation for added sysfs API
  - avoid spamming kernel logs
  - fix a buffer overflow issue
  - keep the state enums synced up with the string values
  - use sysfs_emit instead of sprintf
  - explain KEY_NUM_MAX and KEY_SIZE_MAX
  - s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/g
  - improve code readability
- Rebase onto latest Linus tree
v4
- rebase onto latest Linus tree so Baoquan can apply the patches for
code review
- fix kernel test robot warnings
v3
- Support CPU/memory hot-plugging [Baoquan]
- Don't save the keys temporarily to simplify the implementation [Baoquan]
- Support multiple LUKS encrypted volumes
- Read logon key instead of user key to improve security [Ondrej]
- A kernel config option CRASH_DM_CRYPT for this feature (disabled by default)
- Fix warnings found by kernel test robot
- Rebase the code onto 6.9.0-rc5+
v2
- work together with libcryptsetup's --link-vk-to-keyring/--volume-key-keyring APIs [Milan and Ondrej]
- add the case where console virtual keyboard is untrusted for confidential VM
- use dm_crypt_key instead of LUKS volume key [Milan and Eric]
- fix some code format issues
- don't move "struct kexec_segment" declaration
- Rebase the code onto latest Linus tree (6.7.0)
v1
- "Put the luks key handling related to crash_dump out into a separate
file kernel/crash_dump_luks.c" [Baoquan]
- Put the generic luks handling code before the x86 specific code to
make it easier for other arches to follow suit [Baoquan]
- Use phys_to_virt instead of "pfn -> page -> vaddr" [Dave Hansen]
- Drop the RFC prefix [Dave Young]
- Rebase the code onto latest Linus tree (6.4.0-rc4)
RFC v2
- libcryptsetup interacts with the kernel via sysfs instead of "hacking"
  dm-crypt
  - to save a kdump copy of the LUKS volume key in 1st kernel
  - to add a logon key using the copy for libcryptsetup in kdump kernel [Milan]
  - to avoid the incorrect usage of LUKS master key in dm-crypt [Milan]
- save the kdump copy of LUKS volume key randomly [Jan]
- mark the kdump copy inaccessible [Pingfan]
- Miscellaneous
  - explain when operations related to the LUKS volume key happen [Jan]
  - s/master key/volume key/g
  - use crash_ instead of kexec_ as function prefix
  - fix commit subject prefixes e.g. "x86, kdump" to x86/crash
Coiby Xu (7):
kexec_file: allow to place kexec_buf randomly
crash_dump: make dm crypt keys persist for the kdump kernel
crash_dump: store dm crypt keys in kdump reserved memory
crash_dump: reuse saved dm crypt keys for CPU/memory hot-plugging
crash_dump: retrieve dm crypt keys in kdump kernel
x86/crash: pass dm crypt keys to kdump kernel
x86/crash: make the page that stores the dm crypt keys inaccessible
Documentation/ABI/testing/crash_dm_crypt_keys | 35 ++
arch/x86/kernel/crash.c | 20 +-
arch/x86/kernel/kexec-bzimage64.c | 7 +
arch/x86/kernel/machine_kexec_64.c | 22 ++
include/linux/crash_core.h | 9 +-
include/linux/crash_dump.h | 2 +
include/linux/kexec.h | 8 +
kernel/Kconfig.kexec | 9 +
kernel/Makefile | 1 +
kernel/crash_dump_dm_crypt.c | 338 ++++++++++++++++++
kernel/kexec_file.c | 21 ++
kernel/ksysfs.c | 24 ++
12 files changed, 493 insertions(+), 3 deletions(-)
create mode 100644 Documentation/ABI/testing/crash_dm_crypt_keys
create mode 100644 kernel/crash_dump_dm_crypt.c
base-commit: 8a92980606e3585d72d510a03b59906e96755b8a
--
2.45.1
When the kdump kernel image and initrd are loaded, the dm crypt keys
are read from the keyring and then stored in kdump reserved memory.
Signed-off-by: Coiby Xu <[email protected]>
---
include/linux/crash_core.h | 3 ++
include/linux/crash_dump.h | 2 +
include/linux/kexec.h | 4 ++
kernel/crash_dump_dm_crypt.c | 87 ++++++++++++++++++++++++++++++++++++
4 files changed, 96 insertions(+)
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 6bff1c24efa3..ab20829d0bc9 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -37,6 +37,9 @@ static inline void arch_kexec_unprotect_crashkres(void) { }
#ifdef CONFIG_CRASH_DM_CRYPT
int crash_sysfs_dm_crypt_keys_read(char *buf);
int crash_sysfs_dm_crypt_keys_write(const char *buf, size_t count);
+int crash_load_dm_crypt_keys(struct kimage *image);
+#else
+static inline int crash_load_dm_crypt_keys(struct kimage *image) {return 0; }
#endif
#ifndef arch_crash_handle_hotplug_event
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index acc55626afdc..dfd8e4fe6129 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -15,6 +15,8 @@
extern unsigned long long elfcorehdr_addr;
extern unsigned long long elfcorehdr_size;
+extern unsigned long long dm_crypt_keys_addr;
+
#ifdef CONFIG_CRASH_DUMP
extern int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size);
extern void elfcorehdr_free(unsigned long long addr);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index c45bfc727737..cb6275928fbd 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -372,6 +372,10 @@ struct kimage {
void *elf_headers;
unsigned long elf_headers_sz;
unsigned long elf_load_addr;
+
+ /* dm crypt keys buffer */
+ unsigned long dm_crypt_keys_addr;
+ unsigned long dm_crypt_keys_sz;
};
/* kexec interface functions */
diff --git a/kernel/crash_dump_dm_crypt.c b/kernel/crash_dump_dm_crypt.c
index 608bde3aaa8e..0033152668ae 100644
--- a/kernel/crash_dump_dm_crypt.c
+++ b/kernel/crash_dump_dm_crypt.c
@@ -1,4 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/key.h>
+#include <linux/keyctl.h>
#include <keys/user-type.h>
#include <linux/crash_dump.h>
@@ -128,3 +130,88 @@ int crash_sysfs_dm_crypt_keys_read(char *buf)
return sysfs_emit(buf, "%s\n", STATE_STR[state]);
}
EXPORT_SYMBOL_GPL(crash_sysfs_dm_crypt_keys_read);
+
+static int read_key_from_user_keying(struct dm_crypt_key *dm_key)
+{
+ const struct user_key_payload *ukp;
+ struct key *key;
+
+ kexec_dprintk("Requesting key %s\n", dm_key->key_desc);
+ key = request_key(&key_type_logon, dm_key->key_desc, NULL);
+
+ if (IS_ERR(key)) {
+ pr_warn("No such key %s\n", dm_key->key_desc);
+ return PTR_ERR(key);
+ }
+
+ ukp = user_key_payload_locked(key);
+ if (!ukp)
+ return -EKEYREVOKED;
+
+ memcpy(dm_key->data, ukp->data, ukp->datalen);
+ dm_key->key_size = ukp->datalen;
+ kexec_dprintk("Get dm crypt key (size=%u) %s: %8ph\n", dm_key->key_size,
+ dm_key->key_desc, dm_key->data);
+ return 0;
+}
+
+static int build_keys_header(void)
+{
+ int i, r;
+
+ for (i = 0; i < key_count; i++) {
+ r = read_key_from_user_keying(&keys_header->keys[i]);
+ if (r != 0) {
+ pr_err("Failed to read key %s\n", keys_header->keys[i].key_desc);
+ return r;
+ }
+ }
+
+ return 0;
+}
+
+int crash_load_dm_crypt_keys(struct kimage *image)
+{
+ struct kexec_buf kbuf = {
+ .image = image,
+ .buf_min = 0,
+ .buf_max = ULONG_MAX,
+ .top_down = false,
+ .random = true,
+ };
+
+ int r;
+
+ if (state == FRESH)
+ return 0;
+
+ if (key_count != keys_header->total_keys) {
+ kexec_dprintk("Only record %u keys (%u in total)\n", key_count,
+ keys_header->total_keys);
+ return -EINVAL;
+ }
+
+ image->dm_crypt_keys_addr = 0;
+ r = build_keys_header();
+ if (r)
+ return r;
+
+ kbuf.buffer = keys_header;
+ kbuf.bufsz = keys_header_size;
+
+ kbuf.memsz = kbuf.bufsz;
+ kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
+ kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+ r = kexec_add_buffer(&kbuf);
+ if (r) {
+ kvfree((void *)kbuf.buffer);
+ return r;
+ }
+ state = LOADED;
+ image->dm_crypt_keys_addr = kbuf.mem;
+ image->dm_crypt_keys_sz = kbuf.bufsz;
+ kexec_dprintk("Loaded dm crypt keys to kexec_buffer bufsz=0x%lx memsz=0x%lx\n",
+ kbuf.bufsz, kbuf.memsz);
+
+ return r;
+}
--
2.45.1
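The patch above copies ukp->datalen bytes from the keyring payload into
dm_key->data. As an illustration of the data layout involved, here is a
user-space sketch of a key record and header with a length check before
the copy. The struct members mirror the names used in the diff, but the
constant values, the KEY_DESC_MAX_LEN name and the helpers
keys_header_bytes and store_key are assumptions made for this example,
not the definitions from the series.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed limits for illustration; the series defines its own values. */
#define KEY_SIZE_MAX 256
#define KEY_DESC_MAX_LEN 128

/* User-space stand-in for the per-key record used by the patch. */
struct dm_crypt_key {
	unsigned int key_size;
	char key_desc[KEY_DESC_MAX_LEN];
	unsigned char data[KEY_SIZE_MAX];
};

/* Stand-in for the header: a count followed by the key records. */
struct keys_header {
	unsigned int total_keys;
	struct dm_crypt_key keys[];
};

/* Size in bytes of a header holding `total_keys` records. */
static size_t keys_header_bytes(unsigned int total_keys)
{
	return sizeof(struct keys_header) +
	       (size_t)total_keys * sizeof(struct dm_crypt_key);
}

/* Copy a payload into a record, rejecting anything larger than the slot. */
static int store_key(struct dm_crypt_key *dm_key, const void *payload,
		     size_t len)
{
	if (len > sizeof(dm_key->data))
		return -EINVAL;

	memcpy(dm_key->data, payload, len);
	dm_key->key_size = (unsigned int)len;
	return 0;
}
```

Rejecting oversized payloads up front keeps a long key from overrunning
the fixed-size data array of the record.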
This adds an additional layer of protection for the saved copy of the dm
crypt keys. Trying to access the saved copy will cause a page fault.
Suggested-by: Pingfan Liu <[email protected]>
Signed-off-by: Coiby Xu <[email protected]>
---
arch/x86/kernel/machine_kexec_64.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b180d8e497c3..aba50ec641e6 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -545,13 +545,35 @@ static void kexec_mark_crashkres(bool protect)
kexec_mark_range(control, crashk_res.end, protect);
}
+/* make the memory storing dm crypt keys in/accessible */
+static void kexec_mark_dm_crypt_keys(bool protect)
+{
+ unsigned long start_paddr, end_paddr;
+ unsigned int nr_pages;
+
+ if (kexec_crash_image->dm_crypt_keys_addr) {
+ start_paddr = kexec_crash_image->dm_crypt_keys_addr;
+ end_paddr = start_paddr + kexec_crash_image->dm_crypt_keys_sz - 1;
+ nr_pages = (PAGE_ALIGN(end_paddr + 1) - PAGE_ALIGN_DOWN(start_paddr))/PAGE_SIZE;
+ if (protect)
+ set_memory_np((unsigned long)phys_to_virt(start_paddr), nr_pages);
+ else
+ __set_memory_prot(
+ (unsigned long)phys_to_virt(start_paddr),
+ nr_pages,
+ __pgprot(_PAGE_PRESENT | _PAGE_NX | _PAGE_RW));
+ }
+}
+
void arch_kexec_protect_crashkres(void)
{
kexec_mark_crashkres(true);
+ kexec_mark_dm_crypt_keys(true);
}
void arch_kexec_unprotect_crashkres(void)
{
+ kexec_mark_dm_crypt_keys(false);
kexec_mark_crashkres(false);
}
#endif
--
2.45.1
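Marking the key buffer not-present requires the number of pages that the
buffer touches, which is easy to get wrong by one when the range ends
exactly on a page boundary. A small user-space sketch of that count is
shown here; pages_spanned and SKETCH_PAGE_SIZE are hypothetical names and
a 4 KiB page size is assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: number of pages touched by the byte range
 * [start, start + size). Returns 0 for an empty range.
 */
static uint64_t pages_spanned(uint64_t start, uint64_t size)
{
	uint64_t first_page, last_page;

	if (size == 0)
		return 0;

	first_page = start / SKETCH_PAGE_SIZE;
	/* index of the page holding the last byte of the range */
	last_page = (start + size - 1) / SKETCH_PAGE_SIZE;

	return last_page - first_page + 1;
}
```

For example, a buffer of 0x1001 bytes starting at 0x1000 ends at byte
0x2000, which is the first byte of the next page, so two pages have to be
marked.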
The subject prefix should be "[PATCH v5 0/7]". I'm sorry if it causes
any confusion.
On Fri, Jun 07, 2024 at 08:26:10PM +0800, Coiby Xu wrote:
>[...]
--
Best regards,
Coiby
On Sat, Jun 08, 2024 at 09:26:43AM +0800, Coiby Xu wrote:
> The subject prefix should be "[PATCH v5 0/7]". I'm sorry if it causes
> any confusion.
It will probably break our tools (b4) that want to look this up, so you
might want to fix that for the next version you send out, given that
this one has some 0-day bot issues.
thanks,
greg k-h