2018-06-14 08:48:46

by Lianbo Jiang

Subject: [PATCH 0/2 V2] Support kdump for AMD secure memory encryption(sme)

The old memory, encrypted by the first kernel, can conveniently be
remapped into the second kernel by calling ioremap_encrypted().

When SME is enabled on an AMD server, kdump also needs to be supported.
Because the memory is encrypted in the first kernel, the old memory must
be remapped encrypted into the second (crash) kernel, and SME must also
be enabled in the second kernel; otherwise the encrypted old memory
cannot be decrypted. Simply changing the value of the C-bit on a page
does not automatically encrypt its existing contents, and any data that
was in the page prior to the C-bit modification becomes unintelligible.
A page of memory that is marked encrypted is automatically decrypted
when read from DRAM and automatically encrypted when written to DRAM.
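
For reference, the C-bit is expressed in the kernel as sme_me_mask OR'ed
into the page-protection bits; roughly, from the 4.17-era
include/linux/mem_encrypt.h and asm/pgtable.h (paraphrased here only for
orientation):

#define __sme_set(x)            ((x) | sme_me_mask)
#define __sme_clr(x)            ((x) & ~sme_me_mask)
#define pgprot_encrypted(prot)  __pgprot(__sme_set(pgprot_val(prot)))
#define pgprot_decrypted(prot)  __pgprot(__sme_clr(pgprot_val(prot)))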

For kdump, it is necessary to distinguish whether the memory is
encrypted, and furthermore to know which parts of the memory are
encrypted or decrypted. The memory is then remapped appropriately for
each situation, which tells the CPU how to treat the data (encrypted or
unencrypted). For example, with SME enabled, if the old memory is
encrypted, it is remapped in the encrypted way, so that the encrypted
old memory is automatically decrypted when the data is read through the
remapped address.
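
As a minimal sketch of that decision (illustration only, not part of the
patches; the helper name is made up):

/*
 * Map one old page so that the CPU transparently decrypts it (or not)
 * when it is read, depending on how the first kernel stored it.
 */
static void *map_old_page(unsigned long pfn, bool encrypted)
{
        if (encrypted)
                return (void *)ioremap_encrypted(pfn << PAGE_SHIFT,
                                                 PAGE_SIZE);

        return (void *)ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
}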

------------------------------------------------
| first-kernel | second-kernel | kdump support |
| mem_encrypt= | mem_encrypt=  |               |
| on|off       | on|off        | yes|no        |
|--------------+---------------+---------------|
| on           | on            | yes           |
| off          | off           | yes           |
| on           | off           | no            |
| off          | on            | no            |
|______________|_______________|_______________|

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Author: Lianbo Jiang <[email protected]>
Date: Mon May 14 17:02:40 2018 +0800
Note: this patch can only dump a vmcore when SME is enabled.

crash-7.2.1: https://github.com/crash-utility/crash.git
commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
Author: Dave Anderson <[email protected]>
Date: Fri May 11 15:54:32 2018 -0400

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.17-rc7:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit b04e217704b7 ("Linux 4.17-rc7")
Author: Linus Torvalds <[email protected]>
Date: Sun May 27 13:01:47 2018 -0700

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Changes since V1:
1. Removed the sme_active() check in __ioremap_caller().
2. Removed the '#ifdef' stuff throughout the patches.
3. Moved some logic into early_memremap_pgprot_adjust() and cleaned up
the previously unnecessary changes, for example in
arch/x86/include/asm/dmi.h, arch/x86/kernel/acpi/boot.c and
drivers/acpi/tables.c.
4. Rewrote two functions, copy_oldmem_page() and
copy_oldmem_page_encrypted().
5. Distinguished sme_active() and sev_active(); where a distinction is
not needed, mem_encrypt_active() is used (see the sketch after this
list).
6. Cleaned up a compile warning in copy_device_table().
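
For orientation, the relationship between those helpers in a 4.17-era
tree is roughly the following (paraphrased from
arch/x86/mm/mem_encrypt.c and include/linux/mem_encrypt.h):

bool sme_active(void)
{
        return sme_me_mask && !sev_enabled;     /* SME on the host */
}

bool sev_active(void)
{
        return sme_me_mask && sev_enabled;      /* running as an SEV guest */
}

static inline bool mem_encrypt_active(void)
{
        return sme_me_mask;                     /* true in either case */
}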

Some known issues:
1. About SME:
The upstream kernel does not work when kexec is used with the following
command; the system hangs.
(This issue is unrelated to the kdump patches.)

Reproduce steps:
# kexec -l /boot/vmlinuz-4.17.0-rc7+ --initrd=/boot/initramfs-4.17.0-rc7+.img --command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr"
# kexec -e (or reboot)

The system will hang:
[ 1248.932239] kexec_core: Starting new kernel
early console in extract_kernel
input_data: 0x000000087e91c3b4
input_len: 0x000000000067fcbd
output: 0x000000087d400000
output_len: 0x0000000001b6fa90
kernel_total_size: 0x0000000001a9d000
trampoline_32bit: 0x0000000000099000

Decompressing Linux...
Parsing ELF... [here the system hangs]

2. About SEV:
The upstream kernel does not work with SEV on our machine; some drivers
always misbehave. We do not have a suitable machine to test the kdump
patches under SEV. It may be reasonable to improve SEV support in a
separate patch; once SEV works well, we will test the kdump patches
with SEV.

[ 369.426131] INFO: task systemd-udevd:865 blocked for more than 120 seconds.
[ 369.433177] Not tainted 4.17.0-rc5+ #60
[ 369.437585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.445783] systemd-udevd D 0 865 813 0x80000004
[ 369.451323] Call Trace:
[ 369.453815] ? __schedule+0x290/0x870
[ 369.457523] schedule+0x32/0x80
[ 369.460714] __sev_do_cmd_locked+0x1f6/0x2a0 [ccp]
[ 369.465556] ? cleanup_uevent_env+0x10/0x10
[ 369.470084] ? remove_wait_queue+0x60/0x60
[ 369.474219] ? 0xffffffffc0247000
[ 369.477572] __sev_platform_init_locked+0x2b/0x70 [ccp]
[ 369.482843] sev_platform_init+0x1d/0x30 [ccp]
[ 369.487333] psp_pci_init+0x40/0xe0 [ccp]
[ 369.491380] ? 0xffffffffc0247000
[ 369.494936] sp_mod_init+0x18/0x1000 [ccp]
[ 369.499071] do_one_initcall+0x4e/0x1d4
[ 369.502944] ? _cond_resched+0x15/0x30
[ 369.506728] ? kmem_cache_alloc_trace+0xae/0x1d0
[ 369.511386] ? do_init_module+0x22/0x220
[ 369.515345] do_init_module+0x5a/0x220
[ 369.519444] load_module+0x21cb/0x2a50
[ 369.523227] ? m_show+0x1c0/0x1c0
[ 369.526571] ? security_capable+0x3f/0x60
[ 369.530611] __do_sys_finit_module+0x94/0xe0
[ 369.534915] do_syscall_64+0x5b/0x180
[ 369.538607] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 369.543698] RIP: 0033:0x7f708e6311b9
[ 369.547536] RSP: 002b:00007ffff9d32aa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 369.555162] RAX: ffffffffffffffda RBX: 000055602a04c2d0 RCX: 00007f708e6311b9
[ 369.562346] RDX: 0000000000000000 RSI: 00007f708ef52039 RDI: 0000000000000008
[ 369.569801] RBP: 00007f708ef52039 R08: 0000000000000000 R09: 000055602a048b20
[ 369.576988] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
[ 369.584177] R13: 000055602a075260 R14: 0000000000020000 R15: 0000000000000000

Lianbo Jiang (2):
Add a function(ioremap_encrypted) for kdump when AMD sme enabled.
Support kdump when AMD secure memory encryption is active

arch/x86/include/asm/io.h | 3 +++
arch/x86/kernel/crash_dump_64.c | 42 +++++++++++++++++++++++++++++++----------
arch/x86/mm/ioremap.c | 28 +++++++++++++++++++--------
drivers/iommu/amd_iommu_init.c | 14 +++++++++++++-
fs/proc/vmcore.c | 20 +++++++++++++++-----
include/linux/crash_dump.h | 5 +++++
kernel/kexec_core.c | 12 ++++++++++++
7 files changed, 100 insertions(+), 24 deletions(-)

--
2.9.5



2018-06-14 08:48:52

by Lianbo Jiang

Subject: [PATCH 1/2 V2] Add a function(ioremap_encrypted) for kdump when AMD sme enabled.

The old memory, encrypted by the first kernel, can conveniently be
remapped into the second kernel by calling ioremap_encrypted().

Signed-off-by: Lianbo Jiang <[email protected]>
---
Changes since V1:
1. Removed the sme_active() check in __ioremap_caller().

arch/x86/include/asm/io.h | 3 +++
arch/x86/mm/ioremap.c | 24 ++++++++++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index f6e5b93..989d60b 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
#define ioremap_cache ioremap_cache
extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
#define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+ unsigned long size);
+#define ioremap_encrypted ioremap_encrypted

/**
* ioremap - map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545..24e0920 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -131,7 +131,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
* caller shouldn't need to know that small detail.
*/
static void __iomem *__ioremap_caller(resource_size_t phys_addr,
- unsigned long size, enum page_cache_mode pcm, void *caller)
+ unsigned long size, enum page_cache_mode pcm,
+ void *caller, bool encrypted)
{
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +200,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
* resulting mapping.
*/
prot = PAGE_KERNEL_IO;
- if (sev_active() && mem_flags.desc_other)
+ if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);

switch (pcm) {
@@ -291,7 +292,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;

return __ioremap_caller(phys_addr, size, pcm,
- __builtin_return_address(0));
+ __builtin_return_address(0), false);
}
EXPORT_SYMBOL(ioremap_nocache);

@@ -324,7 +325,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;

return __ioremap_caller(phys_addr, size, pcm,
- __builtin_return_address(0));
+ __builtin_return_address(0), false);
}
EXPORT_SYMBOL_GPL(ioremap_uc);

@@ -341,7 +342,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
- __builtin_return_address(0));
+ __builtin_return_address(0), false);
}
EXPORT_SYMBOL(ioremap_wc);

@@ -358,14 +359,21 @@ EXPORT_SYMBOL(ioremap_wc);
void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
{
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
- __builtin_return_address(0));
+ __builtin_return_address(0), false);
}
EXPORT_SYMBOL(ioremap_wt);

+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+ return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+ __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
{
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
- __builtin_return_address(0));
+ __builtin_return_address(0), false);
}
EXPORT_SYMBOL(ioremap_cache);

@@ -374,7 +382,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
{
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
- __builtin_return_address(0));
+ __builtin_return_address(0), false);
}
EXPORT_SYMBOL(ioremap_prot);

--
2.9.5


2018-06-14 08:49:18

by Lianbo Jiang

Subject: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

When SME is enabled on an AMD server, kdump also needs to be supported.
Because the memory is encrypted in the first kernel, the old memory must
be remapped encrypted into the second (crash) kernel, and SME must also
be enabled in the second kernel; otherwise the encrypted old memory
cannot be decrypted. Simply changing the value of the C-bit on a page
does not automatically encrypt its existing contents, and any data that
was in the page prior to the C-bit modification becomes unintelligible.
A page of memory that is marked encrypted is automatically decrypted
when read from DRAM and automatically encrypted when written to DRAM.

For kdump, it is necessary to distinguish whether the memory is
encrypted, and furthermore to know which parts of the memory are
encrypted or decrypted. The memory is then remapped appropriately for
each situation, which tells the CPU how to treat the data (encrypted or
unencrypted). For example, with SME enabled, if the old memory is
encrypted, it is remapped in the encrypted way, so that the encrypted
old memory is automatically decrypted when the data is read through the
remapped address.

------------------------------------------------
| first-kernel | second-kernel | kdump support |
| mem_encrypt= | mem_encrypt=  |               |
| on|off       | on|off        | yes|no        |
|--------------+---------------+---------------|
| on           | on            | yes           |
| off          | off           | yes           |
| on           | off           | no            |
| off          | on            | no            |
|______________|_______________|_______________|

Signed-off-by: Lianbo Jiang <[email protected]>
---
Changes since V1:
1. Removed the '#ifdef' stuff throughout this patch.
2. Moved some logic into early_memremap_pgprot_adjust() and cleaned up
the previously unnecessary changes, for example in
arch/x86/include/asm/dmi.h, arch/x86/kernel/acpi/boot.c and
drivers/acpi/tables.c.
3. Rewrote two functions, copy_oldmem_page() and
copy_oldmem_page_encrypted().
4. Distinguished sme_active() and sev_active(); where a distinction is
not needed, mem_encrypt_active() is used.
5. Cleaned up a compile warning in copy_device_table().

arch/x86/kernel/crash_dump_64.c | 42 +++++++++++++++++++++++++++++++----------
arch/x86/mm/ioremap.c | 4 ++++
drivers/iommu/amd_iommu_init.c | 14 +++++++++++++-
fs/proc/vmcore.c | 20 +++++++++++++++-----
include/linux/crash_dump.h | 5 +++++
kernel/kexec_core.c | 12 ++++++++++++
6 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 4f2e077..a2c7b13 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -11,6 +11,23 @@
#include <linux/uaccess.h>
#include <linux/io.h>

+static ssize_t copy_to(void *to, void *vaddr, unsigned long offset,
+ size_t size, int userbuf)
+{
+ if (userbuf) {
+ if (copy_to_user(to, vaddr + offset, size)) {
+ iounmap(vaddr);
+ return -ENOMEM;
+ }
+ } else
+ memcpy(to, vaddr + offset, size);
+
+ set_iounmap_nonlazy();
+ iounmap(vaddr);
+
+ return size;
+}
+
/**
* copy_oldmem_page - copy one page from "oldmem"
* @pfn: page frame number to be copied
@@ -36,15 +53,20 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
if (!vaddr)
return -ENOMEM;

- if (userbuf) {
- if (copy_to_user(buf, vaddr + offset, csize)) {
- iounmap(vaddr);
- return -EFAULT;
- }
- } else
- memcpy(buf, vaddr + offset, csize);
+ return copy_to(buf, vaddr, offset, csize, userbuf);
+}

- set_iounmap_nonlazy();
- iounmap(vaddr);
- return csize;
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset, int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
+ if (!vaddr)
+ return -ENOMEM;
+
+ return copy_to(buf, vaddr, offset, csize, userbuf);
}
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 24e0920..e365fc4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
#include <asm/pgalloc.h>
#include <asm/pat.h>
#include <asm/setup.h>
+#include <linux/crash_dump.h>

#include "physaddr.h"

@@ -696,6 +697,9 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
encrypted_prot = false;

+ if (sme_active() && is_kdump_kernel())
+ encrypted_prot = false;
+
return encrypted_prot ? pgprot_encrypted(prot)
: pgprot_decrypted(prot);
}
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575..5e535a6 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -889,11 +889,23 @@ static bool copy_device_table(void)
}

old_devtb_phys = entry & PAGE_MASK;
+ /*
+ * When sme enable in the first kernel, old_devtb_phys includes the
+ * memory encryption mask(sme_me_mask), we must remove the memory
+ * encryption mask to obtain the true physical address in kdump mode.
+ */
+ if (mem_encrypt_active() && is_kdump_kernel())
+ old_devtb_phys = __sme_clr(old_devtb_phys);
if (old_devtb_phys >= 0x100000000ULL) {
pr_err("The address of old device table is above 4G, not trustworthy!\n");
return false;
}
- old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+ if (mem_encrypt_active() && is_kdump_kernel())
+ old_devtb = (void *)ioremap_encrypted(old_devtb_phys,
+ dev_table_size);
+ else
+ old_devtb = memremap(old_devtb_phys,
+ dev_table_size, MEMREMAP_WB);
if (!old_devtb)
return false;

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af..4d0c884 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,8 @@
#include <linux/uaccess.h>
#include <asm/io.h>
#include "internal.h"
+#include <linux/mem_encrypt.h>
+#include <asm/pgtable.h>

/* List representing chunks of contiguous memory areas and their offsets in
* vmcore file.
@@ -86,7 +88,8 @@ static int pfn_is_ram(unsigned long pfn)

/* Reads a page from the oldmem device from given offset. */
static ssize_t read_from_oldmem(char *buf, size_t count,
- u64 *ppos, int userbuf)
+ u64 *ppos, int userbuf,
+ bool encrypted)
{
unsigned long pfn, offset;
size_t nr_bytes;
@@ -108,8 +111,13 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
if (pfn_is_ram(pfn) == 0)
memset(buf, 0, nr_bytes);
else {
- tmp = copy_oldmem_page(pfn, buf, nr_bytes,
+ if (encrypted)
+ tmp = copy_oldmem_page_encrypted(pfn, buf,
+ nr_bytes, offset, userbuf);
+ else
+ tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
+
if (tmp < 0)
return tmp;
}
@@ -143,7 +151,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
*/
ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
{
- return read_from_oldmem(buf, count, ppos, 0);
+ return read_from_oldmem(buf, count, ppos, 0, sev_active());
}

/*
@@ -151,7 +159,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
*/
ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
{
- return read_from_oldmem(buf, count, ppos, 0);
+ return read_from_oldmem(buf, count, ppos, 0, sme_active());
}

/*
@@ -161,6 +169,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
unsigned long from, unsigned long pfn,
unsigned long size, pgprot_t prot)
{
+ prot = pgprot_encrypted(prot);
return remap_pfn_range(vma, from, pfn, size, prot);
}

@@ -235,7 +244,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
m->offset + m->size - *fpos,
buflen);
start = m->paddr + *fpos - m->offset;
- tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
+ tmp = read_from_oldmem(buffer, tsz, &start, userbuf,
+ mem_encrypt_active());
if (tmp < 0)
return tmp;
buflen -= tsz;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index f7ac2aa..28b0a7c 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -25,6 +25,11 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,

extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf);
+#define copy_oldmem_page_encrypted copy_oldmem_page_encrypted
+
void vmcore_cleanup(void);

/* Architecture code defines this if there are other possible ELF
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 20fef1a..3c22a9b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
}
}

+ if (pages) {
+ unsigned int count, i;
+
+ pages->mapping = NULL;
+ set_page_private(pages, order);
+ count = 1 << order;
+ for (i = 0; i < count; i++)
+ SetPageReserved(pages + i);
+ arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+ }
return pages;
}

@@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = -ENOMEM;
goto out;
}
+ arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+ arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
--
2.9.5


2018-06-14 08:57:55

by Dave Young

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

On 06/14/18 at 04:47pm, Lianbo Jiang wrote:
> When SME is enabled on an AMD server, kdump also needs to be supported.
> Because the memory is encrypted in the first kernel, the old memory must
> be remapped encrypted into the second (crash) kernel, and SME must also
> be enabled in the second kernel; otherwise the encrypted old memory
> cannot be decrypted. Simply changing the value of the C-bit on a page
> does not automatically encrypt its existing contents, and any data that
> was in the page prior to the C-bit modification becomes unintelligible.
> A page of memory that is marked encrypted is automatically decrypted
> when read from DRAM and automatically encrypted when written to DRAM.
>
> For kdump, it is necessary to distinguish whether the memory is
> encrypted, and furthermore to know which parts of the memory are
> encrypted or decrypted. The memory is then remapped appropriately for
> each situation, which tells the CPU how to treat the data (encrypted or
> unencrypted). For example, with SME enabled, if the old memory is
> encrypted, it is remapped in the encrypted way, so that the encrypted
> old memory is automatically decrypted when the data is read through the
> remapped address.
>
> ------------------------------------------------
> | first-kernel | second-kernel | kdump support |
> | mem_encrypt= | mem_encrypt=  |               |
> | on|off       | on|off        | yes|no        |
> |--------------+---------------+---------------|
> | on           | on            | yes           |
> | off          | off           | yes           |
> | on           | off           | no            |
> | off          | on            | no            |
> |______________|_______________|_______________|
>
> Signed-off-by: Lianbo Jiang <[email protected]>
> ---
> Changes since V1:
> 1. Removed the '#ifdef' stuff throughout this patch.
> 2. Moved some logic into early_memremap_pgprot_adjust() and cleaned up
> the previously unnecessary changes, for example in
> arch/x86/include/asm/dmi.h, arch/x86/kernel/acpi/boot.c and
> drivers/acpi/tables.c.
> 3. Rewrote two functions, copy_oldmem_page() and
> copy_oldmem_page_encrypted().
> 4. Distinguished sme_active() and sev_active(); where a distinction is
> not needed, mem_encrypt_active() is used.
> 5. Cleaned up a compile warning in copy_device_table().
>
> arch/x86/kernel/crash_dump_64.c | 42 +++++++++++++++++++++++++++++++----------
> arch/x86/mm/ioremap.c | 4 ++++
> drivers/iommu/amd_iommu_init.c | 14 +++++++++++++-
> fs/proc/vmcore.c | 20 +++++++++++++++-----
> include/linux/crash_dump.h | 5 +++++
> kernel/kexec_core.c | 12 ++++++++++++
> 6 files changed, 81 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index 4f2e077..a2c7b13 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -11,6 +11,23 @@
> #include <linux/uaccess.h>
> #include <linux/io.h>
>
> +static ssize_t copy_to(void *to, void *vaddr, unsigned long offset,
> + size_t size, int userbuf)
> +{
> + if (userbuf) {
> + if (copy_to_user(to, vaddr + offset, size)) {
> + iounmap(vaddr);
> + return -ENOMEM;
> + }
> + } else
> + memcpy(to, vaddr + offset, size);
> +
> + set_iounmap_nonlazy();
> + iounmap(vaddr);
> +
> + return size;
> +}

Hmm, the function name copy_to is strange.

Also, since iounmap() is needed in this code path but is not paired
with an ioremap(), it is bad. If you really want this function, the
iounmap()-related code needs to move into the caller function, and it
would be better to rename this function to e.g. copy_oldmem().
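
Something of this shape, perhaps (an untested sketch; note it also
restores -EFAULT, which the pre-patch code returned on a failed
copy_to_user()):

static ssize_t copy_oldmem(void *to, void *vaddr, unsigned long offset,
                           size_t size, int userbuf)
{
        if (userbuf) {
                if (copy_to_user(to, vaddr + offset, size))
                        return -EFAULT;
        } else {
                memcpy(to, vaddr + offset, size);
        }

        return size;
}

ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
                         unsigned long offset, int userbuf)
{
        void *vaddr;
        ssize_t ret;

        if (!csize)
                return 0;

        vaddr = (void *)ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
        if (!vaddr)
                return -ENOMEM;

        ret = copy_oldmem(buf, vaddr, offset, csize, userbuf);

        /* the unmap now stays paired with the ioremap above */
        set_iounmap_nonlazy();
        iounmap(vaddr);

        return ret;
}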

> +
> /**
> * copy_oldmem_page - copy one page from "oldmem"
> * @pfn: page frame number to be copied
> @@ -36,15 +53,20 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> if (!vaddr)
> return -ENOMEM;
>
> - if (userbuf) {
> - if (copy_to_user(buf, vaddr + offset, csize)) {
> - iounmap(vaddr);
> - return -EFAULT;
> - }
> - } else
> - memcpy(buf, vaddr + offset, csize);
> + return copy_to(buf, vaddr, offset, csize, userbuf);
> +}
>
> - set_iounmap_nonlazy();
> - iounmap(vaddr);
> - return csize;
> +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> + size_t csize, unsigned long offset, int userbuf)
> +{
> + void *vaddr;
> +
> + if (!csize)
> + return 0;
> +
> + vaddr = ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
> + if (!vaddr)
> + return -ENOMEM;
> +
> + return copy_to(buf, vaddr, offset, csize, userbuf);
> }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 24e0920..e365fc4 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -24,6 +24,7 @@
> #include <asm/pgalloc.h>
> #include <asm/pat.h>
> #include <asm/setup.h>
> +#include <linux/crash_dump.h>
>
> #include "physaddr.h"
>
> @@ -696,6 +697,9 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
> if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
> encrypted_prot = false;
>
> + if (sme_active() && is_kdump_kernel())
> + encrypted_prot = false;
> +
> return encrypted_prot ? pgprot_encrypted(prot)
> : pgprot_decrypted(prot);
> }
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 904c575..5e535a6 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -889,11 +889,23 @@ static bool copy_device_table(void)
> }
>
> old_devtb_phys = entry & PAGE_MASK;
> + /*
> + * When sme enable in the first kernel, old_devtb_phys includes the
> + * memory encryption mask(sme_me_mask), we must remove the memory
> + * encryption mask to obtain the true physical address in kdump mode.
> + */
> + if (mem_encrypt_active() && is_kdump_kernel())
> + old_devtb_phys = __sme_clr(old_devtb_phys);
> if (old_devtb_phys >= 0x100000000ULL) {
> pr_err("The address of old device table is above 4G, not trustworthy!\n");
> return false;
> }
> - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> + if (mem_encrypt_active() && is_kdump_kernel())
> + old_devtb = (void *)ioremap_encrypted(old_devtb_phys,
> + dev_table_size);
> + else
> + old_devtb = memremap(old_devtb_phys,
> + dev_table_size, MEMREMAP_WB);
> if (!old_devtb)
> return false;
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index a45f0af..4d0c884 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -25,6 +25,8 @@
> #include <linux/uaccess.h>
> #include <asm/io.h>
> #include "internal.h"
> +#include <linux/mem_encrypt.h>
> +#include <asm/pgtable.h>
>
> /* List representing chunks of contiguous memory areas and their offsets in
> * vmcore file.
> @@ -86,7 +88,8 @@ static int pfn_is_ram(unsigned long pfn)
>
> /* Reads a page from the oldmem device from given offset. */
> static ssize_t read_from_oldmem(char *buf, size_t count,
> - u64 *ppos, int userbuf)
> + u64 *ppos, int userbuf,
> + bool encrypted)
> {
> unsigned long pfn, offset;
> size_t nr_bytes;
> @@ -108,8 +111,13 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
> if (pfn_is_ram(pfn) == 0)
> memset(buf, 0, nr_bytes);
> else {
> - tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> + if (encrypted)
> + tmp = copy_oldmem_page_encrypted(pfn, buf,
> + nr_bytes, offset, userbuf);
> + else
> + tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> offset, userbuf);
> +
> if (tmp < 0)
> return tmp;
> }
> @@ -143,7 +151,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
> */
> ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, sev_active());
> }
>
> /*
> @@ -151,7 +159,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> */
> ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, sme_active());
> }
>
> /*
> @@ -161,6 +169,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
> unsigned long from, unsigned long pfn,
> unsigned long size, pgprot_t prot)
> {
> + prot = pgprot_encrypted(prot);
> return remap_pfn_range(vma, from, pfn, size, prot);
> }
>
> @@ -235,7 +244,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
> m->offset + m->size - *fpos,
> buflen);
> start = m->paddr + *fpos - m->offset;
> - tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
> + tmp = read_from_oldmem(buffer, tsz, &start, userbuf,
> + mem_encrypt_active());
> if (tmp < 0)
> return tmp;
> buflen -= tsz;
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index f7ac2aa..28b0a7c 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -25,6 +25,11 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
>
> extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
> unsigned long, int);
> +extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> + size_t csize, unsigned long offset,
> + int userbuf);
> +#define copy_oldmem_page_encrypted copy_oldmem_page_encrypted
> +
> void vmcore_cleanup(void);
>
> /* Architecture code defines this if there are other possible ELF
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 20fef1a..3c22a9b 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
> }
> }
>
> + if (pages) {
> + unsigned int count, i;
> +
> + pages->mapping = NULL;
> + set_page_private(pages, order);
> + count = 1 << order;
> + for (i = 0; i < count; i++)
> + SetPageReserved(pages + i);
> + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
> + }
> return pages;
> }
>
> @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> result = -ENOMEM;
> goto out;
> }
> + arch_kexec_post_alloc_pages(page_address(page), 1, 0);
> ptr = kmap(page);
> ptr += maddr & ~PAGE_MASK;
> mchunk = min_t(size_t, mbytes,
> @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> result = copy_from_user(ptr, buf, uchunk);
> kexec_flush_icache_page(page);
> kunmap(page);
> + arch_kexec_pre_free_pages(page_address(page), 1);
> if (result) {
> result = -EFAULT;
> goto out;
> --
> 2.9.5
>

Thanks
Dave

2018-06-14 12:56:08

by kernel test robot

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

Hi Lianbo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180614]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Lianbo-Jiang/Support-kdump-for-AMD-secure-memory-encryption-sme/20180614-164938
config: i386-randconfig-c0-06141337 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

fs/proc/vmcore.o: In function `read_from_oldmem':
>> fs/proc/vmcore.c:127: undefined reference to `copy_oldmem_page_encrypted'

vim +127 fs/proc/vmcore.c

100
101 /* Reads a page from the oldmem device from given offset. */
102 static ssize_t read_from_oldmem(char *buf, size_t count,
103 u64 *ppos, int userbuf,
104 bool encrypted)
105 {
106 unsigned long pfn, offset;
107 size_t nr_bytes;
108 ssize_t read = 0, tmp;
109
110 if (!count)
111 return 0;
112
113 offset = (unsigned long)(*ppos % PAGE_SIZE);
114 pfn = (unsigned long)(*ppos / PAGE_SIZE);
115
116 do {
117 if (count > (PAGE_SIZE - offset))
118 nr_bytes = PAGE_SIZE - offset;
119 else
120 nr_bytes = count;
121
122 /* If pfn is not ram, return zeros for sparse dump files */
123 if (pfn_is_ram(pfn) == 0)
124 memset(buf, 0, nr_bytes);
125 else {
126 if (encrypted)
> 127 tmp = copy_oldmem_page_encrypted(pfn, buf,
128 nr_bytes, offset, userbuf);
129 else
130 tmp = copy_oldmem_page(pfn, buf, nr_bytes,
131 offset, userbuf);
132
133 if (tmp < 0)
134 return tmp;
135 }
136 *ppos += nr_bytes;
137 count -= nr_bytes;
138 buf += nr_bytes;
139 read += nr_bytes;
140 ++pfn;
141 offset = 0;
142 } while (count);
143
144 return read;
145 }
146

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-06-14 13:03:11

by kernel test robot

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

Hi Lianbo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180614]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Lianbo-Jiang/Support-kdump-for-AMD-secure-memory-encryption-sme/20180614-164938
config: arm-sa1100 (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=arm

All errors (new ones prefixed by >>):

fs/proc/vmcore.o: In function `read_from_oldmem.part.0':
>> vmcore.c:(.text+0x1b4): undefined reference to `copy_oldmem_page_encrypted'

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-06-14 14:01:27

by Lianbo Jiang

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

On 2018-06-14 20:55, kbuild test robot wrote:
> Hi Lianbo,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.17 next-20180614]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Lianbo-Jiang/Support-kdump-for-AMD-secure-memory-encryption-sme/20180614-164938
> config: i386-randconfig-c0-06141337 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
> fs/proc/vmcore.o: In function `read_from_oldmem':
>>> fs/proc/vmcore.c:127: undefined reference to `copy_oldmem_page_encrypted'
>
That is strange; this error does not occur in my tests. I will look for
another environment to test in and see whether it can be reproduced.
Hopefully the compile error will be fixed in the V3 patch.

Thanks.
Lianbo
> vim +127 fs/proc/vmcore.c
>
> 100
> 101 /* Reads a page from the oldmem device from given offset. */
> 102 static ssize_t read_from_oldmem(char *buf, size_t count,
> 103 u64 *ppos, int userbuf,
> 104 bool encrypted)
> 105 {
> 106 unsigned long pfn, offset;
> 107 size_t nr_bytes;
> 108 ssize_t read = 0, tmp;
> 109
> 110 if (!count)
> 111 return 0;
> 112
> 113 offset = (unsigned long)(*ppos % PAGE_SIZE);
> 114 pfn = (unsigned long)(*ppos / PAGE_SIZE);
> 115
> 116 do {
> 117 if (count > (PAGE_SIZE - offset))
> 118 nr_bytes = PAGE_SIZE - offset;
> 119 else
> 120 nr_bytes = count;
> 121
> 122 /* If pfn is not ram, return zeros for sparse dump files */
> 123 if (pfn_is_ram(pfn) == 0)
> 124 memset(buf, 0, nr_bytes);
> 125 else {
> 126 if (encrypted)
> > 127 tmp = copy_oldmem_page_encrypted(pfn, buf,
> 128 nr_bytes, offset, userbuf);
> 129 else
> 130 tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> 131 offset, userbuf);
> 132
> 133 if (tmp < 0)
> 134 return tmp;
> 135 }
> 136 *ppos += nr_bytes;
> 137 count -= nr_bytes;
> 138 buf += nr_bytes;
> 139 read += nr_bytes;
> 140 ++pfn;
> 141 offset = 0;
> 142 } while (count);
> 143
> 144 return read;
> 145 }
> 146
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
>

2018-06-14 19:12:52

by kernel test robot

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

Hi Lianbo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17 next-20180614]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Lianbo-Jiang/Support-kdump-for-AMD-secure-memory-encryption-sme/20180614-164938
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


vim +904 drivers/iommu/amd_iommu_init.c

854
855
856 static bool copy_device_table(void)
857 {
858 u64 int_ctl, int_tab_len, entry = 0, last_entry = 0;
859 struct dev_table_entry *old_devtb = NULL;
860 u32 lo, hi, devid, old_devtb_size;
861 phys_addr_t old_devtb_phys;
862 struct amd_iommu *iommu;
863 u16 dom_id, dte_v, irq_v;
864 gfp_t gfp_flag;
865 u64 tmp;
866
867 if (!amd_iommu_pre_enabled)
868 return false;
869
870 pr_warn("Translation is already enabled - trying to copy translation structures\n");
871 for_each_iommu(iommu) {
872 /* All IOMMUs should use the same device table with the same size */
873 lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
874 hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
875 entry = (((u64) hi) << 32) + lo;
876 if (last_entry && last_entry != entry) {
877 pr_err("IOMMU:%d should use the same dev table as others!\n",
878 iommu->index);
879 return false;
880 }
881 last_entry = entry;
882
883 old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
884 if (old_devtb_size != dev_table_size) {
885 pr_err("The device table size of IOMMU:%d is not expected!\n",
886 iommu->index);
887 return false;
888 }
889 }
890
891 old_devtb_phys = entry & PAGE_MASK;
892 /*
893 * When sme enable in the first kernel, old_devtb_phys includes the
894 * memory encryption mask(sme_me_mask), we must remove the memory
895 * encryption mask to obtain the true physical address in kdump mode.
896 */
897 if (mem_encrypt_active() && is_kdump_kernel())
898 old_devtb_phys = __sme_clr(old_devtb_phys);
899 if (old_devtb_phys >= 0x100000000ULL) {
900 pr_err("The address of old device table is above 4G, not trustworthy!\n");
901 return false;
902 }
903 if (mem_encrypt_active() && is_kdump_kernel())
> 904 old_devtb = (void *)ioremap_encrypted(old_devtb_phys,
905 dev_table_size);
906 else
907 old_devtb = memremap(old_devtb_phys,
908 dev_table_size, MEMREMAP_WB);
909 if (!old_devtb)
910 return false;
911
912 gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
913 old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
914 get_order(dev_table_size));
915 if (old_dev_tbl_cpy == NULL) {
916 pr_err("Failed to allocate memory for copying old device table!\n");
917 return false;
918 }
919
920 for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
921 old_dev_tbl_cpy[devid] = old_devtb[devid];
922 dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
923 dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
924
925 if (dte_v && dom_id) {
926 old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
927 old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
928 __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
929 /* If gcr3 table existed, mask it out */
930 if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
931 tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
932 tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
933 old_dev_tbl_cpy[devid].data[1] &= ~tmp;
934 tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
935 tmp |= DTE_FLAG_GV;
936 old_dev_tbl_cpy[devid].data[0] &= ~tmp;
937 }
938 }
939
940 irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
941 int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
942 int_tab_len = old_devtb[devid].data[2] & DTE_IRQ_TABLE_LEN_MASK;
943 if (irq_v && (int_ctl || int_tab_len)) {
944 if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
945 (int_tab_len != DTE_IRQ_TABLE_LEN)) {
946 pr_err("Wrong old irq remapping flag: %#x\n", devid);
947 return false;
948 }
949
950 old_dev_tbl_cpy[devid].data[2] = old_devtb[devid].data[2];
951 }
952 }
953 memunmap(old_devtb);
954
955 return true;
956 }
957

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation

2018-06-15 07:19:59

by Dave Young

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

On 06/14/18 at 04:47pm, Lianbo Jiang wrote:
> When SME is enabled on an AMD server, kdump also needs to be supported.
> Because the memory is encrypted in the first kernel, the old memory must
> be remapped encrypted into the second (crash) kernel, and SME must also
> be enabled in the second kernel; otherwise the encrypted old memory
> cannot be decrypted. Simply changing the value of the C-bit on a page
> does not automatically encrypt its existing contents, and any data that
> was in the page prior to the C-bit modification becomes unintelligible.
> A page of memory that is marked encrypted is automatically decrypted
> when read from DRAM and automatically encrypted when written to DRAM.
>
> For kdump, it is necessary to distinguish whether the memory is
> encrypted, and furthermore to know which parts of the memory are
> encrypted or decrypted. The memory is then remapped appropriately for
> each situation, which tells the CPU how to treat the data (encrypted or
> unencrypted). For example, with SME enabled, if the old memory is
> encrypted, it is remapped in the encrypted way, so that the encrypted
> old memory is automatically decrypted when the data is read through the
> remapped address.
>
> ------------------------------------------------
> | first-kernel | second-kernel | kdump support |
> | mem_encrypt= | mem_encrypt=  |               |
> | on|off       | on|off        | yes|no        |
> |--------------+---------------+---------------|
> | on           | on            | yes           |
> | off          | off           | yes           |
> | on           | off           | no            |
> | off          | on            | no            |
> |______________|_______________|_______________|
>
> Signed-off-by: Lianbo Jiang <[email protected]>
> ---
> Changes since V1:
> 1. Removed the '#ifdef' stuff throughout this patch.
> 2. Moved some logic into early_memremap_pgprot_adjust() and cleaned up
> the previously unnecessary changes, for example in
> arch/x86/include/asm/dmi.h, arch/x86/kernel/acpi/boot.c and
> drivers/acpi/tables.c.
> 3. Rewrote two functions, copy_oldmem_page() and
> copy_oldmem_page_encrypted().
> 4. Distinguished sme_active() and sev_active(); where a distinction is
> not needed, mem_encrypt_active() is used.

Lianbo, I think you modified this based on Tom's comment, but it would
be good to add this only once you have tested SEV and it works for you.

> 5. Cleaned up a compile warning in copy_device_table().
>
> arch/x86/kernel/crash_dump_64.c | 42 +++++++++++++++++++++++++++++++----------
> arch/x86/mm/ioremap.c | 4 ++++
> drivers/iommu/amd_iommu_init.c | 14 +++++++++++++-

Assuming it will not break bisection, it would be better to split the
iommu changes into one standalone patch and cc the iommu list.

> fs/proc/vmcore.c | 20 +++++++++++++++-----
> include/linux/crash_dump.h | 5 +++++
> kernel/kexec_core.c | 12 ++++++++++++

Splitting out another two patches, one for kexec_core and another for
vmcore.c, would also make review easier.

> 6 files changed, 81 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index 4f2e077..a2c7b13 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -11,6 +11,23 @@
> #include <linux/uaccess.h>
> #include <linux/io.h>
>
> +static ssize_t copy_to(void *to, void *vaddr, unsigned long offset,
> + size_t size, int userbuf)
> +{
> + if (userbuf) {
> + if (copy_to_user(to, vaddr + offset, size)) {
> + iounmap(vaddr);
> + return -ENOMEM;
> + }
> + } else
> + memcpy(to, vaddr + offset, size);
> +
> + set_iounmap_nonlazy();
> + iounmap(vaddr);
> +
> + return size;
> +}
> +
> /**
> * copy_oldmem_page - copy one page from "oldmem"
> * @pfn: page frame number to be copied
> @@ -36,15 +53,20 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> if (!vaddr)
> return -ENOMEM;
>
> - if (userbuf) {
> - if (copy_to_user(buf, vaddr + offset, csize)) {
> - iounmap(vaddr);
> - return -EFAULT;
> - }
> - } else
> - memcpy(buf, vaddr + offset, csize);
> + return copy_to(buf, vaddr, offset, csize, userbuf);
> +}
>
> - set_iounmap_nonlazy();
> - iounmap(vaddr);
> - return csize;
> +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> + size_t csize, unsigned long offset, int userbuf)
> +{
> + void *vaddr;
> +
> + if (!csize)
> + return 0;
> +
> + vaddr = ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
> + if (!vaddr)
> + return -ENOMEM;
> +
> + return copy_to(buf, vaddr, offset, csize, userbuf);
> }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 24e0920..e365fc4 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -24,6 +24,7 @@
> #include <asm/pgalloc.h>
> #include <asm/pat.h>
> #include <asm/setup.h>
> +#include <linux/crash_dump.h>
>
> #include "physaddr.h"
>
> @@ -696,6 +697,9 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
> if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
> encrypted_prot = false;
>
> + if (sme_active() && is_kdump_kernel())
> + encrypted_prot = false;
> +
> return encrypted_prot ? pgprot_encrypted(prot)
> : pgprot_decrypted(prot);
> }
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 904c575..5e535a6 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -889,11 +889,23 @@ static bool copy_device_table(void)
> }
>
> old_devtb_phys = entry & PAGE_MASK;
> + /*
> + * When sme enable in the first kernel, old_devtb_phys includes the
> + * memory encryption mask(sme_me_mask), we must remove the memory
> + * encryption mask to obtain the true physical address in kdump mode.
> + */
> + if (mem_encrypt_active() && is_kdump_kernel())
> + old_devtb_phys = __sme_clr(old_devtb_phys);
> if (old_devtb_phys >= 0x100000000ULL) {
> pr_err("The address of old device table is above 4G, not trustworthy!\n");
> return false;
> }
> - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> + if (mem_encrypt_active() && is_kdump_kernel())
> + old_devtb = (void *)ioremap_encrypted(old_devtb_phys,
> + dev_table_size);
> + else
> + old_devtb = memremap(old_devtb_phys,
> + dev_table_size, MEMREMAP_WB);
> if (!old_devtb)
> return false;
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index a45f0af..4d0c884 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -25,6 +25,8 @@
> #include <linux/uaccess.h>
> #include <asm/io.h>
> #include "internal.h"
> +#include <linux/mem_encrypt.h>
> +#include <asm/pgtable.h>
>
> /* List representing chunks of contiguous memory areas and their offsets in
> * vmcore file.
> @@ -86,7 +88,8 @@ static int pfn_is_ram(unsigned long pfn)
>
> /* Reads a page from the oldmem device from given offset. */
> static ssize_t read_from_oldmem(char *buf, size_t count,
> - u64 *ppos, int userbuf)
> + u64 *ppos, int userbuf,
> + bool encrypted)
> {
> unsigned long pfn, offset;
> size_t nr_bytes;
> @@ -108,8 +111,13 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
> if (pfn_is_ram(pfn) == 0)
> memset(buf, 0, nr_bytes);
> else {
> - tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> + if (encrypted)
> + tmp = copy_oldmem_page_encrypted(pfn, buf,
> + nr_bytes, offset, userbuf);
> + else
> + tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> offset, userbuf);
> +
> if (tmp < 0)
> return tmp;
> }
> @@ -143,7 +151,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
> */
> ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, sev_active());
> }
>
> /*
> @@ -151,7 +159,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> */
> ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, sme_active());
> }
>
> /*
> @@ -161,6 +169,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
> unsigned long from, unsigned long pfn,
> unsigned long size, pgprot_t prot)
> {
> + prot = pgprot_encrypted(prot);
> return remap_pfn_range(vma, from, pfn, size, prot);
> }
>
> @@ -235,7 +244,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
> m->offset + m->size - *fpos,
> buflen);
> start = m->paddr + *fpos - m->offset;
> - tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
> + tmp = read_from_oldmem(buffer, tsz, &start, userbuf,
> + mem_encrypt_active());
> if (tmp < 0)
> return tmp;
> buflen -= tsz;
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index f7ac2aa..28b0a7c 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -25,6 +25,11 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
>
> extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
> unsigned long, int);
> +extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> + size_t csize, unsigned long offset,
> + int userbuf);
> +#define copy_oldmem_page_encrypted copy_oldmem_page_encrypted
> +
> void vmcore_cleanup(void);
>
> /* Architecture code defines this if there are other possible ELF
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 20fef1a..3c22a9b 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
> }
> }
>
> + if (pages) {
> + unsigned int count, i;
> +
> + pages->mapping = NULL;
> + set_page_private(pages, order);
> + count = 1 << order;
> + for (i = 0; i < count; i++)
> + SetPageReserved(pages + i);
> + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
> + }
> return pages;
> }
>
> @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> result = -ENOMEM;
> goto out;
> }
> + arch_kexec_post_alloc_pages(page_address(page), 1, 0);
> ptr = kmap(page);
> ptr += maddr & ~PAGE_MASK;
> mchunk = min_t(size_t, mbytes,
> @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> result = copy_from_user(ptr, buf, uchunk);
> kexec_flush_icache_page(page);
> kunmap(page);
> + arch_kexec_pre_free_pages(page_address(page), 1);
> if (result) {
> result = -EFAULT;
> goto out;
> --
> 2.9.5
>

2018-06-15 08:20:02

by Dave Young

Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

On 06/14/18 at 04:56pm, Dave Young wrote:
> On 06/14/18 at 04:47pm, Lianbo Jiang wrote:
> > When SME is enabled on an AMD server, kdump also needs to be supported.
> > Because the memory is encrypted in the first kernel, the old memory must
> > be remapped encrypted into the second (crash) kernel, and SME must also
> > be enabled in the second kernel; otherwise the encrypted old memory
> > cannot be decrypted. Simply changing the value of the C-bit on a page
> > does not automatically encrypt its existing contents, and any data that
> > was in the page prior to the C-bit modification becomes unintelligible.
> > A page of memory that is marked encrypted is automatically decrypted
> > when read from DRAM and automatically encrypted when written to DRAM.
> >
> > For kdump, it is necessary to distinguish whether the memory is
> > encrypted, and furthermore to know which parts of the memory are
> > encrypted or decrypted. The memory is then remapped appropriately for
> > each situation, which tells the CPU how to treat the data (encrypted or
> > unencrypted). For example, with SME enabled, if the old memory is
> > encrypted, it is remapped in the encrypted way, so that the encrypted
> > old memory is automatically decrypted when the data is read through the
> > remapped address.
> >
> > ------------------------------------------------
> > | first-kernel | second-kernel | kdump support |
> > | mem_encrypt= | mem_encrypt=  |               |
> > | on|off       | on|off        | yes|no        |
> > |--------------+---------------+---------------|
> > | on           | on            | yes           |
> > | off          | off           | yes           |
> > | on           | off           | no            |
> > | off          | on            | no            |
> > |______________|_______________|_______________|
> >
> > Signed-off-by: Lianbo Jiang <[email protected]>
> > ---
> > Changes since V1:
> > 1. Removed the '#ifdef' stuff throughout this patch.
> > 2. Moved some logic into early_memremap_pgprot_adjust() and cleaned up
> > the previously unnecessary changes, for example in
> > arch/x86/include/asm/dmi.h, arch/x86/kernel/acpi/boot.c and
> > drivers/acpi/tables.c.
> > 3. Rewrote two functions, copy_oldmem_page() and
> > copy_oldmem_page_encrypted().
> > 4. Distinguished sme_active() and sev_active(); where a distinction is
> > not needed, mem_encrypt_active() is used.
> > 5. Cleaned up a compile warning in copy_device_table().
> >
> > arch/x86/kernel/crash_dump_64.c | 42 +++++++++++++++++++++++++++++++----------
> > arch/x86/mm/ioremap.c | 4 ++++
> > drivers/iommu/amd_iommu_init.c | 14 +++++++++++++-
> > fs/proc/vmcore.c | 20 +++++++++++++++-----
> > include/linux/crash_dump.h | 5 +++++
> > kernel/kexec_core.c | 12 ++++++++++++
> > 6 files changed, 81 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> > index 4f2e077..a2c7b13 100644
> > --- a/arch/x86/kernel/crash_dump_64.c
> > +++ b/arch/x86/kernel/crash_dump_64.c
> > @@ -11,6 +11,23 @@
> > #include <linux/uaccess.h>
> > #include <linux/io.h>
> >
> > +static ssize_t copy_to(void *to, void *vaddr, unsigned long offset,
> > + size_t size, int userbuf)
> > +{
> > + if (userbuf) {
> > + if (copy_to_user(to, vaddr + offset, size)) {
> > + iounmap(vaddr);
> > + return -ENOMEM;
> > + }
> > + } else
> > + memcpy(to, vaddr + offset, size);
> > +
> > + set_iounmap_nonlazy();
> > + iounmap(vaddr);
> > +
> > + return size;
> > +}
>
> Hmm, the function name copy_to is strange.
>
> Also, since iounmap() is needed in this code path but is not paired
> with an ioremap(), it is bad. If you really want this function, the
> iounmap()-related code needs to move into the caller function, and it
> would be better to rename this function to e.g. copy_oldmem().
>

Rechecking the comments and the build error the robot reported, it
could be done like this:

* replace the #define copy_oldmem_page_encrypted in the header file
with a dummy inline function for the case where the config option is
not enabled (sketched below);

* conditionally compile your new function in the Makefile, with a new
.c file for copy_oldmem_page_encrypted.
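
i.e. roughly like this (a sketch only; the config symbol name here is
made up):

/* include/linux/crash_dump.h */
#ifdef CONFIG_ARCH_HAS_COPY_OLDMEM_PAGE_ENCRYPTED
extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
                                          size_t csize,
                                          unsigned long offset,
                                          int userbuf);
#else
static inline ssize_t copy_oldmem_page_encrypted(unsigned long pfn,
                                                 char *buf, size_t csize,
                                                 unsigned long offset,
                                                 int userbuf)
{
        /* without the option, fall back to the plain copy */
        return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
}
#endif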

> > +
> > /**
> > * copy_oldmem_page - copy one page from "oldmem"
> > * @pfn: page frame number to be copied
> > @@ -36,15 +53,20 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> > if (!vaddr)
> > return -ENOMEM;
> >
> > - if (userbuf) {
> > - if (copy_to_user(buf, vaddr + offset, csize)) {
> > - iounmap(vaddr);
> > - return -EFAULT;
> > - }
> > - } else
> > - memcpy(buf, vaddr + offset, csize);
> > + return copy_to(buf, vaddr, offset, csize, userbuf);
> > +}
> >
> > - set_iounmap_nonlazy();
> > - iounmap(vaddr);
> > - return csize;
> > +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> > + size_t csize, unsigned long offset, int userbuf)
> > +{
> > + void *vaddr;
> > +
> > + if (!csize)
> > + return 0;
> > +
> > + vaddr = ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
> > + if (!vaddr)
> > + return -ENOMEM;
> > +
> > + return copy_to(buf, vaddr, offset, csize, userbuf);
> > }
> > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> > index 24e0920..e365fc4 100644
> > --- a/arch/x86/mm/ioremap.c
> > +++ b/arch/x86/mm/ioremap.c
> > @@ -24,6 +24,7 @@
> > #include <asm/pgalloc.h>
> > #include <asm/pat.h>
> > #include <asm/setup.h>
> > +#include <linux/crash_dump.h>
> >
> > #include "physaddr.h"
> >
> > @@ -696,6 +697,9 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
> > if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
> > encrypted_prot = false;
> >
> > + if (sme_active() && is_kdump_kernel())
> > + encrypted_prot = false;
> > +
> > return encrypted_prot ? pgprot_encrypted(prot)
> > : pgprot_decrypted(prot);
> > }
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index 904c575..5e535a6 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -889,11 +889,23 @@ static bool copy_device_table(void)
> > }
> >
> > old_devtb_phys = entry & PAGE_MASK;
> > + /*
> > + * When sme enable in the first kernel, old_devtb_phys includes the
> > + * memory encryption mask(sme_me_mask), we must remove the memory
> > + * encryption mask to obtain the true physical address in kdump mode.
> > + */
> > + if (mem_encrypt_active() && is_kdump_kernel())
> > + old_devtb_phys = __sme_clr(old_devtb_phys);
> > if (old_devtb_phys >= 0x100000000ULL) {
> > pr_err("The address of old device table is above 4G, not trustworthy!\n");
> > return false;
> > }
> > - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> > + if (mem_encrypt_active() && is_kdump_kernel())
> > + old_devtb = (void *)ioremap_encrypted(old_devtb_phys,
> > + dev_table_size);
> > + else
> > + old_devtb = memremap(old_devtb_phys,
> > + dev_table_size, MEMREMAP_WB);
> > if (!old_devtb)
> > return false;
> >
> > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> > index a45f0af..4d0c884 100644
> > --- a/fs/proc/vmcore.c
> > +++ b/fs/proc/vmcore.c
> > @@ -25,6 +25,8 @@
> > #include <linux/uaccess.h>
> > #include <asm/io.h>
> > #include "internal.h"
> > +#include <linux/mem_encrypt.h>
> > +#include <asm/pgtable.h>
> >
> > /* List representing chunks of contiguous memory areas and their offsets in
> > * vmcore file.
> > @@ -86,7 +88,8 @@ static int pfn_is_ram(unsigned long pfn)
> >
> > /* Reads a page from the oldmem device from given offset. */
> > static ssize_t read_from_oldmem(char *buf, size_t count,
> > - u64 *ppos, int userbuf)
> > + u64 *ppos, int userbuf,
> > + bool encrypted)
> > {
> > unsigned long pfn, offset;
> > size_t nr_bytes;
> > @@ -108,8 +111,13 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
> > if (pfn_is_ram(pfn) == 0)
> > memset(buf, 0, nr_bytes);
> > else {
> > - tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> > + if (encrypted)
> > + tmp = copy_oldmem_page_encrypted(pfn, buf,
> > + nr_bytes, offset, userbuf);
> > + else
> > + tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> > offset, userbuf);
> > +
> > if (tmp < 0)
> > return tmp;
> > }
> > @@ -143,7 +151,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
> > */
> > ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> > {
> > - return read_from_oldmem(buf, count, ppos, 0);
> > + return read_from_oldmem(buf, count, ppos, 0, sev_active());
> > }
> >
> > /*
> > @@ -151,7 +159,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> > */
> > ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> > {
> > - return read_from_oldmem(buf, count, ppos, 0);
> > + return read_from_oldmem(buf, count, ppos, 0, sme_active());
> > }
> >
> > /*
> > @@ -161,6 +169,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
> > unsigned long from, unsigned long pfn,
> > unsigned long size, pgprot_t prot)
> > {
> > + prot = pgprot_encrypted(prot);
> > return remap_pfn_range(vma, from, pfn, size, prot);
> > }
> >
> > @@ -235,7 +244,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
> > m->offset + m->size - *fpos,
> > buflen);
> > start = m->paddr + *fpos - m->offset;
> > - tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
> > + tmp = read_from_oldmem(buffer, tsz, &start, userbuf,
> > + mem_encrypt_active());
> > if (tmp < 0)
> > return tmp;
> > buflen -= tsz;
> > diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> > index f7ac2aa..28b0a7c 100644
> > --- a/include/linux/crash_dump.h
> > +++ b/include/linux/crash_dump.h
> > @@ -25,6 +25,11 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
> >
> > extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
> > unsigned long, int);
> > +extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> > + size_t csize, unsigned long offset,
> > + int userbuf);
> > +#define copy_oldmem_page_encrypted copy_oldmem_page_encrypted
> > +
> > void vmcore_cleanup(void);
> >
> > /* Architecture code defines this if there are other possible ELF
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 20fef1a..3c22a9b 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
> > }
> > }
> >
> > + if (pages) {
> > + unsigned int count, i;
> > +
> > + pages->mapping = NULL;
> > + set_page_private(pages, order);
> > + count = 1 << order;
> > + for (i = 0; i < count; i++)
> > + SetPageReserved(pages + i);
> > + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
> > + }
> > return pages;
> > }
> >
> > @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> > result = -ENOMEM;
> > goto out;
> > }
> > + arch_kexec_post_alloc_pages(page_address(page), 1, 0);
> > ptr = kmap(page);
> > ptr += maddr & ~PAGE_MASK;
> > mchunk = min_t(size_t, mbytes,
> > @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> > result = copy_from_user(ptr, buf, uchunk);
> > kexec_flush_icache_page(page);
> > kunmap(page);
> > + arch_kexec_pre_free_pages(page_address(page), 1);
> > if (result) {
> > result = -EFAULT;
> > goto out;
> > --
> > 2.9.5
> >
>
> Thanks
> Dave

2018-06-15 11:45:18

by Lianbo Jiang

[permalink] [raw]
Subject: Re: [PATCH 2/2 V2] Support kdump when AMD secure memory encryption is active

On 2018/06/15 15:19, Dave Young wrote:
> On 06/14/18 at 04:47pm, Lianbo Jiang wrote:
>> When SME is enabled on an AMD server, we also need to support kdump.
>> Because the memory is encrypted in the first kernel, we remap the old
>> memory to the second (crash) kernel as encrypted, and SME must also be
>> enabled in the second kernel; otherwise the encrypted old memory cannot
>> be decrypted. Simply changing the value of the C-bit on a page will not
>> automatically encrypt its existing contents, and any data in the page
>> prior to the C-bit modification becomes unintelligible. A page of
>> memory that is marked encrypted is automatically decrypted when read
>> from DRAM and automatically encrypted when written to DRAM.
>>
>> For kdump, it is necessary to distinguish whether the memory is
>> encrypted, and furthermore to know which parts of it are encrypted or
>> decrypted. We remap the memory appropriately for each case in order to
>> tell the CPU how to handle the data (encrypted or unencrypted). For
>> example, with SME enabled, if the old memory is encrypted, we remap it
>> in encrypted mode, so that the encrypted old memory is automatically
>> decrypted when we read it through the remapped address.
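
To illustrate the idea (a conceptual sketch only, not part of the patch;
it reuses pfn/buf/offset/csize from the surrounding code):

	/*
	 * Reading old memory through an encrypted mapping (C-bit set
	 * in the PTE) makes the hardware decrypt it on read; through a
	 * decrypted mapping the same read would return ciphertext.
	 */
	void *vaddr = ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
	if (vaddr) {
		memcpy(buf, vaddr + offset, csize);	/* plaintext */
		iounmap(vaddr);
	}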
>>
>> --------------------------------------------------------------
>> | first kernel         | second kernel        | kdump support |
>> | (mem_encrypt=on|off) | (mem_encrypt=on|off) | (yes|no)      |
>> |----------------------+----------------------+---------------|
>> | on                   | on                   | yes           |
>> | off                  | off                  | yes           |
>> | on                   | off                  | no            |
>> | off                  | on                   | no            |
>> |______________________|______________________|_______________|
>>
>> Signed-off-by: Lianbo Jiang <[email protected]>
>> ---
>> Some changes based on V1:
>> 1. remove the '#ifdef' stuff throughout this patch.
>> 2. put some logic into the early_memremap_pgprot_adjust() and clean the
>> previous unnecessary changes, for example: arch/x86/include/asm/dmi.h,
>> arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c.
>> 3. rewrite two functions, copy_oldmem_page() and
>> copy_oldmem_page_encrypted().
>> 4. distinguish sme_active() and sev_active(); when a distinction isn't
>> needed, mem_encrypt_active() is used.
>
> Lianbo, I think you modified this based on Tom's comment.
> But it would be good to add this only once you have tested SEV and it
> works for you.
>
Thank you, Dave.
That's good advice. Because the upstream kernel (host OS) doesn't work on
the host side (some drivers always go wrong), we can't test SEV for the
kdump patch. The SEV code will be removed in V3; this patch will cover
only SME for kdump. As we previously mentioned, it is probably more
reasonable to improve SEV support in a separate patch.

>> 5. clean up a compile warning in copy_device_table().
>>
>> arch/x86/kernel/crash_dump_64.c | 42 +++++++++++++++++++++++++++++++----------
>> arch/x86/mm/ioremap.c | 4 ++++
>> drivers/iommu/amd_iommu_init.c | 14 +++++++++++++-
>
> Assuming it will not break bisection, it is better to split the iommu
> changes out as one standalone patch and cc the iommu list.
>
>> fs/proc/vmcore.c | 20 +++++++++++++++-----
>> include/linux/crash_dump.h | 5 +++++
>> kernel/kexec_core.c | 12 ++++++++++++
>
> Another two patches, one for kexec_core and another for vmcore.c, would
> be better for review.
>
>> 6 files changed, 81 insertions(+), 16 deletions(-)