2017-07-24 19:08:28

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 00/17] x86: Secure Encrypted Virtualization (AMD)

This part of the Secure Encrypted Virtualization (SEV) series focuses on the
changes required in a guest OS to support SEV.

When SEV is active, the memory contents of the guest OS are transparently
encrypted with a key unique to the guest VM.

SEV guests have the concept of private and shared memory. Private memory is
encrypted with the guest-specific key, while shared memory may be encrypted
with the hypervisor key. Certain types of memory (namely instruction pages and
guest page tables) are always treated as private. For security reasons, all DMA
operations inside the guest must be performed on shared memory.

The SEV feature is enabled by the hypervisor, and the guest can identify it
through the CPUID function 0x8000001f and the 0xc0010131 (MSR_F17H_SEV) MSR.
When enabled, page table entries determine how memory is accessed. If a page
table entry has the memory encryption mask set, then that memory is accessed
using the guest-specific key. Certain memory (instruction pages, page tables)
is always accessed using the guest-specific key.
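
As a rough, freestanding sketch of that detection sequence (illustration only:
the function name is invented for the example, the constants are written out
literally, and rdmsr requires ring 0):

#include <stdint.h>
#include <stdbool.h>

static bool check_sev_active(void)
{
        uint32_t eax, ebx, ecx, edx;
        uint32_t lo, hi;

        /* CPUID 0x8000001f[eax], bit 1: SEV is supported */
        asm volatile("cpuid"
                     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                     : "0" (0x8000001f));
        if (!(eax & (1 << 1)))
                return false;

        /* MSR 0xc0010131 (MSR_F17H_SEV), bit 0: SEV is active in this guest */
        asm volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (0xc0010131));
        return lo & 0x1;
}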

This patch series builds upon the Secure Memory Encryption (SME) feature. Unlike
SME, when SEV is enabled, all the data (e.g. EFI, kernel, initrd, etc.) will
already have been placed into memory as encrypted by the guest BIOS.

The approach that this patch series takes is to encrypt everything possible
starting early in the boot. Since DMA operations inside the guest must be
performed on shared memory, SWIOTLB is used to complete the DMA operations.

The following links provide additional details:

AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

Secure Encrypted Virtualization Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
SEV support has been accepted into EDKII/OVMF BIOS
https://github.com/tianocore/edk2/commits/master

---
This RFC is based on tip/master commit : 22db3de (Merge branch 'x86/mm').
Complete git tree is available: https://github.com/codomania/tip/tree/sev-rfc-3

Changes since v2:
* add documentation
* update early_set_memory_* to use kernel_physical_mapping_init()
to split larger page into smaller (recommended by Boris)
* changes to address v2 feedback

Brijesh Singh (4):
Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV)
description
x86: Add support for changing memory encryption attribute in early
boot
X86/KVM: Provide support to create Guest and HV shared per-CPU
variables
X86/KVM: Clear encryption attribute when SEV is active

Tom Lendacky (13):
x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature
x86/mm: Secure Encrypted Virtualization (SEV) support
x86/mm: Don't attempt to encrypt initrd under SEV
x86, realmode: Don't decrypt trampoline area under SEV
x86/mm: Use encrypted access of boot related data with SEV
x86/mm: Include SEV for encryption memory attribute changes
x86/efi: Access EFI data as encrypted when SEV is active
resource: Consolidate resource walking code
resource: Provide resource struct in resource walk callback
x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory
pages
x86/mm: DMA support for SEV memory encryption
x86/io: Unroll string I/O when SEV is active
x86/boot: Add early boot support when running with SEV active

Documentation/x86/amd-memory-encryption.txt | 29 +++-
arch/powerpc/kernel/machine_kexec_file_64.c | 12 +-
arch/x86/boot/compressed/Makefile | 2 +
arch/x86/boot/compressed/head_64.S | 16 ++
arch/x86/boot/compressed/mem_encrypt.S | 103 ++++++++++++
arch/x86/boot/compressed/misc.h | 2 +
arch/x86/boot/compressed/pagetable.c | 8 +-
arch/x86/entry/vdso/vma.c | 5 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/io.h | 26 ++-
arch/x86/include/asm/mem_encrypt.h | 22 +++
arch/x86/include/asm/msr-index.h | 5 +
arch/x86/include/uapi/asm/kvm_para.h | 1 -
arch/x86/kernel/cpu/amd.c | 30 +++-
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/crash.c | 18 +-
arch/x86/kernel/kvm.c | 46 +++++-
arch/x86/kernel/kvmclock.c | 64 ++++++-
arch/x86/kernel/pmem.c | 2 +-
arch/x86/kernel/setup.c | 6 +-
arch/x86/mm/ioremap.c | 72 ++++++--
arch/x86/mm/mem_encrypt.c | 248 +++++++++++++++++++++++++++-
arch/x86/mm/pageattr.c | 4 +-
arch/x86/platform/efi/efi_64.c | 15 +-
arch/x86/realmode/init.c | 6 +-
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/ioport.h | 7 +-
include/linux/kexec.h | 2 +-
include/linux/mem_encrypt.h | 8 +-
include/linux/percpu-defs.h | 12 ++
kernel/kexec_file.c | 5 +-
kernel/resource.c | 75 +++++----
lib/swiotlb.c | 5 +-
33 files changed, 755 insertions(+), 106 deletions(-)
create mode 100644 arch/x86/boot/compressed/mem_encrypt.S

--
2.9.4


2017-07-24 19:08:46

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) description

Update the amd-memory-encryption document to describe the AMD Secure Encrypted
Virtualization (SEV) feature.

Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/x86/amd-memory-encryption.txt | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
index f512ab7..747df07 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -1,4 +1,5 @@
-Secure Memory Encryption (SME) is a feature found on AMD processors.
+Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
+features found on AMD processors.

SME provides the ability to mark individual pages of memory as encrypted using
the standard x86 page tables. A page that is marked encrypted will be
@@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted when written to
DRAM. SME can therefore be used to protect the contents of DRAM from physical
attacks on the system.

+SEV enables running encrypted virtual machines (VMs) in which the code and data
+of the virtual machine are secured so that a decrypted version is available only
+within the VM itself. SEV guest VMs have the concept of private and shared memory.
+Private memory is encrypted with the guest-specific key, while shared memory
+may be encrypted with the hypervisor key.
+
A page is encrypted when a page table entry has the encryption bit set (see
below on how to determine its position). The encryption bit can also be
specified in the cr3 register, allowing the PGD table to be encrypted. Each
@@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption bit in the PGD entry
for a PUD which results in the PUD pointed to by that entry to not be
encrypted.

-Support for SME can be determined through the CPUID instruction. The CPUID
-function 0x8000001f reports information related to SME:
+When SEV is enabled, certain types of memory (namely instruction pages and guest
+page tables) are always treated as private. For security reasons all DMA
+operations inside the guest must be performed on shared memory. Since the
+memory encryption bit is only controllable by the guest OS when it is operating
+in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces the
+memory encryption bit to 1.
+
+Support for SME and SEV can be determined through the CPUID instruction. The
+CPUID function 0x8000001f reports information related to SME:

0x8000001f[eax]:
Bit[0] indicates support for SME
+ 0x8000001f[eax]:
+ Bit[1] indicates support for SEV
0x8000001f[ebx]:
Bits[5:0] pagetable bit number used to activate memory
encryption
@@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory encryption:
Bit[23] 0 = memory encryption features are disabled
1 = memory encryption features are enabled

+If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine if
+SEV is active:
+
+ 0xc0010131:
+ Bit[0] 0 = memory encryption is not active
+ 1 = memory encryption is active
+
Linux relies on BIOS to set this bit if BIOS has determined that the reduction
in the physical address space as a result of enabling memory encryption (see
CPUID information above) will not conflict with the address space resource
--
2.9.4

2017-07-24 19:09:07

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

From: Tom Lendacky <[email protected]>

Provide support for Secure Encrypted Virtualization (SEV). This initial
support defines a flag that is used by the kernel to determine if it is
running with SEV active.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 2 ++
arch/x86/mm/mem_encrypt.c | 3 +++
include/linux/mem_encrypt.h | 8 +++++++-
3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 8e618fc..9274ec7 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -22,6 +22,7 @@
#ifdef CONFIG_AMD_MEM_ENCRYPT

extern unsigned long sme_me_mask;
+extern unsigned int sev_enabled;

void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
unsigned long decrypted_kernel_vaddr,
@@ -50,6 +51,7 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
#else /* !CONFIG_AMD_MEM_ENCRYPT */

#define sme_me_mask 0UL
+#define sev_enabled 0

static inline void __init sme_early_encrypt(resource_size_t paddr,
unsigned long size) { }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 0fbd092..1e4643e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -40,6 +40,9 @@ static char sme_cmdline_off[] __initdata = "off";
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);

+unsigned int sev_enabled __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sev_enabled);
+
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);

diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 1255f09..ea0831a 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,12 +22,18 @@
#else /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */

#define sme_me_mask 0UL
+#define sev_enabled 0

#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */

static inline bool sme_active(void)
{
- return !!sme_me_mask;
+ return (sme_me_mask && !sev_enabled);
+}
+
+static inline bool sev_active(void)
+{
+ return (sme_me_mask && sev_enabled);
}

static inline unsigned long sme_get_me_mask(void)
--
2.9.4

2017-07-24 19:09:21

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

From: Tom Lendacky <[email protected]>

Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature. SEV is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR). Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/amd.c | 30 +++++++++++++++++++++++++-----
arch/x86/kernel/cpu/scattered.c | 1 +
4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 14f0f29..b6ae647 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,7 @@
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
+#define X86_FEATURE_SEV ( 7*32+11) /* AMD Secure Encrypted Virtualization */

#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 17f5c12..e399d68 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -378,6 +378,8 @@
#define MSR_K7_PERFCTR3 0xc0010007
#define MSR_K7_CLK_CTL 0xc001001b
#define MSR_K7_HWCR 0xc0010015
+#define MSR_K7_HWCR_SMMLOCK_BIT 0
+#define MSR_K7_HWCR_SMMLOCK BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
#define MSR_K7_FID_VID_CTL 0xc0010041
#define MSR_K7_FID_VID_STATUS 0xc0010042

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 110ca5d..c413f04 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -618,11 +618,16 @@ static void early_init_amd(struct cpuinfo_x86 *c)
set_cpu_bug(c, X86_BUG_AMD_E400);

/*
- * BIOS support is required for SME. If BIOS has enabled SME then
- * adjust x86_phys_bits by the SME physical address space reduction
- * value. If BIOS has not enabled SME then don't advertise the
- * feature (set in scattered.c). Also, since the SME support requires
- * long mode, don't advertise the feature under CONFIG_X86_32.
+ * BIOS support is required for SME and SEV.
+ * For SME: If BIOS has enabled SME then adjust x86_phys_bits by
+ * the SME physical address space reduction value.
+ * If BIOS has not enabled SME then don't advertise the
+ * SME feature (set in scattered.c).
+ * For SEV: If BIOS has not enabled SEV then don't advertise the
+ * SEV feature (set in scattered.c).
+ *
+ * In all cases, since support for SME and SEV requires long mode,
+ * don't advertise the feature under CONFIG_X86_32.
*/
if (cpu_has(c, X86_FEATURE_SME)) {
u64 msr;
@@ -637,6 +642,21 @@ static void early_init_amd(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SME);
}
}
+
+ if (cpu_has(c, X86_FEATURE_SEV)) {
+ if (IS_ENABLED(CONFIG_X86_32)) {
+ clear_cpu_cap(c, X86_FEATURE_SEV);
+ } else {
+ u64 syscfg, hwcr;
+
+ /* Check if SEV is enabled */
+ rdmsrl(MSR_K8_SYSCFG, syscfg);
+ rdmsrl(MSR_K7_HWCR, hwcr);
+ if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT) ||
+ !(hwcr & MSR_K7_HWCR_SMMLOCK))
+ clear_cpu_cap(c, X86_FEATURE_SEV);
+ }
+ }
}

static void init_amd_k8(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 05459ad..63a78d5 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -32,6 +32,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
{ X86_FEATURE_SME, CPUID_EAX, 0, 0x8000001f, 0 },
+ { X86_FEATURE_SEV, CPUID_EAX, 1, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
};

--
2.9.4

2017-07-24 19:09:42

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 04/17] x86/mm: Don't attempt to encrypt initrd under SEV

From: Tom Lendacky <[email protected]>

When SEV is active, the initrd/initramfs will already have been placed in
memory encrypted, so do not try to encrypt it.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/setup.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0bfe0c1..01d56a1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -379,9 +379,11 @@ static void __init reserve_initrd(void)
* If SME is active, this memory will be marked encrypted by the
* kernel when it is accessed (including relocation). However, the
* ramdisk image was loaded decrypted by the bootloader, so make
- * sure that it is encrypted before accessing it.
+ * sure that it is encrypted before accessing it. For SEV the
+ * ramdisk will already be encrypted, so only do this for SME.
*/
- sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
+ if (sme_active())
+ sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);

initrd_start = 0;

--
2.9.4

2017-07-24 19:10:14

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

From: Tom Lendacky <[email protected]>

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/platform/efi/efi_64.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e8388..1ecb3f6 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
#include <linux/reboot.h>
#include <linux/slab.h>
#include <linux/ucs2_string.h>
+#include <linux/mem_encrypt.h>

#include <asm/setup.h>
#include <asm/page.h>
@@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
* as trim_bios_range() will reserve the first page and isolate it away
* from memory allocators anyway.
*/
- if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+ pf = _PAGE_RW;
+ if (sev_active())
+ pf |= _PAGE_ENC;
+ if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;

+ if (sev_active())
+ flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
pgd_t *pgd = efi_pgd;
int err1, err2;

+ if (sev_active())
+ pf |= _PAGE_ENC;
+
/* Update the 1:1 mapping */
pfn = md->phys_addr >> PAGE_SHIFT;
err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, pf);
@@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;

+ if (sev_active())
+ pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
}
--
2.9.4

2017-07-24 19:10:29

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

From: Tom Lendacky <[email protected]>

The current code checks only for sme_active() when determining whether
to perform the encryption attribute change. Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/mm/pageattr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dfb7d65..b726b23 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
unsigned long start;
int ret;

- /* Nothing to do if the SME is not active */
- if (!sme_active())
+ /* Nothing to do if SME and SEV are not active */
+ if (!sme_active() && !sev_active())
return 0;

/* Should not be working on unaligned addresses */
--
2.9.4

2017-07-24 19:10:46

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

From: Tom Lendacky <[email protected]>

When Secure Encrypted Virtualization (SEV) is active, boot data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.
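
For context, a hedged sketch of the kind of consumer this affects (the function
below is illustrative and not part of this patch); with this change the
early_memremap() mapping keeps the C-bit under SEV, so the data decrypts
correctly:

static void __init peek_setup_data(u64 pa_data)
{
        struct setup_data *data;

        data = early_memremap(pa_data, sizeof(*data));
        if (!data)
                return;

        pr_info("setup_data type %u, len %u\n", data->type, data->len);
        early_memunmap(data, sizeof(*data));
}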

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/mm/ioremap.c | 44 ++++++++++++++++++++++++++++++++------------
1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 34f0e18..c0be7cf 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -422,6 +422,9 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
* areas should be mapped decrypted. And since the encryption key can
* change across reboots, persistent memory should also be mapped
* decrypted.
+ *
+ * If SEV is active, that implies that BIOS/UEFI also ran encrypted so
+ * only persistent memory should be mapped decrypted.
*/
static bool memremap_should_map_decrypted(resource_size_t phys_addr,
unsigned long size)
@@ -458,6 +461,11 @@ static bool memremap_should_map_decrypted(resource_size_t phys_addr,
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+ /* For SEV, these areas are encrypted */
+ if (sev_active())
+ break;
+ /* Fallthrough */
+
case E820_TYPE_PRAM:
return true;
default:
@@ -581,7 +589,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
unsigned long flags)
{
- if (!sme_active())
+ if (!sme_active() && !sev_active())
return true;

if (flags & MEMREMAP_ENC)
@@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
if (flags & MEMREMAP_DEC)
return false;

- if (memremap_is_setup_data(phys_addr, size) ||
- memremap_is_efi_data(phys_addr, size) ||
- memremap_should_map_decrypted(phys_addr, size))
- return false;
+ if (sme_active()) {
+ if (memremap_is_setup_data(phys_addr, size) ||
+ memremap_is_efi_data(phys_addr, size) ||
+ memremap_should_map_decrypted(phys_addr, size))
+ return false;
+ } else if (sev_active()) {
+ if (memremap_should_map_decrypted(phys_addr, size))
+ return false;
+ }

return true;
}
@@ -608,15 +621,22 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
unsigned long size,
pgprot_t prot)
{
- if (!sme_active())
+ if (!sme_active() && !sev_active())
return prot;

- if (early_memremap_is_setup_data(phys_addr, size) ||
- memremap_is_efi_data(phys_addr, size) ||
- memremap_should_map_decrypted(phys_addr, size))
- prot = pgprot_decrypted(prot);
- else
- prot = pgprot_encrypted(prot);
+ if (sme_active()) {
+ if (early_memremap_is_setup_data(phys_addr, size) ||
+ memremap_is_efi_data(phys_addr, size) ||
+ memremap_should_map_decrypted(phys_addr, size))
+ prot = pgprot_decrypted(prot);
+ else
+ prot = pgprot_encrypted(prot);
+ } else if (sev_active()) {
+ if (memremap_should_map_decrypted(phys_addr, size))
+ prot = pgprot_decrypted(prot);
+ else
+ prot = pgprot_encrypted(prot);
+ }

return prot;
}
--
2.9.4

2017-07-24 19:11:00

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

From: Tom Lendacky <[email protected]>

When SEV is active, the trampoline area needs to be in encrypted memory,
so only mark the area decrypted if SME is active.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/realmode/init.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 1f71980..c7eeca7 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
/*
* If SME is active, the trampoline area will need to be in
* decrypted memory in order to bring up other processors
- * successfully.
+ * successfully. For SEV the trampoline area needs to be in
+ * encrypted memory, so only do this for SME.
*/
- set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+ if (sme_active())
+ set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);

memcpy(base, real_mode_blob, size);

--
2.9.4

2017-07-24 19:11:29

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

From: Tom Lendacky <[email protected]>

The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
functions each have much of the same code. Create a new function that
consolidates the common code from these functions in one place to reduce
the amount of duplicated code.
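
For reference, a hedged example of the (unchanged) calling convention these
helpers keep, modeled on get_nr_ram_ranges_callback() in arch/x86/kernel/crash.c
(the function names below are invented for illustration):

static int count_ram_range(u64 start, u64 end, void *arg)
{
        unsigned int *nr = arg;

        (*nr)++;
        return 0;               /* zero means: keep walking */
}

static unsigned int count_system_ram_ranges(void)
{
        unsigned int nr = 0;

        walk_system_ram_res(0, -1, &nr, count_ram_range);
        return nr;
}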

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
kernel/resource.c | 53 ++++++++++++++++++++++++++---------------------------
1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..7b20b3e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
res->start = p->start;
if (res->end > p->end)
res->end = p->end;
+ res->desc = p->desc;
return 0;
}

+static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
+ bool first_level_children_only,
+ void *arg, int (*func)(u64, u64, void *))
+{
+ u64 orig_end = res->end;
+ int ret = -1;
+
+ while ((res->start < res->end) &&
+ !find_next_iomem_res(res, desc, first_level_children_only)) {
+ ret = (*func)(res->start, res->end, arg);
+ if (ret)
+ break;
+
+ res->start = res->end + 1;
+ res->end = orig_end;
+ }
+
+ return ret;
+}
+
/*
* Walks through iomem resources and calls func() with matching resource
* ranges. This walks through whole tree and not just first level children.
@@ -418,26 +439,12 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
u64 end, void *arg, int (*func)(u64, u64, void *))
{
struct resource res;
- u64 orig_end;
- int ret = -1;

res.start = start;
res.end = end;
res.flags = flags;
- orig_end = res.end;
-
- while ((res.start < res.end) &&
- (!find_next_iomem_res(&res, desc, false))) {
-
- ret = (*func)(res.start, res.end, arg);
- if (ret)
- break;
-
- res.start = res.end + 1;
- res.end = orig_end;
- }

- return ret;
+ return __walk_iomem_res_desc(&res, desc, false, arg, func);
}

/*
@@ -451,22 +458,13 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *))
{
struct resource res;
- u64 orig_end;
- int ret = -1;

res.start = start;
res.end = end;
res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
- orig_end = res.end;
- while ((res.start < res.end) &&
- (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
- ret = (*func)(res.start, res.end, arg);
- if (ret)
- break;
- res.start = res.end + 1;
- res.end = orig_end;
- }
- return ret;
+
+ return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+ arg, func);
}

#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
@@ -508,6 +506,7 @@ static int __is_ram(unsigned long pfn, unsigned long nr_pages, void *arg)
{
return 1;
}
+
/*
* This generic page_is_ram() returns true if specified address is
* registered as System RAM in iomem_resource list.
--
2.9.4

2017-07-24 19:11:59

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

From: Tom Lendacky <[email protected]>

In preparation for a new function that will need additional resource
information during the resource walk, update the resource walk callback to
pass the resource structure. Since the current callback's start and end
arguments are pulled from the resource structure, the callback functions
can obtain them from the resource structure directly.
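
A hedged sketch of what the conversion means for the invented example callback
used under the previous patch; the range is now read from the passed-in
resource:

/*
 * Before this patch: int count_ram_range(u64 start, u64 end, void *arg)
 * After: the same information comes from the resource itself.
 */
static int count_ram_range(struct resource *res, void *arg)
{
        unsigned int *nr = arg;

        pr_debug("RAM range %pa-%pa\n", &res->start, &res->end);
        (*nr)++;
        return 0;
}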

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/powerpc/kernel/machine_kexec_file_64.c | 12 +++++++++---
arch/x86/kernel/crash.c | 18 +++++++++---------
arch/x86/kernel/pmem.c | 2 +-
include/linux/ioport.h | 4 ++--
include/linux/kexec.h | 2 +-
kernel/kexec_file.c | 5 +++--
kernel/resource.c | 9 +++++----
7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c b/arch/powerpc/kernel/machine_kexec_file_64.c
index 992c0d2..e4395f9 100644
--- a/arch/powerpc/kernel/machine_kexec_file_64.c
+++ b/arch/powerpc/kernel/machine_kexec_file_64.c
@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
* and that value will be returned. If all free regions are visited without
* func returning non-zero, then zero will be returned.
*/
-int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+ int (*func)(struct resource *, void *))
{
int ret = 0;
u64 i;
phys_addr_t mstart, mend;
+ struct resource res = { };

if (kbuf->top_down) {
for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
* range while in kexec, end points to the last byte
* in the range.
*/
- ret = func(mstart, mend - 1, kbuf);
+ res.start = mstart;
+ res.end = mend - 1;
+ ret = func(&res, kbuf);
if (ret)
break;
}
@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
* range while in kexec, end points to the last byte
* in the range.
*/
- ret = func(mstart, mend - 1, kbuf);
+ res.start = mstart;
+ res.end = mend - 1;
+ ret = func(&res, kbuf);
if (ret)
break;
}
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44404e2..815008c 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
}

#ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
{
unsigned int *nr_ranges = arg;

@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data *ced,
return ret;
}

-static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
+static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
{
struct crash_elf_data *ced = arg;
Elf64_Ehdr *ehdr;
@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
ehdr = ced->ehdr;

/* Exclude unwanted mem ranges */
- ret = elf_header_exclude_ranges(ced, start, end);
+ ret = elf_header_exclude_ranges(ced, res->start, res->end);
if (ret)
return ret;

@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
return 0;
}

-static int memmap_entry_callback(u64 start, u64 end, void *arg)
+static int memmap_entry_callback(struct resource *res, void *arg)
{
struct crash_memmap_data *cmd = arg;
struct boot_params *params = cmd->params;
struct e820_entry ei;

- ei.addr = start;
- ei.size = end - start + 1;
+ ei.addr = res->start;
+ ei.size = res->end - res->start + 1;
ei.type = cmd->type;
add_e820_entry(params, &ei);

@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
return ret;
}

-static int determine_backup_region(u64 start, u64 end, void *arg)
+static int determine_backup_region(struct resource *res, void *arg)
{
struct kimage *image = arg;

- image->arch.backup_src_start = start;
- image->arch.backup_src_sz = end - start + 1;
+ image->arch.backup_src_start = res->start;
+ image->arch.backup_src_sz = res->end - res->start + 1;

/* Expecting only one range for backup region */
return 1;
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 0c5315d..297bb71 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -6,7 +6,7 @@
#include <linux/init.h>
#include <linux/ioport.h>

-static int found(u64 start, u64 end, void *data)
+static int found(struct resource *res, void *data)
{
return 1;
}
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6230064..1c66b9c 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -269,10 +269,10 @@ walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *));
extern int
walk_system_ram_res(u64 start, u64 end, void *arg,
- int (*func)(u64, u64, void *));
+ int (*func)(struct resource *, void *));
extern int
walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
- void *arg, int (*func)(u64, u64, void *));
+ void *arg, int (*func)(struct resource *, void *));

/* True if any part of r1 overlaps r2 */
static inline bool resource_overlaps(struct resource *r1, struct resource *r2)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 2b7590f..d672f95 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -159,7 +159,7 @@ struct kexec_buf {
};

int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
- int (*func)(u64, u64, void *));
+ int (*func)(struct resource *, void *));
extern int kexec_add_buffer(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);
#endif /* CONFIG_KEXEC_FILE */
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 9f48f44..e5bcd94 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -406,9 +406,10 @@ static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end,
return 1;
}

-static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
+static int locate_mem_hole_callback(struct resource *res, void *arg)
{
struct kexec_buf *kbuf = (struct kexec_buf *)arg;
+ u64 start = res->start, end = res->end;
unsigned long sz = end - start + 1;

/* Returning 0 will take to next memory range */
@@ -437,7 +438,7 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
* func returning non-zero, then zero will be returned.
*/
int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
- int (*func)(u64, u64, void *))
+ int (*func)(struct resource *, void *))
{
if (kbuf->image->type == KEXEC_TYPE_CRASH)
return walk_iomem_res_desc(crashk_res.desc,
diff --git a/kernel/resource.c b/kernel/resource.c
index 7b20b3e..5f9ee7bb0 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -403,14 +403,15 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,

static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
bool first_level_children_only,
- void *arg, int (*func)(u64, u64, void *))
+ void *arg,
+ int (*func)(struct resource *, void *))
{
u64 orig_end = res->end;
int ret = -1;

while ((res->start < res->end) &&
!find_next_iomem_res(res, desc, first_level_children_only)) {
- ret = (*func)(res->start, res->end, arg);
+ ret = (*func)(res, arg);
if (ret)
break;

@@ -436,7 +437,7 @@ static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
* <linux/ioport.h> and set it in 'desc' of a target resource entry.
*/
int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
- u64 end, void *arg, int (*func)(u64, u64, void *))
+ u64 end, void *arg, int (*func)(struct resource *, void *))
{
struct resource res;

@@ -455,7 +456,7 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
* ranges.
*/
int walk_system_ram_res(u64 start, u64 end, void *arg,
- int (*func)(u64, u64, void *))
+ int (*func)(struct resource *, void *))
{
struct resource res;

--
2.9.4

2017-07-24 19:12:37

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 12/17] x86/mm: DMA support for SEV memory encryption

From: Tom Lendacky <[email protected]>

While SEV is active, memory mapped as encrypted cannot be used directly for
DMA: data written by a device is not encrypted and data read by a device is
not decrypted. In order for DMA to work properly when SEV is active, the
SWIOTLB bounce buffers must be used.
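
From a driver's point of view nothing changes; as a hedged sketch (the driver
and function names are invented), a coherent allocation like the one below is
simply served by the new DMA ops when SEV is active, which either clear the
C-bit on the allocated pages or fall back to the decrypted SWIOTLB pool:

static int my_driver_alloc_ring(struct device *dev, size_t size)
{
        dma_addr_t dma;
        void *ring;

        /* Routed through sme_dma_ops when sev_active() */
        ring = dma_alloc_coherent(dev, size, &dma, GFP_KERNEL);
        if (!ring)
                return -ENOMEM;

        /* ... program 'dma' into the device, use 'ring' from the CPU ... */

        dma_free_coherent(dev, size, ring, dma);
        return 0;
}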

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/mm/mem_encrypt.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
lib/swiotlb.c | 5 +--
2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 1e4643e..5e5d460 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -191,8 +191,86 @@ void __init sme_early_init(void)
/* Update the protection map with memory encryption mask */
for (i = 0; i < ARRAY_SIZE(protection_map); i++)
protection_map[i] = pgprot_encrypted(protection_map[i]);
+
+ if (sev_active())
+ swiotlb_force = SWIOTLB_FORCE;
+}
+
+static void *sme_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
+{
+ unsigned long dma_mask;
+ unsigned int order;
+ struct page *page;
+ void *vaddr = NULL;
+
+ dma_mask = dma_alloc_coherent_mask(dev, gfp);
+ order = get_order(size);
+
+ /*
+ * Memory will be memset to zero after marking decrypted, so don't
+ * bother clearing it before.
+ */
+ gfp &= ~__GFP_ZERO;
+
+ page = alloc_pages_node(dev_to_node(dev), gfp, order);
+ if (page) {
+ dma_addr_t addr;
+
+ /*
+ * Since we will be clearing the encryption bit, check the
+ * mask with it already cleared.
+ */
+ addr = __sme_clr(phys_to_dma(dev, page_to_phys(page)));
+ if ((addr + size) > dma_mask) {
+ __free_pages(page, get_order(size));
+ } else {
+ vaddr = page_address(page);
+ *dma_handle = addr;
+ }
+ }
+
+ if (!vaddr)
+ vaddr = swiotlb_alloc_coherent(dev, size, dma_handle, gfp);
+
+ if (!vaddr)
+ return NULL;
+
+ /* Clear the SME encryption bit for DMA use if not swiotlb area */
+ if (!is_swiotlb_buffer(dma_to_phys(dev, *dma_handle))) {
+ set_memory_decrypted((unsigned long)vaddr, 1 << order);
+ memset(vaddr, 0, PAGE_SIZE << order);
+ *dma_handle = __sme_clr(*dma_handle);
+ }
+
+ return vaddr;
+}
+
+static void sme_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
+{
+ /* Set the SME encryption bit for re-use if not swiotlb area */
+ if (!is_swiotlb_buffer(dma_to_phys(dev, dma_handle)))
+ set_memory_encrypted((unsigned long)vaddr,
+ 1 << get_order(size));
+
+ swiotlb_free_coherent(dev, size, vaddr, dma_handle);
}

+static const struct dma_map_ops sme_dma_ops = {
+ .alloc = sme_alloc,
+ .free = sme_free,
+ .map_page = swiotlb_map_page,
+ .unmap_page = swiotlb_unmap_page,
+ .map_sg = swiotlb_map_sg_attrs,
+ .unmap_sg = swiotlb_unmap_sg_attrs,
+ .sync_single_for_cpu = swiotlb_sync_single_for_cpu,
+ .sync_single_for_device = swiotlb_sync_single_for_device,
+ .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
+ .sync_sg_for_device = swiotlb_sync_sg_for_device,
+ .mapping_error = swiotlb_dma_mapping_error,
+};
+
/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void)
{
@@ -202,6 +280,14 @@ void __init mem_encrypt_init(void)
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();

+ /*
+ * With SEV, DMA operations cannot use encryption. New DMA ops
+ * are required in order to mark the DMA areas as decrypted or
+ * to use bounce buffers.
+ */
+ if (sev_active())
+ dma_ops = &sme_dma_ops;
+
pr_info("AMD Secure Memory Encryption (SME) active\n");
}

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 8c6c83e..85fed2f 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -507,8 +507,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");

- if (sme_active())
- pr_warn_once("SME is active and system is using DMA bounce buffers\n");
+ if (sme_active() || sev_active())
+ pr_warn_once("%s is active and system is using DMA bounce buffers\n",
+ sme_active() ? "SME" : "SEV");

mask = dma_get_seg_boundary(hwdev);

--
2.9.4

2017-07-24 19:12:57

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

From: Tom Lendacky <[email protected]>

In order for memory pages to be properly mapped when SEV is active, we
need to use the PAGE_KERNEL protection attribute as the base protection.
This will ensure that memory mappings of, e.g., ACPI tables receive the
proper mapping attributes.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/mm/ioremap.c | 28 ++++++++++++++++++++++++++++
include/linux/ioport.h | 3 +++
kernel/resource.c | 17 +++++++++++++++++
3 files changed, 48 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c0be7cf..7b27332 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
return 0;
}

+static int __ioremap_res_desc_other(struct resource *res, void *arg)
+{
+ return (res->desc != IORES_DESC_NONE);
+}
+
+/*
+ * This function returns true if the target memory is marked as
+ * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
+ * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
+ */
+static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
+{
+ u64 start, end;
+
+ start = (u64)addr;
+ end = start + size - 1;
+
+ return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
+}
+
/*
* Remap an arbitrary physical address space into the kernel virtual
* address space. It transparently creates kernel huge I/O mapping when
@@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
pcm = new_pcm;
}

+ /*
+ * If the page being mapped is in memory and SEV is active then
+ * make sure the memory encryption attribute is enabled in the
+ * resulting mapping.
+ */
prot = PAGE_KERNEL_IO;
+ if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
+ prot = pgprot_encrypted(prot);
+
switch (pcm) {
case _PAGE_CACHE_MODE_UC:
default:
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 1c66b9c..297f5b8 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -268,6 +268,9 @@ extern int
walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *));
extern int
+walk_mem_res(u64 start, u64 end, void *arg,
+ int (*func)(struct resource *, void *));
+extern int
walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
extern int
diff --git a/kernel/resource.c b/kernel/resource.c
index 5f9ee7bb0..ec3fa0c 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -468,6 +468,23 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
arg, func);
}

+/*
+ * This function calls the @func callback against all memory ranges, which
+ * are ranges marked as IORESOURCE_MEM and IORESOURCE_BUSY.
+ */
+int walk_mem_res(u64 start, u64 end, void *arg,
+ int (*func)(struct resource *, void *))
+{
+ struct resource res;
+
+ res.start = start;
+ res.end = end;
+ res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+
+ return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+ arg, func);
+}
+
#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)

/*
--
2.9.4

2017-07-24 19:14:43

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 17/17] X86/KVM: Clear encryption attribute when SEV is active

The guest physical memory areas holding the struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor, which
periodically updates their contents. When SEV is active, we must clear the
encryption attribute from these shared memory pages so that both the
hypervisor and the guest can access the data.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/entry/vdso/vma.c | 5 ++--
arch/x86/kernel/kvmclock.c | 64 +++++++++++++++++++++++++++++++++++++++-------
2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 726355c..ff50251 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm,
struct pvclock_vsyscall_time_info *pvti =
pvclock_pvti_cpu0_va();
if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
- ret = vm_insert_pfn(
+ ret = vm_insert_pfn_prot(
vma,
vmf->address,
- __pa(pvti) >> PAGE_SHIFT);
+ __pa(pvti) >> PAGE_SHIFT,
+ pgprot_decrypted(vma->vm_page_prot));
}
} else if (sym_offset == image->sym_hvclock_page) {
struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d889676..f3a8101 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>
#include <linux/sched/clock.h>

+#include <asm/mem_encrypt.h>
#include <asm/x86_init.h>
#include <asm/reboot.h>
#include <asm/kvmclock.h>
@@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);

/* The hypervisor will put information about time periodically here */
static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock wall_clock;
+static struct pvclock_wall_clock *wall_clock;

struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
{
@@ -64,15 +65,18 @@ static void kvm_get_wallclock(struct timespec *now)
int low, high;
int cpu;

- low = (int)__pa_symbol(&wall_clock);
- high = ((u64)__pa_symbol(&wall_clock) >> 32);
+ if (!wall_clock)
+ return;
+
+ low = (int)slow_virt_to_phys(wall_clock);
+ high = ((u64)slow_virt_to_phys(wall_clock) >> 32);

native_write_msr(msr_kvm_wall_clock, low, high);

cpu = get_cpu();

vcpu_time = &hv_clock[cpu].pvti;
- pvclock_read_wallclock(&wall_clock, vcpu_time, now);
+ pvclock_read_wallclock(wall_clock, vcpu_time, now);

put_cpu();
}
@@ -249,11 +253,39 @@ static void kvm_shutdown(void)
native_machine_shutdown();
}

+static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
+ phys_addr_t align)
+{
+ phys_addr_t mem;
+
+ mem = memblock_alloc(size, align);
+ if (!mem)
+ return 0;
+
+ if (sev_active()) {
+ if (early_set_memory_decrypted(mem, size))
+ goto e_free;
+ }
+
+ return mem;
+e_free:
+ memblock_free(mem, size);
+ return 0;
+}
+
+static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
+{
+ if (sev_active())
+ early_set_memory_encrypted(addr, size);
+
+ memblock_free(addr, size);
+}
+
void __init kvmclock_init(void)
{
struct pvclock_vcpu_time_info *vcpu_time;
- unsigned long mem;
- int size, cpu;
+ unsigned long mem, mem_wall_clock;
+ int size, cpu, wall_clock_size;
u8 flags;

size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
@@ -270,15 +302,29 @@ void __init kvmclock_init(void)
printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);

- mem = memblock_alloc(size, PAGE_SIZE);
- if (!mem)
+ wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
+ mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
+ if (!mem_wall_clock)
return;
+
+ wall_clock = __va(mem_wall_clock);
+ memset(wall_clock, 0, wall_clock_size);
+
+ mem = kvm_memblock_alloc(size, PAGE_SIZE);
+ if (!mem) {
+ kvm_memblock_free(mem_wall_clock, wall_clock_size);
+ wall_clock = NULL;
+ return;
+ }
+
hv_clock = __va(mem);
memset(hv_clock, 0, size);

if (kvm_register_clock("primary cpu clock")) {
hv_clock = NULL;
- memblock_free(mem, size);
+ kvm_memblock_free(mem, size);
+ kvm_memblock_free(mem_wall_clock, wall_clock_size);
+ wall_clock = NULL;
return;
}

--
2.9.4

2017-07-24 19:15:00

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

Some KVM-specific MSRs (steal-time, asyncpf, apic_eoi) have per-CPU
variables that are allocated at compile time and whose physical addresses
are shared with the hypervisor. This presents a challenge when SEV is
active in the guest OS: the guest memory is encrypted with the guest key,
so the hypervisor is not able to modify it. When SEV is active, we need to
clear the encryption attribute (aka C-bit) of these shared physical
addresses so that both the guest and the hypervisor can access the data.

To solve this problem, I have tried these three options:

1) Convert the static per-CPU variables to dynamic per-CPU allocations and,
when SEV is detected, clear the C-bit from the page table. But while doing
so I found that the per-CPU dynamic allocator was not ready when
kvm_guest_cpu_init() was called.

2) Since the C-bit works at PAGE_SIZE granularity, add some extra padding
to 'struct kvm_steal_time' to make it PAGE_SIZE and then, at runtime, clear
the encryption attribute of the full page. The downside of this is that we
need to modify the structure, which may break compatibility.

3) Define a new per-CPU section (.data..percpu..hv_shared) which will be
used to hold the compile-time shared per-CPU variables. When SEV is
detected we map this section without the C-bit.

This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHARED
macro to create a compile-time per-CPU variable. When SEV is detected we
clear the C-bit from the shared per-CPU variables.
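
As a short illustration (modeled on kvm_register_steal_time(), which the hunk
below converts), defining and registering one of these shared variables looks
like this:

static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) __aligned(64);

static void register_steal_time_example(void)
{
        /* Accessors are unchanged; only the backing mapping loses the C-bit. */
        u64 pa = slow_virt_to_phys(this_cpu_ptr(&steal_time));

        wrmsrl(MSR_KVM_STEAL_TIME, pa | KVM_MSR_ENABLED);
}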

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/kvm.c | 46 ++++++++++++++++++++++++++++++++++++---
include/asm-generic/vmlinux.lds.h | 3 +++
include/linux/percpu-defs.h | 12 ++++++++++
3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 71c17a5..1f6fec8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)

early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);

-static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) __aligned(64);
static int has_steal_clock = 0;

/*
@@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
cpu, (unsigned long long) slow_virt_to_phys(st));
}

-static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
+static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;

static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
{
@@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
}

+/* NOTE: function is marked as __ref because it is used by __init functions */
+static int __ref kvm_map_hv_shared_decrypted(void)
+{
+ static int once, ret;
+ int cpu;
+
+ if (once)
+ return ret;
+
+ /*
+ * Iterate through all possible CPU's and clear the C-bit from
+ * percpu variables.
+ */
+ for_each_possible_cpu(cpu) {
+ struct kvm_vcpu_pv_apf_data *apf;
+ unsigned long pa;
+
+ apf = &per_cpu(apf_reason, cpu);
+ pa = slow_virt_to_phys(apf);
+ sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
+ ret = early_set_memory_decrypted(pa, PAGE_SIZE);
+ if (ret)
+ break;
+ }
+
+ once = 1;
+ return ret;
+}
+
static void kvm_guest_cpu_init(void)
{
if (!kvm_para_available())
return;

+ /*
+ * When SEV is active, map the shared percpu as unencrypted so that
+ * both guest and hypervisor can access the data.
+ */
+ if (sev_active()) {
+ if (kvm_map_hv_shared_decrypted()) {
+ printk(KERN_ERR "Failed to map percpu as unencrypted\n");
+ return;
+ }
+ }
+
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index da0be9a..52854cf 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -783,6 +783,9 @@
. = ALIGN(cacheline); \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..hv_shared) \
+ . = ALIGN(PAGE_SIZE); \
VMLINUX_SYMBOL(__per_cpu_end) = .;

/**
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index 8f16299..f74b0c3 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -173,6 +173,18 @@
DEFINE_PER_CPU_SECTION(type, name, "..read_mostly")

/*
+ * Declaration/definition used for per-CPU variables that must be shared
+ * between hypervisor and guest OS.
+ */
+#ifdef CONFIG_VIRTUALIZATION
+#define DECLARE_PER_CPU_HV_SHARED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, "..hv_shared")
+
+#define DEFINE_PER_CPU_HV_SHARED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, "..hv_shared")
+#endif
+
+/*
* Intermodule exports for per-CPU variables. sparse forgets about
* address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
* noop if __CHECKER__.
--
2.9.4

2017-07-24 19:15:17

by Brijesh Singh

Subject: [RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

Some KVM-specific custom MSRs share a guest physical address with the
hypervisor. When SEV is active, the shared physical address must be mapped
with the encryption attribute cleared so that both the hypervisor and the
guest can access the data.

Add APIs to change the memory encryption attribute in early boot code.
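
A hedged usage sketch (modeled on the kvmclock change later in this series; the
helper name is invented): allocate a page-aligned region early and clear its
C-bit so the hypervisor can write into it.

static phys_addr_t __init alloc_hv_shared_page(void)
{
        phys_addr_t pa = memblock_alloc(PAGE_SIZE, PAGE_SIZE);

        if (!pa)
                return 0;

        /* Under SEV, make the page accessible to the hypervisor. */
        if (sev_active() && early_set_memory_decrypted(pa, PAGE_SIZE)) {
                memblock_free(pa, PAGE_SIZE);
                return 0;
        }

        return pa;
}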

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 17 ++++++
arch/x86/mm/mem_encrypt.c | 117 +++++++++++++++++++++++++++++++++++++
2 files changed, 134 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9cb6472..30b539e 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -46,6 +46,11 @@ void __init sme_early_init(void);
void __init sme_encrypt_kernel(void);
void __init sme_enable(struct boot_params *bp);

+int __init early_set_memory_decrypted(resource_size_t paddr,
+ unsigned long size);
+int __init early_set_memory_encrypted(resource_size_t paddr,
+ unsigned long size);
+
/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void);

@@ -69,6 +74,18 @@ static inline void __init sme_early_init(void) { }
static inline void __init sme_encrypt_kernel(void) { }
static inline void __init sme_enable(struct boot_params *bp) { }

+static inline int __init early_set_memory_decrypted(resource_size_t paddr,
+ unsigned long size)
+{
+ return 0;
+}
+
+static inline int __init early_set_memory_encrypted(resource_size_t paddr,
+ unsigned long size)
+{
+ return 0;
+}
+
#endif /* CONFIG_AMD_MEM_ENCRYPT */

/*
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ed8780e..d174b1c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -28,6 +28,8 @@
#include <asm/msr.h>
#include <asm/cmdline.h>

+#include "mm_internal.h"
+
static char sme_cmdline_arg[] __initdata = "mem_encrypt";
static char sme_cmdline_on[] __initdata = "on";
static char sme_cmdline_off[] __initdata = "off";
@@ -257,6 +259,121 @@ static void sme_free(struct device *dev, size_t size, void *vaddr,
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
}

+static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+{
+ pgprot_t old_prot, new_prot;
+ unsigned long pfn;
+ pte_t new_pte;
+
+ switch (level) {
+ case PG_LEVEL_4K:
+ pfn = pte_pfn(*kpte);
+ old_prot = pte_pgprot(*kpte);
+ break;
+ case PG_LEVEL_2M:
+ pfn = pmd_pfn(*(pmd_t *)kpte);
+ old_prot = pmd_pgprot(*(pmd_t *)kpte);
+ break;
+ case PG_LEVEL_1G:
+ pfn = pud_pfn(*(pud_t *)kpte);
+ old_prot = pud_pgprot(*(pud_t *)kpte);
+ break;
+ default:
+ return;
+ }
+
+ new_prot = old_prot;
+ if (enc)
+ pgprot_val(new_prot) |= _PAGE_ENC;
+ else
+ pgprot_val(new_prot) &= ~_PAGE_ENC;
+
+ /* if prot is same then do nothing */
+ if (pgprot_val(old_prot) == pgprot_val(new_prot))
+ return;
+
+ new_pte = pfn_pte(pfn, new_prot);
+ set_pte_atomic(kpte, new_pte);
+}
+
+static int __init early_set_memory_enc_dec(resource_size_t paddr,
+ unsigned long size, bool enc)
+{
+ unsigned long vaddr, vaddr_end, vaddr_next;
+ unsigned long psize, pmask;
+ int split_page_size_mask;
+ pte_t *kpte;
+ int level;
+
+ vaddr = (unsigned long)__va(paddr);
+ vaddr_next = vaddr;
+ vaddr_end = vaddr + size;
+
+ /*
+ * We are going to change the physical page attribute from C=1 to C=0
+ * or vice versa. Flush the caches to ensure that data is written into
+ * memory with correct C-bit before we change attribute.
+ */
+ clflush_cache_range(__va(paddr), size);
+
+ for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+ kpte = lookup_address(vaddr, &level);
+ if (!kpte || pte_none(*kpte))
+ return 1;
+
+ if (level == PG_LEVEL_4K) {
+ __set_clr_pte_enc(kpte, level, enc);
+ vaddr_next = (vaddr & PAGE_MASK) + PAGE_SIZE;
+ continue;
+ }
+
+ psize = page_level_size(level);
+ pmask = page_level_mask(level);
+
+ /*
+ * Check, whether we can change the large page in one go.
+ * We request a split, when the address is not aligned and
+ * the number of pages to set/clear encryption bit is smaller
+ * than the number of pages in the large page.
+ */
+ if (vaddr == (vaddr & pmask) &&
+ ((vaddr_end - vaddr) >= psize)) {
+ __set_clr_pte_enc(kpte, level, enc);
+ vaddr_next = (vaddr & pmask) + psize;
+ continue;
+ }
+
+ /*
+	 * The virtual address is part of a large page; create the page table
+	 * mapping to use smaller pages (4K or 2M). If the virtual address is
+	 * part of a 2M page then we request splitting into 4K pages;
+	 * similarly, a 1GB large page is requested to be split into 2M
+	 * pages.
+ */
+ if (level == PG_LEVEL_2M)
+ split_page_size_mask = 0;
+ else
+ split_page_size_mask = 1 << PG_LEVEL_2M;
+
+ kernel_physical_mapping_init(__pa(vaddr & pmask),
+ __pa((vaddr_end & pmask) + psize),
+ split_page_size_mask);
+ }
+
+ __flush_tlb_all();
+ return 0;
+}
+
+int __init early_set_memory_decrypted(resource_size_t paddr, unsigned long size)
+{
+ return early_set_memory_enc_dec(paddr, size, false);
+}
+
+int __init early_set_memory_encrypted(resource_size_t paddr, unsigned long size)
+{
+ return early_set_memory_enc_dec(paddr, size, true);
+}
+
static const struct dma_map_ops sme_dma_ops = {
.alloc = sme_alloc,
.free = sme_free,
--
2.9.4

2017-07-24 19:15:31

by Brijesh Singh

[permalink] [raw]
Subject: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

From: Tom Lendacky <[email protected]>

Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.

Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).
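
A rough C sketch of that check sequence (illustrative only: the function name
here is made up, the check of the maximum extended CPUID leaf is omitted for
brevity, and it simply reuses the native_cpuid()/__rdmsr() helpers that appear
elsewhere in this series):

static bool sev_checks_pass(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID Fn0000_0001: running under a hypervisor? (ECX bit 31) */
	eax = 1; ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	if (!(ecx & BIT(31)))
		return false;

	/* CPUID Fn8000_001F: is SEV supported? (EAX bit 1) */
	eax = 0x8000001f; ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	if (!(eax & BIT(1)))
		return false;

	/* SEV MSR (0xc0010131): is SEV active in this guest? (bit 0) */
	return __rdmsr(0xc0010131) & 1;
}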

This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and, if enabled, the KASLR pagetables)
are updated to include the encryption mask, so that the kernel is
decompressed into encrypted memory.

After the kernel is decompressed and continues booting, the same logic is
used to check if SEV is active and to set a flag indicating so. This allows
us to distinguish between SME and SEV, each of which has unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with
SEV) or EFI tables (always accessed decrypted with SME).

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/Makefile | 2 +
arch/x86/boot/compressed/head_64.S | 16 +++++
arch/x86/boot/compressed/mem_encrypt.S | 103 +++++++++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 2 +
arch/x86/boot/compressed/pagetable.c | 8 ++-
arch/x86/include/asm/mem_encrypt.h | 3 +
arch/x86/include/asm/msr-index.h | 3 +
arch/x86/include/uapi/asm/kvm_para.h | 1 -
arch/x86/mm/mem_encrypt.c | 42 +++++++++++---
9 files changed, 169 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/boot/compressed/mem_encrypt.S

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 2c860ad..d2fe901 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
$(obj)/piggy.o $(obj)/cpuflags.o

+vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o
+
vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
ifdef CONFIG_X86_64
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fbf4c32..6179d43 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -130,6 +130,19 @@ ENTRY(startup_32)
/*
* Build early 4G boot pagetable
*/
+ /*
+ * If SEV is active then set the encryption mask in the page tables.
+	 * This will ensure that when the kernel is copied and decompressed,
+	 * it is done so encrypted.
+ */
+ call get_sev_encryption_bit
+ xorl %edx, %edx
+ testl %eax, %eax
+ jz 1f
+ subl $32, %eax /* Encryption bit is always above bit 31 */
+ bts %eax, %edx /* Set encryption mask for page tables */
+1:
+
/* Initialize Page tables to 0 */
leal pgtable(%ebx), %edi
xorl %eax, %eax
@@ -140,12 +153,14 @@ ENTRY(startup_32)
leal pgtable + 0(%ebx), %edi
leal 0x1007 (%edi), %eax
movl %eax, 0(%edi)
+ addl %edx, 4(%edi)

/* Build Level 3 */
leal pgtable + 0x1000(%ebx), %edi
leal 0x1007(%edi), %eax
movl $4, %ecx
1: movl %eax, 0x00(%edi)
+ addl %edx, 0x04(%edi)
addl $0x00001000, %eax
addl $8, %edi
decl %ecx
@@ -156,6 +171,7 @@ ENTRY(startup_32)
movl $0x00000183, %eax
movl $2048, %ecx
1: movl %eax, 0(%edi)
+ addl %edx, 4(%edi)
addl $0x00200000, %eax
addl $8, %edi
decl %ecx
diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
new file mode 100644
index 0000000..696716e
--- /dev/null
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -0,0 +1,103 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/asm-offsets.h>
+
+ .text
+ .code32
+ENTRY(get_sev_encryption_bit)
+ xor %eax, %eax
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ push %ebx
+ push %ecx
+ push %edx
+
+ /* Check if running under a hypervisor */
+ movl $1, %eax
+ cpuid
+ bt $31, %ecx /* Check the hypervisor bit */
+ jnc .Lno_sev
+
+ movl $0x80000000, %eax /* CPUID to check the highest leaf */
+ cpuid
+ cmpl $0x8000001f, %eax /* See if 0x8000001f is available */
+ jb .Lno_sev
+
+ /*
+ * Check for the SEV feature:
+ * CPUID Fn8000_001F[EAX] - Bit 1
+ * CPUID Fn8000_001F[EBX] - Bits 5:0
+ * Pagetable bit position used to indicate encryption
+ */
+ movl $0x8000001f, %eax
+ cpuid
+ bt $1, %eax /* Check if SEV is available */
+ jnc .Lno_sev
+
+ movl $MSR_F17H_SEV, %ecx /* Read the SEV MSR */
+ rdmsr
+ bt $MSR_F17H_SEV_ENABLED_BIT, %eax /* Check if SEV is active */
+ jnc .Lno_sev
+
+ /*
+ * Get memory encryption information:
+ */
+ movl %ebx, %eax
+ andl $0x3f, %eax /* Return the encryption bit location */
+ jmp .Lsev_exit
+
+.Lno_sev:
+ xor %eax, %eax
+
+.Lsev_exit:
+ pop %edx
+ pop %ecx
+ pop %ebx
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+ ret
+ENDPROC(get_sev_encryption_bit)
+
+ .code64
+ENTRY(get_sev_encryption_mask)
+ xor %rax, %rax
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ push %rbp
+ push %rdx
+
+ movq %rsp, %rbp /* Save current stack pointer */
+
+ call get_sev_encryption_bit /* Get the encryption bit position */
+ testl %eax, %eax
+ jz .Lno_sev_mask
+
+ xor %rdx, %rdx
+ bts %rax, %rdx /* Create the encryption mask */
+ mov %rdx, %rax /* ... and return it */
+
+.Lno_sev_mask:
+ movq %rbp, %rsp /* Restore original stack pointer */
+
+ pop %rdx
+ pop %rbp
+
+#endif
+
+ ret
+ENDPROC(get_sev_encryption_mask)
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 766a521..38c5f0e 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -108,4 +108,6 @@ static inline void console_init(void)
{ }
#endif

+unsigned long get_sev_encryption_mask(void);
+
#endif
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index f1aa438..a577329 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -76,16 +76,18 @@ static unsigned long top_level_pgt;
* Mapping information structure passed to kernel_ident_mapping_init().
* Due to relocation, pointers must be assigned at run time not build time.
*/
-static struct x86_mapping_info mapping_info = {
- .page_flag = __PAGE_KERNEL_LARGE_EXEC,
-};
+static struct x86_mapping_info mapping_info;

/* Locates and clears a region for a new top level page table. */
void initialize_identity_maps(void)
{
+ unsigned long sev_me_mask = get_sev_encryption_mask();
+
/* Init mapping_info with run-time function/buffer pointers. */
mapping_info.alloc_pgt_page = alloc_pgt_page;
mapping_info.context = &pgt_data;
+ mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sev_me_mask;
+ mapping_info.kernpg_flag = _KERNPG_TABLE | sev_me_mask;

/*
* It should be impossible for this not to already be true,
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9274ec7..9cb6472 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -19,6 +19,9 @@

#include <asm/bootparam.h>

+#define AMD_SME_FEATURE_BIT BIT(0)
+#define AMD_SEV_FEATURE_BIT BIT(1)
+
#ifdef CONFIG_AMD_MEM_ENCRYPT

extern unsigned long sme_me_mask;
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e399d68..530020f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -326,6 +326,9 @@

/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9
+#define MSR_F17H_SEV 0xc0010131
+#define MSR_F17H_SEV_ENABLED_BIT 0
+#define MSR_F17H_SEV_ENABLED BIT_ULL(MSR_F17H_SEV_ENABLED_BIT)

/* Fam 16h MSRs */
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index a965e5b..c202ba3 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -109,5 +109,4 @@ struct kvm_vcpu_pv_apf_data {
#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
#define KVM_PV_EOI_DISABLED 0x0

-
#endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5e5d460..ed8780e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -288,7 +288,9 @@ void __init mem_encrypt_init(void)
if (sev_active())
dma_ops = &sme_dma_ops;

- pr_info("AMD Secure Memory Encryption (SME) active\n");
+ pr_info("AMD %s active\n",
+ sev_active() ? "Secure Encrypted Virtualization (SEV)"
+ : "Secure Memory Encryption (SME)");
}

void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
@@ -616,12 +618,23 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
{
const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
unsigned int eax, ebx, ecx, edx;
+ unsigned long feature_mask;
bool active_by_default;
unsigned long me_mask;
char buffer[16];
u64 msr;

- /* Check for the SME support leaf */
+ /*
+ * Set the feature mask (SME or SEV) based on whether we are
+ * running under a hypervisor.
+ */
+ eax = 1;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ feature_mask = (ecx & BIT(31)) ? AMD_SEV_FEATURE_BIT
+ : AMD_SME_FEATURE_BIT;
+
+ /* Check for the SME/SEV support leaf */
eax = 0x80000000;
ecx = 0;
native_cpuid(&eax, &ebx, &ecx, &edx);
@@ -629,24 +642,39 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
return;

/*
- * Check for the SME feature:
+ * Check for the SME/SEV feature:
* CPUID Fn8000_001F[EAX] - Bit 0
* Secure Memory Encryption support
+ * CPUID Fn8000_001F[EAX] - Bit 1
+ * Secure Encrypted Virtualization support
* CPUID Fn8000_001F[EBX] - Bits 5:0
* Pagetable bit position used to indicate encryption
*/
eax = 0x8000001f;
ecx = 0;
native_cpuid(&eax, &ebx, &ecx, &edx);
- if (!(eax & 1))
+ if (!(eax & feature_mask))
return;

me_mask = 1UL << (ebx & 0x3f);

- /* Check if SME is enabled */
- msr = __rdmsr(MSR_K8_SYSCFG);
- if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ /* Check if memory encryption is enabled */
+ if (feature_mask == AMD_SME_FEATURE_BIT) {
+ /* For SME, check the SYSCFG MSR */
+ msr = __rdmsr(MSR_K8_SYSCFG);
+ if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ return;
+ } else {
+ /* For SEV, check the SEV MSR */
+ msr = __rdmsr(MSR_F17H_SEV);
+ if (!(msr & MSR_F17H_SEV_ENABLED))
+ return;
+
+ /* SEV state cannot be controlled by a command line option */
+ sme_me_mask = me_mask;
+ sev_enabled = 1;
return;
+ }

/*
* Fixups have not been applied to phys_base yet and we're running
--
2.9.4

2017-07-24 19:15:47

by Brijesh Singh

[permalink] [raw]
Subject: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

From: Tom Lendacky <[email protected]>

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/io.h | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..2f3c002 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) \
\
static inline void outs##bwl(int port, const void *addr, unsigned long count) \
{ \
- asm volatile("rep; outs" #bwl \
- : "+S"(addr), "+c"(count) : "d"(port)); \
+ if (sev_active()) { \
+ unsigned type *value = (unsigned type *)addr; \
+ while (count) { \
+ out##bwl(*value, port); \
+ value++; \
+ count--; \
+ } \
+ } else { \
+ asm volatile("rep; outs" #bwl \
+ : "+S"(addr), "+c"(count) : "d"(port)); \
+ } \
} \
\
static inline void ins##bwl(int port, void *addr, unsigned long count) \
{ \
- asm volatile("rep; ins" #bwl \
- : "+D"(addr), "+c"(count) : "d"(port)); \
+ if (sev_active()) { \
+ unsigned type *value = (unsigned type *)addr; \
+ while (count) { \
+ *value = in##bwl(port); \
+ value++; \
+ count--; \
+ } \
+ } else { \
+ asm volatile("rep; ins" #bwl \
+ : "+D"(addr), "+c"(count) : "d"(port)); \
+ } \
}

BUILDIO(b, b, char)
--
2.9.4

2017-07-25 05:46:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption

On Mon, Jul 24, 2017 at 02:07:41PM -0500, Brijesh Singh wrote:

Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption
^^^^^^^^^^

Please introduce a spellchecker into your workflow.

> Update amd-memory-encryption document describing the AMD Secure Encrypted

"Update the AMD memory encryption document...

The patch has the proper URL already.

> Virtualization (SEV) feature.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/x86/amd-memory-encryption.txt | 29 ++++++++++++++++++++++++++---
> 1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
> index f512ab7..747df07 100644
> --- a/Documentation/x86/amd-memory-encryption.txt
> +++ b/Documentation/x86/amd-memory-encryption.txt
> @@ -1,4 +1,5 @@
> -Secure Memory Encryption (SME) is a feature found on AMD processors.
> +Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
> +features found on AMD processors.
>
> SME provides the ability to mark individual pages of memory as encrypted using
> the standard x86 page tables. A page that is marked encrypted will be
> @@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted when written to
> DRAM. SME can therefore be used to protect the contents of DRAM from physical
> attacks on the system.
>
> +SEV enables running encrypted virtual machine (VMs) in which the code and data

machines

> +of the virtual machine are secured so that decrypted version is available only

... of the guest VM ... ... so that a decrypted ...

> +within the VM itself. SEV guest VMs have concept of private and shared memory.

have *the* concept - you need to use
definite and indefinite articles in your
text.

> +Private memory is encrypted with the guest-specific key, while shared memory
> +may be encrypted with hypervisor key.

And here you explain that the hypervisor key is the same key which we
use in SME. So that people can make the connection.

> +
> A page is encrypted when a page table entry has the encryption bit set (see
> below on how to determine its position). The encryption bit can also be
> specified in the cr3 register, allowing the PGD table to be encrypted. Each
> @@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption bit in the PGD entry
> for a PUD which results in the PUD pointed to by that entry to not be
> encrypted.
>
> -Support for SME can be determined through the CPUID instruction. The CPUID
> -function 0x8000001f reports information related to SME:
> +When SEV is enabled, certain type of memory (namely insruction pages and guest

When SEV is enabled, instruction pages and guest page tables are ...

> +page tables) are always treated as private. Due to security reasons all DMA

security reasons??

> +operations inside the guest must be performed on shared memory. Since the
> +memory encryption bit is only controllable by the guest OS when it is operating

... is controlled ...

> +in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces memory

... forces the memory ...

> +encryption bit to 1.
> +
> +Support for SME and SEV can be determined through the CPUID instruction. The
> +CPUID function 0x8000001f reports information related to SME:
>
> 0x8000001f[eax]:
> Bit[0] indicates support for SME
> + 0x800001f[eax]:

There's a 0 missing and you don't really need it as it is already above.

> + Bit[1] indicates support for SEV
> 0x8000001f[ebx]:
> Bits[5:0] pagetable bit number used to activate memory
> encryption
> @@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory encryption:
> Bit[23] 0 = memory encryption features are disabled
> 1 = memory encryption features are enabled
>
> +If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine if

If this MSR is going to be part of the architecture - and I really think
it is - then call it MSR_AMD64_SEV.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-25 09:51:36

by David Laight

[permalink] [raw]
Subject: RE: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

From: Brijesh Singh
> Sent: 24 July 2017 20:08
> From: Tom Lendacky <[email protected]>
>
> Secure Encrypted Virtualization (SEV) does not support string I/O, so
> unroll the string I/O operation into a loop operating on one element at
> a time.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/io.h | 26 ++++++++++++++++++++++----
> 1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index e080a39..2f3c002 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) \
> \
> static inline void outs##bwl(int port, const void *addr, unsigned long count) \
> {

Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling a real
function will be significant.
The code bloat reduction will be significant.

David

2017-07-25 10:27:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On Mon, Jul 24, 2017 at 02:07:42PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> Update the CPU features to include identifying and reporting on the
> Secure Encrypted Virtualization (SEV) feature. SME is identified by
> CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
> MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR). Only show the SEV feature
> as available if reported by CPUID and enabled by BIOS.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/cpu/amd.c | 30 +++++++++++++++++++++++++-----
> arch/x86/kernel/cpu/scattered.c | 1 +
> 4 files changed, 29 insertions(+), 5 deletions(-)

...

> @@ -637,6 +642,21 @@ static void early_init_amd(struct cpuinfo_x86 *c)
> clear_cpu_cap(c, X86_FEATURE_SME);
> }
> }
> +
> + if (cpu_has(c, X86_FEATURE_SEV)) {
> + if (IS_ENABLED(CONFIG_X86_32)) {
> + clear_cpu_cap(c, X86_FEATURE_SEV);
> + } else {
> + u64 syscfg, hwcr;
> +
> + /* Check if SEV is enabled */
> + rdmsrl(MSR_K8_SYSCFG, syscfg);
> + rdmsrl(MSR_K7_HWCR, hwcr);
> + if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT) ||
> + !(hwcr & MSR_K7_HWCR_SMMLOCK))
> + clear_cpu_cap(c, X86_FEATURE_SEV);
> + }
> + }

Let's simplify this and read the MSRs only once. Diff ontop. Please
check if I'm missing a case:

---
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c413f04bdd41..79af07731ab1 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -546,6 +546,48 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
}
}

+static void early_detect_mem_enc(struct cpuinfo_x86 *c)
+{
+ u64 syscfg, hwcr;
+
+ /*
+ * BIOS support is required for SME and SEV.
+ * For SME: If BIOS has enabled SME then adjust x86_phys_bits by
+ * the SME physical address space reduction value.
+ * If BIOS has not enabled SME then don't advertise the
+ * SME feature (set in scattered.c).
+ * For SEV: If BIOS has not enabled SEV then don't advertise the
+ * SEV feature (set in scattered.c).
+ *
+ * In all cases, since support for SME and SEV requires long mode,
+ * don't advertise the feature under CONFIG_X86_32.
+ */
+ if (cpu_has(c, X86_FEATURE_SME) ||
+ cpu_has(c, X86_FEATURE_SEV)) {
+
+ if (IS_ENABLED(CONFIG_X86_32))
+ goto clear;
+
+ /* Check if SME is enabled */
+ rdmsrl(MSR_K8_SYSCFG, syscfg);
+ if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ goto clear;
+
+ c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
+
+ /* Check if SEV is enabled */
+ rdmsrl(MSR_K7_HWCR, hwcr);
+ if (!(hwcr & MSR_K7_HWCR_SMMLOCK))
+ goto clear_sev;
+
+ return;
+clear:
+ clear_cpu_cap(c, X86_FEATURE_SME);
+clear_sev:
+ clear_cpu_cap(c, X86_FEATURE_SEV);
+ }
+}
+
static void early_init_amd(struct cpuinfo_x86 *c)
{
u32 dummy;
@@ -617,46 +659,8 @@ static void early_init_amd(struct cpuinfo_x86 *c)
if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_E400);

- /*
- * BIOS support is required for SME and SEV.
- * For SME: If BIOS has enabled SME then adjust x86_phys_bits by
- * the SME physical address space reduction value.
- * If BIOS has not enabled SME then don't advertise the
- * SME feature (set in scattered.c).
- * For SEV: If BIOS has not enabled SEV then don't advertise the
- * SEV feature (set in scattered.c).
- *
- * In all cases, since support for SME and SEV requires long mode,
- * don't advertise the feature under CONFIG_X86_32.
- */
- if (cpu_has(c, X86_FEATURE_SME)) {
- u64 msr;
-
- /* Check if SME is enabled */
- rdmsrl(MSR_K8_SYSCFG, msr);
- if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT) {
- c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
- if (IS_ENABLED(CONFIG_X86_32))
- clear_cpu_cap(c, X86_FEATURE_SME);
- } else {
- clear_cpu_cap(c, X86_FEATURE_SME);
- }
- }
+ early_detect_mem_enc(c);

- if (cpu_has(c, X86_FEATURE_SEV)) {
- if (IS_ENABLED(CONFIG_X86_32)) {
- clear_cpu_cap(c, X86_FEATURE_SEV);
- } else {
- u64 syscfg, hwcr;
-
- /* Check if SEV is enabled */
- rdmsrl(MSR_K8_SYSCFG, syscfg);
- rdmsrl(MSR_K7_HWCR, hwcr);
- if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT) ||
- !(hwcr & MSR_K7_HWCR_SMMLOCK))
- clear_cpu_cap(c, X86_FEATURE_SEV);
- }
- }
}

static void init_amd_k8(struct cpuinfo_x86 *c)

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-25 14:29:59

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On 7/25/2017 5:26 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:42PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> Update the CPU features to include identifying and reporting on the
>> Secure Encrypted Virtualization (SEV) feature. SME is identified by
>> CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
>> MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR). Only show the SEV feature
>> as available if reported by CPUID and enabled by BIOS.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/include/asm/msr-index.h | 2 ++
>> arch/x86/kernel/cpu/amd.c | 30 +++++++++++++++++++++++++-----
>> arch/x86/kernel/cpu/scattered.c | 1 +
>> 4 files changed, 29 insertions(+), 5 deletions(-)
>
> ...
>
>> @@ -637,6 +642,21 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>> clear_cpu_cap(c, X86_FEATURE_SME);
>> }
>> }
>> +
>> + if (cpu_has(c, X86_FEATURE_SEV)) {
>> + if (IS_ENABLED(CONFIG_X86_32)) {
>> + clear_cpu_cap(c, X86_FEATURE_SEV);
>> + } else {
>> + u64 syscfg, hwcr;
>> +
>> + /* Check if SEV is enabled */
>> + rdmsrl(MSR_K8_SYSCFG, syscfg);
>> + rdmsrl(MSR_K7_HWCR, hwcr);
>> + if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT) ||
>> + !(hwcr & MSR_K7_HWCR_SMMLOCK))
>> + clear_cpu_cap(c, X86_FEATURE_SEV);
>> + }
>> + }
>
> Let's simplify this and read the MSRs only once. Diff ontop. Please
> check if I'm missing a case:

Yup, we can do something like that. I believe the only change that
would be needed to your patch would be to move the IS_ENABLED() check
to after the physical address space reduction check.

Thanks,
Tom

>
> ---
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index c413f04bdd41..79af07731ab1 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -546,6 +546,48 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
> }
> }
>
> +static void early_detect_mem_enc(struct cpuinfo_x86 *c)
> +{
> + u64 syscfg, hwcr;
> +
> + /*
> + * BIOS support is required for SME and SEV.
> + * For SME: If BIOS has enabled SME then adjust x86_phys_bits by
> + * the SME physical address space reduction value.
> + * If BIOS has not enabled SME then don't advertise the
> + * SME feature (set in scattered.c).
> + * For SEV: If BIOS has not enabled SEV then don't advertise the
> + * SEV feature (set in scattered.c).
> + *
> + * In all cases, since support for SME and SEV requires long mode,
> + * don't advertise the feature under CONFIG_X86_32.
> + */
> + if (cpu_has(c, X86_FEATURE_SME) ||
> + cpu_has(c, X86_FEATURE_SEV)) {
> +
> + if (IS_ENABLED(CONFIG_X86_32))
> + goto clear;
> +
> + /* Check if SME is enabled */
> + rdmsrl(MSR_K8_SYSCFG, syscfg);
> + if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT))
> + goto clear;
> +
> + c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
> +
> + /* Check if SEV is enabled */
> + rdmsrl(MSR_K7_HWCR, hwcr);
> + if (!(hwcr & MSR_K7_HWCR_SMMLOCK))
> + goto clear_sev;
> +
> + return;
> +clear:
> + clear_cpu_cap(c, X86_FEATURE_SME);
> +clear_sev:
> + clear_cpu_cap(c, X86_FEATURE_SEV);
> + }
> +}
> +
> static void early_init_amd(struct cpuinfo_x86 *c)
> {
> u32 dummy;
> @@ -617,46 +659,8 @@ static void early_init_amd(struct cpuinfo_x86 *c)
> if (cpu_has_amd_erratum(c, amd_erratum_400))
> set_cpu_bug(c, X86_BUG_AMD_E400);
>
> - /*
> - * BIOS support is required for SME and SEV.
> - * For SME: If BIOS has enabled SME then adjust x86_phys_bits by
> - * the SME physical address space reduction value.
> - * If BIOS has not enabled SME then don't advertise the
> - * SME feature (set in scattered.c).
> - * For SEV: If BIOS has not enabled SEV then don't advertise the
> - * SEV feature (set in scattered.c).
> - *
> - * In all cases, since support for SME and SEV requires long mode,
> - * don't advertise the feature under CONFIG_X86_32.
> - */
> - if (cpu_has(c, X86_FEATURE_SME)) {
> - u64 msr;
> -
> - /* Check if SME is enabled */
> - rdmsrl(MSR_K8_SYSCFG, msr);
> - if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT) {
> - c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
> - if (IS_ENABLED(CONFIG_X86_32))
> - clear_cpu_cap(c, X86_FEATURE_SME);
> - } else {
> - clear_cpu_cap(c, X86_FEATURE_SME);
> - }
> - }
> + early_detect_mem_enc(c);
>
> - if (cpu_has(c, X86_FEATURE_SEV)) {
> - if (IS_ENABLED(CONFIG_X86_32)) {
> - clear_cpu_cap(c, X86_FEATURE_SEV);
> - } else {
> - u64 syscfg, hwcr;
> -
> - /* Check if SEV is enabled */
> - rdmsrl(MSR_K8_SYSCFG, syscfg);
> - rdmsrl(MSR_K7_HWCR, hwcr);
> - if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT) ||
> - !(hwcr & MSR_K7_HWCR_SMMLOCK))
> - clear_cpu_cap(c, X86_FEATURE_SEV);
> - }
> - }
> }
>
> static void init_amd_k8(struct cpuinfo_x86 *c)
>

2017-07-25 14:37:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On Tue, Jul 25, 2017 at 09:29:40AM -0500, Tom Lendacky wrote:
> Yup, we can do something like that. I believe the only change that
> would be needed to your patch would be to move the IS_ENABLED() check
> to after the physical address space reduction check.

Yeah, I wasn't sure about that. The logic is that if BIOS has enabled
SME and thus reduction is in place, we need to update x86_phys_bits on
32-bit regardless, right?

But, come to think of it, that reduction won't have any effect since we
have 32-bit addresses and the reduction is above 32-bits, right? And
thus it is moot.

Or?

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-25 14:59:12

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On 7/25/2017 9:36 AM, Borislav Petkov wrote:
> On Tue, Jul 25, 2017 at 09:29:40AM -0500, Tom Lendacky wrote:
>> Yup, we can do something like that. I believe the only change that
>> would be needed to your patch would be to move the IS_ENABLED() check
>> to after the physical address space reduction check.
>
> Yeah, I wasn't sure about that. The logic is that if BIOS has enabled
> SME and thus reduction is in place, we need to update x86_phys_bits on
> 32-bit regardless, right?
>
> But, come to think of it, that reduction won't have any effect since we
> have 32-bit addresses and the reduction is above 32-bits, right? And
> thus it is moot.


True, but it is more about being accurate and making sure the value is
correct where ever it may be used.

Thanks,
Tom

>
> Or?
>

2017-07-25 14:59:39

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption



On 07/25/2017 12:45 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:41PM -0500, Brijesh Singh wrote:
>
> Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption
> ^^^^^^^^^^
>
> Please introduce a spellchecker into your workflow.
>
>> Update amd-memory-encryption document describing the AMD Secure Encrypted
>
> "Update the AMD memory encryption document...
>
> The patch has the proper URL already.
>
>> Virtualization (SEV) feature.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> Documentation/x86/amd-memory-encryption.txt | 29 ++++++++++++++++++++++++++---
>> 1 file changed, 26 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
>> index f512ab7..747df07 100644
>> --- a/Documentation/x86/amd-memory-encryption.txt
>> +++ b/Documentation/x86/amd-memory-encryption.txt
>> @@ -1,4 +1,5 @@
>> -Secure Memory Encryption (SME) is a feature found on AMD processors.
>> +Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
>> +features found on AMD processors.
>>
>> SME provides the ability to mark individual pages of memory as encrypted using
>> the standard x86 page tables. A page that is marked encrypted will be
>> @@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted when written to
>> DRAM. SME can therefore be used to protect the contents of DRAM from physical
>> attacks on the system.
>>
>> +SEV enables running encrypted virtual machine (VMs) in which the code and data
>
> machines
>
>> +of the virtual machine are secured so that decrypted version is available only
>
> ... of the guest VM ... ... so that a decrypted ...
>
>> +within the VM itself. SEV guest VMs have concept of private and shared memory.
>
> have *the* concept - you need to use
> definite and indefinite articles in your
> text.
>
>> +Private memory is encrypted with the guest-specific key, while shared memory
>> +may be encrypted with hypervisor key.
>
> And here you explain that the hypervisor key is the same key which we
> use in SME. So that people can make the connection.
>
>> +
>> A page is encrypted when a page table entry has the encryption bit set (see
>> below on how to determine its position). The encryption bit can also be
>> specified in the cr3 register, allowing the PGD table to be encrypted. Each
>> @@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption bit in the PGD entry
>> for a PUD which results in the PUD pointed to by that entry to not be
>> encrypted.
>>
>> -Support for SME can be determined through the CPUID instruction. The CPUID
>> -function 0x8000001f reports information related to SME:
>> +When SEV is enabled, certain type of memory (namely insruction pages and guest
>
> When SEV is enabled, instruction pages and guest page tables are ...
>
>> +page tables) are always treated as private. Due to security reasons all DMA
>
> security reasons??
>
>> +operations inside the guest must be performed on shared memory. Since the
>> +memory encryption bit is only controllable by the guest OS when it is operating
>
> ... is controlled ...
>
>> +in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces memory
>
> ... forces the memory ...
>
>> +encryption bit to 1.
>> +
>> +Support for SME and SEV can be determined through the CPUID instruction. The
>> +CPUID function 0x8000001f reports information related to SME:
>>
>> 0x8000001f[eax]:
>> Bit[0] indicates support for SME
>> + 0x800001f[eax]:
>
> There's a 0 missing and you don't really need it as it is already above.
>
>> + Bit[1] indicates support for SEV
>> 0x8000001f[ebx]:
>> Bits[5:0] pagetable bit number used to activate memory
>> encryption
>> @@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory encryption:
>> Bit[23] 0 = memory encryption features are disabled
>> 1 = memory encryption features are enabled
>>
>> +If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine if
>
> If this MSR is going to be part of the architecture - and I really think
> it is - then call it MSR_AMD64_SEV.
>

Thanks Boris, I'll update the doc per your feedbacks. And will rename the MSR to
MSR_AMD64_SEV.

-Brijesh

2017-07-25 15:13:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On Tue, Jul 25, 2017 at 09:58:54AM -0500, Tom Lendacky wrote:
> True, but it is more about being accurate and making sure the value is
> correct where ever it may be used.

So early_identify_cpu() initializes phys_bits to 32 on 32-bit.
Subtracting it there would actually actively make it wrong, AFAICT.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-25 15:29:52

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On 7/25/2017 10:13 AM, Borislav Petkov wrote:
> On Tue, Jul 25, 2017 at 09:58:54AM -0500, Tom Lendacky wrote:
>> True, but it is more about being accurate and making sure the value is
>> correct where ever it may be used.
>
> So early_identify_cpu() initializes phys_bits to 32 on 32-bit.
> Subtracting it there would actually actively make it wrong, AFAICT.

But early_identify_cpu() calls get_cpu_cap() which will check for cpuid
leaf 0x80000008 support and set x86_phys_bits. I'll try to build and
run a 32-bit kernel and see how this all flows.

Thanks,
Tom

>

2017-07-25 15:34:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On Tue, Jul 25, 2017 at 10:29:40AM -0500, Tom Lendacky wrote:
> But early_identify_cpu() calls get_cpu_cap() which will check for cpuid
> leaf 0x80000008 support and set x86_phys_bits.

Right, but it can't be less than 32, can it? And if it is more than 32
bits, then it probably doesn't really matter on 32-bit. Unless it is
less than 36 bits and you do PAE...

> I'll try to build and run a 32-bit kernel and see how this all flows.

Yeah, that would be good.

Thanks.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-26 04:29:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

On Mon, Jul 24, 2017 at 02:07:43PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> Provide support for Secure Encyrpted Virtualization (SEV). This initial

Your subject misses a verb and patch subjects should have an active verb
denoting what the patch does. The sentence above is a good example.

> support defines a flag that is used by the kernel to determine if it is
> running with SEV active.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 ++
> arch/x86/mm/mem_encrypt.c | 3 +++
> include/linux/mem_encrypt.h | 8 +++++++-
> 3 files changed, 12 insertions(+), 1 deletion(-)

...

> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 0fbd092..1e4643e 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -40,6 +40,9 @@ static char sme_cmdline_off[] __initdata = "off";
> unsigned long sme_me_mask __section(.data) = 0;
> EXPORT_SYMBOL_GPL(sme_me_mask);
>
> +unsigned int sev_enabled __section(.data) = 0;
> +EXPORT_SYMBOL_GPL(sev_enabled);

So sev_enabled is a pure bool used only in bool context, not like
sme_me_mask whose value is read too. Which means, you can make the
former static and query it only through accessor functions.

> /* Buffer used for early in-place encryption by BSP, no locking needed */
> static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
>
> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
> index 1255f09..ea0831a 100644
> --- a/include/linux/mem_encrypt.h
> +++ b/include/linux/mem_encrypt.h
> @@ -22,12 +22,18 @@
> #else /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
>
> #define sme_me_mask 0UL
> +#define sev_enabled 0
>
> #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
>
> static inline bool sme_active(void)
> {
> - return !!sme_me_mask;
> + return (sme_me_mask && !sev_enabled);

You don't need the brackets. Below too.

> +}
> +
> +static inline bool sev_active(void)
> +{
> + return (sme_me_mask && sev_enabled);
> }

So this is confusing, TBH. SME and SEV are not mutually exclusive and
yet the logic here says so. Why?

I mean, in the hypervisor context, sme_active() is still true.

/me is confused.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-26 10:45:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

On Tue, Jul 25, 2017 at 11:51 AM, David Laight <[email protected]> wrote:
> From: Brijesh Singh
>> Sent: 24 July 2017 20:08
>> From: Tom Lendacky <[email protected]>
>>
>> Secure Encrypted Virtualization (SEV) does not support string I/O, so
>> unroll the string I/O operation into a loop operating on one element at
>> a time.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/include/asm/io.h | 26 ++++++++++++++++++++++----
>> 1 file changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index e080a39..2f3c002 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) \
>> \
>> static inline void outs##bwl(int port, const void *addr, unsigned long count) \
>> {

This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/

> Is it even worth leaving these as inline functions?
> Given the speed of IO cycles it is unlikely that the cost of calling a real
> function will be significant.
> The code bloat reduction will be significant.

I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is actually
still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
console.

Arnd

---
Full list for reference

$ git grep -wl '\(__ide_\|\)\(in\|out\)s\(b\|w\|l\)' drivers sound/
drivers/atm/horizon.c
drivers/cdrom/gdrom.c
drivers/gpu/drm/vmwgfx/vmwgfx_msg.h
drivers/ide/ide-io-std.c
drivers/isdn/hardware/avm/avmcard.h
drivers/isdn/hardware/eicon/divasmain.c
drivers/isdn/hardware/mISDN/avmfritz.c
drivers/isdn/hardware/mISDN/iohelper.h
drivers/isdn/hardware/mISDN/netjet.c
drivers/isdn/hardware/mISDN/w6692.c
drivers/isdn/hisax/asuscom.c
drivers/isdn/hisax/avm_a1.c
drivers/isdn/hisax/avm_a1p.c
drivers/isdn/hisax/avm_pci.c
drivers/isdn/hisax/diva.c
drivers/isdn/hisax/elsa.c
drivers/isdn/hisax/gazel.c
drivers/isdn/hisax/hisax_fcpcipnp.c
drivers/isdn/hisax/ix1_micro.c
drivers/isdn/hisax/mic.c
drivers/isdn/hisax/netjet.c
drivers/isdn/hisax/niccy.c
drivers/isdn/hisax/saphir.c
drivers/isdn/hisax/sedlbauer.c
drivers/isdn/hisax/sportster.c
drivers/isdn/hisax/teles3.c
drivers/isdn/hisax/w6692.c
drivers/isdn/hisax/w6692.h
drivers/media/rc/winbond-cir.c
drivers/mtd/spi-nor/aspeed-smc.c
drivers/net/appletalk/cops.c
drivers/net/arcnet/arcdevice.h
drivers/net/arcnet/com20020.c
drivers/net/ethernet/3com/3c509.c
drivers/net/ethernet/3com/3c515.c
drivers/net/ethernet/3com/3c574_cs.c
drivers/net/ethernet/3com/3c589_cs.c
drivers/net/ethernet/8390/axnet_cs.c
drivers/net/ethernet/8390/mcf8390.c
drivers/net/ethernet/8390/ne.c
drivers/net/ethernet/8390/ne2k-pci.c
drivers/net/ethernet/8390/pcnet_cs.c
drivers/net/ethernet/8390/smc-ultra.c
drivers/net/ethernet/amd/nmclan_cs.c
drivers/net/ethernet/fujitsu/fmvj18x_cs.c
drivers/net/ethernet/hp/hp100.c
drivers/net/ethernet/smsc/smc9194.c
drivers/net/ethernet/smsc/smc91c92_cs.c
drivers/net/ethernet/smsc/smc91x.h
drivers/net/ethernet/xircom/xirc2ps_cs.c
drivers/net/sb1000.c
drivers/net/wan/sbni.c
drivers/net/wireless/cisco/airo.c
drivers/net/wireless/intersil/hostap/hostap_cs.c
drivers/net/wireless/intersil/hostap/hostap_plx.c
drivers/net/wireless/marvell/libertas/if_cs.c
drivers/net/wireless/wl3501_cs.c
drivers/parport/parport_pc.c
drivers/pcmcia/m32r_cfc.c
drivers/scsi/NCR53c406a.c
drivers/scsi/aha152x.c
drivers/scsi/eata_pio.c
drivers/scsi/fdomain.c
drivers/scsi/g_NCR5380.c
drivers/scsi/imm.c
drivers/scsi/nsp32_io.h
drivers/scsi/pcmcia/nsp_io.h
drivers/scsi/pcmcia/sym53c500_cs.c
drivers/scsi/ppa.c
drivers/scsi/qlogicfas408.c
drivers/scsi/sym53c416.c
drivers/staging/comedi/drivers/adl_pci9111.c
drivers/staging/comedi/drivers/cb_pcidas.c
drivers/staging/comedi/drivers/das16m1.c
drivers/staging/comedi/drivers/das1800.c
drivers/staging/iio/adc/ad7606_par.c
drivers/tty/hvc/hvc_xen.c
drivers/tty/isicom.c
drivers/tty/rocket_int.h
drivers/usb/host/isp1362.h
drivers/usb/musb/blackfin.c
sound/drivers/opl4/opl4_lib.c
sound/isa/gus/gus_dram.c
sound/isa/gus/gus_pcm.c

2017-07-26 14:45:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 04/17] x86/mm: Don't attempt to encrypt initrd under SEV

On Mon, Jul 24, 2017 at 02:07:44PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> When SEV is active the initrd/initramfs will already have already been
> placed in memory encyrpted so do not try to encrypt it.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/setup.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 0bfe0c1..01d56a1 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -379,9 +379,11 @@ static void __init reserve_initrd(void)
> * If SME is active, this memory will be marked encrypted by the
> * kernel when it is accessed (including relocation). However, the
> * ramdisk image was loaded decrypted by the bootloader, so make
> - * sure that it is encrypted before accessing it.
> + * sure that it is encrypted before accessing it. For SEV the
> + * ramdisk will already be encyrpted, so only do this for SME.
^^^^^^^^^
Please introduce a spellchecker into your patch creation workflow.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-26 16:03:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

Subject: x86/realmode: ...

On Mon, Jul 24, 2017 at 02:07:45PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> When SEV is active the trampoline area will need to be in encrypted
> memory so only mark the area decrypted if SME is active.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/realmode/init.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 1f71980..c7eeca7 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
> /*
> * If SME is active, the trampoline area will need to be in
> * decrypted memory in order to bring up other processors
> - * successfully.
> + * successfully. For SEV the trampoline area needs to be in
> + * encrypted memory, so only do this for SME.

Or simply say:

"It is not needed for SEV."

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-26 16:47:57

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

On 7/25/2017 11:28 PM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:43PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> Provide support for Secure Encyrpted Virtualization (SEV). This initial
>
> Your subject misses a verb and patch subjects should have an active verb
> denoting what the patch does. The sentence above is a good example.

Yup, will update.

>
>> support defines a flag that is used by the kernel to determine if it is
>> running with SEV active.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 2 ++
>> arch/x86/mm/mem_encrypt.c | 3 +++
>> include/linux/mem_encrypt.h | 8 +++++++-
>> 3 files changed, 12 insertions(+), 1 deletion(-)
>
> ...
>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 0fbd092..1e4643e 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -40,6 +40,9 @@ static char sme_cmdline_off[] __initdata = "off";
>> unsigned long sme_me_mask __section(.data) = 0;
>> EXPORT_SYMBOL_GPL(sme_me_mask);
>>
>> +unsigned int sev_enabled __section(.data) = 0;
>> +EXPORT_SYMBOL_GPL(sev_enabled);
>
> So sev_enabled is a pure bool used only in bool context, not like
> sme_me_mask whose value is read too. Which means, you can make the
> former static and query it only through accessor functions.

If it's made static then the sme_active()/sev_active() inline functions
would need to be turned into real functions within the mem_encrypt.c file.
So there's a trade-off in doing that; which is the better one?

>
>> /* Buffer used for early in-place encryption by BSP, no locking needed */
>> static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
>>
>> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
>> index 1255f09..ea0831a 100644
>> --- a/include/linux/mem_encrypt.h
>> +++ b/include/linux/mem_encrypt.h
>> @@ -22,12 +22,18 @@
>> #else /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
>>
>> #define sme_me_mask 0UL
>> +#define sev_enabled 0
>>
>> #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
>>
>> static inline bool sme_active(void)
>> {
>> - return !!sme_me_mask;
>> + return (sme_me_mask && !sev_enabled);
>
> You don't need the brackets. Below too.

Ok.

>
>> +}
>> +
>> +static inline bool sev_active(void)
>> +{
>> + return (sme_me_mask && sev_enabled);
>> }
>
> So this is confusing, TBH. SME and SEV are not mutually exclusive and
> yet the logic here says so. Why?
>
> I mean, in the hypervisor context, sme_active() is still true.
>
> /me is confused.

The kernel needs to distinguish between running under SME and running
under SEV. SME and SEV are similar but not the same. The trampoline code
is a good example. Before paging is activated, SME will access all
memory as decrypted, but SEV will access all memory as encrypted. So
when APs are being brought up under SME the trampoline area cannot be
encrypted, whereas under SEV the trampoline area must be encrypted.
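
Roughly what the realmode patch ends up doing is (sketch only; the
set_memory_decrypted() helper comes from the SME series and the exact hunk
may differ):

	/*
	 * The trampoline area must be decrypted under SME so the APs can
	 * be started, but it must remain encrypted under SEV.
	 */
	if (sme_active())
		set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);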

Thanks,
Tom

>

2017-07-26 19:25:01

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active


Hi Arnd and David,

On 07/26/2017 05:45 AM, Arnd Bergmann wrote:
> On Tue, Jul 25, 2017 at 11:51 AM, David Laight <[email protected]> wrote:
>> From: Brijesh Singh
>>> Sent: 24 July 2017 20:08
>>> From: Tom Lendacky <[email protected]>
>>>
>>> Secure Encrypted Virtualization (SEV) does not support string I/O, so
>>> unroll the string I/O operation into a loop operating on one element at
>>> a time.
>>>
>>> Signed-off-by: Tom Lendacky <[email protected]>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> ---
>>> arch/x86/include/asm/io.h | 26 ++++++++++++++++++++++----
>>> 1 file changed, 22 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>> index e080a39..2f3c002 100644
>>> --- a/arch/x86/include/asm/io.h
>>> +++ b/arch/x86/include/asm/io.h
>>> @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) \
>>> \
>>> static inline void outs##bwl(int port, const void *addr, unsigned long count) \
>>> {
>
> This will clash with a fix I did to add a "memory" clobber
> for the traditional implementation, see
> https://patchwork.kernel.org/patch/9854573/
>
>> Is it even worth leaving these as inline functions?
>> Given the speed of IO cycles it is unlikely that the cost of calling a real
>> function will be significant.
>> The code bloat reduction will be significant.
>
> I think the smallest code would be the original "rep insb" etc, which
> should be smaller than a function call, unlike the loop. Then again,
> there is a rather small number of affected device drivers, almost all
> of them for ancient hardware that you won't even build in a 64-bit
> x86 kernel, see the list below. The only user I found that is actually
> still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
> console.


There are some indirect users of the string I/O functions. The following
functions defined in lib/iomap.c call the rep versions of ins and outs:

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use the above functions.

Here is one approach to convert it into non-inline functions. In this approach,
I have added a new file arch/x86/kernel/io.c which provides non rep version of
string I/O routines. The file gets built and used only when AMD_MEM_ENCRYPT is
enabled. On positive side, if we don't build kernel with AMD_MEM_ENCRYPT support
then we use inline routines, when AMD_MEM_ENCRYPT is built then we make a function
call. Inside the function we unroll only when SEV is active.

Do you see any issue with this approach ? thanks

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port) \
unsigned type value = in##bwl(port); \
slow_down_io(); \
return value; \
-} \
- \
+}
+
+#define BUILDIO_REP(bwl, bw, type) \
static inline void outs##bwl(int port, const void *addr, unsigned long count) \
{ \
asm volatile("rep; outs" #bwl \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr, unsigned long count) \
{ \
asm volatile("rep; ins" #bwl \
: "+D"(addr), "+c"(count) : "d"(port)); \
-}
+} \

BUILDIO(b, b, char)
BUILDIO(w, w, short)
BUILDIO(l, , int)

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+extern void outsb_try_rep(int port, const void *addr, unsigned long count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb outsb_try_rep
+#define insb insb_try_rep
+#define outsw outsw_try_rep
+#define insw insw_try_rep
+#define outsl outsl_try_rep
+#define insl insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
extern void *xlate_dev_mem_ptr(phys_addr_t phys);
extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -42,6 +42,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace

obj-y := process_$(BITS).o signal.o
obj-$(CONFIG_COMPAT) += signal_compat.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += io.o
obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o dumpstack.o nmi.o
obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o
diff --git a/arch/x86/kernel/io.c b/arch/x86/kernel/io.c
new file mode 100644
index 0000000..f58afa9
--- /dev/null
+++ b/arch/x86/kernel/io.c
@@ -0,0 +1,87 @@
+#include <linux/types.h>
+#include <linux/io.h>
+#include <asm/io.h>
+
+void outsb_try_rep(int port, const void *addr, unsigned long count)
+{
+ if (sev_active()) {
+ unsigned char *value = (unsigned char *)addr;
+ while (count) {
+ outb(*value, port);
+ value++;
+ count--;
+ }
+ } else {
+ asm volatile("rep; outsb" : "+S"(addr), "+c"(count) : "d"(port));
+ }
+}
+
+void insb_try_rep(int port, void *addr, unsigned long count)
+{
+ if (sev_active()) {
+ unsigned char *value = (unsigned char *)addr;
+ while (count) {
+ *value = inb(port);
+ value++;
+ count--;
+ }
+ } else {
+ asm volatile("rep; insb" : "+D"(addr), "+c"(count) : "d"(port));
+ }
+}
+
+void outsw_try_rep(int port, const void *addr, unsigned long count)
+{
+ if (sev_active()) {
+ unsigned short *value = (unsigned short *)addr;
+ while (count) {
+ outw(*value, port);
+ value++;
+ count--;
+ }
+ } else {
+ asm volatile("rep; outsw" : "+S"(addr), "+c"(count) : "d"(port));
+ }
+}
+void insw_try_rep(int port, void *addr, unsigned long count)
+{
+ if (sev_active()) {
+ unsigned short *value = (unsigned short *)addr;
+ while (count) {
+ *value = inw(port);
+ value++;
+ count--;
+ }
+ } else {
+ asm volatile("rep; insw" : "+D"(addr), "+c"(count) : "d"(port));
+ }
+}
+
+void outsl_try_rep(int port, const void *addr, unsigned long count)
+{
+ if (sev_active()) {
+ unsigned int *value = (unsigned int *)addr;
+ while (count) {
+ outl(*value, port);
+ value++;
+ count--;
+ }
+ } else {
+ asm volatile("rep; outsl" : "+S"(addr), "+c"(count) : "d"(port));
+ }
+}
+
+void insl_try_rep(int port, void *addr, unsigned long count)
+{
+ if (sev_active()) {
+ unsigned int *value = (unsigned int *)addr;
+ while (count) {
+ *value = inl(port);
+ value++;
+ count--;
+ }
+ } else {
+ asm volatile("rep; insl" : "+D"(addr), "+c"(count) : "d"(port));
+ }
+}

2017-07-26 19:46:20

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active


On July 26, 2017 9:24:45 PM GMT+02:00, Brijesh Singh <[email protected]> wrote:
>
>Hi Arnd and David,
>
>On 07/26/2017 05:45 AM, Arnd Bergmann wrote:
>> On Tue, Jul 25, 2017 at 11:51 AM, David Laight
><[email protected]> wrote:
>>> From: Brijesh Singh
>>>> Sent: 24 July 2017 20:08
>>>> From: Tom Lendacky <[email protected]>
>>>>
>>>> Secure Encrypted Virtualization (SEV) does not support string I/O,
>so
>>>> unroll the string I/O operation into a loop operating on one
>element at
>>>> a time.
>>>>
>>>> Signed-off-by: Tom Lendacky <[email protected]>
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> ---
>>>> arch/x86/include/asm/io.h | 26 ++++++++++++++++++++++----
>>>> 1 file changed, 22 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>>> index e080a39..2f3c002 100644
>>>> --- a/arch/x86/include/asm/io.h
>>>> +++ b/arch/x86/include/asm/io.h
>>>> @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int
>port) \
>>>>
> \
>>>> static inline void outs##bwl(int port, const void *addr, unsigned
>long count) \
>>>> {
>>
>> This will clash with a fix I did to add a "memory" clobber
>> for the traditional implementation, see
>> https://patchwork.kernel.org/patch/9854573/
>>
>>> Is it even worth leaving these as inline functions?
>>> Given the speed of IO cycles it is unlikely that the cost of calling
>a real
>>> function will be significant.
>>> The code bloat reduction will be significant.
>>
>> I think the smallest code would be the original "rep insb" etc, which
>> should be smaller than a function call, unlike the loop. Then again,
>> there is a rather small number of affected device drivers, almost all
>> of them for ancient hardware that you won't even build in a 64-bit
>> x86 kernel, see the list below. The only user I found that is
>actually
>> still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the
>early
>> console.
>
>
>There are some indirect user of string I/O functions. The following
>functions
>defined in lib/iomap.c calls rep version of ins and outs.
>
>- ioread8_rep, ioread16_rep, ioread32_rep
>- iowrite8_rep, iowrite16_rep, iowrite32_rep
>
>I found that several drivers use above functions.
>
>Here is one approach to convert it into non-inline functions. In this
>approach,
>I have added a new file arch/x86/kernel/io.c which provides non rep
>version of
>string I/O routines. The file gets built and used only when
>AMD_MEM_ENCRYPT is
>enabled. On positive side, if we don't build kernel with
>AMD_MEM_ENCRYPT support
>then we use inline routines, when AMD_MEM_ENCRYPT is built then we make
>a function
>call. Inside the function we unroll only when SEV is active.
>
>Do you see any issue with this approach ? thanks
>
>diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>index e080a39..104927d 100644
>--- a/arch/x86/include/asm/io.h
>+++ b/arch/x86/include/asm/io.h
>@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)
> \
> unsigned type value = in##bwl(port); \
> slow_down_io(); \
> return value; \
>-}
>\
>-
>\
>+}
>+
>+#define BUILDIO_REP(bwl, bw, type)
>\
>static inline void outs##bwl(int port, const void *addr, unsigned long
>count) \
>{
>\
> asm volatile("rep; outs" #bwl \
>@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr,
>unsigned long count) \
>{
>\
> asm volatile("rep; ins" #bwl \
> : "+D"(addr), "+c"(count) : "d"(port)); \
>-}
>+}
>\
>
> BUILDIO(b, b, char)
> BUILDIO(w, w, short)
> BUILDIO(l, , int)
>
>+#ifdef CONFIG_AMD_MEM_ENCRYPT
>+extern void outsb_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insb_try_rep(int port, void *addr, unsigned long count);
>+extern void outsw_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insw_try_rep(int port, void *addr, unsigned long count);
>+extern void outsl_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insl_try_rep(int port, void *addr, unsigned long count);
>+#define outsb outsb_try_rep
>+#define insb insb_try_rep
>+#define outsw outsw_try_rep
>+#define insw insw_try_rep
>+#define outsl outsl_try_rep
>+#define insl insl_try_rep
>+#else
>+BUILDIO_REP(b, b, char)
>+BUILDIO_REP(w, w, short)
>+BUILDIO_REP(l, , int)
>+#endif
>+
> extern void *xlate_dev_mem_ptr(phys_addr_t phys);
> extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
>
>diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>index a01892b..3b6e2a3 100644
>--- a/arch/x86/kernel/Makefile
>+++ b/arch/x86/kernel/Makefile
>@@ -42,6 +42,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
>
> obj-y := process_$(BITS).o signal.o
> obj-$(CONFIG_COMPAT) += signal_compat.o
>+obj-$(CONFIG_AMD_MEM_ENCRYPT) += io.o
>obj-y += traps.o irq.o irq_$(BITS).o
>dumpstack_$(BITS).o
> obj-y += time.o ioport.o dumpstack.o nmi.o
> obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o
>diff --git a/arch/x86/kernel/io.c b/arch/x86/kernel/io.c
>new file mode 100644
>index 0000000..f58afa9
>--- /dev/null
>+++ b/arch/x86/kernel/io.c
>@@ -0,0 +1,87 @@
>+#include <linux/types.h>
>+#include <linux/io.h>
>+#include <asm/io.h>
>+
>+void outsb_try_rep(int port, const void *addr, unsigned long count)
>+{
>+ if (sev_active()) {
>+ unsigned char *value = (unsigned char *)addr;
>+ while (count) {
>+ outb(*value, port);
>+ value++;
>+ count--;
>+ }
>+ } else {
>+ asm volatile("rep; outsb" : "+S"(addr), "+c"(count) :
>"d"(port));
>+ }
>+}
>+
>+void insb_try_rep(int port, void *addr, unsigned long count)
>+{
>+ if (sev_active()) {
>+ unsigned char *value = (unsigned char *)addr;
>+ while (count) {
>+ *value = inb(port);
>+ value++;
>+ count--;
>+ }
>+ } else {
>+ asm volatile("rep; insb" : "+D"(addr), "+c"(count) :
>"d"(port));
>+ }
>+}
>+
>+void outsw_try_rep(int port, const void *addr, unsigned long count)
>+{
>+ if (sev_active()) {
>+ unsigned short *value = (unsigned short *)addr;
>+ while (count) {
>+ outw(*value, port);
>+ value++;
>+ count--;
>+ }
>+ } else {
>+ asm volatile("rep; outsw" : "+S"(addr), "+c"(count) :
>"d"(port));
>+ }
>+}
>+void insw_try_rep(int port, void *addr, unsigned long count)
>+{
>+ if (sev_active()) {
>+ unsigned short *value = (unsigned short *)addr;
>+ while (count) {
>+ *value = inw(port);
>+ value++;
>+ count--;
>+ }
>+ } else {
>+ asm volatile("rep; insw" : "+D"(addr), "+c"(count) :
>"d"(port));
>+ }
>+}
>+
>+void outsl_try_rep(int port, const void *addr, unsigned long count)
>+{
>+ if (sev_active()) {
>+ unsigned int *value = (unsigned int *)addr;
>+ while (count) {
>+ outl(*value, port);
>+ value++;
>+ count--;
>+ }
>+ } else {
>+ asm volatile("rep; outsl" : "+S"(addr), "+c"(count) :
>"d"(port));
>+ }
>+}
>+
>+void insl_try_rep(int port, void *addr, unsigned long count)
>+{
>+ if (sev_active()) {
>+ unsigned int *value = (unsigned int *)addr;
>+ while (count) {
>+ *value = inl(port);
>+ value++;
>+ count--;
>+ }
>+ } else {
>+ asm volatile("rep; insl" : "+D"(addr), "+c"(count) :
>"d"(port));
>+ }
>+}

What the heck?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2017-07-26 20:07:33

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active



On 07/26/2017 02:26 PM, H. Peter Anvin wrote:
>>>>>
>> \
>>>>> static inline void outs##bwl(int port, const void *addr, unsigned
>> long count) \
>>>>> {
>>>
>>> This will clash with a fix I did to add a "memory" clobber
>>> for the traditional implementation, see
>>> https://patchwork.kernel.org/patch/9854573/
>>>
>>>> Is it even worth leaving these as inline functions?
>>>> Given the speed of IO cycles it is unlikely that the cost of calling
>> a real
>>>> function will be significant.
>>>> The code bloat reduction will be significant.
>>>
>>> I think the smallest code would be the original "rep insb" etc, which
>>> should be smaller than a function call, unlike the loop. Then again,
>>> there is a rather small number of affected device drivers, almost all
>>> of them for ancient hardware that you won't even build in a 64-bit
>>> x86 kernel, see the list below. The only user I found that is
>> actually
>>> still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the
>> early
>>> console.
>>
>>
>> There are some indirect user of string I/O functions. The following
>> functions
>> defined in lib/iomap.c calls rep version of ins and outs.
>>
>> - ioread8_rep, ioread16_rep, ioread32_rep
>> - iowrite8_rep, iowrite16_rep, iowrite32_rep
>>
>> I found that several drivers use above functions.
>>
>> Here is one approach to convert it into non-inline functions. In this
>> approach,
>> I have added a new file arch/x86/kernel/io.c which provides non rep
>> version of
>> string I/O routines. The file gets built and used only when
>> AMD_MEM_ENCRYPT is
>> enabled. On positive side, if we don't build kernel with
>> AMD_MEM_ENCRYPT support
>> then we use inline routines, when AMD_MEM_ENCRYPT is built then we make
>> a function
>> call. Inside the function we unroll only when SEV is active.
>>
>> Do you see any issue with this approach ? thanks
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index e080a39..104927d 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)
>> \
>> unsigned type value = in##bwl(port); \
>> slow_down_io(); \
>> return value; \
>> -}
>> \
>> -
>> \
>> +}
>> +
>> +#define BUILDIO_REP(bwl, bw, type)
>> \
>> static inline void outs##bwl(int port, const void *addr, unsigned long
>> count) \
>> {
>> \
>> asm volatile("rep; outs" #bwl \
>> @@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr,
>> unsigned long count) \
>> {
>> \
>> asm volatile("rep; ins" #bwl \
>> : "+D"(addr), "+c"(count) : "d"(port)); \
>> -}
>> +}
>> \
>>
>> BUILDIO(b, b, char)
>> BUILDIO(w, w, short)
>> BUILDIO(l, , int)
>>
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +extern void outsb_try_rep(int port, const void *addr, unsigned long
>> count);
>> +extern void insb_try_rep(int port, void *addr, unsigned long count);
>> +extern void outsw_try_rep(int port, const void *addr, unsigned long
>> count);
>> +extern void insw_try_rep(int port, void *addr, unsigned long count);
>> +extern void outsl_try_rep(int port, const void *addr, unsigned long
>> count);
>> +extern void insl_try_rep(int port, void *addr, unsigned long count);
>> +#define outsb outsb_try_rep
>> +#define insb insb_try_rep
>> +#define outsw outsw_try_rep
>> +#define insw insw_try_rep
>> +#define outsl outsl_try_rep
>> +#define insl insl_try_rep
>> +#else
>> +BUILDIO_REP(b, b, char)
>> +BUILDIO_REP(w, w, short)
>> +BUILDIO_REP(l, , int)
>> +#endif
>> +
>> extern void *xlate_dev_mem_ptr(phys_addr_t phys);
>> extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
>>
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index a01892b..3b6e2a3 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -42,6 +42,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
>>
>> obj-y := process_$(BITS).o signal.o
>> obj-$(CONFIG_COMPAT) += signal_compat.o
>> +obj-$(CONFIG_AMD_MEM_ENCRYPT) += io.o
>> obj-y += traps.o irq.o irq_$(BITS).o
>> dumpstack_$(BITS).o
>> obj-y += time.o ioport.o dumpstack.o nmi.o
>> obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o
>> diff --git a/arch/x86/kernel/io.c b/arch/x86/kernel/io.c
>> new file mode 100644
>> index 0000000..f58afa9
>> --- /dev/null
>> +++ b/arch/x86/kernel/io.c
>> @@ -0,0 +1,87 @@
>> +#include <linux/types.h>
>> +#include <linux/io.h>
>> +#include <asm/io.h>
>> +
>> +void outsb_try_rep(int port, const void *addr, unsigned long count)
>> +{
>> + if (sev_active()) {
>> + unsigned char *value = (unsigned char *)addr;
>> + while (count) {
>> + outb(*value, port);
>> + value++;
>> + count--;
>> + }
>> + } else {
>> + asm volatile("rep; outsb" : "+S"(addr), "+c"(count) :
>> "d"(port));
>> + }
>> +}
>> +
>> +void insb_try_rep(int port, void *addr, unsigned long count)
>> +{
>> + if (sev_active()) {
>> + unsigned char *value = (unsigned char *)addr;
>> + while (count) {
>> + *value = inb(port);
>> + value++;
>> + count--;
>> + }
>> + } else {
>> + asm volatile("rep; insb" : "+D"(addr), "+c"(count) :
>> "d"(port));
>> + }
>> +}
>> +
>> +void outsw_try_rep(int port, const void *addr, unsigned long count)
>> +{
>> + if (sev_active()) {
>> + unsigned short *value = (unsigned short *)addr;
>> + while (count) {
>> + outw(*value, port);
>> + value++;
>> + count--;
>> + }
>> + } else {
>> + asm volatile("rep; outsw" : "+S"(addr), "+c"(count) :
>> "d"(port));
>> + }
>> +}
>> +void insw_try_rep(int port, void *addr, unsigned long count)
>> +{
>> + if (sev_active()) {
>> + unsigned short *value = (unsigned short *)addr;
>> + while (count) {
>> + *value = inw(port);
>> + value++;
>> + count--;
>> + }
>> + } else {
>> + asm volatile("rep; insw" : "+D"(addr), "+c"(count) :
>> "d"(port));
>> + }
>> +}
>> +
>> +void outsl_try_rep(int port, const void *addr, unsigned long count)
>> +{
>> + if (sev_active()) {
>> + unsigned int *value = (unsigned int *)addr;
>> + while (count) {
>> + outl(*value, port);
>> + value++;
>> + count--;
>> + }
>> + } else {
>> + asm volatile("rep; outsl" : "+S"(addr), "+c"(count) :
>> "d"(port));
>> + }
>> +}
>> +
>> +void insl_try_rep(int port, void *addr, unsigned long count)
>> +{
>> + if (sev_active()) {
>> + unsigned int *value = (unsigned int *)addr;
>> + while (count) {
>> + *value = inl(port);
>> + value++;
>> + count--;
>> + }
>> + } else {
>> + asm volatile("rep; insl" : "+D"(addr), "+c"(count) :
>> "d"(port));
>> + }
>> +}
>
> What the heck?
>

I am not sure I understand your concern.

Are you commenting on the amount of code duplication? If so, I can certainly improve
that and use a macro similar to the one in the header file to generate the function bodies.

Or are you saying that we should not attempt to make those functions non-inline
and that you would prefer they stay inline?

thanks
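
For reference, a rough sketch of the macro-based variant being floated above (not
part of the posted patch; it also carries the "memory" clobber from Arnd's pending
fix mentioned earlier in the thread):

#define BUILD_IO_TRY_REP(bwl, type)					\
void outs##bwl##_try_rep(int port, const void *addr,			\
			 unsigned long count)				\
{									\
	if (sev_active()) {						\
		const type *value = addr;				\
		while (count--)						\
			out##bwl(*value++, port);			\
	} else {							\
		asm volatile("rep; outs" #bwl				\
			     : "+S"(addr), "+c"(count)			\
			     : "d"(port) : "memory");			\
	}								\
}									\
									\
void ins##bwl##_try_rep(int port, void *addr, unsigned long count)	\
{									\
	if (sev_active()) {						\
		type *value = addr;					\
		while (count--)						\
			*value++ = in##bwl(port);			\
	} else {							\
		asm volatile("rep; ins" #bwl				\
			     : "+D"(addr), "+c"(count)			\
			     : "d"(port) : "memory");			\
	}								\
}

BUILD_IO_TRY_REP(b, unsigned char)
BUILD_IO_TRY_REP(w, unsigned short)
BUILD_IO_TRY_REP(l, unsigned int)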

2017-07-27 07:45:43

by David Laight

[permalink] [raw]
Subject: RE: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

From: Brijesh Singh
> Sent: 26 July 2017 21:07
...
> I am not sure if I understand your concern.
>
> Are you commenting on amount of code duplication ? If so, I can certainly improve
> and use the similar macro used into header file to generate the functions body.

If you are careful the real functions could expand the inline functions
that get used when SEV is compiled out.
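
A minimal sketch of that idea (names like __outsb_rep() are invented here; this
is not what the posted patch does): the header would always generate the plain
rep inlines under a private name, and the out-of-line function would expand them
for the non-SEV case:

void outsb_try_rep(int port, const void *addr, unsigned long count)
{
	const unsigned char *p = addr;

	if (!sev_active()) {
		__outsb_rep(port, addr, count);	/* the BUILDIO_REP-style "rep; outsb" inline */
		return;
	}

	/* SEV: unroll one byte at a time */
	while (count--)
		outb(*p++, port);
}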

Oh, if you are looking at this, can you fix memcpy_to_io()
so that it is never 'rep movsb'?

David


2017-07-27 13:32:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

On Mon, Jul 24, 2017 at 02:07:46PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> When Secure Encrypted Virtualization (SEV) is active, boot data (such as
> EFI related data, setup data) is encrypted and needs to be accessed as
> such when mapped. Update the architecture override in early_memremap to
> keep the encryption attribute when mapping this data.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/mm/ioremap.c | 44 ++++++++++++++++++++++++++++++++------------
> 1 file changed, 32 insertions(+), 12 deletions(-)

...

> @@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
> if (flags & MEMREMAP_DEC)
> return false;
>
> - if (memremap_is_setup_data(phys_addr, size) ||
> - memremap_is_efi_data(phys_addr, size) ||
> - memremap_should_map_decrypted(phys_addr, size))
> - return false;
> + if (sme_active()) {
> + if (memremap_is_setup_data(phys_addr, size) ||
> + memremap_is_efi_data(phys_addr, size) ||
> + memremap_should_map_decrypted(phys_addr, size))
> + return false;
> + } else if (sev_active()) {
> + if (memremap_should_map_decrypted(phys_addr, size))
> + return false;
> + }
>
> return true;
> }

I guess this function's hind part can be simplified to:

	if (sme_active()) {
		if (memremap_is_setup_data(phys_addr, size) ||
		    memremap_is_efi_data(phys_addr, size))
			return false;
	}

	return !memremap_should_map_decrypted(phys_addr, size);
}

> @@ -608,15 +621,22 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
> unsigned long size,
> pgprot_t prot)

And this one in a similar manner...

> {
> - if (!sme_active())
> + if (!sme_active() && !sev_active())
> return prot;

... and you don't need that check...

> - if (early_memremap_is_setup_data(phys_addr, size) ||
> - memremap_is_efi_data(phys_addr, size) ||
> - memremap_should_map_decrypted(phys_addr, size))
> - prot = pgprot_decrypted(prot);
> - else
> - prot = pgprot_encrypted(prot);
> + if (sme_active()) {

... if you're going to do it here too.

> + if (early_memremap_is_setup_data(phys_addr, size) ||
> + memremap_is_efi_data(phys_addr, size) ||
> + memremap_should_map_decrypted(phys_addr, size))
> + prot = pgprot_decrypted(prot);
> + else
> + prot = pgprot_encrypted(prot);
> + } else if (sev_active()) {

And here.

> + if (memremap_should_map_decrypted(phys_addr, size))
> + prot = pgprot_decrypted(prot);
> + else
> + prot = pgprot_encrypted(prot);
> + }

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-27 13:40:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

On Wed, Jul 26, 2017 at 11:47:32AM -0500, Tom Lendacky wrote:
> If it's made static then the sme_active()/sev_active() inline functions
> would need to be turned into functions within the mem_encrypt.c file. So
> there's a trade-off to do that, which is the better one?

Simple: why do we have functions if the variables are exported?

The reasoning for sme_me_mask is more or less obvious but for sev_enabled...

IOW, either make the bool static and uninline the function - this way
you're free to change later how you determine whether SEV is enabled, as
callers will be using the function.

Or, if it doesn't really matter because you can always change callers
later, simply drop the sev_active() function and use a bool sev_active
everywhere.
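
A sketch of the first option (illustrative): keep the flag private to
mem_encrypt.c and export only the accessor, so how SEV gets detected can change
later without touching callers:

/* arch/x86/mm/mem_encrypt.c */
static bool sev_enabled __read_mostly;

bool sev_active(void)
{
	return sev_enabled;
}
EXPORT_SYMBOL_GPL(sev_active);

/* arch/x86/include/asm/mem_encrypt.h: drop the inline, keep only the declaration */
bool sev_active(void);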

> The kernel needs to distinguish between running under SME and running
> under SEV. SME and SEV are similar but not the same. The trampoline code
> is a good example. Before paging is activated, SME will access all
> memory as decrypted, but SEV will access all memory as encrypted. So
> when APs are being brought up under SME the trampoline area cannot be
> encrypted, whereas under SEV the trampoline area must be encrypted.

I guess you're sensing by now that we need this clarification in a
comment above it...

:-)

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-27 14:59:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> The current code checks only for sme_active() when determining whether
> to perform the encryption attribute change. Include sev_active() in this
> check so that memory attribute changes can occur under SME and SEV.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/mm/pageattr.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index dfb7d65..b726b23 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> unsigned long start;
> int ret;
>
> - /* Nothing to do if the SME is not active */
> - if (!sme_active())
> + /* Nothing to do if SME and SEV are not active */
> + if (!sme_active() && !sev_active())

This is the second place which does

if (!SME && !SEV)

I wonder if, instead of sprinking those, we should have a

if (mem_enc_active())

or so which unifies all those memory encryption logic tests and makes
the code more straightforward for readers who don't have to pay
attention to SME vs SEV ...

Just a thought.
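
A sketch of what such a helper could look like (the name follows the suggestion
above, the body is illustrative):

/* arch/x86/include/asm/mem_encrypt.h */
static inline bool mem_enc_active(void)
{
	return sme_active() || sev_active();
}

and callers such as __set_memory_enc_dec() would then read:

	/* Nothing to do if memory encryption is not active */
	if (!mem_enc_active())
		return 0;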

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-28 08:47:55

by David Laight

[permalink] [raw]
Subject: RE: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

From: Borislav Petkov
> Sent: 27 July 2017 15:59
> On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:
> > From: Tom Lendacky <[email protected]>
> >
> > The current code checks only for sme_active() when determining whether
> > to perform the encryption attribute change. Include sev_active() in this
> > check so that memory attribute changes can occur under SME and SEV.
> >
> > Signed-off-by: Tom Lendacky <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/mm/pageattr.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index dfb7d65..b726b23 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> > unsigned long start;
> > int ret;
> >
> > - /* Nothing to do if the SME is not active */
> > - if (!sme_active())
> > + /* Nothing to do if SME and SEV are not active */
> > + if (!sme_active() && !sev_active())
>
> This is the second place which does
>
> if (!SME && !SEV)
>
> I wonder if, instead of sprinking those, we should have a
>
> if (mem_enc_active())
>
> or so which unifies all those memory encryption logic tests and makes
> the code more straightforward for readers who don't have to pay
> attention to SME vs SEV ...

If any of the code paths are 'hot' it would make sense to be checking
a single memory location.

David


2017-07-28 10:32:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

On Mon, Jul 24, 2017 at 02:07:48PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> EFI data is encrypted when the kernel is run under SEV. Update the
> page table references to be sure the EFI memory areas are accessed
> encrypted.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/platform/efi/efi_64.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 12e8388..1ecb3f6 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -32,6 +32,7 @@
> #include <linux/reboot.h>
> #include <linux/slab.h>
> #include <linux/ucs2_string.h>
> +#include <linux/mem_encrypt.h>
>
> #include <asm/setup.h>
> #include <asm/page.h>
> @@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
> * as trim_bios_range() will reserve the first page and isolate it away
> * from memory allocators anyway.
> */
> - if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
> + pf = _PAGE_RW;
> + if (sev_active())
> + pf |= _PAGE_ENC;

\n here

> + if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
> pr_err("Failed to create 1:1 mapping for the first page!\n");
> return 1;
> }
> @@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
> if (!(md->attribute & EFI_MEMORY_WB))
> flags |= _PAGE_PCD;
>
> + if (sev_active())
> + flags |= _PAGE_ENC;
> +
> pfn = md->phys_addr >> PAGE_SHIFT;
> if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
> pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
> @@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
> pgd_t *pgd = efi_pgd;
> int err1, err2;
>
> + if (sev_active())
> + pf |= _PAGE_ENC;

Move this assignment to the caller efi_update_mem_attr() where pf is being
set...

> +
> /* Update the 1:1 mapping */
> pfn = md->phys_addr >> PAGE_SHIFT;
> err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, pf);
> @@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
> (md->type != EFI_RUNTIME_SERVICES_CODE))
> pf |= _PAGE_RW;
>
> + if (sev_active())
> + pf |= _PAGE_ENC;

... just like here.

> +
> efi_update_mappings(md, pf);

In general, I'm not totally excited about that sprinkling of if
(sev_active())... :-\

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-28 15:24:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

On Mon, Jul 24, 2017 at 02:07:49PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
> functions each have much of the same code. Create a new function that
> consolidates the common code from these functions in one place to reduce
> the amount of duplicated code.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> kernel/resource.c | 53 ++++++++++++++++++++++++++---------------------------
> 1 file changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 9b5f044..7b20b3e 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
> res->start = p->start;
> if (res->end > p->end)
> res->end = p->end;
> + res->desc = p->desc;
> return 0;

I must be going blind: where are we using that res->desc?

> +static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
> + bool first_level_children_only,

Btw, that variable name is insanely long.

The rest looks ok to me, thanks for the cleanup!

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-31 08:27:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

On Mon, Jul 24, 2017 at 02:07:50PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> In prep for a new function that will need additional resource information
> during the resource walk, update the resource walk callback to pass the
> resource structure. Since the current callback start and end arguments
> are pulled from the resource structure, the callback functions can obtain
> them from the resource structure directly.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/powerpc/kernel/machine_kexec_file_64.c | 12 +++++++++---
> arch/x86/kernel/crash.c | 18 +++++++++---------
> arch/x86/kernel/pmem.c | 2 +-
> include/linux/ioport.h | 4 ++--
> include/linux/kexec.h | 2 +-
> kernel/kexec_file.c | 5 +++--
> kernel/resource.c | 9 +++++----
> 7 files changed, 30 insertions(+), 22 deletions(-)

Reviewed-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-07-31 22:20:00

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

On Mon, Jul 24, 2017 at 12:07 PM, Brijesh Singh <[email protected]> wrote:
> From: Tom Lendacky <[email protected]>
>
> In prep for a new function that will need additional resource information
> during the resource walk, update the resource walk callback to pass the
> resource structure. Since the current callback start and end arguments
> are pulled from the resource structure, the callback functions can obtain
> them from the resource structure directly.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>

This is a nice clean up even without the refactoring need. :)

Reviewed-by: Kees Cook <[email protected]>

Thanks!

-Kees

--
Kees Cook
Pixel Security

2017-08-02 04:03:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

On Mon, Jul 24, 2017 at 02:07:51PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> In order for memory pages to be properly mapped when SEV is active, we
> need to use the PAGE_KERNEL protection attribute as the base protection.
> This will ensure that memory mapping of, e.g. ACPI tables, receives the
> proper mapping attributes.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/mm/ioremap.c | 28 ++++++++++++++++++++++++++++
> include/linux/ioport.h | 3 +++
> kernel/resource.c | 17 +++++++++++++++++
> 3 files changed, 48 insertions(+)
>
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index c0be7cf..7b27332 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
> return 0;
> }
>
> +static int __ioremap_res_desc_other(struct resource *res, void *arg)
> +{
> + return (res->desc != IORES_DESC_NONE);
> +}
> +
> +/*
> + * This function returns true if the target memory is marked as
> + * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
> + * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
> + */
> +static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
> +{
> + u64 start, end;
> +
> + start = (u64)addr;
> + end = start + size - 1;
> +
> + return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
> +}
> +
> /*
> * Remap an arbitrary physical address space into the kernel virtual
> * address space. It transparently creates kernel huge I/O mapping when
> @@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
> pcm = new_pcm;
> }
>
> + /*
> + * If the page being mapped is in memory and SEV is active then
> + * make sure the memory encryption attribute is enabled in the
> + * resulting mapping.
> + */
> prot = PAGE_KERNEL_IO;
> + if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
> + prot = pgprot_encrypted(prot);

Hmm, so this function already does walk_system_ram_range() a bit
earlier and now on SEV systems we're going to do it again. Can we make
walk_system_ram_range() return a distinct value for SEV systems and act
accordingly in __ioremap_caller() instead of repeating the operation?

It looks to me like we could...
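
One rough way to do the walk only once (everything below is a sketch with
made-up names, not the posted patch, and it assumes walk_mem_res() also visits
System RAM ranges): a single callback records both properties and
__ioremap_caller() consumes the result instead of doing two separate walks:

struct ioremap_mem_flags {
	bool system_ram;	/* range overlaps System RAM */
	bool desc_other;	/* range has a descriptor other than IORES_DESC_NONE */
};

static int __ioremap_check_mem(struct resource *res, void *arg)
{
	struct ioremap_mem_flags *flags = arg;

	if ((res->flags & IORESOURCE_SYSTEM_RAM) == IORESOURCE_SYSTEM_RAM)
		flags->system_ram = true;

	if (res->desc != IORES_DESC_NONE)
		flags->desc_other = true;

	return 0;	/* keep walking so both flags can be set */
}

__ioremap_caller() would then call walk_mem_res(phys_addr, phys_addr + size - 1,
&flags, __ioremap_check_mem) once, refuse the mapping when flags.system_ram is
set (today's walk_system_ram_range() check), and apply pgprot_encrypted() when
sev_active() && flags.desc_other.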

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-07 03:49:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 12/17] x86/mm: DMA support for SEV memory encryption

On Mon, Jul 24, 2017 at 02:07:52PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> DMA access to memory mapped as encrypted while SEV is active can not be
> encrypted during device write or decrypted during device read.

Yeah, definitely rewrite that sentence.

> In order
> for DMA to properly work when SEV is active, the SWIOTLB bounce buffers
> must be used.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/mm/mem_encrypt.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
> lib/swiotlb.c | 5 +--
> 2 files changed, 89 insertions(+), 2 deletions

...

> @@ -202,6 +280,14 @@ void __init mem_encrypt_init(void)
> /* Call into SWIOTLB to update the SWIOTLB DMA buffers */
> swiotlb_update_mem_attributes();
>
> + /*
> + * With SEV, DMA operations cannot use encryption. New DMA ops
> + * are required in order to mark the DMA areas as decrypted or
> + * to use bounce buffers.
> + */
> + if (sev_active())
> + dma_ops = &sme_dma_ops;

Well, we do differentiate between SME and SEV and the check is
sev_active but the ops are called sme_dma_ops. Call them sev_dma_ops
instead for less confusion.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-09 18:23:03

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On 7/25/2017 10:33 AM, Borislav Petkov wrote:
> On Tue, Jul 25, 2017 at 10:29:40AM -0500, Tom Lendacky wrote:
>> But early_identify_cpu() calls get_cpu_cap() which will check for cpuid
>> leaf 0x80000008 support and set x86_phys_bits.
>
> Right, but it can't be less than 32, can it? And if it is more than 32
> bits, then it probably doesn't really matter on 32-bit. Unless it is
> less than 36 bits and you do PAE...
>
>> I'll try to build and run a 32-bit kernel and see how this all flows.
>
> Yeah, that would be good.

Ok, finally got around to running a 32-bit kernel and it reports
x86_phys_bits as 48.

Thanks,
Tom

>
> Thanks.
>

2017-08-10 13:03:36

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

On 7/26/2017 11:03 AM, Borislav Petkov wrote:
> Subject: x86/realmode: ...

Done.

>
> On Mon, Jul 24, 2017 at 02:07:45PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> When SEV is active the trampoline area will need to be in encrypted
>> memory so only mark the area decrypted if SME is active.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/realmode/init.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index 1f71980..c7eeca7 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
>> /*
>> * If SME is active, the trampoline area will need to be in
>> * decrypted memory in order to bring up other processors
>> - * successfully.
>> + * successfully. For SEV the trampoline area needs to be in
>> + * encrypted memory, so only do this for SME.
>
> Or simply say:
>
> "It is not needed for SEV."

Will do.

Thanks,
Tom

>

2017-08-17 08:25:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

On Wed, Aug 09, 2017 at 01:17:54PM -0500, Tom Lendacky wrote:
> Ok, finally got around to running a 32-bit kernel and it reports
> x86_phys_bits as 48.

So it doesn't really matter on 32-bit. I guess you could add a comment
saying why we don't care.

Thanks.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-17 18:05:54

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

On 7/27/2017 8:31 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:46PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> When Secure Encrypted Virtualization (SEV) is active, boot data (such as
>> EFI related data, setup data) is encrypted and needs to be accessed as
>> such when mapped. Update the architecture override in early_memremap to
>> keep the encryption attribute when mapping this data.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/mm/ioremap.c | 44 ++++++++++++++++++++++++++++++++------------
>> 1 file changed, 32 insertions(+), 12 deletions(-)
>
> ...
>
>> @@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
>> if (flags & MEMREMAP_DEC)
>> return false;
>>
>> - if (memremap_is_setup_data(phys_addr, size) ||
>> - memremap_is_efi_data(phys_addr, size) ||
>> - memremap_should_map_decrypted(phys_addr, size))
>> - return false;
>> + if (sme_active()) {
>> + if (memremap_is_setup_data(phys_addr, size) ||
>> + memremap_is_efi_data(phys_addr, size) ||
>> + memremap_should_map_decrypted(phys_addr, size))
>> + return false;
>> + } else if (sev_active()) {
>> + if (memremap_should_map_decrypted(phys_addr, size))
>> + return false;
>> + }
>>
>> return true;
>> }
>
> I guess this function's hind part can be simplified to:
>
> 	if (sme_active()) {
> 		if (memremap_is_setup_data(phys_addr, size) ||
> 		    memremap_is_efi_data(phys_addr, size))
> 			return false;
> 	}
>
> 	return !memremap_should_map_decrypted(phys_addr, size);
> }
>

Ok, definitely cleaner.

>> @@ -608,15 +621,22 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
>> unsigned long size,
>> pgprot_t prot)
>
> And this one in a similar manner...
>
>> {
>> - if (!sme_active())
>> + if (!sme_active() && !sev_active())
>> return prot;
>
> ... and you don't need that check...
>
>> - if (early_memremap_is_setup_data(phys_addr, size) ||
>> - memremap_is_efi_data(phys_addr, size) ||
>> - memremap_should_map_decrypted(phys_addr, size))
>> - prot = pgprot_decrypted(prot);
>> - else
>> - prot = pgprot_encrypted(prot);
>> + if (sme_active()) {
>
> ... if you're going to do it here too.
>
>> + if (early_memremap_is_setup_data(phys_addr, size) ||
>> + memremap_is_efi_data(phys_addr, size) ||
>> + memremap_should_map_decrypted(phys_addr, size))
>> + prot = pgprot_decrypted(prot);
>> + else
>> + prot = pgprot_encrypted(prot);
>> + } else if (sev_active()) {
>
> And here.

Will do.

Thanks,
Tom

>
>> + if (memremap_should_map_decrypted(phys_addr, size))
>> + prot = pgprot_decrypted(prot);
>> + else
>> + prot = pgprot_encrypted(prot);
>> + }
>

2017-08-17 18:11:02

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

On 7/27/2017 9:58 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> The current code checks only for sme_active() when determining whether
>> to perform the encryption attribute change. Include sev_active() in this
>> check so that memory attribute changes can occur under SME and SEV.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/mm/pageattr.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>> index dfb7d65..b726b23 100644
>> --- a/arch/x86/mm/pageattr.c
>> +++ b/arch/x86/mm/pageattr.c
>> @@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>> unsigned long start;
>> int ret;
>>
>> - /* Nothing to do if the SME is not active */
>> - if (!sme_active())
>> + /* Nothing to do if SME and SEV are not active */
>> + if (!sme_active() && !sev_active())
>
> This is the second place which does
>
> if (!SME && !SEV)
>
> I wonder if, instead of sprinking those, we should have a
>
> if (mem_enc_active())
>
> or so which unifies all those memory encryption logic tests and makes
> the code more straightforward for readers who don't have to pay
> attention to SME vs SEV ...

Yup, that will make things look cleaner and easier to understand.

Thanks,
Tom

>
> Just a thought.
>

2017-08-17 18:21:54

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

On 7/28/2017 3:47 AM, David Laight wrote:
> From: Borislav Petkov
>> Sent: 27 July 2017 15:59
>> On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:
>>> From: Tom Lendacky <[email protected]>
>>>
>>> The current code checks only for sme_active() when determining whether
>>> to perform the encryption attribute change. Include sev_active() in this
>>> check so that memory attribute changes can occur under SME and SEV.
>>>
>>> Signed-off-by: Tom Lendacky <[email protected]>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> ---
>>> arch/x86/mm/pageattr.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>>> index dfb7d65..b726b23 100644
>>> --- a/arch/x86/mm/pageattr.c
>>> +++ b/arch/x86/mm/pageattr.c
>>> @@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>>> unsigned long start;
>>> int ret;
>>>
>>> - /* Nothing to do if the SME is not active */
>>> - if (!sme_active())
>>> + /* Nothing to do if SME and SEV are not active */
>>> + if (!sme_active() && !sev_active())
>>
>> This is the second place which does
>>
>> if (!SME && !SEV)
>>
>> I wonder if, instead of sprinking those, we should have a
>>
>> if (mem_enc_active())
>>
>> or so which unifies all those memory encryption logic tests and makes
>> the code more straightforward for readers who don't have to pay
>> attention to SME vs SEV ...
>
> If any of the code paths are 'hot' it would make sense to be checking
> a single memory location.

The function would check a single variable/memory location and making it
an inline function would accomplish that.

Thanks,
Tom

>
> David
>

2017-08-17 18:42:41

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

On 7/28/2017 5:31 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:48PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> EFI data is encrypted when the kernel is run under SEV. Update the
>> page table references to be sure the EFI memory areas are accessed
>> encrypted.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/platform/efi/efi_64.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
>> index 12e8388..1ecb3f6 100644
>> --- a/arch/x86/platform/efi/efi_64.c
>> +++ b/arch/x86/platform/efi/efi_64.c
>> @@ -32,6 +32,7 @@
>> #include <linux/reboot.h>
>> #include <linux/slab.h>
>> #include <linux/ucs2_string.h>
>> +#include <linux/mem_encrypt.h>
>>
>> #include <asm/setup.h>
>> #include <asm/page.h>
>> @@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>> * as trim_bios_range() will reserve the first page and isolate it away
>> * from memory allocators anyway.
>> */
>> - if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
>> + pf = _PAGE_RW;
>> + if (sev_active())
>> + pf |= _PAGE_ENC;
>
> \n here
>
>> + if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
>> pr_err("Failed to create 1:1 mapping for the first page!\n");
>> return 1;
>> }
>> @@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
>> if (!(md->attribute & EFI_MEMORY_WB))
>> flags |= _PAGE_PCD;
>>
>> + if (sev_active())
>> + flags |= _PAGE_ENC;
>> +
>> pfn = md->phys_addr >> PAGE_SHIFT;
>> if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
>> pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
>> @@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
>> pgd_t *pgd = efi_pgd;
>> int err1, err2;
>>
>> + if (sev_active())
>> + pf |= _PAGE_ENC;
>
> Move this assignment to the caller efi_update_mem_attr() where pf is being
> set...

Will do.

>
>> +
>> /* Update the 1:1 mapping */
>> pfn = md->phys_addr >> PAGE_SHIFT;
>> err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, pf);
>> @@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
>> (md->type != EFI_RUNTIME_SERVICES_CODE))
>> pf |= _PAGE_RW;
>>
>> + if (sev_active())
>> + pf |= _PAGE_ENC;
>
> ... just like here.

Yup.

Thanks,
Tom

>
>> +
>> efi_update_mappings(md, pf);
>
> In general, I'm not totally excited about that sprinkling of if
> (sev_active())... :-\
>

2017-08-17 18:55:35

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

On 7/28/2017 10:23 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:49PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
>> functions each have much of the same code. Create a new function that
>> consolidates the common code from these functions in one place to reduce
>> the amount of duplicated code.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> kernel/resource.c | 53 ++++++++++++++++++++++++++---------------------------
>> 1 file changed, 26 insertions(+), 27 deletions(-)
>>
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index 9b5f044..7b20b3e 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
>> res->start = p->start;
>> if (res->end > p->end)
>> res->end = p->end;
>> + res->desc = p->desc;
>> return 0;
>
> I must be going blind: where are we using that res->desc?

I think that was left-over from the initial consolidation work I was
doing. I'll remove it.

>
>> +static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
>> + bool first_level_children_only,
>
> Btw, that variable name is insanely long.

I know, but I'm maintaining consistency with the name that was already
present vs. changing it.

>
> The rest looks ok to me, thanks for the cleanup!

Thanks,
Tom

>

2017-08-17 19:03:50

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

On 8/17/2017 1:55 PM, Tom Lendacky wrote:
> On 7/28/2017 10:23 AM, Borislav Petkov wrote:
>> On Mon, Jul 24, 2017 at 02:07:49PM -0500, Brijesh Singh wrote:
>>> From: Tom Lendacky <[email protected]>
>>>
>>> The walk_iomem_res_desc(), walk_system_ram_res() and
>>> walk_system_ram_range()
>>> functions each have much of the same code. Create a new function that
>>> consolidates the common code from these functions in one place to reduce
>>> the amount of duplicated code.
>>>
>>> Signed-off-by: Tom Lendacky <[email protected]>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> ---
>>> kernel/resource.c | 53
>>> ++++++++++++++++++++++++++---------------------------
>>> 1 file changed, 26 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/kernel/resource.c b/kernel/resource.c
>>> index 9b5f044..7b20b3e 100644
>>> --- a/kernel/resource.c
>>> +++ b/kernel/resource.c
>>> @@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource
>>> *res, unsigned long desc,
>>> res->start = p->start;
>>> if (res->end > p->end)
>>> res->end = p->end;
>>> + res->desc = p->desc;
>>> return 0;
>>
>> I must be going blind: where are we using that res->desc?
>
> I think that was left-over from the initial consolidation work I was
> doing. I'll remove it.

I spoke too soon... I use it in a later patch as part of a callback.
But instead of putting it here, I'll add it to the patch that actually
needs it.

Thanks,
Tom

>
>>
>>> +static int __walk_iomem_res_desc(struct resource *res, unsigned long
>>> desc,
>>> + bool first_level_children_only,
>>
>> Btw, that variable name is insanely long.
>
> I know, but I'm maintaining consistency with the name that was already
> present vs. changing it.
>
>>
>> The rest looks ok to me, thanks for the cleanup!
>
> Thanks,
> Tom
>
>>

2017-08-17 19:23:08

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

On 8/1/2017 11:02 PM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:51PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> In order for memory pages to be properly mapped when SEV is active, we
>> need to use the PAGE_KERNEL protection attribute as the base protection.
>> This will ensure that memory mapping of, e.g. ACPI tables, receives the
>> proper mapping attributes.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/mm/ioremap.c | 28 ++++++++++++++++++++++++++++
>> include/linux/ioport.h | 3 +++
>> kernel/resource.c | 17 +++++++++++++++++
>> 3 files changed, 48 insertions(+)
>>
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c0be7cf..7b27332 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
>> return 0;
>> }
>>
>> +static int __ioremap_res_desc_other(struct resource *res, void *arg)
>> +{
>> + return (res->desc != IORES_DESC_NONE);
>> +}
>> +
>> +/*
>> + * This function returns true if the target memory is marked as
>> + * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
>> + * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
>> + */
>> +static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
>> +{
>> + u64 start, end;
>> +
>> + start = (u64)addr;
>> + end = start + size - 1;
>> +
>> + return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
>> +}
>> +
>> /*
>> * Remap an arbitrary physical address space into the kernel virtual
>> * address space. It transparently creates kernel huge I/O mapping when
>> @@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>> pcm = new_pcm;
>> }
>>
>> + /*
>> + * If the page being mapped is in memory and SEV is active then
>> + * make sure the memory encryption attribute is enabled in the
>> + * resulting mapping.
>> + */
>> prot = PAGE_KERNEL_IO;
>> + if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
>> + prot = pgprot_encrypted(prot);
>
> Hmm, so this function already does walk_system_ram_range() a bit
> earlier and now on SEV systems we're going to do it again. Can we make
> walk_system_ram_range() return a distinct value for SEV systems and act
> accordingly in __ioremap_caller() instead of repeating the operation?
>
> It looks to me like we could...

Let me look into this. I can probably come up with something that does
the walk once.

Thanks,
Tom

>

2017-08-17 19:35:43

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 12/17] x86/mm: DMA support for SEV memory encryption



On 8/6/2017 10:48 PM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:52PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> DMA access to memory mapped as encrypted while SEV is active can not be
>> encrypted during device write or decrypted during device read.
>
> Yeah, definitely rewrite that sentence.

Heh, yup.

>
>> In order
>> for DMA to properly work when SEV is active, the SWIOTLB bounce buffers
>> must be used.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/mm/mem_encrypt.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
>> lib/swiotlb.c | 5 +--
>> 2 files changed, 89 insertions(+), 2 deletions
>
> ...
>
>> @@ -202,6 +280,14 @@ void __init mem_encrypt_init(void)
>> /* Call into SWIOTLB to update the SWIOTLB DMA buffers */
>> swiotlb_update_mem_attributes();
>>
>> + /*
>> + * With SEV, DMA operations cannot use encryption. New DMA ops
>> + * are required in order to mark the DMA areas as decrypted or
>> + * to use bounce buffers.
>> + */
>> + if (sev_active())
>> + dma_ops = &sme_dma_ops;
>
> Well, we do differentiate between SME and SEV and the check is
> sev_active but the ops are called sme_dma_ops. Call them sev_dma_ops
> instead for less confusion.

Yup, will do.

Thanks,
Tom

>

2017-08-22 16:53:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

On Wed, Jul 26, 2017 at 03:07:14PM -0500, Brijesh Singh wrote:
> Are you commenting on the amount of code duplication? If so, I can certainly improve
> that and use a macro similar to the one in the header file to generate the function bodies.

So the argument about having CONFIG_AMD_MEM_ENCRYPT disabled doesn't
bring a whole lot because distro kernels will all have it enabled.

Optimally, when SEV is enabled we would patch those IO insns, but we
can't patch at arbitrary times - we do it only once, at pre-SMP time.

And from looking at the code, we do set sev_enabled very early, as
part of __startup_64() -> sme_enable() so I guess we can make that
set a synthetic X86_FEATURE_ bit and then patch REP IN/OUT* with a
CALL, similar to what we do in arch/x86/include/asm/arch_hweight.h with
POPCNT.

But there you need to pay attention to registers being clobbered, see

f5967101e9de ("x86/hweight: Get rid of the special calling convention")

Yap, it does sound a bit more complex but if done right, we will be
patching all call sites the same way we patch hweight*() calls and there
should be no change to kernel size...

As always, the devil is in the detail.
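
Very roughly, that would look something like the hweight conversion; everything
in the sketch below is illustrative - the synthetic X86_FEATURE_SEV_GUEST bit,
the sev_outsb() helper (which would have to stick to the %rsi/%rcx/%rdx contract
of the REP form), and the assumption that the alternatives code can fix up the
relative CALL in the replacement:

#include <asm/alternative.h>

void sev_outsb(void);	/* hypothetical out-of-line unrolled helper */

static inline void outsb(int port, const void *addr, unsigned long count)
{
	asm volatile(ALTERNATIVE("rep; outsb",		/* default */
				 "call sev_outsb",	/* patched in on SEV guests */
				 X86_FEATURE_SEV_GUEST)
		     : "+S"(addr), "+c"(count)
		     : "d"(port)
		     : "memory");
}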

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-23 15:31:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

On Mon, Jul 24, 2017 at 02:07:54PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> Early in the boot process, add checks to determine if the kernel is
> running with Secure Encrypted Virtualization (SEV) active.
>
> Checking for SEV requires checking that the kernel is running under a
> hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
> (CPUID 0x8000001f, bit 1) and then check a non-interceptable SEV MSR
> (0xc0010131, bit 0).
>
> This check is required so that during early compressed kernel booting the
> pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
> updated to include the encryption mask so that when the kernel is
> decompressed into encrypted memory.

, it can boot properly.

:)

> After the kernel is decompressed and continues booting the same logic is
> used to check if SEV is active and set a flag indicating so. This allows
> us to distinguish between SME and SEV, each of which have unique
> differences in how certain things are handled: e.g. DMA (always bounce
> buffered with SEV) or EFI tables (always access decrypted with SME).
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/boot/compressed/Makefile | 2 +
> arch/x86/boot/compressed/head_64.S | 16 +++++
> arch/x86/boot/compressed/mem_encrypt.S | 103 +++++++++++++++++++++++++++++++++
> arch/x86/boot/compressed/misc.h | 2 +
> arch/x86/boot/compressed/pagetable.c | 8 ++-
> arch/x86/include/asm/mem_encrypt.h | 3 +
> arch/x86/include/asm/msr-index.h | 3 +
> arch/x86/include/uapi/asm/kvm_para.h | 1 -
> arch/x86/mm/mem_encrypt.c | 42 +++++++++++---
> 9 files changed, 169 insertions(+), 11 deletions(-)
> create mode 100644 arch/x86/boot/compressed/mem_encrypt.S
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 2c860ad..d2fe901 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
> $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
> $(obj)/piggy.o $(obj)/cpuflags.o
>
> +vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o

There's a

ifdef CONFIG_X86_64

a couple of lines below. Just put it there.

...

> +++ b/arch/x86/boot/compressed/mem_encrypt.S
> @@ -0,0 +1,103 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2017 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/linkage.h>
> +
> +#include <asm/processor-flags.h>
> +#include <asm/msr.h>
> +#include <asm/asm-offsets.h>
> +
> + .text
> + .code32
> +ENTRY(get_sev_encryption_bit)
> + xor %eax, %eax
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + push %ebx
> + push %ecx
> + push %edx
> +
> + /* Check if running under a hypervisor */
> + movl $1, %eax
> + cpuid
> + bt $31, %ecx /* Check the hypervisor bit */
> + jnc .Lno_sev
> +
> + movl $0x80000000, %eax /* CPUID to check the highest leaf */
> + cpuid
> + cmpl $0x8000001f, %eax /* See if 0x8000001f is available */
> + jb .Lno_sev
> +
> + /*
> + * Check for the SEV feature:
> + * CPUID Fn8000_001F[EAX] - Bit 1
> + * CPUID Fn8000_001F[EBX] - Bits 5:0
> + * Pagetable bit position used to indicate encryption
> + */
> + movl $0x8000001f, %eax
> + cpuid
> + bt $1, %eax /* Check if SEV is available */
> + jnc .Lno_sev
> +
> + movl $MSR_F17H_SEV, %ecx /* Read the SEV MSR */
> + rdmsr
> + bt $MSR_F17H_SEV_ENABLED_BIT, %eax /* Check if SEV is active */
> + jnc .Lno_sev
> +
> + /*
> + * Get memory encryption information:
> + */

The side-comment is enough. This one above can go.

> + movl %ebx, %eax
> + andl $0x3f, %eax /* Return the encryption bit location */
> + jmp .Lsev_exit
> +
> +.Lno_sev:
> + xor %eax, %eax
> +
> +.Lsev_exit:
> + pop %edx
> + pop %ecx
> + pop %ebx
> +
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +
> + ret
> +ENDPROC(get_sev_encryption_bit)
> +
> + .code64
> +ENTRY(get_sev_encryption_mask)
> + xor %rax, %rax
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + push %rbp
> + push %rdx
> +
> + movq %rsp, %rbp /* Save current stack pointer */
> +
> + call get_sev_encryption_bit /* Get the encryption bit position */

So we get to call get_sev_encryption_bit() here again and noodle through
the CPUID discovery and MSR access. We should instead cache that info
and return the encryption bit directly on a second call. (And initialize
it to -1 to denote that get_sev_encryption_bit() hasn't run yet).
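IOW, in C terms, the caching would look something like this (purely
illustrative sketch - the real routine is 32-bit asm and this omits the
max-extended-leaf check for brevity):

/* Illustrative only; not the actual boot/compressed code */
static int sev_enc_bit = -1;                    /* -1: not probed yet */

static int get_sev_encryption_bit(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (sev_enc_bit >= 0)
                return sev_enc_bit;             /* second call: cached */

        sev_enc_bit = 0;

        /* Running under a hypervisor? (CPUID 0x1, ECX[31]) */
        cpuid(1, &eax, &ebx, &ecx, &edx);
        if (!(ecx & BIT(31)))
                return sev_enc_bit;

        /* SEV supported and enabled? */
        cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
        if ((eax & BIT(1)) &&
            (native_read_msr(MSR_F17H_SEV) & MSR_F17H_SEV_ENABLED))
                sev_enc_bit = ebx & 0x3f;       /* C-bit position */

        return sev_enc_bit;
}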

...

> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 9274ec7..9cb6472 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -19,6 +19,9 @@
>
> #include <asm/bootparam.h>
>
> +#define AMD_SME_FEATURE_BIT BIT(0)
> +#define AMD_SEV_FEATURE_BIT BIT(1)

s/_FEATURE//

AMD_SME_BIT and AMD_SEV_BIT is good enough :)

And frankly, if you're going to use them only below in sme_enable() - I
need to check more thoroughly the remaining patches - but if you only
are going to use them there, just define them inside the function so
that they're close.

> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> extern unsigned long sme_me_mask;
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e399d68..530020f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -326,6 +326,9 @@
>
> /* Fam 17h MSRs */
> #define MSR_F17H_IRPERF 0xc00000e9
> +#define MSR_F17H_SEV 0xc0010131

If that MSR is going to be used later on - and I don't see why not - you
probably should make it an arch one: MSR_AMD64_SEV. Even if it isn't yet
officially. :-)
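I.e., something like this (assuming the rename, purely illustrative):

/* AMD64 architectural MSRs */
#define MSR_AMD64_SEV                   0xc0010131
#define MSR_AMD64_SEV_ENABLED_BIT       0
#define MSR_AMD64_SEV_ENABLED           BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)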

> +#define MSR_F17H_SEV_ENABLED_BIT 0
> +#define MSR_F17H_SEV_ENABLED BIT_ULL(MSR_F17H_SEV_ENABLED_BIT)
>
> /* Fam 16h MSRs */
> #define MSR_F16H_L2I_PERF_CTL 0xc0010230
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index a965e5b..c202ba3 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -109,5 +109,4 @@ struct kvm_vcpu_pv_apf_data {
> #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> #define KVM_PV_EOI_DISABLED 0x0
>
> -
> #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 5e5d460..ed8780e 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -288,7 +288,9 @@ void __init mem_encrypt_init(void)
> if (sev_active())
> dma_ops = &sme_dma_ops;
>
> - pr_info("AMD Secure Memory Encryption (SME) active\n");
> + pr_info("AMD %s active\n",
> + sev_active() ? "Secure Encrypted Virtualization (SEV)"
> + : "Secure Memory Encryption (SME)");
> }
>
> void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
> @@ -616,12 +618,23 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
> {
> const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> unsigned int eax, ebx, ecx, edx;
> + unsigned long feature_mask;
> bool active_by_default;
> unsigned long me_mask;
> char buffer[16];
> u64 msr;
>
> - /* Check for the SME support leaf */
> + /*
> + * Set the feature mask (SME or SEV) based on whether we are
> + * running under a hypervisor.
> + */
> + eax = 1;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + feature_mask = (ecx & BIT(31)) ? AMD_SEV_FEATURE_BIT
> + : AMD_SME_FEATURE_BIT;

Set that feature mask before using it...

> +
> + /* Check for the SME/SEV support leaf */

... because if that check exits due to no SME leaf, you're uselessly
doing all the above.
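IOW, roughly this ordering (a sketch only, using the shorter macro names
suggested above):

	/* Check for the SME/SEV support leaf first ... */
	eax = 0x80000000;
	ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	if (eax < 0x8000001f)
		return;

	/* ... and only then decide which feature bit to look for */
	eax = 1;
	ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	feature_mask = (ecx & BIT(31)) ? AMD_SEV_BIT : AMD_SME_BIT;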

> eax = 0x80000000;
> ecx = 0;
> native_cpuid(&eax, &ebx, &ecx, &edx);
> @@ -629,24 +642,39 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
> return;
>
> /*
> - * Check for the SME feature:
> + * Check for the SME/SEV feature:
> * CPUID Fn8000_001F[EAX] - Bit 0
> * Secure Memory Encryption support
> + * CPUID Fn8000_001F[EAX] - Bit 1

No need to repeat the CPUID leaf here - only Bit 1:

* CPUID Fn8000_001F[EAX]
* - Bit 0: Secure Memory Encryption support
* - Bit 1: Secure Encrypted Virtualization support


> + * Secure Encrypted Virtualization support
> * CPUID Fn8000_001F[EBX] - Bits 5:0
> * Pagetable bit position used to indicate encryption
> */
> eax = 0x8000001f;
> ecx = 0;
> native_cpuid(&eax, &ebx, &ecx, &edx);
> - if (!(eax & 1))
> + if (!(eax & feature_mask))
> return;
>
> me_mask = 1UL << (ebx & 0x3f);
>
> - /* Check if SME is enabled */
> - msr = __rdmsr(MSR_K8_SYSCFG);
> - if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> + /* Check if memory encryption is enabled */
> + if (feature_mask == AMD_SME_FEATURE_BIT) {
> + /* For SME, check the SYSCFG MSR */
> + msr = __rdmsr(MSR_K8_SYSCFG);
> + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> + return;
> + } else {
> + /* For SEV, check the SEV MSR */
> + msr = __rdmsr(MSR_F17H_SEV);
> + if (!(msr & MSR_F17H_SEV_ENABLED))
> + return;
> +
> + /* SEV state cannot be controlled by a command line option */
> + sme_me_mask = me_mask;
> + sev_enabled = 1;
> return;
> + }

Nice. Two birds with one stone is always better. :)


--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-24 18:55:12

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

On 8/23/2017 10:30 AM, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:07:54PM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> Early in the boot process, add checks to determine if the kernel is
>> running with Secure Encrypted Virtualization (SEV) active.
>>
>> Checking for SEV requires checking that the kernel is running under a
>> hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
>> (CPUID 0x8000001f, bit 1) and then check a non-interceptable SEV MSR
>> (0xc0010131, bit 0).
>>
>> This check is required so that during early compressed kernel booting the
>> pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
>> updated to include the encryption mask so that when the kernel is
>> decompressed into encrypted memory.
>
> , it can boot properly.
>
> :)
>

Yup, kinda didn't complete that sentence.

>> After the kernel is decompressed and continues booting the same logic is
>> used to check if SEV is active and set a flag indicating so. This allows
>> us to distinguish between SME and SEV, each of which have unique
>> differences in how certain things are handled: e.g. DMA (always bounce
>> buffered with SEV) or EFI tables (always access decrypted with SME).
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/boot/compressed/Makefile | 2 +
>> arch/x86/boot/compressed/head_64.S | 16 +++++
>> arch/x86/boot/compressed/mem_encrypt.S | 103 +++++++++++++++++++++++++++++++++
>> arch/x86/boot/compressed/misc.h | 2 +
>> arch/x86/boot/compressed/pagetable.c | 8 ++-
>> arch/x86/include/asm/mem_encrypt.h | 3 +
>> arch/x86/include/asm/msr-index.h | 3 +
>> arch/x86/include/uapi/asm/kvm_para.h | 1 -
>> arch/x86/mm/mem_encrypt.c | 42 +++++++++++---
>> 9 files changed, 169 insertions(+), 11 deletions(-)
>> create mode 100644 arch/x86/boot/compressed/mem_encrypt.S
>>
>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>> index 2c860ad..d2fe901 100644
>> --- a/arch/x86/boot/compressed/Makefile
>> +++ b/arch/x86/boot/compressed/Makefile
>> @@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>> $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
>> $(obj)/piggy.o $(obj)/cpuflags.o
>>
>> +vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o
>
> There's a
>
> ifdef CONFIG_X86_64
>
> a couple of lines below. Just put it there.

Will do.

>
> ...
>
>> +++ b/arch/x86/boot/compressed/mem_encrypt.S
>> @@ -0,0 +1,103 @@
>> +/*
>> + * AMD Memory Encryption Support
>> + *
>> + * Copyright (C) 2017 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Tom Lendacky <[email protected]>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +
>> +#include <asm/processor-flags.h>
>> +#include <asm/msr.h>
>> +#include <asm/asm-offsets.h>
>> +
>> + .text
>> + .code32
>> +ENTRY(get_sev_encryption_bit)
>> + xor %eax, %eax
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> + push %ebx
>> + push %ecx
>> + push %edx
>> +
>> + /* Check if running under a hypervisor */
>> + movl $1, %eax
>> + cpuid
>> + bt $31, %ecx /* Check the hypervisor bit */
>> + jnc .Lno_sev
>> +
>> + movl $0x80000000, %eax /* CPUID to check the highest leaf */
>> + cpuid
>> + cmpl $0x8000001f, %eax /* See if 0x8000001f is available */
>> + jb .Lno_sev
>> +
>> + /*
>> + * Check for the SEV feature:
>> + * CPUID Fn8000_001F[EAX] - Bit 1
>> + * CPUID Fn8000_001F[EBX] - Bits 5:0
>> + * Pagetable bit position used to indicate encryption
>> + */
>> + movl $0x8000001f, %eax
>> + cpuid
>> + bt $1, %eax /* Check if SEV is available */
>> + jnc .Lno_sev
>> +
>> + movl $MSR_F17H_SEV, %ecx /* Read the SEV MSR */
>> + rdmsr
>> + bt $MSR_F17H_SEV_ENABLED_BIT, %eax /* Check if SEV is active */
>> + jnc .Lno_sev
>> +
>> + /*
>> + * Get memory encryption information:
>> + */
>
> The side-comment is enough. This one above can go.

Done.

>
>> + movl %ebx, %eax
>> + andl $0x3f, %eax /* Return the encryption bit location */
>> + jmp .Lsev_exit
>> +
>> +.Lno_sev:
>> + xor %eax, %eax
>> +
>> +.Lsev_exit:
>> + pop %edx
>> + pop %ecx
>> + pop %ebx
>> +
>> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
>> +
>> + ret
>> +ENDPROC(get_sev_encryption_bit)
>> +
>> + .code64
>> +ENTRY(get_sev_encryption_mask)
>> + xor %rax, %rax
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> + push %rbp
>> + push %rdx
>> +
>> + movq %rsp, %rbp /* Save current stack pointer */
>> +
>> + call get_sev_encryption_bit /* Get the encryption bit position */
>
> So we get to call get_sev_encryption_bit() here again and noodle through
> the CPUID discovery and MSR access. We should instead cache that info
> and return the encryption bit directly on a second call. (And initialize
> it to -1 to denote that get_sev_encryption_bit() hasn't run yet).

Ok, I'll look into that optimization.

>
> ...
>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 9274ec7..9cb6472 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -19,6 +19,9 @@
>>
>> #include <asm/bootparam.h>
>>
>> +#define AMD_SME_FEATURE_BIT BIT(0)
>> +#define AMD_SEV_FEATURE_BIT BIT(1)
>
> s/_FEATURE//
>
> AMD_SME_BIT and AMD_SEV_BIT is good enough :)
>
> And frankly, if you're going to use them only below in sme_enable() - I
> need to check more thoroughly the remaining patches - but if you only
> are going to use them there, just define them inside the function so
> that they're close.

Sounds good. I believe that is the only place they are/will be used
so I'll make that change.

>
>> +
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>>
>> extern unsigned long sme_me_mask;
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e399d68..530020f 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -326,6 +326,9 @@
>>
>> /* Fam 17h MSRs */
>> #define MSR_F17H_IRPERF 0xc00000e9
>> +#define MSR_F17H_SEV 0xc0010131
>
> If that MSR is going to be used later on - and I don't see why not - you
> probably should make it an arch one: MSR_AMD64_SEV. Even if it isn't yet
> officially. :-)
>

Will do.

>> +#define MSR_F17H_SEV_ENABLED_BIT 0
>> +#define MSR_F17H_SEV_ENABLED BIT_ULL(MSR_F17H_SEV_ENABLED_BIT)
>>
>> /* Fam 16h MSRs */
>> #define MSR_F16H_L2I_PERF_CTL 0xc0010230
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index a965e5b..c202ba3 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -109,5 +109,4 @@ struct kvm_vcpu_pv_apf_data {
>> #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>> #define KVM_PV_EOI_DISABLED 0x0
>>
>> -
>> #endif /* _UAPI_ASM_X86_KVM_PARA_H */
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 5e5d460..ed8780e 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -288,7 +288,9 @@ void __init mem_encrypt_init(void)
>> if (sev_active())
>> dma_ops = &sme_dma_ops;
>>
>> - pr_info("AMD Secure Memory Encryption (SME) active\n");
>> + pr_info("AMD %s active\n",
>> + sev_active() ? "Secure Encrypted Virtualization (SEV)"
>> + : "Secure Memory Encryption (SME)");
>> }
>>
>> void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>> @@ -616,12 +618,23 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
>> {
>> const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
>> unsigned int eax, ebx, ecx, edx;
>> + unsigned long feature_mask;
>> bool active_by_default;
>> unsigned long me_mask;
>> char buffer[16];
>> u64 msr;
>>
>> - /* Check for the SME support leaf */
>> + /*
>> + * Set the feature mask (SME or SEV) based on whether we are
>> + * running under a hypervisor.
>> + */
>> + eax = 1;
>> + ecx = 0;
>> + native_cpuid(&eax, &ebx, &ecx, &edx);
>> + feature_mask = (ecx & BIT(31)) ? AMD_SEV_FEATURE_BIT
>> + : AMD_SME_FEATURE_BIT;
>
> Set that feature mask before using it...
>
>> +
>> + /* Check for the SME/SEV support leaf */
>
> ... because if that check exits due to no SME leaf, you're uselessly
> doing all the above.

Ok, I'll move that down after the leaf check.

>
>> eax = 0x80000000;
>> ecx = 0;
>> native_cpuid(&eax, &ebx, &ecx, &edx);
>> @@ -629,24 +642,39 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
>> return;
>>
>> /*
>> - * Check for the SME feature:
>> + * Check for the SME/SEV feature:
>> * CPUID Fn8000_001F[EAX] - Bit 0
>> * Secure Memory Encryption support
>> + * CPUID Fn8000_001F[EAX] - Bit 1
>
> No need to repeat the CPUID leaf here - only Bit 1:
>
> * CPUID Fn8000_001F[EAX]
> * - Bit 0: Secure Memory Encryption support
> * - Bit 1: Secure Encrypted Virtualization support
>

Ok, I'll clean that up.

Thanks,
Tom

>
>> + * Secure Encrypted Virtualization support
>> * CPUID Fn8000_001F[EBX] - Bits 5:0
>> * Pagetable bit position used to indicate encryption
>> */
>> eax = 0x8000001f;
>> ecx = 0;
>> native_cpuid(&eax, &ebx, &ecx, &edx);
>> - if (!(eax & 1))
>> + if (!(eax & feature_mask))
>> return;
>>
>> me_mask = 1UL << (ebx & 0x3f);
>>
>> - /* Check if SME is enabled */
>> - msr = __rdmsr(MSR_K8_SYSCFG);
>> - if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
>> + /* Check if memory encryption is enabled */
>> + if (feature_mask == AMD_SME_FEATURE_BIT) {
>> + /* For SME, check the SYSCFG MSR */
>> + msr = __rdmsr(MSR_K8_SYSCFG);
>> + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
>> + return;
>> + } else {
>> + /* For SEV, check the SEV MSR */
>> + msr = __rdmsr(MSR_F17H_SEV);
>> + if (!(msr & MSR_F17H_SEV_ENABLED))
>> + return;
>> +
>> + /* SEV state cannot be controlled by a command line option */
>> + sme_me_mask = me_mask;
>> + sev_enabled = 1;
>> return;
>> + }
>
> Nice. Two birds with one stone is always better. :)
>
>

2017-08-25 12:54:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

Btw,

I don't see our SEV-specific chicken bit which disables SEV only.

Do we need it? If so, maybe something like

mem_encrypt=sme_only

or so.

Or is the mem_encrypt=off chicken bit enough?

What about the use case where you want SME but no encrypted guests?

A bunch of hmmm.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-28 10:51:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

On Mon, Jul 24, 2017 at 02:07:55PM -0500, Brijesh Singh wrote:
> Some KVM-specific custom MSRs shares the guest physical address with

s/shares/share/

> hypervisor.

"the hypervisor."

> When SEV is active, the shared physical address must be mapped
> with encryption attribute cleared so that both hypervsior and guest can
> access the data.
>
> Add APIs to change memory encryption attribute in early boot code.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 17 ++++++
> arch/x86/mm/mem_encrypt.c | 117 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 134 insertions(+)

...

> +static int __init early_set_memory_enc_dec(resource_size_t paddr,
> + unsigned long size, bool enc)
> +{
> + unsigned long vaddr, vaddr_end, vaddr_next;
> + unsigned long psize, pmask;
> + int split_page_size_mask;
> + pte_t *kpte;
> + int level;
> +
> + vaddr = (unsigned long)__va(paddr);
> + vaddr_next = vaddr;
> + vaddr_end = vaddr + size;
> +
> + /*
> + * We are going to change the physical page attribute from C=1 to C=0
> + * or vice versa. Flush the caches to ensure that data is written into
> + * memory with correct C-bit before we change attribute.
> + */
> + clflush_cache_range(__va(paddr), size);
> +
> + for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> + kpte = lookup_address(vaddr, &level);
> + if (!kpte || pte_none(*kpte))
> + return 1;

Return before flushing TLBs? Perhaps you mean

ret = 1;
goto out;

here and out does

__flush_tlb_all();
return ret;

?
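I.e., in sketch form (only the control flow, not the full function):

	int ret = 0;

	clflush_cache_range(__va(paddr), size);

	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
		kpte = lookup_address(vaddr, &level);
		if (!kpte || pte_none(*kpte)) {
			ret = 1;
			goto out;	/* still flush what was changed */
		}

		/* ... change the C-bit for this mapping, set vaddr_next ... */
	}

out:
	__flush_tlb_all();
	return ret;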

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-28 11:49:31

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

Hi Boris,


On 8/28/17 5:51 AM, Borislav Petkov wrote:

[..]

>> +static int __init early_set_memory_enc_dec(resource_size_t paddr,
>> + unsigned long size, bool enc)
>> +{
>> + unsigned long vaddr, vaddr_end, vaddr_next;
>> + unsigned long psize, pmask;
>> + int split_page_size_mask;
>> + pte_t *kpte;
>> + int level;
>> +
>> + vaddr = (unsigned long)__va(paddr);
>> + vaddr_next = vaddr;
>> + vaddr_end = vaddr + size;
>> +
>> + /*
>> + * We are going to change the physical page attribute from C=1 to C=0
>> + * or vice versa. Flush the caches to ensure that data is written into
>> + * memory with correct C-bit before we change attribute.
>> + */
>> + clflush_cache_range(__va(paddr), size);
>> +
>> + for (; vaddr < vaddr_end; vaddr = vaddr_next) {
>> + kpte = lookup_address(vaddr, &level);
>> + if (!kpte || pte_none(*kpte))
>> + return 1;
> Return before flushing TLBs? Perhaps you mean
>
> ret = 1;
> goto out;
>
> here and out does
>
> __flush_tlb_all();
> return ret;

thanks, good catch. I will fix in next rev.

-Brijesh

2017-08-29 10:23:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:
> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU

MSRs

> variable at compile time and share its physical address with hypervisor.

That sentence needs changing - the MSRs don't allocate - for them gets
allocated.

> It presents a challege when SEV is active in guest OS, when SEV is active,
> the guest memory is encrypted with guest key hence hypervisor will not
> able to modify the guest memory. When SEV is active, we need to clear the
> encryption attribute (aka C-bit) of shared physical addresses so that both
> guest and hypervisor can access the data.

This whole paragraph needs rewriting.

> To solve this problem, I have tried these three options:
>
> 1) Convert the static per-CPU to dynamic per-CPU allocation and when SEV
> is detected clear the C-bit from the page table. But while doing so I
> found that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init
> was called.
>
> 2) Since the C-bit works on PAGE_SIZE hence add some extra padding to
> 'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime

"to make it PAGE_SIZE"?

I know what it means but it reads strange and needs more restraint when
rewriting it. :)

> clear the encryption attribute of the full PAGE. The downside of this -
> we need to modify structure which may break the compatibility.
>
> 3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
> used to hold the compile time shared per-CPU variables. When SEV is
> detected we map this section without C-bit.
>
> This patch implements #3.

From Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

Also, never say "This patch" in a commit message of a patch. It is
tautologically useless.

> It introduces a new DEFINE_PER_CPU_HV_SHAHRED

There's no DEFINE_PER_CPU_HV_SHAHRED. Typo.

> macro to create a compile time per-CPU variable. When SEV is detected we
> clear the C-bit from the shared per-CPU variable.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/kvm.c | 46 ++++++++++++++++++++++++++++++++++++---
> include/asm-generic/vmlinux.lds.h | 3 +++
> include/linux/percpu-defs.h | 12 ++++++++++
> 3 files changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 71c17a5..1f6fec8 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
>
> early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
>
> -static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> -static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) __aligned(64);
> static int has_steal_clock = 0;
>
> /*
> @@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
> cpu, (unsigned long long) slow_virt_to_phys(st));
> }
>
> -static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> +static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
>
> static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
> {
> @@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
> apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
> }
>
> +/* NOTE: function is marked as __ref because it is used by __init functions */

No need for that comment.

What should you look into is why do you need to call the early versions:

" * producing a warning (of course, no warning does not mean code is
* correct, so optimally document why the __ref is needed and why it's OK)."

And we do have the normal set_memory_decrypted() etc helpers so why
aren't we using those?

If you need to use the early ones too, then you probably need to
differentiate this in the callers by passing a "bool early", which calls
the proper flavor.
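For example, a small wrapper along these lines (name made up, just to show
the idea):

/* Pick the proper flavor depending on how early we are */
static int kvm_decrypt_percpu_area(void *ptr, bool early)
{
	unsigned long pa = slow_virt_to_phys(ptr);

	if (early)
		return early_set_memory_decrypted(pa, PAGE_SIZE);

	return set_memory_decrypted((unsigned long)ptr, 1);
}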

> +static int __ref kvm_map_hv_shared_decrypted(void)
> +{
> + static int once, ret;
> + int cpu;
> +
> + if (once)
> + return ret;

So this function gets called per-CPU but you need to do this ugly "once"
thing - i.e., global function called in a per-CPU context.

Why can't you do that mapping only on the current CPU and then
when that function is called on the next CPU, it will do the same thing
on that next CPU?

> + /*
> + * Iterate through all possible CPU's and clear the C-bit from
> + * percpu variables.
> + */
> + for_each_possible_cpu(cpu) {
> + struct kvm_vcpu_pv_apf_data *apf;
> + unsigned long pa;
> +
> + apf = &per_cpu(apf_reason, cpu);
> + pa = slow_virt_to_phys(apf);
> + sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
> + ret = early_set_memory_decrypted(pa, PAGE_SIZE);
> + if (ret)
> + break;
> + }
> +
> + once = 1;
> + return ret;
> +}
> +
> static void kvm_guest_cpu_init(void)
> {
> if (!kvm_para_available())
> return;
>
> + /*
> + * When SEV is active, map the shared percpu as unencrypted so that

... map the shared percpu area unencrypted...

> + * both guest and hypervsior can access the data.
> + */
> + if (sev_active()) {
> + if (kvm_map_hv_shared_decrypted()) {
> + printk(KERN_ERR "Failed to map percpu as unencrypted\n");
> + return;
> + }
> + }
> +
> if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
> u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index da0be9a..52854cf 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -783,6 +783,9 @@
> . = ALIGN(cacheline); \
> *(.data..percpu) \
> *(.data..percpu..shared_aligned) \
> + . = ALIGN(PAGE_SIZE); \
> + *(.data..percpu..hv_shared) \
> + . = ALIGN(PAGE_SIZE); \
> VMLINUX_SYMBOL(__per_cpu_end) = .;

Yeah, no, you can't do that. That's adding this section unconditionally
on *every* arch. You need to do some ifdeffery like it is done at the
beginning of that file and have this only on the arch which supports SEV.
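I.e., something in the spirit of the existing conditionals at the beginning
of that file (macro name made up), which then gets referenced from
PERCPU_INPUT() instead of the unconditional lines:

#ifdef CONFIG_AMD_MEM_ENCRYPT
#define PERCPU_HV_SHARED_SECTION					\
	. = ALIGN(PAGE_SIZE);						\
	*(.data..percpu..hv_shared)					\
	. = ALIGN(PAGE_SIZE);
#else
#define PERCPU_HV_SHARED_SECTION
#endif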

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-30 16:19:01

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

Hi Boris,

On 08/29/2017 05:22 AM, Borislav Petkov wrote:

[...]

> On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:
>> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU
>
> MSRs
>
>> variable at compile time and share its physical address with hypervisor.
>
> That sentence needs changing - the MSRs don't allocate - for them gets
> allocated.
>
>> It presents a challege when SEV is active in guest OS, when SEV is active,
>> the guest memory is encrypted with guest key hence hypervisor will not
>> able to modify the guest memory. When SEV is active, we need to clear the
>> encryption attribute (aka C-bit) of shared physical addresses so that both
>> guest and hypervisor can access the data.
>
> This whole paragraph needs rewriting.
>

I will improve the commit message in next rev.

[...]

>> +/* NOTE: function is marked as __ref because it is used by __init functions */
>
> No need for that comment.
>
> What should you look into is why do you need to call the early versions:
>
> " * producing a warning (of course, no warning does not mean code is
> * correct, so optimally document why the __ref is needed and why it's OK)."
>
> And we do have the normal set_memory_decrypted() etc helpers so why
> aren't we using those?
>

Since kvm_guest_init() is called early in the boot process, we will not be
able to use the set_memory_decrypted() function. IIRC, if we try calling
set_memory_decrypted() early then we will hit a BUG_ON [1] -- mainly when it
tries to flush the caches.

[1] http://elixir.free-electrons.com/linux/latest/source/arch/x86/mm/pageattr.c#L167



> If you need to use the early ones too, then you probably need to
> differentiate this in the callers by passing a "bool early", which calls
> the proper flavor.
>

Sure, I can rearrange the code to make it more readable and use a "bool early"
parameter to differentiate it.


>> +static int __ref kvm_map_hv_shared_decrypted(void)
>> +{
>> + static int once, ret;
>> + int cpu;
>> +
>> + if (once)
>> + return ret;
>
> So this function gets called per-CPU but you need to do this ugly "once"
> thing - i.e., global function called in a per-CPU context.
>
> Why can't you do that mapping only on the current CPU and then
> when that function is called on the next CPU, it will do the same thing
> on that next CPU?
>


Yes, it can be done but I remember running into issues during CPU hotplug.
The patch uses early_set_memory_decrypted() -- which calls
kernel_physical_mapping_init() to split the large pages into smaller ones.
IIRC, that API did not work after the system had successfully booted. After
the system is booted we must use set_memory_decrypted().

I was trying to avoid mixing early and no-early set_memory_decrypted() but if
feedback is: use early_set_memory_decrypted() only if its required otherwise
use set_memory_decrypted() then I can improve the logic in next rev. thanks


[...]

>> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>> index da0be9a..52854cf 100644
>> --- a/include/asm-generic/vmlinux.lds.h
>> +++ b/include/asm-generic/vmlinux.lds.h
>> @@ -783,6 +783,9 @@
>> . = ALIGN(cacheline); \
>> *(.data..percpu) \
>> *(.data..percpu..shared_aligned) \
>> + . = ALIGN(PAGE_SIZE); \
>> + *(.data..percpu..hv_shared) \
>> + . = ALIGN(PAGE_SIZE); \
>> VMLINUX_SYMBOL(__per_cpu_end) = .;
>
> Yeah, no, you can't do that. That's adding this section unconditionally
> on *every* arch. You need to do some ifdeffery like it is done at the
> beginning of that file and have this only on the arch which supports SEV.
>


Will do . thanks

-Brijesh

2017-08-30 17:47:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
> I was trying to avoid mixing early and no-early set_memory_decrypted() but if
> feedback is: use early_set_memory_decrypted() only if its required otherwise
> use set_memory_decrypted() then I can improve the logic in next rev. thanks

Yes, I think you should use the early versions when you're, well,
*early* :-) But get rid of that for_each_possible_cpu() and do it only
on the current CPU, as this is a per-CPU path anyway. If you need to
do it on *every* CPU and very early, then you need a separate function
which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
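Roughly (a sketch; kvm_map_hv_shared_decrypted() being the routine from
this patch):

static void __init kvm_smp_prepare_boot_cpu(void)
{
	/* Map the shared per-CPU areas decrypted once, before any APs run */
	kvm_map_hv_shared_decrypted();

	kvm_guest_cpu_init();
	native_smp_prepare_boot_cpu();
}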

Thx.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-08-31 21:31:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 17/17] X86/KVM: Clear encryption attribute when SEV is active

On Mon, Jul 24, 2017 at 02:07:57PM -0500, Brijesh Singh wrote:
> The guest physical memory area holding the struct pvclock_wall_clock and
> struct pvclock_vcpu_time_info are shared with the hypervisor. Hypervisor
> periodically updates the contents of the memory. When SEV is active, we
> must clear the encryption attributes from the shared memory pages so that
> both hypervisor and guest can access the data.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/entry/vdso/vma.c | 5 ++--
> arch/x86/kernel/kvmclock.c | 64 +++++++++++++++++++++++++++++++++++++++-------
> 2 files changed, 58 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 726355c..ff50251 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm,
> struct pvclock_vsyscall_time_info *pvti =
> pvclock_pvti_cpu0_va();
> if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
> - ret = vm_insert_pfn(
> + ret = vm_insert_pfn_prot(
> vma,
> vmf->address,
> - __pa(pvti) >> PAGE_SHIFT);
> + __pa(pvti) >> PAGE_SHIFT,
> + pgprot_decrypted(vma->vm_page_prot));
> }
> } else if (sym_offset == image->sym_hvclock_page) {
> struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();

Yuck, that vvar_fault() function is one unreadable mess.

> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index d889676..f3a8101 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -27,6 +27,7 @@
> #include <linux/sched.h>
> #include <linux/sched/clock.h>
>
> +#include <asm/mem_encrypt.h>
> #include <asm/x86_init.h>
> #include <asm/reboot.h>
> #include <asm/kvmclock.h>
> @@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
>
> /* The hypervisor will put information about time periodically here */
> static struct pvclock_vsyscall_time_info *hv_clock;
> -static struct pvclock_wall_clock wall_clock;
> +static struct pvclock_wall_clock *wall_clock;
>
> struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> {
> @@ -64,15 +65,18 @@ static void kvm_get_wallclock(struct timespec *now)
> int low, high;
> int cpu;
>
> - low = (int)__pa_symbol(&wall_clock);
> - high = ((u64)__pa_symbol(&wall_clock) >> 32);
> + if (!wall_clock)
> + return;

Hmm, so if you return here, @now will remain unchanged so how is the
caller to know that ->get_wallclock() failed?

Maybe a WARN_ON_ONCE() at least...?
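I.e., something like (illustrative only):

	/* Make the failure visible instead of silently leaving *now stale */
	if (WARN_ON_ONCE(!wall_clock))
		return;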

Dunno, what's the policy in kvm if the kvmclock init fails?

Paolo? Radim?

Because it does say:

printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);

too early. We can error out later and users will still think it is using
kvmclock ...

Hmmm.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-09-01 22:52:24

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

Hi Boris,

On 08/30/2017 12:46 PM, Borislav Petkov wrote:
> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>> I was trying to avoid mixing early and no-early set_memory_decrypted() but if
>> feedback is: use early_set_memory_decrypted() only if its required otherwise
>> use set_memory_decrypted() then I can improve the logic in next rev. thanks
>
> Yes, I think you should use the early versions when you're, well,
> *early* :-) But get rid of that for_each_possible_cpu() and do it only
> on the current CPU, as this is a per-CPU path anyway. If you need to
> do it on *every* CPU and very early, then you need a separate function
> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>

I am trying to implement your feedback and now remember why I chose to
use early_set_memory_decrypted() and a for_each_possible_cpu() loop. These
percpu variables are static. Hence before clearing the C-bit we must
perform the in-place decryption so that the original assignment is preserved
after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
-- which can be used to perform the in-place decryption but we do not have
a similar routine for the non-early case. In order to address your feedback,
we would have to add similar functions. So far, we have not seen the need
for such functions except in this case. The approach we have right now
works just fine and I am not sure if it's worth adding new functions.

Thoughts ?

[1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of memory


-Brijesh

2017-09-02 03:22:17

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh <[email protected]> wrote:
> Hi Boris,
>
> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>
>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>
>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>> but if
>>> feedback is: use early_set_memory_decrypted() only if its required
>>> otherwise
>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>> thanks
>>
>>
>> Yes, I think you should use the early versions when you're, well,
>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>> on the current CPU, as this is a per-CPU path anyway. If you need to
>> do it on *every* CPU and very early, then you need a separate function
>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>
>
> I am trying to implement your feedback and now remember why I chose to
> use early_set_memory_decrypted() and a for_each_possible_cpu() loop. These
> percpu variables are static. Hence before clearing the C-bit we must
> perform the in-place decryption so that the original assignment is preserved
> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
> -- which can be used to perform the in-place decryption but we do not have
> a similar routine for the non-early case. In order to address your feedback,
> we would have to add similar functions. So far, we have not seen the need
> for such functions except in this case. The approach we have right now
> works just fine and I am not sure if it's worth adding new functions.
>
> Thoughts ?
>
> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
> memory

Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED? ISTM the "HV
shared" bit is incidental.

2017-09-03 02:35:04

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables



On 9/1/17 10:21 PM, Andy Lutomirski wrote:
> On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh <[email protected]> wrote:
>> Hi Boris,
>>
>> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>>> but if
>>>> feedback is: use early_set_memory_decrypted() only if its required
>>>> otherwise
>>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>>> thanks
>>>
>>> Yes, I think you should use the early versions when you're, well,
>>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>>> on the current CPU, as this is a per-CPU path anyway. If you need to
>>> do it on *every* CPU and very early, then you need a separate function
>>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>>
>> I am trying to implement your feedback and now remember why I chose to
>> use early_set_memory_decrypted() and a for_each_possible_cpu() loop. These
>> percpu variables are static. Hence before clearing the C-bit we must
>> perform the in-place decryption so that the original assignment is preserved
>> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
>> -- which can be used to perform the in-place decryption but we do not have
>> a similar routine for the non-early case. In order to address your feedback,
>> we would have to add similar functions. So far, we have not seen the need
>> for such functions except in this case. The approach we have right now
>> works just fine and I am not sure if it's worth adding new functions.
>>
>> Thoughts ?
>>
>> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
>> memory
> Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED? ISTM the "HV
> shared" bit is incidental.

Thanks for the suggestion, we could call it DEFINE_PER_CPU_UNENCRYPTED.
I will use it in next rev.

-Brijesh

2017-09-04 17:05:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
> So far, we have not seen the need for such functions except in this
> case. The approach we have right now works just fine and I am not
> sure if it's worth adding new functions.

Then put the call to kvm_map_hv_shared_decrypted() into
kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
stuff only once during guest init.

Now you're doing additional jumping-through-hoops with that once static
var just so you can force something which needs to execute only once but
gets called in a per-CPU path.

See what I mean?

> Thoughts ?
>
> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of memory

Add

[core]
abbrev = 12

to the core section of your .gitconfig.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-09-04 17:47:32

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables


On 9/4/17 12:05 PM, Borislav Petkov wrote:
> On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
>> So far, we have not seen the need for having such functions except
>> this cases. The approach we have right now works just fine and not
>> sure if its worth adding new functions.
> Then put the call to kvm_map_hv_shared_decrypted() into
> kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
> stuff only once during guest init.
>
> Now you're doing additional jumping-through-hoops with that once static
> var just so you can force something which needs to execute only once but
> gets called in a per-CPU path.
>
> See what I mean?

Yes, I see your point. I will address this issue in next rev.


-Brijesh

2017-09-15 12:24:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

On Tue, Aug 22, 2017 at 06:52:48PM +0200, Borislav Petkov wrote:
> As always, the devil is in the detail.

Ok, actually we can make this much simpler by using a static key. A
conceptual patch below - I only need to fix that crazy include hell I'm
stepping into with this.

In any case, we were talking about having a static branch already so
this fits the whole strategy.

---
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index d174b1c4a99e..e45369158632 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -45,6 +45,8 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
unsigned int sev_enabled __section(.data) = 0;
EXPORT_SYMBOL_GPL(sev_enabled);

+DEFINE_STATIC_KEY_FALSE(__sev);
+
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);

@@ -790,6 +792,7 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
/* SEV state cannot be controlled by a command line option */
sme_me_mask = me_mask;
sev_enabled = 1;
+ static_branch_enable(&__sev);
return;
}

diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index ea0831a8dbe2..f3ab965a3d6a 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -13,6 +13,8 @@
#ifndef __MEM_ENCRYPT_H__
#define __MEM_ENCRYPT_H__

+#include <linux/jump_label.h>
+
#ifndef __ASSEMBLY__

#ifdef CONFIG_ARCH_HAS_MEM_ENCRYPT
@@ -26,6 +28,8 @@

#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */

+extern struct static_key_false __sev;
+
static inline bool sme_active(void)
{
return (sme_me_mask && !sev_enabled);
@@ -33,7 +37,7 @@ static inline bool sme_active(void)

static inline bool sev_active(void)
{
- return (sme_me_mask && sev_enabled);
+ return static_branch_unlikely(&__sev);
}

static inline unsigned long sme_get_me_mask(void)

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-09-15 14:13:11

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active



On 09/15/2017 07:24 AM, Borislav Petkov wrote:
> On Tue, Aug 22, 2017 at 06:52:48PM +0200, Borislav Petkov wrote:
>> As always, the devil is in the detail.
>
> Ok, actually we can make this much simpler by using a static key. A
> conceptual patch below - I only need to fix that crazy include hell I'm
> stepping into with this.
>
> In any case, we were talking about having a static branch already so
> this fits the whole strategy.
>

thanks for the suggestion Boris, it will make patch much simpler.
I will try this out.

-Brijesh

2017-09-15 14:40:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

On Fri, Sep 15, 2017 at 09:13:00AM -0500, Brijesh Singh wrote:
> thanks for the suggestion Boris, it will make patch much simpler.
> I will try this out.

It won't build - this was supposed to show the general idea.

I need to figure out the include hell first.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-09-15 14:49:09

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active



On 09/15/2017 09:40 AM, Borislav Petkov wrote:
> I need to figure out the include hell first.

I am working with a slightly newer patch set -- in that patch Tom has
moved the sev_active() definition into arch/x86/mm/mem_encrypt.c and I
have no issue using your recommended approach (since I no longer need the
include path changes).

But in my quick run I did find a runtime issue; it seems enabling the static
key in sme_enable() is too early. The guest reboots as soon as it tries to
enable the key.

I see a similar issue with a non-SEV guest with my simple patch below.
The guest will reboot as soon as it tries to enable the key.

--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,8 @@ pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);

#define __head __section(.head.text)

+DEFINE_STATIC_KEY_FALSE(__testme);
+
static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
{
return ptr - (void *)_text + (void *)physaddr;
@@ -71,6 +73,8 @@ unsigned long __head __startup_64(unsigned long physaddr,
if (load_delta & ~PMD_PAGE_MASK)
for (;;);

+ static_branch_enable(&__testme);
+
/* Activate Secure Memory Encryption (SME) if supported and enabled */
sme_enable(bp);

2017-09-15 16:23:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

On Fri, Sep 15, 2017 at 09:48:53AM -0500, Brijesh Singh wrote:
> I see a similar issue with a non-SEV guest with my simple patch below.
> The guest will reboot as soon as it tries to enable the key.

Can't do it there as the pagetable is not setup yet and you're probably
getting a #PF on any of the derefs down the static_key_enable() path.

I guess one possible place to enable the static key would be in
mem_encrypt_init() where everything should be set up already.
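Conceptually something like this, on top of the earlier static-key diff
(sketch only, the existing dma_ops/pr_info bits omitted):

void __init mem_encrypt_init(void)
{
	if (!sme_me_mask)
		return;

	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
	swiotlb_update_mem_attributes();

	/*
	 * Pagetables are set up by now, so patching the jump label
	 * sites is safe here.
	 */
	if (sev_enabled)
		static_branch_enable(&__sev);
}

(with the key being flipped before anything in that function tests
sev_active(), of course.)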

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-09-15 16:28:05

by Brijesh Singh

[permalink] [raw]
Subject: Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active



On 09/15/2017 11:22 AM, Borislav Petkov wrote:
> mem_encrypt_init() where everything should be set up already.


Yep, it's safe to deref the static key in mem_encrypt_init(). I've
tried the approach and it seems to work fine. I will include the
required changes in next rev. thanks