The following commit:

  368a540e0232 ("x86/kvmclock: Remove memblock dependency")
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=368a540e0232ad446931f5a4e8a5e06f69f21343

introduced an SEV guest regression.
The guest physical addresses holding wall_clock and hv_clock_boot are
shared with the hypervisor and must be mapped with C=0 when SEV is
active. To clear the C-bit we use kernel_physical_mapping_init() to
split the large pages. The above commit moved the kvmclock initialization
very early, and kernel_physical_mapping_init() fails to allocate memory
while splitting the large pages.
To solve it, we add a special .bss..decrypted section which can be used
to hold the shared variables. Early boot code maps this section with
C=0. The section is PMD-aligned and PMD-sized to avoid the need to split
the pages. Callers can use the __bss_decrypted attribute to place
variables in the .bss..decrypted section.
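For example, a one-line usage sketch (this mirrors what patch 2 does for
wall_clock; it is not additional code from the series):

  static struct pvclock_wall_clock wall_clock __bss_decrypted;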
Cc: Tom Lendacky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
NOTE: Since there were some design changes based on Thomas Gleixner's suggestions,
I dropped the Acked-by and Reviewed-by tags from Tom, Paolo and Boris.
Changes since v7:
- As per Thomas Gleixner's suggestion, move the decrypted data into the .bss..decrypted section
- remove the second auxiliary array from the SEV case
- allocate memory for the pvclock data pointer early and map it with C=0 for the SEV case
Changes since v6:
- improve commit messages
- rename free_decrypted_mem -> mem_encrypt_free_decrypted_mem
Changes since v5:
- rename hvclock_boot_dec -> hvclock_boot_aux.
- rename section from .data..decrypted_aux -> .data..decrypted.aux.
- use NR_CPUS instead of an arbitrary number of elements in the hv_clock_aux variable.
Changes since v4:
- define a few static pages in .data..decrypted which can be used
for cpus > HVC_BOOT_ARRAY_SIZE when SEV is active.
Changes since v3:
- commit message improvements (based on Sean's feedback)
Changes since v2:
- commit message and code comment improvements (based on Boris feedback)
- move the sme_populate_pgd fixes into a new patch.
- drop stable Cc - will submit to stable after patch is upstreamed.
Changes since v1:
- move the logic to re-arrange the mapping into a new patch
- move the definition of __start_data_* in mem_encrypt.h
- map the workarea buffer as encrypted when SEV is enabled
- enhance sme_populate_pgd to update the pte/pmd flags when a mapping exists
Brijesh Singh (2):
x86/mm: add .bss..decrypted section to hold shared variables
x86/kvm: use __bss_decrypted attribute in shared variables
arch/x86/include/asm/mem_encrypt.h | 7 +++++++
arch/x86/kernel/head64.c | 16 +++++++++++++++
arch/x86/kernel/kvmclock.c | 42 +++++++++++++++++++++++++++++++++++---
arch/x86/kernel/vmlinux.lds.S | 19 +++++++++++++++++
arch/x86/mm/init.c | 4 ++++
arch/x86/mm/mem_encrypt.c | 10 +++++++++
6 files changed, 95 insertions(+), 3 deletions(-)
--
2.7.4
kvmclock defines a few static variables which are shared with the
hypervisor during the kvmclock initialization.
When SEV is active, memory is encrypted with a guest-specific key, and
if the guest OS wants to share the memory region with the hypervisor
then it must clear the C-bit before sharing it.
Currently, we use kernel_physical_mapping_init() to split large pages
before clearing the C-bit on shared pages. But it fails when called from
the kvmclock initialization (mainly because the memblock allocator is
not ready that early during boot).
Add a __bss_decrypted attribute which can be used when defining such
shared variables. The so-defined variables will be placed in the
.bss..decrypted section. This section will be mapped with C=0 early
during boot.
The .bss..decrypted section occupies a sizable chunk of memory that is
unused when memory encryption is not active; free it in that case.
Signed-off-by: Brijesh Singh <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 7 +++++++
arch/x86/kernel/head64.c | 16 ++++++++++++++++
arch/x86/kernel/vmlinux.lds.S | 19 +++++++++++++++++++
arch/x86/mm/init.c | 4 ++++
arch/x86/mm/mem_encrypt.c | 10 ++++++++++
5 files changed, 56 insertions(+)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index c064383..616f8e6 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -48,10 +48,13 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void);
+void __init mem_encrypt_free_decrypted_mem(void);
bool sme_active(void);
bool sev_active(void);
+#define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
+
#else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0ULL
@@ -77,6 +80,8 @@ early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0;
static inline int __init
early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
+#define __bss_decrypted
+
#endif /* CONFIG_AMD_MEM_ENCRYPT */
/*
@@ -88,6 +93,8 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
#define __sme_pa(x) (__pa(x) | sme_me_mask)
#define __sme_pa_nodebug(x) (__pa_nodebug(x) | sme_me_mask)
+extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
+
#endif /* __ASSEMBLY__ */
#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8047379..c16af27 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -112,6 +112,7 @@ static bool __head check_la57_support(unsigned long physaddr)
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
{
+ unsigned long vaddr, vaddr_end;
unsigned long load_delta, *p;
unsigned long pgtable_flags;
pgdval_t *pgd;
@@ -235,6 +236,21 @@ unsigned long __head __startup_64(unsigned long physaddr,
sme_encrypt_kernel(bp);
/*
+ * Clear the memory encryption mask from the .bss..decrypted section.
+ * The bss section will be memset to zero later in the initialization so
+ * there is no need to zero it after changing the memory encryption
+ * attribute.
+ */
+ if (mem_encrypt_active()) {
+ vaddr = (unsigned long)__start_bss_decrypted;
+ vaddr_end = (unsigned long)__end_bss_decrypted;
+ for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ i = pmd_index(vaddr);
+ pmd[i] -= sme_get_me_mask();
+ }
+ }
+
+ /*
* Return the SME encryption mask (if SME is active) to be used as a
* modifier for the initial pgdir entry programmed into CR3.
*/
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9c77d2d..0d618ee 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -65,6 +65,23 @@ jiffies_64 = jiffies;
#define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
#define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
+/*
+ * This section contains data which will be mapped as decrypted. Memory
+ * encryption operates on a page basis. Make this section PMD-aligned
+ * to avoid splitting the pages while mapping the section early.
+ *
+ * Note: We use a separate section so that only this section gets
+ * decrypted to avoid exposing more than we wish.
+ */
+#define BSS_DECRYPTED \
+ . = ALIGN(PMD_SIZE); \
+ __start_bss_decrypted = .; \
+ *(.bss..decrypted); \
+ . = ALIGN(PAGE_SIZE); \
+ __start_bss_decrypted_unused = .; \
+ . = ALIGN(PMD_SIZE); \
+ __end_bss_decrypted = .; \
+
#else
#define X86_ALIGN_RODATA_BEGIN
@@ -74,6 +91,7 @@ jiffies_64 = jiffies;
#define ALIGN_ENTRY_TEXT_BEGIN
#define ALIGN_ENTRY_TEXT_END
+#define BSS_DECRYPTED
#endif
@@ -345,6 +363,7 @@ SECTIONS
__bss_start = .;
*(.bss..page_aligned)
*(.bss)
+ BSS_DECRYPTED
. = ALIGN(PAGE_SIZE);
__bss_stop = .;
}
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 7a8fc26..faca978 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -815,10 +815,14 @@ void free_kernel_image_pages(void *begin, void *end)
set_memory_np_noalias(begin_ul, len_pages);
}
+void __weak mem_encrypt_free_decrypted_mem(void) { }
+
void __ref free_initmem(void)
{
e820__reallocate_tables();
+ mem_encrypt_free_decrypted_mem();
+
free_kernel_image_pages(&__init_begin, &__init_end);
}
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b2de398..718acdf 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -348,6 +348,16 @@ bool sev_active(void)
EXPORT_SYMBOL(sev_active);
/* Architecture __weak replacement functions */
+void __init mem_encrypt_free_decrypted_mem(void)
+{
+ if (mem_encrypt_active())
+ return;
+
+ free_init_pages("unused decrypted",
+ (unsigned long)__start_bss_decrypted_unused,
+ (unsigned long)__end_bss_decrypted);
+}
+
void __init mem_encrypt_init(void)
{
if (!sme_me_mask)
--
2.7.4
The recent removal of the memblock dependency from kvmclock caused a SEV
guest regression because the wall_clock and hv_clock_boot variables are
no longer mapped decrypted when SEV is active.
Use the __bss_decrypted attribute to put the static wall_clock and
hv_clock_boot in the .bss..decrypted section so that they are mapped
decrypted during boot.
During the preparatory stage of CPU hotplug, the per-cpu pvclock data
pointer is assigned either an element of the static array or dynamically
allocated memory. The static array is now mapped decrypted, but the
dynamically allocated memory is not. When SEV is active, this memory
range must also be mapped decrypted.
Add a function which is called after the page allocator is up, and have
it allocate memory for the pvclock data for all possible CPUs. Map this
memory range as decrypted when SEV is active.
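For scale (assuming the usual x86-64 values of PAGE_SIZE = 4096 and a 64-byte,
cache-line-aligned struct pvclock_vsyscall_time_info): HVC_BOOT_ARRAY_SIZE =
4096 / 64 = 64, so the static page covers the first 64 CPUs and the new
allocation has to hold num_possible_cpus() - 64 entries.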
Signed-off-by: Brijesh Singh <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency")
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
---
arch/x86/kernel/kvmclock.c | 42 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 39 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index a36b93a..84f29f1 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -28,6 +28,7 @@
#include <linux/sched/clock.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <linux/set_memory.h>
#include <asm/hypervisor.h>
#include <asm/mem_encrypt.h>
@@ -61,9 +62,10 @@ early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
(PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
- hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);
-static struct pvclock_wall_clock wall_clock;
+ hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static struct pvclock_wall_clock wall_clock __bss_decrypted;
static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
+static struct pvclock_vsyscall_time_info *hvclock_mem;
static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
{
@@ -236,6 +238,35 @@ static void kvm_shutdown(void)
native_machine_shutdown();
}
+static void __init kvmclock_init_mem(void)
+{
+ unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
+ unsigned int order = get_order(ncpus * sizeof(*hvclock_mem));
+ struct page *p;
+ int r;
+
+ p = alloc_pages(GFP_KERNEL, order);
+ if (p) {
+ hvclock_mem = page_address(p);
+
+ /*
+ * hvclock is shared between the guest and the hypervisor, must
+ * be mapped decrypted.
+ */
+ if (sev_active()) {
+ r = set_memory_decrypted((unsigned long) hvclock_mem,
+ 1UL << order);
+ if (r) {
+ __free_pages(p, order);
+ hvclock_mem = NULL;
+ return;
+ }
+ }
+
+ memset(hvclock_mem, 0, PAGE_SIZE << order);
+ }
+}
+
static int __init kvm_setup_vsyscall_timeinfo(void)
{
#ifdef CONFIG_X86_64
@@ -250,6 +281,9 @@ static int __init kvm_setup_vsyscall_timeinfo(void)
kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
#endif
+
+ kvmclock_init_mem();
+
return 0;
}
early_initcall(kvm_setup_vsyscall_timeinfo);
@@ -269,8 +303,10 @@ static int kvmclock_setup_percpu(unsigned int cpu)
/* Use the static page for the first CPUs, allocate otherwise */
if (cpu < HVC_BOOT_ARRAY_SIZE)
p = &hv_clock_boot[cpu];
+ else if (hvclock_mem)
+ p = hvclock_mem + cpu - HVC_BOOT_ARRAY_SIZE;
else
- p = kzalloc(sizeof(*p), GFP_KERNEL);
+ return -ENOMEM;
per_cpu(hv_clock_per_cpu, cpu) = p;
return p ? 0 : -ENOMEM;
--
2.7.4
On Thu, 13 Sep 2018, Brijesh Singh wrote:
>
> +void __weak mem_encrypt_free_decrypted_mem(void) { }
> +
> void __ref free_initmem(void)
> {
> e820__reallocate_tables();
>
> + mem_encrypt_free_decrypted_mem();
> +
> free_kernel_image_pages(&__init_begin, &__init_end);
> }
>
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index b2de398..718acdf 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -348,6 +348,16 @@ bool sev_active(void)
> EXPORT_SYMBOL(sev_active);
>
> /* Architecture __weak replacement functions */
> +void __init mem_encrypt_free_decrypted_mem(void)
> +{
> + if (mem_encrypt_active())
> + return;
Why?
> +
> + free_init_pages("unused decrypted",
> + (unsigned long)__start_bss_decrypted_unused,
> + (unsigned long)__end_bss_decrypted);
Everything _AFTER_ __start_bss_decrypted_unused _IS_ unused and can be
freed. No conditional nothing. It's unused in any case.
Thanks,
tglx
On Thu, 13 Sep 2018, Brijesh Singh wrote:
>
> +static void __init kvmclock_init_mem(void)
> +{
> + unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
> + unsigned int order = get_order(ncpus * sizeof(*hvclock_mem));
> + struct page *p;
> + int r;
> +
> + p = alloc_pages(GFP_KERNEL, order);
> + if (p) {
If you make this:
if (!p) {
pr_warn();
return;
}
then you spare one indentation level and give useful information.
> + hvclock_mem = page_address(p);
> +
> + /*
> + * hvclock is shared between the guest and the hypervisor, must
> + * be mapped decrypted.
> + */
> + if (sev_active()) {
> + r = set_memory_decrypted((unsigned long) hvclock_mem,
> + 1UL << order);
> + if (r) {
> + __free_pages(p, order);
> + hvclock_mem = NULL;
This wants a pr_warn() as well, please.
> + return;
> + }
> + }
> +
> + memset(hvclock_mem, 0, PAGE_SIZE << order);
> + }
> +}
Other than that, this looks really reasonable and way more palatable in the
rc stage.
Thanks,
tglx
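For reference, a sketch of kvmclock_init_mem() restructured along the lines
suggested above. The warning strings and the early bail-out for configurations
where the static boot array already covers all possible CPUs (which also avoids
the -Woverflow warning the 0-day robot reports below for !SMP configs) are
illustrative additions, not part of the posted patch:

static void __init kvmclock_init_mem(void)
{
	unsigned int ncpus, order;
	struct page *p;
	int r;

	/* Nothing to do if the static boot array covers all possible CPUs. */
	if (HVC_BOOT_ARRAY_SIZE >= num_possible_cpus())
		return;

	ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
	order = get_order(ncpus * sizeof(*hvclock_mem));

	p = alloc_pages(GFP_KERNEL, order);
	if (!p) {
		pr_warn("%s: failed to alloc %d pages\n", __func__, (1 << order));
		return;
	}

	hvclock_mem = page_address(p);

	/*
	 * hvclock is shared between the guest and the hypervisor, must
	 * be mapped decrypted.
	 */
	if (sev_active()) {
		r = set_memory_decrypted((unsigned long) hvclock_mem,
					 1UL << order);
		if (r) {
			pr_warn("kvmclock: set_memory_decrypted() failed\n");
			__free_pages(p, order);
			hvclock_mem = NULL;
			return;
		}
	}

	memset(hvclock_mem, 0, PAGE_SIZE << order);
}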
Hi Brijesh,
I love your patch! Perhaps something to improve:
[auto build test WARNING on tip/x86/core]
[also build test WARNING on v4.19-rc3 next-20180913]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Brijesh-Singh/x86-mm-add-bss-decrypted-section-to-hold-shared-variables/20180914-080110
config: x86_64-randconfig-x019-201836 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from arch/x86/include/asm/cpumask.h:5:0,
from arch/x86/include/asm/msr.h:11,
from arch/x86/include/asm/processor.h:21,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/clocksource.h:13,
from arch/x86/kernel/kvmclock.c:19:
arch/x86/kernel/kvmclock.c: In function 'kvmclock_init_mem':
>> include/linux/cpumask.h:109:29: warning: large integer implicitly truncated to unsigned type [-Woverflow]
#define num_possible_cpus() 1U
^
>> arch/x86/kernel/kvmclock.c:243:23: note: in expansion of macro 'num_possible_cpus'
unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
^~~~~~~~~~~~~~~~~
--
In file included from arch/x86/include/asm/cpumask.h:5:0,
from arch/x86/include/asm/msr.h:11,
from arch/x86/include/asm/processor.h:21,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/clocksource.h:13,
from arch/x86//kernel/kvmclock.c:19:
arch/x86//kernel/kvmclock.c: In function 'kvmclock_init_mem':
>> include/linux/cpumask.h:109:29: warning: large integer implicitly truncated to unsigned type [-Woverflow]
#define num_possible_cpus() 1U
^
arch/x86//kernel/kvmclock.c:243:23: note: in expansion of macro 'num_possible_cpus'
unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
^~~~~~~~~~~~~~~~~
vim +/num_possible_cpus +243 arch/x86/kernel/kvmclock.c
240
241 static void __init kvmclock_init_mem(void)
242 {
> 243 unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
244 unsigned int order = get_order(ncpus * sizeof(*hvclock_mem));
245 struct page *p;
246 int r;
247
248 p = alloc_pages(GFP_KERNEL, order);
249 if (p) {
250 hvclock_mem = page_address(p);
251
252 /*
253 * hvclock is shared between the guest and the hypervisor, must
254 * be mapped decrypted.
255 */
256 if (sev_active()) {
257 r = set_memory_decrypted((unsigned long) hvclock_mem,
258 1UL << order);
259 if (r) {
260 __free_pages(p, order);
261 hvclock_mem = NULL;
262 return;
263 }
264 }
265
266 memset(hvclock_mem, 0, PAGE_SIZE << order);
267 }
268 }
269
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 9/13/18 6:24 PM, Thomas Gleixner wrote:
> On Thu, 13 Sep 2018, Brijesh Singh wrote:
>>
>> +void __weak mem_encrypt_free_decrypted_mem(void) { }
>> +
>> void __ref free_initmem(void)
>> {
>> e820__reallocate_tables();
>>
>> + mem_encrypt_free_decrypted_mem();
>> +
>> free_kernel_image_pages(&__init_begin, &__init_end);
>> }
>>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index b2de398..718acdf 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -348,6 +348,16 @@ bool sev_active(void)
>> EXPORT_SYMBOL(sev_active);
>>
>> /* Architecture __weak replacement functions */
>> +void __init mem_encrypt_free_decrypted_mem(void)
>> +{
>> + if (mem_encrypt_active())
>> + return;
> Why?
Hmm, I think we should be able to free the unused memory even when memory
encryption is active.
In that case, since the memory range was mapped with C=0, we should call
set_memory_encrypted() to change it from C=0 to C=1 before freeing it.
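Something along these lines (untested sketch, not part of the posted series):

	unsigned long start = (unsigned long)__start_bss_decrypted_unused;
	unsigned long end = (unsigned long)__end_bss_decrypted;

	/* Re-encrypt the range before handing it back to the allocator. */
	if (mem_encrypt_active())
		set_memory_encrypted(start, (end - start) >> PAGE_SHIFT);

	free_init_pages("unused decrypted", start, end);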
>> +
>> + free_init_pages("unused decrypted",
>> + (unsigned long)__start_bss_decrypted_unused,
>> + (unsigned long)__end_bss_decrypted);
> Everything _AFTER_ __start_bss_decrypted_unused _IS_ unused and can be
> freed. No conditional nothing. It's unused in any case.
>
> Thanks,
>
> tglx
>
>
On Thu, Sep 13, 2018 at 04:51:10PM -0500, Brijesh Singh wrote:
> kvmclock defines few static variables which are shared with the
> hypervisor during the kvmclock initialization.
...
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 8047379..c16af27 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -112,6 +112,7 @@ static bool __head check_la57_support(unsigned long physaddr)
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> {
> + unsigned long vaddr, vaddr_end;
> unsigned long load_delta, *p;
> unsigned long pgtable_flags;
> pgdval_t *pgd;
> @@ -235,6 +236,21 @@ unsigned long __head __startup_64(unsigned long physaddr,
> sme_encrypt_kernel(bp);
>
> /*
> + * Clear the memory encryption mask from the .bss..decrypted section.
> + * The bss section will be memset to zero later in the initialization so
> + * there is no need to zero it after changing the memory encryption
> + * attribute.
> + */
> + if (mem_encrypt_active()) {
> + vaddr = (unsigned long)__start_bss_decrypted;
> + vaddr_end = (unsigned long)__end_bss_decrypted;
> + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> + i = pmd_index(vaddr);
> + pmd[i] -= sme_get_me_mask();
> + }
> + }
Why isn't this chunk at the end of sme_encrypt_kernel() instead of here?
> + /*
> * Return the SME encryption mask (if SME is active) to be used as a
> * modifier for the initial pgdir entry programmed into CR3.
> */
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 9c77d2d..0d618ee 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -65,6 +65,23 @@ jiffies_64 = jiffies;
> #define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
> #define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
>
> +/*
> + * This section contains data which will be mapped as decrypted. Memory
> + * encryption operates on a page basis. Make this section PMD-aligned
> + * to avoid splitting the pages while mapping the section early.
> + *
> + * Note: We use a separate section so that only this section gets
> + * decrypted to avoid exposing more than we wish.
> + */
> +#define BSS_DECRYPTED \
> + . = ALIGN(PMD_SIZE); \
> + __start_bss_decrypted = .; \
> + *(.bss..decrypted); \
> + . = ALIGN(PAGE_SIZE); \
> + __start_bss_decrypted_unused = .; \
> + . = ALIGN(PMD_SIZE); \
> + __end_bss_decrypted = .; \
> +
> #else
>
> #define X86_ALIGN_RODATA_BEGIN
> @@ -74,6 +91,7 @@ jiffies_64 = jiffies;
>
> #define ALIGN_ENTRY_TEXT_BEGIN
> #define ALIGN_ENTRY_TEXT_END
> +#define BSS_DECRYPTED
>
> #endif
>
> @@ -345,6 +363,7 @@ SECTIONS
> __bss_start = .;
> *(.bss..page_aligned)
> *(.bss)
> + BSS_DECRYPTED
> . = ALIGN(PAGE_SIZE);
> __bss_stop = .;
Putting it in the BSS would need a bit of care in the future as it poses
a certain ordering on the calls in x86_64_start_kernel() (not that there
isn't any now :)):
You have:
x86_64_start_kernel:
...
clear_bss();
...
x86_64_start_reservations() -> ... -> setup_arch() -> kvmclock_init()
so it would be prudent to put at least a comment somewhere, say, over
the definition of BSS_DECRYPTED or so, that attention should be paid to
the clear_bss() call early.
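For example, something like this over the BSS_DECRYPTED definition (the wording
is only a sketch):

	/*
	 * Variables placed here via __bss_decrypted still live in the BSS,
	 * so they are only guaranteed to be zeroed once clear_bss() has run
	 * in x86_64_start_kernel(); don't rely on their contents before that.
	 */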
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On 9/14/18 2:10 AM, Borislav Petkov wrote:
> On Thu, Sep 13, 2018 at 04:51:10PM -0500, Brijesh Singh wrote:
>> kvmclock defines few static variables which are shared with the
>> hypervisor during the kvmclock initialization.
> ...
>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index 8047379..c16af27 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -112,6 +112,7 @@ static bool __head check_la57_support(unsigned long physaddr)
>> unsigned long __head __startup_64(unsigned long physaddr,
>> struct boot_params *bp)
>> {
>> + unsigned long vaddr, vaddr_end;
>> unsigned long load_delta, *p;
>> unsigned long pgtable_flags;
>> pgdval_t *pgd;
>> @@ -235,6 +236,21 @@ unsigned long __head __startup_64(unsigned long physaddr,
>> sme_encrypt_kernel(bp);
>>
>> /*
>> + * Clear the memory encryption mask from the .bss..decrypted section.
>> + * The bss section will be memset to zero later in the initialization so
>> + * there is no need to zero it after changing the memory encryption
>> + * attribute.
>> + */
>> + if (mem_encrypt_active()) {
>> + vaddr = (unsigned long)__start_bss_decrypted;
>> + vaddr_end = (unsigned long)__end_bss_decrypted;
>> + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>> + i = pmd_index(vaddr);
>> + pmd[i] -= sme_get_me_mask();
>> + }
>> + }
> Why isn't this chunk at the end of sme_encrypt_kernel() instead of here?
The sme_encrypt_kernel() does not have access to the pmd (after the pointer
fixup is applied). You can extend sme_encrypt_kernel() to pass an
additional argument, but then we start getting into include hell. The pmd
is defined as "pmdval_t". If we extend sme_encrypt_kernel() then
asm/mem_encrypt.h needs to include the header file which defines
"pmdval_t". Adding 'asm/pgtable_types.h' was causing all kinds of
compilation errors. I didn't spend much time on it. IMO, we really don't
need to go down this path unless we see some value in doing so.
>> + /*
>> * Return the SME encryption mask (if SME is active) to be used as a
>> * modifier for the initial pgdir entry programmed into CR3.
>> */
>> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>> index 9c77d2d..0d618ee 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -65,6 +65,23 @@ jiffies_64 = jiffies;
>> #define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
>> #define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
>>
>> +/*
>> + * This section contains data which will be mapped as decrypted. Memory
>> + * encryption operates on a page basis. Make this section PMD-aligned
>> + * to avoid splitting the pages while mapping the section early.
>> + *
>> + * Note: We use a separate section so that only this section gets
>> + * decrypted to avoid exposing more than we wish.
>> + */
>> +#define BSS_DECRYPTED \
>> + . = ALIGN(PMD_SIZE); \
>> + __start_bss_decrypted = .; \
>> + *(.bss..decrypted); \
>> + . = ALIGN(PAGE_SIZE); \
>> + __start_bss_decrypted_unused = .; \
>> + . = ALIGN(PMD_SIZE); \
>> + __end_bss_decrypted = .; \
>> +
>> #else
>>
>> #define X86_ALIGN_RODATA_BEGIN
>> @@ -74,6 +91,7 @@ jiffies_64 = jiffies;
>>
>> #define ALIGN_ENTRY_TEXT_BEGIN
>> #define ALIGN_ENTRY_TEXT_END
>> +#define BSS_DECRYPTED
>>
>> #endif
>>
>> @@ -345,6 +363,7 @@ SECTIONS
>> __bss_start = .;
>> *(.bss..page_aligned)
>> *(.bss)
>> + BSS_DECRYPTED
>> . = ALIGN(PAGE_SIZE);
>> __bss_stop = .;
> Putting it in the BSS would need a bit of care in the future as it poses
> a certain ordering on the calls in x86_64_start_kernel() (not that there
> isn't any now :)):
Hmm, I thought that since we are explicitly saying the attribute will place
the variables in the bss section, the caller should not be using the
variables before bss is ready. If your concern is that clear_bss() may
get called after a variable is already used, we could memset() it to zero
while clearing the C-bit. I did try to add some comments where we clear
the C-bit; I can include similar wording in the BSS_DECRYPTED section as well.
> You have:
>
> x86_64_start_kernel:
>
> ...
>
> clear_bss();
>
> ...
>
> x86_64_start_reservations() -> ... -> setup_arch() -> kvmclock_init()
>
> so it would be prudent to put at least a comment somewhere, say, over
> the definition of BSS_DECRYPTED or so, that attention should be paid to
> the clear_bss() call early.
On Fri, 14 Sep 2018, Brijesh Singh wrote:
> On 9/14/18 2:10 AM, Borislav Petkov wrote:
> >> /*
> >> + * Clear the memory encryption mask from the .bss..decrypted section.
> >> + * The bss section will be memset to zero later in the initialization so
> >> + * there is no need to zero it after changing the memory encryption
> >> + * attribute.
> >> + */
> >> + if (mem_encrypt_active()) {
> >> + vaddr = (unsigned long)__start_bss_decrypted;
> >> + vaddr_end = (unsigned long)__end_bss_decrypted;
> >> + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> >> + i = pmd_index(vaddr);
> >> + pmd[i] -= sme_get_me_mask();
> >> + }
> >> + }
> > Why isn't this chunk at the end of sme_encrypt_kernel() instead of here?
>
> The sme_encrypt_kernel() does not have access to pmd (after pointer
> fixup is applied). You can extend the sme_encrypt_kernel() to pass an
> additional arguments but then we start getting in include hell. The pmd
> is defined as "pmdval_t". If we extend the sme_encrypt_kernel() then
> asm/mem_encrypt.h need to include the header file which defines
> "pmdval_t". Adding the 'asm/pgtable_type.h' was causing all kind of
> compilation errors. I didn't spend much time on it. IMO, we really don't
> need to go in this path unless we see some value from doing this.
Keep it here then.
> >> @@ -345,6 +363,7 @@ SECTIONS
> >> __bss_start = .;
> >> *(.bss..page_aligned)
> >> *(.bss)
> >> + BSS_DECRYPTED
> >> . = ALIGN(PAGE_SIZE);
> >> __bss_stop = .;
> > Putting it in the BSS would need a bit of care in the future as it poses
> > a certain ordering on the calls in x86_64_start_kernel() (not that there
> > isn't any now :)):
>
>
> Hmm, I thought since we are explicitly saying that attribute will add
> the variables in the bss section so caller should not be using the
> variable before bss is ready. If you are concerns that clear_bss() may
> get called after the variable is used then we could memset(0) when while
> clearing the C-bit. I did try to add some comments when we are clearing
> the C-bit. I can include some verbatim in BSS_DECRYPTED section as well.
Nah. It's clearly marked __bss_decrypted and anything which stores data in
a BSS location is busted whether that's .bss or .bss..decrypted. If someone
is not aware about the BSS constraints in general, i.e. don't use it before
clear_bss(), then I really can't help it.
Thanks,
tglx
On Fri, Sep 14, 2018 at 02:17:05PM +0200, Thomas Gleixner wrote:
> > The sme_encrypt_kernel() does not have access to pmd (after pointer
> > fixup is applied). You can extend the sme_encrypt_kernel() to pass an
> > additional arguments but then we start getting in include hell. The pmd
> > is defined as "pmdval_t". If we extend the sme_encrypt_kernel() then
> > asm/mem_encrypt.h need to include the header file which defines
> > "pmdval_t". Adding the 'asm/pgtable_type.h' was causing all kind of
> > compilation errors. I didn't spend much time on it. IMO, we really don't
> > need to go in this path unless we see some value from doing this.
>
> Keep it here then.
*For what is worth*, a simple forward declaration works. I've taken the
64-bit forward declaration of pmdval_t as SME is 64-bit only anyway.
The below diff ontop passes the mandatory all*config smoke builds:
---
arch/x86/include/asm/mem_encrypt.h | 6 ++++--
arch/x86/kernel/head64.c | 18 +-----------------
arch/x86/mm/mem_encrypt_identity.c | 18 +++++++++++++++++-
3 files changed, 22 insertions(+), 20 deletions(-)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 616f8e637bc3..67c0e6cfdfb3 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -19,6 +19,8 @@
#include <asm/bootparam.h>
+typedef unsigned long pmdval_t;
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern u64 sme_me_mask;
@@ -40,7 +42,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);
void __init sme_early_init(void);
-void __init sme_encrypt_kernel(struct boot_params *bp);
+void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd);
void __init sme_enable(struct boot_params *bp);
int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
@@ -69,7 +71,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
static inline void __init sme_early_init(void) { }
-static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
+static inline void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd) { }
static inline void __init sme_enable(struct boot_params *bp) { }
static inline bool sme_active(void) { return false; }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c16af27eb23f..6f8e9b534e80 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -112,7 +112,6 @@ static bool __head check_la57_support(unsigned long physaddr)
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
{
- unsigned long vaddr, vaddr_end;
unsigned long load_delta, *p;
unsigned long pgtable_flags;
pgdval_t *pgd;
@@ -233,22 +232,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
/* Encrypt the kernel and related (if SME is active) */
- sme_encrypt_kernel(bp);
-
- /*
- * Clear the memory encryption mask from the .bss..decrypted section.
- * The bss section will be memset to zero later in the initialization so
- * there is no need to zero it after changing the memory encryption
- * attribute.
- */
- if (mem_encrypt_active()) {
- vaddr = (unsigned long)__start_bss_decrypted;
- vaddr_end = (unsigned long)__end_bss_decrypted;
- for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
- i = pmd_index(vaddr);
- pmd[i] -= sme_get_me_mask();
- }
- }
+ sme_encrypt_kernel(bp, pmd);
/*
* Return the SME encryption mask (if SME is active) to be used as a
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index a19ef1a416ff..9dbc145d10f8 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -267,15 +267,17 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
return entries + tables;
}
-void __init sme_encrypt_kernel(struct boot_params *bp)
+void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
unsigned long kernel_start, kernel_end, kernel_len;
unsigned long initrd_start, initrd_end, initrd_len;
struct sme_populate_pgd_data ppd;
+ unsigned long vaddr, vaddr_end;
unsigned long pgtable_area_len;
unsigned long decrypted_base;
+ int i;
if (!sme_active())
return;
@@ -467,6 +469,20 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
/* Flush the TLB - no globals so cr3 is enough */
native_write_cr3(__native_read_cr3());
+
+ /*
+ * Clear the memory encryption mask from the .bss..decrypted section.
+ * The bss section will be memset to zero later in the initialization so
+ * there is no need to zero it after changing the memory encryption
+ * attribute.
+ */
+ vaddr = (unsigned long)__start_bss_decrypted;
+ vaddr_end = (unsigned long)__end_bss_decrypted;
+
+ for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ i = pmd_index(vaddr);
+ pmd[i] -= sme_get_me_mask();
+ }
}
void __init sme_enable(struct boot_params *bp)
--
2.17.0.582.gccdcbd54c
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On 09/14/2018 09:12 AM, Borislav Petkov wrote:
> On Fri, Sep 14, 2018 at 02:17:05PM +0200, Thomas Gleixner wrote:
>>> The sme_encrypt_kernel() does not have access to pmd (after pointer
>>> fixup is applied). You can extend the sme_encrypt_kernel() to pass an
>>> additional arguments but then we start getting in include hell. The pmd
>>> is defined as "pmdval_t". If we extend the sme_encrypt_kernel() then
>>> asm/mem_encrypt.h need to include the header file which defines
>>> "pmdval_t". Adding the 'asm/pgtable_type.h' was causing all kind of
>>> compilation errors. I didn't spend much time on it. IMO, we really don't
>>> need to go in this path unless we see some value from doing this.
>>
>> Keep it here then.
>
> *For what is worth*, a simple forward declaration works. I've taken the
> 64-bit forward declaration of pmdval_t as SME is 64-bit only anyway.
>
> The below diff ontop passes the mandatory all*config smoke builds:
>
> ---
> arch/x86/include/asm/mem_encrypt.h | 6 ++++--
> arch/x86/kernel/head64.c | 18 +-----------------
> arch/x86/mm/mem_encrypt_identity.c | 18 +++++++++++++++++-
> 3 files changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 616f8e637bc3..67c0e6cfdfb3 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -19,6 +19,8 @@
>
> #include <asm/bootparam.h>
>
> +typedef unsigned long pmdval_t;
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> extern u64 sme_me_mask;
> @@ -40,7 +42,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);
>
> void __init sme_early_init(void);
>
> -void __init sme_encrypt_kernel(struct boot_params *bp);
> +void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd);
> void __init sme_enable(struct boot_params *bp);
>
> int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
> @@ -69,7 +71,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
>
> static inline void __init sme_early_init(void) { }
>
> -static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> +static inline void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline bool sme_active(void) { return false; }
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index c16af27eb23f..6f8e9b534e80 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -112,7 +112,6 @@ static bool __head check_la57_support(unsigned long physaddr)
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> {
> - unsigned long vaddr, vaddr_end;
> unsigned long load_delta, *p;
> unsigned long pgtable_flags;
> pgdval_t *pgd;
> @@ -233,22 +232,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
>
> /* Encrypt the kernel and related (if SME is active) */
> - sme_encrypt_kernel(bp);
> -
> - /*
> - * Clear the memory encryption mask from the .bss..decrypted section.
> - * The bss section will be memset to zero later in the initialization so
> - * there is no need to zero it after changing the memory encryption
> - * attribute.
> - */
> - if (mem_encrypt_active()) {
> - vaddr = (unsigned long)__start_bss_decrypted;
> - vaddr_end = (unsigned long)__end_bss_decrypted;
> - for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> - i = pmd_index(vaddr);
> - pmd[i] -= sme_get_me_mask();
> - }
> - }
> + sme_encrypt_kernel(bp, pmd);
>
> /*
> * Return the SME encryption mask (if SME is active) to be used as a
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index a19ef1a416ff..9dbc145d10f8 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -267,15 +267,17 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
> return entries + tables;
> }
>
> -void __init sme_encrypt_kernel(struct boot_params *bp)
> +void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd)
> {
> unsigned long workarea_start, workarea_end, workarea_len;
> unsigned long execute_start, execute_end, execute_len;
> unsigned long kernel_start, kernel_end, kernel_len;
> unsigned long initrd_start, initrd_end, initrd_len;
> struct sme_populate_pgd_data ppd;
> + unsigned long vaddr, vaddr_end;
> unsigned long pgtable_area_len;
> unsigned long decrypted_base;
> + int i;
>
> if (!sme_active())
> return;
> @@ -467,6 +469,20 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
>
> /* Flush the TLB - no globals so cr3 is enough */
> native_write_cr3(__native_read_cr3());
> +
> + /*
> + * Clear the memory encryption mask from the .bss..decrypted section.
> + * The bss section will be memset to zero later in the initialization so
> + * there is no need to zero it after changing the memory encryption
> + * attribute.
> + */
> + vaddr = (unsigned long)__start_bss_decrypted;
> + vaddr_end = (unsigned long)__end_bss_decrypted;
> +
> + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> + i = pmd_index(vaddr);
> + pmd[i] -= sme_get_me_mask();
> + }
> }
The above code will never get executed for the SEV case.
See the if (!sme_active()) check at the start of the function.
If we decide to go with this patch, then we have to do something like
this:
sme_encrypt_kernel(...)
{
if (!mem_encrypt_active())
return;
if (sev_active())
goto out;
/* Do kernel and initrd in-place encrypts for SME only case */
.....
.....
out:
/* Clear the C-bit from .bss..decrypted section */
...
...
}
>
> void __init sme_enable(struct boot_params *bp)
>
On Fri, Sep 14, 2018 at 09:27:09AM -0500, Brijesh Singh wrote:
> The above code will never get executed for the SEV case.
>
> See if (!sme_active()) check in the start of function.
>
> If we decide to go on this patch, then we have to do something like
> this:
>
> sme_encrypt_kernel(...)
> {
> if (!mem_encrypt_active())
> return;
>
> if (sev_active())
> goto out;
>
> /* Do kernel and initrd in-place encrypts for SME only case */
> .....
> .....
>
> out:
> /* Clear the C-bit from .bss..decrypted section */
> ...
> ...
Or above do:
if (sev_active()) {
sev_map_bss_decrypted();
return;
}
which is a separate function you call.
So that you don't have a label at the end of the function. Whatever tglx
prefers.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On 09/14/2018 09:12 AM, Borislav Petkov wrote:
> On Fri, Sep 14, 2018 at 02:17:05PM +0200, Thomas Gleixner wrote:
>>> The sme_encrypt_kernel() does not have access to pmd (after pointer
>>> fixup is applied). You can extend the sme_encrypt_kernel() to pass an
>>> additional arguments but then we start getting in include hell. The pmd
>>> is defined as "pmdval_t". If we extend the sme_encrypt_kernel() then
>>> asm/mem_encrypt.h need to include the header file which defines
>>> "pmdval_t". Adding the 'asm/pgtable_type.h' was causing all kind of
>>> compilation errors. I didn't spend much time on it. IMO, we really don't
>>> need to go in this path unless we see some value from doing this.
>>
>> Keep it here then.
>
> *For what is worth*, a simple forward declaration works. I've taken the
> 64-bit forward declaration of pmdval_t as SME is 64-bit only anyway.
Just my 2 cents, but I'd prefer it to be in head64.c. This is where
the future pagetable entries are all updated to set the encryption
mask by applying sme_get_me_mask() to load_delta. So, to me, it makes
sense to keep the clearing of the encryption mask for the bss_decrypted
section here.
Thanks,
Tom
>
> The below diff ontop passes the mandatory all*config smoke builds:
>
> ---
> arch/x86/include/asm/mem_encrypt.h | 6 ++++--
> arch/x86/kernel/head64.c | 18 +-----------------
> arch/x86/mm/mem_encrypt_identity.c | 18 +++++++++++++++++-
> 3 files changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 616f8e637bc3..67c0e6cfdfb3 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -19,6 +19,8 @@
>
> #include <asm/bootparam.h>
>
> +typedef unsigned long pmdval_t;
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> extern u64 sme_me_mask;
> @@ -40,7 +42,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);
>
> void __init sme_early_init(void);
>
> -void __init sme_encrypt_kernel(struct boot_params *bp);
> +void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd);
> void __init sme_enable(struct boot_params *bp);
>
> int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
> @@ -69,7 +71,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
>
> static inline void __init sme_early_init(void) { }
>
> -static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> +static inline void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline bool sme_active(void) { return false; }
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index c16af27eb23f..6f8e9b534e80 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -112,7 +112,6 @@ static bool __head check_la57_support(unsigned long physaddr)
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> {
> - unsigned long vaddr, vaddr_end;
> unsigned long load_delta, *p;
> unsigned long pgtable_flags;
> pgdval_t *pgd;
> @@ -233,22 +232,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
>
> /* Encrypt the kernel and related (if SME is active) */
> - sme_encrypt_kernel(bp);
> -
> - /*
> - * Clear the memory encryption mask from the .bss..decrypted section.
> - * The bss section will be memset to zero later in the initialization so
> - * there is no need to zero it after changing the memory encryption
> - * attribute.
> - */
> - if (mem_encrypt_active()) {
> - vaddr = (unsigned long)__start_bss_decrypted;
> - vaddr_end = (unsigned long)__end_bss_decrypted;
> - for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> - i = pmd_index(vaddr);
> - pmd[i] -= sme_get_me_mask();
> - }
> - }
> + sme_encrypt_kernel(bp, pmd);
>
> /*
> * Return the SME encryption mask (if SME is active) to be used as a
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index a19ef1a416ff..9dbc145d10f8 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -267,15 +267,17 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
> return entries + tables;
> }
>
> -void __init sme_encrypt_kernel(struct boot_params *bp)
> +void __init sme_encrypt_kernel(struct boot_params *bp, pmdval_t *pmd)
> {
> unsigned long workarea_start, workarea_end, workarea_len;
> unsigned long execute_start, execute_end, execute_len;
> unsigned long kernel_start, kernel_end, kernel_len;
> unsigned long initrd_start, initrd_end, initrd_len;
> struct sme_populate_pgd_data ppd;
> + unsigned long vaddr, vaddr_end;
> unsigned long pgtable_area_len;
> unsigned long decrypted_base;
> + int i;
>
> if (!sme_active())
> return;
> @@ -467,6 +469,20 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
>
> /* Flush the TLB - no globals so cr3 is enough */
> native_write_cr3(__native_read_cr3());
> +
> + /*
> + * Clear the memory encryption mask from the .bss..decrypted section.
> + * The bss section will be memset to zero later in the initialization so
> + * there is no need to zero it after changing the memory encryption
> + * attribute.
> + */
> + vaddr = (unsigned long)__start_bss_decrypted;
> + vaddr_end = (unsigned long)__end_bss_decrypted;
> +
> + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> + i = pmd_index(vaddr);
> + pmd[i] -= sme_get_me_mask();
> + }
> }
>
> void __init sme_enable(struct boot_params *bp)
>
On Fri, 14 Sep 2018, Tom Lendacky wrote:
> On 09/14/2018 09:12 AM, Borislav Petkov wrote:
> > On Fri, Sep 14, 2018 at 02:17:05PM +0200, Thomas Gleixner wrote:
> >>> The sme_encrypt_kernel() does not have access to pmd (after pointer
> >>> fixup is applied). You can extend the sme_encrypt_kernel() to pass an
> >>> additional arguments but then we start getting in include hell. The pmd
> >>> is defined as "pmdval_t". If we extend the sme_encrypt_kernel() then
> >>> asm/mem_encrypt.h need to include the header file which defines
> >>> "pmdval_t". Adding the 'asm/pgtable_type.h' was causing all kind of
> >>> compilation errors. I didn't spend much time on it. IMO, we really don't
> >>> need to go in this path unless we see some value from doing this.
> >>
> >> Keep it here then.
> >
> > *For what is worth*, a simple forward declaration works. I've taken the
> > 64-bit forward declaration of pmdval_t as SME is 64-bit only anyway.
>
> Just my 2 cents, but I'd prefer it to be in head64.c. This is where
> the future pagetable entries are all updated to set the encryption
> mask by applying sme_get_me_mask() to load_delta. So, to me, it makes
> sense to keep the clearing of the encryption mask for the bss_decrypted
> section here.
Yes, at least for now, I just keep it there and we can sort that out after
we fixed the problem at hand.
Thanks,
tglx