2020-04-02 06:28:39

by Singh, Balbir

Subject: [PATCH 0/3] arch/x86: Optionally flush L1D on context switch

Provide a mechanism to flush the L1D cache on context switch. The goal
is to allow tasks that are paranoid about the recent snoop-assisted data
sampling vulnerabilities to flush their L1D on being switched out.
This protects their data from being snooped or leaked via side channels
after the task has been context switched out.

The core of the series is patch 3; the first two patches refactor the code
so that common bits can be reused.
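
For illustration, a task opts in from userspace via prctl(); the series
wires this up through arch/x86/include/uapi/asm/prctl.h (see the diffstat
below). A minimal sketch follows, assuming a PR_SET_L1D_FLUSH request name
and value — both are introduced in patch 3, which is not quoted in this
excerpt, so treat them as placeholders:

#include <stdio.h>
#include <sys/prctl.h>

/*
 * Assumed constant: the real name/value comes from patch 3's change to
 * arch/x86/include/uapi/asm/prctl.h, which is not shown here.
 */
#ifndef PR_SET_L1D_FLUSH
#define PR_SET_L1D_FLUSH 57
#endif

int main(void)
{
	/* Ask the kernel to flush L1D whenever this task is switched out. */
	if (prctl(PR_SET_L1D_FLUSH, 1UL, 0UL, 0UL, 0UL) != 0)
		perror("prctl(PR_SET_L1D_FLUSH)");
	return 0;
}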

Changelog:
- Refactor the code and reuse cond_ibpb() - code bits provided by tglx
- Merge mm state tracking for ibpb and l1d flush
- Rename TIF_L1D_FLUSH to TIF_SPEC_FLUSH_L1D

Changelog RFC:
- Reuse existing code for allocation and flush
- Simplify the goto logic in the actual l1d_flush function
- Optimize the code path with jump labels/static functions

The RFC patch was previously posted at

https://lore.kernel.org/lkml/[email protected]/

Balbir Singh (3):
arch/x86/kvm: Refactor l1d flush lifecycle management
arch/x86: Refactor tlbflush and l1d flush
arch/x86: Optionally flush L1D on context switch

arch/x86/include/asm/cacheflush.h | 6 ++
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/include/uapi/asm/prctl.h | 3 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/l1d_flush.c | 85 +++++++++++++++++++++++++++
arch/x86/kernel/process_64.c | 10 +++-
arch/x86/kvm/vmx/vmx.c | 56 +++---------------
arch/x86/mm/tlb.c | 92 +++++++++++++++++++++++-------
9 files changed, 189 insertions(+), 72 deletions(-)
create mode 100644 arch/x86/kernel/l1d_flush.c

--
2.17.1


2020-04-02 06:35:34

by Singh, Balbir

Subject: [PATCH 2/3] arch/x86: Refactor tlbflush and l1d flush

Refactor the existing assembly bits into smaller helper functions
and abstract the MSR-based L1D_FLUSH into a helper function as well.
Use these helpers in KVM for L1D flushing.
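
The intended calling convention — and the pattern the refactored
vmx_l1d_flush() adopts in the hunk below — is to try the hardware
MSR-based flush first and fall back to the software sequence. A sketch
(kernel context assumed; this mirrors the diff rather than adding new
behaviour):

/*
 * Sketch of how the new helpers compose. With L1D_CACHE_ORDER of 4 the
 * flush buffer is 16 pages (64KB); the TLB pass strides by 4096 bytes
 * and the fill pass by 64 bytes (one cache line).
 */
static void l1d_flush_example(void *l1d_flush_pages)
{
	/* MSR-based flush, if the CPU has X86_FEATURE_FLUSH_L1D. */
	if (flush_l1d_cache_hw())
		return;

	/* Software fallback: prime the TLB, then touch every cache line. */
	preempt_disable();
	populate_tlb_with_flush_pages(l1d_flush_pages);
	flush_l1d_cache_sw(l1d_flush_pages);
	preempt_enable();
}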

Signed-off-by: Balbir Singh <[email protected]>
---
arch/x86/include/asm/cacheflush.h | 3 ++
arch/x86/kernel/l1d_flush.c | 49 +++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 31 ++++---------------
3 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 6419a4cef0e8..66a46db7aadd 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -10,5 +10,8 @@
void clflush_cache_range(void *addr, unsigned int size);
void *alloc_l1d_flush_pages(void);
void cleanup_l1d_flush_pages(void *l1d_flush_pages);
+void populate_tlb_with_flush_pages(void *l1d_flush_pages);
+void flush_l1d_cache_sw(void *l1d_flush_pages);
+int flush_l1d_cache_hw(void);

#endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/arch/x86/kernel/l1d_flush.c b/arch/x86/kernel/l1d_flush.c
index 05f375c33423..60499f773046 100644
--- a/arch/x86/kernel/l1d_flush.c
+++ b/arch/x86/kernel/l1d_flush.c
@@ -34,3 +34,52 @@ void cleanup_l1d_flush_pages(void *l1d_flush_pages)
free_pages((unsigned long)l1d_flush_pages, L1D_CACHE_ORDER);
}
EXPORT_SYMBOL_GPL(cleanup_l1d_flush_pages);
+
+void populate_tlb_with_flush_pages(void *l1d_flush_pages)
+{
+ int size = PAGE_SIZE << L1D_CACHE_ORDER;
+
+ asm volatile(
+ /* First ensure the pages are in the TLB */
+ "xorl %%eax, %%eax\n"
+ ".Lpopulate_tlb:\n\t"
+ "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+ "addl $4096, %%eax\n\t"
+ "cmpl %%eax, %[size]\n\t"
+ "jne .Lpopulate_tlb\n\t"
+ "xorl %%eax, %%eax\n\t"
+ "cpuid\n\t"
+ :: [flush_pages] "r" (l1d_flush_pages),
+ [size] "r" (size)
+ : "eax", "ebx", "ecx", "edx");
+}
+EXPORT_SYMBOL_GPL(populate_tlb_with_flush_pages);
+
+int flush_l1d_cache_hw(void)
+{
+ if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+ wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ return 1;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(flush_l1d_cache_hw);
+
+void flush_l1d_cache_sw(void *l1d_flush_pages)
+{
+ int size = PAGE_SIZE << L1D_CACHE_ORDER;
+
+ asm volatile(
+ /* Fill the cache */
+ "xorl %%eax, %%eax\n"
+ ".Lfill_cache:\n"
+ "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+ "addl $64, %%eax\n\t"
+ "cmpl %%eax, %[size]\n\t"
+ "jne .Lfill_cache\n\t"
+ "lfence\n"
+ :: [flush_pages] "r" (l1d_flush_pages),
+ [size] "r" (size)
+ : "eax", "ecx");
+}
+EXPORT_SYMBOL_GPL(flush_l1d_cache_sw);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 209e63798435..29dc5a5bb6ab 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5956,8 +5956,6 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu,
*/
static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
- int size = PAGE_SIZE << L1D_CACHE_ORDER;
-
/*
* This code is only executed when the flush mode is 'cond' or
* 'always'
@@ -5986,32 +5984,13 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu)

vcpu->stat.l1d_flush++;

- if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
- wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ if (flush_l1d_cache_hw())
return;
- }

- asm volatile(
- /* First ensure the pages are in the TLB */
- "xorl %%eax, %%eax\n"
- ".Lpopulate_tlb:\n\t"
- "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
- "addl $4096, %%eax\n\t"
- "cmpl %%eax, %[size]\n\t"
- "jne .Lpopulate_tlb\n\t"
- "xorl %%eax, %%eax\n\t"
- "cpuid\n\t"
- /* Now fill the cache */
- "xorl %%eax, %%eax\n"
- ".Lfill_cache:\n"
- "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
- "addl $64, %%eax\n\t"
- "cmpl %%eax, %[size]\n\t"
- "jne .Lfill_cache\n\t"
- "lfence\n"
- :: [flush_pages] "r" (vmx_l1d_flush_pages),
- [size] "r" (size)
- : "eax", "ebx", "ecx", "edx");
+ preempt_disable();
+ populate_tlb_with_flush_pages(vmx_l1d_flush_pages);
+ flush_l1d_cache_sw(vmx_l1d_flush_pages);
+ preempt_enable();
}

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
--
2.17.1

2020-04-02 06:36:23

by Singh, Balbir

Subject: [PATCH 1/3] arch/x86/kvm: Refactor l1d flush lifecycle management

Split out the allocation and free routines to be used in a follow-up
set of patches (to reuse for L1D flushing).
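
The helpers pair up as a simple allocate/free lifecycle; a sketch of a
caller, mirroring the vmx_setup_l1d_flush()/vmx_cleanup_l1d_flush() usage
in the hunks below (example_setup()/example_teardown() are illustrative
names, not part of the patch):

static void *example_flush_pages;

static int example_setup(void)
{
	example_flush_pages = alloc_l1d_flush_pages();
	if (!example_flush_pages)
		return -ENOMEM;
	return 0;
}

static void example_teardown(void)
{
	if (example_flush_pages) {
		cleanup_l1d_flush_pages(example_flush_pages);
		example_flush_pages = NULL;
	}
}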

Signed-off-by: Balbir Singh <[email protected]>
---
arch/x86/include/asm/cacheflush.h | 3 +++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/l1d_flush.c | 36 +++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 25 +++------------------
4 files changed, 43 insertions(+), 22 deletions(-)
create mode 100644 arch/x86/kernel/l1d_flush.c

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 63feaf2a5f93..6419a4cef0e8 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -6,6 +6,9 @@
#include <asm-generic/cacheflush.h>
#include <asm/special_insns.h>

+#define L1D_CACHE_ORDER 4
void clflush_cache_range(void *addr, unsigned int size);
+void *alloc_l1d_flush_pages(void);
+void cleanup_l1d_flush_pages(void *l1d_flush_pages);

#endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d6d61c4455fa..48f443e6c2de 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -160,3 +160,4 @@ ifeq ($(CONFIG_X86_64),y)
endif

obj-$(CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT) += ima_arch.o
+obj-y += l1d_flush.o
diff --git a/arch/x86/kernel/l1d_flush.c b/arch/x86/kernel/l1d_flush.c
new file mode 100644
index 000000000000..05f375c33423
--- /dev/null
+++ b/arch/x86/kernel/l1d_flush.c
@@ -0,0 +1,36 @@
+#include <linux/mm.h>
+#include <asm/cacheflush.h>
+
+void *alloc_l1d_flush_pages(void)
+{
+ struct page *page;
+ void *l1d_flush_pages = NULL;
+ int i;
+
+ /*
+ * This allocation for l1d_flush_pages is not tied to a VM/task's
+ * lifetime and so should not be charged to a memcg.
+ */
+ page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
+ if (!page)
+ return NULL;
+ l1d_flush_pages = page_address(page);
+
+ /*
+ * Initialize each page with a different pattern in
+ * order to protect against KSM in the nested
+ * virtualization case.
+ */
+ for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
+ memset(l1d_flush_pages + i * PAGE_SIZE, i + 1,
+ PAGE_SIZE);
+ }
+ return l1d_flush_pages;
+}
+EXPORT_SYMBOL_GPL(alloc_l1d_flush_pages);
+
+void cleanup_l1d_flush_pages(void *l1d_flush_pages)
+{
+ free_pages((unsigned long)l1d_flush_pages, L1D_CACHE_ORDER);
+}
+EXPORT_SYMBOL_GPL(cleanup_l1d_flush_pages);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9eaccf92d616..209e63798435 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -203,14 +203,10 @@ static const struct {
[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
};

-#define L1D_CACHE_ORDER 4
static void *vmx_l1d_flush_pages;

static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
{
- struct page *page;
- unsigned int i;
-
if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
return 0;
@@ -253,24 +249,9 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)

if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
- /*
- * This allocation for vmx_l1d_flush_pages is not tied to a VM
- * lifetime and so should not be charged to a memcg.
- */
- page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
- if (!page)
+ vmx_l1d_flush_pages = alloc_l1d_flush_pages();
+ if (!vmx_l1d_flush_pages)
return -ENOMEM;
- vmx_l1d_flush_pages = page_address(page);
-
- /*
- * Initialize each page with a different pattern in
- * order to protect against KSM in the nested
- * virtualization case.
- */
- for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
- memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1,
- PAGE_SIZE);
- }
}

l1tf_vmx_mitigation = l1tf;
@@ -7992,7 +7973,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
static void vmx_cleanup_l1d_flush(void)
{
if (vmx_l1d_flush_pages) {
- free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
+ cleanup_l1d_flush_pages(vmx_l1d_flush_pages);
vmx_l1d_flush_pages = NULL;
}
/* Restore state so sysfs ignores VMX */
--
2.17.1

2020-04-02 20:16:45

by Josh Poimboeuf

Subject: Re: [PATCH 0/3] arch/x86: Optionally flush L1D on context switch

On Thu, Apr 02, 2020 at 05:23:58PM +1100, Balbir Singh wrote:
> Provide a mechanism to flush the L1D cache on context switch. The goal
> is to allow tasks that are paranoid about the recent snoop-assisted data
> sampling vulnerabilities to flush their L1D on being switched out.

Hi Balbir,

Just curious, is it really vulnerabilities, plural? I thought there was
only one: CVE-2020-0550 (Snoop-assisted L1 Data Sampling).

(There was a similar one without the "snoop": L1D Eviction Sampling, but
it's supposed to get fixed in microcode).

--
Josh

2020-04-02 20:44:35

by Singh, Balbir

Subject: Re: [PATCH 0/3] arch/x86: Optionally flush L1D on context switch

On Thu, 2020-04-02 at 15:13 -0500, Josh Poimboeuf wrote:
> On Thu, Apr 02, 2020 at 05:23:58PM +1100, Balbir Singh wrote:
> > Provide a mechanism to flush the L1D cache on context switch. The goal
> > is to allow tasks that are paranoid about the recent snoop-assisted data
> > sampling vulnerabilities to flush their L1D on being switched out.
>
> Hi Balbir,
>
> Just curious, is it really vulnerabilities, plural? I thought there was
> only one: CVE-2020-0550 (Snoop-assisted L1 Data Sampling).
>
> (There was a similar one without the "snoop": L1D Eviction Sampling, but
> it's supposed to get fixed in microcode).
>

Hi Josh,

Yes, that CVE is the motivation; the mitigation for CVE-2020-0550 does suggest
flushing the cache on context switch. But in general, as we begin to find more
ways of evicting or snooping data, a generic mechanism is more useful, and
that is why I am making it an opt-in.

Balbir Singh.

2020-04-02 20:57:07

by Josh Poimboeuf

Subject: Re: [PATCH 0/3] arch/x86: Optionally flush L1D on context switch

On Thu, Apr 02, 2020 at 08:35:46PM +0000, Singh, Balbir wrote:
> On Thu, 2020-04-02 at 15:13 -0500, Josh Poimboeuf wrote:
> > On Thu, Apr 02, 2020 at 05:23:58PM +1100, Balbir Singh wrote:
> > > Provide a mechanism to flush the L1D cache on context switch. The goal
> > > is to allow tasks that are paranoid about the recent snoop-assisted data
> > > sampling vulnerabilities to flush their L1D on being switched out.
> >
> > Hi Balbir,
> >
> > Just curious, is it really vulnerabilities, plural? I thought there was
> > only one: CVE-2020-0550 (Snoop-assisted L1 Data Sampling).
> >
> > (There was a similar one without the "snoop": L1D Eviction Sampling, but
> > it's supposed to get fixed in microcode).
> >
>
> Hi Josh,
>
> Yes, that CVE is the motivation; the mitigation for CVE-2020-0550 does suggest
> flushing the cache on context switch. But in general, as we begin to find more
> ways of evicting or snooping data, a generic mechanism is more useful, and
> that is why I am making it an opt-in.

Ok. I think it would be a good idea to expand on that justification
more precisely in the commit message. That would help both reviewers of
the code and users of the new option understand what level of paranoia
they're opting in to :-)

--
Josh

2020-04-02 21:46:19

by Thomas Gleixner

Subject: Re: [PATCH 0/3] arch/x86: Optionally flush L1D on context switch

Josh Poimboeuf <[email protected]> writes:
> On Thu, Apr 02, 2020 at 08:35:46PM +0000, Singh, Balbir wrote:
>> Yes, that CVE is the motivation; the mitigation for CVE-2020-0550 does suggest
>> flushing the cache on context switch. But in general, as we begin to find more
>> ways of evicting or snooping data, a generic mechanism is more useful, and
>> that is why I am making it an opt-in.
>
> Ok. I think it would be a good idea to expand on that justification
> more precisely in the commit message. That would help both reviewers of
> the code and users of the new option understand what level of paranoia
> they're opting in to :-)

The commit message is mostly useful for reviewers and people who have to
do code archaeology.

Documentation/admin-guide/hw-vuln/ has plenty of space to host a
document with explanations. paranoia.rst comes to my mind. :)

Thanks,

tglx

2020-04-02 22:25:08

by Singh, Balbir

Subject: Re: [PATCH 0/3] arch/x86: Optionally flush L1D on context switch

On Thu, 2020-04-02 at 23:45 +0200, Thomas Gleixner wrote:
> Josh Poimboeuf <[email protected]> writes:
> > On Thu, Apr 02, 2020 at 08:35:46PM +0000, Singh, Balbir wrote:
> > > Yes, that CVE is the motivation; the mitigation for CVE-2020-0550 does
> > > suggest flushing the cache on context switch. But in general, as we
> > > begin to find more ways of evicting or snooping data, a generic
> > > mechanism is more useful, and that is why I am making it an opt-in.
> >
> > Ok. I think it would be a good idea to expand on that justification
> > more precisely in the commit message. That would help both reviewers of
> > the code and users of the new option understand what level of paranoia
> > they're opting in to :-)
>
> The commit message is mostly useful for reviewers and people who have to
> do code archeaology.
>
> Documentation/admin-guide/hw-vuln/ has plenty of space to host a
> document with explanations. paranoia.rst comes to my mind. :)

I hope people don't go looking for aliens in there :) I'll write up some
documentation if that helps, starting with something simple.

Balbir

>
> Thanks,
>
> tglx
>