2020-04-08 09:04:30

by Singh, Balbir

Subject: [PATCH v3 3/5] arch/x86/mm: Refactor cond_ibpb() to support other use cases

cond_ibpb() already has the bits required to track the previous mm in
switch_mm_irqs_off(). This can be reused for other use cases, such as
L1D flushing (on context switch out).

Signed-off-by: Balbir Singh <[email protected]>
---
 arch/x86/include/asm/tlbflush.h |  2 +-
 arch/x86/mm/tlb.c               | 43 +++++++++++++++++----------------
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6f66d841262d..69e6ea20679c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -172,7 +172,7 @@ struct tlb_state {
 	/* Last user mm for optimizing IBPB */
 	union {
 		struct mm_struct	*last_user_mm;
-		unsigned long		last_user_mm_ibpb;
+		unsigned long		last_user_mm_spec;
 	};
 
 	u16 loaded_mm_asid;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 66f96f21a7b6..da5c94286c7d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -33,10 +33,11 @@
  */
 
 /*
- * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
- * stored in cpu_tlb_state.last_user_mm_ibpb.
+ * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
+ * stored in cpu_tlb_state.last_user_mm_spec.
  */
 #define LAST_USER_MM_IBPB	0x1UL
+#define LAST_USER_MM_SPEC_MASK	(LAST_USER_MM_IBPB)
 
 /*
  * We get here when we do something requiring a TLB invalidation
@@ -189,19 +190,24 @@ static void sync_current_stack_to_mm(struct mm_struct *mm)
 	}
 }
 
-static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next)
+static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
 {
 	unsigned long next_tif = task_thread_info(next)->flags;
-	unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB;
+	unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
 
-	return (unsigned long)next->mm | ibpb;
+	return (unsigned long)next->mm | spec_bits;
 }
 
-static void cond_ibpb(struct task_struct *next)
+static void cond_mitigation(struct task_struct *next)
 {
+	unsigned long prev_mm, next_mm;
+
 	if (!next || !next->mm)
 		return;
 
+	next_mm = mm_mangle_tif_spec_bits(next);
+	prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
+
 	/*
 	 * Both, the conditional and the always IBPB mode use the mm
 	 * pointer to avoid the IBPB when switching between tasks of the
@@ -212,8 +218,6 @@ static void cond_ibpb(struct task_struct *next)
 	 * exposed data is not really interesting.
 	 */
 	if (static_branch_likely(&switch_mm_cond_ibpb)) {
-		unsigned long prev_mm, next_mm;
-
 		/*
 		 * This is a bit more complex than the always mode because
 		 * it has to handle two cases:
@@ -243,20 +247,14 @@ static void cond_ibpb(struct task_struct *next)
 		 * Optimize this with reasonably small overhead for the
 		 * above cases. Mangle the TIF_SPEC_IB bit into the mm
 		 * pointer of the incoming task which is stored in
-		 * cpu_tlbstate.last_user_mm_ibpb for comparison.
-		 */
-		next_mm = mm_mangle_tif_spec_ib(next);
-		prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb);
-
-		/*
+		 * cpu_tlbstate.last_user_mm_spec for comparison.
+		 *
 		 * Issue IBPB only if the mm's are different and one or
 		 * both have the IBPB bit set.
 		 */
 		if (next_mm != prev_mm &&
 		    (next_mm | prev_mm) & LAST_USER_MM_IBPB)
 			indirect_branch_prediction_barrier();
-
-		this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm);
 	}
 
 	if (static_branch_unlikely(&switch_mm_always_ibpb)) {
@@ -265,11 +263,12 @@ static void cond_ibpb(struct task_struct *next)
 		 * different context than the user space task which ran
 		 * last on this CPU.
 		 */
-		if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) {
+		if ((prev_mm & ~LAST_USER_MM_SPEC_MASK) !=
+					(unsigned long)next->mm)
 			indirect_branch_prediction_barrier();
-			this_cpu_write(cpu_tlbstate.last_user_mm, next->mm);
-		}
 	}
+
+	this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
 }
 
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
@@ -374,8 +373,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * Avoid user/user BTB poisoning by flushing the branch
 		 * predictor when switching between processes. This stops
 		 * one process from doing Spectre-v2 attacks on another.
+		 * The hook can also be used by other mitigations which
+		 * need to run on context switch in switch_mm().
 		 */
-		cond_ibpb(tsk);
+		cond_mitigation(tsk);
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
 			/*
@@ -501,7 +502,7 @@ void initialize_tlbstate_and_flush(void)
 	write_cr3(build_cr3(mm->pgd, 0));
 
 	/* Reinitialize tlbstate. */
-	this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
+	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
--
2.17.1


2020-04-17 13:09:42

by Thomas Gleixner

Subject: Re: [PATCH v3 3/5] arch/x86/mm: Refactor cond_ibpb() to support other use cases

Balbir Singh <[email protected]> writes:
>
> /*
> - * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
> - * stored in cpu_tlb_state.last_user_mm_ibpb.
> + * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
> + * stored in cpu_tlb_state.last_user_mm_spec.
> */
> #define LAST_USER_MM_IBPB 0x1UL
> +#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
>
> /* Reinitialize tlbstate. */
> - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
> + this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);

Shouldn't that be LAST_USER_MM_MASK?

Thanks,

tglx

2020-04-17 23:06:40

by Singh, Balbir

Subject: Re: [PATCH v3 3/5] arch/x86/mm: Refactor cond_ibpb() to support other use cases

On Fri, 2020-04-17 at 15:07 +0200, Thomas Gleixner wrote:
>
> Balbir Singh <[email protected]> writes:
> >
> > /*
> > - * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
> > - * stored in cpu_tlb_state.last_user_mm_ibpb.
> > + * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
> > + * stored in cpu_tlb_state.last_user_mm_spec.
> > */
> > #define LAST_USER_MM_IBPB 0x1UL
> > +#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
> >
> > /* Reinitialize tlbstate. */
> > - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
> > + this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);
>
> Shouldn't that be LAST_USER_MM_MASK?
>
>

No, that crashes the system for SW flushes, because it tries to flush the L1D
via the software loop, and that early in boot we don't have the
l1d_flush_pages allocated yet. LAST_USER_MM_MASK has the
LAST_USER_MM_FLUSH_L1D bit set.
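
To illustrate, here is a minimal user-space model of the mangling scheme
(LAST_USER_MM_FLUSH_L1D as bit 1 and LAST_USER_MM_MASK come from a later
patch in this series, and the mm pointer value is made up):

#include <stdio.h>

/* Bit 0 is LAST_USER_MM_IBPB from this patch; bit 1 models the
 * LAST_USER_MM_FLUSH_L1D bit added later in the series. */
#define LAST_USER_MM_IBPB	0x1UL
#define LAST_USER_MM_FLUSH_L1D	0x2UL
#define LAST_USER_MM_MASK	(LAST_USER_MM_IBPB | LAST_USER_MM_FLUSH_L1D)

int main(void)
{
	/* Init value written by initialize_tlbstate_and_flush(). */
	unsigned long prev_mm = LAST_USER_MM_IBPB;

	/* First incoming task: mm pointer mangled with its TIF bits. */
	unsigned long next_mm = 0xffff888012345000UL | LAST_USER_MM_IBPB;

	/* cond_mitigation(): IBPB if the mms differ and either has the bit. */
	if (next_mm != prev_mm && ((next_mm | prev_mm) & LAST_USER_MM_IBPB))
		printf("first switch: IBPB issued, as intended\n");

	/* Had the init value been LAST_USER_MM_MASK, the L1D flush bit
	 * would already be set and the software flush would run before
	 * l1d_flush_pages is allocated. */
	prev_mm = LAST_USER_MM_MASK;
	if (prev_mm & LAST_USER_MM_FLUSH_L1D)
		printf("first switch: L1D flush attempted too early\n");

	return 0;
}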

Balbir Singh.

2020-04-18 10:01:27

by Thomas Gleixner

Subject: Re: [PATCH v3 3/5] arch/x86/mm: Refactor cond_ibpb() to support other use cases

"Singh, Balbir" <[email protected]> writes:
> On Fri, 2020-04-17 at 15:07 +0200, Thomas Gleixner wrote:
>>
>> Balbir Singh <[email protected]> writes:
>> >
>> > /*
>> > - * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
>> > - * stored in cpu_tlb_state.last_user_mm_ibpb.
>> > + * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
>> > + * stored in cpu_tlb_state.last_user_mm_spec.
>> > */
>> > #define LAST_USER_MM_IBPB 0x1UL
>> > +#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
>> >
>> > /* Reinitialize tlbstate. */
>> > - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
>> > + this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);
>>
>> Shouldn't that be LAST_USER_MM_MASK?
>>
> No, that crashes the system for SW flushes, because it tries to flush the L1D
> via the software loop and early enough we don't have the l1d_flush_pages
> allocated. LAST_USER_MM_MASK has LAST_USER_MM_FLUSH_L1D bit set.

You can trivially prevent this by checking l1d_flush_pages != NULL.
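
Something like the below (a sketch only; l1d_flush_sw() and
l1d_flush_pages follow the names in your series, the body is a stub):

#include <stdio.h>

static void *l1d_flush_pages;	/* allocated later during boot */

/* Guarded software flush path: bail out while the flush pages have
 * not been allocated yet instead of dereferencing a NULL pointer. */
static int l1d_flush_sw(void)
{
	if (!l1d_flush_pages)
		return -1;	/* too early: skip instead of crashing */
	/* ... walk 2 * L1D-size worth of the flush pages here ... */
	return 0;
}

int main(void)
{
	/* Early call, before allocation: returns -1 rather than crashing. */
	printf("early flush attempt: %d\n", l1d_flush_sw());
	return 0;
}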

Thanks,

tglx

2020-04-21 03:48:35

by Singh, Balbir

Subject: Re: [PATCH v3 3/5] arch/x86/mm: Refactor cond_ibpb() to support other use cases

On Sat, 2020-04-18 at 11:59 +0200, Thomas Gleixner wrote:
>
> "Singh, Balbir" <[email protected]> writes:
> > On Fri, 2020-04-17 at 15:07 +0200, Thomas Gleixner wrote:
> > >
> > > Balbir Singh <[email protected]> writes:
> > > >
> > > > /*
> > > > - * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
> > > > - * stored in cpu_tlb_state.last_user_mm_ibpb.
> > > > + * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
> > > > + * stored in cpu_tlb_state.last_user_mm_spec.
> > > > */
> > > > #define LAST_USER_MM_IBPB 0x1UL
> > > > +#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
> > > >
> > > > /* Reinitialize tlbstate. */
> > > > - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
> > > > + this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);
> > >
> > > Shouldn't that be LAST_USER_MM_MASK?
> > >
> >
> > No, that crashes the system for SW flushes, because it tries to flush the
> > L1D via the software loop, and that early in boot we don't have the
> > l1d_flush_pages allocated yet. LAST_USER_MM_MASK has the
> > LAST_USER_MM_FLUSH_L1D bit set.
>
> You can trivially prevent this by checking l1d_flush_pages != NULL.
>

But why would we want to flush on reinit? It is either coming back from a low
power state or initialising. Is it worth adding a check for != NULL every
time we flush, just to handle this case?

Thanks,
Balbir

2020-04-21 09:07:08

by Thomas Gleixner

Subject: Re: [PATCH v3 3/5] arch/x86/mm: Refactor cond_ibpb() to support other use cases

"Singh, Balbir" <[email protected]> writes:
> On Sat, 2020-04-18 at 11:59 +0200, Thomas Gleixner wrote:
>>
>> "Singh, Balbir" <[email protected]> writes:
>> > On Fri, 2020-04-17 at 15:07 +0200, Thomas Gleixner wrote:
>> > >
>> > > Balbir Singh <[email protected]> writes:
>> > > >
>> > > > /*
>> > > > - * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
>> > > > - * stored in cpu_tlb_state.last_user_mm_ibpb.
>> > > > + * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
>> > > > + * stored in cpu_tlb_state.last_user_mm_spec.
>> > > > */
>> > > > #define LAST_USER_MM_IBPB 0x1UL
>> > > > +#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
>> > > >
>> > > > /* Reinitialize tlbstate. */
>> > > > - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
>> > > > + this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);
>> > >
>> > > Shouldn't that be LAST_USER_MM_MASK?
>> > >
>> >
>> > No, that crashes the system for SW flushes, because it tries to flush the
>> > L1D via the software loop, and that early in boot we don't have the
>> > l1d_flush_pages allocated yet. LAST_USER_MM_MASK has the
>> > LAST_USER_MM_FLUSH_L1D bit set.
>>
>> You can trivially prevent this by checking l1d_flush_pages != NULL.
>>
>
> But why would we want to flush on reinit? It is either coming back from a low
> power state or initialising. Is it worth adding a check for != NULL every
> time we flush, just to handle this case?

Fair enough. Please add a comment so the same question does not come
back three months from now.
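
For example, something like (wording is just a suggestion):

	/*
	 * Leave the L1D flush bit out of the init value on purpose:
	 * flushing here would take the software fallback before
	 * l1d_flush_pages is allocated. LAST_USER_MM_IBPB alone still
	 * forces an IBPB on the first switch after init.
	 */
	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_IBPB);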

Thanks,

tglx