2018-01-29 11:56:23

by Woodhouse, David

Subject: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

From: Tim Chen <[email protected]>

Flush indirect branches when switching into a process that has marked itself
non-dumpable. This protects high-value processes like gpg better,
without incurring too much performance overhead.

If done naïvely, we could switch to a kernel idle thread and then back
to the original process, such as:

process A -> idle -> process A

In such a scenario, we do not have to do an IBPB here even though the
process is non-dumpable, as we are switching back to the same process
after a hiatus.

To avoid the redundant IBPB, which is expensive, we track the last user
mm's context ID. The cost is an extra u64 mm context ID to track the
last mm we were using before switching to the init_mm used by idle.
Avoiding the extra IBPB is probably worth the extra memory for this
common scenario.

For those cases where tlb_defer_switch_to_init_mm() returns true
(non-PCID), lazy TLB will defer the switch to init_mm, so we will not be
changing the mm for the process A -> idle -> process A switch, and IBPB
will be skipped in this case.

Thanks to the reviewers and Andy Lutomirski for the suggestion of
using ctx_id which got rid of the problem of mm pointer recycling.

Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
How close are we to done with bikeshedding this one?...

arch/x86/include/asm/tlbflush.h | 2 ++
arch/x86/mm/tlb.c | 31 ++++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 3effd3c..4405c4b 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -174,6 +174,8 @@ struct tlb_state {
struct mm_struct *loaded_mm;
u16 loaded_mm_asid;
u16 next_asid;
+ /* last user mm's ctx id */
+ u64 last_ctx_id;

/*
* We can be in one of several states:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a156195..870fb99 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
#include <linux/interrupt.h>
#include <linux/export.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>

#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
-#include <linux/debugfs.h>

/*
* TLB flushing, formerly SMP-only
@@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
} else {
u16 new_asid;
bool need_flush;
+ u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
+
+ /*
+ * Avoid user/user BTB poisoning by flushing the branch
+ * predictor when switching between processes. This stops
+ * one process from doing Spectre-v2 attacks on another.
+ *
+ * As an optimization, flush indirect branches only when
+ * switching into processes that disable dumping.
+ *
+ * This will not flush branches when switching into kernel
+ * threads. It will also not flush if we switch to idle
+ * thread and back to the same process. It will flush if we
+ * switch to a different non-dumpable process.
+ */
+ if (tsk && tsk->mm &&
+ tsk->mm->context.ctx_id != last_ctx_id &&
+ get_dumpable(tsk->mm) != SUID_DUMP_USER)
+ indirect_branch_prediction_barrier();

if (IS_ENABLED(CONFIG_VMAP_STACK)) {
/*
@@ -268,6 +288,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
}

+ /*
+ * Record last user mm's context id, so we can avoid
+ * flushing branch buffer with IBPB if we switch back
+ * to the same user.
+ */
+ if (next != &init_mm)
+ this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
this_cpu_write(cpu_tlbstate.loaded_mm, next);
this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
}
@@ -345,6 +373,7 @@ void initialize_tlbstate_and_flush(void)
write_cr3(build_cr3(mm->pgd, 0));

/* Reinitialize tlbstate. */
+ this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id);
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
this_cpu_write(cpu_tlbstate.next_asid, 1);
this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
--
2.7.4



2018-01-29 12:29:17

by Dominik Brodowski

Subject: Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

On Mon, Jan 29, 2018 at 11:33:28AM +0000, David Woodhouse wrote:
> From: Tim Chen <[email protected]>
>
> Flush indirect branches when switching into a process that marked itself
> non dumpable. This protects high value processes like gpg better,
> without having too high performance overhead.
>
> If done naïvely, we could switch to a kernel idle thread and then back
> to the original process, such as:
>
> process A -> idle -> process A
>
> In such scenario, we do not have to do IBPB here even though the process
> is non-dumpable, as we are switching back to the same process after a
> hiatus.
>
> To avoid the redundant IBPB, which is expensive, we track the last mm
> user context ID. The cost is to have an extra u64 mm context id to track
> the last mm we were using before switching to the init_mm used by idle.
> Avoiding the extra IBPB is probably worth the extra memory for this
> common scenario.
>
> For those cases where tlb_defer_switch_to_init_mm() returns true (non
> PCID), lazy tlb will defer switch to init_mm, so we will not be changing
> the mm for the process A -> idle -> process A switch. So IBPB will be
> skipped for this case.
>
> Thanks to the reviewers and Andy Lutomirski for the suggestion of
> using ctx_id which got rid of the problem of mm pointer recycling.
>
> Signed-off-by: Tim Chen <[email protected]>
> Signed-off-by: David Woodhouse <[email protected]>
> ---
> How close are we to done with bikeshedding this one?...

The commit message is much more about the A->idle-> improvement than
on the basic design decisions to limit this to non-dumpable processes. And
that still seems to be under discussion (see, for example, Jon Masters
message of today, https://lkml.org/lkml/2018/1/29/34 ). So this design
choice should, at least, be more explicit (if not tunable...).

> @@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> } else {
> u16 new_asid;
> bool need_flush;
> + u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
> +
> + /*
> + * Avoid user/user BTB poisoning by flushing the branch
> + * predictor when switching between processes. This stops
> + * one process from doing Spectre-v2 attacks on another.
> + *
> + * As an optimization, flush indirect branches only when
> + * switching into processes that disable dumping.
> + *
> + * This will not flush branches when switching into kernel
> + * threads. It will also not flush if we switch to idle

Whitespace damage. And maybe add ", as the kernel depends on retpoline
protection instead" after "threads" here -- I think that was the reason why
you think kernel threads are safe; or did I misunderstand you?

> + * thread and back to the same process. It will flush if we
> + * switch to a different non-dumpable process.

"process, as that gives additional protection to high value processes like
gpg. Other processes are left unprotected here to reduce the overhead of the
barrier [... maybe add some rationale here ...]"

Thanks,
Dominik

2018-01-29 12:46:15

by David Woodhouse

Subject: Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

On Mon, 2018-01-29 at 13:28 +0100, Dominik Brodowski wrote:

> The commit message is much more about the A->idle-> improvement than
> on the basic design decisions to limit this to non-dumpable processes.

Yeah, I collapsed the commit messages from the three commits into one.
But the resulting commit message does start with the basic non-dumpable
concept, and explain the trade-off (sensitivity vs. overhead) using the
comment from what was the second patch in the series.

> And
> that still seems to be under discussion (see, for example, Jon Masters
> message of today, https://lkml.org/lkml/2018/1/29/34 ). So this design
> choice should, at least, be more explicit (if not tunable...).

That isn't stunningly relevant here. Yes, we might build userspace with
retpoline support and there'll be an additional case here so it becomes
!dumpable && !retpoline-userspace. But that's all way off in the
future.

> > @@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> >   } else {
> >   u16 new_asid;
> >   bool need_flush;
> > + u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
> > +
> > + /*
> > +  * Avoid user/user BTB poisoning by flushing the branch
> > +  * predictor when switching between processes. This stops
> > +  * one process from doing Spectre-v2 attacks on another.
> > +  *
> > +                 * As an optimization, flush indirect branches only when
> > +                 * switching into processes that disable dumping.
> > +                 *
> > +                 * This will not flush branches when switching into kernel
> > +  * threads. It will also not flush if we switch to idle
> Whitespace damage. And maybe add ", as the kernel depends on retpoline
> protection instead" after "threads" here -- I think that was the reason why
> you think kernel threads are safe; or did I misunderstand you?

I'll fix up the whitespace; thanks. For the other comment... no, if
kernel threads needed *any* kind of protection from this IBPB then we'd
be hosed by the time we got here anyway. Let's not imply that an IBPB
here would be at all useful for the kernel under any circumstances.


> >
> > +  * thread and back to the same process. It will flush if we
> > +  * switch to a different non-dumpable process.
> "process, as that gives additional protection to high value processes like
> gpg. Other processes are left unprotected here to reduce the overhead of the
> barrier [... maybe add some rationale here ...]"

The rationale is to reduce the overhead of the barrier. I've added that
to the comment, based on what's in the commit message already. Thanks.

http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/8431c5a87520cd14f8769745589443e603682b23



2018-01-29 13:57:35

by Dominik Brodowski

Subject: Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

On Mon, Jan 29, 2018 at 12:44:59PM +0000, David Woodhouse wrote:
> On Mon, 2018-01-29 at 13:28 +0100, Dominik Brodowski wrote:
>
> > The commit message is much more about the A->idle-> improvement than
> > on the basic design decisions to limit this to non-dumpable processes.
>
> Yeah, I collapsed the commit messages from the three commits into one.
> But the resulting commit message does start with the basic non-dumpable
> concept, and explain the trade-off (sensitivity vs. overhead) using the
> comment from what was the second path in the series.
>
> > And
> > that still seems to be under discussion (see, for example, Jon Masters
> > message of today, https://lkml.org/lkml/2018/1/29/34 ). So this design
> > choice should, at least, be more explicit (if not tunable...).
>
> That isn't stunningly relevant here. Yes, we might build userspace with
> retpoline support and there'll be an additional case here so it becomes
> !dumpable && !retpoline-userspace. But that's all way off in the
> future.
>
> > > @@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> > >   } else {
> > >   u16 new_asid;
> > >   bool need_flush;
> > > + u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
> > > +
> > > + /*
> > > +  * Avoid user/user BTB poisoning by flushing the branch
> > > +  * predictor when switching between processes. This stops
> > > +  * one process from doing Spectre-v2 attacks on another.
> > > +  *
> > > +                 * As an optimization, flush indirect branches only when
> > > +                 * switching into processes that disable dumping.
> > > +                 *
> > > +                 * This will not flush branches when switching into kernel
> > > +  * threads. It will also not flush if we switch to idle
> > Whitespace damage. And maybe add ", as the kernel depends on retpoline
> > protection instead" after "threads" here -- I think that was the reason why
> > you think kernel threads are safe; or did I misunderstand you?
>
> I'll fix up the whitespace; thanks. For the other comment... no, if
> kernel threads needed *any* kind of protection from this IBPB then we'd
> be hosed by the time we got here anyway. Let's not imply that an IBPB
> here would be at all useful for the kernel under any circumstances.
>
>
> > >
> > > +  * thread and back to the same process. It will flush if we
> > > +  * switch to a different non-dumpable process.
> > "process, as that gives additional protection to high value processes like
> > gpg. Other processes are left unprotected here to reduce the overhead of the
> > barrier [... maybe add some rationale here ...]"
>
> The rationale is to reduce the overhead of the barrier. I've added that
> to the comment, based on what's in the commit message already. Thanks.
>
> http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/8431c5a87520cd14f8769745589443e603682b23

That reads much better now, thanks. All other issues (e.g. adding an
IBPB also for dumpable tasks) can easily be discussed on top of that commit.

Thanks,
Dominik